@Discord guys: What is modern chess programming about?

Viz · Post by **Viz** » Sat Apr 27, 2024 2:53 pm

smatovic wrote: ↑Sat Apr 27, 2024 2:40 pm
Viz wrote: ↑Sat Apr 27, 2024 2:34 pm Intuition is nothing, statistics are everything.
[...]
And here I disagree. As far as I understand, no test suites are used as regression tests on Fishtest. Why? Because they do not measure Elo? Is the aim of Stockfish to be the strongest engine, Elo-wise? There are other engines with different objectives; Elo, i.e. "statistics", is just one possible metric in computer chess.

--
Srdja

Because test suites are useless. The test-suite approach is wrong on several levels:
1) You can't take 100, 1,000 or even 10,000 positions and say that this is what chess is about. Chess has something like 10^40 or more possible positions.
2) Test suites are more often bugged than not and require constant revisiting and refining: some positions have dual solutions, some have no solution at all, and so on;
3) With Lazy SMP, the search of alpha-beta engines becomes nondeterministic (and why would you test on only 1 thread, especially if you have any SMP-specific tweaks?), so an engine can solve exactly the same position on exactly the same hardware with the same settings in vastly different times;
4) As funny as it sounds, test suites are quite often actually anti-Stockfish. Positions that Stockfish (or many engines) solves too quickly get excluded, which only means that engines that fail to solve those positions gain an unfair test-suite advantage;
5) Stockfish actually improves on almost every test suite year after year without ever being tested for test-suite performance (apart from matetrack). What it comes down to: testing with games improves your performance on the average test suite over time, while engines optimized for test suites, like Crystal, are much weaker in actual games, so tuning on test suites regresses actual play. Even garbage like ShashChess, which merges almost all master patches, is still much weaker than master in any test with a meaningful sample size, while performing slightly better (maybe) on some test suites (constructed mostly so that Stockfish does not solve them). Heck, some positions get solved much faster if you, say, disable null-move pruning, but I would like to see the person who says you should throw away 40-50 Elo to solve 2-3 more positions out of 100 in some test suite.
So this is what it is.
If test suites were any sort of useful metric, you would need to show it. Take an engine, freeze its code, never merge any upstream changes, and commit only changes that improve test-suite results. And try not to end up with this kind of outcome: https://github.com/Matthies/RubiChess/w ... te-results
So far I haven't seen a single person do so. The only "test suite solvers" are: "take Stockfish, make it worse overall but better on some suites, call it a success, shit on the original at every opportunity, but never forget to merge all upstream patches, the ones that actually gain playing strength".

Viz · Post by **Viz** » Sat Apr 27, 2024 2:57 pm

chrisw wrote: ↑Sat Apr 27, 2024 2:52 pm
Viz wrote: ↑Sat Apr 27, 2024 2:34 pm
smatovic wrote: ↑Sat Apr 27, 2024 2:03 pm Well, hence my post and questions. What is modern chess programming about? What was old chess programming about? What are "we", here on TC, missing?

--
Srdja
Intuition is nothing, statistics are everything. What I mean: of course intuition is needed, you need to come up with ideas that make sense and guess which of them are worth it, but everything needs to be statistically proven in a proper way by playing actual games.
Not 800 games: proper SPRT. Not metrics like time to depth, depth to time, average game length, branching factor and so on: just pure wins, draws and losses. Yes, it's annoying and requires a lot of computation, but this is the only real way to progress. And if you don't have enough resources, lowering the time control you test at is infinitely better than committing something based on an insufficient number of games, no matter how long those games are. It follows directly that you can't realistically test almost anything at time controls longer than bullet, unless you strongly suspect that something scales non-linearly.
Precise numbers are also found with automated tuning, but they then need to be statistically tested too. SPRT is king; everything else is almost always completely useless.
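To make the "800 games" point concrete, here is a minimal Python sketch (not from the thread) that turns a W/D/L result into an Elo estimate with an approximate 95% error margin. It assumes independent games and a normal approximation of the mean score; real frameworks such as Fishtest work with game pairs, so treat the numbers as rough. The function names are hypothetical.

```python
import math


def score_to_elo(score: float) -> float:
    """Logistic Elo difference that corresponds to an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, margin) for a W/D/L result.

    Assumes independent games and a normal approximation of the mean score;
    z = 1.96 gives a roughly 95% two-sided interval.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / n
    # Per-game variance: each game scores 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    stderr = math.sqrt(var / n)
    lo, hi = score - z * stderr, score + z * stderr
    if lo <= 0.0 or hi >= 1.0:
        raise ValueError("interval too close to 0% or 100% for this approximation")
    elo = score_to_elo(score)
    # Half-width of the interval, measured in Elo.
    margin = (score_to_elo(hi) - score_to_elo(lo)) / 2.0
    return elo, margin


if __name__ == "__main__":
    # Hypothetical 800-game result with a 37.5% draw rate.
    elo, margin = elo_with_margin(250, 300, 250)
    print(f"Elo {elo:.1f} +/- {margin:.1f}")
```

With the hypothetical 250/300/250 split over 800 games the margin comes out at roughly ±19 Elo, far more than the few Elo a typical single patch is expected to gain.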
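A rough Python sketch of the SPRT stopping rule, as a hedged illustration rather than any framework's actual code: it computes an approximate log-likelihood ratio from a W/D/L count using a normal approximation of the trinomial model and compares it against Wald's bounds. The Elo bounds [0, 5] and alpha = beta = 0.05 are illustrative defaults, not Fishtest's settings, and real frameworks such as Fishtest use more refined models (game pairs, pentanomial statistics).

```python
import math
from typing import Tuple


def elo_to_score(elo: float) -> float:
    """Expected score for a logistic Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Approximate LLR of H1 (true Elo = elo1) against H0 (true Elo = elo0).

    Normal approximation on the trinomial W/D/L model:
        LLR = N * (s1 - s0) * (2 * mean - s0 - s1) / (2 * var)
    where s0, s1 are the expected scores under H0/H1 and var is the
    per-game score variance.
    """
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    if var == 0.0:
        return 0.0  # degenerate sample (all wins, all draws or all losses)
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)


def sprt_bounds(alpha: float = 0.05, beta: float = 0.05) -> Tuple[float, float]:
    """Wald's stopping bounds (lower, upper) on the LLR."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def sprt_decision(wins: int, draws: int, losses: int, elo0: float = 0.0,
                  elo1: float = 5.0, alpha: float = 0.05,
                  beta: float = 0.05) -> Tuple[str, float]:
    """Return ("accept H1" | "accept H0" | "continue", llr)."""
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    lower, upper = sprt_bounds(alpha, beta)
    if llr >= upper:
        return "accept H1", llr
    if llr <= lower:
        return "accept H0", llr
    return "continue", llr
```

For example, 1000 wins, 2000 draws and 900 losses against bounds [0, 5] gives an LLR of about 2.1, below the upper bound of ln(19) ≈ 2.94, so the test keeps running even though the patch is scoring above 50%.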
Old chess programming is basically not doing this.
It uses bogus metrics that mean nothing, plays 800 games "but at least this is rapid", guesses values on god-only-knows what basis, and uses other engines instead of self-play (which can be done, but increases the computation needed for any patch by a factor of 2 or so).
So, it’s wheels, gears, structure, electrical systems already invented, the work already done, just to be tuned better? No wonder we see a very low cost of entry and a gazillion new “engine developers”, some of whom have the audacity to call themselves “top engine developers”. What does it take to not reinvent any wheels? A few months? Or weeks?

You can invent new wheels: a lot of new and original ideas appeared in engines not named Stockfish, for example correction history, pawn structure history, threat-based history, etc.
Also, you need to implement the wheels properly because, well, they interact with everything.
If you look at how IIR (internal iterative reductions) is implemented in different engines, the implementations look completely different despite the same underlying logic, because yes, it was all tested, and different search+eval combinations favor different variants: some can use the Stockfish approach (-3 for PV nodes, -2 for cut nodes with a depth limit), some engines do a flat -1, some also do an extra -1 for low-depth TT hits, some do -1 only for PV/cut nodes, some have depth limits, some don't, etc.
An idea can gain a massively different amount of Elo depending on where you implement it (or gain nothing at all), and you always need to test it.
And this is why, when someone asks "how to do things" here, they almost always get a completely wrong answer, and some people are flabbergasted by it: either a "conventional solution" that is 20 years old, or just atrocious advice like "just test for 800 games lul" or "just look at time to depth".
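To illustrate how differently IIR can be written, here are three hypothetical Python variants matching the families listed above. IIR reduces the search depth at nodes where the transposition table (TT) offers no move, since move ordering is then likely poor. All thresholds (`min_depth`, `cut_min_depth`, `margin`) are made-up placeholders, not values taken from Stockfish or any other engine, and the "low-depth TT hit" rule is only one possible reading of the post.

```python
from typing import Optional


def iir_flat(depth: int, has_tt_move: bool, min_depth: int = 4) -> int:
    """Flat -1 when the TT offers no move; min_depth is a hypothetical limit."""
    if not has_tt_move and depth >= min_depth:
        return depth - 1
    return depth


def iir_pv_cut(depth: int, is_pv: bool, is_cut: bool, has_tt_move: bool,
               cut_min_depth: int = 8) -> int:
    """-3 at PV nodes, -2 at cut nodes above a depth limit, as the post describes
    Stockfish; cut_min_depth is a hypothetical value. The caller should drop into
    quiescence search if the result is <= 0."""
    if has_tt_move:
        return depth
    if is_pv:
        return depth - 3
    if is_cut and depth >= cut_min_depth:
        return depth - 2
    return depth


def iir_flat_plus_shallow_tt(depth: int, has_tt_move: bool, tt_depth: Optional[int],
                             min_depth: int = 4, margin: int = 4) -> int:
    """Flat -1 without a TT move, plus an extra -1 when a TT entry exists but was
    searched much shallower than the current depth (one reading of "low-depth TT
    hits"; margin is hypothetical)."""
    new_depth = iir_flat(depth, has_tt_move, min_depth)
    if tt_depth is not None and tt_depth + margin < depth:
        new_depth -= 1
    return new_depth
```

Each variant is a handful of lines, which is exactly why the choice between them can only be settled by testing the combination with a given engine's search and eval.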

smatovic · Post by **smatovic** » Sat Apr 27, 2024 3:01 pm

Viz wrote: ↑Sat Apr 27, 2024 2:53 pm [...]
5) Stockfish actually improves on almost every test suite year after year without ever being tested for test-suite performance (apart from matetrack). What it comes down to: testing with games improves your performance on the average test suite over time, while engines optimized for test suites, like Crystal, are much weaker in actual games, so tuning on test suites regresses actual play. Even garbage like ShashChess, which merges almost all master patches, is still much weaker than master in any test with a meaningful sample size, while performing slightly better (maybe) on some test suites (constructed mostly so that Stockfish does not solve them). Heck, some positions get solved much faster if you, say, disable null-move pruning, but I would like to see the person who says you should throw away 40-50 Elo to solve 2-3 more positions out of 100 in some test suite.
[...]

Yes, this is pretty much my point: there are actual people who use Crystal and ShashChess, despite losing Elo.

--
Srdja

Viz · Post by **Viz** » Sat Apr 27, 2024 3:03 pm

smatovic wrote: ↑Sat Apr 27, 2024 3:01 pm
Viz wrote: ↑Sat Apr 27, 2024 2:53 pm [...]
Yes, this is pretty much my point: there are people who use Crystal and ShashChess, despite losing Elo.

--
Srdja

These people are usually just pretty clueless, that's all. They think they can look at engine lines in 5 positions and say "this analysis quality is better" (not a joke, I've heard this more than once).
Good for them that they are mostly correspondence players, where it's a dead draw anyway if you don't do something stupid, so even a weakened Stockfish will draw against normal Stockfish because, well, chess is such a big draw.

chrisw · Post by **chrisw** » Sat Apr 27, 2024 3:16 pm

Viz wrote: ↑Sat Apr 27, 2024 2:57 pm
[...]
You can invent new wheels: a lot of new and original ideas appeared in engines not named Stockfish, for example correction history, pawn structure history, threat-based history, etc.
[...]

I do most stuff alone without talking to anybody, so I invent my own terminology; then something gets mentioned elsewhere and I often try to check it out. These recent Finny tables, for example, only to discover they’re the same idea as in CSTal about 18 months ago. Some ideas are just kind of obvious when you get into thinking about stuff.
Maybe I understand threat-based history, pawn structure history and correction history, but call them something else? Or maybe not. Care to elaborate on what they are, from your POV?

RubiChess · Post by **RubiChess** » Sat Apr 27, 2024 3:32 pm

chrisw wrote: ↑Sat Apr 27, 2024 3:16 pm These recent Finny tables, for example, only to discover they’re the same idea as in CSTal about 18 months ago.

If you did open source, maybe we would call them Crissy tables.
And somebody would have fixed your code to compile on Linux.

chrisw · Post by **chrisw** » Sat Apr 27, 2024 3:50 pm

RubiChess wrote: ↑Sat Apr 27, 2024 3:32 pm
chrisw wrote: ↑Sat Apr 27, 2024 3:16 pm These recent Finny tables, for example, only to discover they’re the same idea as in CSTal about 18 months ago.
If you did open source, maybe we would call them Crissy tables.
And somebody would have fixed your code to compile on Linux.

If they’re even worthy of a name. You stack accumulators and look for a nearby one to update from, right? So it’s pretty obvious to also keep sideways stacks indexed by king index/phase and look in those on phase changes, no? This is just plain programming, not chess programming. Faster, better-optimised code is hardly chess-specific. We make far too much fuss about programming tricks.
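The "sideways stack" idea above is roughly what Finny tables do: keep, per king bucket, the last accumulator computed for that bucket together with the piece set it reflects; on a king-bucket change, start from that cached accumulator and apply only the feature differences instead of summing every active feature from scratch. The Python sketch below is a toy: `ToyNetwork`, its fake weights and the `(piece, square)` feature encoding are hypothetical stand-ins, not any engine's NNUE layout, and real implementations typically keep one cache per perspective and diff piece bitboards rather than Python sets.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical feature encoding: (piece index, square index).
Feature = Tuple[int, int]


class ToyNetwork:
    """Stand-in for an NNUE feature transformer: accumulator = bias + sum of weight columns."""

    def __init__(self, hidden: int):
        self.hidden = hidden
        self.bias = [1] * hidden

    def column(self, bucket: int, feature: Feature) -> List[int]:
        # Deterministic fake weights; a real net would look these up in a trained table.
        piece, square = feature
        return [(bucket * 31 + piece * 7 + square * 3 + i) % 17 - 8
                for i in range(self.hidden)]

    def full_refresh(self, bucket: int, features: Set[Feature]) -> List[int]:
        acc = list(self.bias)
        for f in features:
            for i, w in enumerate(self.column(bucket, f)):
                acc[i] += w
        return acc


class FinnyTable:
    """Per-king-bucket cache of the last accumulator and the feature set it reflects."""

    def __init__(self, net: ToyNetwork, num_buckets: int):
        self.net = net
        # Empty-board state: accumulator equals the bias, no active features.
        self.entries: Dict[int, Tuple[List[int], Set[Feature]]] = {
            b: (list(net.bias), set()) for b in range(num_buckets)
        }
        self.last_updates = 0

    def refresh(self, bucket: int, features: Set[Feature]) -> List[int]:
        """Rebuild the accumulator for `bucket` by diffing against the cached entry."""
        cached_acc, cached_features = self.entries[bucket]
        acc = list(cached_acc)
        removed = cached_features - features
        added = features - cached_features
        for f in removed:
            for i, w in enumerate(self.net.column(bucket, f)):
                acc[i] -= w
        for f in added:
            for i, w in enumerate(self.net.column(bucket, f)):
                acc[i] += w
        self.last_updates = len(removed) + len(added)
        self.entries[bucket] = (acc, set(features))
        return list(acc)
```

In a small example where two positions differ by one piece, the second refresh for the same bucket touches only two weight columns (one removed feature, one added) instead of every active feature; on a real board the saving is usually larger, because the cached position for a bucket tends to differ from the current one by only a few pieces.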

Rebel · Post by **Rebel** » Sat Apr 27, 2024 3:57 pm

RubiChess wrote: ↑Sat Apr 27, 2024 3:32 pm
chrisw wrote: ↑Sat Apr 27, 2024 3:16 pm These recent Finny tables, for example, only to discover they’re the same idea as in CSTal about 18 months ago.
If you did open source, maybe we would call them Crissy tables.
And somebody would have fixed your code to compile on Linux.

And then some guys may come along, make some irrelevant changes, add their name to the authors list and start bragging about it in public. No thank you, the recent examples were a great lesson for me.

RubiChess · Post by **RubiChess** » Sat Apr 27, 2024 4:17 pm

chrisw wrote: ↑Sat Apr 27, 2024 3:50 pm This is just plain programming, not chess programming. Faster, better-optimised code is hardly chess-specific

You got it! This IS chess programming nowadays.

You don't need to know anything about chess. But at least you learn something about it by watching engines in a GUI.

JacquesRW · Post by **JacquesRW** » Sat Apr 27, 2024 4:34 pm

chrisw wrote: ↑Sat Apr 27, 2024 3:16 pm These recent Finny tables, for example, only to discover they’re the same idea as in CSTal about 18 months ago. Some ideas are just kind of obvious when you get into thinking about stuff.

The original Finny Tables test was over 2 years ago FWIW: http://chess.grantnet.us/test/25036/
I think everyone was surprised that SF didn't have them already; it's really a testament to the fact that no one reads or wants to work on the SF NNUE code.
