Because test suites are useless. There are multiple layers where test suite appoach is wrong.smatovic wrote: ↑Sat Apr 27, 2024 2:40 pmAnd here I disagree. As far as I got it there are no test-suites used as regression test on Fishtest. Why? Cos it does not give Elo? The aim of Stockfish is to be the strongest engine, Elo wise? But there are other engines, with different objectives, Elo, "statistics", is just one possible metric in computer chess.
--
Srdja
1) You can't take 100, 1000 or even 10000 positions and say that this is what chess is about. Chess has like what, 10^40 possible positions?
2) Test suites are more often bugged than not and require constant revisiting and refining. Either there are dual solutions or no solutions at all, or whatever;
3) With Lazy SMP search of AB engines becomes unstable (and why would you only test on 1 thread, especially if you had any sort of SMP tweaks?) and it can solve exactly the same position with exactly the same hardware and other settings in vastly different times;
4) Test suites are quite often actually anti-stockfish, as funny as it sounds. Because positions which it solves (or a lot of engines solve) too fast are getting excluded - but this only means that engines that don't solve this positions will get unfair testsuite advantage;
5) Stockfish actually improves in almost any test suite runs years after years without ever testing any test suite performance (well, apart from matetrack). More or less what it comes to - testing on games improves your performance on your average testsuite over time while engines that are oprimized for test suites like crystal are much weaker in actual games, so testing on testsuites regresses in actual play. Even some garbage like shashchess which commits almost all master patches still is much weaker than master in any test with any actual sample size which performing slightly better (maybe) in some testsuites (constructed mostly in a way for stockfish to not solve them). Heck, some positions are getting solved much faster if you let's say disable null move pruning but I would like to see a person that says that you should throw away 40-50 elo to solve 2-3 positions/100 from some test suite.
So this is what it is.
If test suites were any sort of useful metric - well, you need to show it. Take engine, freeze it code, never make any upstream changes and commit only stuff that improves test suite results. And try not to get this type of outcome - https://github.com/Matthies/RubiChess/w ... te-results
So far I haven't seen a single person doing so, the only "test suite solvers" are "take stockfish, make it worth but better in some suites, call it a success, shit on original every opportunity you can get but never forget to merge all upstream patches - ones that actually gain you playing strength".