Time flies. Then we'll stick with Finny tables, of course.
JacquesRW wrote: ↑Sat Apr 27, 2024 4:34 pm
The original Finny Tables test was over 2 years ago FWIW: http://chess.grantnet.us/test/25036/
@Discord guys: What is modern chess programming about?
Moderator: Ras
- 
				RubiChess  
- Posts: 646
- Joined: Fri Mar 30, 2018 7:20 am
- Full name: Andreas Matthies
Re: @Discord guys: What is modern chess programming about?
- 
				Uri Blass
- Posts: 10906
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: @Discord guys: What is modern chess programming about?
I disagree that test suites are useless.
Viz wrote: ↑Sat Apr 27, 2024 2:53 pm
Because test suites are useless. There are multiple layers where the test suite approach is wrong.
smatovic wrote: ↑Sat Apr 27, 2024 2:40 pm
And here I disagree. As far as I understand it, no test suites are used as regression tests on Fishtest. Why? Because they don't give Elo? The aim of Stockfish is to be the strongest engine, Elo-wise? But there are other engines with different objectives; Elo, "statistics", is just one possible metric in computer chess.
--
Srdja
1) You can't take 100, 1000 or even 10000 positions and say that this is what chess is about. Chess has like what, 10^40 possible positions?
2) Test suites are more often bugged than not and require constant revisiting and refining. Either there are dual solutions or no solutions at all, or whatever;
3) With Lazy SMP search of AB engines becomes unstable (and why would you only test on 1 thread, especially if you had any sort of SMP tweaks?) and it can solve exactly the same position with exactly the same hardware and other settings in vastly different times;
4) Test suites are quite often actually anti-Stockfish, as funny as it sounds, because positions that it (or a lot of engines) solves too fast get excluded - but this only means that engines that don't solve these positions get an unfair test suite advantage;
5) Stockfish actually improves in almost any test suite run year after year without ever testing test suite performance (well, apart from matetrack). What it more or less comes down to: testing on games improves your performance on your average test suite over time, while engines that are optimized for test suites, like Crystal, are much weaker in actual games, so testing on test suites regresses actual play. Even some garbage like ShashChess, which merges almost all master patches, is still much weaker than master in any test with any real sample size, while performing slightly better (maybe) in some test suites (constructed mostly so that Stockfish doesn't solve them). Heck, some positions get solved much faster if you, let's say, disable null move pruning, but I would like to see a person who says you should throw away 40-50 Elo to solve 2-3 positions out of 100 from some test suite.
So this is what it is.
If test suites were any sort of useful metric, well, you need to show it. Take an engine, freeze its code, never merge any upstream changes, and commit only stuff that improves test suite results. And try not to get this type of outcome - https://github.com/Matthies/RubiChess/w ... te-results
So far I haven't seen a single person do so; the only "test suite solvers" are "take Stockfish, make it worse but better in some suites, call it a success, shit on the original at every opportunity you can get, but never forget to merge all upstream patches - the ones that actually gain you playing strength".
Everyone can decide what to optimize his engine for.
Some people decide to optimize for games.
Others may decide to optimize for test suites.
As for the very bad results in games:
I think that a test suite of 120 positions is clearly too small, and this is one of the reasons for the bad results.
Another reason is that I guess there are too many sacrifices in the test suite.
It is possible never to test an engine in games, and instead simply analyze the games it played to find its mistakes, always add the positions it blundered in to the test suite, and optimize the engine to avoid them.
It may not be the best engine, but at least the author can release the test suite and results and show users the improvement of the new version in specific positions.
It is possible that sometimes the latest version will be weaker than the older version, but I expect the engine to improve in the long run in this case.
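Uri's workflow (add every position the engine blundered in its own games to a suite, then show users which positions a new version gets right) needs very little bookkeeping. A minimal sketch, assuming the engine's chosen move for each position has already been collected elsewhere (e.g. from a fixed-depth search); `SuitePosition`, `add_blunder`, `solved` and `compare_versions` are hypothetical names, and FENs are treated as plain string keys.

```python
from dataclasses import dataclass


@dataclass
class SuitePosition:
    fen: str
    best_moves: frozenset  # every accepted solution, so dual solutions are not penalised


def add_blunder(suite, fen, correct_move):
    """Add a position the engine got wrong in a game, unless it is already in the suite."""
    if all(p.fen != fen for p in suite):
        suite.append(SuitePosition(fen, frozenset({correct_move})))


def solved(suite, engine_moves):
    """FENs where the engine's move (engine_moves maps FEN -> move) is an accepted solution."""
    return {p.fen for p in suite if engine_moves.get(p.fen) in p.best_moves}


def compare_versions(suite, old_moves, new_moves):
    """Summarise which positions a new version gained or lost relative to an old one."""
    old, new = solved(suite, old_moves), solved(suite, new_moves)
    return {
        "old": len(old),
        "new": len(new),
        "gained": sorted(new - old),
        "lost": sorted(old - new),
    }
```

Storing a set of accepted moves is one way to cope with the dual-solution problem raised earlier in the thread. Given the Lazy SMP nondeterminism Viz describes, the moves would ideally come from a single-threaded, fixed-depth search so that two versions are compared on equal terms.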
- 
				chrisw
- Posts: 4648
- Joined: Tue Apr 03, 2012 4:28 pm
- Location: Midi-Pyrénées
- Full name: Christopher Whittington
Re: @Discord guys: What is modern chess programming about?
Entirely feasible, but why then did Berserk and, more recently, RubiChess wait until a month or so ago before implementing them? I read on Discord, btw, that Sopel only just saw the idea (his comment queried the amount of RAM required). Maybe it will find its way into SF, or not.
JacquesRW wrote: ↑Sat Apr 27, 2024 4:34 pm
The original Finny Tables test was over 2 years ago FWIW: http://chess.grantnet.us/test/25036/
I think everyone was surprised that SF didn't have them already, really a testament to the fact that no one reads or wants to work on SF NNUE code.
- 
				JacquesRW
- Posts: 128
- Joined: Sat Jul 30, 2022 12:12 pm
- Full name: Jamie Whiting
Re: @Discord guys: What is modern chess programming about?
Berserk did so 22 months ago: http://chess.grantnet.us/test/26390/
My surprise is more about Stockfish specifically, as successful patches in other engines tend to be tried very quickly. An individual engine author may not have implemented them until recently for any number of reasons e.g. just not being bothered to think about it.
On our instance (https://chess.swehosting.se/), where we communicate basically everything we try, it went more or less like this:
Cosmo: Adds Finny tables before most of us use buckets at all
Cosmo: Hey guys this patch is super good you should definitely do it when you add buckets
I guess this is fairly common on Discord, and probably why the name Finny Tables has stuck: if someone mentions that they're going to try a bucketed net, it'll usually be suggested to them straight away. You are welcome to have your own opinions on whether this is good or bad; personally I think there's a much greater sense of community among devs on Discord as a result of things like this.
Already is: https://github.com/official-stockfish/S ... 06b36b3e3c
- 
				RubiChess  
- Posts: 646
- Joined: Fri Mar 30, 2018 7:20 am
- Full name: Andreas Matthies
Re: @Discord guys: What is modern chess programming about?
Simple answer. Because it is a hobby for me, I have several things that fill my day and I don't look 24/7 at computer chess.
So I only noticed that there is something like an accumulator cache called Finny tables when they were mentioned in several Discord channels some weeks ago; I obviously missed them when they were invented (you would call it engineered).
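Since a Finny table is described here as an accumulator cache, a small sketch may help. The assumptions are loud ones: a toy, randomly initialised first layer with hypothetical sizes, features indexed only by (piece, square) within a king bucket, and a single perspective. When a king move forces a refresh into bucket `b`, a full refresh rebuilds the accumulator from the bias; the cache instead keeps the last accumulator computed for `b` together with the pieces it was computed from, and applies only the pieces added or removed since then. `FinnyTable`, `full_refresh` and the sizes are illustrative names, not any engine's code.

```python
import random

# Toy sizes chosen only for illustration; real networks are far larger.
HIDDEN = 8
NUM_BUCKETS = 4
NUM_FEATURES = 12 * 64  # (coloured piece type) x square, one perspective only

_rng = random.Random(0)
# WEIGHTS[bucket][feature] is the first-layer column added when that feature is active.
WEIGHTS = [[[_rng.randint(-64, 64) for _ in range(HIDDEN)]
            for _ in range(NUM_FEATURES)]
           for _ in range(NUM_BUCKETS)]
BIAS = [_rng.randint(-64, 64) for _ in range(HIDDEN)]


def _apply(acc, bucket, piece, square, sign):
    """Add (sign=+1) or subtract (sign=-1) one feature's weight column in place."""
    column = WEIGHTS[bucket][piece * 64 + square]
    for i in range(HIDDEN):
        acc[i] += sign * column[i]


def full_refresh(bucket, pieces):
    """Rebuild the accumulator from scratch: bias plus every active feature."""
    acc = list(BIAS)
    for piece, square in pieces:
        _apply(acc, bucket, piece, square, +1)
    return acc


class FinnyTable:
    """Per-bucket cache of (accumulator, set of pieces it was computed for)."""

    def __init__(self):
        # An empty board's accumulator is just the bias, so every entry starts valid.
        self.entries = [(list(BIAS), frozenset()) for _ in range(NUM_BUCKETS)]
        self.last_updates = 0

    def refresh(self, bucket, pieces):
        """Return the accumulator for `pieces` in `bucket`, applying only the diff."""
        pieces = frozenset(pieces)
        cached_acc, cached_pieces = self.entries[bucket]
        acc = list(cached_acc)
        added = pieces - cached_pieces
        removed = cached_pieces - pieces
        for piece, square in added:
            _apply(acc, bucket, piece, square, +1)
        for piece, square in removed:
            _apply(acc, bucket, piece, square, -1)
        self.last_updates = len(added) + len(removed)
        self.entries[bucket] = (list(acc), pieces)  # store a copy the caller can't mutate
        return acc
```

In an engine this path matters only when a king move changes the bucket; ordinary moves are still updated incrementally from the parent position. Real implementations keep such a cache per perspective (and per king bucket or king square), which is where the extra RAM cost comes from.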
- 
				chrisw
- Posts: 4648
- Joined: Tue Apr 03, 2012 4:28 pm
- Location: Midi-Pyrénées
- Full name: Christopher Whittington
Re: @Discord guys: What is modern chess programming about?
That was quick, given I think Sopel only noticed very recently.
JacquesRW wrote: ↑Sat Apr 27, 2024 6:00 pm
Berserk did so 22 months ago: http://chess.grantnet.us/test/26390/
My surprise is more about Stockfish specifically, as successful patches in other engines tend to be tried very quickly. An individual engine author may not have implemented them until recently for any number of reasons e.g. just not being bothered to think about it.
On our instance (https://chess.swehosting.se/), where we communicate basically everything we try, it went more or less like this:
Cosmo: Adds Finny tables before most of us use buckets at all
Cosmo: Hey guys this patch is super good you should definitely do it when you add buckets
I guess this is fairly common on Discord, and probably why the name Finny Tables has stuck: if someone mentions that they're going to try a bucketed net, it'll usually be suggested to them straight away. You are welcome to have your own opinions on whether this is good or bad; personally I think there's a much greater sense of community among devs on Discord as a result of things like this.
Already is: https://github.com/official-stockfish/S ... 06b36b3e3c
Isn’t the implication, though, that comp chess is split into several nodes, several of which don’t appear to communicate very well, or keep secrets from each other?
- 
				JacquesRW
- Posts: 128
- Joined: Sat Jul 30, 2022 12:12 pm
- Full name: Jamie Whiting
Re: @Discord guys: What is modern chess programming about?
They were implemented by Gabe (Obsidian author) after he read the SF NNUE code and realised they weren't there. I'm not sure to what extent Sopel works directly on NNUE code now.
Perhaps. I think this specific case is a bit unique in that many people probably assumed SF already had it, or don't bother with NNUE-related patches; SF devs would absolutely have seen the tables passing on OpenBench.
I think communication is good enough (although perhaps implicitly by browsing FishTest/OpenBench instances) that I don't need to pay attention to the SF dev channels in the SF discord, and hear about patches (mostly the ones that did not originate in SF, admittedly) through conversation.
- 
				Viz
- Posts: 223
- Joined: Tue Apr 09, 2024 6:24 am
- Full name: Michael Chaly
Re: @Discord guys: What is modern chess programming about?
Well, example, example please.
Uri Blass wrote: ↑Sat Apr 27, 2024 5:27 pm
I disagree that test suites are useless.
Everyone can decide what to optimize his engine for.
Some people decide to optimize for games.
Others may decide to optimize for test suites.
As for the very bad results in games:
I think that a test suite of 120 positions is clearly too small, and this is one of the reasons for the bad results.
Another reason is that I guess there are too many sacrifices in the test suite.
It is possible never to test an engine in games, and instead simply analyze the games it played to find its mistakes, always add the positions it blundered in to the test suite, and optimize the engine to avoid them.
It may not be the best engine, but at least the author can release the test suite and results and show users the improvement of the new version in specific positions.
It is possible that sometimes the latest version will be weaker than the older version, but I expect the engine to improve in the long run in this case.
Take a test suite of not 100 but, let's say, 10000 positions; I don't really care about its size.
Take an engine that doesn't merge upstream Stockfish patches to gain strength, but tries to gain by optimizing for test suites.
Show me ANY engine that gains playing strength this way over a long time span, let's say 1 year, and makes progress comparable to master SF (or any progress at all, actually).
Spoiler: they don't exist. At all. All engines that "optimize for testsuites" are not only weaker than Stockfish master but also merge every single Stockfish patch eventually, so in reality they are "complete Stockfish clones slightly optimized for testsuites".
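"Gaining playing strength" is normally quantified as an Elo difference estimated from game results. A minimal sketch of that conversion, assuming the standard logistic Elo model and a simple normal approximation for the error bar; this is not Fishtest's or OpenBench's code (both run sequential SPRT tests with more careful statistics), and `elo_from_score` and `estimate_elo` are hypothetical names.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def estimate_elo(wins: int, draws: int, losses: int, z: float = 1.96):
    """Point estimate and rough confidence interval (lo, hi) for the Elo difference.

    Treats each game as an independent score of 1, 0.5 or 0 and uses a normal
    approximation, which is only reasonable for large samples and scores far
    from 0 and 1.
    """
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    # Per-game variance of the score around its mean s.
    var = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    margin = z * math.sqrt(var / n)
    return elo_from_score(s), elo_from_score(s - margin), elo_from_score(s + margin)
```

Under these assumptions an even 1000-game match with 120 wins, 760 draws and 120 losses gives 0 Elo with an interval of roughly plus or minus 10.5 Elo. Since the margin shrinks only with the square root of the number of games, a 40-50 Elo regression shows up quickly, while a difference of a few Elo needs many thousands of games.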
- 
				CornfedForever
- Posts: 648
- Joined: Mon Jun 20, 2022 4:08 am
- Full name: Brian D. Smith
Re: @Discord guys: What is modern chess programming about?
Modern chess programming... is about GUIs that are being configured so as to make the most of engine output... to make it truly useful to humans.
- 
				Uri Blass
- Posts: 10906
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: @Discord guys: What is modern chess programming about?
I remember that many years ago there was an engine named knightdreamer that was optimized for test suites.
Viz wrote: ↑Sat Apr 27, 2024 8:37 pm
Well, example, example please.
Uri Blass wrote: ↑Sat Apr 27, 2024 5:27 pm
I disagree that test suites are useless.
Everyone can decide what to optimize his engine for.
Some people decide to optimize for games.
Others may decide to optimize for test suites.
As for the very bad results in games:
I think that a test suite of 120 positions is clearly too small, and this is one of the reasons for the bad results.
Another reason is that I guess there are too many sacrifices in the test suite.
It is possible never to test an engine in games, and instead simply analyze the games it played to find its mistakes, always add the positions it blundered in to the test suite, and optimize the engine to avoid them.
It may not be the best engine, but at least the author can release the test suite and results and show users the improvement of the new version in specific positions.
It is possible that sometimes the latest version will be weaker than the older version, but I expect the engine to improve in the long run in this case.
Take a test suite of not 100 but, let's say, 10000 positions; I don't really care about its size.
Take an engine that doesn't merge upstream Stockfish patches to gain strength, but tries to gain by optimizing for test suites.
Show me ANY engine that gains playing strength this way over a long time span, let's say 1 year, and makes progress comparable to master SF (or any progress at all, actually).
Spoiler: they don't exist. At all. All engines that "optimize for testsuites" are not only weaker than Stockfish master but also merge every single Stockfish patch eventually, so in reality they are "complete Stockfish clones slightly optimized for testsuites".
https://www.wbec-ridderkerk.nl/html/lastupdates.htm
I see it is version 3.3, from 25.7.2004.
It certainly was not based on Stockfish; Stockfish did not exist at that time.
Unfortunately I cannot find a web page with its results on the GCP test suite, which, as I remember, were better than those of other engines at that time.