
PGN - http://kirill-kryukov.com/chess/discuss ... p?id=39139
CCRL 40/40 Rating List - Custom engine selection
693335 games played by 1834 programs, run by 19 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 40 minutes on Athlon 64 X2 4600+ (2.4 GHz), about 15 minutes on a modern Intel CPU.
Computed on February 25, 2017 with Bayeselo based on 693'335 games
Tested by CCRL team, 2005-2017, http://computerchess.org.uk/ccrl/4040/
Rank Engine                 Elo    +    -   Score   AvOp  Games
   1 Ethereal 8.02 64-bit  2616  +33  -33   53.5%  -24.5    325
     Ethereal 8.05 64-bit  2589  +32  -32   42.1%  +52.6    312
     Ethereal 7.78 64-bit  2586  +25  -25   53.4%  -22.5    525
     Ethereal 7.70 64-bit  2508  +31  -31   47.1%  +15.1    365
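As a rough check on how far apart 8.02 and 8.05 really are in the rating list, the sketch below compares their intervals. It treats the + and - columns as simple interval bounds around each Elo value, which is an assumption: Bayeselo intervals are not independent between engines, so an overlap test like this is only a crude indication. The helper names `interval` and `overlaps` are made up for illustration.

```python
# Elo and +/- margins copied from the CCRL 40/40 list in this thread.
ratings = {
    "Ethereal 8.02 64-bit": (2616, 33, 33),
    "Ethereal 8.05 64-bit": (2589, 32, 32),
}

def interval(elo, plus, minus):
    """Return the (low, high) bounds implied by the + and - columns."""
    return (elo - minus, elo + plus)

def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

i802 = interval(*ratings["Ethereal 8.02 64-bit"])
i805 = interval(*ratings["Ethereal 8.05 64-bit"])
print(i802, i805, overlaps(i802, i805))
```

The two intervals overlap, so with roughly 300 games each the list does not clearly separate the two versions in either direction.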
AndrewGrant wrote:
Hmm, 8.05 tested very well. I was afraid of this patch being a losing one, so I ran my tests at 5+.05s (regular) and even 60+.6s and found positive results.

Seems unlikely that the test results are skewed enough to show 8.02 > 8.05 when it is the other way around.

Tested under the name QTrans1 (AndyGrant/TestEngines)
Passed after 5470 games @ 5+.05s
WDL = (1895, 1855, 1720)
Delta = 11.1 +- 9.2 (Z = 1.96)

Tested under the name QTrans1 (AndyGrant/TestEngines)
Passed after 3440 games @ 60+.5s
WDL = (1062, 1460, 918)
Delta = 14.6 +- 11.6 (Z = 1.96)

Those games were played with a hash size of 16MB. Perhaps this patch is only good for small hash tables? CCRL tests with 128MB hash, yes?
Probably going to have to extend my test framework to include variable hash sizes.

I use 256MB hash for 1CPU where able.
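For readers who want to see where the Delta figures come from, here is a minimal sketch that derives an Elo difference and margin from the WDL counts quoted in this thread. It is not taken from the test framework itself. The logistic Elo formula is standard, but the margin calculation is an assumption: a binomial standard error that ignores draws happens to reproduce the quoted numbers. The function names are made up for illustration.

```python
import math

def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by a score fraction in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_delta(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (delta, margin) in Elo for a W/D/L record.

    Assumption: the margin uses a binomial standard error
    sqrt(s * (1 - s) / n), which ignores the variance reduction from draws.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    stderr = math.sqrt(score * (1.0 - score) / games)
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return elo_from_score(score), (high - low) / 2.0

# WDL records as posted for the two QTrans1 tests.
for label, wdl in [("5+.05s", (1895, 1855, 1720)), ("60+.5s", (1062, 1460, 918))]:
    delta, margin = elo_delta(*wdl)
    print(f"{label}: Delta = {delta:.1f} +- {margin:.1f}")
```

Under these assumptions the output rounds to the posted values of 11.1 +- 9.2 and 14.6 +- 11.6. A trinomial variance that accounts for draws would give a somewhat narrower margin.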
AndrewGrant wrote:
It seems unlikely, to me, that this would account for the Elo gap.

Another thing to note is that the average opponent rating for 8.05 is currently much higher than that for 8.02.

Yes - I always try to smooth out the average opponent ratings a bit over time.
Do you plan on doing any more testing with this version?
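The point about average opponent ratings can be illustrated with a small sketch. It assumes the AvOp column is the average opponent rating minus the engine's own rating, and it uses the plain logistic Elo expectation rather than Bayeselo's own model, so the numbers are only approximate. The function name `expected_score` is made up for illustration.

```python
def expected_score(opponent_offset: float) -> float:
    """Expected score against opponents rated `opponent_offset` Elo above us."""
    return 1.0 / (1.0 + 10.0 ** (opponent_offset / 400.0))

# (AvOp, Score %) pairs copied from the rating list in this thread.
rows = {
    "Ethereal 8.02 64-bit": (-24.5, 53.5),
    "Ethereal 8.05 64-bit": (+52.6, 42.1),
}

for name, (avop, score_pct) in rows.items():
    predicted = 100.0 * expected_score(avop)
    print(f"{name}: listed {score_pct:.1f}%, logistic expectation {predicted:.1f}%")
```

On this reading, 8.05's lower score percentage is roughly what its listed rating predicts against a stronger opponent pool. The score percentages of the two versions are therefore not directly comparable.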