Crafty 25.3 64-bit Gauntlet for CCRL 40/15

Post by **Graham Banks** » Sun Jan 26, 2020 11:42 pm

PGN - http://kirill-kryukov.com/chess/discuss ... p?id=45759

Post by **Graham Banks** » Sun Jan 26, 2020 11:44 pm

CCRL 40/15 Rating List - Custom engine selection
1116816 games played by 2606 programs, run by 23 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 15 minutes on an Intel i7-4770k.
Computed on January 25, 2020 with Bayeselo based on 1'116'816 games
Tested by CCRL team, 2005-2020, http://ccrl.chessdom.com/ccrl/4040/

Rank               Engine                Elo   +    -   Score  AvOp  Games
1 Crafty 25.2 64-bit                 2931  +11  -11  48.8%   +6.0  2688
  Crafty 25.3 64-bit                 2904  +28  -28  49.0%   +8.5   400
  Crafty 25.0 64-bit                 2864  +20  -20  46.9%  +19.5   794
  Crafty 25.1 64-bit                 2853  +30  -30  42.8%  +47.6   356
  Crafty 23.8 64-bit                 2811  +23  -23  51.5%   -8.7   604
  Crafty 24.1 64-bit                 2803  +22  -22  49.5%   +2.9   659
  Crafty 23.6 64-bit                 2788  +25  -25  53.0%  -19.7   478
  Crafty 24.0 64-bit                 2772  +28  -28  47.0%  +19.5   396
  Crafty 23.3 64-bit                 2756  +32  -32  50.5%   -3.6   309
  Crafty 23.5 64-bit                 2747  +28  -28  51.2%  -10.2   406
  Crafty 23.4 64-bit                 2736  +28  -28  46.7%  +20.4   419
  Crafty 23.4 32-bit                 2733  +20  -20  50.8%   -5.5   861
  Crafty 23.3 32-bit                 2716  +32  -32  50.5%   -4.3   309
  Crafty 23.2 64-bit                 2711  +30  -30  49.0%   +6.9   357
  Crafty 23.2 32-bit                 2696  +32  -32  47.8%  +15.7   320
  Crafty 23.1 32-bit                 2687  +18  -18  46.2%  +24.9   970
  Crafty 23.0 32-bit                 2630  +29  -29  49.9%   +1.7   380
  Crafty 22.8 32-bit                 2596  +32  -32  48.7%   +6.9   315
  Crafty 22.4 32-bit                 2580  +32  -32  48.1%  +12.2   318
  Crafty 22.10 32-bit                2573  +32  -32  47.3%  +14.4   320
  Crafty 22.1 32-bit                 2565  +28  -28  50.2%   -3.6   421
  Crafty 21.6 32-bit                 2550  +34  -34  47.8%  +12.6   302
  Crafty 21.5 32-bit                 2542  +32  -32  48.3%  +13.7   344
  Crafty 22.0 32-bit                 2538  +33  -33  48.5%   +9.6   304
  Crafty 20.14 32-bit                2517  +27  -27  47.4%  +19.5   482
  Crafty 20.13 32-bit                2510  +33  -33  48.7%   +9.8   312
  Crafty 20.11 32-bit                2502  +33  -33  49.8%   +3.2   307

Post by **mar** » Fri Jan 31, 2020 11:45 am

Hmm, what's wrong with 25.3 or 25.2? How can a supposedly equal version (assuming bugfixes only) be nearly 30 Elo weaker in the CCRL list?
CEGT, on the other hand, has 25.3 30 Elo stronger than 25.2, so the relative spread is 60 Elo. That's insanely huge and scary.

We can talk error bars, but this seems way off, especially at this TC.

Is something wrong with Crafty, or does independent testing actually produce huge noise in this case?

Post by **mar** » Fri Jan 31, 2020 12:19 pm

This is not meant as a criticism, I'm just trying to understand the discrepancy.
From what I've seen, the change going from 400 to 3k games in CCRL is typically less than 10 Elo points.
I don't understand the CEGT results for 25.3 either.
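One rough way to see what more games should do to a rating: if the ± margin shrinks roughly with the square root of the number of games (an approximation; Bayeselo margins also depend on the opponent pool), then 25.3's ±28 after 400 games would become about ±11 at 25.2's 2688 games, which matches 25.2's listed margin. The helper name `scaled_margin` is hypothetical, just for illustration.

```python
import math


def scaled_margin(margin: float, games_now: int, games_later: int) -> float:
    """Approximate +/- Elo margin after more games, assuming margin ~ 1/sqrt(games)."""
    return margin * math.sqrt(games_now / games_later)


# Crafty 25.3 in the CCRL table: +/-28 after 400 games.
# Estimate its margin if it had as many games as 25.2 (2688).
print(round(scaled_margin(28, 400, 2688), 1))  # 10.8
```

So the table's margins are at least consistent with plain sampling noise shrinking as games accumulate.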

Post by **xr_a_y** » Fri Jan 31, 2020 9:24 pm

1 Crafty 25.2 64-bit                 2931  +11  -11  48.8%   +6.0  2688
  Crafty 25.3 64-bit                 2904  +28  -28  49.0%   +8.5   400

I guess +/-11 and +/-28 make it a little too soon to conclude.
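When comparing two versions directly, the margin on their difference matters more than each individual margin. A minimal sketch, assuming the two ratings' errors are independent (not strictly true in a rating list, where engines share opponents) and therefore combine in quadrature; `diff_with_margin` is a hypothetical helper:

```python
import math


def diff_with_margin(elo_a: int, margin_a: float, elo_b: int, margin_b: float):
    """Elo difference a - b and its +/- margin, assuming independent errors."""
    return elo_a - elo_b, math.hypot(margin_a, margin_b)


# CCRL 40/15: Crafty 25.3 (2904 +/-28) vs Crafty 25.2 (2931 +/-11)
diff, margin = diff_with_margin(2904, 28, 2931, 11)
print(f"{diff:+d} +/- {margin:.1f}")  # -27 +/- 30.1
```

Under that assumption the 27 Elo gap sits inside a margin of about ±30, so on its own it is not significant.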

Post by **mar** » Fri Jan 31, 2020 9:44 pm

CEGT:

Crafty 25.3 x64 1CPU	2823	14	14
Crafty 25.2 x64 1CPU	2791	12	12

It's too soon only if you assume CCRL has the worst case for 25.3 and CEGT has the best case. What's the probability of two independent lists hitting opposite extremes of the error bars?
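A back-of-the-envelope answer to that question, as a sketch under several assumptions: both lists' margins are independent 95% intervals (so margin / 1.96 is roughly one standard error), ratings within a list have independent errors, and both lists are measuring the same underlying 25.3 vs 25.2 difference. The numbers are the CCRL table above and the CEGT lines just quoted; `diff_and_sigma` is a hypothetical helper.

```python
import math
from statistics import NormalDist

# ~1.96: converts a 95% margin into one standard error (assumes margins are 95% intervals)
Z95 = NormalDist().inv_cdf(0.975)


def diff_and_sigma(elo_a: int, m_a: float, elo_b: int, m_b: float):
    """Difference a - b and its standard error, treating margins as independent 95% bounds."""
    return elo_a - elo_b, math.hypot(m_a, m_b) / Z95


ccrl_diff, ccrl_sigma = diff_and_sigma(2904, 28, 2931, 11)  # CCRL 40/15: 25.3 vs 25.2
cegt_diff, cegt_sigma = diff_and_sigma(2823, 14, 2791, 12)  # CEGT: 25.3 vs 25.2

# How far apart are the two lists' estimates of (25.3 - 25.2), in standard errors?
gap = cegt_diff - ccrl_diff
z = gap / math.hypot(ccrl_sigma, cegt_sigma)
p_two_sided = 2 * (1 - NormalDist().cdf(z))
print(f"gap={gap} Elo, z={z:.2f}, p={p_two_sided:.4f}")
```

Under those assumptions the 59 Elo gap comes out at roughly 3.3 standard errors, a two-sided p of about 0.001, so pure sampling noise looks unlikely. It does not rule out the two lists genuinely measuring different things (hardware, books, time control, OS, compiler).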

Post by **Graham Banks** » Fri Jan 31, 2020 10:00 pm

mar wrote (Fri Jan 31, 2020 11:45 am):
Hmm, what's wrong with 25.3 or 25.2, how can a supposedly equal version (assuming bugfixes only) be nearly 30 elo weaker in CCRL list?
CEGT on the other hand has 25.3 30 elo stronger than 25.2, so the relative spread is 60 elo, that's insanely huge and scary

we can talk error bars, but this seems way off, especially at this TC

is something wrong with Crafty or does independent testing actually produce huge noise in this case?

All I can say is that I've run all of the 25.3 games and most (if not all) of the 25.2 games.

Post by **MikeB** » Sun Feb 02, 2020 12:26 am

mar wrote (Fri Jan 31, 2020 11:45 am):
Hmm, what's wrong with 25.3 or 25.2, how can a supposedly equal version (assuming bugfixes only) be nearly 30 elo weaker in CCRL list?
CEGT on the other hand has 25.3 30 elo stronger than 25.2, so the relative spread is 60 elo, that's insanely huge and scary

we can talk error bars, but this seems way off, especially at this TC

is something wrong with Crafty or does independent testing actually produce huge noise in this case?

Error bars are not rock solid, especially after just 400 games. They come with a confidence level, typically 95%, which means about one result in 20 is expected to fall outside its error bars. When you have two runs, the chance that at least one of them falls outside is roughly 10% (1 - 0.95² ≈ 9.75%), or about one time in 10. It happens far more often than people realize, and it makes for good debates and exclamations that something is wrong. That is why Bob typically tested changes with well over 100,000 games: he likes to get as close to 100% confidence as possible. Generally speaking, if both versions produce the same bench node count in single-CPU mode, they are functionally the same. Other Elo noise can come from different operating systems and compilers, as well as from different time controls, CPUs, opening books/positions, etc.
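The arithmetic behind the "one in 20" and "about one in 10" figures, as a small sketch (`p_any_outside` is a hypothetical helper). Note that it gives the chance that *at least one* of several independent results falls outside its interval, which is a much more common event than two lists landing at opposite extremes.

```python
def p_any_outside(confidence: float, runs: int) -> float:
    """Chance that at least one of `runs` independent results lies outside its interval."""
    return 1 - confidence ** runs


for runs in (1, 2, 10):
    print(runs, round(p_any_outside(0.95, runs), 4))
# 1 0.05
# 2 0.0975
# 10 0.4013
```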

The statistical artifact shown may mean nothing at all. Try version 25.6. Most of the changes since 25.2 have been bug fixes. Some of the bugs were very rare; others depended on how users ran Crafty (for example, enabling draw offers or allowing resignations). There are a lot of variables that may not be consistent from one user to the next.
