CCRL Blitz, free single-CPU engines

Post by **Kirill Kryukov** » Wed Jul 25, 2007 11:30 am

The top 36 free single-CPU engines now have more or less reliable rankings in the CCRL blitz list:

 Rank                 Engine                  ELO   +    -   Score  AvOp  Games
    1 Rybka 1.0 Beta 64-bit                  2953  +20  -19  77.1% -201.1  1088
    2 Strelka 1.8 32-bit                     2871  +21  -21  56.1%  -46.2   734
    3 Toga II 1.2.1a 32-bit                  2860   +8   -8  48.1%   +9.7  5987
    4 Spike 1.2 Turin                        2828   +7   -8  46.6%  +22.9  7502
    5 Naum 2.0 32-bit                        2798   +9   -9  48.9%   +1.4  5310
    6 Glaurung 1.2.1 32-bit                  2759   +9   -9  41.9%  +58.1  5269
    7 Scorpio 1.91 (5-men-egbb)              2732  +18  -18  50.7%   -6.2  1090
    8 Pro Deo 1.2                            2722  +12  -12  43.4%  +50.1  2531
    9 Alaric 704                             2720  +15  -14  49.8%   +1.1  1649
   10 Delfi 5.1                              2716  +13  -13  50.6%   -4.2  2225
11-12 Slow Chess Blitz WV2.1                 2711  +10   -9  42.0%  +59.1  4570
11-12 Zappa 1.1 64-bit                       2711  +18  -17  46.7%  +22.8  1120
   13 Frenzee 3.0 64-bit                     2709  +17  -17  49.7%   +2.6  1221
14-15 List 5.12                              2698  +12  -12  50.6%   -4.5  2504
14-15 Pharaon 3.5.1                          2698  +12  -12  38.6%  +85.4  2504
   16 WildCat 7                              2696  +12  -12  47.2%  +20.9  2631
   17 SOS 5.1                                2683  +13  -13  47.6%  +17.0  2057
   18 Pseudo 0.7c                            2674  +12  -12  51.2%  -12.6  2613
   19 Ruffian 1.0.5                          2673  +14  -14  50.4%   -2.9  1922
20-21 Aristarch 4.50                         2668  +12  -12  46.3%  +28.8  2407
20-21 Petir 4.39                             2668  +15  -15  45.1%  +36.4  1533
   22 Colossus 2007a                         2661  +16  -16  44.0%  +42.8  1371
   23 The Baron 1.8.1                        2654  +18  -18  47.7%  +16.8  1074
   24 Booot 4.13.1                           2652  +23  -23  49.4%   +4.0   637
   25 Crafty 21.5 PS 64-bit                  2649  +17  -17  43.3%  +49.1  1264
26-27 Jonny 2.83 32-bit                      2647  +15  -15  43.5%  +49.4  1565
26-27 Smarthink 0.17a                        2647  +18  -18  47.6%  +17.1  1022
   28 Fritz 5.32                             2641  +18  -18  53.2%  -25.7  1111
   29 Anaconda 2.0.1                         2637  +16  -16  48.7%   +9.4  1427
   30 Movei 0.08.403                         2635  +17  -17  52.2%  -16.9  1241
   31 Thinker 4.7a                           2632  +14  -14  48.2%   +8.1  1929
32-33 AnMon 5.60                             2617  +16  -16  46.3%  +26.3  1392
32-33 Trace 1.37a                            2617  +19  -19  43.7%  +47.7  1012
   34 Little Goliath Evolution 3.12          2614  +15  -15  53.4%  -27.6  1690
   35 Ufim 8.02                              2613  +13  -13  49.9%   -2.8  2165
   36 Yace 0.99.87                           2608  +22  -23  44.4%  +36.9   672

(This is my version of the list; it includes only stable public engines at default settings.)
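The Score, AvOp and +/- columns are tied to the rating through the usual logistic Elo model. As a rough sketch, assuming the plain logistic curve (the list's own rating tool may handle draws and priors differently, so the numbers will not match exactly), an engine rates about 400·log10(s/(1-s)) above its average opponent, and the error margin shrinks with the square root of the number of games. The function names below are hypothetical.

```python
import math


def perf_diff(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def perf_rating(score: float, avg_opponent: float) -> float:
    """Performance rating: average opponent rating plus the implied difference."""
    return avg_opponent + perf_diff(score)


def error_margin(score: float, games: int, draw_rate: float, z: float = 1.96) -> float:
    """Approximate +/- Elo margin (roughly 95% for z = 1.96).

    Per-game result variance is w + d/4 - s^2, so draws reduce it.
    The score's standard error is converted to Elo via the slope of perf_diff.
    """
    wins = score - draw_rate / 2.0
    if wins < 0.0 or wins + draw_rate > 1.0:
        raise ValueError("inconsistent score and draw rate")
    variance = wins + draw_rate / 4.0 - score ** 2
    se_score = math.sqrt(variance / games)
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * se_score * slope
```

For example, Rybka's 77.1% against an average opponent of 2953 - 201.1 = 2751.9 gives `perf_rating(0.771, 2751.9)` of about 2963, close to the listed 2953. For Booot's 637 games at 49.4%, and assuming a 30% draw rate (an illustrative figure, not one taken from the list), `error_margin` comes out near +/-23.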

Conditions:

Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 4 minutes on Athlon 64 X2 4600+ (2.4 GHz)
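"Equivalent to" means each tester scales the clock by how fast their machine is compared with the reference Athlon 64 X2 4600+. Here is a minimal sketch. It assumes speed is expressed as a ratio from some benchmark run on both machines, and that the time allowance scales linearly with that ratio. The function name and the benchmark step are hypothetical, not a published CCRL procedure.

```python
def adjusted_minutes(ref_minutes: float, speed_ratio: float) -> float:
    """Minutes per 40 moves to use on a test machine.

    speed_ratio = test machine speed / reference machine speed, e.g. taken
    from the same engine benchmark run on both machines (hypothetical step).
    A slower machine (ratio < 1) gets proportionally more time.
    """
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    return ref_minutes / speed_ratio
```

A machine running at 4/11 of the reference speed would play 40/11 to match 40/4, which is the kind of setup described later in this thread.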

Complete rating list and cross-tables (best free single-CPU engines in CCRL 40/4).

A few comments:

Booot 4.13.1 and Strelka 1.8 still need to play more games to make their ratings more reliable. My goal is for every listed engine to play 30-game matches against at least 20 nearest opponents (10 rated higher and 10 rated lower), but I try to do more when possible.
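The coverage goal above can be checked mechanically. Below is a minimal sketch. It assumes the list is available as engine names sorted by rating, plus a dictionary of game counts per pair of engines. The data layout and function name are hypothetical. Near the top or bottom of the list, where fewer than 10 neighbours exist on one side, it uses whatever neighbours there are.

```python
from typing import Dict, FrozenSet, List


def missing_opponents(ranked: List[str],
                      games: Dict[FrozenSet[str], int],
                      engine: str,
                      per_side: int = 10,
                      min_games: int = 30) -> List[str]:
    """Return the nearest opponents (up to per_side above and per_side below)
    against which `engine` has played fewer than min_games games."""
    i = ranked.index(engine)
    above = ranked[max(0, i - per_side):i]
    below = ranked[i + 1:i + 1 + per_side]
    # A frozenset key makes the pair order-independent: (A, B) == (B, A).
    return [opp for opp in above + below
            if games.get(frozenset((engine, opp)), 0) < min_games]
```

An empty result means the engine meets the goal; anything returned is a candidate for its next gauntlet.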

Colossus 2007b and Alaric 707 are probably stronger than previous versions, but they need more testing.

32-bit versions of Naum 2.0 and Glaurung 1.2.1 are unfortunately higher rated than 64-bit versions at the moment. I don't know if it's an artifact or a real situation.

I list "Scorpio 1.91 (5-men-egbb)" because it is the latest official release from Daniel, but honestly I have no idea which version is the strongest. Shaun may be able to answer that as he tested Scorpio versions extensively.

Pro Deo 1.2 (and also 1.3, 1.4 and 1.5) does not run on my machines, so I can't help fill the gaps for it. I am waiting for Ed to release a fixed version.

Frenzee 3.0 64-bit 1-CPU was a surprise to me; it is currently listed at #13.

"10-10-10" setting of Movei 00.8.403 is probably stronger than default setting, I may test it at some point in future. Right now "10-10-10" has too few games for me to list it.

Fritz 5.32 turned out to be stronger than Fritz 6 Light in our blitz conditions, to my surprise. So from now on Fritz 5.32 represents the free Fritz in our list.

Little Goliath Evolution 3.12 appears to play stronger without tablebases (its default setting) than with them. The difference is not large, though, so it may still be an artifact.

Trace 1.37a, Ufim 8.02 and Yace 0.99.87 played many new games recently, and received stable ratings.

Work will continue to expand the well-tested region of the rating list downward to include more of the weaker engines. Personally, I enjoy testing and watching lower-ranked engines much more than the top ones.

Best,
Kirill

Post by **Uri Blass** » Wed Jul 25, 2007 11:46 am

My thoughts about it.

1) There is no reason to prefer new versions over personalities (the author can certainly release a personality as a new engine if he likes, and I see no reason to test in one case and not in the other).

2) There should be a rule that after an engine is tested, you need to wait at least 6 months (or some other agreed period) before testing a newer version for your list. Of course, testers should be allowed to test what they like, but in that case you would not include those engines in the special top free single-CPU list, only in the list of all engines.

3) Authors who plan to release a newer version soon, and who would prefer that newer version rather than the one they are releasing now to appear in your list, should say so at the time of release. That version would then not be included in your list regardless of its results, unless they do not release a newer version within the next 6 months.

Edit:
4) It seems that you forgot the new Glaurung version. Of course it needs more games, but there are already enough games to know that the new Glaurung is stronger than 1.2.1.

Glaurung 2-epsilon/4 64-bit  2847  +43  -43  44.5%  +36.3  33.9%   174  51.2%
Glaurung 1.2.1 32-bit        2759   +9   -9  41.9%  +58.1  29.8%  5269  84.2%

Uri

Post by **Kirill Kryukov** » Wed Jul 25, 2007 4:01 pm

Hi Uri!

Uri Blass wrote: My thoughts about it.

1) There is no reason to prefer new versions over personalities (the author can certainly release a personality as a new engine if he likes, and I see no reason to test in one case and not in the other).

I agree. As I said a while back, I will test "Movei 00.8.403 10-10-10" eventually. CPU time is limited, and right now I have more interesting things to test (Alaric 707, Booot 4.13.1, etc.).

Uri Blass wrote: 2) There should be a rule that after an engine is tested, you need to wait at least 6 months (or some other agreed period) before testing a newer version for your list. Of course, testers should be allowed to test what they like, but in that case you would not include those engines in the special top free single-CPU list, only in the list of all engines.

I don't think it's a good idea. Freedom of testing is one of our basic principles. If we compromise this freedom, testing will become less fun and we will have far fewer games. Personally, I test the engines I am curious about at the moment. There are many engines that need testing, so I have to choose carefully every time I start a gauntlet. Sometimes a new version or setting has to wait because some engines still don't have any version tested in our list.

Uri Blass wrote: 3) Authors who plan to release a newer version soon, and who would prefer that newer version rather than the one they are releasing now to appear in your list, should say so at the time of release. That version would then not be included in your list regardless of its results, unless they do not release a newer version within the next 6 months.

Yes, such information from the authors is always appreciated. The easiest way for authors to provide it is to clearly mark development (intermediate) versions as such (alpha, beta). I usually don't test development versions, because so many stable engines still don't have a reliable rating.

Uri Blass wrote: Edit:
4) It seems that you forgot the new Glaurung version. Of course it needs more games, but there are already enough games to know that the new Glaurung is stronger than 1.2.1.

Glaurung 2-epsilon/4 64-bit  2847  +43  -43  44.5%  +36.3  33.9%   174  51.2%
Glaurung 1.2.1 32-bit        2759   +9   -9  41.9%  +58.1  29.8%  5269  84.2%

Uri

I will test the new Glaurung when Tord says it is stable. If we test every intermediate version, we will end up with something like what we have with Scorpio: we will have no idea which version is strongest, and no version will have a reliable rating, since testing effort will be divided among 10 versions.

Post by **Mike S.** » Wed Jul 25, 2007 4:13 pm

Hi. Another thing about the great and very useful CCRL statistics, in general:

The ponder hit statistics are based on information from the Fritz GUI, which records the expected move in the notation whenever it differs from the move actually played (even if ponder was actually off). Obviously this is based on the second ply of the previous PV.

In the CCRL ponder hit tables (40/40 or 40/4), some engines, such as Atlas and microMax, have very high ponder hit percentages against various opponents, and I am almost sure this is a misinterpretation of the data. I guess Atlas, like microMax, simply does not output a PV, which means the Fritz GUI does not know the expected reply. So it doesn't insert a differing ponder move, because it cannot know what was expected.

I think such engines should be removed from the ponder hit statistics (if possible), especially from the standard tables of the most similar pairs, because they take up space from the truly similar pairs where the data is correct.

But thanks anyway for these great statistics!

Post by **Uri Blass** » Wed Jul 25, 2007 4:22 pm

Kirill Kryukov wrote: Hi Uri!

Uri Blass wrote: 2) There should be a rule that after an engine is tested, you need to wait at least 6 months (or some other agreed period) before testing a newer version for your list. Of course, testers should be allowed to test what they like, but in that case you would not include those engines in the special top free single-CPU list, only in the list of all engines.
I don't think it's a good idea. Freedom of testing is one of our basic principles. If we compromise this freedom, testing will become less fun and we will have far fewer games. Personally, I test the engines I am curious about at the moment. There are many engines that need testing, so I have to choose carefully every time I start a gauntlet. Sometimes a new version or setting has to wait because some engines still don't have any version tested in our list.

Hi Kirill,
I did not suggest preventing testers from testing what they like.
My point was simply not to include engines that do not follow specific rules in your special list (of course they can be included in the full list).

Uri

Post by **Dirt** » Wed Jul 25, 2007 9:46 pm

Kirill Kryukov wrote: Time control: Equivalent to 40 moves in 4 minutes on Athlon 64 X2 4600+ (2.4 GHz)

In my experience, a number of engines begin to have time management problems when the time control gets much below three minutes. Do you think this will cause problems for the blitz ratings list in the near future?

Post by **Shaun** » Wed Jul 25, 2007 10:10 pm

Dirt wrote:
Kirill Kryukov wrote: Time control: Equivalent to 40 moves in 4 minutes on Athlon 64 X2 4600+ (2.4 GHz)
In my experience, a number of engines begin to have time management problems when the time control gets much below three minutes. Do you think this will cause problems for the blitz ratings list in the near future?

This is something we need to be careful about. One way the issue is currently reduced is that testers tend to use slower machines for blitz, since slower machines can still produce a useful number of games.

I have one machine on which I play 40/11, adjusted to be equivalent to 40/4.

Shaun

Post by **Shaun** » Wed Jul 25, 2007 10:11 pm

Mike S. wrote: Hi. Another thing about the great and very useful CCRL statistics, in general:

The ponder hit statistics are based on information from the Fritz GUI, which records the expected move in the notation whenever it differs from the move actually played (even if ponder was actually off). Obviously this is based on the second ply of the previous PV.

In the CCRL ponder hit tables (40/40 or 40/4), some engines, such as Atlas and microMax, have very high ponder hit percentages against various opponents, and I am almost sure this is a misinterpretation of the data. I guess Atlas, like microMax, simply does not output a PV, which means the Fritz GUI does not know the expected reply. So it doesn't insert a differing ponder move, because it cannot know what was expected.

I think such engines should be removed from the ponder hit statistics (if possible), especially from the standard tables of the most similar pairs, because they take up space from the truly similar pairs where the data is correct.

But thanks anyway for these great statistics!

Hi Mike,

Yes, this issue is on Kirill's to-do list.

Shaun

Post by **Kirill Kryukov** » Thu Jul 26, 2007 6:51 am

Mike S. wrote: Hi. Another thing about the great and very useful CCRL statistics, in general:

The ponder hit statistics are based on information from the Fritz GUI, which records the expected move in the notation whenever it differs from the move actually played (even if ponder was actually off). Obviously this is based on the second ply of the previous PV.

In the CCRL ponder hit tables (40/40 or 40/4), some engines, such as Atlas and microMax, have very high ponder hit percentages against various opponents, and I am almost sure this is a misinterpretation of the data. I guess Atlas, like microMax, simply does not output a PV, which means the Fritz GUI does not know the expected reply. So it doesn't insert a differing ponder move, because it cannot know what was expected.

I think such engines should be removed from the ponder hit statistics (if possible), especially from the standard tables of the most similar pairs, because they take up space from the truly similar pairs where the data is correct.

But thanks anyway for these great statistics!

Hi Mike, thanks, we are aware of this issue.

The problem with Atlas is that it does output a PV sometimes, just not every time. Engines that never output a PV don't cause any problems. We will try to resolve this; in the meantime, you can remove Atlas from the comparison using the engine selection control.
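One possible way to handle engines that output a PV only some of the time is to count a ponder hit only when an expected reply is actually known, and to report how many moves had one. Below is a minimal sketch. It assumes each move record carries the move played and the predicted reply (the second ply of the previous PV), or None when no PV was given. This record layout is hypothetical and is not the Fritz GUI's format.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class MoveRecord:
    played: str              # move actually played, e.g. "e2e4"
    expected: Optional[str]  # predicted move from the previous PV, None if no PV


def ponder_hit_rate(moves: Iterable[MoveRecord]) -> Tuple[float, int]:
    """Return (hit rate over moves with a known prediction, number of such moves).

    Moves without a prediction are skipped rather than counted as hits,
    so an engine that prints a PV only occasionally is not inflated.
    """
    known = hits = 0
    for m in moves:
        if m.expected is None:
            continue
        known += 1
        if m.played == m.expected:
            hits += 1
    return (hits / known if known else 0.0), known
```

Returning the count alongside the rate lets the similarity tables hide pairs where too few moves had a known prediction.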

Post by **Kirill Kryukov** » Thu Jul 26, 2007 6:55 am

Uri Blass wrote: Hi Kirill,
I did not suggest preventing testers from testing what they like.
My point was simply not to include engines that do not follow specific rules in your special list (of course they can be included in the full list).

Uri

I see now.

I'll think about whether I can make a formal rule for this list. Such a rule would add the work of tracking engine release dates, or the dates testing started, so I actually doubt I will do this tracking. I'll think about whether I can automate it in some way.
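If the waiting-period rule proposed above were automated, only the date testing started would need to be recorded for each version. Below is a minimal sketch of that rule. It assumes "6 months" is approximated as 182 days, and that the versions of one engine are given with their testing start dates. The data layout and function name are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

WAIT = timedelta(days=182)  # assumed approximation of "6 months"


def eligible_versions(history: List[Tuple[str, date]]) -> List[str]:
    """Versions of one engine eligible for the special list.

    history: (version, testing start date) pairs. A version qualifies only
    if its testing started at least WAIT after the previous qualifying
    version's testing began; the others appear only in the full list.
    """
    chosen: List[str] = []
    last_start: Optional[date] = None
    for version, start in sorted(history, key=lambda vs: vs[1]):
        if last_start is None or start - last_start >= WAIT:
            chosen.append(version)
            last_start = start
    return chosen
```

Using testing start dates rather than release dates means the data already exists in the testers' records, so nothing extra has to be tracked by hand.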
