Stockfish 100813 94a3608 versus Komodo 3 and Critter 1.6a

gaard
Posts: 447
Joined: Mon Jun 07, 2010 3:13 am
Location: Holland, MI
Full name: Martin W

Stockfish 100813 94a3608 versus Komodo 3 and Critter 1.6a

Post by gaard »

Conditions:
cutechess-cli
Threads=1
Ponder=false
AMD FX(tm)-6100 Six-Core Processor
All engines 64-bit

-concurrency 5
-resign movecount=3 score=500
-each book=varied.bin tc=2/6
-draw movenumber=60 movecount=30 score=10
-tournament round-robin -games 2 -rounds 1000 -recover

The 2 moves in 6 seconds time control (tc=2/6) should be close to CCRL 40/4 on this CPU, although I didn't go to great lengths to verify that.
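For anyone wanting to reproduce these conditions, the options above can be assembled into a single cutechess-cli invocation. The following is a minimal sketch in Python: the engine names and binary paths are hypothetical placeholders (the post does not give them), and expressing Threads=1 as `option.Threads=1` with `proto=uci` is my assumption about how the settings were passed. It only builds the argument list and does not start a match.

```python
import shlex

# Hypothetical engine names and paths; substitute the real binaries.
ENGINES = {
    "Stockfish 100813": "./stockfish-100813",
    "Stockfish 3": "./stockfish-3",
    "Critter 1.6a": "./critter-16a",
    "Komodo 3": "./komodo-3",
}

def build_command(engines, tc="2/6", book="varied.bin", concurrency=5):
    """Return a cutechess-cli argument list mirroring the posted conditions."""
    args = ["cutechess-cli"]
    for name, path in engines.items():
        args += ["-engine", f"name={name}", f"cmd={path}"]
    # Settings shared by every engine: protocol, one thread, book, time control.
    args += ["-each", "proto=uci", "option.Threads=1", f"book={book}", f"tc={tc}"]
    args += ["-concurrency", str(concurrency)]
    args += ["-resign", "movecount=3", "score=500"]
    args += ["-draw", "movenumber=60", "movecount=30", "score=10"]
    args += ["-tournament", "round-robin", "-games", "2", "-rounds", "1000", "-recover"]
    return args

command = build_command(ENGINES)
print(shlex.join(command))
```

Keeping the command as a list rather than a single string avoids quoting problems with engine names that contain spaces.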

Rank Name                         Elo    +    - games score oppo. draws 
   1 Stockfish 100813 64 SSE4.2    33   19   19   473   57%   -11   47% 
   2 Critter 1.6a 64-bit            8   19   19   472   52%    -3   46% 
   3 Stockfish 3 64bit SSE4.2     -16   19   19   473   46%     5   47% 
   4 Komodo64 3 sse4.2            -25   19   19   472   45%     8   40% 
After removing self-play games:

Rank Name                         Elo    +    - games score oppo. draws 
   1 Stockfish 100813 64 SSE4.2    29   23   23   316   56%    -8   46% 
   2 Critter 1.6a 64-bit            8   19   19   472   52%    -3   46% 
   3 Stockfish 3 64bit SSE4.2     -12   23   23   316   49%    -8   46% 
   4 Komodo64 3 sse4.2            -25   19   19   472   45%     8   40% 
More games to come!

Games: http://www.fileswap.com/dl/PhbypzbvsL/
beram
Posts: 1187
Joined: Wed Jan 06, 2010 3:11 pm

Re: Stockfish 100813 94a3608 versus Komodo 3 and Critter 1.6

Post by beram »

Nice test, Martin, and welcome to the club of Stockies testers.

Btw, don't you have Komodo 5.1 MP?

grts Bram
gaard
Posts: 447
Joined: Mon Jun 07, 2010 3:13 am
Location: Holland, MI
Full name: Martin W

Re: Stockfish 100813 94a3608 versus Komodo 3 and Critter 1.6

Post by gaard »

Thanks. I do have Komodo 5.1, but it's compiled against a more recent version of glibc than my system runs. I'm in the process of upgrading to make it work. My next results will be with the same opponents, but with a constant time of 2" per move and positions repeated.
gaard
Posts: 447
Joined: Mon Jun 07, 2010 3:13 am
Location: Holland, MI
Full name: Martin W

Re: Stockfish 100813 94a3608 versus Komodo 3 and Critter 1.6

Post by gaard »

Now testing with a more recent version (modified 8/15/13) that is functionally equivalent to the current dev version (8/17/2013). Time control is 2" per move. I added the -repeat option, and all engines now use 64 MB hash. I thought some of the Elo gain might have come from the improved time management, but these results, at a fixed time per move, are actually more impressive than before. Komodo 3 has been replaced with Komodo 5.1r2. All engines are 64-bit builds, compiled with popcnt support where possible.

Rank Name             Elo    +    - games score oppo. draws 
   1 Stockfish 4f55    28   16   16   672   55%     0   45% 
   2 Critter 1.6a       1   15   15   761   50%     0   47% 
   3 Komodo 5.1r2       0   15   15   763   50%     0   44% 
   4 Stockfish 3      -28   16   16   678   45%     0   45% 

   1 Stockfish 4f55    28 672.0 (366.5 : 305.5)
                          336.0 (186.0 : 150.0) Critter 1.6a       1
                          336.0 (180.5 : 155.5) Komodo 5.1r2       0
   2 Critter 1.6a       1 761.0 (381.5 : 379.5)
                          336.0 (150.0 : 186.0) Stockfish 4f55    28
                           87.0 ( 42.5 :  44.5) Komodo 5.1r2       0
                          338.0 (189.0 : 149.0) Stockfish 3      -28
   3 Komodo 5.1r2       0 763.0 (381.5 : 381.5)
                          336.0 (155.5 : 180.5) Stockfish 4f55    28
                           87.0 ( 44.5 :  42.5) Critter 1.6a       1
                          340.0 (181.5 : 158.5) Stockfish 3      -28
   4 Stockfish 3      -28 678.0 (307.5 : 370.5)
                          338.0 (149.0 : 189.0) Critter 1.6a       1
                          340.0 (158.5 : 181.5) Komodo 5.1r2       0
Games: http://www.fileswap.com/dl/m9LGAlr3wc/
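
As a rough sanity check on the rating table, a score fraction can be converted to an Elo difference with the standard logistic formula. This is only a sketch: the rating tool used for the tables above is not named in the post and evidently accounts for draws and opponent strength, so its numbers will differ somewhat. The function name is my own.

```python
import math

def elo_from_score(points, games):
    """Elo difference implied by a score, using the logistic Elo model."""
    p = points / games
    if not 0.0 < p < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))

# Stockfish 4f55 overall result from the crosstable: 366.5 points in 672 games.
sf_dev = elo_from_score(366.5, 672)
# Stockfish 3 overall result: 307.5 points in 678 games.
sf_3 = elo_from_score(307.5, 678)
print(f"Stockfish 4f55: {sf_dev:+.1f}, Stockfish 3: {sf_3:+.1f}")
```

This gives roughly +32 and -32 against the field, in the same ballpark as the +28 and -28 in the table.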
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Stockfish 100813 94a3608 versus Komodo 3 and Critter 1.6

Post by lucasart »

gaard wrote:Conditions:
cutechess-cli
Threads=1
Ponder=false
AMD FX(tm)-6100 Six-Core Processor
All engines 64-bit

-concurrency 5
-resign movecount=3 score=500
-each book=varied.bin tc=2/6
-draw movenumber=60 movecount=30 score=10
-tournament round-robin -games 2 -rounds 1000 -recover
All the above makes sense, but 2/6 is a very awkward time control. As an engine developer myself, I am really not sure that all the engines can cope with it very well. And the manner in which they do could affect results significantly.

I would strongly recommend using a Fischer clock (e.g. 5 min + 3 sec increment) or a standard tournament clock (40 moves in X seconds, repeating).

The current regression testing of SF shows +51 Elo. But that number is measured in self-play, and at tc=60"+0.05". So it should be less against foreign opponents, and at long time control. I would be very surprised if SF has indeed gained more than 50 Elo in testing conditions comparable to CCRL 40/20.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
gaard
Posts: 447
Joined: Mon Jun 07, 2010 3:13 am
Location: Holland, MI
Full name: Martin W

Re: Stockfish 100813 94a3608 versus Komodo 3 and Critter 1.6

Post by gaard »

Removing Komodo 3 from the 2/6 match and Komodo 5 from the later 2" per move match, and normalizing by setting Critter's Elo to 0, we have (2" per move match first, then the 2/6 match):

Rank Name             Elo 
   1 Stockfish 4f55    27   
   2 Critter 1.6a       0  
   3 Stockfish 3      -29  

Rank Name                         Elo   
   1 Stockfish 100813 64 SSE4.2    21  
   2 Critter 1.6a 64-bit            0   
   3 Stockfish 3 64bit SSE4.2     -20  
The 2/6 match has an approximately 50% longer time control (3" average move time) than the 2" per move match, which does appear to cause some rating compression. Accounting for that, the results are very close. This is consistent with other matches I have run at various time controls, which leads me to believe that as long as the average move times are close, without significant deviations from the mean, the ratings should be close too. I plan on posting 2000+ more games for this match by Monday and then moving to 20" per move to see how well these results hold up at a CEGT 40/20-like time control.
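
The normalization and the move-time comparison are simple enough to sketch in Python. Subtracting Critter's rating from the earlier tables happens to reproduce the numbers above; whether the original figures were recomputed from the games with Komodo dropped is not something this sketch models. The function names are hypothetical.

```python
def anchor_ratings(ratings, anchor):
    """Shift all ratings so that the anchor engine sits at exactly 0."""
    offset = ratings[anchor]
    return {name: elo - offset for name, elo in ratings.items()}

def average_move_time(moves, seconds):
    """Average seconds per move for an 'N moves in S seconds' control."""
    return seconds / moves

# Ratings taken from the earlier tables, with the Komodo rows left out.
fixed_2s = {"Stockfish 4f55": 28, "Critter 1.6a": 1, "Stockfish 3": -28}
tc_2_6 = {"Stockfish 100813": 29, "Critter 1.6a": 8, "Stockfish 3": -12}

anchored_2s = anchor_ratings(fixed_2s, "Critter 1.6a")
anchored_2_6 = anchor_ratings(tc_2_6, "Critter 1.6a")

# tc=2/6 averages 3" per move, i.e. 1.5x the fixed 2" per move.
ratio = average_move_time(2, 6) / 2.0

print(anchored_2s)
print(anchored_2_6)
print(ratio)
```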

P.S. For removing any and all time management considerations from the match and result, I suggest a constant time per move time control using cutechess-cli's st=N parameter.
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Stockfish 100813 94a3608 versus Komodo 3 and Critter 1.6

Post by lucasart »

gaard wrote: P.S. For removing any and all time management considerations from the match and result, I suggest a constant time per move time control using cutechess-cli's st=N parameter.
Not sure that would lead to efficient use of CPU. The probability that move #X decides the game decreases with X. So it makes sense to allow more time at the beginning and gradually less as the game gets more and more into an endgame.

Also, the way an engine decides to use its time (based on time and increment) is a feature of the engine. And Stockfish's time management is quite refined actually. So it gets some Elo out of that, which you cannot measure with a fixed time per move.

I prefer to always use the same ratio time/inc=100. For example, testing patches, I use 5"+0.05" or 10"+0.1". The IPON rating list uses 5'+3" which is a good compromise, I think.
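
A quick way to check that a given control keeps the time/inc ratio at 100 is sketched below. It uses exact fractions to avoid floating-point noise with values like 0.05; the function name and the way the controls are written down are just for illustration.

```python
from fractions import Fraction

def time_inc_ratio(base_seconds, increment_seconds):
    """Ratio of base time to increment, both given in seconds as strings."""
    return Fraction(base_seconds) / Fraction(increment_seconds)

# The three controls mentioned above, converted to seconds.
controls = {
    '5"+0.05"': ("5", "0.05"),
    '10"+0.1"': ("10", "0.1"),
    "5'+3\"": ("300", "3"),
}

ratios = {label: time_inc_ratio(base, inc) for label, (base, inc) in controls.items()}
for label, ratio in ratios.items():
    print(label, ratio)
```

All three come out at exactly 100, including the IPON 5'+3" control.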

Anyway, it's your tournament, so do it in whichever way you prefer. It just surprises me to see SF gaining more ELO at long TC against foreign opponents, than at short TC against itself. Don't get me wrong, I'm a big fan of SF, and it would be great if it did indeed gain almost 60 ELO in rating list conditions. I'm just a bit skeptical ;-)
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: Stockfish 100813 94a3608 versus Komodo 3 and Critter 1.6

Post by Dr.Wael Deeb »

lucasart wrote:
gaard wrote:Conditions:
cutechess-cli
Threads=1
Ponder=false
AMD FX(tm)-6100 Six-Core Processor
All engines 64-bit

-concurrency 5
-resign movecount=3 score=500
-each book=varied.bin tc=2/6
-draw movenumber=60 movecount=30 score=10
-tournament round-robin -games 2 -rounds 1000 -recover
All the above makes sense, but 2/6 is a very awkward time control. As an engine developer myself, I am really not sure that all the engines can cope with it very well. And the manner in which they do could affect results significantly.

I would strongly recommend using a Fischer clock (e.g. 5 min + 3 sec increment) or a standard tournament clock (40 moves in X seconds, repeating).

The current regression testing of SF shows +51 Elo. But that number is measured in self-play, and at tc=60"+0.05". So it should be less against foreign opponents, and at long time control. I would be very surprised if SF has indeed gained more than 50 Elo in testing conditions comparable to CCRL 40/20.
Fully agreed with all the aspects of the statement above....
Cheers,
Dr.D
No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward…
gaard
Posts: 447
Joined: Mon Jun 07, 2010 3:13 am
Location: Holland, MI
Full name: Martin W

Re: Stockfish 100813 94a3608 versus Komodo 3 and Critter 1.6

Post by gaard »

Results at 20" per move look promising but more games will paint a clearer picture. Average game length is 45+ minutes. More games to follow...

Rank Name             Elo    +    - games score oppo. draws 
   1 Stockfish 4f55    19   45   45    80   53%     1   46% 
   2 Critter 1.6a       8   45   45    80   52%    -1   46% 
   3 Komodo 5.1r2      -7   44   44    80   49%    -1   53% 
   4 Stockfish 3      -20   44   44    80   46%     1   53% 

   1 Stockfish 4f55    19  80.0 ( 42.5 :  37.5)
                           40.0 ( 23.5 :  16.5) Critter 1.6a       8
                           40.0 ( 19.0 :  21.0) Komodo 5.1r2      -7
   2 Critter 1.6a       8  80.0 ( 41.5 :  38.5)
                           40.0 ( 16.5 :  23.5) Stockfish 4f55    19
                           40.0 ( 25.0 :  15.0) Stockfish 3      -20
   3 Komodo 5.1r2      -7  80.0 ( 39.0 :  41.0)
                           40.0 ( 21.0 :  19.0) Stockfish 4f55    19
                           40.0 ( 18.0 :  22.0) Stockfish 3      -20
   4 Stockfish 3      -20  80.0 ( 37.0 :  43.0)
                           40.0 ( 15.0 :  25.0) Critter 1.6a       8
                           40.0 ( 22.0 :  18.0) Komodo 5.1r2      -7

http://www.fileswap.com/dl/6JCMmTg2hs/
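
To get a feel for how many more games are needed, error margins can be scaled by the square root of the game count. This is a back-of-the-envelope sketch that assumes the margin depends only on the number of games (similar draw rates and rating spread), which is my assumption and not something the rating tool guarantees.

```python
import math

def scale_margin(margin, games_from, games_to):
    """Estimate the error margin at games_to, given the margin at games_from."""
    return margin * math.sqrt(games_from / games_to)

def games_needed(margin_now, games_now, target_margin):
    """Estimate the games required to shrink margin_now to target_margin."""
    return math.ceil(games_now * (margin_now / target_margin) ** 2)

# The 2" per move run: +/-16 Elo at 672 games. What would 80 games give?
predicted = scale_margin(16, 672, 80)
# The 20" per move run: +/-45 Elo at 80 games. Games needed for +/-16?
needed = games_needed(45, 80, 16)

print(f"predicted margin at 80 games: +/-{predicted:.1f}")
print(f"games needed for +/-16: {needed}")
```

The predicted margin of about 46 is close to the 45 reported at 20" per move, so getting back to a margin of 16 would take on the order of 630 games per engine.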
Modern Times
Posts: 3546
Joined: Thu Jun 07, 2012 11:02 pm

Re: Stockfish 100813 94a3608 versus Komodo 3 and Critter 1.6

Post by Modern Times »

I'm running a tournament with Stockfish 180813, 4CPU and 4' + 2" time control. I'm really interested to see how it performs.