Endgame Ratings

Post by **gaard** » Mon Jun 07, 2010 8:44 pm

Here are my results from a recent endgame match in which every engine played every other engine at 4 seconds per move over 30 pseudo-random endgame positions, each position played twice with colors reversed. Shredder uses the fast 450 MB Shredderbases, IvanHoe uses the RobboTripleBases, and the other engines use Nalimov's 3-4-5-piece EGTBs where possible.
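Assuming "repeated" means each position is played once with each color, the 300 games per engine in the table follow directly from six engines and 30 positions. A minimal Python sketch of that arithmetic (the function names are hypothetical, chosen for illustration):

```python
def games_per_engine(num_engines: int, positions: int, colors: int = 2) -> int:
    """Games each engine plays in a round robin where every pairing
    plays every start position once with each color."""
    opponents = num_engines - 1
    return opponents * positions * colors


def total_games(num_engines: int, positions: int, colors: int = 2) -> int:
    # Every game involves two engines, so halve the per-engine sum.
    return num_engines * games_per_engine(num_engines, positions, colors) // 2


print(games_per_engine(6, 30))  # 300 games per engine
print(total_games(6, 30))       # 900 games in the whole event
```

The same formula gives 360 games per engine for a seven-engine field, which matches the engines that played the full field in the later table.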


Rank Name                  Elo    +    - games score oppo. draws 
   1 IvanHoe 999963         36   25   25   300   57%    -7   59% 
   2 Deep Rybka 4 x64       22   25   24   300   55%    -4   67% 
   3 Stockfish-171-64-ja     8   25   25   300   52%    -2   67% 
   4 Naum 4.2               -5   25   25   300   49%     1   63% 
   5 Shredder 12            -7   25   25   300   49%     1   59% 
   6 Zappa Mexico II x64   -54   25   25   300   39%    11   60%

All engines are 64-bit and use one core, except Shredder, which is the 32-bit version.
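To read the "score" and "oppo." columns together, the usual logistic Elo model converts a score percentage into a rating difference. This is only an approximation of the table: its columns resemble BayesElo output, and a tool of that kind also models draws and applies a prior, while the 57% shown is itself rounded. A sketch under those assumptions (the function name is hypothetical):

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# IvanHoe: 57% against opposition averaging -7 Elo.
diff = elo_diff_from_score(0.57)
print(round(diff))       # about +49 over its average opponent
print(round(-7 + diff))  # about 42, versus 36 in the table
```

The few points of disagreement with the table are expected given the rounding and the different rating model.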

I will post the games tonight.

Post by **gaard** » Tue Jun 08, 2010 4:39 am

http://www.mediafire.com/download.php?3y3yycxw5c4

Worth noting is that IvanHoe did not have access to TotalBases, only TripleBases. I assume that if it had, its rating would be even higher.

Next up: a 60-round round robin with Houdini 1.01, Shredder 12, Deep Rybka 4, Stockfish 1.7.1, Naum 4.2, Spark 0.4, and Zappa Mexico II, under identical conditions except that the games start from late-opening/early-middlegame positions.

Post by **M ANSARI** » Tue Jun 08, 2010 9:27 pm

It would be interesting to see whether DR4 does better if you change its EGTB usage from "Rarely" to "Normal" in the parameters.

Post by **gaard** » Wed Jun 09, 2010 12:16 am

I think it would be worth testing alternate tablebase usage as well. I am never sure how developers arrive at these default settings: whether they aim to maximize performance in games or in analysis. In this case I wonder how using tablebases rarely under a constant time-per-move control, as opposed to a repeating time control, could influence the results. Anyway, it's on my list of things to do.

Post by **gaard** » Wed Jun 09, 2010 7:27 pm


Rank Name                        Elo    +    - games score oppo. draws 
   1 IvanHoe 999963               32   23   23   360   56%    -5   62% 
   2 Deep Rybka 4 x64             20   25   25   300   55%    -6   67% 
   3 Deep Rybka 4 x64 NUNormal    12   26   25   300   53%    -6   65% 
   4 Stockfish-171-64-ja           8   23   23   360   52%    -1   67% 
   5 Naum                         -8   23   23   360   48%     1   64% 
   6 Ds12                        -10   24   24   360   48%     2   58% 
   7 Zappa Mexico II x64         -53   24   24   360   39%     9   60%

This adds R4 playing with NalimovUsage = Normal (the "NUNormal" entry).

You get the counterintuitive impression that R4 might perform better with no tablebase usage at all, since it rated higher with the default "Rarely" setting than with "Normal". There are not enough games to draw conclusions when the ratings are so compressed and the error margins are so wide, but it makes you wonder...
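To see why roughly ±25 Elo is the best 300 games can do here, one can estimate the margin from the score, the draw rate and the game count: take the per-game score variance, convert it to a standard error, and scale it by the slope of the logistic Elo curve. This is a normal-approximation sketch (the function name and the 95% z-value are assumptions), not how the rating tool computes its own margins:

```python
import math


def elo_margin(score: float, draw_rate: float, games: int, z: float = 1.96) -> float:
    """Approximate +/- Elo margin for a match result (normal approximation)."""
    wins = score - draw_rate / 2.0            # win fraction implied by score and draws
    mean_sq = wins * 1.0 + draw_rate * 0.25    # E[s^2] for per-game score s in {0, 0.5, 1}
    variance = mean_sq - score ** 2            # per-game score variance
    std_err = math.sqrt(variance / games)      # standard error of the mean score
    # Derivative of Elo = -400 * log10(1/score - 1) with respect to score.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * std_err * slope


# IvanHoe in the first table: 57% score, 59% draws, 300 games.
print(round(elo_margin(0.57, 0.59, 300)))  # about 25, in line with the table's +/-25
```

Because the margin shrinks with the square root of the game count, halving it takes four times as many games, and the 8-point gap between the two R4 settings is well inside either setting's margin.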

Post by **alpha123** » Wed Jun 09, 2010 9:02 pm

If it were R3, I'd be more suspicious, but R4's EGTB usage is supposed to be much better (I think, anyway), so it's probably just the error margins.

EDIT: It would be interesting to see a match like this with only rook endgames...

Peter
