Endgame Ratings

gaard
Posts: 463
Joined: Mon Jun 07, 2010 3:13 am
Location: Holland, MI
Full name: Martin W

Endgame Ratings

Post by gaard »

Here are my results from a recent endgame match in which every engine played every other at 4 seconds per move over 30 pseudo-random endgame positions, each repeated. Shredder uses the fast 450MB Shredderbases, Nalimov 3-4-5-man EGTBs are used where possible, and IvanHoe uses the RobboTripleBases.

Rank Name                  Elo    +    - games score oppo. draws 
   1 IvanHoe 999963         36   25   25   300   57%    -7   59% 
   2 Deep Rybka 4 x64       22   25   24   300   55%    -4   67% 
   3 Stockfish-171-64-ja     8   25   25   300   52%    -2   67% 
   4 Naum 4.2               -5   25   25   300   49%     1   63% 
   5 Shredder 12            -7   25   25   300   49%     1   59% 
   6 Zappa Mexico II x64   -54   25   25   300   39%    11   60% 
All engines are 64-bit and use one core, except Shredder, which is the 32-bit version.

I will post the games tonight.
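
For anyone who wants to sanity-check the score column against the Elo column, here is a minimal sketch using the standard logistic Elo formula. It assumes the plain logistic model; the rating tool that produced the table may use a different model (for example one that accounts for draws), so these figures need not match the table exactly. The function names are made up for illustration.

```python
import math

def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by an average score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

def expected_score(elo_diff: float) -> float:
    """Inverse of elo_diff_from_score: expected score for a given Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Score column from the table above (fraction of points scored).
rows = [
    ("IvanHoe 999963", 0.57),
    ("Deep Rybka 4 x64", 0.55),
    ("Stockfish-171-64-ja", 0.52),
    ("Naum 4.2", 0.49),
    ("Shredder 12", 0.49),
    ("Zappa Mexico II x64", 0.39),
]

for name, score in rows:
    # Difference between the engine and the average of its opponents.
    print(f"{name:22s} {score:.0%} -> {elo_diff_from_score(score):+.0f} Elo vs. opposition")
```

The result is the engine's rating relative to its average opponent, so it corresponds to the "Elo" column minus the "oppo." column rather than to "Elo" alone.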
gaard

Re: Endgame Ratings

Post by gaard »

http://www.mediafire.com/download.php?3y3yycxw5c4

Worth noting is that IvanHoe did not have access to TotalBases, only TripleBases. I assume that if it had, its rating would be even higher.

Next up, a 60R RR with Houdini 1.01, Shredder 12, Deep Rybka 4, Stockfish 1.7.1, Naum 4.2, Spark 0.4, and Zappa Mexico II... identical conditions except starting from late-opening and early-middlegame positions.
M ANSARI
Posts: 3726
Joined: Thu Mar 16, 2006 7:10 pm

Re: Endgame Ratings

Post by M ANSARI »

It would be interesting to see if DR4 would do better if you change the EGTB usage from "rarely" to "normal" in the parameters.
gaard

Re: Endgame Ratings

Post by gaard »

I think it would be worth testing with alternate tablebase usage as well. I am never sure how developers arrive at these default settings, whether to maximize performance in games or in analysis. In this case I wonder how using tablebases rarely with a constant time per move, as opposed to a repeating time control, could influence the results. Anyway, it's on my list of things to do.
gaard

Re: Endgame Ratings

Post by gaard »

Rank Name                        Elo    +    - games score oppo. draws 
   1 IvanHoe 999963               32   23   23   360   56%    -5   62% 
   2 Deep Rybka 4 x64             20   25   25   300   55%    -6   67% 
   3 Deep Rybka 4 x64 NUNormal    12   26   25   300   53%    -6   65% 
   4 Stockfish-171-64-ja           8   23   23   360   52%    -1   67% 
   5 Naum                         -8   23   23   360   48%     1   64% 
   6 Ds12                        -10   24   24   360   48%     2   58% 
   7 Zappa Mexico II x64         -53   24   24   360   39%     9   60% 
Here is R4 playing with NalimovUsage = Normal (the "NUNormal" row).

You get the counterintuitive impression that R4 might perform better with no tablebase usage at all. There are not enough games to draw conclusions when the ratings are so compressed and the margins of error so high, but it makes you wonder...
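
To put a rough number on those error margins, here is a minimal sketch that estimates the 95% margin for one engine from its game count, score and draw rate. It assumes the games are independent (with repeated positions that is only an approximation) and uses the plain logistic Elo curve, which may not be what the rating tool does. The function names are made up for illustration.

```python
import math

def score_std_error(score: float, draw_rate: float, games: int) -> float:
    """Standard error of the average score, assuming independent games."""
    win = score - draw_rate / 2.0      # fraction of games won
    loss = 1.0 - win - draw_rate       # fraction of games lost
    if win < 0.0 or loss < 0.0:
        raise ValueError("score and draw_rate are inconsistent")
    # Per-game variance of a result that is 1, 0.5 or 0.
    variance = (win * (1.0 - score) ** 2
                + draw_rate * (0.5 - score) ** 2
                + loss * (0.0 - score) ** 2)
    return math.sqrt(variance / games)

def elo_per_score_point(score: float) -> float:
    """Slope of 400*log10(s/(1-s)) at s: Elo change per unit change in score."""
    return 400.0 / (math.log(10.0) * score * (1.0 - score))

# Deep Rybka 4 x64 row: 300 games, 55% score, 67% draws.
se = score_std_error(0.55, 0.67, 300)
margin_score = 1.96 * se                           # 95% margin in score terms
margin_elo = margin_score * elo_per_score_point(0.55)
print(f"95% margin: +/-{margin_score:.1%} score, about +/-{margin_elo:.0f} Elo")
```

Under these assumptions the margin comes out in the low twenties of Elo, the same ballpark as the +/- columns in the table, so the 8-point gap between the two Rybka settings sits well inside the noise.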
alpha123
Posts: 660
Joined: Sat Dec 05, 2009 5:13 am
Location: Colorado, USA

Re: Endgame Ratings

Post by alpha123 »

If it was R3, I'd be more suspicious, but R4's EGTB usage is supposed to be much better (I think anyway :P), so it's probably just error margins.

EDIT: It would be interesting to see a match like this with only rook endgames....

Peter