EGTB test with Rybka 4.1 (2200 games)

Jouni
Posts: 3644
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Post by Jouni »

Note 1. I already ran a similar test with Rybka 3 in 2010, but later found that the conditions were flawed, so please disregard it!
Note 2. I know well that this is not enough games, but these already took one week.

Conditions: 1 CPU, average time 1 s/move, no ponder, starting positions from EG_Msb.epd, so play
starts directly in ENDINGS. NOTB = Rybka playing without Nalimov access. The default has 5-piece access.

R4.1 - R4.1 NOTB +114,=381,-105 +5 ELO

R4.1 - Houdini 3 +57,=236,-107 -44 ELO
R4.1 NOTB - H 3 +39,=247,-114 -66 ELO

R4.1 - Stockfish 2.3.1 +74,=240,-86 -10 ELO
R4.1 NOTB - SF 2.3.1 +56,=251,-93 -32 ELO

The Houdini and Stockfish matches both indicate a +22 ELO gain. But if 1/5 of games are decided in endgames, the total
EGTB benefit is very small and hardly detectable in test games!?
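
For anyone who wants to check the ELO column, here is a minimal sketch. It assumes the usual logistic Elo model (score fraction s maps to 400*log10(s/(1-s))); the function name `elo_from_wdl` and the `matches` table are only illustrative, and the tool actually used for the numbers above is not stated.

```python
import math

def elo_from_wdl(wins, draws, losses):
    """Elo difference implied by a match result under the logistic model (assumption)."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games  # fraction of points scored, must be in (0, 1)
    return 400.0 * math.log10(score / (1.0 - score))

# W/D/L figures copied from the results listed in this post.
matches = {
    "R4.1 - R4.1 NOTB": (114, 381, 105),
    "R4.1 - Houdini 3": (57, 236, 107),
    "R4.1 NOTB - H 3": (39, 247, 114),
    "R4.1 - Stockfish 2.3.1": (74, 240, 86),
    "R4.1 NOTB - SF 2.3.1": (56, 251, 93),
}

for name, wdl in matches.items():
    print(f"{name}: {elo_from_wdl(*wdl):+.2f} Elo")
```

The formula breaks down for a 0% or 100% score, which does not occur in any of these matches.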
Jouni
Ajedrecista
Posts: 2120
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: EGTB test with Rybka 4.1 (2200 games).

Post by Ajedrecista »

Hello Jouni:
It is an interesting test. I remember that Ingo forgot to use EGTB with Houdini 2 when running IPON, so he tested Houdini 2 with and without EGTB. The difference in rating was 0 IIRC (I have not included error bars).

I will complete your post with the error bars I have calculated with 95% confidence, using my own programme; I also give LOS. Numbers are rounded:

R4.1 - R4.1 NOTB +114,=381,-105 +5 ELO

(600 games): +5.21 ± 16.81 Elo.
    My LOS: 72.85%.
Rémi's LOS: 72.8%.

-------------------------------------------

R4.1 - Houdini 3 +57,=236,-107 -44 ELO

(400 games): -43.66 ± 21.75 Elo.
    My LOS: 0%.
Rémi's LOS: 0%.

-------------------------------------------

R4.1 NOTB - H 3 +39,=247,-114 -66 ELO

(400 games): -65.92 ± 20.83 Elo.
    My LOS: 0%.
Rémi's LOS: 0%.

-------------------------------------------

R4.1 - Stockfish 2.3.1 +74,=240,-86 -10 ELO

(400 games): -10.43 ± 21.56 Elo.
    My LOS: 17.11%.
Rémi's LOS: 17.22%.

-------------------------------------------

R4.1 NOTB - SF 2.3.1 +56,=251,-93 -32 ELO

(400 games): -32.23 ± 20.74 Elo.
    My LOS: 0.11%.
Rémi's LOS: 0.12%.
I hope there are no typos. My LOS is an approximation using a normal distribution with its mean and standard deviation; Rémi's LOS only takes into account the number of wins and losses of each match. The error bars are not symmetric in the model I use, so I have written the average of both values (columns - and + if you use BayesElo).
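
The error bars and both kinds of LOS can be sketched roughly as follows. This is an assumption-laden reconstruction, not the actual programme: it takes the trinomial variance of the per-game score, puts a 95% normal interval on the score fraction, converts both ends to Elo and averages the two half-widths. The wins/losses-only LOS uses the common erf approximation, which may differ slightly from Rémi's exact formula. All function names are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal

def elo(score):
    """Logistic Elo difference for a score fraction in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

def score_stats(wins, draws, losses):
    """Mean score per game and standard error of that mean."""
    n = wins + draws + losses
    mu = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mu) ** 2 + draws * (0.5 - mu) ** 2
           + losses * (0.0 - mu) ** 2) / n
    return mu, math.sqrt(var / n)

def elo_error_bar(wins, draws, losses, z=Z95):
    """Return (Elo, averaged half-width of the asymmetric Elo interval)."""
    mu, se = score_stats(wins, draws, losses)
    lower, upper = elo(mu - z * se), elo(mu + z * se)
    return elo(mu), (upper - lower) / 2.0

def los_normal(wins, draws, losses):
    """LOS from a normal approximation of the mean score (uses draws too)."""
    mu, se = score_stats(wins, draws, losses)
    return 0.5 * (1.0 + math.erf((mu - 0.5) / (se * math.sqrt(2.0))))

def los_wins_losses(wins, losses):
    """LOS approximation that ignores draws entirely."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

e, bar = elo_error_bar(114, 381, 105)
print(f"{e:+.2f} ± {bar:.2f} Elo")
print(f"LOS (normal): {100 * los_normal(114, 381, 105):.2f}%")
print(f"LOS (wins/losses only): {100 * los_wins_losses(114, 105):.2f}%")
```

Under these assumptions the half-widths for the first two matches come out close to the ±16.81 and ±21.75 quoted above, but that agreement does not prove the method is identical.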

Using a normal difference distribution, this is what I obtain (rounding to 0.01 Elo):

Against Houdini 3:

(R4.1) - (R4.1 NOTB) ~ -43.66 - (-65.92) ± sqrt[(21.75)² + (20.83)²] ~ 22.26 ± 30.12 Elo.

-------------------------------------------

Against SF 2.3.1:

(R4.1) - (R4.1 NOTB) ~ -10.43 - (-32.23) ± sqrt[(21.56)² + (20.74)²] ~ 21.8 ± 29.92 Elo.
The LOS of the 600-game match (R4.1) versus (R4.1 NOTB) is not conclusive.
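
The difference calculation above can be written as a small sketch. It assumes the two matches are independent and that the averaged error bars can be treated as symmetric, so they combine in quadrature; `elo_difference` is just an illustrative name.

```python
import math

def elo_difference(elo_a, err_a, elo_b, err_b):
    """Difference of two independent Elo estimates, errors added in quadrature."""
    diff = elo_a - elo_b
    err = math.hypot(err_a, err_b)  # sqrt(err_a**2 + err_b**2)
    return diff, err

# Values copied from the per-match error bars earlier in this post.
cases = {
    "Against Houdini 3": (-43.66, 21.75, -65.92, 20.83),
    "Against SF 2.3.1": (-10.43, 21.56, -32.23, 20.74),
}

for name, args in cases.items():
    diff, err = elo_difference(*args)
    includes_zero = abs(diff) < err  # True means the 95% interval still contains 0
    print(f"{name}: {diff:.2f} ± {err:.2f} Elo, interval includes 0: {includes_zero}")
```

In both cases the interval still contains zero, so by this measure the roughly 22 Elo gain is not statistically established at 95% confidence.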

Thank you very much for your effort!

Regards from Spain.

Ajedrecista.
Jouni
Posts: 3644
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: EGTB test with Rybka 4.1 (2200 games)

Post by Jouni »

I ran one additional match against a weaker engine and got:

Rybka - Deep Fritz 12 +153,=214,-33 +108

Rybka NOTB - DF 12 +150,=213,-37 +101

So here the TB benefit is 7 points.

And total result after 1800 games both +11 ELO.

The reasons I tested just Rybka: 1) at least the older versions have holes in some basic endings, such as a/h pawn and bishop, and 2) Rybka is the ONLY engine which scores better in endgame test suites with TBs.
Jouni