EGTB test with Rybka 4.1 (2200 games)

Jouni · Post by **Jouni** » Fri Jan 11, 2013 10:52 pm

Note 1. I already did a similar test with Rybka 3 in 2010, but later I found that the conditions were flawed, so please disregard it!
Note 2. I know well that this is not enough games, but these already took one week.

Conditions: 1 cpu, average time 1s/move, no ponder, starting positions from EG_Msb.epd, so play already
starts in ENDINGS. NOTB = Rybka playing without Nalimov access. The default setup has 5-piece access.

R4.1 - R4.1 NOTB +114,=381,-105 +5 ELO

R4.1 - Houdini 3 +57,=236,-107 -44 ELO
R4.1 NOTB - H 3 +39,=247,-114 -66 ELO

R4.1 - Stockfish 2.3.1 +74,=240,-86 -10 ELO
R4.1 NOTB - SF 2.3.1 +56,=251,-93 -32 ELO

The Houdini and Stockfish matches both indicate a gain of about +22 ELO. But these games start in endings; if only
about 1/5 of normal games are decided in the endgame, the total EGTB benefit is very small and hardly detectable in test games!?
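As a rough illustration of the arithmetic behind that estimate, here is a minimal Python sketch. It converts each +W,=D,-L result to Elo with the usual logistic formula and then scales the tablebase gain by the fraction of games decided in the endgame. The names `elo_from_wdl` and `ENDGAME_FRACTION` are made up for this example, and the linear dilution is only an assumption, not a measured effect.

```python
import math

def elo_from_wdl(wins, draws, losses):
    """Elo difference implied by a match score under the logistic model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))

# (wins, draws, losses) from Rybka's point of view, copied from the post.
matches = {
    "Houdini 3": {"tb": (57, 236, 107), "notb": (39, 247, 114)},
    "Stockfish 2.3.1": {"tb": (74, 240, 86), "notb": (56, 251, 93)},
}

# Assumption for illustration only: about 1 game in 5 is decided in the
# endgame, and the gain shrinks linearly with that fraction.
ENDGAME_FRACTION = 1 / 5

gains = {}
for opponent, result in matches.items():
    gain = elo_from_wdl(*result["tb"]) - elo_from_wdl(*result["notb"])
    gains[opponent] = gain
    print(f"{opponent}: TB gain {gain:+.1f} Elo from endgame starts, "
          f"roughly {gain * ENDGAME_FRACTION:+.1f} Elo after dilution")
```

Under these assumptions the diluted gain is only a few Elo, well inside the noise of a match of a few hundred games.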

Ajedrecista · Post by **Ajedrecista** » Sat Jan 12, 2013 12:18 pm

Hello Jouni:

It is an interesting test. I remember that Ingo forgot to use EGTB with Houdini 2 when running IPON, so he tested Houdini 2 with and without EGTB. The difference in rating was 0 IIRC (I have not included error bars).

I will complement your post with the error bars I have calculated at 95% confidence, using my own programme; I also give the LOS (likelihood of superiority). Numbers are rounded:

R4.1 - R4.1 NOTB +114,=381,-105 +5 ELO

(600 games): +5.21 ± 16.81 Elo.
    My LOS: 72.85%.
Rémi's LOS: 72.8%.

-------------------------------------------

R4.1 - Houdini 3 +57,=236,-107 -44 ELO

(400 games): -43.66 ± 21.75 Elo.
    My LOS: 0%.
Rémi's LOS: 0%.

-------------------------------------------

R4.1 NOTB - H 3 +39,=247,-114 -66 ELO

(400 games): -65.92 ± 20.83 Elo.
    My LOS: 0%.
Rémi's LOS: 0%.

-------------------------------------------

R4.1 - Stockfish 2.3.1 +74,=240,-86 -10 ELO

(400 games): -10.43 ± 21.56 Elo.
    My LOS: 17.11%.
Rémi's LOS: 17.22%.

-------------------------------------------

R4.1 NOTB - SF 2.3.1 +56,=251,-93 -32 ELO

(400 games): -32.23 ± 20.74 Elo.
    My LOS: 0.11%.
Rémi's LOS: 0.12%.

I hope there are no typos. My LOS is an approximation using a normal distribution with its mean and standard deviation; Rémi's LOS only takes into account the number of wins and losses of each match. The error bars are not symmetric in the model I use, so I have written the average of both values (columns - and + if you use BayesElo).
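For readers who want to reproduce numbers of this kind, here is a minimal Python sketch. It is not the programme used above; it assumes a trinomial (win/draw/loss) per-game score variance, a normal approximation for the mean score, and z = 1.96 for 95% confidence. The function name `match_stats` is invented for this example. The wins-and-losses-only LOS uses a common erf approximation, so it may differ in the last digits from the posted values.

```python
import math

def match_stats(wins, draws, losses, z=1.96):
    """Return (elo, half_width, los_normal, los_wins_losses) for one match.

    Assumes independent games and a score far enough from 0% and 100%
    that score - z*se and score + z*se stay strictly inside (0, 1).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n

    # Per-game variance of the score (1, 0.5 or 0 points per game).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score

    def to_elo(p):
        return 400.0 * math.log10(p / (1.0 - p))

    elo = to_elo(score)
    low, high = to_elo(score - z * se), to_elo(score + z * se)
    # The interval is asymmetric in Elo; report the average of both sides.
    half_width = (high - low) / 2.0

    # LOS from the normal approximation of the mean score.
    los_normal = 0.5 * (1.0 + math.erf((score - 0.5) / (se * math.sqrt(2.0))))
    # LOS from wins and losses only (draws ignored).
    los_wl = 0.5 * (1.0 + math.erf((wins - losses)
                                   / math.sqrt(2.0 * (wins + losses))))
    return elo, half_width, los_normal, los_wl

elo, half_width, los_normal, los_wl = match_stats(114, 381, 105)
print(f"{elo:+.2f} ± {half_width:.2f} Elo, "
      f"LOS {100 * los_normal:.2f}% / {100 * los_wl:.1f}%")
```

For the 600-game match this sketch lands close to the posted +5.21 ± 16.81 Elo, which suggests the assumptions are at least similar to the ones used for the figures above.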

Using a normal difference distribution, this is what I obtain (rounding to 0.01 Elo):

Against Houdini 3:

(R4.1) - (R4.1 NOTB) ~ -43.66 - (-65.92) ± sqrt[(21.75)² + (20.83)²] ~ 22.26 ± 30.12 Elo.

-------------------------------------------

Against SF 2.3.1:

(R4.1) - (R4.1 NOTB) ~ -10.43 - (-32.23) ± sqrt[(21.56)² + (20.74)²] ~ 21.80 ± 29.92 Elo.
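The same combination can be written as a few lines of Python. This is only a sketch: it treats the two match results as independent normal estimates, so the error bars add in quadrature, and the helper name `difference` is invented for this example.

```python
import math

def difference(a, b):
    """Difference of two independent (elo, half_width) estimates."""
    (elo_a, width_a), (elo_b, width_b) = a, b
    # Independent errors add in quadrature: sqrt(width_a² + width_b²).
    return elo_a - elo_b, math.hypot(width_a, width_b)

# (Elo, 95% half-width) pairs copied from the results above.
vs_houdini = difference((-43.66, 21.75), (-65.92, 20.83))
vs_stockfish = difference((-10.43, 21.56), (-32.23, 20.74))

for name, (gain, width) in (("Houdini 3", vs_houdini),
                            ("SF 2.3.1", vs_stockfish)):
    print(f"Against {name}: {gain:.2f} ± {width:.2f} Elo")
```

In both cases the error bar is larger than the gain itself, so neither match on its own shows a tablebase benefit at 95% confidence.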

At about 73%, the LOS of the 600-game match (R4.1) versus (R4.1 NOTB) is not conclusive.

Thank you very much for your effort!

Regards from Spain.

Ajedrecista.

Jouni · Post by **Jouni** » Tue Jan 15, 2013 7:23 pm

I ran one additional match against a weaker engine and got:

Rybka - Deep Fritz 12 +153,=214,-33 +108

Rybka NOTB - DF 12 +150,=213,-37 +101

So here the TB benefit is 7 points.

And the total result after 1800 games for both: +11 ELO.

The reasons I tested only Rybka: 1) at least older versions have holes in some basic endings, like a/h pawn and bishop, and 2) Rybka is the ONLY engine which scores better in endgame test suites with TBs.
