EGTB test with Rybka 4.1 (2200 games)
Note 1. I did a similar test with Rybka 3 back in 2010, but later I found that the conditions were flawed, so please forget it!
Note 2. I understand well that this is not enough games, but these already took one week.
Conditions: 1 CPU, average time 1 s/move, no ponder, starting positions from EG_Msb.epd, so we are already
playing ENDINGS. NOTB = Rybka playing without Nalimov access. Default has 5-piece access.
R4.1 - R4.1 NOTB +114,=381,-105 +5 ELO
R4.1 - Houdini 3 +57,=236,-107 -44 ELO
R4.1 NOTB - H 3 +39,=247,-114 -66 ELO
R4.1 - Stockfish 2.3.1 +74,=240,-86 -10 ELO
R4.1 NOTB - SF 2.3.1 +56,=251,-93 -32 ELO
The Houdini and Stockfish matches both indicate a +22 ELO gain. But if only 1/5 of games are decided in the endgame, the total
EGTB benefit is very small and hardly detectable in test games!?
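For readers who want to check the ELO column of the opening post, here is a minimal Python sketch. It assumes the usual logistic Elo model (score fraction converted with 400 * log10(s / (1 - s))); the post does not say which formula was used, and the function name `elo_from_wdl` is only illustrative.

```python
import math

def elo_from_wdl(wins, draws, losses):
    """Elo difference implied by a match score (logistic model, an assumption)."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games  # a draw counts as half a point
    return 400.0 * math.log10(score / (1.0 - score))

# (wins, draws, losses) copied from the post, seen from Rybka's side.
matches = {
    "R4.1 - R4.1 NOTB": (114, 381, 105),
    "R4.1 - Houdini 3": (57, 236, 107),
    "R4.1 NOTB - H 3": (39, 247, 114),
    "R4.1 - Stockfish 2.3.1": (74, 240, 86),
    "R4.1 NOTB - SF 2.3.1": (56, 251, 93),
}

elo = {name: elo_from_wdl(*wdl) for name, wdl in matches.items()}
for name, value in elo.items():
    print(f"{name}: {value:+.2f} Elo")

# Tablebase gain measured indirectly: same opponent, with and without TBs.
tb_gain_vs_houdini = elo["R4.1 - Houdini 3"] - elo["R4.1 NOTB - H 3"]
tb_gain_vs_stockfish = elo["R4.1 - Stockfish 2.3.1"] - elo["R4.1 NOTB - SF 2.3.1"]
print(f"TB gain vs Houdini 3: {tb_gain_vs_houdini:+.2f} Elo")
print(f"TB gain vs Stockfish 2.3.1: {tb_gain_vs_stockfish:+.2f} Elo")
```

Under that assumption the rounded figures agree with the ones posted (+5, -44, -66, -10, -32), and both indirect gains come out at roughly 22 Elo.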
- Full name: Jouni Uski
Re: EGTB test with Rybka 4.1 (2200 games).
Hello Jouni:
I will complete your post with the error bars I have calculated at 95% confidence, using my own programme; I also give LOS (likelihood of superiority). Numbers are rounded:
R4.1 - R4.1 NOTB +114,=381,-105 +5 ELO
(600 games): +5.21 ± 16.81 Elo.
My LOS: 72.85%.
Rémi's LOS: 72.8%.
-------------------------------------------
R4.1 - Houdini 3 +57,=236,-107 -44 ELO
(400 games): -43.66 ± 21.75 Elo.
My LOS: 0%.
Rémi's LOS: 0%.
-------------------------------------------
R4.1 NOTB - H 3 +39,=247,-114 -66 ELO
(400 games): -65.92 ± 20.83 Elo.
My LOS: 0%.
Rémi's LOS: 0%.
-------------------------------------------
R4.1 - Stockfish 2.3.1 +74,=240,-86 -10 ELO
(400 games): -10.43 ± 21.56 Elo.
My LOS: 17.11%.
Rémi's LOS: 17.22%.
-------------------------------------------
R4.1 NOTB - SF 2.3.1 +56,=251,-93 -32 ELO
(400 games): -32.23 ± 20.74 Elo.
My LOS: 0.11%.
Rémi's LOS: 0.12%.
I hope there are no typos. My LOS is an approximation using a normal distribution with its mean and standard deviation; Rémi's LOS only takes into account the number of wins and losses of each match. The error bars are not symmetric in the model I use, so I have written the average of both values (columns - and + if you use BayesElo).
Using a normal difference distribution, this is what I obtain (rounding to 0.01 Elo):
Against Houdini 3:
(R4.1) - (R4.1 NOTB) ~ -43.66 - (-65.92) ± sqrt[(21.75)² + (20.83)²] ~ 22.26 ± 30.12 Elo.
-------------------------------------------
Against SF 2.3.1:
(R4.1) - (R4.1 NOTB) ~ -10.43 - (-32.23) ± sqrt[(21.56)² + (20.74)²] ~ 21.80 ± 29.92 Elo.
The LOS of the 600-game match (R4.1) versus (R4.1 NOTB) is not conclusive.
Thank you very much for your effort!
Regards from Spain.
Ajedrecista.
It is an interesting test. I remember that Ingo forgot to use EGTB with Houdini 2 when running IPON, so he tested Houdini 2 with and without EGTB. The difference in rating was 0 IIRC (I have not included error bars).
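The error bars, LOS values and differences quoted in the reply above can be approximated with a few lines of Python. This is not Ajedrecista's programme, which is not shown in the thread. It is a sketch that assumes a normal approximation on the score fraction, a logistic Elo conversion, the averaging of the two asymmetric half-widths as described in the reply, and a commonly quoted wins-and-losses-only formula for LOS. All function names are illustrative.

```python
import math

def score_to_elo(score):
    """Logistic Elo conversion (assumed; the thread does not state the model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_with_error_bar(wins, draws, losses, z=1.96):
    """Return (elo, averaged half-width) for roughly 95% confidence."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * score ** 2) / games
    margin = z * math.sqrt(variance / games)
    lower = score_to_elo(score - margin)
    upper = score_to_elo(score + margin)
    # The interval is asymmetric in Elo, so report the mean of both sides.
    return score_to_elo(score), (upper - lower) / 2.0

def los_from_wins_losses(wins, losses):
    """LOS from decisive games only (normal approximation, assumed formula)."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

def elo_difference(elo_a, err_a, elo_b, err_b):
    """Difference of two independent estimates; errors add in quadrature."""
    return elo_a - elo_b, math.hypot(err_a, err_b)

elo_600, err_600 = elo_with_error_bar(114, 381, 105)
print(f"R4.1 - R4.1 NOTB: {elo_600:+.2f} ± {err_600:.2f} Elo")
print(f"LOS: {100.0 * los_from_wins_losses(114, 105):.1f}%")

# Indirect comparison against Houdini 3, using the figures quoted in the reply.
gain, gain_err = elo_difference(-43.66, 21.75, -65.92, 20.83)
print(f"TB gain vs Houdini 3: {gain:+.2f} ± {gain_err:.2f} Elo")
```

Because each indirect comparison combines two 400-game matches, its error bar (about ±30 Elo) is larger than the measured gain of about 22 Elo, so that comparison on its own does not establish the tablebase benefit.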
Re: EGTB test with Rybka 4.1 (2200 games)
I did one additional match vs a weaker engine and got:
Rybka - Deep Fritz 12 +153,=214,-33 +108
Rybka NOTB - DF 12 +150,=213,-37 +101
So here the TB benefit is 7 points.
And total result after 1800 games both +11 ELO.
The reasons I tested just Rybka: 1) at least older versions have holes in some basic endings like a/h pawn and bishop, and 2) Rybka is the ONLY engine which scores better in endgame test suites with TBs.
Jouni