What happens using egbb

Ferdy · Post by **Ferdy** » Sun Oct 13, 2013 1:16 pm

Laskos wrote:
Ferdy wrote:
Yes, it seems a Houdini problem. Is the ELO benefit from egbb measurable in Deuterium?
I started testing the latest released Deuterium with egbb at tc 40 moves/60 sec, with different positions and colors reversed.

(1) From positions with mixed pieces on late middle game but more than 5-men. Target games is 1600
Bayeselo:
Code:
Rank Name                             Elo     Diff     +     -      Games  Score    Oppo.   Draws     Win          W-L-D 
   1 Deuterium-113-egbb              7.85     0.00   7.06   7.06     1091  52.25%   -7.85  57.93%  23.28%       254-205-632
   2 Deuterium-v13.1.31.113-64bit   -7.85   -15.69   7.06   7.06     1091  47.75%    7.85  57.93%  18.79%       205-254-632
SPRT:
Code:
Engine: Deuterium-113-egbb
SPRT: elo0 = -1.5, elo1 = +4.5, a = +0.05, b = +0.05
LLR = +1.25278 (-2.94444, +2.94444)
T = +1091, W = +254, L = +205, D = +632, WNet = +49
Next will be the following, positions generated from Ed's protools.
(2) Rook and pawn ending but more than 5-men. Target games is 200
(3) Pawn ending but more than 5-men. Target games is 200
(4) Queen and pawn ending but more than 5-men. Target games is 200
(5) Bishop and knight and pawn ending but more than 5-men, Target games is 200
Thanks, so it seems that there is a real ELO benefit from egbb in late middlegames. Your error margins are 1SD, right? 15 ELO points are a lot; even if it shrinks to 5 ELO points with more games, it's highly significant. It would be even better if you waited until the SPRT stops, to have clear uncertainties.

That is 95% confidence from bayeselo default.
Update:

Code:

SPRT: elo0 = -1.5, elo1 = +4.5, a = +0.05, b = +0.05
LLR = +1.42909 (-2.94444, +2.94444)
T = +1448, W = +305, L = +250, D = +893, WNet = +55

Daniel Shawul · Post by **Daniel Shawul** » Sun Oct 13, 2013 2:41 pm

I never tested how much ELO I get, so I wouldn't know. Endgames are rarely reached in engine games, but for analysis and endgame testing I know they do work well. Here I am just glad someone produced data showing it is absurd to say they hurt performance; hopefully that stops the FUD. But I don't dwell on it, because even now, after Houdini is shown to be at fault (thanks to you), it goes on with questions. Well, at least we made progress since the beginning of this thread: there is no mistaking who is at fault. It is unbelievable the amount of badmouthing egbbs received from the rybka forum link due to Houdini's screwed up implementation.

Michel · Post by **Michel** » Sun Oct 13, 2013 7:37 pm

I am not partnering with anyone. Everybody is free to add support for my tablebase format.

I am kind of curious. Does Houdini reimplement your GPL probing code? Or did you offer the probing code to Houdini under a different license?

Ferdy · Post by **Ferdy** » Sun Oct 13, 2013 9:36 pm

I stopped the test, as the result is probably already reliable for these specific test positions. I set the rook and pawn ending to play 1000 games instead of the planned 200 games.
(1) Mixed pieces
Bayeselo:

Code:

Rank Name                             Elo     Diff     +     -      Games  Score    Oppo.   Draws     Win          W-L-D 
   1 Deuterium-113-egbb              7.18     0.00   5.37   5.36     1600  52.06%   -7.18  62.62%  20.75%       332-266-1002
   2 Deuterium-v13.1.31.113-64bit   -7.18   -14.36   5.36   5.37     1600  47.94%    7.18  62.62%  16.62%       266-332-1002

(2) Rook and pawn ending
Bayeselo:

Code:

Rank Name                             Elo     Diff     +     -      Games  Score    Oppo.   Draws     Win          W-L-D 
   1 Deuterium-113-egbb             12.48     0.00   5.58   5.79      562  53.74%  -12.48  81.85%  12.81%        72-30-460
   2 Deuterium-v13.1.31.113-64bit  -12.48   -24.96   5.79   5.58      562  46.26%   12.48  81.85%   5.34%        30-72-460

Overall bayeselo:

Code:

Rank Name                             Elo     Diff     +     -      Games  Score    Oppo.   Draws     Win          W-L-D 
   1 Deuterium-113-egbb              8.60     0.00   4.37   4.56     2162  52.50%   -8.60  67.62%  18.69%       404-296-1462
   2 Deuterium-v13.1.31.113-64bit   -8.60   -17.20   4.56   4.37     2162  47.50%    8.60  67.62%  13.69%       296-404-1462

Ordo:

Code:

   # PLAYER                          : RATING  ERROR   POINTS  PLAYED    (%)
   1 Deuterium-113-egbb              :    8.8    5.3   1135.0    2162   52.5%
   2 Deuterium-v13.1.31.113-64bit    :   -8.8    5.3   1027.0    2162   47.5%

SPRT:

Code:

Engine: Deuterium-113-egbb
SPRT: elo0 = -1.5, elo1 = +4.5, a = +0.05, b = +0.05
LLR = +2.98664 (-2.94444, +2.94444)
T = +2162, W = +404, L = +296, D = +1462, WNet = +108
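
For readers wondering where the (-2.94444, +2.94444) bounds in the SPRT output come from, here is a minimal Python sketch. It assumes the usual Wald approximation for the stopping bounds; the function names are made up for illustration, and the sketch does not try to reproduce how the tool that printed the output computes the LLR itself.

```python
import math


def sprt_bounds(alpha, beta):
    """Wald's approximate stopping bounds for the log-likelihood ratio (LLR)."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper


def sprt_decision(llr, alpha=0.05, beta=0.05):
    """Return the SPRT verdict for a given LLR value."""
    lower, upper = sprt_bounds(alpha, beta)
    if llr >= upper:
        return "accept H1"   # elo1 hypothesis accepted
    if llr <= lower:
        return "accept H0"   # elo0 hypothesis accepted
    return "continue"        # keep playing games


if __name__ == "__main__":
    print(sprt_bounds(0.05, 0.05))
    print(sprt_decision(1.25278))   # LLR after 1091 games
    print(sprt_decision(2.98664))   # LLR after 2162 games
```

With a = b = 0.05 the upper bound is ln(19), so the LLR of +2.98664 after 2162 games has just crossed it, while the earlier values of +1.25278 and +1.42909 had not.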

Daniel Shawul · Post by **Daniel Shawul** » Sun Oct 13, 2013 11:12 pm

Thanks a lot for the tests! The most important one is indeed the KRPkr bitbase. Many other bitbase sets used to include the 4-men plus the compulsory KRPkr from the 5-men for this reason. I have always maintained that if the engine reaches endgames often enough, bitbases will help. Often when people say they don't help, I interpret it as meaning the engine they used has a style that wins or loses in the middlegame. Anyway, it is unfair to say bitbases don't help, and I don't mean only egbbs. Most of the time the reason is that they are not probed (used) well enough.

One can try to code some rules for 4 men, but for 5 men it is a waste of time. In fact, what I realized when I tried prediction by heuristics such as capture search or neural nets, à la Knightdreamer, is that bitbases like KRPkr store mostly the exceptions! So I lost hope in that after the abysmal result I got with prediction. Also, why worry about implementing a ton of rules for prediction or in one's engine when you can just probe a bitbase and forget about it? It is a smart choice IMO, but convincing others is a tough job.

Cheers

Laskos · Post by **Laskos** » Sun Oct 13, 2013 11:26 pm

Wow, impressive results using Scorpio egbb.
I tested Shredder 12 with "all345_fast" Shredderbases at 15''+0.15'' TC, and found that Shredder doesn't use them optimally, a bit like Houdini, but not that bad.
The starting positions are 3-, 4- and 5-men wins for White, and I played Shredder with Shredderbases (EGBB) against Shredder with Nalimov TBs, which solves all of them perfectly. The result is:

Code:

    Program                            Score     %      Elo   

  1 Shredder Nalimov               : 336.5/600  56.1     21
  2 Shredder EGBB                  : 263.5/600  43.9    -21

In about 20% of cases Shredder with Shredderbases fails to convert the won positions. So I am all the more impressed by your results with Scorpio egbb. The implementation of egbb probing seems to be very important.

Daniel Shawul · Post by **Daniel Shawul** » Sun Oct 13, 2013 11:41 pm

Kai, Shredder needs the Nalimov TBs to make progress. So you should test Shredder_Nalimov+egbb vs Shredder_Nalimov. For Scorpio egbbs, like I mentioned, the fact that I have a separate dll means I can provide the engine author with a modified score instead of WDL. That score takes into consideration material, distance from root, pawn closeness to promotion, distance of pieces to the opponent king, special rules for the difficult KBNk, etc. All those heuristics have to be combined in a certain weighted manner, so that you make progress most of the time. The most difficult one is combining it with distance from root. Usually you have MATE-1, MATE-2 to indicate mate in 1 or 2, but when using Scorpio egbbs that 1, 2 is replaced with 40 and 80 and then combined with the previous heuristic scores to make progress. You don't need bulky DTM/DTZ/DTC for Scorpio egbbs because it has this smart heuristic that helps to make progress. We worked to polish this for months, getting feedback on failed positions, when they were first released. After that I rarely had reports that Scorpio failed to win a won position. Others, like Shredder, would rather use a DTM/DTZ table to make progress, but Scorpio egbb users handle it with WDL alone by making the engine do some work to make progress. At least for 5 men it has worked well. There are questions for 6 men, but even that has been used by Diep, so I will continue using this approach.

Daniel
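
To make the "40 and 80" idea above concrete, here is a minimal Python sketch of one way to read it: a WDL probe result is turned into a search score that shrinks by 40 per ply from the root and is nudged by a heuristic progress term. This is only an illustration, not Scorpio's actual code; `WIN_BASE`, `PLY_PENALTY`, `bitbase_score` and `progress_bonus` are hypothetical names and values.

```python
WIN_BASE = 5000     # hypothetical: above any normal eval, below real mate scores
PLY_PENALTY = 40    # the step of 40 per ply mentioned in the post (40, 80, ...)


def bitbase_score(wdl, ply, progress_bonus=0):
    """Map a WDL probe to a score from the side to move's point of view.

    wdl: +1 win, 0 draw, -1 loss for the side to move.
    ply: distance from the root of the search.
    progress_bonus: hypothetical heuristic term for the winning side
        (pawn closeness to promotion, king distance, and so on).
    """
    if wdl == 0:
        return 0
    magnitude = WIN_BASE - PLY_PENALTY * ply + progress_bonus
    return magnitude if wdl > 0 else -magnitude


if __name__ == "__main__":
    # A win found closer to the root scores higher than a deeper one,
    # so the search prefers the line that reaches the won position sooner.
    print(bitbase_score(+1, ply=1), bitbase_score(+1, ply=2))
    # A loss found deeper scores less badly, so the defender delays it.
    print(bitbase_score(-1, ply=1), bitbase_score(-1, ply=2))
```

The point of the coarse step is that the heuristic terms can still reorder wins found at the same depth, which is what lets a WDL-only bitbase make progress without DTM/DTZ data.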

Laskos · Post by **Laskos** » Mon Oct 14, 2013 12:04 am

Daniel Shawul wrote:Kai, shredder needs the Nalimov tbs to make progress. So you should test Shredder_Nalimov+egbb vs Shredder_Nalimov.

Ah, sorry, so it's harder to measure, as I cannot use 3-4-5 men won starting positions to check egbb implementation, in both cases they will give perfect results. I would probably need thousands of games from late-middlegame positions to check the benefit, but I am too lazy to do that.

For Scorpio egbbs, like I mentioned, the fact that I have a separate dll means I can provide the engine author with a modified score instead of WDL. That score takes into consideration material, distance from root, pawn closeness to promotion, distance of pieces to the opponent king, special rules for the difficult KBNk, etc. All those heuristics have to be combined in a certain weighted manner, so that you make progress most of the time. The most difficult one is combining it with distance from root. Usually you have MATE-1, MATE-2 to indicate mate in 1 or 2, but when using Scorpio egbbs that 1, 2 is replaced with 40 and 80 and then combined with the previous heuristic scores to make progress. You don't need bulky DTM/DTZ/DTC for Scorpio egbbs because it has this smart heuristic that helps to make progress. We worked to polish this for months, getting feedback on failed positions, when they were first released. After that I rarely had reports that Scorpio failed to win a won position. Others, like Shredder, would rather use a DTM/DTZ table to make progress, but Scorpio egbb users handle it with WDL alone by making the engine do some work to make progress. At least for 5 men it has worked well. There are questions for 6 men, but even that has been used by Diep, so I will continue using this approach.

Daniel

I pretty much got this from your previous replies. I was surprised that the implementation of Scorpio egbb probing in Houdini is screwed up.

Laskos · Post by **Laskos** » Mon Oct 14, 2013 12:40 pm

I managed to get a surprisingly fast conclusive result for Shredder 12 using endgame bases. Shredder 12 + Nalimov + egbb against Shredder 12 without any bases. Standard opening 8-move positions, TC 15''+0.15'', LOS 99.9% as stopping rule.

Code:

Games Completed = 3315 of 30000 (Avg game length = 43.826 sec)
Settings = Gauntlet/16MB/15000ms+150ms/M 700000cp for 1000 moves, D 150000 moves/PGN:C:\LittleBlitzer\swcr.pgn(5120)
Time = 39180 sec elapsed, 315392 sec remaining
 1.  Shredder EGBB            	1723.5/3315	881-749-1685  	(L: m=749 t=0 i=0 a=0)	(D: r=1344 i=283 f=50 s=8 a=0)	(tpm=312.2 d=10.25 nps=765881)
 2.  Shredder                 	1591.5/3315	749-881-1685  	(L: m=881 t=0 i=0 a=0)	(D: r=1344 i=283 f=50 s=8 a=0)	(tpm=328.0 d=11.79 nps=892042)

The NPS is significantly lower using bases, but the net benefit is evident:

Code:

    Program                            Score       %    Elo    +   -    Draws

  1 Shredder EGBB                  : 1723.5/3315  52.0    7    8   8   50.8 %
  2 Shredder                       : 1591.5/3315  48.0   -7    8   8   50.8 %

14 +/- 8 (2SD) ELO points benefit for Shredder using Nalimov + egbb, LOS 99.95%. This is the first time I get a conclusive result using endgame bases, and the benefit is pretty substantial. Ferdy got a benefit with Scorpio egbb, and all this amounts to disproving the skeptics of endgame bases (especially egbb) ELO-wise benefits. Maybe I will manage a match Shredder + Nalimov + egbb vs. Shredder + Nalimov.
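
As a rough cross-check of the figures above, here is a short Python sketch that derives an Elo difference, a 2SD margin and an LOS value from the 881-749-1685 record. It assumes the standard logistic Elo formula, a trinomial variance for the per-game score and the common normal-approximation LOS formula; the rating tool used for the table may compute these differently, and the function names are made up for illustration.

```python
import math


def elo_diff(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


def match_stats(wins, losses, draws):
    """Return (Elo difference, 2SD margin in Elo, LOS) for a W-L-D record."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Variance of the per-game result (1, 0.5 or 0 points).
    var = (wins + 0.25 * draws) / n - score ** 2
    se = math.sqrt(var / n)
    # Half-width of the 2SD score interval, converted to Elo.
    margin = (elo_diff(score + 2 * se) - elo_diff(score - 2 * se)) / 2.0
    # Likelihood of superiority; draws do not enter this approximation.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo_diff(score), margin, los


if __name__ == "__main__":
    diff, margin, los = match_stats(881, 749, 1685)
    print(f"Elo diff = {diff:.1f} +/- {margin:.1f} (2SD), LOS = {100 * los:.2f}%")
```

Under these assumptions the numbers land close to the 14 +/- 8 ELO and LOS 99.95% quoted above.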

Daniel Shawul · Post by **Daniel Shawul** » Mon Oct 14, 2013 2:40 pm

14 +/- 8 (2SD) ELO points benefit for Shredder using Nalimov + egbb, LOS 99.95%. This is the first time I get a conclusive result using endgame bases, and the benefit is pretty substantial. Ferdy got a benefit with Scorpio egbb, and all this amounts to disproving the skeptics of endgame bases (especially egbb) ELO-wise benefits. Maybe I will manage a match Shredder + Nalimov + egbb vs. Shredder + Nalimov.

Exactly. At least bitbases should help some, since they are loaded in RAM. IMO it is the knowledge that matters when you are playing KRPkr and such, because you have to search deep to see the promotion of pawns. Even KPk was a must some time ago. I think the fact that we hear so often here that EGTBs don't help much, which implies the same for EGBBs, made even me skeptical, as you can tell from my initial reaction to Ferdy's result, sorry.

I guess we should all just do the test and see the result instead of speculating or in some cases blowing hot air.
