Engine: Deuterium-113-egbb
SPRT: elo0 = -1.5, elo1 = +4.5, a = +0.05, b = +0.05
LLR = +1.25278 (-2.94444, +2.94444)
T = +1091, W = +254, L = +205, D = +632, WNet = +49
Next will be the following, positions generated from Ed's protools.
(2) Rook and pawn ending but more than 5-men. Target games is 200
(3) Pawn ending but more than 5-men. Target games is 200
(4) Queen and pawn ending but more than 5-men. Target games is 200
(5) Bishop and knight and pawn ending but more than 5-men. Target games is 200
Thanks, so it seems that there is a real ELO benefit from egbb in late middlegames. Your error margins are 1SD, right? 15 ELO points is a lot; even if it shrinks to 5 ELO points with more games, it's highly significant. It would be even better if you waited until the SPRT stops, to have clear uncertainties.
That is 95% confidence from bayeselo default.
Update:
I never tested how much ELO I gain, so I wouldn't know. Endgames rarely happen in engine games, but for analysis and endgame testing I know they do work well. Here I am just glad someone produced data showing it is absurd to say they hurt performance; hopefully that stops the FUD. But I don't dwell on it, because even now, after Houdini has been shown to be at fault (thanks to you), the questions go on. Well, at least we have made progress since the beginning of this thread: there is no mistaking who is at fault. It is unbelievable how much badmouthing egbbs received from the rybka forum link due to Houdini's screwed up implementation.
I stopped the test as result is probably reliable already for these specific test positions. I set the rook and pawn ending to play 1000 games instead of the planned 200 games.
(1) Mixed pieces
Bayeselo:
Thanks a lot for the tests! The most important one is indeed the KRPkr bitbase. Many other bitbases used to have 4-men and the compulsory KRPkr from the 5-men for this reason. I have always maintained that if the engine reaches endgames often enough, bitbases will help. Often when people say they don't help, I interpret it as meaning the engine they used has a style that wins or loses in the middlegame. Anyway, it is unfair to say bitbases don't help, and I don't mean only egbbs. Most of the time the reason is that they are not probed (used) well enough.
One can try to code some rules for 4 men, but for 5 men it is a waste of time. In fact, what I realized when I tried prediction by heuristics such as capture search/neural nets, à la Knightdreamer, is that bitbases like KRPkr store mostly the exceptions! So I lost hope in that after the abysmal result I got with prediction. Also, why worry about implementing a ton of rules for prediction or in one's engine, when you can just probe a bitbase and forget about it? It is a smart choice IMO, but convincing others is a tough job.
Engine: Deuterium-113-egbb
SPRT: elo0 = -1.5, elo1 = +4.5, a = +0.05, b = +0.05
LLR = +2.98664 (-2.94444, +2.94444)
T = +2162, W = +404, L = +296, D = +1462, WNet = +108
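For anyone wondering where the (-2.94444, +2.94444) bounds come from, here is a minimal sketch assuming the standard Wald SPRT stopping rule with a = b = 0.05. The function names are my own, not taken from any testing tool, and the LLR itself is taken as given rather than recomputed from W/L/D.

```python
import math

def sprt_bounds(alpha, beta):
    """Wald SPRT thresholds for the log-likelihood ratio (LLR)."""
    lower = math.log(beta / (1.0 - alpha))   # accept H0 (elo0) at or below this
    upper = math.log((1.0 - beta) / alpha)   # accept H1 (elo1) at or above this
    return lower, upper

def sprt_decision(llr, alpha=0.05, beta=0.05):
    """Return the SPRT state for a given LLR value."""
    lower, upper = sprt_bounds(alpha, beta)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"

lower, upper = sprt_bounds(0.05, 0.05)
print(round(lower, 5), round(upper, 5))   # bounds quoted in the posts
print(sprt_decision(1.25278))             # earlier run at 1091 games
print(sprt_decision(2.98664))             # final run at 2162 games
```

With these inputs the earlier LLR is still inside the bounds, while the final one has crossed the upper bound, which is consistent with the test being stopped there.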
Wow, impressive results using Scorpio egbb.
I tested Shredder 12 with "all345_fast" Shredderbases at 15''+0.15'' TC, and found that Shredder doesn't use them optimally, a bit like Houdini, but not that bad.
The starting positions are 3-4-5 men White wins, and I played Shredder with EGBB against Shredder with Nalimov TB, which solves all of them perfectly. The result is:
   Program               Score       %     Elo
1  Shredder Nalimov  :  336.5/600   56.1    21
2  Shredder EGBB     :  263.5/600   43.9   -21
In about 20% of cases Shredder with Shredderbases fails to convert the won positions. So I am all the more impressed by your results with Scorpio egbb. The implementation of egbb probing seems to be very important.
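As a sanity check on the table, here is a small sketch of the usual logistic conversion from score fraction to ELO difference. It assumes the rating tool centers the two ratings on zero, so each engine is shown at half of the head-to-head difference; the function name is mine.

```python
import math

def elo_diff(score, games):
    """ELO difference implied by a match score, using the logistic model."""
    p = score / games                      # score fraction, must be in (0, 1)
    return 400.0 * math.log10(p / (1.0 - p))

diff = elo_diff(336.5, 600)
print(round(diff, 1))        # head-to-head difference
print(round(diff / 2.0, 1))  # per-engine rating when centered on zero
```

The head-to-head difference comes out a bit above 42, i.e. roughly +21 and -21 around the average, matching the table.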
Kai, Shredder needs the Nalimov TBs to make progress, so you should test Shredder_Nalimov+egbb vs Shredder_Nalimov. For Scorpio egbbs, as I mentioned, the fact that I have a separate DLL means I can provide the engine author with a modified score instead of plain WDL. That score takes into consideration material, distance from root, pawn closeness to promotion, distance of pieces to the opponent king, special rules for the difficult KBNk, etc. All those heuristics have to be combined in a certain weighted manner so that you make progress most of the time. The most difficult part is combining them with distance from root. Usually you have MATE-1, MATE-2 to indicate mate in 1 or 2, but when using Scorpio egbbs that 1, 2 is replaced with 40 and 80 and then combined with the previous heuristic scores to make progress. You don't need bulky DTM/DTZ/DTC tables for Scorpio egbbs because they have this smart heuristic that helps to make progress. We worked to polish this for months when they were first released, getting feedback on failed positions. After that I rarely had reports that Scorpio failed to win a won position. Others, like Shredder, would rather use a DTM/DTZ table to make progress, but Scorpio egbb users handle it with WDL alone by making the engine do some work to make progress. At least for 5 men it has worked well. There are questions for 6 men, but even that has been used by Diep, so I will continue using this approach.
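To make the idea of a "modified score instead of WDL" concrete, here is a purely illustrative sketch. It is not Scorpio's actual code: the base value, the cap, and the function name are hypothetical, and the only number taken from the post is the step of 40 per ply of distance from root.

```python
WIN_BASE = 5000      # hypothetical base score for a bitbase win
PLY_STEP = 40        # per-ply step, following the "40 and 80" in the post
MAX_PROGRESS = 400   # hypothetical cap on the combined progress heuristics

def bitbase_score(wdl, ply, progress):
    """Turn a WDL probe into a score that rewards making progress.

    wdl      : +1 win, 0 draw, -1 loss, from the side to move
    ply      : distance of the probed node from the root
    progress : winning side's heuristic bonus (pawn advancement,
               piece-to-king distance, ...), assumed precomputed
    """
    if wdl == 0:
        return 0
    progress = max(0, min(progress, MAX_PROGRESS))
    magnitude = WIN_BASE - PLY_STEP * ply + progress
    return magnitude if wdl > 0 else -magnitude

print(bitbase_score(+1, 2, 100))   # win found close to the root
print(bitbase_score(+1, 6, 100))   # same win found deeper scores lower
print(bitbase_score(-1, 6, 100))   # losses found deeper are less bad
```

The point of the sketch is only the ordering: among won lines the search prefers the one reached sooner and with more progress, which is what lets a WDL-only bitbase convert without DTM/DTZ data.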
Daniel Shawul wrote:Kai, shredder needs the Nalimov tbs to make progress. So you should test Shredder_Nalimov+egbb vs Shredder_Nalimov.
Ah, sorry, so it's harder to measure, as I cannot use 3-4-5 men won starting positions to check egbb implementation, in both cases they will give perfect results. I would probably need thousands of games from late-middlegame positions to check the benefit, but I am too lazy to do that.
Daniel
I pretty much got this from your previous replies. I was surprised that the implementation of Scorpio egbb probing in Houdini is screwed up.
I managed to get a surprisingly fast conclusive result for Shredder 12 using endgame bases. Shredder 12 + Nalimov + egbb against Shredder 12 without any bases. Standard opening 8-move positions, TC 15''+0.15'', LOS 99.9% as stopping rule.
   Program            Score          %     Elo   +   -   Draws
1  Shredder EGBB  :  1723.5/3315   52.0     7    8   8   50.8 %
2  Shredder       :  1591.5/3315   48.0    -7    8   8   50.8 %
14 +/- 8 (2SD) ELO points benefit for Shredder using Nalimov + egbb, LOS 99.95%. This is the first time I have gotten a conclusive result using endgame bases, and the benefit is pretty substantial. Ferdy got a benefit with Scorpio egbb, and all this goes to refute those who are skeptical of the ELO benefit of endgame bases (especially egbb). Maybe I will manage a match Shredder + Nalimov + egbb vs. Shredder + Nalimov.
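The LOS figure can be roughly cross-checked with the common normal approximation, which depends only on wins and losses. The table does not list W and L, so the sketch below reconstructs them from the score and the draw rate; the draw count of 1685 is my assumption (it is the integer consistent with 50.8 % and a score of 1723.5), not a number from the post.

```python
import math

def los(wins, losses):
    """Likelihood of superiority, normal approximation (draws ignored)."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

games, score = 3315, 1723.5
draws = 1685                      # assumed: about 50.8 % of 3315 games
wins = score - draws / 2.0        # score = wins + draws / 2
losses = games - draws - wins

print(wins, losses)
print(round(100.0 * los(wins, losses), 2))  # LOS in percent
```

Under that reconstruction the result lands at about 99.95 %, in line with the quoted LOS.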
Exactly. At least bitbases should help some, since they are loaded in RAM. IMO it is the knowledge that matters when you are playing KRPkr and the like, because you have to search very deep to see the promotion of pawns. Even KPk was a must some time ago. I think the fact that we hear so repeatedly here that EGTBs don't help much, which also implies the same for EGBBs, made even me a skeptic, as you can tell from my initial reaction to Ferdy's result. Sorry; I guess we should all just do the test and see the result instead of speculating or, in some cases, blowing hot air.