illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

zullil · Post by **zullil** » Sat Jan 11, 2014 11:31 am

Joerg Oster wrote: Hi Eelco,

does it now play b8q immediately?
My bugfix version (with a slight addition to yours) does!
Code:
info depth 1 seldepth 1 score cp 1260 nodes 15 nps 7500 time 2 multipv 1 pv b7b8q
info depth 2 seldepth 2 score cp 1290 nodes 128 nps 42666 time 3 multipv 1 pv b7b8q f5g4 b8b4 g4f5
Best, Joerg.

My bug fix didn't fix it. Would you mind posting your code so I can see the right way to fix it?

Thanks.

Eelco de Groot · Post by **Eelco de Groot** » Sat Jan 11, 2014 11:39 am

Sure, go ahead Joerg. I don't think there is much difference between the versions? I looked at your Stockfish branch but I did not find any code there yet.

Eelco

hgm · Post by **hgm** » Sat Jan 11, 2014 12:05 pm

syzygy wrote: Quite a few KBBvKN positions need more than 50 moves for capturing the knight.

More accurately: the vast majority. (After you weed out the positions where the Knight is tactically lost in 1 or 2 ply from the very beginning, because it is hanging, or the victim of a skewer, etc.) If the engine cannot see the gaining of the Knight within its horizon, it had better assume KBBKN is a draw.

Stockfish seems to be pretty backward in its end-game knowledge. I don't think Fruit 2.1 would make the mis-evaluation of the original post. White has no Pawns, and that fact alone deserves a 50% reduction of its naive evaluation advantage.

Fruit would group this material combination in the 'minor ahead, no Pawns' class. Which is severely discounted (a factor 8?). That the defending side has one or two Pawns doesn't make it any easier, and is only taken into account in the sense that it reduces the naive advantage even before the discount is applied.
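
The discounting described above can be sketched roughly as follows. This is only an illustration of the idea, not Fruit's or Stockfish's actual code: the piece values, the function names and the two divisors (2 for "no Pawns", 8 for "minor ahead, no Pawns", the latter being the guessed factor from the post) are all assumptions.

```python
# Illustrative piece values in centipawns (assumed, not taken from any engine).
PIECE_VALUE = {"P": 100, "N": 325, "B": 325, "R": 500, "Q": 975}
MINOR = max(PIECE_VALUE["N"], PIECE_VALUE["B"])


def material(counts):
    """Sum the value of a side's pieces; counts maps piece letter to count, kings omitted."""
    return sum(PIECE_VALUE[piece] * n for piece, n in counts.items())


def discounted_eval(strong, weak, pawnless_divisor=2, minor_ahead_divisor=8):
    """Return a material score from the stronger side's point of view.

    The defender's Pawns already reduce the naive advantage before any
    discount is applied, as described in the post.
    """
    naive = material(strong) - material(weak)
    if naive <= 0 or strong.get("P", 0) > 0:
        return naive                          # no discount: not ahead, or still has Pawns
    if naive <= MINOR:
        return naive // minor_ahead_divisor   # hypothetical 'minor ahead, no Pawns' class
    return naive // pawnless_divisor          # hypothetical generic 'no Pawns' reduction


# KBBKBP from the original post: 650 - 425 = 225 naive, heavily discounted.
print(discounted_eval({"B": 2}, {"B": 1, "P": 1}))
```

With these assumed numbers the KBBKBP advantage shrinks from roughly two Pawns to a fraction of a Pawn, which is the kind of score the post argues for.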

Vinvin · Post by **Vinvin** » Sat Jan 11, 2014 12:30 pm

Joerg Oster wrote: Eelco, you need to test the position with the bishop promotion.

kn5B/8/1K6/8/8/8/8/B7 b - - 0 1

Code:

position fen kn5B/8/1K6/8/8/8/8/B7 b - - 0 1
go depth 10
info depth 1 seldepth 2 score cp 0 nodes 21 nps 21000 time 1 multipv 1 pv b8d7 b6a5
info depth 2 seldepth 3 score cp 0 nodes 67 nps 67000 time 1 multipv 1 pv b8d7 b6a5 d7c5
info depth 3 seldepth 4 score cp 0 nodes 158 nps 79000 time 2 multipv 1 pv b8d7 b6a5 d7c5 h8b2
info depth 4 seldepth 5 score cp 0 nodes 299 nps 99666 time 3 multipv 1 pv b8d7 b6a5 d7c5 h8b2 c5d7
info depth 5 seldepth 6 score cp 0 nodes 471 nps 157000 time 3 multipv 1 pv b8d7 b6a5 d7c5 h8b2 c5d7 b2c1
info depth 6 seldepth 7 score cp 0 nodes 690 nps 172500 time 4 multipv 1 pv b8d7 b6a5 d7c5 h8b2 c5d7 b2c1 d7b8
info depth 7 seldepth 8 score cp 0 nodes 889 nps 177800 time 5 multipv 1 pv b8d7 b6a5 d7c5 h8b2 c5d7 b2c1 d7b8 c1b2
info depth 8 seldepth 9 score cp 0 nodes 1156 nps 192666 time 6 multipv 1 pv b8d7 b6a5 d7c5 h8b2 c5d7 b2c1 d7b8 c1b2 b8d7
info depth 9 seldepth 10 score cp 0 nodes 1526 nps 218000 time 7 multipv 1 pv b8d7 b6a5 d7c5 h8b2 c5d7 b2c1 d7b8 c1b2 b8d7
info depth 10 seldepth 10 score cp 0 nodes 2123 nps 265375 time 8 multipv 1 pv b8d7 b6a5 d7c5 h8b2 c5d7 b2c1 d7b8 c1b2 b8d7
info nodes 2123 time 8
bestmove b8d7 ponder b6a5

If you don't mind, I will do a pull request with my version ...

Edit: Oh, I just realize you did test it, but no draw score

Note that the rule has to be generalized to "no opposite-colored bishops":

[d]kn1B1B1B/8/1K6/8/8/8/8/B7 b - -
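
A minimal sketch of such a generalized test, working directly on a FEN string: it collects the square colors of all bishops of one side and reports whether they all share one color, however many bishops there are. The function names are made up for illustration and this is not the Stockfish patch discussed in the thread.

```python
def bishop_square_colors(fen, bishop_char):
    """Return the set of square colors ('light'/'dark') occupied by the given bishops.

    bishop_char is 'B' for White's bishops or 'b' for Black's.
    """
    placement = fen.split()[0]
    colors = set()
    # FEN lists ranks from 8 down to 1, and files from a to h within a rank.
    for rank_index, row in enumerate(placement.split("/")):
        rank = 8 - rank_index
        file = 0                      # 0 = a-file
        for ch in row:
            if ch.isdigit():
                file += int(ch)       # skip a run of empty squares
                continue
            if ch == bishop_char:
                # a1 (file 0, rank 1) is dark, so an odd file + rank means dark.
                colors.add("dark" if (file + rank) % 2 == 1 else "light")
            file += 1
    return colors


def has_only_like_colored_bishops(fen, bishop_char="B"):
    """True if the side has at least one bishop and all of them stand on one square color."""
    return len(bishop_square_colors(fen, bishop_char)) == 1


print(has_only_like_colored_bishops("kn5B/8/1K6/8/8/8/8/B7 b - - 0 1"))
print(has_only_like_colored_bishops("kn1B1B1B/8/1K6/8/8/8/8/B7 b - -"))
```

Both positions from this post put every white bishop on a dark square, so a rule keyed to "exactly two bishops" would miss the second one while this check covers both.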

Eelco de Groot · Post by **Eelco de Groot** » Sat Jan 11, 2014 12:33 pm

Joerg Oster wrote: Edit: Oh, I just realize you did test it, but no draw score

I just left in the bonus for trying to capture the Knight, to give the engine something to do. This is more of a swindle mode of sorts, but it does not really help. With or without the knight there is no way to win without help from the other side, and even with a knight I don't think I can construct a position where a blunder leads to mate in a corner. If trying to capture the knight generally takes more than 50 moves, though, I think Marco had better move back to Tord's original version, which correctly scores a draw with same-coloured bishops... Very subtle! I find it a bit cheap of Harm to accuse Stockfish of having poor endgame rules on the basis of one bug found more or less by accident.

Eelco

hgm · Post by **hgm** » Sat Jan 11, 2014 1:58 pm

Uh? I was not talking about the like-colored-Bishops bug here, but about the KBBKBP evaluation in the original post. I don't think this can be traced to any bug. The white Bishops there are a regular pair. It seems a plain omission.

This seems rather a symptom of a very general problem, namely that it does not know that when KXYKZ is a dead draw, KXYKZP and KXYKZPP are even worse. Have you tried this with KRKBP, KRKNP, KBNKNP, KBNKBP, KRBKRP, KRNKRP, KQBKQP, KQNKQP? Fruit (a 10-year-old engine!) would recognize all these material combinations as heavily drawish.
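
The "KXYKZ is a draw, so KXYKZP is no better for the attacker" rule can be sketched as a lookup on material signatures: strip the defender's Pawns and check whether the remaining base ending is a known draw. The set of base endings below is simply the list from this post with the Pawn removed, plus KBBKB from the thread title; the names are hypothetical and the set is not a complete or authoritative table.

```python
# Base endings assumed to be drawish (derived from the list in the post, not exhaustive).
DRAWISH_BASES = {
    "KRKB", "KRKN", "KBNKN", "KBNKB",
    "KRBKR", "KRNKR", "KQBKQ", "KQNKQ", "KBBKB",
}


def split_signature(sig):
    """Split e.g. 'KBBKBP' into ('BB', 'BP'): attacker's and defender's non-king pieces."""
    if not sig.startswith("K") or sig.count("K") != 2:
        raise ValueError(f"not a two-king material signature: {sig!r}")
    attacker, defender = sig[1:].split("K")
    return attacker, defender


def base_signature(sig):
    """Remove the defender's Pawns, e.g. 'KRKBPP' becomes 'KRKB'."""
    attacker, defender = split_signature(sig)
    return "K" + attacker + "K" + defender.replace("P", "")


def is_drawish(sig):
    """True if the attacker has no Pawns and the Pawn-stripped ending is a known draw."""
    attacker, _ = split_signature(sig)
    return "P" not in attacker and base_signature(sig) in DRAWISH_BASES


for sig in ("KBBKBP", "KRKBPP", "KQBKN", "KRPKB"):
    print(sig, is_drawish(sig))
```

A real engine would key this on a material hash rather than on strings; the string form is used here only because it matches the notation in the thread.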

I don't think there is anything 'cheap' in concluding that an engine completely unaware of such elementary facts has 'poor end-game knowledge'.

Joerg Oster · Post by **Joerg Oster** » Sat Jan 11, 2014 2:32 pm

I'm afraid Marco will not add code/endgame knowledge with zero practical relevance ...

Joerg Oster · Post by **Joerg Oster** » Sat Jan 11, 2014 2:36 pm

You are right. Missing knowledge.

OTOH, you must admit SF is doing very well in real game-play without it.

hgm · Post by **hgm** » Sat Jan 11, 2014 2:53 pm

Well, I sometimes have my doubts how real 'real' really is. It is rather fashionable nowadays to cull all knowledge out of Chess engines. That entails the risk that they now almost all make the same silly mistakes, so that it doesn't hurt much when you make them yourself too. The term 'incestuous testing' has been coined for this.

It is hard for me to believe that not knowing something as elementary as the fact that, without Pawns, being a minor ahead is still a dead draw could not be efficiently exploited by an opponent that does know it. Especially by an opponent that knows you are naive in this respect. Just like Pablo exploits that engines are naive towards closing the position. But if you only test against opponents that would never sucker you for a draw, you wouldn't see the difference. But could you still call such testing 'real game-play'? We might well be creating our own virtual reality here.

The funny thing with the under-promotion is not only that it doesn't recognize the like Bishops, but that it thinks KBBKN is better than KQBKN in the first place. Even with unlike Bishops KBBKN is almost always a 50-move draw against best defense, while KQBKN of course always wins. Recognizing KBBKN as a 'certified win' seems an example of 'wrong knowledge'.

modolief · Post by **modolief** » Sat Jan 11, 2014 5:26 pm

Would there be some alternative testing track that could uncover these kinds of problems and clean them up? Something like randomly generated positions with playouts? Maybe take a randomly generated position and compare a self-playout with a playout against another engine. Of course "randomly generated" is an extremely wide net; there might be some ways to narrow that down without missing the interesting cases we want to detect.
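
One way to narrow the net would be to draw random positions for a fixed material signature. The sketch below only builds such FEN strings; the function name is made up, and the legality filtering is deliberately crude (distinct squares, no Pawns on the first or last rank, kings not adjacent). It does not check whether the side not to move is in check, so a real test harness would still have to validate each position before handing it to an engine for a playout.

```python
import random


def random_fen(sig, rng, side_to_move="w"):
    """Place the pieces of a signature such as 'KBBKBP' on random squares and return a FEN.

    The first group of the signature is White's material, the second Black's.
    """
    white, black = sig[1:].split("K")
    pieces = ["K"] + list(white) + ["k"] + [p.lower() for p in black]
    while True:
        squares = rng.sample(range(64), len(pieces))   # distinct squares, 0 = a1, 63 = h8
        board = dict(zip(squares, pieces))
        # Reject Pawns on the first or last rank.
        if any(p in "Pp" and sq // 8 in (0, 7) for sq, p in board.items()):
            continue
        # Reject adjacent kings (distance of at most one king step).
        wk, bk = squares[0], squares[pieces.index("k")]
        if max(abs(wk % 8 - bk % 8), abs(wk // 8 - bk // 8)) <= 1:
            continue
        break
    rows = []
    for rank in range(7, -1, -1):                      # FEN starts at rank 8
        row, empty = "", 0
        for file in range(8):
            piece = board.get(rank * 8 + file)
            if piece is None:
                empty += 1
                continue
            if empty:
                row += str(empty)
                empty = 0
            row += piece
        if empty:
            row += str(empty)
        rows.append(row)
    return "/".join(rows) + f" {side_to_move} - - 0 1"


rng = random.Random(2014)                              # fixed seed for reproducible samples
for _ in range(3):
    print(random_fen("KBBKBP", rng))
```

Each generated FEN could then be fed to the engines with `position fen ...` followed by a `go` command, as in the transcripts earlier in the thread.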
