illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00


zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by zullil »

Joerg Oster wrote:Hi Eelco,

does it now play b8q immediately?
My bugfix version (with a slight addition to yours ;-) ) does!

Code:

info depth 1 seldepth 1 score cp 1260 nodes 15 nps 7500 time 2 multipv 1 pv b7b8q
info depth 2 seldepth 2 score cp 1290 nodes 128 nps 42666 time 3 multipv 1 pv b7b8q f5g4 b8b4 g4f5
Best, Joerg.
My bug fix didn't fix it. Would you mind posting your code so I can see the right way to fix it?

Thanks.
Eelco de Groot
Posts: 4664
Joined: Sun Mar 12, 2006 2:40 am
Full name: Eelco de Groot

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by Eelco de Groot »

Sure, go ahead Joerg. I don't think there is much difference between the versions? I looked at your Stockfish branch but I did not find any code there yet.

Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
hgm
Posts: 28361
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by hgm »

syzygy wrote:Quite a few KBBvKN positions need more than 50 moves for capturing the knight.
More accurately: the vast majority. (After you weed out the positions where the Knight is tactically lost in 1 or 2 ply from the very beginning, because it is hanging, or victim of a skewer etc.) If the engine cannot see the gaining of the Knight within its horizon, it had better assume KBBKN is a draw.

Stockfish seems to be pretty backward in its end-game knowledge. I don't think Fruit 2.1 would make the mis-evaluation of the original post. White has no Pawns, and that fact alone deserves a 50% reduction of its naive evaluation advantage.

Fruit would group this material combination in the 'minor ahead, no Pawns' class. Which is severely discounted (a factor 8?). That the defending side has one or two Pawns doesn't make it any easier, and is only taken into account in the sense that it reduces the naive advantage even before the discount is applied.
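The discount described above can be sketched roughly as follows. This is only an illustration, not Fruit's or Stockfish's actual code: the function name, the textbook piece values (300 for a minor, 100 for a pawn) and the divisor of 8 (the post itself marks that factor with a question mark) are all assumptions.

```python
# Hypothetical value of a minor piece in centipawns; not taken from any engine.
MINOR = 300


def scale_pawnless_advantage(naive_cp, strong_pawns, divisor=8):
    """Discount a naive material lead when the stronger side has no pawns.

    naive_cp:     material balance from the stronger side's point of view,
                  in centipawns, with the defender's pawns already subtracted.
    strong_pawns: number of pawns owned by the stronger side.
    divisor:      assumed discount factor (a guess, as in the post).
    """
    if strong_pawns == 0 and 0 < naive_cp <= MINOR:
        return naive_cp // divisor
    return naive_cp


# KBB vs KBP with textbook values: two bishops against bishop plus pawn.
naive = 2 * 300 - (300 + 100)
print(naive, scale_pawnless_advantage(naive, strong_pawns=0))
```

With these assumed values the naive lead of 200 shrinks to 25, while the same lead with a pawn on the stronger side is left untouched.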
Vinvin
Posts: 5290
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by Vinvin »

Joerg Oster wrote:Eelco, you need to test the position with the bishop promotion.

kn5B/8/1K6/8/8/8/8/B7 b - - 0 1

Code:

position fen kn5B/8/1K6/8/8/8/8/B7 b - - 0 1
go depth 10
info depth 1 seldepth 2 score cp 0 nodes 21 nps 21000 time 1 multipv 1 pv b8d7 b6a5
info depth 2 seldepth 3 score cp 0 nodes 67 nps 67000 time 1 multipv 1 pv b8d7 b6a5 d7c5
info depth 3 seldepth 4 score cp 0 nodes 158 nps 79000 time 2 multipv 1 pv b8d7 b6a5 d7c5 h8b2
info depth 4 seldepth 5 score cp 0 nodes 299 nps 99666 time 3 multipv 1 pv b8d7 b6a5 d7c5 h8b2 c5d7
info depth 5 seldepth 6 score cp 0 nodes 471 nps 157000 time 3 multipv 1 pv b8d7 b6a5 d7c5 h8b2 c5d7 b2c1
info depth 6 seldepth 7 score cp 0 nodes 690 nps 172500 time 4 multipv 1 pv b8d7 b6a5 d7c5 h8b2 c5d7 b2c1 d7b8
info depth 7 seldepth 8 score cp 0 nodes 889 nps 177800 time 5 multipv 1 pv b8d7 b6a5 d7c5 h8b2 c5d7 b2c1 d7b8 c1b2
info depth 8 seldepth 9 score cp 0 nodes 1156 nps 192666 time 6 multipv 1 pv b8d7 b6a5 d7c5 h8b2 c5d7 b2c1 d7b8 c1b2 b8d7
info depth 9 seldepth 10 score cp 0 nodes 1526 nps 218000 time 7 multipv 1 pv b8d7 b6a5 d7c5 h8b2 c5d7 b2c1 d7b8 c1b2 b8d7
info depth 10 seldepth 10 score cp 0 nodes 2123 nps 265375 time 8 multipv 1 pv b8d7 b6a5 d7c5 h8b2 c5d7 b2c1 d7b8 c1b2 b8d7
info nodes 2123 time 8
bestmove b8d7 ponder b6a5
If you don't mind, I will do a pull request with my version ...

Edit: Oh, I just realize you did test it, but no draw score :(
Note that the rule has to be generalized to "no opposite-color bishop", whatever the number of bishops:

[d]kn1B1B1B/8/1K6/8/8/8/8/B7 b - -
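A minimal sketch of such a generalized test, working directly on a FEN string. The function names are hypothetical and this is not how Stockfish represents the board; it only shows that the check is "all bishops of one side stand on a single square color", for any number of bishops.

```python
def bishop_square_colors(fen, bishop="B"):
    """Return the set of square colors ('light'/'dark') holding the given bishop letter."""
    placement = fen.split()[0]
    colors = set()
    for rank_index, row in enumerate(placement.split("/")):
        rank = 8 - rank_index      # FEN lists rank 8 first
        file = 1                   # a-file = 1
        for ch in row:
            if ch.isdigit():
                file += int(ch)    # skip empty squares
                continue
            if ch == bishop:
                # a1 (file 1, rank 1) is dark, so an even sum means a dark square
                colors.add("dark" if (file + rank) % 2 == 0 else "light")
            file += 1
    return colors


def has_only_like_bishops(fen, bishop="B"):
    """True when the side has bishops and all of them share one square color."""
    return len(bishop_square_colors(fen, bishop)) == 1


print(has_only_like_bishops("kn5B/8/1K6/8/8/8/8/B7 b - - 0 1"))
print(has_only_like_bishops("kn1B1B1B/8/1K6/8/8/8/8/B7 b - -"))
```

Both positions from this thread have all white bishops on dark squares, so the check returns True for each; the normal starting position returns False.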
Eelco de Groot
Posts: 4664
Joined: Sun Mar 12, 2006 2:40 am
Full name: Eelco de Groot

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by Eelco de Groot »

Joerg Oster wrote: Edit: Oh, I just realize you did test it, but no draw score :(
I just left in the bonus for trying to capture the Knight, to give the engine something to do. This is more of a Swindle mode of sorts, but it does not really help. With or without the knight there is no way to win without help from the other side, and even with a knight I don't think I can construct a position where a blunder leads to mate in a corner. If trying to capture the knight in general takes more than 50 moves, though, I think Marco had better move back to Tord's original version, which correctly scores a draw with same-coloured bishops... Very subtle! I find it a bit cheap of Harm to accuse Stockfish of having poor endgame rules on the basis of one bug found more or less by accident.

Eelco
hgm
Posts: 28361
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by hgm »

Uh? I was not talking about the like-colored-Bishops bug here, but about the KBBKBP evaluation in the original post. I don't think this can be traced to any bug. The white Bishops there are a regular pair. It seems a plain omission.

This seems rather a symptom of a very general problem, namely that it does not know that when KXYKZ is a dead draw, KXYKZP and KXYKZPP are even worse. Have you tried this with KRKBP, KRKNP, KBNKNP, KBNKBP, KRBKRP, KRNKRP, KQBKQP, KQNKQP? Fruit (a 10-year-old engine!) would recognize all these material combinations as heavily drawish.

I don't think there is anything 'cheap' in concluding that an engine completely unaware of such elementary facts has 'poor end-game knowledge'.
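To make the material classes listed above concrete, here is a rough sketch of a signature-based check. It assumes textbook piece values and a hypothetical rule, "the stronger side has no Pawns and leads by at most one minor in pieces"; the defender's Pawns are ignored on purpose, since they only make things worse for the stronger side. It is not Fruit's actual material table.

```python
VALUE = {"Q": 9, "R": 5, "B": 3, "N": 3}   # textbook values, an assumption


def split_signature(sig):
    """Split e.g. 'KRBKRP' into ('RB', 'RP'): the men listed after each K."""
    if not sig.startswith("K") or sig.count("K") != 2:
        raise ValueError("expected a signature like 'KRBKRP'")
    second_k = sig.index("K", 1)
    return sig[1:second_k], sig[second_k + 1:]


def is_pawnless_minor_ahead(sig):
    """True when the first-named side has no pawns and at most a minor's lead in pieces."""
    strong, weak = split_signature(sig)
    if "P" in strong:
        return False
    strong_pieces = sum(VALUE[p] for p in strong)
    weak_pieces = sum(VALUE[p] for p in weak if p != "P")   # defender's pawns ignored
    lead = strong_pieces - weak_pieces
    return 0 < lead <= VALUE["B"]


for sig in ["KBBKBP", "KRKBP", "KRBKRP", "KQNKQP", "KQKR"]:
    print(sig, is_pawnless_minor_ahead(sig))
```

Every combination named in the post, plus the KBBKBP of the original post, falls into the drawish class under this rule, while something like KQKR does not.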
Joerg Oster
Posts: 974
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by Joerg Oster »

I'm afraid Marco will not add code/endgame knowledge with zero practical relevance ... :)
Jörg Oster
Joerg Oster
Posts: 974
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by Joerg Oster »

You are right. Missing knowledge.

OTOH, you must admit SF is doing very well in real game-play without it.
Jörg Oster
hgm
Posts: 28361
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by hgm »

Well, I sometimes have my doubts how real 'real' really is. It is rather fashionable nowadays to cull all knowledge out of Chess engines. That entails the risk that they now almost all make the same silly mistakes, so that it doesn't hurt much when you make them yourself too. The term 'incestuous testing' has been coined for this.

It is hard for me to believe that not knowing something as elementary as that, without Pawns, being a minor ahead is still a dead draw could not be efficiently exploited by an opponent that does know it. Especially by an opponent that knows you are naive in this respect. Just like Pablo exploits that engines are naive towards closing the position. But if you only test against opponents that would never sucker you for a draw, you wouldn't see the difference. But could you still call such testing 'real game-play'? We might well be creating our own virtual reality here.

The funny thing with the under-promotion is not only that it doesn't recognize the like Bishops, but that it thinks KBBKN is better than KQBKN in the first place. Even with unlike Bishops KBBKN is almost always a 50-move draw against best defense, while KQBKN of course always wins. Recognizing KBBKN as a 'certified win' seems an example of 'wrong knowledge'.
modolief
Posts: 45
Joined: Tue Apr 30, 2013 6:29 pm

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by modolief »

Would there be some alternative testing track that could uncover these kinds of problems and clean them up? Something like randomly generated positions with playouts? Maybe take a randomly generated position and compare a self-playout with a playout against another engine. Of course "randomly generated" is an extremely wide net; there might be some ways to narrow that down without missing the interesting cases we want to detect.
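One way to narrow the net would be to generate random positions per material signature. Below is a minimal sketch of such a generator; the function name is made up, and it only enforces two sanity rules (kings not adjacent, no pawns on ranks 1 and 8). It does not verify that the side not to move is out of check, so a real test track would still need a legality filter.

```python
import random


def random_fen(signature, rng):
    """Place the men of a signature such as 'KBBKBP' on random squares.

    The first king and the men after it are White, the rest are Black.
    Legality beyond the two sanity rules below is NOT checked.
    """
    second_k = signature.index("K", 1)
    pieces = list(signature[:second_k]) + [p.lower() for p in signature[second_k:]]
    while True:
        squares = rng.sample(range(64), len(pieces))   # square = rank * 8 + file
        board = dict(zip(squares, pieces))
        wk = next(s for s, p in board.items() if p == "K")
        bk = next(s for s, p in board.items() if p == "k")
        kings_adjacent = abs(wk % 8 - bk % 8) <= 1 and abs(wk // 8 - bk // 8) <= 1
        pawn_on_edge = any(p in "Pp" and s // 8 in (0, 7) for s, p in board.items())
        if not kings_adjacent and not pawn_on_edge:
            break
    rows = []
    for rank in range(7, -1, -1):                      # FEN starts at rank 8
        row, empty = "", 0
        for file in range(8):
            piece = board.get(rank * 8 + file)
            if piece is None:
                empty += 1
            else:
                row += (str(empty) if empty else "") + piece
                empty = 0
        rows.append(row + (str(empty) if empty else ""))
    side = rng.choice("wb")
    return "/".join(rows) + f" {side} - - 0 1"


rng = random.Random(2014)                              # fixed seed for repeatability
for _ in range(3):
    print(random_fen("KBBKBP", rng))
```

Each FEN could then be fed to the engines with `position fen ...` followed by `go depth N`, as in the logs earlier in this thread, and the scores or playout results compared.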