illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Ajedrecista
Posts: 2114
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Illogical eval from SF? BBKBPK +- 2.00, then BBKBK = 0.00.

Post by Ajedrecista »

Hello Eelco:

I read your thread in the SF Google Group:

https://groups.google.com/forum/?fromgr ... K8AG3GmHF4

You mention SPRT(-6, 0). Just for the record: for a 0 Elo change with an a priori drawelo = 270 (typical of LTC at Fishtest), I get a 95.1% chance of passing that test after 10000 simulations, with an average of 18931 games per simulation (the median is between 15000 and 16000 games, most likely between 15200 and 15300; I need to add code to my SPRT simulator to compute it, but I do not have spare time right now).

For a -3 Bayeselo change (~ -1.73 Elo with drawelo = 270), I get a 50.1% chance of passing that test after 10000 simulations (theoretically it should be 50%, so my simulator seems to work fine), with an average of 30831 games per simulation (the median is between 23000 and 24000 games, most likely between 23200 and 23300).

For a -6 Bayeselo change (~ -3.46 Elo with drawelo = 270), I get a 4.73% chance of passing that test after 10000 simulations, with an average of 18767 games per simulation (the median is between 14000 and 15000 games, most likely between 14850 and 14950). The medians are my own estimates in each case.
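For readers who want to reproduce numbers of this kind, here is a minimal sketch of such a simulator. It is not Ajedrecista's code. It assumes the BayesElo win/draw/loss model with a fixed drawelo (270 here), bounds SPRT(-6, 0) expressed in BayesElo, alpha = beta = 0.05 (consistent with the ~95% / ~5% pass rates quoted above), and an LLR update after every single game; the function names are hypothetical. It also computes the median test length directly, and `bayeselo_to_elo_scale` gives the small-difference BayesElo-to-Elo factor behind the "-3 Bayeselo ~ -1.73 Elo" conversion.

```python
import math
import random
import statistics


def bayeselo_probs(elo, drawelo):
    """Win/draw/loss probabilities under the BayesElo model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((drawelo - elo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((drawelo + elo) / 400.0))
    return p_win, 1.0 - p_win - p_loss, p_loss


def bayeselo_to_elo_scale(drawelo):
    """Slope of logistic Elo vs. BayesElo near zero (valid for small differences)."""
    x = 10.0 ** (-drawelo / 400.0)
    return 4.0 * x / (1.0 + x) ** 2


def simulate_sprt(true_elo, elo0=-6.0, elo1=0.0, drawelo=270.0,
                  alpha=0.05, beta=0.05, rng=random):
    """Run one SPRT(elo0, elo1); returns (passed, games_played).

    All Elo values are BayesElo. The LLR is updated after every game,
    which is an assumption (a test server may update in batches).
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    p0 = bayeselo_probs(elo0, drawelo)
    p1 = bayeselo_probs(elo1, drawelo)
    # LLR increment contributed by one win, one draw and one loss.
    steps = [math.log(b / a) for a, b in zip(p0, p1)]
    p_win, p_draw, _ = bayeselo_probs(true_elo, drawelo)
    llr, games = 0.0, 0
    while lower < llr < upper:
        r = rng.random()
        outcome = 0 if r < p_win else (1 if r < p_win + p_draw else 2)
        llr += steps[outcome]
        games += 1
    return llr >= upper, games


def summarize(true_elo, runs=1000, seed=1):
    """Pass rate plus mean and median test length over many simulated SPRTs."""
    rng = random.Random(seed)
    results = [simulate_sprt(true_elo, rng=rng) for _ in range(runs)]
    pass_rate = sum(passed for passed, _ in results) / runs
    lengths = [games for _, games in results]
    return pass_rate, statistics.mean(lengths), statistics.median(lengths)
```

For example, `summarize(0.0)`, `summarize(-3.0)` and `summarize(-6.0)` return the pass rate, mean length and median length for the three cases above; with enough runs the pass rates are expected to land roughly near 95%, 50% and 5% if these assumptions match the original simulator. Each run can take tens of thousands of games, so a large `runs` value is slow in pure Python.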

This info is meant to give an approximate idea of what you can expect. Hope it helps.

Regards from Spain.

Ajedrecista.
phenri
Posts: 284
Joined: Tue Aug 13, 2013 9:44 am

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by phenri »

Mate or draw?
Can someone tell me whether this game is a draw by threefold repetition after 87...Qe3+?
I ask because of a strange evaluation from Stockfish in the CB GUI: after a blunder (86...Qf3+?), Stockfish refuses to play or evaluate the move 87...Qe3+.

[SetUp "1"]
[FEN "8/kb6/1p6/p1p1p3/P2b3Q/8/3RK1q1/8 w - - 0 84"]
[PlyCount "18"]

84. Kd1 Qf3+ 85. Ke1 Qe3+ 86. Kd1 Qf3+ ?? (86... Bf3+ !!) 87. Ke1 Qe3+ 88. Kd1
Bf3+ 89. Kc2 Qc3+ 90. Kb1 Qa1+ 91. Kc2 Qb2+ 92. Kd3 Qc3# 0-1
Joerg Oster
Posts: 975
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by Joerg Oster »

Hi Louis,

I just pushed the final version, where I also added KBBK.
You can find it here: https://github.com/joergoster/Stockfish ... 5f6bfa3ccc
Jörg Oster
Uri Blass
Posts: 10825
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by Uri Blass »

phenri wrote: Mate or draw?
Can someone tell me whether this game is a draw by threefold repetition after 87...Qe3+?
I ask because of a strange evaluation from Stockfish in the CB GUI: after a blunder (86...Qf3+?), Stockfish refuses to play or evaluate the move 87...Qe3+.

[SetUp "1"]
[FEN "8/kb6/1p6/p1p1p3/P2b3Q/8/3RK1q1/8 w - - 0 84"]
[PlyCount "18"]

84. Kd1 Qf3+ 85. Ke1 Qe3+ 86. Kd1 Qf3+ ?? (86... Bf3+ !!) 87. Ke1 Qe3+ 88. Kd1
Bf3+ 89. Kc2 Qc3+ 90. Kb1 Qa1+ 91. Kc2 Qb2+ 92. Kd3 Qc3# 0-1
The evaluation by Stockfish is not strange.
What is strange is your expectation that Stockfish should evaluate a repetition as a non-draw.

Changing it to score only genuine draws as draws would make the code more complex and probably give no Elo gain, so there is no reason for the Stockfish team to do it.
phenri
Posts: 284
Joined: Tue Aug 13, 2013 9:44 am

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by phenri »

Uri Blass wrote: The evaluation by Stockfish is not strange.
What is strange is your expectation that Stockfish should evaluate a repetition as a non-draw.

Changing it to score only genuine draws as draws would make the code more complex and probably give no Elo gain, so there is no reason for the Stockfish team to do it.
Uri, thank you for your reply.
I think this is a bug in the GUI (CB), which hides the evaluations of repeated moves much too early. Indeed, it was not a draw by repetition.
syzygy
Posts: 5697
Joined: Tue Feb 28, 2012 11:56 pm

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by syzygy »

phenri wrote:I think this is a bug in the GUI (CB) which obscured much too early evaluations of moves that are repeated. Indeed it was not draw by repetition.
Nothing to do with the GUI. SF evaluates first repetitions as draws. This is intentional, so not a bug.
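The difference between the game rule and what syzygy describes can be sketched on the moves from the PGN above. This is a hypothetical, simplified illustration, not Stockfish's actual code. Between 84.Kd1 and 87...Qe3+ only the white king and the black queen move, and there are no castling or en passant rights, so a hand-made key (white king square, black queen square, side to move) stands in for a real Zobrist hash. `is_threefold` applies the game rule; `first_repetition_is_draw` applies the engine-style shortcut of scoring a single earlier occurrence (same side to move, no further back than the last capture or pawn move) as a draw.

```python
from collections import Counter

# Hypothetical compact position keys standing in for Zobrist hashes:
# (white king, black queen, side to move). Valid only for 84.Kd1-87...Qe3+,
# where no other piece moves and there are no castling or en passant rights.
positions = [
    ("e2", "g2", "w"),  # start position, before 84.Kd1
    ("d1", "g2", "b"),  # after 84.Kd1
    ("d1", "f3", "w"),  # after 84...Qf3+
    ("e1", "f3", "b"),  # after 85.Ke1
    ("e1", "e3", "w"),  # after 85...Qe3+
    ("d1", "e3", "b"),  # after 86.Kd1
    ("d1", "f3", "w"),  # after 86...Qf3+ (second occurrence)
    ("e1", "f3", "b"),  # after 87.Ke1 (second occurrence)
    ("e1", "e3", "w"),  # after 87...Qe3+ (second occurrence)
]


def is_threefold(history):
    """Game rule: the latest position has occurred at least three times."""
    return history.count(history[-1]) >= 3


def first_repetition_is_draw(history, rule50):
    """Engine-style check: one earlier occurrence already counts as a draw.

    Only every second ply (same side to move) is compared, starting four
    plies back, and never further back than rule50 plies (the number of
    plies since the last capture or pawn move).
    """
    current = history[-1]
    max_back = min(rule50, len(history) - 1)
    for back in range(4, max_back + 1, 2):
        if history[-1 - back] == current:
            return True
    return False


# The FEN gives a halfmove clock of 0 at move 84 and no capture or pawn move
# follows, so rule50 equals the number of plies played since that position.
for ply in range(len(positions)):
    history = positions[: ply + 1]
    print(ply, is_threefold(history), first_repetition_is_draw(history, ply))

print(max(Counter(positions).values()))  # 2
```

In this sketch the threefold check never fires (no position occurs more than twice), while the first-repetition check already fires after 86...Qf3+ (ply 6). That fits both sides of the argument: the game itself is not drawn, but a search using the two-fold shortcut would score 87...Qe3+ as 0.00.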
hgm
Posts: 28361
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by hgm »

syzygy wrote:This is intentional, so not a bug.
Call it what you like, but that doesn't make it any less wrong.
Ralph Stoesser
Posts: 408
Joined: Sat Mar 06, 2010 9:28 am

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by Ralph Stoesser »

hgm wrote:
syzygy wrote:This is intentional, so not a bug.
Call it what you like, but that doesn't make it any less wrong.
'wrong' sounds similar to 'strong'. Coincidence? ;-)
hgm
Posts: 28361
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by hgm »

Well, that requires better statistics than a single example... Can you also rhyme it with 'design flaw' or 'sabotaged'? :roll:
phenri
Posts: 284
Joined: Tue Aug 13, 2013 9:44 am

Re: illogical eval from SF? BBKBPK +- 2.00 then BBKBK = 0.00

Post by phenri »

Just to illustrate my point ...

...do not tell me that you find it normal (whether from the point of view of a developer or an end user) that, because of a repeated move, the engine does not suggest the right move.

This could have implications when interacting with tablebases: if they suggest moves that would repeat a position, but Stockfish (or the GUI, I do not know which) refuses to show them, it becomes problematic for, say, a correspondence or freestyle player.