Stockfish zero evals


DustyMonkey
Posts: 61
Joined: Wed Feb 19, 2014 10:11 pm

Re: Stockfish zero evals

Post by DustyMonkey »

lkaufman wrote: Thanks. Do you know either what version was tested or about how long ago this was done? Also, was it done for any other engine? The 11% value does sound high, but we should really compare it to some other engine. Regarding the centering around a positive value, is this positive for White or positive for the engine? Positive for White would be expected of course, but positive for the engine might suggest that the side to move bonus was too high. I wonder if a high side to move bonus (or stand pat bonus) might cause more draw scores somehow?
I am that bloke on tcec chat that ran these tests.

The specifics were that the games were taken from CCRL 40/40 where both opponents had a reported Elo of over 3000; games where the opponents differed by more than 50 Elo were then also removed.
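
For illustration, the game filter described above could look roughly like this. It is only a sketch: `GameRecord`, `keep_game` and the field names are hypothetical, and treating a gap of exactly 50 Elo as acceptable is an assumption, since the post only says games differing by "more than 50" were removed.

```python
from dataclasses import dataclass


@dataclass
class GameRecord:
    """Hypothetical container for the two reported ratings of one game."""
    white_elo: int
    black_elo: int


def keep_game(game, min_elo=3000, max_gap=50):
    """Keep a game only if both players are rated over min_elo
    and their ratings differ by at most max_gap."""
    both_strong = game.white_elo > min_elo and game.black_elo > min_elo
    evenly_matched = abs(game.white_elo - game.black_elo) <= max_gap
    return both_strong and evenly_matched
```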

This left a total of 30295 games. I only considered the positions at moves 15, 30, 45, 60, and 75. Not all games went as far as move 75 (obviously), and I'm not sure offhand how many actually got to that point (if it's important I can look it up; I know that it's still a substantial number given how long it took to generate the evaluations).

I generated the evals with both Houdini 4 and Stockfish DD using a fixed-depth search (UCI's "go depth x"). Houdini 4 was given a depth of 16 and Stockfish DD a depth of 18. Houdini 4 ended up with a bit more time as a result, but its total was within 50% of the overall SFDD evaluation time for the set of games (IIRC, the SFDD run took ~7 hours and the H4 run took ~10 hours).
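
As a rough sketch of what a fixed-depth evaluation involves at the UCI level: the GUI sends a `position` command followed by `go depth N`, and the engine reports its score in `info` lines. The helper names below (`uci_commands`, `parse_score`) are made up for this example, and the code that actually talks to the engine process is left out.

```python
import re


def uci_commands(fen, depth):
    """Build the UCI commands for one fixed-depth search of a position."""
    return [f"position fen {fen}", f"go depth {depth}"]


# UCI engines report scores as "score cp <centipawns>" or "score mate <moves>".
_SCORE_RE = re.compile(r"\bscore (cp|mate) (-?\d+)")


def parse_score(info_line):
    """Return (kind, value) from a UCI info line, or None if it has no score."""
    match = _SCORE_RE.search(info_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

In practice one would keep the score from the last `info` line received before `bestmove`, since that is the one for the deepest completed iteration.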

I then filtered all the results to only include evals between -100cp and +100cp for the data I am giving here.
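
A minimal sketch of the last step, computing the share of exact 0.00 scores per sampled move number after discarding evals outside the window. The record layout `(move_number, centipawns)` and the function name are assumptions for the example, as is treating the -100cp and +100cp bounds as inclusive.

```python
from collections import defaultdict


def zero_share_by_move(records, window_cp=100):
    """records: iterable of (move_number, score_in_centipawns).
    Returns {move_number: fraction of in-window scores that are exactly 0}."""
    totals = defaultdict(int)
    zeros = defaultdict(int)
    for move_no, cp in records:
        if -window_cp <= cp <= window_cp:  # drop evals outside the window
            totals[move_no] += 1
            if cp == 0:
                zeros[move_no] += 1
    return {m: zeros[m] / totals[m] for m in sorted(totals)}
```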

With all that said,

Edit: Image links are broken (perhaps just with the service I use?) so here is the raw link:

https://public.bn1301.livefilestore.com ... png?psid=1
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Stockfish zero evals

Post by Lyudmil Tsvetkov »

Two more games that help explain why SF returns so many 0.0 evals. Put simply, it misses a lot of continuations, either because of a wrong eval or because of very shallow (1 or 2 plies deep) tactical mistakes, as in the games below.

[pgn][PlyCount "199"]
[MLNrOfMoves "99"]
[MLFlags "100100"]
[Date "2014.04.13"]
[Round "1"]
[White "Gull 2.8 beta"]
[Black "Stockfish 14041208"]
[Result "1-0"]
[EventDate "2014.??.??"]
[ECO "E61"]
[TimeControl "240+2"]

1. d4 {book} 1... Nf6 {book} 2. c4 {book} 2... g6 {book} 3. Nc3 {book} 3... Bg7
{book} 4. g3 {book} 4... c5 {book} 5. d5 {book} 5... O-O {book} 6. Bg2 {book}
6... d6 {book} 7. Nf3 {book} 7... e6 {book} 8. O-O {book} 8... exd5 {book} 9.
cxd5 {+0.25/16 4.3s} 9... Re8 {-0.16/20 10s} 10. Re1 {+0.17/18 47s} 10... Nbd7
{0.00/21 12s} 11. Bf4 {+0.21/16 0s} 11... Ng4 {+0.06/19 11s} 12. Rc1
{+0.14/17 14s} 12... Nde5 {+0.15/21 15s} 13. Qd2 {+0.11/17 14s} 13... a6
{+0.10/21 17s} 14. b3 {+0.17/18 15s} 14... b5 {+0.26/21 12s} 15. h3
{+0.09/18 6.1s} 15... Nxf3+ {+0.28/22 7.9s} 16. exf3 {+0.10/19 34s} 16... Ne5
{+0.15/24 8.1s} 17. Ne4 {+0.10/18 0s} 17... f6 {+0.22/23 4.9s} 18. g4
{+0.04/16 6.8s} 18... Nf7 {+0.23/23 5.1s} 19. Ng3 {+0.15/16 3.1s} 19... a5
{+0.15/21 7.3s} 20. Rxe8+ {+0.22/15 2.7s} 20... Qxe8 {+0.25/24 5.1s} 21. Re1
{+0.24/16 2.8s} 21... Qf8 {+0.23/25 21s} 22. Qc2 {+0.21/16 6.5s} 22... a4
{+0.17/25 11s} 23. Bd2 {+0.23/18 6.7s} 23... axb3 {0.00/24 19s} 24. axb3
{+0.30/16 3.2s} 24... b4 {0.00/24 9.1s} 25. f4 {+0.36/17 2.6s} 25... Bh6
{0.00/26 9.2s} 26. Be4 {+0.35/18 6.8s} 26... Bd7 {0.00/23 7.9s} 27. Kh2
{+0.33/18 15s} 27... Bg7 {0.00/23 9.3s} 28. f5 {+0.43/15 2.0s} 28... g5
{-0.12/26 38s} 29. Nh5 {+0.38/18 22s} 29... Ne5 {0.00/25 5.4s} 30. f4
{+0.50/18 15s} 30... gxf4 {-0.10/25 5.2s} 31. Nxg7 {+0.50/18 0.91s} 31... Qxg7
{-0.10/25 7.6s} 32. Bxf4 {+0.50/16 0s} 32... Qe7 {-0.10/25 4.6s} 33. Rg1
{+0.37/17 11s} 33... Nf7 {0.00/23 8.1s} 34. h4 {+0.37/16 1.00s} 34... h6
{0.00/22 4.6s} 35. Bd3 {+0.41/15 7.5s} 35... Ra3 {0.00/21 4.8s} 36. Bc4
{+0.28/16 22s} 36... Ne5 {0.00/21 7.5s} 37. Qe2 {+0.33/17 4.1s} 37... Qf7
{0.00/22 3.1s} 38. Bxe5 {+0.40/17 4.5s} 38... fxe5 {0.00/23 5.8s} 39. Rf1
{+0.40/19 4.0s} 39... Qf6 {0.00/28 4.5s} 40. Kg3 {+0.36/18 4.0s} 40... Ra8
{0.00/25 4.7s} 41. Bd3 {+0.40/17 2.6s} 41... Bc8 {0.00/25 2.8s} 42. Qe3
{+0.60/16 2.6s} 42... Ra3 {0.00/28 2.8s} 43. Rb1 {+0.60/18 4.6s} 43... Ra7
{0.00/27 2.8s} 44. Be4 {+0.60/18 3.4s} 44... Rg7 {-0.21/23 4.4s} 45. Rf1
{+0.60/18 3.8s} 45... Bd7 {-0.41/24 3.9s} 46. Qf2 {+0.68/17 1.9s} 46... Be8
{-0.41/22 4.0s} 47. Rg1 {+0.78/18 1.4s} 47... Kh8 {-0.41/26 5.4s} 48. Qe1
{+0.80/18 4.5s} 48... Bd7 {-0.41/27 4.3s} 49. Kh2 {+0.95/17 4.9s} 49... Kg8
{-0.41/24 4.3s} 50. Qg3 {+0.98/18 6.5s} 50... Qd8 {-0.48/25 5.0s} 51. Kh3
{+0.98/19 6.4s} 51... Qf6 {-0.48/24 3.1s} 52. Qf3 {+0.99/20 8.4s} 52... Rf7
{-0.48/26 2.2s} 53. Ra1 {+1.00/20 6.8s} 53... Qd8 {-0.48/27 2.2s} 54. Qe3
{+1.02/20 5.6s} 54... Qf8 {-0.48/28 2.2s} 55. Qd2 {+1.02/18 0s} 55... Rf6
{-0.48/26 3.7s} 56. Ra7 {+1.05/18 4.9s} 56... Rf7 {-0.48/27 2.2s} 57. Kg3
{+1.05/19 3.4s} 57... Rg7 {-0.69/26 9.8s} 58. Qf2 {+1.13/18 1.0s} 58... Qd8
{-0.94/23 4.9s} 59. g5 {+1.23/18 2.4s} 59... hxg5 {0.00/24 2.7s} 60. f6
{+1.50/18 1.2s} 60... gxh4+ {0.00/25 1.6s} 61. Kh2 {+1.50/19 1.0s} 61... Rg5
{0.00/27 1.7s} 62. Rxd7 {+1.50/18 0s} 62... Qxd7 {-1.50/26 2.4s} 63. Qxh4
{+1.51/20 0.97s} 63... Rg4 {-1.96/27 3.4s} 64. Qh3 {+1.51/19 0s} 64... Kf7
{-2.53/27 4.2s} 65. Qh7+ {+1.56/21 6.6s} 65... Ke8 {-2.68/27 2.4s} 66. Qh5+
{+1.56/20 0s} 66... Qf7 {-2.81/27 2.4s} 67. Qxg4 {+1.56/19 0s} 67... Qxf6
{-2.88/28 2.4s} 68. Qg8+ {+1.59/22 3.6s} 68... Kd7 {-3.06/26 2.4s} 69. Qh7+
{+1.59/21 0s} 69... Ke8 {-2.77/25 2.4s} 70. Bf5 {+1.67/22 1.2s} 70... e4
{-2.70/20 2.2s} 71. Qg6+ {+2.31/17 2.0s} 71... Ke7 {-2.65/25 2.6s} 72. Bxe4
{+2.37/17 4.0s} 72... Qxg6 {-2.88/25 2.4s} 73. Bxg6 {+2.73/23 1.0s} 73... c4
{-7.55/33 2.4s} 74. bxc4 {+3.05/22 0.83s} 74... b3 {-35.56/35 2.4s} 75. Kg3
{+4.17/24 2.8s} 75... Kf6 {-51.27/40 2.4s} 76. Bd3 {+5.45/24 1.9s} 76... Ke5
{-69.34/36 2.4s} 77. Kf3 {+10.17/25 25s} 77... Kd4 {-117.74/33 2.4s} 78. Ke2
{+10.17/23 0s} 78... Kc5 {-M74/31 2.4s} 79. Ke3 {+9.51/19 1.0s} 79... b2
{-M72/30 1.3s} 80. Ke4 {+10.17/19 0.98s} 80... Kb6 {-M70/29 1.6s} 81. Kd4
{+10.25/21 0.84s} 81... Kc7 {-M42/29 4.0s} 82. c5 {+10.33/21 3.8s} 82... Kd7
{-M34/29 1.5s} 83. Bf5+ {+10.49/15 1.3s} 83... Ke7 {-M32/27 2.2s} 84. c6
{+15.16/18 1.1s} 84... Kd8 {-M34/29 2.4s} 85. Kc3 {+17.32/16 0s} 85... b1N+
{-M30/28 3.9s} 86. Bxb1 {+18.04/21 1.2s} 86... Kc7 {-M28/30 1.3s} 87. Kc4
{+19.96/20 1.5s} 87... Kb6 {-M26/33 2.7s} 88. Bg6 {+18.71/18 1.1s} 88... Ka6
{-M24/32 3.2s} 89. Bf5 {+20.60/22 2.2s} 89... Kb6 {-M26/33 1.3s} 90. Be6
{+327.41/21 2.1s} 90... Kc7 {-M20/35 2.2s} 91. Kb5 {+327.41/20 0s} 91... Kd8
{-M18/37 1.6s} 92. Kb6 {+327.41/19 0s} 92... Ke7 {-M16/39 2.9s} 93. c7
{+327.45/20 6.8s} 93... Kf6 {-M14/40 1.6s} 94. c8Q {+327.45/19 0s} 94... Ke5
{-M12/44 1.7s} 95. Qc4 {+327.45/18 0s} 95... Kf6 {-M10/1 0s} 96. Qf4+
{+327.53/23 3.0s} 96... Kg6 {-M8/120 0.53s} 97. Qh4 {+327.53/28 4.2s} 97... Kg7
{-M8/1 0s} 98. Qg5+ {+327.55/28 0.16s} 98... Kh7 {-M4/120 0s} 99. Bf7
{+327.57/63 0.078s} 99... Kh8 {-M2/1 0s} 100. Qg8# {+327.59/63 0s, White mates}
1-0

[PlyCount "209"]
[MLNrOfMoves "104"]
[MLFlags "100100"]
[Date "2014.04.13"]
[Round "1"]
[White "Gull 2.8 beta"]
[Black "Stockfish 14041208"]
[Result "1-0"]
[EventDate "2014.??.??"]
[ECO "B30"]
[TimeControl "240+2"]

1. e4 {book} 1... c5 {book} 2. Nf3 {book} 2... Nc6 {book} 3. Bb5 {book} 3... e6
{book} 4. c3 {book} 4... d5 {book} 5. Bxc6+ {book} 5... bxc6 {book} 6. Qa4
{book} 6... Nf6 {book} 7. e5 {book} 7... Nd7 {book} 8. d3 {book} 8... Qb6 {book}
9. O-O {-0.19/17 9.3s} 9... Qa6 {+0.28/21 9.3s} 10. Qc2 {-0.04/16 3.6s} 10...
Be7 {+0.20/21 8.8s} 11. Be3 {-0.11/19 16s} 11... O-O {+0.24/22 6.1s} 12. Nbd2
{-0.03/18 3.3s} 12... Qa5 {+0.20/22 18s} 13. d4 {+0.01/18 13s} 13... Qd8
{+0.20/22 8.0s} 14. Rfc1 {+0.08/18 4.9s} 14... a5 {+0.33/24 28s} 15. dxc5
{0.00/18 11s} 15... Nxc5 {+0.19/25 7.4s} 16. c4 {0.00/17 0s} 16... Na6
{+0.20/27 4.9s} 17. cxd5 {0.00/18 1.9s} 17... cxd5 {+0.12/27 8.9s} 18. a3
{0.00/20 8.0s} 18... Bd7 {+0.39/25 19s} 19. Qd3 {0.00/19 4.9s} 19... Qb8
{+0.46/25 7.7s} 20. Bg5 {-0.05/20 29s} 20... f6 {+0.24/23 7.1s} 21. exf6
{-0.11/16 9.3s} 21... gxf6 {+0.35/25 8.8s} 22. Bh6 {+0.16/18 9.5s} 22... Rf7
{+0.34/24 6.6s} 23. Qd4 {0.00/17 12s} 23... Bd6 {+0.29/25 13s} 24. Rc2
{+0.08/17 0.72s} 24... Qa7 {+0.37/23 16s} 25. Qg4+ {+0.32/17 5.4s} 25... Kh8
{+0.16/25 21s} 26. Qh5 {+0.21/17 9.4s} 26... Ba4 {+0.28/25 4.2s} 27. Rc3
{+0.21/17 1.4s} 27... Rg8 {+0.27/23 6.0s} 28. Rac1 {+0.21/18 4.2s} 28... Be8
{+0.46/25 27s} 29. g3 {0.00/17 9.4s} 29... e5 {+0.42/22 12s} 30. Nh4
{+0.08/18 5.6s} 30... Bc5 {+0.20/23 16s} 31. Qf3 {0.00/18 5.0s} 31... a4
{0.00/25 24s} 32. R3c2 {+0.36/16 3.7s} 32... Qb6 {0.00/25 5.9s} 33. Qxd5
{+0.44/17 1.9s} 33... Bxf2+ {0.00/25 2.5s} 34. Kg2 {+0.52/18 4.7s} 34... Bd4
{0.00/26 2.5s} 35. Rc8 {+0.56/16 5.0s} 35... Rd7 {0.00/23 7.0s} 36. Qf3
{+0.70/15 2.8s} 36... Nc5 {0.00/23 4.2s} 37. Rc4 {+0.58/16 7.3s} 37... Bf7
{0.00/22 3.4s} 38. R8xc5 {+0.80/14 0s} 38... Bxc5 {0.00/24 3.7s} 39. Rxc5
{+0.80/14 0s} 39... Rd4 {0.00/26 2.2s} 40. Kf1 {+0.80/16 2.2s} 40... Be8
{-0.39/19 3.2s} 41. Qc3 {+0.88/16 3.5s} 41... Bb5+ {-0.33/19 3.6s} 42. Ke1
{+0.86/16 2.8s} 42... Rb8 {-0.71/21 6.8s} 43. Nf5 {+1.25/16 4.7s} 43... Qb7
{-0.76/22 2.9s} 44. Qf3 {+1.25/15 0s} 44... Qxf3 {-0.97/23 2.7s} 45. Nxf3
{+1.25/14 0s} 45... Rd3 {-0.91/25 1.9s} 46. N3h4 {+1.32/17 3.7s} 46... Kg8
{-0.97/24 3.1s} 47. Bd2 {+1.32/17 3.5s} 47... Kf7 {-0.91/24 3.4s} 48. Ba5
{+1.23/19 22s} 48... Ke6 {-0.71/21 2.8s} 49. Kf2 {+1.29/18 12s} 49... Rd7
{-0.75/23 2.3s} 50. Ne3 {+1.39/16 3.7s} 50... Rd6 {-0.85/22 2.8s} 51. Nhf5
{+1.56/16 5.8s} 51... Rc6 {-0.99/24 5.6s} 52. Rd5 {+1.50/15 3.8s} 52... Rc2+
{-1.01/21 3.2s} 53. Ke1 {+1.50/15 0s} 53... Re2+ {-1.06/21 2.4s} 54. Kd1
{+1.60/16 3.4s} 54... Rxb2 {-1.18/20 2.4s} 55. Bc7 {+1.49/15 2.2s} 55... Rf8
{-2.18/20 2.4s} 56. Nh6 {+1.57/15 1.7s} 56... Rc8 {-2.27/21 2.4s} 57. Rd6+
{+1.55/16 2.6s} 57... Ke7 {-2.27/1 0s} 58. Nd5+ {+1.63/17 3.4s} 58... Ke8
{-2.29/23 4.6s} 59. Nxf6+ {+1.51/17 4.6s} 59... Kf8 {-2.03/21 1.9s} 60. Nd5
{+1.51/16 0s} 60... Rb1+ {-1.39/19 3.1s} 61. Kd2 {+1.51/16 0s} 61... Rb2+
{-1.62/19 2.2s} 62. Kc1 {+1.51/17 1.4s} 62... Rf2 {-1.59/21 2.6s} 63. Rb6
{+1.51/16 0s} 63... Bd7 {-1.67/20 2.3s} 64. Kd1 {+1.51/15 0s} 64... Rxc7
{-1.69/22 2.2s} 65. Nxc7 {+1.83/15 1.5s} 65... Rxh2 {-1.68/23 1.5s} 66. Rd6
{+1.87/17 2.6s} 66... Ke7 {-1.75/21 3.4s} 67. Ra6 {+2.11/18 6.7s} 67... Kf8
{-1.82/22 2.7s} 68. g4 {+2.14/17 3.1s} 68... Rg2 {-2.11/20 2.4s} 69. Nf5
{+2.35/17 3.8s} 69... Bxf5 {-2.46/19 2.1s} 70. gxf5 {+2.59/18 4.2s} 70... Rf2
{-2.81/21 2.7s} 71. Rf6+ {+2.78/18 2.7s} 71... Kg8 {-2.87/24 2.4s} 72. Ne6
{+3.02/20 4.7s} 72... h6 {-3.34/26 2.4s} 73. Rf8+ {+3.98/20 3.3s} 73... Kh7
{-2.44/1 0s} 74. Ke1 {+4.80/21 9.6s} 74... Rf3 {-3.96/26 3.3s} 75. Rf7+
{+4.80/20 0s} 75... Kg8 {-4.36/28 2.8s} 76. Ke2 {+4.80/19 0s} 76... e4
{-5.06/28 2.4s} 77. Rf8+ {+4.82/21 6.8s} 77... Kh7 {-5.06/1 0s} 78. Nd8
{+4.82/21 0s} 78... Rxa3 {-6.19/27 4.4s} 79. Rf7+ {+4.82/19 0s} 79... Kh8
{-6.70/28 2.3s} 80. Ra7 {+4.88/20 3.1s} 80... Rf3 {-7.61/29 3.9s} 81. Nf7+
{+4.88/19 0s} 81... Kg8 {-9.35/27 2.4s} 82. Nxh6+ {+4.88/20 0.33s} 82... Kf8
{-10.92/25 2.3s} 83. Rxa4 {+4.88/19 0s} 83... Kg7 {-13.27/25 2.5s} 84. Rxe4
{+4.88/20 2.1s} 84... Rb3 {-44.22/33 1.8s} 85. Ng4 {+4.88/19 0s} 85... Rb6
{-46.44/34 3.0s} 86. Kf3 {+5.04/21 3.2s} 86... Kf7 {-47.62/32 2.4s} 87. Kf4
{+5.33/17 2.9s} 87... Ra6 {-49.74/33 2.4s} 88. Ne5+ {+7.49/19 56s} 88... Ke7
{-49.87/36 2.1s} 89. Kg5 {+8.34/15 2.2s} 89... Ra8 {-51.92/35 2.7s} 90. f6+
{+8.34/15 2.6s} 90... Kd6 {-55.41/35 2.4s} 91. Nc4+ {+8.61/18 1.9s} 91... Kc5
{-M21/29 2.4s} 92. f7 {+10.93/19 2.5s} 92... Rf8 {-298.90/26 2.4s} 93. Ne5
{+327.07/20 26s} 93... Kd6 {-108.20/31 2.4s} 94. Kf5 {+327.07/18 0s} 94... Kc7
{-M45/24 2.4s} 95. Ke6 {+327.07/18 2.6s} 95... Ra8 {-M44/19 2.4s} 96. Nd7
{+327.23/17 2.0s} 96... Ra6+ {-M18/21 2.4s} 97. Ke7 {+327.39/17 1.3s} 97... Ra8
{-M16/24 1.9s} 98. f8Q {+327.43/20 2.8s} 98... Rxf8 {-M14/31 1.5s} 99. Nxf8
{+327.47/24 3.1s} 99... Kc6 {-M12/44 1.6s} 100. Nd7 {+327.49/28 2.7s} 100... Kd5
{-M10/120 1.6s} 101. Rb4 {+327.51/29 3.8s} 101... Kc6 {-M8/1 0s} 102. Ke6
{+327.53/30 5.9s} 102... Kc7 {-M6/1 0s} 103. Rb6 {+327.55/36 4.9s} 103... Kd8
{-M4/120 0s} 104. Rc6 {+327.57/39 2.7s} 104... Ke8 {-M2/1 0s} 105. Rc8#
{+327.59/46 3.9s, White mates} 1-0
[/pgn]

[d]3q2k1/R2b4/3p1P2/2pPp1r1/1p2B2p/1P6/5Q1K/8 w - - 0 62
SF has just played 61...Rg5 with a 0.0 score. It misses 62.Rxd7, which sacrifices material. Immediately after the move is played, SF sees that White is winning. So it is just a one-ply loser. Too much pruning?

[d]2R3rk/3r1b1p/1q3p1B/2n1p3/p1Rb3N/P4QP1/1P1N2KP/8 w - - 0 38
SF has just played 37...Bf7 with a mysterious 0.0 eval. It is not that it misses 38.R8xc5, again sacrificing material, but rather 40.Kf1, after which it quickly sees that it is losing. Too much pruning again at shallow depths? Maybe it prunes more or less automatically when up in material by more than a pawn? That should be dangerous in situations like this, where the kings are unsafe. The same thing happened in the lost game against Komodo in TCEC with the Qd3 move. Probably, when the kings are in danger and certain conditions are met, it should prune much less.

Is this SF behaviour fixable?
bnst
Posts: 87
Joined: Tue Sep 11, 2007 12:16 pm

Re: Stockfish zero evals

Post by bnst »

There might be another explanation.

Stockfish has a somewhat primitive way of detecting repeated positions. If a position has occurred in the game and Stockfish finds this position during the search, it returns a draw score.

Other engines only do this if the position has occurred twice in the game, but they also return a draw score if the position occurs twice during the search. This works much better.

The result of the Stockfish approach is that if the opponent repeats a move, so that Stockfish is forced to do the same, the evaluation drops to zero even if it is only the second repeat and one of the sides is winning.
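
To make the difference concrete, here is a toy sketch of the two policies as described in this post. It is not code from Stockfish or any other engine: the function names are invented, and positions are represented by opaque keys (for example hash values) held in plain lists.

```python
def draw_simple(key, game_keys, search_keys):
    """Policy attributed to Stockfish above: score a draw as soon as the
    position has occurred once before, whether in the game or in the search."""
    return key in game_keys or key in search_keys


def draw_strict(key, game_keys, search_keys):
    """Policy attributed to other engines above: a repeat inside the search
    line counts at once, but a position from the game history counts only
    if it has already occurred twice there."""
    return key in search_keys or game_keys.count(key) >= 2
```

With `game_keys = ["p1", "p2", "p1"]` and the search now reaching `"p2"`, `draw_simple` returns a draw score while `draw_strict` keeps searching, which is the case where the eval would drop to 0.00 under the first policy.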

Regards
Andreas
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Stockfish zero evals

Post by Michel »

Other engines only do this if the position has occurred twice in the game, but they also return a draw score if the position occurs twice during the search. This works much better.
Yes, but unfortunately this very likely incurs a tiny Elo loss. At least in SF.

This was tested twice. First in two fixed-length matches of 40K games each, at STC and LTC, where in both cases a non-significant Elo loss of the order of 1 Elo was recorded.

Then it was tested at SPRT(-4,0) where it failed (I don't remember if it failed at STC or LTC).

The last test shows with at least 95% confidence that the enhanced repetition detection is a regression.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
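
For readers unfamiliar with SPRT(-4,0): the test accumulates a log-likelihood ratio (LLR) between the hypotheses "Elo = -4" and "Elo = 0" and stops when it crosses one of two bounds. The sketch below is only an illustration under stated assumptions. It uses a normal approximation for the LLR, plain logistic Elo, and 5% error rates; the testing framework actually used may parametrize Elo differently, and the function names are made up here.

```python
import math


def elo_to_score(elo):
    """Expected score for a given logistic Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def wald_bounds(alpha=0.05, beta=0.05):
    """Stopping bounds of Wald's sequential test for the given error rates."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper


def llr_normal_approx(wins, draws, losses, elo0, elo1):
    """Approximate LLR of H1 (elo1) versus H0 (elo0) from game counts,
    treating the mean score per game as normally distributed."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - mean ** 2  # per-game score variance
    if var <= 0.0:
        raise ValueError("score variance is zero; LLR is undefined")
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)
```

The patch "fails" when the LLR drops below the lower bound (about -2.94 for 5% error rates) before it reaches the upper one.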
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: Stockfish zero evals

Post by carldaman »

Wouldn't getting rid of this annoying 0.00 eval bug be worth a tiny Elo loss?
I mean, we don't just use the engine for test matches; some of us also like to analyze with it.

Secondly, if a tiny Elo loss is unacceptable, how about having an analysis version of Stockfish where this problem is corrected, besides the 'normal' game play SF version where it can keep all its Elo?

If there's a will, there's a way.

Regards,
CL
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Stockfish zero evals

Post by lkaufman »

Thanks. The much higher percentage of zero scores in SF compared to Houdini at all but low move numbers is highly significant. The comments about calling single repetitions a draw are unlikely to explain much of this since engines generally don't repeat when ahead. I think the granularity of the eval was cut in half after SFDD, so that might explain it. If you repeat the experiment with a current SF that would tell us whether the high frequency of zero scores was due to the score rounding or not.
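
A small sketch of why score rounding could matter here: the coarser the grain to which scores are rounded, the more distinct raw values collapse onto exactly 0. The rounding rule and the grain sizes below are assumptions chosen for illustration, not the values any Stockfish version actually used.

```python
def quantize(cp, grain):
    """Round a centipawn score to a multiple of grain (illustrative rule)."""
    return (cp + grain // 2) // grain * grain


def zero_count(scores, grain):
    """Count how many scores become exactly 0 after rounding to grain."""
    return sum(1 for cp in scores if quantize(cp, grain) == 0)
```

Over every integer score from -100cp to +100cp, `zero_count` gives 1 for a grain of 1, 4 for a grain of 4, and 8 for a grain of 8, so halving the grain halves the number of raw scores that round to zero.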
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish zero evals

Post by syzygy »

carldaman wrote: Secondly, if a tiny Elo loss is unacceptable, how about having an analysis version of Stockfish where this problem is corrected, besides the 'normal' game play SF version where it can keep all its Elo?

If there's a will, there's a way.
There is certainly a way. Just apply the old patch to the current source and compile.
bnst
Posts: 87
Joined: Tue Sep 11, 2007 12:16 pm

Re: Stockfish zero evals

Post by bnst »

Thanks for a good explanation. I always wondered about why Stockfish did this. You can't argue with solid data :)

Regards
Andreas