Stockfish Natural TB loses heavily to Stockfish master

Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Adam Hair »

Let me know if Gaviota 5men TBs are not available online. I can upload them if needed.
Nordlandia
Posts: 2821
Joined: Fri Sep 25, 2015 9:38 pm
Location: Sortland, Norway

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Nordlandia »

Adam Hair wrote:Let me know if Gaviota 5men TBs are not available online. I can upload them if needed.
Available here ->

http://oics.olympuschess.com/tracker/index.php
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos »

Adam Hair wrote:Let me know if Gaviota 5men TBs are not available online. I can upload them if needed.
Thanks, I downloaded them in about 20 minutes from the site Jon gave in an earlier post.
IQ
Posts: 162
Joined: Thu Dec 17, 2009 10:46 am

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by IQ »

mcostalba wrote: So I will add a UCI option "Natural TB" by which users can toggle between natural and traditional behavior. This of course is just a placebo knob, but it takes me much less time to add it than to convince people otherwise. Of course during real analysis I expect Natural TB to be always enabled, because people want to see good analysis lines, in multi PV, with no odd sacrifices and with proper scores to understand the difference between competing PV lines (and of course they want to see a mate if the engine finds it): all things that Natural TB does....well...naturally :-)
I applaud your decision to add a UCI option. I still feel your whole concept of what constitutes "good analysis" lines is deeply flawed, skewed by looking only at some pet examples without considering which positions you might break: positions where trading down and/or sacrificing material to reach a won ending is completely natural, instead of going for a complicated mate. Your argument that "with reasonable time" it will eventually find the right move is wrong on so many levels that I do not really want to comment. But as long as I can turn it off I am happy.

I do see some value in your efforts though. If you ditch the concept of "naturalness" and concentrate on maximizing play without DTZ, using WDL only (or stated differently: minimizing the Elo loss if no DTZ tables are installed), I see that this might be preparation for including the 6-piece WDL bases in the SF testing framework (if one considers DTZ too large). This would enable people to finally throw out all these endgame crutches and heuristics and, even more importantly, to better tune middlegame SF parameters (as there will be fewer interactions, since the WDL bases can deal with a lot of cases). So I hope your project will evolve in that direction.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos »

Michel wrote:Hi Kai,

Do you have Gaviota? I think Peter claimed that Texel combines both Syzygy (DTZ50) and Gaviota (DTM) in a game theoretically correct way. It might be an interesting comparison.
The claim has some empirical grounds.

I checked Texel 1.07a + Syzygy + Gaviota against Houdini 5 + Syzygy + Nalimov.

1000 games
Suite: Easy 5-men positions at the root:
TC: 1s/move

Score of Texel vs Houdini: 500 - 478 - 22 [0.511] 1000
ELO difference: 7.64 +/- 21.30
Finished match
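
For anyone who wants to check these numbers: the score in brackets and the ELO difference can be reproduced from the win/loss/draw counts with the usual logistic formula. Below is a minimal Python sketch. The function names are illustrative, and the error margin is a plain 95% normal approximation on the per-game score, which is assumed (not verified against the Cutechess-Cli source) to be close to what the tool reports.

```python
import math


def score_fraction(wins, losses, draws):
    """Fraction of points scored by the first engine (a draw counts as half)."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_from_score(score):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, losses, draws, z=1.959964):
    """Return (elo, margin); margin is half the width of the ~95% interval.

    Assumption: normal approximation on the mean per-game score.
    """
    games = wins + losses + draws
    mu = score_fraction(wins, losses, draws)
    # Variance of a single game result around the mean score.
    var = (wins * (1.0 - mu) ** 2
           + losses * (0.0 - mu) ** 2
           + draws * (0.5 - mu) ** 2) / games
    half_width = z * math.sqrt(var / games)
    low = elo_from_score(mu - half_width)
    high = elo_from_score(mu + half_width)
    return elo_from_score(mu), (high - low) / 2.0


if __name__ == "__main__":
    elo, margin = elo_with_margin(500, 478, 22)
    print(f"score {score_fraction(500, 478, 22):.3f}, Elo {elo:+.2f} +/- {margin:.2f}")
```

With 500 wins, 478 losses and 22 draws the score is 0.511 and the Elo difference comes out at about 7.64, matching the match output above.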

Houdini fails in 22 out of 500 easy 5-men Wins due to the 50-move rule. That is probably significantly more than the 12 failures against SF Master Syzygy. Texel, with DTZ50 and Gaviota, seems to trick Houdini (in 5-men positions only Nalimov DTM kicks in) into cursed Wins. Robert Houdart should be worried about a 4.4% failure rate on easy 5-men Wins against Texel. Even considering the Wins only, I expected both Houdini and Texel to play optimally. Not so.


Lengths of the 5-men Wins at the root:

Houdini: 478 Wins
Length of a Win:
Mean: 20.41 moves
Median: 18 moves

Texel: 500 Wins
Length of a Win:
Mean: 18.55 moves
Median: 17 moves


The difference in the mean length of the Win from 5-men positions at the root is significant.
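
The post does not say how the lengths were extracted. One way to get such statistics is sketched below in Python, under two assumptions: each game's length is taken from the PlyCount header that Cutechess-Cli writes, and every game starts at move 1 from its FEN. The helper names are illustrative. The sample only reuses the Result and PlyCount tags of the four PGN games quoted later in this thread, so it demonstrates the method and does not reproduce the Texel or Houdini figures.

```python
import re
import statistics

# Matches one PGN tag pair such as [PlyCount "180"].
TAG_RE = re.compile(r'\[(\w+)\s+"([^"]*)"\]')


def game_headers(pgn_text):
    """Return one dict of tag pairs per game; a new game starts at each Event tag."""
    games = []
    current = None
    for line in pgn_text.splitlines():
        match = TAG_RE.match(line.strip())
        if not match:
            continue  # movetext or blank line
        name, value = match.groups()
        if name == "Event":
            current = {}
            games.append(current)
        if current is not None:
            current[name] = value
    return games


def win_lengths(pgn_text):
    """Lengths in full moves of all decisive games (plies rounded up to moves)."""
    lengths = []
    for tags in game_headers(pgn_text):
        if tags.get("Result") in ("1-0", "0-1") and "PlyCount" in tags:
            plies = int(tags["PlyCount"])
            lengths.append((plies + 1) // 2)
    return lengths


SAMPLE = '''
[Event "?"]
[Result "0-1"]
[PlyCount "180"]

[Event "?"]
[Result "0-1"]
[PlyCount "46"]

[Event "?"]
[Result "0-1"]
[PlyCount "106"]

[Event "?"]
[Result "0-1"]
[PlyCount "12"]
'''

if __name__ == "__main__":
    lengths = win_lengths(SAMPLE)
    print(lengths, statistics.mean(lengths), statistics.median(lengths))
```

To compare two engines, one would run `win_lengths` on each engine's wins separately and compare the resulting means and medians.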

The histograms of lengths of the Wins for the two engines are here:

[Images: two histograms of Win lengths, one per engine]



They are similar to each other, and similar in shape to Stockfish Master, but not at all to the old "Natural". Observe the longer tail of Houdini Wins compared to Texel Wins. Therefore, Texel's TB implementation tests as the best around, and the claim that it is theoretically correct may stand.
Jouni
Posts: 3278
Joined: Wed Mar 08, 2006 8:15 pm

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Jouni »

Houdini 5's Syzygy implementation is buggy (almost no benefit from 5-piece TBs, and a negative one from WDL only!). Hope version 6 is better.
Jouni
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos »

Laskos wrote:Marco posted a new update to his Natural and a PGN of 100 games, and the first results and stats are very promising, he finally forces DTZ optimal moves, probably achieving a perfect play from root TB positions (will check later on 6-men).
Well, either I have a buggy compile, or the latest commit is itself buggy, but:

From easy 5-men positions at 0.25s/move with 6-men on SSD:

Score of SF_Master vs SF_NTB_DTZ: 530 - 469 - 1 [0.530] 1000
ELO difference: 21.22 +/- 21.56
Finished match

30 losses and 1 draw in 5-men positions at root.

From easy 6-men positions at root at 0.25s/move:

Score of SF_Master vs SF_NTB_DTZ: 611 - 382 - 7 [0.615] 1000
ELO difference: 81.00 +/- 22.04
Finished match

Completely off.
No time losses in these matches.

Here are 4 games that NTB lost from easy 5-men TB Wins. I don't see any malfunction of the engines or Cutechess-Cli, or any failure in my TBs.

 [Event "?"]
[Site "?"]
[Date "2017.09.08"]
[Round "46"]
[White "SF_NTB_DTZ"]
[Black "SF_Master"]
[Result "0-1"]
[FEN "5k2/7R/6r1/2K5/8/2P5/8/8 w - - 0 1"]
[PlyCount "180"]
[SetUp "1"]
[TimeControl "0.25/move"]

1. Rd7 {+99.00/19 0.25s} Ke8 {-132.79/18 0.25s} 2. Rd1 {+132.70/18 0.25s}
Rg5+ {-132.79/23 0.25s} 3. Kb6 {+132.63/20 0.26s} Rg4 {-132.79/26 0.25s}
4. Kb5 {+132.64/18 0.25s} Rg5+ {-132.79/27 0.25s} 5. Kb4 {+99.00/20 0.25s}
Rg2 {-132.79/27 0.25s} 6. c4 {+132.69/23 0.25s} Rb2+ {-132.79/26 0.25s}
7. Kc5 {+132.70/20 0.25s} Ra2 {-132.79/24 0.25s} 8. Kb6 {+99.00/19 0.25s}
Rb2+ {-132.79/21 0.25s} 9. Kc6 {+132.70/29 0.25s} Kf7 {-132.79/20 0.25s}
10. Re1 {+132.69/19 0.25s} Rd2 {-132.79/19 0.25s} 11. c5 {+132.70/27 0.25s}
Kf6 {-132.79/20 0.25s} 12. Kb6 {+132.70/15 0.25s} Rb2+ {-132.79/21 0.25s}
13. Kc7 {+132.72/16 0.25s} Rb4 {-132.79/20 0.25s} 14. Rd1 {+51.26/17 0.25s}
Ke6 {-132.79/18 0.25s} 15. Rd7 {+132.72/16 0.25s} Rc4 {-132.79/19 0.25s}
16. Rd6+ {+132.78/17 0.26s} Ke7 {-132.79/19 0.25s} 17. Rd7+ {+9.38/17 0.25s}
Ke6 {-132.79/20 0.25s} 18. Rd6+ {+132.78/19 0.25s} Ke7 {-132.79/21 0.25s}
19. c6 {+132.66/20 0.25s} Rc2 {-132.79/19 0.25s} 20. Rd1 {+132.70/19 0.25s}
Rc3 {-132.79/18 0.25s} 21. Kb7 {+132.70/18 0.25s} Rb3+ {-132.79/18 0.25s}
22. Kc8 {+132.66/20 0.25s} Rb4 {-132.79/18 0.25s} 23. Rd3 {+132.71/29 0.25s}
Rb1 {-132.79/16 0.25s} 24. Rd7+ {+132.77/17 0.25s} Ke6 {-132.79/17 0.25s}
25. Rd8 {+132.69/25 0.25s} Rb3 {-132.79/18 0.25s} 26. Rd1 {+132.77/21 0.25s}
Ke7 {-132.79/17 0.25s} 27. Rd7+ {+132.73/25 0.25s} Ke8 {-132.79/17 0.25s}
28. Rd2 {+132.70/17 0.25s} Rb4 {-132.79/18 0.25s} 29. Re2+ {+132.72/23 0.25s}
Kf7 {-132.79/16 0.26s} 30. Kd7 {+132.75/15 0.26s} Rd4+ {-132.79/15 0.25s}
31. Kc7 {+132.71/28 0.25s} Kf6 {-132.79/16 0.25s} 32. Re1 {+132.71/26 0.25s}
Rb4 {-132.79/16 0.25s} 33. Kd6 {+132.70/16 0.25s} Rd4+ {-132.79/16 0.25s}
34. Kc7 {+132.69/24 0.25s} Rc4 {-132.79/18 0.25s} 35. Rd1 {+132.67/19 0.25s}
Ke6 {-132.79/19 0.25s} 36. Kb7 {+132.40/19 0.25s} Rb4+ {-132.79/17 0.25s}
37. Kc8 {+132.76/19 0.25s} Rb2 {-132.79/20 0.25s} 38. Rd3 {+132.77/21 0.25s}
Rb1 {-132.79/16 0.25s} 39. c7 {+132.54/19 0.25s} Ke7 {-132.79/16 0.25s}
40. Re3+ {+132.77/19 0.25s} Kf7 {-132.79/16 0.25s} 41. Ra3 {+132.68/18 0.25s}
Rb2 {-132.79/17 0.25s} 42. Rf3+ {+132.69/18 0.25s} Ke7 {-132.79/20 0.25s}
43. Ra3 {+132.66/19 0.25s} Rb1 {-132.79/17 0.25s} 44. Re3+ {+132.75/14 0.25s}
Kf7 {-132.79/18 0.25s} 45. Rf3+ {+132.75/21 0.25s} Ke7 {-132.79/19 0.25s}
46. Ra3 {+99.00/17 0.25s} Ke8 {-132.79/21 0.25s} 47. Ra2 {+132.61/19 0.25s}
Kf7 {-132.79/18 0.25s} 48. Kd7 {+132.75/16 0.25s} Rd1+ {-132.79/17 0.25s}
49. Kc8 {+99.00/15 0.25s} Rb1 {-132.79/18 0.25s} 50. Rc2 {+132.39/16 0.26s}
Ke7 {-132.79/21 0.25s} 51. Re2+ {+132.72/18 0.25s} Kf7 {-132.79/20 0.25s}
52. Ra2 {+99.00/18 0.26s} Rb4 {-132.79/19 0.25s} 53. Rf2+ {+132.77/17 0.26s}
Ke6 {-132.79/19 0.25s} 54. Re2+ {+132.72/19 0.25s} Kf6 {-132.79/21 0.25s}
55. Re1 {+132.51/17 0.25s} Rb3 {-132.79/16 0.25s} 56. Kd8 {+132.76/15 0.26s}
Rd3+ {-132.79/22 0.25s} 57. Ke8 {+132.65/18 0.25s} Rc3 {-132.79/27 0.25s}
58. Kd7 {+132.71/30 0.25s} Rd3+ {-132.79/28 0.25s} 59. Ke8 {+132.64/18 0.26s}
Rc3 {-132.79/24 0.25s} 60. Kd8 {+132.66/18 0.25s} Rd3+ {-132.79/20 0.25s}
61. Kc8 {+132.64/19 0.25s} Kg5 {-132.79/21 0.25s} 62. Rb1 {+99.00/16 0.25s}
Kf4 {-132.79/25 0.25s} 63. Kb8 {+132.45/16 0.25s} Rc3 {-132.79/27 0.25s}
64. Rb4+ {+11.61/23 0.25s} Ke3 {-132.79/28 0.25s} 65. Rb2 {+132.58/16 0.25s}
Kd4 {-132.79/22 0.25s} 66. Rb4+ {+47.63/26 0.25s} Kc5 {-132.79/19 0.25s}
67. Rb1 {+132.27/17 0.25s} Kd4 {-M52/25 0.25s} 68. Rb4+ {+132.77/19 0.25s}
Kc5 {-M46/26 0.25s} 69. Rb2 {+132.65/21 0.25s} Kd4 {-M42/28 0.25s}
70. Rd2+ {+117.90/33 0.25s} Ke3 {-132.79/16 0.25s} 71. Rd6 {+132.70/24 0.25s}
Rb3+ {-132.79/21 0.25s} 72. Kc8 {+132.65/20 0.25s} Kf4 {-132.79/15 0.25s}
73. Rd1 {+132.70/16 0.25s} Rg3 {-132.79/15 0.25s} 74. Kb7 {+132.65/17 0.25s}
Rg7 {-298.86/13 0.25s} 75. Kb6 {+132.76/17 0.25s} Rg8 {-298.96/17 0.25s}
76. Kb7 {+99.00/17 0.26s} Rg7 {-M62/20 0.25s} 77. Rf1+ {+132.74/18 0.25s}
Ke4 {-M54/21 0.25s} 78. Rg1 {+132.77/16 0.26s} Rxg1 {-132.79/14 0.25s}
79. c8=Q {+5.80/34 0.25s} Rg7+ {-132.79/19 0.25s} 80. Qd7 {+99.00/18 0.25s}
Rxd7+ {+M23/31 0.25s} 81. Kc6 {-132.67/18 0.25s} Rd5 {+M19/35 0.25s}
82. Kb7 {-M22/27 0.25s} Kd4 {+M17/38 0.25s} 83. Kb6 {-M18/32 0.25s}
Rc5 {+M15/42 0.25s} 84. Ka6 {-M14/38 0.25s} Kc4 {+M13/47 0.25s}
85. Kb6 {-M12/46 0.25s} Kb4 {+M11/56 0.25s} 86. Ka7 {-M10/54 0.25s}
Kb5 {+M9/68 0.25s} 87. Kb7 {-M8/82 0.25s} Rc4 {+M7/82 0.25s}
88. Ka7 {-M6/71 0.25s} Kc6 {+M5/88 0.25s} 89. Ka8 {-M4/81 0.25s}
Kc7 {+M3/127 0.005s} 90. Ka7 {-M2/127 0.003s} Ra4# {+M1/127 0.002s, Black mates}
0-1




[Event "?"]
[Site "?"]
[Date "2017.09.08"]
[Round "64"]
[White "SF_NTB_DTZ"]
[Black "SF_Master"]
[Result "0-1"]
[FEN "K7/8/P4k2/8/8/3R4/r7/8 w - - 0 1"]
[PlyCount "46"]
[SetUp "1"]
[TimeControl "0.25/move"]

1. a7 {+10.35/27 0.25s} Kg5 {-132.79/15 0.25s} 2. Rd5+ {+99.00/19 0.25s}
Kf6 {-132.79/12 0.25s} 3. Kb7 {+99.00/17 0.26s} Ke6 {-132.79/19 0.25s}
4. Rd2 {+99.00/17 0.25s} Rxd2 {-132.79/17 0.25s} 5. a8=Q {+5.84/28 0.25s}
Ke5 {-132.79/17 0.25s} 6. Qa1+ {+99.00/14 0.25s} Kf4 {-132.79/19 0.25s}
7. Qc1 {+99.00/19 0.26s} Ke3 {-132.79/24 0.25s} 8. Kc6 {+99.00/21 0.26s}
Ke2 {-132.79/24 0.25s} 9. Qc2 {+99.00/20 0.25s} Rxc2+ {+M39/23 0.25s}
10. Kd7 {-132.61/20 0.25s} Kf3 {+M27/32 0.25s} 11. Ke6 {-M32/23 0.25s}
Ke4 {+M25/35 0.25s} 12. Kd6 {-M26/29 0.25s} Rc3 {+M23/37 0.25s}
13. Ke6 {-M22/32 0.25s} Rd3 {+M21/41 0.25s} 14. Kf6 {-M20/35 0.25s}
Rd6+ {+M19/43 0.25s} 15. Ke7 {-M18/36 0.25s} Ke5 {+M17/45 0.25s}
16. Kf7 {-M16/40 0.25s} Kf5 {+M15/47 0.25s} 17. Ke7 {-M14/44 0.25s}
Rd5 {+M13/51 0.25s} 18. Kf7 {-M12/51 0.25s} Rd7+ {+M11/57 0.25s}
19. Ke8 {-M10/63 0.25s} Ke6 {+M9/67 0.25s} 20. Kf8 {-M8/112 0.25s}
Kf6 {+M7/77 0.25s} 21. Kg8 {-M6/89 0.25s} Rd8+ {+M5/70 0.25s}
22. Kh7 {-M4/127 0.060s} Rc8 {+M3/102 0.25s} 23. Kh6 {-M2/127 0.15s}
Rh8# {+M1/127 0.003s, Black mates} 0-1



[Event "?"]
[Site "?"]
[Date "2017.09.08"]
[Round "90"]
[White "SF_NTB_DTZ"]
[Black "SF_Master"]
[Result "0-1"]
[FEN "8/4K3/8/r7/8/3R4/2P5/2k5 w - - 0 1"]
[PlyCount "106"]
[SetUp "1"]
[TimeControl "0.25/move"]

1. c4 {+132.73/25 0.25s} Kc2 {-132.79/17 0.25s} 2. Rd5 {+99.00/19 0.25s}
Ra6 {-132.79/19 0.25s} 3. Kd7 {+132.74/18 0.25s} Ra7+ {-132.79/19 0.25s}
4. Kd6 {+132.75/32 0.25s} Ra6+ {-132.79/19 0.25s} 5. Kd7 {+132.74/19 0.25s}
Kc3 {-132.79/19 0.25s} 6. c5 {+8.56/35 0.25s} Kc4 {-132.79/18 0.25s}
7. Re5 {+132.68/29 0.25s} Kd4 {-132.79/18 0.25s} 8. Rh5 {+99.00/18 0.25s}
Ra5 {-132.79/19 0.25s} 9. Kd6 {+132.76/18 0.26s} Kc4 {-132.79/18 0.25s}
10. Rf5 {+132.74/33 0.26s} Kd3 {-132.79/18 0.25s} 11. Rf1 {+132.76/19 0.25s}
Ke4 {-132.79/16 0.25s} 12. Re1+ {+132.77/16 0.26s} Kd4 {-132.79/16 0.25s}
13. Rf1 {+132.70/18 0.26s} Ke4 {-132.79/16 0.25s} 14. Re1+ {+132.68/17 0.25s}
Kd4 {-132.79/16 0.25s} 15. Rd1+ {+132.67/19 0.25s} Ke3 {-132.79/16 0.25s}
16. c6 {+132.68/28 0.25s} Ke2 {-132.79/15 0.25s} 17. Rc1 {+57.90/27 0.25s}
Ra6 {-132.79/16 0.25s} 18. Ke7 {+132.75/17 0.25s} Ra7+ {-132.79/20 0.25s}
19. Kd6 {+132.71/29 0.26s} Kd2 {-132.79/16 0.25s} 20. Rc5 {+132.71/16 0.25s}
Ra6 {-132.79/21 0.25s} 21. Kd7 {+99.00/18 0.25s} Ra4 {-132.79/22 0.25s}
22. Rd5+ {+132.74/21 0.25s} Kc3 {-132.79/21 0.25s} 23. Rc5+ {+132.76/20 0.26s}
Kd4 {-132.79/20 0.25s} 24. Kd6 {+132.66/19 0.25s} Ra6 {-132.79/15 0.25s}
25. Rg5 {+132.77/17 0.26s} Ra8 {-132.79/14 0.25s} 26. Rg4+ {+132.78/22 0.25s}
Kc3 {-132.79/19 0.25s} 27. Kd7 {+132.68/16 0.25s} Kd2 {-132.79/23 0.25s}
28. Rg2+ {+132.78/17 0.25s} Kd3 {-132.79/22 0.25s} 29. Rg5 {+132.69/18 0.25s}
Rh8 {-132.79/23 0.25s} 30. Rg3+ {+53.63/19 0.25s} Ke4 {-132.79/19 0.25s}
31. Rg7 {+132.78/17 0.25s} Rh2 {-132.79/14 0.25s} 32. c7 {+132.41/21 0.25s}
Rd2+ {-132.79/23 0.25s} 33. Ke7 {+132.36/16 0.25s} Rc2 {-132.79/27 0.25s}
34. Kd8 {+99.00/16 0.25s} Rc1 {-132.79/28 0.25s} 35. Re7+ {+132.67/20 0.25s}
Kd5 {-132.79/26 0.25s} 36. Rg7 {+132.39/22 0.25s} Ke5 {-132.79/20 0.25s}
37. Re7+ {+132.20/19 0.25s} Kf6 {-M86/19 0.25s} 38. Re1 {+99.00/17 0.25s}
Rxe1 {-132.79/21 0.25s} 39. c8=Q {+5.84/41 0.25s} Rd1+ {-132.79/21 0.25s}
40. Ke8 {+99.00/22 0.25s} Re1+ {-132.79/23 0.25s} 41. Kf8 {+99.00/21 0.25s}
Re6 {-132.79/24 0.25s} 42. Qc3+ {+99.00/22 0.26s} Re5 {-132.79/26 0.25s}
43. Qc6+ {+99.00/24 0.25s} Kf5 {-132.79/25 0.25s} 44. Kf7 {+99.00/20 0.25s}
Kf4 {-132.79/24 0.25s} 45. Qc3 {+99.00/19 0.25s} Re3 {-132.79/23 0.25s}
46. Qe1 {+99.00/28 0.25s} Rxe1 {+M19/36 0.25s} 47. Kg6 {-M16/26 0.25s}
Re6+ {+M13/41 0.25s} 48. Kh7 {-M12/40 0.25s} Kf5 {+M11/49 0.25s}
49. Kg7 {-M10/58 0.25s} Re7+ {+M9/71 0.25s} 50. Kf8 {-M8/86 0.25s}
Kf6 {+M7/118 0.25s} 51. Kg8 {-M6/127 0.043s} Re8+ {+M5/121 0.25s}
52. Kh7 {-M4/84 0.25s} Rd8 {+M3/127 0.10s} 53. Kh6 {-M2/127 0.001s}
Rh8# {+M1/127 0.002s, Black mates} 0-1



[Event "?"]
[Site "?"]
[Date "2017.09.08"]
[Round "106"]
[White "SF_NTB_DTZ"]
[Black "SF_Master"]
[Result "0-1"]
[FEN "8/8/r3PR2/8/1k6/8/1K6/8 w - - 0 1"]
[PlyCount "12"]
[SetUp "1"]
[TimeControl "0.25/move"]

1. e7 {+132.77/14 0.25s} Rxf6 {-132.79/20 0.25s} 2. e8=Q {+5.92/39 0.25s}
Rd6 {-132.79/22 0.25s} 3. Qc6 {+99.00/36 0.25s} Rxc6 {+M7/91 0.25s}
4. Ka2 {-M6/80 0.25s} Kc3 {+M5/109 0.25s} 5. Ka1 {-M4/85 0.25s}
Kc2 {+M3/127 0.003s} 6. Ka2 {-M2/127 0.003s} Ra6# {+M1/127 0.003s, Black mates}
0-1
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by syzygy »

IQ wrote:
mcostalba wrote: So I will add a UCI option "Natural TB" by which users can toggle between natural and traditional behavior. This of course is just a placebo knob, but it takes me much less time to add it than to convince people otherwise. Of course during real analysis I expect Natural TB to be always enabled, because people want to see good analysis lines, in multi PV, with no odd sacrifices and with proper scores to understand the difference between competing PV lines (and of course they want to see a mate if the engine finds it): all things that Natural TB does....well...naturally :-)
I applaud your decision to add a UCI option.
It's gone already (for the moment... I cannot predict what happens next).

Now, when SF has reached a TB position on the board, it probes DTZ to change its search (in ways that are not very intuitive and, judging from Kai's results, seem to be counterproductive). If no mate was found, it overrides not just the search score but also the search move and plays a DTZ-optimal move (meaning it is likely to sac a piece in a relatively easy 6-piece ending to trade down to a 5-piece ending that is far more difficult to win).

That it plays DTZ-optimal moves explains that it now at least plays TB endings perfectly. But "unnatural" is the proper term here.
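
To make "DTZ-optimal" concrete, here is a minimal Python sketch of such a move choice. It is not the code of the NTB branch: the `TbMove` record, the WDL encoding and the meaning given to the `dtz` field are assumptions made only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class TbMove:
    move: str
    wdl: int  # hypothetical encoding: +1 win, 0 draw, -1 loss for the side making the move
    dtz: int  # hypothetical: plies until the next capture or pawn move if this move is played


def dtz_optimal_move(moves):
    """Pick a move by DTZ among the moves that keep the best TB outcome."""
    best_wdl = max(m.wdl for m in moves)
    keep = [m for m in moves if m.wdl == best_wdl]
    if best_wdl > 0:
        # Winning: reset the 50-move counter as fast as possible.
        return min(keep, key=lambda m: m.dtz).move
    if best_wdl < 0:
        # Losing: delay the next zeroing move as long as possible.
        return max(keep, key=lambda m: m.dtz).move
    return keep[0].move  # Drawn: any drawing move will do in this sketch.


if __name__ == "__main__":
    moves = [
        TbMove("quiet_win", 1, 14),
        TbMove("trade_down", 1, 1),
        TbMove("losing_sac", -1, 1),
    ]
    print(dtz_optimal_move(moves))
```

In this sketch the winning side always prefers the fastest capture or pawn move that still wins, which is why such play tends to trade down at the first opportunity. The WDL filter is what keeps it sound: without it, a fast zeroing move that loses would rank just as high.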

For the sake of completeness, current SF treats this case (root position is in the TBs) as follows:
- first it selects those moves that preserve the win or draw;
- then it performs a regular search on those moves;
- the move played is the move with the highest search score. This move necessarily preserves the win or draw and is as "natural" as SF's regular search can be called natural;
- the score displayed in all PV lines is not the score returned by the search but a score corresponding to the value of the root position as determined by TBs.
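
The four steps above can be sketched in a few lines of Python. This only illustrates the description in this post and is not Stockfish's actual code: the `RootMove` record, the WDL encoding, the `search` callback and the `TB_WIN_SCORE` constant are all hypothetical, and the sketch ignores the 50-move-rule details that the real root probing has to handle.

```python
from dataclasses import dataclass

TB_WIN_SCORE = 10000  # hypothetical centipawn value displayed for a TB win


@dataclass
class RootMove:
    move: str
    wdl: int  # hypothetical encoding: +1 win, 0 draw, -1 loss for the side making the move
    search_score: int = 0


def filter_root_moves(moves):
    """Step 1: keep only the moves that preserve the TB value of the root."""
    root_wdl = max(m.wdl for m in moves)
    return root_wdl, [m for m in moves if m.wdl == root_wdl]


def choose_move(moves, search):
    """Steps 2 to 4: search the kept moves, play the best one, show the TB score."""
    root_wdl, candidates = filter_root_moves(moves)
    for m in candidates:
        m.search_score = search(m)  # regular search, restricted to the kept moves
    best = max(candidates, key=lambda m: m.search_score)
    displayed_score = root_wdl * TB_WIN_SCORE  # TB value of the root, not the search score
    return best.move, displayed_score


if __name__ == "__main__":
    fake_scores = {"a": 50, "b": 120, "c": 900}  # made-up search results
    moves = [RootMove("a", 1), RootMove("b", 1), RootMove("c", -1)]
    print(choose_move(moves, lambda m: fake_scores[m.move]))
```

Move "c" has the highest made-up search score but throws away the win, so it never reaches the search. Among the two winning moves, the one the search likes best is played.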
I do see some value in your efforts though. If you ditch the concept of "naturalness" and concentrate on maximizing play without DTZ, using WDL only (or stated differently: minimizing the Elo loss if no DTZ tables are installed), I see that this might be preparation for including the 6-piece WDL bases in the SF testing framework (if one considers DTZ too large). This would enable people to finally throw out all these endgame crutches and heuristics and, even more importantly, to better tune middlegame SF parameters (as there will be fewer interactions, since the WDL bases can deal with a lot of cases). So I hope your project will evolve in that direction.
If that were the goal, then all that's needed is to ask me for a patch. (I don't necessarily agree that throwing out the endgame code is a good idea, but I won't enter that discussion now.)
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by syzygy »

syzygy wrote:That it plays DTZ-optimal moves explains that it now at least plays TB endings perfectly.
If it had been implemented correctly... It now goes for the quickest sac, winning or losing.
But "unnatural" is the proper term here.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos »

mcostalba wrote:
Laskos wrote: At this time control, SF FNTB fails in 11 conversions out of 1000 Draws against SF master.
Thanks Kai for testing NTB!

Could you please post the pgn of some game where SF fails to keep the draw? This should not happen. Never.

Instead converting the win is another story. Let me clarify.

SF NTB is always able to convert a win, but because of the way it is designed, finding the winning move is not immediate. We are talking of a few seconds, not hours. It is very difficult to give a general rule, but you can assume that within 1 minute of search it is able to find anything that there is to find in a position (and in most cases we are talking of just fractions of a second).

Nevertheless, when I read "a reasonable 400ms per move" and then in the same line "expect blunder at TCEC" and "I use it for analysis", I understand that making people change their minds is mission impossible.

So I will add a UCI option "Natural TB" by which users can toggle between natural and traditional behavior. This of course is just a placebo knob, but it takes me much less time to add it than to convince people otherwise. Of course during real analysis I expect Natural TB to be always enabled, because people want to see good analysis lines, in multi PV, with no odd sacrifices and with proper scores to understand the difference between competing PV lines (and of course they want to see a mate if the engine finds it): all things that Natural TB does....well...naturally :-)

One last thing. I have further improved the way DTZ is able to steer the engine in finding the winning line. I have done it in a way that preserves all the good properties of Natural TB, but now the winning line is found on average in a much shorter time.

I have tested it on your 2 reported games with 5-men where SF failed to convert the wins. Now it does.
Marco, your commit "Force DTZ just before sending bestmove" is probably buggy. It sometimes plays suicide chess in TB positions at root. I checked my compile, PGN output, test conditions, varied time controls, etc. It still plays suicide chess at 60s+0.6s time control. What time control is "reasonable" for your NTB to work, now even when forcing DTZ? In 4 out of 100 5-men White Wins at 60s+0.6s time control (LTC at Fishtest), it loses as White. Check the PGN for those 4 Black Wins. I uploaded the PGN here:
http://s000.tinyupload.com/?file_id=343 ... 9656686888

Score of SF_Master vs SF_NTB_DTZ: 104 - 96 - 0 [0.520] 200
ELO difference: 13.90 +/- 48.43
Finished match

With 6-men it gets even worse.

Do you have testers out there? If I am not doing something completely wrong, this SF_NTB_DTZ, as I call it, will not pass even the regression test from 2moves_v1.epd. IIRC the window there was [-4,0], right? The regression seems to be above 10 Elo points even from 2moves_v1.epd, which is inadequate for testing endgames. From my TB-sensitive endgame suites, the results are simply ridiculous, and are conclusive even after 100-200 games. Here is a partial result from such an endgame suite at 0.25s per move (no time losses):

Finished game 785 (SF_Master vs SF_NTB_DTZ): 0-1 {Black mates}
Score of SF_Master vs SF_NTB_DTZ: 409 - 203 - 173 [0.631] 785
ELO difference: 93.36 +/- 22.01

I stopped it; it's a waste of my time and of CPU time.

If you are so talibanized about pushing your NTB, at least check it for many possible issues.