Natural TB

mcostalba · Post by **mcostalba** » Tue Aug 22, 2017 9:48 am

After one year I return at my attempt with Tablebases in Stockfish. My taget is still the same, to make TB to look more natural, without strange scores, hiding mates and unnatural sacrifices...but keeping same ELO level of current implementation.

This year I am trying a much more conservative approach.

Code: Select all

Currently TB are probed just after TT and in case probe succeeds we immediately return with the (odd) TB score. This proposal is to return immediately only in case of lose or draw position, but keep searching at much reduced depth in case of a winning position. Moreover disable further TB probing in the sub-tree below that node.

Some preliminary test shows that this proposal greatly reduces the number of TB probes compared to current implementation.

Sources are here: https://github.com/mcostalba/Stockfish/tree/natural2
And here as a downlodable zip file: https://github.com/mcostalba/Stockfish/ ... tural2.zip

Just before to submit this post I found that most of this idea was already presented by Ronald de Man:

Code: Select all

Ideally, once the engine has determined that it can have at least a TB win, it should start searching beyond the TB probe without doing further TB probes, but in such a way that if that search beyond the TB probe does not return a MATE, the search at the TB position still returns the TB win value.

Original post is here: http://www.talkchess.com/forum/viewtopi ... 31dc11aac4

Differently from Ronald idea, in my proposal we keep searching but at much reduced depth and we don't override the search score with the TB one.

Michel · Post by **Michel** » Tue Aug 22, 2017 12:14 pm

As has been pointed out many times by many people already this particular problem can be fully solved within the context A/B search by regarding TB scores win/loss scores as bounds (like TT upperbound and lowerbound scores).

mcostalba · Post by **mcostalba** » Tue Aug 22, 2017 12:38 pm

Michel wrote:As has been pointed out many times by many people already this particular problem can be fully solved within the context A/B search by regarding TB scores win/loss scores as bounds (like TT upperbound and lowerbound scores).

The point is if you return from a TB probe immediately (no matter with what score) you will never find a mate line.

If you are in a MATE in 5 position, a TB probe returns "win", if you just return failing high and you keep failing high on that node, you will never find the winning line. To find the winning line you have to continue searching, there is no other way.

This was already stated somewhere in this thread's old posts, but admittedly this thread was very noisy and confusing.

Note that to find a mate you need to continue but with TB probe disabled in the sub-tree, otherwise at the child node probing will return a losing score for the opponent that will not continue the search.

hgm · Post by **hgm** » Tue Aug 22, 2017 1:00 pm

That doesn't seem different from hash cutoffs truncating the PV. This is normally solved by not not doing cutoffs in PV nodes.

mcostalba · Post by **mcostalba** » Tue Aug 22, 2017 1:56 pm

hgm wrote:That doesn't seem different from hash cutoffs truncating the PV. This is normally solved by not not doing cutoffs in PV nodes.

There are 2 critical differences.

1. A TT entry is not an insurmountable wall, it is depth dependent, so at next iteration, with increased depth, I can overcome any TT probing step and continue the search.

2. A TT entry has a fixed value, if beta is above that value I can continue, instead if, after a TB probe, I always return beta, no matter of what beta value is, then this very different form the TT case.

Suppose you are in a win position, you search the first PV move and get a score, then search second one and get a fail-high -> new PV move -> research the new first move and get a new score, then you search the second move again, new fail-high and the cycle starts again. I am not able to figure out the outcome, but it is weird...and very different from TT case.

Nordlandia · Post by **Nordlandia** » Tue Aug 22, 2017 2:13 pm

It's not easy to distinguish the difference during game/analysis.

Please provide example.

This game was played between SF NeutralTB and K.

[pgn][Event "i5-5200U | 2-CPU | CuteChess 1"]
[Site "i5-5200U | 2-CPU | CuteChess 1"]
[Date "2017.08.22"]
[Round "?"]
[White "Stockfish 220817 64 BMI2"]
[Black "komodo-11.2.2-64bit"]
[Result "1-0"]
[SetUp "1"]
[FEN "5rk1/p1r2p1p/6p1/3B4/2p1N3/P5P1/4PP1P/R5K1 b - - 0 1"]
[PlyCount "113"]
[EventDate "2017.??.??"]
[TimeControl "900+30"]

1... c3 {-1.12/29 41} 2. Rc1 {1.31/33 58} c2 {-1.03/31 39} 3. Kf1 {1.34/31 6.1s
} Rb8 {-0.95/32 38} 4. Ke1 {1.51/33 51} Kg7 {-1.16/30 56} 5. Nd2 {1.26/30 5.9s}
Rb5 {-1.16/29 40} 6. Bb3 {1.37/31 26} Rc3 {-1.14/30 31} 7. Bxc2 {1.68/27 15}
Rxa3 {-1.24/29 32} 8. Nc4 {1.80/32 56} Rc3 {-1.36/30 62s} 9. Ne3 {1.82/28 9.0s}
Rbc5 {-1.54/29 62s} 10. Kd1 {1.89/33 137s} Ra3 {-1.60/31 171s} 11. Rb1 {
1.97/26 7.8s} Rc7 {-1.57/26 20} 12. Bd3 {2.00/31 66s} Ra4 {-1.57/28 51} 13. f4
{2.00/32 99s} Re7 {-1.71/30 98s} 14. Nc2 {2.10/31 43} Ra5 {-1.73/29 40} 15. Kd2
{2.09/34 61s} Rc7 {-1.77/29 58} 16. Rb2 {2.09/30 61s} Rd7 {-1.76/30 153s} 17.
Rb8 {2.03/35 58} Rad5 {-1.78/28 72s} 18. Rc8 {2.13/35 49} a5 {-1.81/29 56} 19.
Ne3 {2.11/33 116s} Rd8 {-1.89/28 31} 20. Rc6 {2.18/34 54} R5d6 {-1.82/27 17}
21. Rc3 {2.29/32 16} a4 {-1.90/29 36} 22. Ra3 {2.30/35 81s} Ra8 {-1.97/30 65s}
23. Nc4 {2.30/34 79s} Rda6 {-1.96/27 15} 24. e4 {2.30/35 56} Rb8 {-1.99/29 70s}
25. Bc2 {2.29/36 55} Ra7 {-2.10/28 35} 26. Ne3 {2.28/36 184s} Rd8+ {-2.10/30 25
} 27. Kc3 {2.28/37 43} Rc8+ {-2.11/33 60s} 28. Nc4 {2.28/36 45} Rca8 {-2.11/28
23} 29. Nb6 {2.28/37 143s} Rc7+ {-2.11/29 24} 30. Kd2 {2.20/37 33} Ra6 {
-2.10/33 53} 31. Nd5 {2.28/36 27} Rc5 {-2.11/32 31} 32. Ne3 {2.28/35 9.2s} Ra8
{-2.11/34 50} 33. Ke2 {2.28/38 37} Ra7 {-2.11/30 22} 34. h4 {2.20/35 69s} Ra8 {
-2.11/32 33} 35. Kf3 {2.20/40 34} Rca5 {-2.11/33 19} 36. Bd1 {2.28/34 31} f6 {
-2.09/31 17} 37. g4 {2.45/31 30} R8a7 {-2.30/28 66s} 38. g5 {2.67/31 11} R7a6 {
-2.61/31 34} 39. Kg4 {2.75/35 21} h5+ {-2.54/31 57} 40. Kf3 {2.75/36 8.5s} Ra7
{-2.63/33 74s} 41. gxf6+ {3.00/34 23} Kxf6 {-2.80/30 14} 42. Nd5+ {3.15/36 23}
Kg7 {-2.94/28 11} 43. Ke3 {3.41/38 24} R7a6 {-2.94/33 27} 44. Nc3 {4.22/31 18}
Rc6 {-2.94/30 15} 45. Kd4 {3.98/31 9.9s} Rd6+ {-2.95/32 44} 46. Nd5 {4.61/31 21
} Rda6 {-3.00/32 39} 47. e5 {4.85/30 26} Ra8 {-3.36/27 22} 48. Bc2 {5.07/28 14}
Rd8 {-4.42/31 144s} 49. Bb3 {6.18/28 19} Rda8 {-4.81/29 41} 50. Ba2 {6.79/29 26
} Re8 {-4.86/27 13} 51. Rg3 {7.14/29 15} Kh6 {-4.99/27 15} 52. Bb1 {7.83/29 16}
Rg8 {-5.15/28 20} 53. e6 {8.53/29 31} a3 {-5.27/27 22} 54. Ba2 {8.80/27 7.3s}
Ra4+ {-6.51/26 48} 55. Ke5 {9.89/30 35} Re8 {-7.38/27 22} 56. e7 {11.52/29 22}
Ra6 {-8.55/28 46} 57. Re3 {11.68/26 5.8s} Kg7 {adjudication} 1-0

[/pgn]

hgm · Post by **hgm** » Tue Aug 22, 2017 3:04 pm

mcostalba wrote:2. A TT entry has a fixed value, if beta is above that value I can continue, instead if, after a TB probe, I always return beta, no matter of what beta value is, then this very different form the TT case.

Suppose you are in a win position, you search the first PV move and get a score, then search second one and get a fail-high -> new PV move -> research the new first move and get a new score, then you search the second move again, new fail-high and the cycle starts again. I am not able to figure out the outcome, but it is weird...and very different from TT case.

I thought it was already concluded early in this thread that this is completely wrong. You should never return anything above a 'mate-in-infinite' score, or it would prefer conversions to an EGT over a checkmate it can see.

What you describe would never converge. It would keep failing high, even against an aspiration window that runs upto +MATE.

syzygy · Post by **syzygy** » Tue Aug 22, 2017 5:38 pm

mcostalba wrote:After one year I return at my attempt with Tablebases in Stockfish. My taget is still the same, to make TB to look more natural, without strange scores, hiding mates and unnatural sacrifices...but keeping same ELO level of current implementation.

This year I am trying a much more conservative approach.
Code: Select all
Currently TB are probed just after TT and in case probe succeeds we immediately return with the (odd) TB score. This proposal is to return immediately only in case of lose or draw position, but keep searching at much reduced depth in case of a winning position. Moreover disable further TB probing in the sub-tree below that node.

I'm not sure it is helpful to treat TB wins and TB losses differently in the middle of the tree (unless you are talking about win/loss relative to the root position). It should not matter whether a TB position is entered by a capture by the winning side (in which case the TB probe returns a TB loss) or by a (re)capture by the losing side (in which case the TB probe returns a TB win).

If the TB win or TB loss is outside (alpha, beta), it would seem to make sense to immediately return alpha or beta. (Perhaps it is possible to treat a TB win/loss as, say, a score of +10/-10. So alpha or beta (or maybe +10 or -10) would be returned immediately if TB win and beta < 10 or TB loss and alpha > -10.)

What value to return if SF continues to search the subtree but does not find a mate score? Returning the score of the subtree would make SF blind to the TB win or TB loss. Returning a TB win/loss value would make sure that SF does not miss the TB win or loss. If it is desired to prevent all "unnatural play", then perhaps it would work to return the same +10 or -10 value.

Differently from Ronald idea, in my proposal we keep searching but at much reduced depth and we don't override the search score with the TB one.

Not overriding the search score and at the same time reducing the search depth seems like a very bad idea.

Suppose the search reaches a KRPPvKR ending. The TB probe shows that it is winning. Now you let SF search the KRPPvKR subtree at reduced depth, making it highly unlikely that SF will figure out that the position is winning, and then you return the value of that reduced search (probably drawish) even though it is 100% certain that SF would be able to convert the position. How is that a good idea?

In my proposal, SF would continue to search if the TB win (or loss) value is outside the (alpha, beta) window. If the search of the subtree returns a mate score, then that score is returned. If not, then the TB win (or loss) value is returned.

My proposal should eliminate "unnatural play" like an unnecessary queen sac to get into a winning TB position if the search has sufficient time to figure out that avoiding the queen sac also leads to mate. (Arguably, if SF is unable to see that keeping its queen also wins, then saccing the queen is the right thing to do, whether some people might consider the move unnatural or not.)

Nevertheless, if such queen sacs must be removed whatever it takes, then it might work to set the vaue of a TB win/loss to, say, +10/-10. (This might still cause SF to fail to convert a winning position, but it should be rare.)

jhellis3 · Post by **jhellis3** » Tue Aug 22, 2017 6:56 pm

Code: Select all

+                if (    abs(v) <= drawScore
+                    || !ttHit
+                    || (v < -drawScore && ttValue > -VALUE_KNOWN_WIN)
+                    || (v >  drawScore && ttValue <  VALUE_KNOWN_WIN))
+                {
+                    value =  v < -drawScore ? -VALUE_MATE_IN_MAX_PLY + ss->ply + (pos.non_pawn_material(pos.side_to_move()) - pos.non_pawn_material(~pos.side_to_move())) / 256
+                           : v >  drawScore ?  VALUE_MATE_IN_MAX_PLY - ss->ply + (pos.non_pawn_material(pos.side_to_move()) - pos.non_pawn_material(~pos.side_to_move())) / 256
+                                            :  VALUE_DRAW + v * drawScore;
+
+                    tte->save(posKey, value_to_tt(value, ss->ply),
+                              v > drawScore ? BOUND_LOWER : v < -drawScore ? BOUND_UPPER : BOUND_EXACT,
+                              depth, MOVE_NONE, VALUE_NONE, TT.generation());
+
+                    if (abs(v) <= drawScore)
+                        return value;
+                }

That is what I have been using in Matefinder for quite some time now. I agree with Ronald it doesn't make much sense to bias between wins or losses based on side to move.

It does the following:

1) If you already have a correct (winning/losing) score (ttValue) don't access TBs.

2) If you detect a draw, or incorrect/insufficient score, replace with a material and distance biased TB result (small preference for material advantage for the side to move).

3) Save to TT with the correct bounds.

4) Only, but always, return early on draws (since there is no point searching them).

Something else I do in Matefinder is to not stop probing even after we reach TBs in Root. Since you no longer return early for non-draws, you still search, but you get a nice speed up (in depth) due to all the draw cutoffs.

Code: Select all

-    if (RootInTB)
-        Cardinality = 0; // Do not probe tablebases during the search
-
-    else // If DTZ tables are missing, use WDL tables as a fallback
-    {
-        // Filter out moves that do not preserve the draw or the win.
+    if (!RootInTB) // If DTZ tables are missing, use WDL tables as a fallback
         RootInTB = root_probe_wdl(pos, rootMoves, TB::Score);
 
-        // Only probe during search if winning
-        if (RootInTB && TB::Score <= VALUE_DRAW)
-            Cardinality = 0;
-    }
-

jhellis3 · Post by **jhellis3** » Tue Aug 22, 2017 7:20 pm

You also need to change:

Code: Select all

 const Value WDL_to_value[] = {
-   -VALUE_MATE + MAX_PLY + 1,
+   -VALUE_MATE_IN_MAX_PLY + MAX_PLY,
     VALUE_DRAW - 2,
     VALUE_DRAW,
     VALUE_DRAW + 2,
-    VALUE_MATE - MAX_PLY - 1
+    VALUE_MATE_IN_MAX_PLY - MAX_PLY

To take advantage of the move biasing.

Natural TB

Natural TB (take 2)

Re: Natural TB (take 2)

Re: Natural TB (take 2)

Re: Natural TB (take 2)

Re: Natural TB (take 2)

Re: Natural TB (take 2)

Re: Natural TB (take 2)

Re: Natural TB (take 2)

Re: Natural TB (take 2)

Re: Natural TB (take 2)