Stockfish Natural TB loses heavily to Stockfish master

gbtami
Posts: 389
Joined: Wed Sep 26, 2012 1:29 pm
Location: Hungary

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by gbtami »

BBauer wrote: The game of chess is not "natural". Think about studies or moves like Bu's last move against Carlsen with his rook. This move was not "natural".
Kind regards
Bernhard
That move was the icing on the cake :)
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos »

Laskos wrote:
Laskos wrote:
syzygy wrote:Ok, understood, might test it soon on one of my suites, I have to see which one is more sensitive. ELO-wise it was tested on non-TB as neutral, but was not tested on non-TB as a mate resolver compared to Master (both using 6-men).
Tested, seems an excellent patch. Going to bed now. I took easy 6-men Wins, but only 5-men Syzygy; I guessed correctly that this combination would give sensible results. And it did.

1000 games from this 6-men Wins with 5-men TBs at 0.25s per move (no time losses):

Score of SF_Master vs SF_Early_Mate2: 477 - 483 - 40 [0.497] 1000
ELO difference: -2.08 +/- 21.09
Finished match

ELO difference: -2.08 +/- 3.31 (pentanomial error margins)
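
For readers who want to check the numbers: a minimal sketch of how an ELO difference and its plain (per-game, trinomial) error margin can be derived from a win-loss-draw count. It assumes the usual logistic ELO model and a 95% normal interval; the function names are made up for illustration, and the match tool that printed the log may use a slightly different normal quantile, so the margin should land close to the reported figure rather than exactly on it.

```python
import math
from statistics import NormalDist


def elo_from_score(score: float) -> float:
    """Logistic ELO difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int, confidence: float = 0.95):
    """Return (elo, margin), treating every game as an independent trial."""
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + d / 2.0  # mean score per game
    # Per-game variance of the score around its mean.
    var = w * (1.0 - mu) ** 2 + l * (0.0 - mu) ** 2 + d * (0.5 - mu) ** 2
    stdev = math.sqrt(var / n)  # standard error of the mean score
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    margin = (elo_from_score(mu + z * stdev) - elo_from_score(mu - z * stdev)) / 2.0
    return elo_from_score(mu), margin


# W-L-D taken from the match log above.
elo, margin = elo_with_margin(477, 483, 40)
print(f"ELO difference: {elo:.2f} +/- {margin:.2f}")
```

The per-game margin ignores that each opening is played twice with colors reversed, which is why the pentanomial margin quoted above is so much tighter.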

Won games length:

Stockfish Master Wins:
Mean game length: 33.80 moves
Median game length: 30 moves

Stockfish Early_Mate2 Wins:
Mean game length: 25.28 moves
Median game length: 22 moves


Much shorter paths to Win for Early_Mate2 compared to Master.

Total Mates resolved from the start to the end of the games:
Stockfish Master: 4062
Stockfish Early_Mate2: 5090


Mates resolved in the first 5 moves of the game:
Stockfish Master: 10
Stockfish Early_Mate2: 48
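
A rough sketch of how such "Mates resolved" counts could be gathered from the PGN of a match. It assumes that each move carries a comment of the form {score/depth time} and that mate scores are written like +M9 or -M8; that comment shape is an assumption, not something stated in the thread, and the helper name is hypothetical. It counts over all comments in the movetext; attributing counts to one engine would additionally need the side to move.

```python
import re
from typing import Optional

# Assumed comment shape: {score/depth time}, e.g. {+5.12/18 0.25s} or {+M9/21 0.25s}.
MOVE_COMMENT = re.compile(r"\{([+-]?M?\d+(?:\.\d+)?)/(\d+)[^}]*\}")


def count_mate_scores(pgn_movetext: str, first_n_comments: Optional[int] = None) -> int:
    """Count move comments whose score is a mate score.

    If first_n_comments is given, only that many leading comments are inspected,
    which mimics "mates resolved in the first few moves".
    """
    scores = [m.group(1) for m in MOVE_COMMENT.finditer(pgn_movetext)]
    if first_n_comments is not None:
        scores = scores[:first_n_comments]
    return sum(1 for s in scores if "M" in s)


# Invented movetext, only to show the call.
sample = "1. Kb2 {+5.12/18 0.25s} Kd7 {-5.30/17 0.25s} 2. Qc4 {+M9/21 0.25s} Ke7 {-M8/20 0.25s}"
print(count_mate_scores(sample), count_mate_scores(sample, first_n_comments=2))
```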


Conclusive and very good results, also taking into account that the patch is ELO-neutral. It must be committed.
Tested this morning against SF_No_TB too. The same easy 6-men Wins, but only 5-men Syzygy for Early_Mate2.

1000 games at 0.25s per move (no time losses):

Score of SF_Early_Mate2 vs SF_Master_No_TB: 477 - 433 - 90 [0.522] 1000
ELO difference: 15.30 +/- 20.55
Finished match

ELO difference: 15.30 +/- 5.25 (correct pentanomial error margins)
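
Why the pentanomial margin is so much smaller can be sketched as follows, assuming each opening is played twice with colors reversed and the pair of games is treated as one trial with five possible outcomes. The pair counts below are invented purely for illustration (the thread does not report them), and the function names are hypothetical.

```python
import math
from statistics import NormalDist

# Per-game score of a game pair: 0-2, 0.5-1.5, 1-1, 1.5-0.5, 2-0.
PAIR_RESULTS = (0.0, 0.25, 0.5, 0.75, 1.0)


def elo_from_score(score: float) -> float:
    """Logistic ELO difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def pentanomial_margin(pair_counts, confidence: float = 0.95) -> float:
    """Error margin in ELO when the unit of observation is a game pair."""
    n_pairs = sum(pair_counts)
    mu = sum(c * x for c, x in zip(pair_counts, PAIR_RESULTS)) / n_pairs
    var = sum(c * (x - mu) ** 2 for c, x in zip(pair_counts, PAIR_RESULTS)) / n_pairs
    stdev = math.sqrt(var / n_pairs)  # standard error of the mean score
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return (elo_from_score(mu + z * stdev) - elo_from_score(mu - z * stdev)) / 2.0


# Hypothetical counts for 500 pairs from easy Wins: nearly every pair ends 1-1,
# because each engine wins with the winning side.
hypothetical_pairs = [2, 18, 462, 16, 2]
print(f"+/- {pentanomial_margin(hypothetical_pairs):.2f}")
```

When almost every pair is split 1-1, the pair score barely varies, so the margin shrinks well below the per-game figure even though the individual games are nearly all decisive.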

So, SF_No_TB loses ELO points, and this suite is not that well suited to checking for ELO differences: they are easy 6-men Wins, and I need many Wins from both sides. With other, more sensitive endgame suites I can get in excess of 50 ELO points between SF with Syzygy-5 enabled and SF_No_TB. But ELO was not the purpose.


Won games length:

Stockfish_Early_Mate2 Wins:
Mean game length: 35.69 moves
Median game length: 33 moves

Stockfish_No_TB Wins:
Mean game length: 28.23 moves
Median game length: 26 moves


Much shorter paths to Win for SF_No_TB compared to Early_Mate2.

Total Mates resolved from the start to the end of the games:
Stockfish_Early_Mate2: 6231
Stockfish_No_TB: 6564


Mates resolved in the first 5 moves of the game:
Stockfish_Early_Mate2: 46
Stockfish_No_TB: 64


When it wins, SF_No_TB finds significantly shorter paths to Win than Early_Mate2 (and Early_Mate2 significantly shorter than Master) from non-TB positions. Mate resolutions are slightly earlier with SF_No_TB than with Early_Mate2 (and Early_Mate2 significantly earlier than Master) from non-TB positions.

It's a bit amazing how good these engines like Stockfish are, left alone without TBs. When they actually convert a Win, they do it nicely. The problem is they often don't convert without TBs. Here is the result in 1000 games at 0.25s/move of SF_Early_Mate2 enabled with 5-men against SF_Master without TBs, from hard 6-men Wins suite:

Score of SF_Early_Mate2 vs SF_Master_No_TB: 445 - 310 - 245 [0.568] 1000
ELO difference: 47.19 +/- 18.82
Finished match

ELO difference: 47.19 +/- 7.12 (correct pentanomial error margins)

SF EM2 5-men failed to convert 55 out of 500 hard 6-men Wins
SF_No_TB failed to convert 190 out of 500 hard 6-men Wins
The difference is huge.

It would be nice to have both worlds: the (surprising) nicety of the engine's play alone, combined with a theoretically sound TB implementation. Early_Mate2 seems to approach this goal.
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by syzygy »

hgm wrote:
syzygy wrote:Finding a more favorable conversion would require scoring TB win positions depending on, in particular, their material balance.
Well, that is quite trivial to do, and I don't see any possible downside.
In the context of Stockfish, the major problem with it is that it adds at least one line of code. That means it's an uphill fight of Himalayan proportions.
For that you don't even need to search "behind" the conversion into the TBs.
I am not sure of that, as sometimes you might need a simplification or even a gain by forcing the opponent to trade. So that initially the material combination gets worse, but in the first move of that EGT you immediately convert back to a more favorable one.
I agree the two approaches combine to produce an additional benefit.
It seems the two approaches are essentially independent of each other.
They seem the same to me. TB wins are mate scores with a long, undetermined DTM. When you find a very distant mate by finding a way to force a winning conversion, you just search on to see if you can find a closer mate. That can be an actual mate within the search horizon, or a closer undetermined mate by forcing conversion to a better (i.e. closer) TB win.
Well, one idea is the deeper searching, another idea is making a distinction between TB win positions by adding a small score based on material. You could do one without the other. But I agree it makes sense to combine.

To combine the two, I guess the lower bound for a TB win (against which beta is compared for an immediate cutoff) should be set to the TB win score for the "worst" material combination (or lower). Once the search window extends to TB win scores, the search of a TB node will return the TB-win search score returned by searching the subtree, if a TB-win score is returned, and otherwise the TB-win score corresponding to the current node.

I wonder a bit how this interacts with the adjustment by the distance to the root, but I think there are no problems there...
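
A minimal sketch of the combination described above, under stated assumptions: all constants, piece values and function names are hypothetical, the material bonus is a plain balance in pawn units, and the adjustment by distance to the root is left out entirely. It is not Stockfish code.

```python
VALUE_MATE = 32000
MAX_PLY = 128
MATE_BAND_START = VALUE_MATE - MAX_PLY  # scores at or above this are real mates
MAX_BALANCE = 40                        # clamp for the material balance
# Lower bound of the TB-win band: the score of the "worst" material combination.
TB_WIN_MIN = MATE_BAND_START - 2 * MAX_BALANCE - 1

PIECE_VALUE = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}  # kings count as 0


def material(pieces: str) -> int:
    """Sum of piece values for a string such as 'KQP'."""
    return sum(PIECE_VALUE.get(p, 0) for p in pieces)


def tb_win_score(own: str, opp: str) -> int:
    """TB-win score for the winning side, nudged upward by material balance."""
    balance = max(-MAX_BALANCE, min(MAX_BALANCE, material(own) - material(opp)))
    return TB_WIN_MIN + balance + MAX_BALANCE


def tb_node_value(beta: int, own: str, opp: str, search_subtree) -> int:
    """Value of a node that the tablebase reports as a win for the side to move."""
    if beta <= TB_WIN_MIN:
        # Any TB win already fails high: cut off without searching further.
        return tb_win_score(own, opp)
    # The window reaches into TB-win scores: look for a better conversion or a mate.
    sub = search_subtree()
    return sub if sub >= TB_WIN_MIN else tb_win_score(own, opp)


print(tb_win_score("KQ", "K") > tb_win_score("KP", "K"))
```

Keeping the whole TB-win band strictly below the real mate scores means a mate found within the horizon always outranks any conversion, while better material still outranks worse material among TB wins.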
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Michel »

Kai wrote:It would be nice to have both worlds: the (surprising) nicety of the engine's play alone, combined with a theoretically sound TB implementation. Early_Mate2 seems to approach this goal.
Yes early_mate nicely solves the mate blindness of master (which is a cosmetic issue but which may annoy users).

It would be really nice though to understand precisely why no-TB seems to find mates faster than TB, if it can find them (with current implementations of search and TB probing).

It means that if an oracle (e.g. the user doing analysis) tells an engine that it should look for a mate, then the engine's best strategy is to ignore TB's!!

As a corollary it should not take TB-cutoffs when searching at lower depths!! This could mean that the information gathered by not taking TB-cutoffs (through the TT and history mechanisms) pays off later when searching for the mate at higher depths.

Now it is almost 100% certain that TB cutoffs are strongly beneficial when the engine cannot find a mate (TB's must get their elo somewhere!).

So the aim seems to be to have one's cake and eat it too. It is not clear if it is possible but it doesn't hurt to try. There is an interesting analogy with looking for tactics. If the engine is told to specifically look for tactics, it can find them faster by being less aggressive about reductions and pruning. This is the principle behind Houdini Tactical mode. However HT is weaker elowise than regular Houdini.
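
The corollary about not taking TB cutoffs at lower depths could look roughly like the sketch below. Every name and threshold is hypothetical, and the "mate_hunt" flag stands in for the oracle that says a mate should be looked for; this only illustrates the gating idea, not any engine's actual probing code.

```python
def take_tb_cutoff(depth: int, piece_count: int, tb_cardinality: int,
                   min_cutoff_depth: int, mate_hunt: bool = False) -> bool:
    """Decide whether a TB hit may end the search of this node immediately."""
    if piece_count > tb_cardinality:
        return False  # position is not covered by the available tablebases
    if mate_hunt:
        return False  # told to look for a mate: keep searching behind the TB hit
    # At shallow remaining depth, search on so the TT and history tables still
    # gather information that may help at higher depths.
    return depth >= min_cutoff_depth


print(take_tb_cutoff(depth=10, piece_count=5, tb_cardinality=5, min_cutoff_depth=8))
```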
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos »

Michel wrote:
Kai wrote:It would be nice to have both worlds: the (surprising) nicety of the engine's play alone, combined with a theoretically sound TB implementation. Early_Mate2 seems to approach this goal.
Yes early_mate nicely solves the mate blindness of master (which is a cosmetic issue but which may annoy users).

It would be really nice though to understand precisely why no-TB seems to find mates faster than TB, if it can find them (with current implementations of search and TB probing).

It means that if an oracle (e.g. the user doing analysis) tells an engine that it should look for a mate, then the engine's best strategy is to ignore TB's!!

As a corollary it should not take TB-cutoffs when searching at lower depths!! This could mean that the information gathered by not taking TB-cutoffs (through the TT and history mechanisms) pays off later when searching for the mate at higher depths.

Now it is almost 100% certain that TB cutoffs are strongly beneficial when the engine cannot find a mate (TB's must get their elo somewhere!).

So the aim seems to be to have one's cake and eat it too. It is not clear if it is possible but it doesn't hurt to try. There is an interesting analogy with looking for tactics. If the engine is told to specifically look for tactics, it can find them faster by being less aggressive about reductions and pruning. This is the principle behind Houdini Tactical mode. However HT is weaker elowise than regular Houdini.
Is it not possible that this analogy with Houdini Tactical is a promising one for shortest paths to Win from non-TB positions? Houdini Tactical would be even better at what it is meant for if ELO were not a consideration. But in our case, after finding a TB Win, we can safely switch to a completely different engine, a MateFinder, disregarding any ELO, and if it doesn't find a Mate, play according to DTZ. There would be no ELO loss. I am not sure that finding longer or shorter Mates is the best way to find the shortest path to Win, but it is plausible that this shortens the path. And it surely resolves more mates.
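
The switch proposed here can be sketched as a simple decision flow. All four callables are hypothetical stand-ins (a check that the position is a known TB Win, a mate-finding search, a DTZ-guided move choice, and the regular search); nothing below is an existing engine interface.

```python
from typing import Callable, Optional


def choose_move(position: str,
                is_known_tb_win: Callable[[str], bool],
                find_mate: Callable[[str], Optional[str]],
                dtz_guided_move: Callable[[str], str],
                normal_search: Callable[[str], str]) -> str:
    """Pick a move: normal play until a TB Win is secured, then hunt for a mate."""
    if not is_known_tb_win(position):
        return normal_search(position)  # ELO still matters here
    mate_move = find_mate(position)     # ELO no longer matters: the win is safe
    if mate_move is not None:
        return mate_move
    return dtz_guided_move(position)    # no mate found: fall back to DTZ play


# Toy stand-ins, only to show the call.
print(choose_move("won-position",
                  is_known_tb_win=lambda p: True,
                  find_mate=lambda p: None,
                  dtz_guided_move=lambda p: "dtz-move",
                  normal_search=lambda p: "search-move"))
```

Because the mate hunt only starts once the Win is already guaranteed, a failed hunt costs time but not the result, which is the sense in which there would be no ELO loss.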

Today I saw some pretty amazing results with Texel on non-TB positions; Peter is on to something. This seems a completely different approach from what we are discussing, but it works.

In short, from easy 6-men Wins, using only 5-men TBs (so non-TB root positions), I have the following global results (each line is significantly better than the next line):


If the engine wins:

The shortest path to Win:

1-2. Texel approach (no ELO loss) / TB-less (important ELO loss), tied
3. early_mate2 (no ELO loss)
4. master (no ELO loss, baseline)


The number of resolved Mates:

1. Texel approach (no ELO loss)
2. TB-less (important ELO loss)
3. early_mate2 (no ELO loss)
4. master (no ELO loss, baseline)


================
The test with Texel.

To check for ELO loss, say from some bug in the implementation on non-TB positions (on TB positions it plays perfectly and optimally) or because reading Gaviota from HDD takes time:

Texel_Syzygy5_Gaviota5 versus Texel_Syzygy5 (master) on hard 6-men Wins, a suite very sensitive to 5-men TBs:

1000 games at 1s/move:

Score of Texel_Szygy5_Gaviota5 vs Texel_Syzygy5: 427 - 426 - 147 [0.501] 1000
ELO difference: 0.35 +/- 19.88
Finished match

Pentanomial error margins are some 6 times smaller (+/-3 or so), and it's pretty clear that there is no ELO loss even on this sensitive suite.
Also, the path to Win was much shorter and the number of resolved mates was much higher for Texel_Syzygy5_Gaviota5 than for "master" Texel_Syzygy5.


But the interesting comparison was with our champion on these numbers when it actually finds a Win: the TB-less engine.

Suite: easy 6-men Wins (on the hard one the TB-less engine is so handicapped ELO-wise that this can affect the statistics)

1000 games at 1s/move:

Score of Texel_Syzygy5_Gaviota5 vs Texel_No_TB: 484 - 386 - 130 [0.549] 1000
ELO difference: 34.16 +/- 20.16
Finished match

Won games length:

Texel_Syzygy5_Gaviota5 Wins:
Mean game length: 27.83 moves
Median game length: 25 moves

Texel TB-less Wins:
Mean game length: 27.99 moves
Median game length: 25 moves


The Texel_Syzygy5_Gaviota5 path to Win is similar to or even shorter than that of Texel TB-less.

Total Mates resolved from the start to the end of the games are very much higher in number for Texel_Syzygy5_Gaviota5 than for Texel TB-less.

And these results are not due only to the phase of the games where only 5 men remain (TB root positions). The length of the games of Texel_Syzygy5_Gaviota5 is similar to TB-less on non-TB root positions too. The number of resolved mates is even ridiculously higher for Texel_Syzygy5_Gaviota5 in the early phase of the games from 6-men easy Wins. For example, from the first move it plays, Texel_Syzygy5_Gaviota5 resolves 136/1000 of the 6-men easy Wins opening positions as Mates; Texel TB-less resolves 0/1000. In fact, Texel_Syzygy5_Gaviota5 can be used as a MateFinder on root positions close to the TBs. I tried ChestUCI, a specialized MateFinder, on some 100 of these 6-men positions, and none were resolved. SF TB-less and early_mate do resolve some of them.
Nordlandia
Posts: 2821
Joined: Fri Sep 25, 2015 9:38 pm
Location: Sortland, Norway

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Nordlandia »

Kai Laskos: Do you like the idea of Stockfish with Gaviota EGTB support?

I.e. a custom Stockfish version with additional DTM EGTB support for endgame analysis purposes.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos »

Nordlandia wrote:Kai Laskos: Do you like the idea of Stockfish with Gaviota EGTB support?

I.e. a custom Stockfish version with additional DTM EGTB support for endgame analysis purposes.
Yes, with Texel approach, it seems that Stockfish + Syzygy 6-men (~150GB) + Gaviota 6-men (~1TB) would be an endgame analysis monster, at least up to 7-8 men.

I have to check whether the Texel approach with Syzygy-5 + Gaviota-5 is very efficient for 7-8-9 men endgames; maybe the 6-men openings were too close to the TBs. I have unbalanced 7-8-9 men endgame suites, somewhat favoring Wins over Draws, but difficult enough for TB-less engines to fail often. So they are sensitive to TBs, but they are not proven Wins or Draws. The suites I used previously were theoretical Wins. When I have time, I will check the Texel approach's performance.
Nordlandia
Posts: 2821
Joined: Fri Sep 25, 2015 9:38 pm
Location: Sortland, Norway

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Nordlandia »

Laskos wrote:Yes, with Texel approach, it seems that Stockfish + Syzygy 6-men (~150GB) + Gaviota 6-men (~1TB) would be an endgame analysis monster, at least up to 7-8 men.
If Stockfish probes Syzygy and Gaviota simultaneously, the "early mate" approach is the optimal path. The only downside of the Gaviota format is that 6-men bases don't exist yet and are unlikely to be generated soon, unless Miguel Ballicora makes it possible.

6-men Gaviota is presumed to take up 1 terabyte of space.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos »

Nordlandia wrote:
Laskos wrote:Yes, with Texel approach, it seems that Stockfish + Syzygy 6-men (~150GB) + Gaviota 6-men (~1TB) would be an endgame analysis monster, at least up to 7-8 men.
If Stockfish probes Syzygy and Gaviota simultaneously, the "early mate" approach is the optimal path. The only downside of the Gaviota format is that 6-men bases don't exist yet and are unlikely to be generated soon, unless Miguel Ballicora makes it possible.

6-men Gaviota is presumed to take up 1 terabyte of space.
The Texel approach is much more than early_mate, and probably must include it, in some equivalent way, as a small part.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish Natural TB loses heavily to Stockfish master

Post by Laskos »

I tested Texel with many more pieces on the board than the 5-men TBs I used (Syzygy + DTM), and got the same exceptional results for Texel. The "quality of implementations" list stays the same:

If the engine wins:

The shortest path to Win:

1-2. Texel approach (no ELO loss) / TB-less (important ELO loss), tied
3. early_mate2 (no ELO loss)
4. master (no ELO loss, baseline)


The number of resolved Mates:

1. Texel approach (no ELO loss)
2. TB-less (important ELO loss)
3. early_mate2 (no ELO loss)
4. master (no ELO loss, baseline)


"Master" here refers to Stockfish's current master, not Texel's (Texel with Syzygy but without Gaviota), even though "Texel master" is better than "Stockfish master" (but worse than Syzygy + Gaviota enabled Texel).

The Texel approach is explained here, in a thread more than a year old:
Peter Österlund
How texel probes endgame tablebases

I don't know how this wreck of an implementation called "Natural", which to this day (last time I checked) would come last in all these stats and will even lose ELO-wise to master (but is probably playing "naturally"), got several very long threads (including mine) with dozens of times more posts and views than the thread of Peter, who single-handedly seems to have implemented state-of-the-art DTM50-like play (remarkable to me, and I read his thread for the first time today). I am not a chess engine programmer, and generally, even in my speciality, I am a weak programmer. I can't understand many of the subtler issues there. But here, on CCC, there are good chess engine developers. With Stockfish, things seem pretty hopeless for the time being with their "Natural"; they seem to be figuring out now how not to lose ELO points to master, never mind all the rankings above and the subtleties of ELO-parity implementations. But I hope that, for example, the Komodo and Houdini authors will come to realize the importance of this achievement, at least for analysis. And it would be fun to have a 350 ELO points stronger engine enabled with the Texel approach + Syzygy 6-men on SSD + DTM 6-men on HDD, analysing complicated endgames of maybe 10-12 men and often giving Mate scores that improve with time. Also, users with powerful machines will be very favored (not me :) ). These days a 128GB SSD (for WDL) and a 2TB HDD can each be found for under $80, and some chess engine nerds maybe won't shy away.
===========================

Test with Texel:

I used 14-men openings (1000+ of them) of this variant of endgames:
"Peasant Revolt" Endgame Chess

You can read there what (nice) properties they have. I used this set to be far away from artifacts of 5-men perfect and optimal play of Texel when using 5-men TBs, and, at the same time, to keep sensitivity to TBs (5-men Syzygy + 5-men Gaviota used).

1000 games at 1s/move:

Score of Texel_Syzygy5_Gaviota5 vs Texel_No_TB: 213 - 115 - 672 [0.549] 1000
ELO difference: 34.16 +/- 12.26
Finished match

Won games length:

Texel_Syzygy5_Gaviota5 Wins:
Mean game length: 43.68 moves
Median game length: 42 moves

Texel_No_TB Wins:
Mean game length: 43.31 moves
Median game length: 43 moves


The Texel_Syzygy5_Gaviota5 length of path to Win is very similar to that of Texel TB-less.
Which is remarkable, as we will see when we compare it to the best Syzygy implementation for Stockfish, early_mate2.

Total Mates resolved from the start to the end of the games are almost twice as many for Texel_Syzygy5_Gaviota5 as for Texel TB-less.
Which is again remarkable. And this is not due solely to perfect Mate-wise play on 5-men at the root. During the first 20 moves of the games, where almost all positions were far away from 5-men, the ratio of resolved Mates in favor of Texel_Syzygy5_Gaviota5 over Texel_No_TB was 7-8.


Compare that to the best Syzygy implementation for Stockfish as of now, early_mate2:

1000 games at 0.25s/move:

Score of SF_Early_Mate2 vs SF_No_TB: 284 - 154 - 562 [0.565] 1000
ELO difference: 45.42 +/- 14.20
Finished match

Won games length:

SF_Early_Mate2 Wins:
Mean game length: 54.48 moves
Median game length: 50 moves

SF_No_TB Wins:
Mean game length: 42.37 moves
Median game length: 41 moves


The SF_Early_Mate2 path to Win is significantly longer than that of SF TB-less.

Total Mates resolved from the start to the end of the games are about 1.3 times fewer for SF_Early_Mate2 than for SF TB-less.

And take into account that early_mate2 is much better than the current master Syzygy implementation for SF on all these stats.