8man Endgame Tablebases ELO gain?

dangi12012 · Post by **dangi12012** » Sun Nov 07, 2021 10:57 pm

Sesse wrote: ↑Sun Nov 07, 2021 9:28 pm I don't think you can make that conclusion.

The typical case is a drawn endgame with one or two pawns up. Stockfish, even with NNUE, will show a clear positive score; with tablebases, you'll get the appropriate message of a TB draw. Will no-TB Stockfish still be able to hold the draw? Sure! Would TB have been able to get into a non-drawn endgame? Probably not. (So there's no Elo to be had from TBs here, even though the evaluation is clearly wrong. The second simply does not follow from the first, because all that matters for Elo is whether there is a winning move and it has a higher score, not the absolute values of the evaluations.) Does this prove that Stockfish already has the required knowledge? Again, it depends on what your goal is.

Stockfish vs Stockfish - it's a position with 12 pieces. Engine A has 7-man EGTB. A makes a move that forces a draw - and B thinks it's a blunder and agrees. But after the captures we are left with 7 pieces and B thinks it's +0.5. Only A knows this was winning and can now convert to victory.

There are lots and lots of 7-man positions where Stockfish 14 needs hours to see the end, because depth to conversion might be 40+ and depth to mate 120 or so.

Maybe with 8-man it becomes 200 Elo, because there are far more positions that look good even to NNUE, which then does not see the true evaluation.

dangi12012 · Post by **dangi12012** » Mon Nov 08, 2021 3:26 pm

For example, modern Stockfish might go down a line to this position thinking there is a slight advantage:
[fen]8/8/1n6/8/8/5k1p/7P/6K1 w - - 0 1[/fen]

It will evaluate it as 0.7 when in reality it's 0.0, and the tablebase will let SF see that from many plies before this position is reached.
Without the tablebase SF is blind to the true evaluation.

I don't understand how this can be worth so little Elo (+15).

hgm · Post by **hgm** » Mon Nov 08, 2021 4:44 pm

Because it 'never happens'.

Note that 15 Elo corresponds to a 2% improvement of the score. To get that improvement through half-points you would need a better result in 4% of all games. The chances that you hit a pathological draw with a large apparent advantage like this are far smaller.
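As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes the standard logistic Elo model; the function name `expected_score` is only illustrative and does not come from any engine or testing framework.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (between 0 and 1) under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Extra score that +15 Elo buys over an even match (50%).
gain = expected_score(15.0) - 0.5

# Each game improved by half a point adds 0.5 to the total score,
# so the fraction of games that must improve is gain / 0.5.
games_needed = gain / 0.5

print(f"score gain: {gain:.2%}, games improved by half a point: {games_needed:.2%}")
```

Under that model the gain comes out at roughly 2% of the score, which means roughly 4% of all games would have to turn out half a point better.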

Besides, a score of 0.7 is a draw score. Stockfish would never prefer this position over one where it is a healthy Pawn ahead (e.g. KNPPKNP). The draw margin in the late end-game is larger than 1 Pawn (except in pure Pawn endings, which deserve to get the normal eval score doubled). So it only gets into positions like this when the alternatives are even more or just as drawish. Avoiding this position by scoring it as 0 would almost never bring it an extra half point, even if it happened in 100% of the games.

So it seems Stockfish knows very well that this is a draw. Why else wouldn't it evaluate it as +3.5? It is a common misconception that theoretical draws should be scored as 0.0. Doing so actually costs Elo, as the engine would give up any attempt to play strong moves, even in cases where it is very hard for the opponent to actually hold the draw. In KBPPPKB with unlike Bishops the engine would probably sacrifice its Bishop and three Pawns in the first 4 moves, thinking this is optimal play (as there was no line that scored better than 0.0, and KKB is still a draw).

yurikvelo · Post by **yurikvelo** » Mon Nov 08, 2021 6:24 pm

dangi12012 wrote: ↑Mon Nov 08, 2021 3:26 pm Without tablebase SF is blind to the true evaluation.

A true evaluation doesn't guarantee better moves or a better game outcome.
It's quite possible that with Syzygy probing this game will also end in a draw. The true 0.00 eval will arrive much earlier and cutechess-cli will adjudicate it as a draw much earlier.

dangi12012 wrote: ↑Mon Nov 08, 2021 3:26 pm I dont understand how this can be worth so little ELO. (+15)

1) Such positions are rare in themselves.
2) Even if they occur, there is no guarantee you will find a better PV earlier just because of TBHits. There is a high probability you will simply get 0.00 earlier thanks to massive TBHits.
3) At time controls like LTC (1' + 1" @ 1CPU), games are typically lost long before TBHits.
Run a couple hundred games of SF vs SF Syzygy (RepeatStartPosition=1). Choose decisive openings (not counting == and 10 game pairs, only 1= or 11).
In my test (1' + 1" @ 4CPU = 4x the LTC Fishtest) I got 0 Elo difference, and such losses are evenly spread between Syzygy and no-Syzygy.
When I deeply analyze a lost game, the decisive blunder is made long before TBHits come on stage.

At least at these time controls, games are decided before TBHits come into play.
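The game-pair filter described above could be sketched in Python as follows. This is only one reading of the notation: each opening is played twice with colors reversed, results are recorded as points for the same engine (1, 0.5 or 0), and a pair counts as decisive when its total differs from 1.0. The function name `is_decisive_pair` is hypothetical and not part of cutechess-cli or Fishtest.

```python
def is_decisive_pair(first: float, second: float) -> bool:
    """Return True if a two-game pair did not cancel out.

    `first` and `second` are the points one engine scored in the two
    games of a pair (1 = win, 0.5 = draw, 0 = loss).
    A "==" pair (two draws) or a "10" pair (one win, one loss) totals
    exactly 1.0 and says nothing about which side was stronger.
    """
    return first + second != 1.0

# Pairs written in the notation used above: "==", "10", "1=", "11".
pairs = {"==": (0.5, 0.5), "10": (1.0, 0.0), "1=": (1.0, 0.5), "11": (1.0, 1.0)}
decisive = [name for name, (a, b) in pairs.items() if is_decisive_pair(a, b)]
print(decisive)
```

Only the pairs that survive this filter would then be inspected for where the decisive blunder happened.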

dangi12012 · Post by **dangi12012** » Mon Nov 08, 2021 10:26 pm

yurikvelo wrote: ↑Mon Nov 08, 2021 6:24 pm
dangi12012 wrote: ↑Mon Nov 08, 2021 3:26 pm Without tablebase SF is blind to the true evaluation.
A true evaluation doesn't guarantee better moves or a better game outcome.
It's quite possible that with Syzygy probing this game will also end in a draw. The true 0.00 eval will arrive much earlier and cutechess-cli will adjudicate it as a draw much earlier.

dangi12012 wrote: ↑Mon Nov 08, 2021 3:26 pm I dont understand how this can be worth so little ELO. (+15)
1) Such positions are rare in themselves.
2) Even if they occur, there is no guarantee you will find a better PV earlier just because of TBHits. There is a high probability you will simply get 0.00 earlier thanks to massive TBHits.
3) At time controls like LTC (1' + 1" @ 1CPU), games are typically lost long before TBHits.
Run a couple hundred games of SF vs SF Syzygy (RepeatStartPosition=1). Choose decisive openings (not counting == and 10 game pairs, only 1= or 11).
In my test (1' + 1" @ 4CPU = 4x the LTC Fishtest) I got 0 Elo difference, and such losses are evenly spread between Syzygy and no-Syzygy.
When I deeply analyze a lost game, the decisive blunder is made long before TBHits come on stage.

At least at these time controls, games are decided before TBHits come into play.

Yes, I think that is the point. I saw in fishtest that tablebases give +30 Elo when starting from a late midgame position. So really they give 0 in the beginning, 30 later on, and then 0 again. So in summary they are not worth as much as it seems.

Technically and economically it's feasible to generate 8-man TBs today - but is it worth it?

My question remains: could it be that the gain from 8-man and 9-man grows hyperexponentially? It is very clear that when we reach a 32-man tablebase, chess is solved perfectly (however much Elo that is). So tablebase Elo cannot be a linear model with 7/20 Elo * 32.

Jouni · Post by **Jouni** » Tue Nov 09, 2021 9:31 am

Nice that Vondele can confirm my result! So 50 MB of NNUE data is better than 150 GB of Syzygy data.

Ajedrecista · Post by **Ajedrecista** » Tue Nov 09, 2021 11:46 am

Hello Daniel:

dangi12012 wrote: ↑Sat Nov 06, 2021 10:25 pm[...]

Multiple questions:
a) Could it be that the storage size increase of around 110x per man added does not justify endgame tablebases at all?
b) Is 1.7 PB an accurate estimate?
c) Does someone have a fishtest estimate for 7-man EGTB?
d) Could it be that this looks bad only at first glance and that the Elo increase could explode with more added pieces?

Please answer including the letter.

I did a quick estimate about the size of the full set of 8-man EGTB some months ago. It can be read in the following thread:

When will 8 piece tablebase be ready?

I do not know where you got the value in your post, but it matches my estimate quite well [except for PB versus PiB, which differ by a factor of (1.024)^5 ~ 1.1259]. I was later told about intermediate results during generation which would not be part of the final result.
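For readers who want to reproduce the unit factor, a minimal Python sketch follows. The extrapolation in the second half is only an illustration: it plugs in two figures quoted elsewhere in this thread (about 150 GB for the 6-man Syzygy set and about 110x growth per added man) and is not the method behind either estimate.

```python
PB = 10 ** 15    # petabyte, decimal prefix
PIB = 2 ** 50    # pebibyte, binary prefix (1024 ** 5)

# Ratio between the two units, the same number as 1.024 ** 5.
ratio = PIB / PB

# Illustrative extrapolation using figures quoted in this thread (assumptions):
size_6man_gb = 150.0   # approximate 6-man Syzygy size in GB
growth = 110.0         # approximate size growth per added man

# Two more men means two growth steps; convert GB to bytes, then to PB.
size_8man_pb = size_6man_gb * growth ** 2 * 1e9 / PB
size_8man_pib = size_8man_pb / ratio

print(f"PiB/PB ratio: {ratio:.4f}")
print(f"8-man estimate: {size_8man_pb:.2f} PB = {size_8man_pib:.2f} PiB")
```

With these inputs the extrapolation lands in the same range as the 1.7 PB figure asked about in (b), and the gap between the PB and PiB readings is about 12.6%.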

Regards from Spain.

Ajedrecista.

hgm · Post by **hgm** » Tue Nov 09, 2021 12:15 pm

As to (d): it seems that the most important EGT are those with an imbalance of a single Pawn. For example, having the 6-men KRPKRP hardly helps (because it is almost always a draw anyway, and engines would easily find it by themselves), while the 7-men KRPPKRP has many difficult-to-find wins. This suggests 7-men are useful, but 8-men much less so.

dangi12012 · Post by **dangi12012** » Wed Nov 10, 2021 12:35 am

hgm wrote: ↑Tue Nov 09, 2021 12:15 pm As to (d): it seems that the most important EGT are those with an imbalance of a single Pawn. For example, having the 6-men KRPKRP hardly helps (because it is almost always a draw anyway, and engines would easily find it by themselves), while the 7-men KRPPKRP has many difficult-to-find wins. This suggests 7-men are useful, but 8-men much less so.

I don't see how 8-man would be less useful than 7-man. Keeping in mind that 32-man would be infinitely useful, it's already proven that Elo scaling in tablebases from 0...32 cannot be linear.
But how much Elo an 8-man set would bring is unknown. I can't even find any tests for the 7-man tablebase.
Anyone can run it at home on an array of NVMe SSDs - for around 2500 dollars.

phhnguyen · Post by **phhnguyen** » Wed Nov 10, 2021 6:01 am

Not sure why someone thought some endgame positions are rare?

Below is a match between SF and Komodo. I have only Syzygy 6-men. The engines started probing (tbhits > 0) very early, when the board still had over 26 pieces left. If I had 7- or 8-men, they might probe even earlier and more often. I think they probably hit almost any endgame.

IMO, the problem with gaining Elo from Syzygy (as well as any EGTB) is that it may slow down the search before it helps. (Someone can test that by running some tournaments with fixed depth instead of a time control.) In the above image, Komodo may probe over 80 thousand times for a single move: too much hard disk access and too many calls to the probe functions. That is why I doubt that either Syzygy 7 or 8 can help much, since they are extremely large and slow on current hardware.
