Question to syzygy author

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
yurikvelo
Posts: 710
Joined: Sat Dec 06, 2014 1:53 pm

Re: Question to syzygy author

Post by yurikvelo »

Onegin wrote:Why not use both DTM and DTZ metrics instead of using engine search?
ask here: http://www.talkchess.com/forum/viewtopic.php?t=59904
or here http://www.talkchess.com/forum/viewtopic.php?t=47681

this is technical topic for devs
User avatar
hgm
Posts: 28395
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Question to syzygy author

Post by hgm »

Onegin wrote:I am a member of TalkChess.com and I accidentally stumbled into your discussion. I am an average computer user but can anybody answer my simple question about Syzygy tablebases. Why wdl TBs do NOT return DTM metric but simply a number -2, -1, 0, 1 or 2. Why not use both DTM and DTZ metrics instead of using engine search?
If they returned DTM they would be DTM tables, not WDL tables. WDL tables only return whether the position is a win, draw or loss. That is why they are called WDL tables. So the Syzygy tables are not really WDL tables but a kind of extended WDL, as they also contain cursed wins and blessed losses as 4th and 5th possibility.

The engine only needs WDL information on end-games it is not in yet at the root; the more detailed DTx info is only helpful for choosing a move in the end-game you are currently in. For end-games you will pass through on the way to a win the only thing of importance is to convert to a won position (or a not-lost position when defending), and you would only have to worry about how to win it once you get there. WDL stores much less information per position, so the WDL tables are smaller, and therefore also faster in usage.

You could use both DTM and DTZ50 metrics: Just play the move that DTM-wise is best, and according to DTZ50 and your current ply-counter setting would still be a win. Disadvantage is that you have to have both DTM and DTZ50 tables around, and the DTM tables are nearly an order of magnitude larger than the DTZ50. That would give you a very good approximation of the behavior most users want without the need for any search. If you would use DTM tables adapted to avoid conversion to cursed wins (as sort of partial DTM50), it would even be perfect.
Last edited by hgm on Fri May 06, 2016 9:52 am, edited 1 time in total.
Onegin
Posts: 11
Joined: Sat Mar 12, 2016 1:40 am

Re: Question to syzygy author

Post by Onegin »

yurikvelo wrote:
Onegin wrote:Why not use both DTM and DTZ metrics instead of using engine search?
ask here: http://www.talkchess.com/forum/viewtopic.php?t=59904
or here http://www.talkchess.com/forum/viewtopic.php?t=47681

this is technical topic for devs
Yes, I've seen those forums before, the ones you posted links to, especially the first post on Syzygy tablebases by de Man. That was 3 years ago when he first created those TBs. But I'm still not sure I know how they work. So I thought somebody could answer some questions
Onegin
Posts: 11
Joined: Sat Mar 12, 2016 1:40 am

Re: Question to syzygy author

Post by Onegin »

hgm wrote:
Onegin wrote:I am a member of TalkChess.com and I accidentally stumbled into your discussion. I am an average computer user but can anybody answer my simple question about Syzygy tablebases. Why wdl TBs do NOT return DTM metric but simply a number -2, -1, 0, 1 or 2. Why not use both DTM and DTZ metrics instead of using engine search?
If they returned DTM they would be DTM tables, not WDL tables. WDL tables only return whether the position is a win, draw or loss. That is why they are called WDL tables. So the Syzygy tables are not really WDL tables but a kind of extended WDL, as they also contain cursed wins and blessed losses as 4th and 5th possibility.

The engine only needs WDL information on end-games it is not in yet at the root; the more detailed DTx info is only helpful for choosing a move in the end-game you are currently in. For end-games you will pass through on the way to a win the only thing of importance is to convert to a won position (or a not-lost position when defending), and you would only have to worry about how to win it once you get there. WDL stores much less information per position, so the WDL tables are smaller, and therefore also faster in usage.

You could use both DTM and DTZ50 metrics: Just play the move that DTM-wise is best, and according to DTZ50 and your current ply-counter setting would still be a win. Disadvantage is that you have to have both DTM and DTZ50 tables around, and the DTM tables are nearly an order of magnitude larger than the DTZ50. That would give you a very good approximation of the behavior most users want without the need for any search. If you would use DTM tables adapted to avoid conversion to cursed wins (as sort of partial DTM50), it would even be perfect.
So, the engine uses the following algorithm:

1. WDL table returned value 2. Position is won. Use the same DTM approach as in the case of DTM based tablebases (Nalimov, for example)
2. WDL table returned value 1. Position is won, but drawn under 50 move rule. Use DTZ information to zero ply counter at the same time making sure you select only won position from WDL table. You also need to make sure you arrive at the zeroing move before 50 move rule expires.
3. WDL table returned value 0. Position is drawn. Same as in case 1, use usual TB approach.
4. WDL returned value -1 or -2. Same as in cases 1 and 2, only with reverse colors

???
User avatar
hgm
Posts: 28395
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Question to syzygy author

Post by hgm »

Well, what the engine does has nothing to do with the End-Game Tables. The EGT just provide information. The engine can do with that information what it wants.

If in the search you encounter a cursed win, it is basically a draw (unless you play correspondence Chess, where the 50-move rule does not apply). So you score the position as a favorable draw, and search on in the hope you will be able to force a win that is not cursed. If cursed wins are the best you can do, you could of course go for one and hope the opponent will play imperfectly, and lose it anyway. But in a true draw the opponent can also play imperfectly, and lose by that. It is not at all obvious that a cursed win would be more difficult to defend than a true draw. So it makes sense to evaluate the draws, cursed wins and blessed losses all by search + heuristic evaluation, rather than giving them some 'synthetic' score close to zero. The heuristic evaluation is supposed to give a high score to positions that put the opponent under pressure.

If you already are in a wn position of a tabulated end-game you can simply convert to a won simpler end-game immediately, or, if not possible, play the move with the smallest DTZ. That guarantees a win, but often in a silly way, where it takes the most lengthy and difficult win it can still force, and sacrifices all other material first. So some engines prefer a mate score over a DTZ tablebase win, and search to see if they can discover such a mate score. (For DTM the tablebase gives you a mate score already.)

If you are in a cursed win, the situation is tricky. You would have to know whether the win is cursed because you cannot force conversion quickly enough in the current phase, or whether you can timely force conversions, but they are all cursed in the next phase. In the latter case you might not want to go for the quickest conversion, but for a conversion to the 'least cursed' win in the next phase that you can force within 50 moves in the current phase. I don't know if that is the info you will find in the DTZ tables.
syzygy
Posts: 5780
Joined: Tue Feb 28, 2012 11:56 pm

Re: Question to syzygy author

Post by syzygy »

Onegin wrote:So, the engine uses the following algorithm:

1. WDL table returned value 2. Position is won. Use the same DTM approach as in the case of DTM based tablebases (Nalimov, for example)
2. WDL table returned value 1. Position is won, but drawn under 50 move rule. Use DTZ information to zero ply counter at the same time making sure you select only won position from WDL table. You also need to make sure you arrive at the zeroing move before 50 move rule expires.
3. WDL table returned value 0. Position is drawn. Same as in case 1, use usual TB approach.
4. WDL returned value -1 or -2. Same as in cases 1 and 2, only with reverse colors

???
I think you first need to have a good idea of how an engine works in general. This thread is not really the best place to learn about that.

If you really want to know, there is the source code. If you cannot read source code, then the goal of really understanding how things work might not be achievable anyway. If you just want to know at a "user's level", then this is a developers forum and therefore the wrong place.
syzygy
Posts: 5780
Joined: Tue Feb 28, 2012 11:56 pm

Re: Question to syzygy author

Post by syzygy »

hgm wrote:If you are in a cursed win, the situation is tricky. You would have to know whether the win is cursed because you cannot force conversion quickly enough in the current phase, or whether you can timely force conversions, but they are all cursed in the next phase. In the latter case you might not want to go for the quickest conversion, but for a conversion to the 'least cursed' win in the next phase that you can force within 50 moves in the current phase. I don't know if that is the info you will find in the DTZ tables.
My tables don't distinguish between a distance of 10 moves to a cursed win and a distance of 60 moves to a real win. Both are stored essentially as "60" (and that improves compression a bit). That is enough to convert a cursed win if the 50-move rule is ignored, but does not help so much in finding an optimal swindling strategy (whatever that might be... if the opponent uses the same tables nothing will help anyway).
Onegin
Posts: 11
Joined: Sat Mar 12, 2016 1:40 am

Re: Question to syzygy author

Post by Onegin »

syzygy wrote:
Onegin wrote:So, the engine uses the following algorithm:

1. WDL table returned value 2. Position is won. Use the same DTM approach as in the case of DTM based tablebases (Nalimov, for example)
2. WDL table returned value 1. Position is won, but drawn under 50 move rule. Use DTZ information to zero ply counter at the same time making sure you select only won position from WDL table. You also need to make sure you arrive at the zeroing move before 50 move rule expires.
3. WDL table returned value 0. Position is drawn. Same as in case 1, use usual TB approach.
4. WDL returned value -1 or -2. Same as in cases 1 and 2, only with reverse colors

???
I think you first need to have a good idea of how an engine works in general. This thread is not really the best place to learn about that.

If you really want to know, there is the source code. If you cannot read source code, then the goal of really understanding how things work might not be achievable anyway. If you just want to know at a "user's level", then this is a developers forum and therefore the wrong place.
No, I'm not a developer, I'm a user. I'll try to find user's forum next time, I didn't know. This is my first participation in a TalkChess forum. I don't know how an engine works but I noticed that when I use Stockfish in the endgame calculation Arena GUI shows # of TB hits and there are two types of behavior: 1. small # of TB hits (several hits, up to 30-40) 2. very large # of TB hits (up to tens of thousands), number which is comparable to the # nodes the engine goes through. I couldn't find an explanation for such a behavior and started thinking about EGTBs
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Question to syzygy author

Post by mcostalba »

Here we are with my daily questions :-)

I have seen this code to assign leading pawn color (at least this is what I understand):

Code: Select all

bool c = (   !pos.count<PAWN>(BLACK)
          || (   pos.count<PAWN>(WHITE)
              && pos.count<PAWN>(BLACK) >= pos.count<PAWN>(WHITE)));

        pawn.pawnCount[0] = pos.count<PAWN>(c ? WHITE : BLACK);
        pawn.pawnCount[1] = pos.count<PAWN>(c ? BLACK : WHITE);
For me it is not clear the condition, because you assign leading color to white if black has no pawns, and this is understandable, but then, if both have pawns, you assign leading color to the side with _less_ pawns, and this I don't' understand, shouldn't be the opposite?


In probe_dtz_table there is one early-fail logic that I don't understand.

Code: Select all

if ((ptr->piece.flags & 1) != bside && !ptr->symmetric) {
            *success = -1;
            return 0;
        }



and also the mapping of the score, in particular what is 'map' and what is used for?

Code: Select all

       if (ptr->piece.flags & 2)
            res = ptr->piece.map[ptr->piece.map_idx[wdl_to_map[wdl + 2]] + res];

        if (!(ptr->piece.flags & pa_flags[wdl + 2]) || (wdl & 1))
            res *= 2;
Care to elaborate a bit on those? Thanks in advance.
syzygy
Posts: 5780
Joined: Tue Feb 28, 2012 11:56 pm

Re: Question to syzygy author

Post by syzygy »

mcostalba wrote:Here we are with my daily questions :-)

I have seen this code to assign leading pawn color (at least this is what I understand):

Code: Select all

bool c = (   !pos.count<PAWN>(BLACK)
          || (   pos.count<PAWN>(WHITE)
              && pos.count<PAWN>(BLACK) >= pos.count<PAWN>(WHITE)));

        pawn.pawnCount[0] = pos.count<PAWN>(c ? WHITE : BLACK);
        pawn.pawnCount[1] = pos.count<PAWN>(c ? BLACK : WHITE);
For me it is not clear the condition, because you assign leading color to white if black has no pawns, and this is understandable, but then, if both have pawns, you assign leading color to the side with _less_ pawns, and this I don't' understand, shouldn't be the opposite?
The short answer is that this how the generator selects the leading color, and so how the TB files are organised.

The reason why the generator selects the color with the fewest number of pawns if both sides have pawns is that this results in a more efficient indexing scheme.

Take for example KPPvKP. If we first place the black pawn and then mirror it to the a-d files, no further symmetries are left (apart from interchanging the two white pawns). But if we first place the two white pawns, there are still symmetries if the two white pawns are placed symmetrically. Example:
[d]8/8/2k1p3/8/8/3K4/P6P/8 w - - 0 1
and
[d]8/8/3p1k2/8/8/4K3/P6P/8 w - - 0 1
Taking white as leading color, my pawn indexing function would map these two symmetric positions to different index values. Taking black as leading color, the pawn indexing function first mirrors the leading black pawn on e6 in the first position to d6 to get the second position, so the two positions are mapped to the same index value.

I guess this does not explain why the leading color for KPPPvKPP should be black, because placing three white pawns first would remove all symmetry whereas placing two black pawns first still leaves some. But KPPPvKPP is a 7-piece position, so does not occur. (In any event tables with so many like pieces will be very small compared to other tables, so in the end we are talking about peanuts.)
In probe_dtz_table there is one early-fail logic that I don't understand.

Code: Select all

if ((ptr->piece.flags & 1) != bside && !ptr->symmetric) {
            *success = -1;
            return 0;
        }
The DTZ tables are one-sided, i.e. they store positions only for white to move or only for black to move.

So "(ptr->piece.flags & 1) != bside" tests whether the current side to move is of the "wrong" color and not stored in the table. If that is the case, the call to probe_dtz_table() sets *success to -1 (it is not really a fail; 0 is fail) and the calling code has to do a one-ply search (see probe_dtz_no_ep()).

If the table is symmetric (e.g. KQRvKQR) then storing a single side is anyway sufficient because a position with black to move can be mirrored into a position with white to move by changing the colors of all pieces. So symmetric WDL tables are also one-sided and symmetric DTZ tables can always be probed without the one-ply search.
and also the mapping of the score, in particular what is 'map' and what is used for?

Code: Select all

       if (ptr->piece.flags & 2)
            res = ptr->piece.map[ptr->piece.map_idx[wdl_to_map[wdl + 2]] + res];
First I'll quote the comment explaining the result returned by probe_dtz():

Code: Select all

// Probe the DTZ table for a particular position.
// If *success != 0, the probe was successful.
// The return value is from the point of view of the side to move:
//         n < -100 : loss, but draw under 50-move rule
// -100 <= n < -1   : loss in n ply (assuming 50-move counter == 0)
//         0	    : draw
//     1 < n <= 100 : win in n ply (assuming 50-move counter == 0)
//   100 < n        : win, but draw under 50-move rule
(The real meaning of the values < -100 and > 100 is actually a bit subtle, see my reply to HGM above.)

Since the WDL tables store whether the position is a loss, cursed loss, draw, cursed win or win, this information does not need to be repeated in the DTZ table. So what happens is basically this:

- if the position is a draw, then the DTZ table stores a "don't care" (the compressor picks some arbitrary value trying to get good compression);
- if the position is a win, then the DTZ table stores the distance to zero.
- if the position is a loss, then the DTZ table stores the distance to zero also as a positive value;
- if the position is a cursed win, then the DTZ table stores the "distance to zero" minus 100;
- if the position is a cursed loss, then the DTZ table stores the positive "distance to zero" minus 100.

This means that probe_dtz() must first call probe_wdl(). If the position is a draw, then probe_dtz() can immediately return a draw value (so will never look into the table and will never be bothered by the meaningless "don't care" value stored there). Otherwise, it will call probe_dtz_table() and construct the correct dtz value form the values returned by probe_dtz_table() and probe_wdl().

This improves compression a lot, because compression gets better if we have fewer distinct values and if the frequently occurring values occur even more frequently.

This does not yet explain the map[] mapping. The idea of that mapping is that the four ranges of values we get from wins, losses, cursed wins and cursed losses may not all be distributed in the same way. And if we store values in plies(*), then it could even be that most wins are odd and most losses are even (or the other way around), so there is little gain from losing the sign.

So in such cases we remap the values in the four ranges. This was done by the generator and so the probing code has to do the same. The generator sorted the values in each of the four ranges by frequency of occurrence and then assigned the values 0, 1, 2, 3, ... in order of decreasing frequency. The mapping information necessary to reconstruct the original values in stored in the TB file and used to initialise the map[] array during initialisation of the TB.

(*) in most cases, the DTZ tables store distance to zero in number of moves. For some tables it is critical to store the dtz values for wins and losses in plies because otherwise things could go wrong with the 50-move rule. This explains the following lines:

Code: Select all

        if (!(ptr->piece.flags & pa_flags[wdl + 2]) || (wdl & 1))
            res *= 2;
For a cursed win or loss ((wdl & 1) is non-zero), the value is stored in moves so we need to multiply by two. For wins and losses, we need to look at ptr->piece.flags (which is set during table initialisation).
Care to elaborate a bit on those? Thanks in advance.
You're welcome :)