EGTB generation: more efficient out-counting method?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

EGTB generation: more efficient out-counting method?

Post by wgarvin »

smatovic wrote:a basic question:

How many possible, legal moves can I get from one board position?

I thought it was 218, but maybe it is less?

--
Srdja
I was recently wondering about a related question: For various tablebase endings, what is the maximum number of non-capturing/reversible moves available in any position of that ending?

It has implications for the tablebase generation, if you are using the out-counting method. If you know the "outs counter" for an unresolved position has never overflowed, then you can resolve the position immediately when the counter reaches zero without needing to test all of the successors. I think for 3-vs-2 endings (e.g. KQQKQ) or 3-vs-3 endings, the maximum you could have is 62 or so, so if you use 64+64 values to represent unresolved positions (i.e. 64 for "unknown" and 64 for "unknown but a drawing move has been found"), then you can generate e.g. DTZ50 for all of these endings without worrying about counter overflow. I don't know if existing out-counting generators take advantage of this or not.
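In code, the sort of encoding I have in mind might look like this (the value ranges and the resolved-score codes are just placeholders to illustrate the idea, not anyone's actual format):

```c
#include <stdint.h>

/* Hypothetical byte encoding with 64+64 unresolved values:
 *   1..64   = unresolved, v outs remaining, no drawing move found yet
 *   65..128 = unresolved, v-64 outs remaining, a drawing move found
 *   other   = resolved scores (placeholder codes below)             */
enum { OUT_LOST = 0, OUT_DRAW = 255 };

/* Called on a predecessor each time one of its successors resolves as
 * lost for the side to move there.  When the counter part hits zero,
 * the position resolves immediately, without scanning its successors
 * (valid only while the counter is known never to have overflowed). */
static uint8_t decrement_out(uint8_t v)
{
    if (v <= 64)                      /* 1..64: no draw found yet    */
        return v == 1 ? OUT_LOST : v - 1;
    else                              /* 65..128: draw already found */
        return v == 65 ? OUT_DRAW : v - 1;
}
```

So a position with one out left and a draw already found resolves to a draw on its last decrement, and one with no draw found resolves to a loss.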

But for 4-vs-1 or 4-vs-2 endings, the situation is a bit more complicated. You can't afford to dedicate enough of the byte values to represent unresolved positions to cover all of the outs, so the generator will have to deal with the possibility of overflow for one or both sides. You can always generate the successors to check (but this is slow). I thought of a trick though, that might help for some of them. If your indexing function includes all 64 squares for the least-significant piece (which you might do for simplicity/efficiency reasons in the generator) then at least 2 out of every 64 squares are going to be unused, broken positions. So each 64-byte cache line contains 16 bits where you can store bitflags, with each bitflag covering up to 4 positions. You could set each flag if any of its covered positions had overflowed the counter during initialization. When a counter hits zero, you would check its flag (which is in the same cache line) before generating the successors to verify that they are all actually resolved. If the flag was 0 that step could be skipped.
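A sketch of that flag access, assuming the two broken bytes of the cache line sit at known offsets f0 and f1 (all names hypothetical):

```c
#include <stdint.h>
#include <stdbool.h>

/* Each 64-byte cache line holds the 64 placements of the
 * least-significant piece; at least two of those squares (f0, f1) are
 * occupied by other pieces, so their bytes are broken and can hold a
 * 16-bit flag word: bit i covers positions 4*i .. 4*i+3 of the line. */

static inline bool overflow_flag(const uint8_t line[64], int f0, int f1, int pos)
{
    uint16_t flags = (uint16_t)(line[f0] | (line[f1] << 8));
    return (flags >> (pos >> 2)) & 1;
}

static inline void set_overflow_flag(uint8_t line[64], int f0, int f1, int pos)
{
    uint16_t flags = (uint16_t)(line[f0] | (line[f1] << 8));
    flags |= (uint16_t)(1u << (pos >> 2));
    line[f0] = (uint8_t)(flags & 0xff);
    line[f1] = (uint8_t)(flags >> 8);
}
```

When a counter reaches zero, you generate the successors for verification only if overflow_flag() is set; otherwise the position resolves on the spot, and the flag bytes are already in cache since they share the line.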

If you want to get even more clever, you can subdivide the 256 byte values into "unresolved" and "resolved" positions in two different ways, and use the flag to differentiate which format those 4 positions are stored in. So you could have e.g. 63+63 unresolved counter values for most positions. But when any position needs to be resolved with a score too large for that representation (which should be relatively rare), or if you have to initialize one of them with an out-counter too large to fit, then you could set the flag and convert those 4 positions into the other representation, where they have e.g. 32+32 unresolved counter values and more values for resolved (win/loss) scores. For positions in this second format, you would assume that the out-counter might have overflowed.
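Decoding a byte under this hypothetical dual scheme might look like the following (the exact value ranges are just one way to carve it up):

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool resolved;
    bool draw_found;    /* meaningful only while unresolved */
    int  outs;          /* outs remaining while unresolved  */
    int  score;         /* resolved score byte              */
    bool may_overflow;  /* must verify successors at zero   */
} Entry;

/* Decode one table byte according to the group's flag bit:
 * flag == 0:  1..63 = outs, no draw; 65..127 = outs, draw found;
 *             everything else = resolved score (130 values);
 *             the counter cannot have overflowed.
 * flag == 1:  1..32 = outs, no draw; 33..64 = outs, draw found;
 *             everything else = resolved score (192 values);
 *             the counter may have overflowed.                    */
static Entry decode(uint8_t v, bool flag)
{
    Entry e = { .may_overflow = flag };
    if (!flag) {
        if (v >= 1 && v <= 63)        e.outs = v;
        else if (v >= 65 && v <= 127) { e.outs = v - 64; e.draw_found = true; }
        else                          { e.resolved = true; e.score = v; }
    } else {
        if (v >= 1 && v <= 32)        e.outs = v;
        else if (v >= 33 && v <= 64)  { e.outs = v - 32; e.draw_found = true; }
        else                          { e.resolved = true; e.score = v; }
    }
    return e;
}
```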
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: EGTB generation: more efficient out-counting method?

Post by hgm »

Interesting idea. It never occurred to me that you could allow the counter to overflow. Testing daughter positions to verify a potentially lost position is not that expensive (although of course far more expensive than decrementing the counter), and if you only have to do it when the counter reaches zero, it would reduce the work to a bearable level.

As soon as a disk is needed to store the tablebase for the next iteration, out counters are no longer competitive. The best scheme I could conceive used a single bit to indicate just-decided positions, and then compressed that set using a sort of run-length encoding.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: EGTB generation: more efficient out-counting method?

Post by wgarvin »

hgm wrote:Interesting idea. It never occurred to me that you could allow the counter to overflow. Testing daughter positions to verify a potentially lost position is not that expensive (although of course far more expensive than decrementing the counter).

As soon as a disk is needed to store the tablebase for the next iteration, out counters are no longer competitive. The best scheme I could conceive used a single bit to indicate just-decided positions, and then compressed that set using a sort of run-length encoding.
Agreed. Out-counting is great if the whole pawnslice (or database, if there are no pawns) can fit in RAM.

As soon as it can't, you should switch to a method that either stores 2 bits per position in RAM (if that fits), or one of the methods that uses 1-bit per position of RAM plus 1-bit per position on disk, or something like that.

I can't remember where it was mentioned (somewhere on CCRL forum but their search function is unable to search for the word "dual")... Anyway, Lincke's thesis describes a "dual indexing" algorithm that uses only one bit of RAM per position. It sounds like it worked very well for Awari, to make the disk accesses a lot less random. I'm not sure how well it would work for computer chess for e.g. 7-man endings.
hgm wrote:if you only have to do it when the counter reaches zero, it would reduce the work to a bearable level.
If you did it for every position, you'd basically be doubling the total amount of "random" accesses: you generate all predecessors of a position once when you resolve it (to decrement their counters), and you generate all successors of a position once, when its counter reaches zero, to verify that they are really resolved rather than the counter having overflowed. The average number of successors in one side of the database is the same as the average number of predecessors in the other side, and vice versa. [Edit: maybe when counting outs during initialization, you also have to check the successors to see if they were already resolved. So I guess the penalty of testing for overflow in every position is only 50%.]

[Edit: note that even if you are going to generate all of the successors and check that they are resolved, you still need two ranges of counter values, to remember whether "a drawing move has been found" or not. That's because during initialization there might be converting moves that convert to a draw, and you still have to remember that somehow.]

But if you only did it for some fraction of the positions (probably a small fraction), the cost would not be that bad, and you'd mostly be paying for it only in positions where it was necessary: where either that position, or one of its very local neighbors, either (1) needed to be able to represent more resolved values, or (2) had overflowed its out-counter during initialization.
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: EGTB generation: more efficient out-counting method?

Post by hgm »

Doubling, and certainly 50%, are acceptable overheads. But if you use a cache-friendly algorithm, the overhead can be much smaller. E.g. if you mentally organize the set of positions as a two-dimensional array, one index running over all white constellations, the other running over the black ones, then the random accesses for decrementing the counters stay in the same row or column of that matrix as the accesses for doing the verification. If the row/column then fits in the cache, the accesses for decrementing the counters would load it in there from RAM. The accesses for verification would then all be cache hits: hardly any overhead at all.

Even if a row threatens to overflow the cache, you often can make use of the locality of King moves to treat the row in parts that do fit.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: EGTB generation: more efficient out-counting method?

Post by wgarvin »

hgm wrote:Doubling, and certainly 50%, are acceptable overheads. But if you use a cache-friendly algorithm, the overhead can be much smaller. E.g. if you mentally organize the set of positions as a two-dimensional array, one index running over all white constellations, the other running over the black ones, then the random accesses for decrementing the counters stay in the same row or column of that matrix as the accesses for doing the verification. If the row/column then fits in the cache, the accesses for decrementing the counters would load it in there from RAM. The accesses for verification would then all be cache hits: hardly any overhead at all.

Even if a row threatens to overflow the cache, you often can make use of the locality of King moves to treat the row in parts that do fit.
Yeah. King-striping [K-slice streaming?], dual indexing, etc. will be critical to performance in any effort to do 7-man endings. I'm not ambitious enough to try and tackle a problem that big though: 5-man DTZ50 is about the limit of my ambitions! :lol:

Edit: by the way, are you still pursuing your Leapfrog generator project? It sounded very promising, but also like a huge amount of work.

...with current or near-future hardware, it might be practical to relax your 2GB limit on RAM. It can't be that hard to get a box with 12GB or more of RAM in it nowadays.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: EGTB generation: more efficient out-counting method?

Post by wgarvin »

On a totally unrelated topic: Diagonal symmetry in endings without pawns [or castling]...
hgm's Leapfrog: Symmetry page wrote:When all pieces are different, the canonical representative is usually defined as the position that has the designated primary symmetry piece in the triangle a1-d1-d4. If this primary piece is on the diagonal a1-d4, a designated secondary piece has to be in the triangle a1-a8-h8.

This is not enough to weed out all symmetry images: if both the primary and secondary symmetry pieces are on the diagonal, there are still two positions satisfying our definition of 'canonical representative', differing in that the locations of the other pieces are each other's diagonal images. Since their number is only a very small fraction of the total number of positions, and it would be a hassle to start paying attention to a tertiary symmetry piece, we take this overhead for granted.
Having to deal with two indexes that are for equivalent positions seems like it would be annoying -- move generation has to know to generate both of them, etc.

What I suggest for that case (the primary and secondary symmetry pieces both on the diagonal) is to just compute the index both ways and use whichever index is lower. The other index should just be marked as broken, and not used. That way, every board position is uniquely mapped to exactly one index.

If the symmetry pieces are the two kings, then you need lookup tables anyway to combine their indexing into legal king combinations with whatever ordering you want, so I guess those tables could be extended with bitflags to indicate "must flip diagonally" and "flip diagonally if the resulting index is lower".
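A minimal sketch of the compute-both-indexes idea, assuming a plain 6-bits-per-piece packing and ignoring the king-table machinery (all names hypothetical):

```c
#include <stdint.h>

/* Mirror a square (0..63, a1 = 0, file in the low 3 bits) in the
 * a1-h8 diagonal: swap file and rank. */
static inline int diag_mirror(int sq)
{
    return ((sq & 7) << 3) | (sq >> 3);
}

/* For a pawnless position whose primary and secondary symmetry pieces
 * both sit on the diagonal: compute the index of the position and of
 * its diagonal mirror image, and keep the smaller one.  The larger
 * index then becomes a broken (unused) entry in the table, so every
 * board position maps to exactly one index. */
static uint32_t canonical_index(const int *sq, int n)
{
    uint32_t a = 0, b = 0;
    for (int i = 0; i < n; i++) {
        a = (a << 6) | (uint32_t)sq[i];
        b = (b << 6) | (uint32_t)diag_mirror(sq[i]);
    }
    return a < b ? a : b;
}
```

A position and its diagonal mirror then hash to the same canonical index, which is exactly the uniqueness property wanted above.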