I'm thinking out loud what if we split an 8-piece table into 2*462 (wtm/btm x (wk,bk)) K-slices of 62*61*60*59*58*57 positions (in the worst case of no duplicate pieces). A bitmap of one such slice takes up about 5.15 GB.
Before splitting into K-slices, the Wu-Beal algorithm for calculating loss-in-n btm from win-in-n wtm:
Code: Select all
S1 = Load(W, WIN_IN_N(n));
R1 = Predecessor(S1);
S2 = Load(B, UNKNOWN);
S2 = S2 & R1;
----
R2 = Load(W, WIN_IN<=N(n));
S3 = ProveSuccessor(S2, R2);
B = Add(S3, LOSE_IN_N(n));
Here W, B are intermediate stages of the final 8-piece table (wtm, btm), accessed sequentially and stored on disk.
R1/R2 are temporary bitmaps in RAM of size 462*62*61*60*59*58*57/8 bytes, accessed randomly.
S1/S2/S3 are temporary bitmaps that are accessed sequentially (so can be stored on disk, if not generated on the fly).
S1 is a subset of wtm positions, S2,S3 are subsets of btm positions.
S1 and S3 do not need to be stored separately. S1 is read from W, and S3 is immediately written to B.
Let's rewrite the first part a bit:
Code: Select all
S2 = 0;
S1 = Load(W, WIN_IN_N(n));
R1 = Predecessor(S1);
S2 |= Load(B, UNKNOWN) & R1;
It is not necessary to do this in one go. We can take a K-slice S1_k of W, calculate Predecessor(S1_k) which will fall within up to 9 btm K-slices (*), AND the result with LOAD(B,UNKNOWN) and then OR with S2. We only need to hold 9 btm K-slices R1_k in RAM.
Ideally we hold more than 9 btm K-slices in RAM and delay the "AND" and "OR" operation for a btm K-slice R1_k until we need to drop it from RAM to make room for another btm K-slice.
We can choose the order in which we run through K-slices of W, and we can choose which btm K-slices to drop. This gives an interesting optimization problem.
(*) an unmove from a position with one king not on the diagonal to a position with both kings on the diagonal also needs to generate the diagonal mirror-image of the position, but this position is still in the same K-slice.
The same can be done with the second part:
Code: Select all
R2 = LOAD(W, WIN_IN<=N(n));
S3 = ProveSuccessor(S2, R2);
B = Add(S3, LOSE_IN_N(n));
ProveSuccessor(S2,R2) calculates the subset of btm positions p in S2 for which all wtm successor positions (obtained by applying black moves) are in R2.
To calculate ProveSuccessor(S2_k,, R2) for a K-slice S2_k, we need to have up to 9 wtm K-slices of R2 in RAM.
So we just cache as many wtm K-slices in RAM as we can and try to run through K-slices of S2 in a clever order to try to reduce the number of times each wtm K-slice has to be loaded into RAM. This is a second optimization problem.
Then there is also the part that calculates win-in-(n+1) wtm from loss-in-n btm.
This needs S3 (= loss-in-n btm), so either we do store S3 separately or we extract it on the fly from B.
Code: Select all
R3 = Predecessor(S3);
S4 = Load(W, UNKNOWN);
S4 = S4 & R3;
W = Add(S4, WIN_IN_N(n+1));
Let's rewrite a bit:
Code: Select all
S4 = 0;
R3 = Predecessor(S3);
S4 |= Load(W, UNKNOWN) & R3;
W = Add(S4, WIN_IN_N(n+1));
So very similar to the first part above. We go through S3 K-slice by K-slice, Predecessor(S3_k) is a subset of up to 9 wtm K-slices. We keep as many of them as long as we can in RAM, and if we need to drop one we AND with Load(W, UNKNOWN) and OR with W.
The same then needs to be done with wtm and btm interchanged. And repeated many times until no new UNKNOWN positions are resolved. The remaining UNKNOWN positions are draws.
47 GB of RAM (for 9 slices) would be enough, but the more the better (until you can hold 462 slices at once in 2.38 TB).
I guess this should work, but maybe I am overlooking something.
It would be interesting to have an idea of the overhead incurred by being able to hold only N out of 462 slices in RAM at a time.
The same can be done with 7 pieces. A K-slice would be 92.56 MB. So 833 MB would be the minimum, and 42 GB could hold all 462 K-slices.
For pawn tables we can split in pawn slices and those can probably be split in K-slices again. It might be sufficient to split into K-slices with one king fixed.