Tablebase suggestion

syzygy
Posts: 5891
Joined: Tue Feb 28, 2012 11:56 pm

Re: Tablebase suggestion

Post by syzygy »

syzygy wrote: Wed Feb 04, 2026 3:52 pm So it seems that if 58 K-slices can fit in RAM, which corresponds to about 300 GB, the K-slice approach should be as efficient as the regular approach that requires 2.38 TB of RAM. With less RAM you have to be a bit more careful to do things in a clever order.
In the case of black moving and the wK being fixed, RAM for 19 K-slices should be enough to not lose efficiency.

Ignoring the wK, we let the bK zig-zag a1-h1-h2-a2-a3-h3...-a8. We load the K-slices that we need and we drop those that we no longer need.
When we are at g2, we have loaded 19 slices (a1-h1, a2-h2, f3, g3, h3).
[d]8/8/8/8/8/5kkk/kkkkkkkk/kkkkkkkk b - - 0 1
Then at f2 we drop h1 and load e3. So still 19 slices.
[d]8/8/8/8/8/4kkkk/kkkkkkkk/kkkkkkk1 b - - 0 1
And it seems it continues this way, i.e. 19 slices at once in RAM means we don't need to reload anything.
The presence of the wK somewhere in a1-d1-d4 only reduces the number of slices you need to load. (And if the wK is on the diagonal, the bK only needs to visit the lower half of the board including the diagonal, so even less to do.)
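A quick way to sanity-check this is to simulate the scan. A minimal sketch (not generator code), assuming that processing the slice for the bK on square s needs the K-slices of s and of all king-move neighbours of s in RAM, and that a slice may be dropped as soon as no remaining square in the scan will touch it:

[code]
def neighbours(sq):
    """Squares a king on sq (0 = a1, 63 = h8) can move to."""
    f, r = sq % 8, sq // 8
    return {nf + 8 * nr
            for nf in range(max(0, f - 1), min(8, f + 2))
            for nr in range(max(0, r - 1), min(8, r + 2))} - {sq}

def peak_loaded(order):
    """Peak number of resident K-slices under a greedy load/drop policy."""
    loaded, peak = set(), 0
    for i, sq in enumerate(order):
        loaded |= neighbours(sq) | {sq}        # load what this square needs
        peak = max(peak, len(loaded))
        future = set(order[i + 1:])
        needed_later = future | {n for f in future for n in neighbours(f)}
        loaded &= needed_later                 # drop slices no later square touches
    return peak

# zig-zag a1-h1, h2-a2, a3-h3, ...
zigzag = [r * 8 + (f if r % 2 == 0 else 7 - f) for r in range(8) for f in range(8)]
print(peak_loaded(zigzag))                     # prints 19
[/code]

Since a slice is only dropped once no later square will touch it, nothing is ever reloaded, so the peak of 19 is also the RAM requirement in this model.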

The case of white moving in a1-d1-d4 with the bK "fixed" in an orbit seems more complicated.
But can't we still do exactly the same?
wK starts in a1, moves to d1. Then when it moves to e1, the board flips horizontally and the wK is back on d1 but the black king is in another location. The wK continues to a1, then moves up (corresponding to h1-h2). Instead of ending up in a2, the board flips diagonally and the wK is now b1 and then moves up to b2. If I'm not mistaken, the loaded king slices remain as valid as before. It just looks a lot more chaotic.
If this indeed works, then 19 K-slices are again enough.
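The flipping can be written down compactly. A sketch (with a made-up normalise function), assuming the usual mirrors along the files, the ranks and the a1-h8 diagonal that keep the wK in the a1-d1-d4 triangle; the returned transform is what would also be applied to the bK and the other pieces:

[code]
def normalise(sq):
    """Map a wK square (0 = a1) into the a1-d1-d4 triangle and return the
    board transform to apply to the bK and the other pieces."""
    f, r = sq % 8, sq // 8
    flips = []
    if f > 3:                                   # mirror the files (a<->h)
        f = 7 - f
        flips.append(lambda s: (s // 8) * 8 + (7 - s % 8))
    if r > 3:                                   # mirror the ranks (1<->8)
        r = 7 - r
        flips.append(lambda s: (7 - s // 8) * 8 + s % 8)
    if r > f:                                   # mirror the a1-h8 diagonal
        f, r = r, f
        flips.append(lambda s: (s % 8) * 8 + s // 8)
    def transform(s):
        for flip in flips:
            s = flip(s)
        return s
    return r * 8 + f, transform

# e.g. e1 (index 4) maps to d1 with a horizontal flip,
#      a2 (index 8) maps to b1 with a diagonal flip.
[/code]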

19 K-slices take up 97.9 GB, so 128 GB of RAM would be comfortable.
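The arithmetic, taking one K-slice as 1/462 of the 2.38 TB full-table figure quoted above:

[code]
slice_size = 2.38e12 / 462          # ~5.15 GB per K-slice
print(19 * slice_size / 1e9)        # ~97.9 GB for 19 K-slices
[/code]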
syzygy
Posts: 5891
Joined: Tue Feb 28, 2012 11:56 pm

Re: Tablebase suggestion

Post by syzygy »

Ender8pieces wrote: Wed Feb 04, 2026 5:29 pm
syzygy wrote: Wed Feb 04, 2026 3:58 pm
Disk I/O will be the limiting factor, so compression of sparse bitmaps will speed things up.
...
A few hundred full-drive writes will finish the SSD.
It is difficult to be certain without the raw numbers, but I’d like to address the points as candidly as possible:

While Disk I/O is indeed the primary bottleneck, compression has a double-edged effect. It might save bandwidth, but as you noted, it significantly impacts drive endurance and can cause slowdowns due to redundant writes (Write Amplification). A potential middle ground could be running compressed during the initial, data-heavy iterations and switching to raw/uncompressed once the updates become sparse.
With compression you'll also need to write a lot less. Splitting things up into 462 K-slices may indeed have the advantage that fewer K-slices need to be rewritten in later iterations. In principle each K-slice could be further subdivided (on disk) to increase this effect.

I guess your point is that keeping things uncompressed on SSD allows you to only rewrite the SSD blocks that have changed data (this would only be relevant for the full W and B tables). But SSD blocks are quite large, and rewriting a single byte means rewriting the whole block. Anyway, in my view HDDs are preferred here, and then compression is a total win. And this is just an implementation detail.
If we assume 1 byte per position in a raw file, writing to a single byte within a 4 KB sector necessitates writing the entire sector.
Note that SSD blocks are much bigger than 4 KB: at least 128 KB, and nowadays apparently often more than 512 KB. And you have to make sure that the OS understands what you're doing. You can't just write the changed byte; you'll have to know which portions of the file correspond to a physical block and write that block in one go.
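A toy illustration of the point (the 512 KB block size is an assumption; real drives vary and rarely report it): what matters for wear is the number of distinct physical blocks touched, not the number of changed bytes.

[code]
BLOCK = 512 * 1024                  # assumed physical block size

def blocks_touched(changed_offsets, block=BLOCK):
    """Distinct physical blocks covering the changed byte offsets."""
    return len({off // block for off in changed_offsets})

# 1000 scattered single-byte updates -> 1000 full blocks, i.e. ~512 MB of
# physical writes for ~1 KB of logically changed data.
changed = [i * 10_000_000 for i in range(1000)]     # hypothetical sparse updates
print(blocks_touched(changed))                      # 1000
[/code]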
In the early iterations, this is frequent, making the difference between compressed and raw formats negligible in terms of disk wear.
Compressed is much smaller. Big difference.
syzygy
Posts: 5891
Joined: Tue Feb 28, 2012 11:56 pm

Re: Tablebase suggestion

Post by syzygy »

syzygy wrote: Wed Feb 04, 2026 5:45 pm Ignoring the wK, we let the bK zig-zag a1-h1-h2-a2-a3-h3...-a8.
I think zig-zag does not add anything. We can also do a1->h1, a2->h2, a3->h3, etc.
Can we show that 19 is optimal?
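A self-contained check under the same greedy load/drop model as the earlier sketch: the plain row-major scan also peaks at 19 resident K-slices.

[code]
def adj(a, b):
    """King adjacency on square indices (0 = a1); a square counts as adjacent to itself."""
    return max(abs(a % 8 - b % 8), abs(a // 8 - b // 8)) <= 1

order = list(range(64))                 # row-major: a1..h1, a2..h2, ...
loaded, peak = set(), 0
for i, sq in enumerate(order):
    loaded |= {t for t in range(64) if adj(sq, t)}       # slices needed at sq
    peak = max(peak, len(loaded))
    loaded = {t for t in loaded
              if any(adj(t, f) for f in order[i + 1:])}  # drop what no later square needs
print(peak)                             # 19
[/code]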

If we have RAM for 20 K-slices, loading/saving can be overlapped with calculating more easily.
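The overlap could be simple double buffering; a sketch with hypothetical load_slice/process_slice placeholders (saving could be overlapped the same way):

[code]
from concurrent.futures import ThreadPoolExecutor

def run_scan(order, load_slice, process_slice):
    """Prefetch the next K-slice on an I/O thread while the current one is processed."""
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(load_slice, order[0])
        for i, sq in enumerate(order):
            current = pending.result()                        # wait for the prefetched slice
            if i + 1 < len(order):
                pending = io.submit(load_slice, order[i + 1]) # start loading the next slice
            process_slice(sq, current)                        # compute overlaps with the I/O
[/code]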
Ender8pieces
Posts: 14
Joined: Wed Jan 21, 2026 9:16 am
Full name: JS

Re: Tablebase suggestion

Post by Ender8pieces »

Maybe considering SSD vs. HDD or raw vs. compressed is somewhat premature. These are implementation details that depend on the specific task and hardware budget. One could even consider a hybrid approach: for example running the initial, heavy iterations on HDDs to preserve flash endurance, and then switching to Enterprise SSDs for the sparse, late-game iterations where random access latency becomes the primary bottleneck.

I haven’t fully delved into the 5D transition analysis yet, but it is fascinating to see the problem converge into a well-defined, abstract mathematical challenge. It reminds me of cache-locality optimizations in other fields. Techniques like Hilbert space-filling curves, which are commonly used to linearize multi-dimensional data while preserving locality, might be highly relevant here to minimize "jumps" between slices, though I haven't mapped it out specifically for this case.
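For reference, the standard 2-D Hilbert index (the usual bit-twiddling mapping, nothing tailored to this problem) can be computed as below; sorting squares by it gives a locality-preserving scan order.

[code]
def hilbert_d(n, x, y):
    """Index of (x, y) along a Hilbert curve over an n x n grid (n a power of two)."""
    d, s = 0, n // 2
    while s:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                     # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# A Hilbert-ordered scan of the 64 bK squares (0 = a1, 63 = h8):
order = sorted(range(64), key=lambda sq: hilbert_d(8, sq % 8, sq // 8))
[/code]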
syzygy
Posts: 5891
Joined: Tue Feb 28, 2012 11:56 pm

Re: Tablebase suggestion

Post by syzygy »

Ender8pieces wrote: Wed Feb 04, 2026 8:11 pm where random access latency becomes the primary bottleneck.
I don't see random disk access becoming the bottleneck. There will be disk seeks to access each of the 462 parts of a table, but that is not much.

I expect the bottleneck to be sequential disk throughput. Multiple drives in a RAID configuration would help. An SSD or multiple SSDs would help even more, but as mentioned the amounts of data being written are very considerable. Without compression, the largest pawnless tables are 37.2 TB with 1 byte per position (some might need 2 bytes/position). Compression helps, but you can't expect a miracle.
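For reference, the rough arithmetic (six distinct non-king pieces as in KQRBvKQRN, wtm and btm tables both at one byte per position):

[code]
positions = 462 * 62 * 61 * 60 * 59 * 58 * 57   # per side, ~2.04e13
print(2 * positions / 2**40)                     # ~37.2 (binary) TB for wtm + btm
[/code]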

It would be interesting to know how Marc Bourzutschky's system stores the generated tables and the intermediate results. Almost certainly he did not keep the generated tables.

What we really need is new storage technology...
Ender8pieces wrote: Wed Feb 04, 2026 8:11 pm I haven’t fully delved into the 5D transition analysis yet, but it is fascinating to see the problem converge into a well-defined, abstract mathematical challenge. It reminds me of cache-locality optimizations in other fields. Techniques like Hilbert space-filling curves, which are commonly used to linearize multi-dimensional data while preserving locality, might be highly relevant here to minimize "jumps" between slices, though I haven't mapped it out specifically for this case.
I am reasonably sure that 19 K-slices as I described is optimal.
But it is still possible that the whole thing does not work because I overlooked something (or am hallucinating).
If it works, then the largest 8-piece pawnless tables can apparently be generated with somewhat reasonable efficiency on a "low-end" system with 128 GB of RAM. It would be good to have ECC memory, though.

Storing and distributing those tables will be a very major problem if the goal is to create a complete set. In fact, it really is a hopeless task if you also want to generate the pawnful tables.
But it might be nice to outdo Bourzutschky and generate KQRBvKQRN.