Asynchronous tablebase lookups

Sesse · Post by **Sesse** » Sun Nov 14, 2021 11:20 pm

I came up with an idea for asynchronous tablebase lookups; I call it “LazyTB” because it's very much like Lazy SMP.

The idea is that you have a separate thread that's responsible for the tablebase lookups. Whenever a search thread wants to make a TB lookup, it doesn't; instead it puts the lookup into a queue and keeps searching. Later, the TB thread picks it up, fires off an asynchronous I/O read (which may or may not hit the buffer cache), and then fills in the hash table. The search thread will of course keep doing its redundant search in the meantime, but on the next iteration, the tablebase lookup result will be in the hash table and the branch will be skipped.
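A minimal sketch of that flow, for illustration only. Everything named here is hypothetical: `hash_table` is a plain dict standing in for the engine's transposition table, `probe_tablebase` is a placeholder for the real disk read, and the worker probes synchronously where the actual idea would issue asynchronous I/O.

```python
import queue
import threading

# Hypothetical stand-ins: a dict plays the role of the engine's hash table,
# and probe_tablebase() plays the role of the (possibly slow) disk read.
hash_table = {}
hash_lock = threading.Lock()
pending = queue.Queue()


def probe_tablebase(position):
    """Placeholder for a real TB probe; returns a fake -1/0/1 value."""
    return len(position) % 3 - 1


def tb_worker():
    """TB thread: drain the queue and publish results to the hash table."""
    while True:
        position = pending.get()
        if position is None:  # sentinel: shut down
            pending.task_done()
            break
        result = probe_tablebase(position)
        with hash_lock:
            hash_table[position] = result
        pending.task_done()


def search_node(position):
    """Search thread: use a stored TB result if present, else enqueue and go on."""
    with hash_lock:
        cached = hash_table.get(position)
    if cached is not None:
        return ("tb_cutoff", cached)
    pending.put(position)  # do not wait for the answer
    return ("keep_searching", None)


def demo():
    worker = threading.Thread(target=tb_worker)
    worker.start()
    first = search_node("KQvK-example")   # miss: lookup is queued
    pending.join()                        # demo only: let the TB thread catch up
    second = search_node("KQvK-example")  # "next iteration": result is stored
    pending.put(None)
    worker.join()
    return first, second
```

Calling `demo()` shows the two phases: the first visit queues the lookup and keeps searching, the second finds the result in the table and cuts off.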

I made a proof-of-concept in Stockfish, and it's doing pretty poorly. It crashes a lot, and for some reason, it doesn't really manage to increase my I/O usage (neither on SSD nor on rotating media). Obviously I haven't tested playing strength.

But I think given a proper implementation, there's a 20% chance or so this could give a way to probe tablebases with no search slowdown, and ideally, with much better I/O saturation. (Of course, if the queue just keeps on growing, one would have to throw away lookups based on some criteria. Perhaps something like “depth, but combined with number of times this disk block was requested”.)
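One way the “throw away lookups” part could look, as a rough sketch. The class name, the capacity, and the scoring rule (depth plus the number of times the disk block was requested) are all assumptions made up for this example; the weighting is arbitrary and untested.

```python
from collections import Counter


class BoundedLookupQueue:
    """Hypothetical bounded queue of pending TB lookups.

    Each entry is scored as depth + number of requests seen for its disk
    block. When the queue is over capacity, the lowest-scoring entry is
    dropped; pop() returns the highest-scoring one.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}                # position -> (depth, block)
        self.block_requests = Counter()  # block -> times requested so far

    def priority(self, position):
        depth, block = self.entries[position]
        return depth + self.block_requests[block]

    def push(self, position, depth, block):
        self.block_requests[block] += 1
        old = self.entries.get(position)
        if old is None or depth > old[0]:
            self.entries[position] = (depth, block)
        if len(self.entries) > self.capacity:
            victim = min(self.entries, key=self.priority)
            del self.entries[victim]     # throw away the least valuable lookup

    def pop(self):
        if not self.entries:
            return None
        best = max(self.entries, key=self.priority)
        del self.entries[best]
        return best
```

A linear scan for the minimum is fine for a sketch; a real queue on the search hot path would want something cheaper, and would have to decide whether request counts should decay over time.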

(Extra hypothetical bonus: SSDs may be happier doing 512-byte reads than the 4096-byte reads mmap forces them into.)

Sopel · Post by **Sopel** » Sun Nov 14, 2021 11:45 pm

Have you measured the IO latencies, IO queue depths, and overall speed loss from using TBs depending on the number of search threads? I'd imagine that with modern high-core-count CPUs this would be a non-issue. Also, keep in mind that it's quite feasible to cache most of TB6 in memory today, while using TB7 will not be beneficial for a long time.

dangi12012 · Post by **dangi12012** » Mon Nov 15, 2021 12:33 am

Sesse wrote: ↑Mon Dec 28, 2020 1:22 am

Ummm DirectStorage and these APIs would mean loading NVMe data into a GPU. It has nothing to do with tablebase probing.
For random IO there is nothing faster than a memory mapped file. You can use that pointer directly and the OS will fetch 4k on a page fault.

Asynchronous IO has much more overhead with wait objects etc., and is ultimately slower.

Sesse · Post by **Sesse** » Mon Nov 15, 2021 10:00 am

> Ummm DirectStorage and these APIs would mean loading NVMe data into a GPU. It has nothing to do with tablebase probing.

I don't know about DirectStorage, but io_uring certainly isn't related to GPUs at all. io_uring has much less overhead than regular syscalls (and there are no “wait objects”). Have you actually ever used any of these APIs?

> For random IO there is nothing faster than a memory mapped file. You can use that pointer directly and the OS will fetch 4k on a page fault.

The latter is correct (assuming you've turned off readahead, which Stockfish on Linux does these days); the former is simply 100% wrong.
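For readers who haven't seen the two access styles side by side, here is a small sketch. It assumes a POSIX system, uses a temporary file of dummy bytes in place of a real tablebase, and assumes `MADV_RANDOM` as the way readahead gets turned off on the mapping. It shows only how each read is expressed and says nothing about which is faster.

```python
import mmap
import os
import tempfile


def read_both_ways(offset, size=512):
    """Read `size` bytes at `offset` via mmap and via pread; return both."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(bytes(range(256)) * 64)  # 16 KiB of dummy "tablebase" data
        f.flush()
        fd = f.fileno()

        # Style 1: memory-mapped file. Touching the slice may page-fault,
        # and the OS then brings in the containing page.
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_RANDOM"):
                m.madvise(mmap.MADV_RANDOM)  # hint: no readahead wanted
            via_mmap = m[offset:offset + size]

        # Style 2: explicit positional read of exactly `size` bytes.
        via_pread = os.pread(fd, size, offset)

    return via_mmap, via_pread
```

Both calls return the same bytes; an asynchronous design would replace the blocking `os.pread` with a submitted request whose completion is collected later.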
