Asynchronous tablebase lookups

Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: Asynchronous tablebase lookups

Post by Sesse »

I came up with an idea for asynchronous lookups; I call it “LazyTB” because it's very much like Lazy SMP :-)

The idea is that you have a separate thread that's responsible for the tablebase lookups. Whenever a search thread wants to make a TB lookup, it doesn't block on it; instead it puts the lookup into a queue and keeps searching. Later, the TB thread picks it up, fires off an asynchronous I/O read (which may or may not hit the buffer cache), and then fills in the hash table. The search thread will of course keep doing its redundant search in the meantime, but on the next iteration, the tablebase lookup result will be in the hash table and the branch will be skipped.
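
To make the flow above concrete, here is a minimal Python sketch of the queue-plus-worker arrangement. It is not the Stockfish proof-of-concept: `LazyTB`, `fake_probe`, and the dict standing in for the hash table are all hypothetical names, and the worker does a plain blocking call where a real version would presumably issue asynchronous reads.

```python
import queue
import threading


class LazyTB:
    """Search threads enqueue probes; one worker resolves them into the hash table."""

    def __init__(self, probe_fn):
        self.probe_fn = probe_fn        # hypothetical blocking tablebase probe
        self.pending = queue.Queue()    # lookups waiting for the TB thread
        self.hash_table = {}            # stand-in for the engine's hash table
        self.lock = threading.Lock()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def request(self, position_key):
        """Called from a search thread; never waits for I/O."""
        with self.lock:
            hit = self.hash_table.get(position_key)
        if hit is None:
            self.pending.put(position_key)  # defer the lookup, keep searching
        return hit

    def _run(self):
        while True:
            key = self.pending.get()
            if key is None:                 # sentinel: shut down
                break
            result = self.probe_fn(key)     # the slow read happens here
            with self.lock:
                self.hash_table[key] = result

    def close(self):
        self.pending.put(None)
        self.worker.join()


def fake_probe(key):
    # Hypothetical stand-in for a real tablebase read.
    return "win" if key % 2 == 0 else "draw"


tb = LazyTB(fake_probe)
first = tb.request(42)   # miss: the search carries on without a result
tb.close()               # drain the queue, as if an iteration had passed
second = tb.request(42)  # hit: the branch can now be skipped
```

The first request returns nothing and only schedules the probe; the second one, made after the worker has caught up, finds the result in the table.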

I made a proof-of-concept in Stockfish, and it's doing pretty poorly. It crashes a lot, and for some reason, it doesn't really manage to increase my I/O usage (neither on SSD nor on rotating media). Obviously I haven't tested playing strength :-) But I think given a proper implementation, there's a 20% chance or so this could give a way to probe tablebases with no search slowdown, and ideally, with much better I/O saturation. (Of course, if the queue just keeps on growing, one would have to throw away lookups based on some criteria. Perhaps something like “depth, but combined with number of times this disk block was requested”.)
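
One possible shape for the "throw away lookups" idea is a bounded queue that evicts its lowest-priority entry when full. The sketch below is only an illustration under assumptions: `BoundedProbeQueue` is a hypothetical name, and the priority formula (depth plus the number of requests seen so far for that disk block) is one arbitrary way of combining the two criteria. Priorities are fixed at push time, which a real implementation might want to revisit.

```python
import heapq
from collections import Counter


class BoundedProbeQueue:
    """Fixed-capacity probe queue that drops the least valuable lookup when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []               # min-heap of (priority, seq, key, block)
        self.block_hits = Counter()  # requests seen per disk block
        self.seq = 0                 # tie-breaker so keys are never compared

    def push(self, key, depth, block):
        """Queue a lookup; return the key that was dropped, or None."""
        self.block_hits[block] += 1
        priority = depth + self.block_hits[block]  # assumed weighting
        entry = (priority, self.seq, key, block)
        self.seq += 1
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, entry)
            return None
        # Full: insert the new entry and evict whichever entry is now lowest.
        dropped = heapq.heappushpop(self.heap, entry)
        return dropped[2]

    def pop_best(self):
        """Remove and return the key of the highest-priority lookup (O(n))."""
        best = max(self.heap)
        self.heap.remove(best)
        heapq.heapify(self.heap)
        return best[2]


q = BoundedProbeQueue(capacity=2)
q.push("a", depth=3, block=7)
q.push("b", depth=1, block=9)
dropped = q.push("c", depth=5, block=7)  # queue is full, so "b" is evicted
best = q.pop_best()                      # "c": deepest, and block 7 was hit twice
```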

(Extra hypothetical bonus: SSDs may be happier doing 512-byte reads than the 4096-byte reads mmap forces them into.)
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Asynchronous tablebase lookups

Post by Sopel »

Have you measured the IO latencies, IO queue depths, and overall speed loss from using TBs depending on the number of search threads? I'd imagine that with modern high-core-count CPUs this would be a non-issue. Also, keep in mind that it's quite feasible to cache most of TB6 in memory today, while using TB7 will not be beneficial for a long time.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Asynchronous tablebase lookups

Post by dangi12012 »

Sesse wrote: Mon Dec 28, 2020 1:22 am
Ummm, DirectStorage and these APIs would mean loading NVMe data into a GPU. It has nothing to do with tablebase probing.
For random IO there is nothing faster than a memory-mapped file. You can use that pointer directly and the OS will fetch 4k on a page fault.

Asynchronous IO has much more overhead, with wait objects etc., and is ultimately slower.
Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: Asynchronous tablebase lookups

Post by Sesse »

dangi12012 wrote:
Ummm, DirectStorage and these APIs would mean loading NVMe data into a GPU. It has nothing to do with tablebase probing.

I don't know about DirectStorage, but io_uring certainly isn't related to GPUs at all. io_uring has much less overhead than regular syscalls (and there are no “wait objects”). Have you actually ever used any of these APIs?

dangi12012 wrote:
For random IO there is nothing faster than a memory-mapped file. You can use that pointer directly and the OS will fetch 4k on a page fault.

The latter is correct (assuming you've turned off readahead, which Stockfish on Linux does these days); the former is simply 100% wrong.
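
For readers who haven't used a memory-mapped file this way, the following Python sketch shows the two access styles side by side on a Unix-like system. The temporary file is a made-up stand-in for a tablebase, and the `MADV_RANDOM` hint is assumed to be the kind of "readahead off" setting meant above; it is only applied where the platform exposes it. Note that without `O_DIRECT`, `os.pread` still goes through the page cache, so this shows the API difference, not a speed difference.

```python
import mmap
import os
import tempfile

PAGE = 4096

# Hypothetical stand-in for a tablebase file: 16 pages of patterned bytes.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(i % 251 for i in range(16 * PAGE)))
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    mm = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    # Hint that access is random, so the kernel need not read ahead.
    if hasattr(mm, "madvise") and hasattr(mmap, "MADV_RANDOM"):
        mm.madvise(mmap.MADV_RANDOM)

    offset = 5 * PAGE + 100
    # Mapped access: plain indexing; the first touch of a page may page-fault.
    via_mmap = mm[offset:offset + 8]
    # Explicit access: one read syscall for the requested byte range.
    via_pread = os.pread(fd, 8, offset)
    mm.close()
finally:
    os.close(fd)
    os.unlink(path)
```

Both variables end up holding the same eight bytes; the difference is only in who initiates the read, the page-fault handler or the caller.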