
The idea is to have a separate thread that's responsible for the tablebase lookups. Whenever a search thread wants to make a TB lookup, it doesn't do the lookup itself; instead, it puts the lookup into a queue and keeps searching. Later, the TB thread picks it up, fires off an asynchronous I/O read (which may or may not hit the buffer cache), and then fills in the hash table. The search thread will of course keep doing its redundant search in the meantime, but on the next iteration, the tablebase lookup result will be in the hash table and the branch will be skipped.
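To make the queue-and-fill flow a bit more concrete, here is a minimal sketch in C++17. It is not the proof-of-concept code and not Stockfish's API: the names AsyncTablebase, lookup_or_enqueue and drain are made up for illustration, the hash table is a plain std::unordered_map behind a mutex, and a blocking probe callback stands in for the asynchronous I/O read.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical names throughout; none of this is Stockfish's actual API.
using Key = std::uint64_t;  // position hash key
using TBResult = int;       // e.g. a win/draw/loss value

class AsyncTablebase {
public:
    // `probe` is a stand-in for the real (possibly slow) tablebase read.
    explicit AsyncTablebase(std::function<TBResult(Key)> probe)
        : probe_(std::move(probe)) {
        assert(probe_);
        worker_ = std::thread([this] { run(); });
    }

    ~AsyncTablebase() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        worker_.join();
    }

    // Called by a search thread. Returns the stored result if there is one;
    // otherwise queues the lookup and returns at once, never waiting on I/O.
    std::optional<TBResult> lookup_or_enqueue(Key key) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = table_.find(key);
        if (it != table_.end()) return it->second;
        pending_.push_back(key);
        wake_.notify_one();
        return std::nullopt;
    }

    // Blocks until the queue is empty and no probe is in flight.
    // Only meant for tests; a real search would never wait like this.
    void drain() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_.wait(lock, [this] { return pending_.empty() && !busy_; });
    }

private:
    // Body of the TB thread: pop a key, probe without holding the lock,
    // then store the result where the search threads will find it.
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (stopping_) return;
            Key key = pending_.front();
            pending_.pop_front();
            if (table_.find(key) == table_.end()) {  // skip duplicate requests
                busy_ = true;
                lock.unlock();
                TBResult result = probe_(key);
                lock.lock();
                table_.emplace(key, result);
                busy_ = false;
            }
            if (pending_.empty()) idle_.notify_all();
        }
    }

    std::function<TBResult(Key)> probe_;
    std::mutex mutex_;
    std::condition_variable wake_;  // signals the TB thread
    std::condition_variable idle_;  // signals drain()
    std::deque<Key> pending_;
    std::unordered_map<Key, TBResult> table_;
    bool stopping_ = false;
    bool busy_ = false;
    std::thread worker_;
};
```

A search thread would call lookup_or_enqueue at the point where it would otherwise probe, and simply carry on searching when it gets an empty result. In a real engine the result would presumably go into the existing transposition table rather than a separate map, and the single mutex here would likely be too coarse for many search threads.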
I made a proof-of-concept in Stockfish, and it's doing pretty poorly. It crashes a lot, and for some reason, it doesn't really manage to increase my I/O usage (neither on SSD nor on rotating media). Obviously I haven't tested playing strength.

(Extra hypothetical bonus: SSDs may be happier doing 512-byte reads than the 4096-byte reads mmap forces them into.)