I am curious to know what kind of transposition table is being used by current cluster programs (Rybka & DS). A local transposition table seems easiest, and that is what I plan to do.
TT types as defined in Feldman's paper:
- Global TT: one processor holds the whole TT; all probes and stores go to/from that processor.
- Local TT: each processor has its own TT; nothing is shared.
- Distributed TT: part of the TT is stored on each processor.
- Shared TT: the size grows with each processor added.
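For illustration, a minimal C++ sketch of where a probe has to go under each of these schemes; NUM_NODES, MY_NODE and the entry layout are invented placeholders, not taken from any of the programs mentioned:

```cpp
#include <cstdint>

// Invented placeholders for illustration only.
constexpr int NUM_NODES = 16;   // processors in the cluster
constexpr int MY_NODE   = 3;    // rank of this processor

struct TTEntry {                // one hash slot
    uint64_t key;               // Zobrist key (or its upper bits)
    int16_t  score;
    uint8_t  depth, flags;
};

// Global TT: a single designated processor owns every entry,
// so every probe and store is a message to node 0.
int owner_global(uint64_t)          { return 0; }

// Local TT: probes never leave the processor that generated them.
int owner_local(uint64_t)           { return MY_NODE; }

// Distributed TT: the key decides which processor holds the entry,
// so most probes become remote messages.
int owner_distributed(uint64_t key) { return int(key % NUM_NODES); }
```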
Remote TT lookups (global or distributed TT) would be WAY too slow over commodity hardware. Gigabit Ethernet gives latencies of about 200us, which wouldn't allow for more than 5000 nodes per second.
Most of the advantage of a distributed TT can be had by being smart about which work packets get sent to which node (preferably the one that has the relevant hash info).
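One way to read "smart about which work packets go where" is to hash the root position of each packet so that the same subtree keeps landing on the same node. A rough sketch under that assumption (pick_target_node, the packet layout and the overload threshold are all made up here):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical routing: prefer the node whose local TT most likely already
// holds entries for this subtree; fall back to load balancing when it is busy.
struct WorkPacket { uint64_t root_zobrist; /* moves, bounds, depth, ... */ };

int pick_target_node(const WorkPacket& w,
                     const std::vector<int>& queued_per_node) {
    const int n = int(queued_per_node.size());       // cluster size (> 0)
    const int preferred = int(w.root_zobrist % n);   // same subtree -> same node

    int idlest = 0;                                  // least-loaded node
    for (int i = 1; i < n; ++i)
        if (queued_per_node[i] < queued_per_node[idlest])
            idlest = i;

    // If the preferred node has clearly more work queued (the threshold is
    // arbitrary), give the packet away: hash locality is lost but the
    // cluster stays busy.
    const bool overloaded =
        queued_per_node[preferred] > queued_per_node[idlest] + 4;
    return overloaded ? idlest : preferred;
}
```

The fallback to the idlest node trades hash locality for keeping the cluster busy; where exactly that trade-off should sit is a tuning question.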
I never got around to implementing it, but when I was thinking about SMP, some ideas came to mind:
- every processor uses its own local TT
- the hash entries are structured by material configuration
- Bloom filters are used for each configuration and shared with the processors that are searching similar board positions (see the sketch after this list)
- a sort of prefetching is used, in an ETC (enhanced transposition cutoff) way, at deeper nodes,
or a second thread is launched to wait for the result of the TT probe and tells the master to abort in case of a TT cut
- it is definitely important that the processors always get to search a similar part of the tree, a part they have the most information about locally.
Maybe some of these aspects are worth thinking through more thoroughly.
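A minimal sketch of the Bloom-filter idea from the list above (the size and the two hash functions are arbitrary choices, nothing tested): each processor keeps one small filter per material configuration summarising the Zobrist keys it has stored, and ships the bit array to peers so they can cheaply check whether sending a position there is likely to pay off.

```cpp
#include <array>
#include <cstdint>

// One filter per material configuration; size and hashes are arbitrary.
struct TTBloomFilter {
    static constexpr uint64_t BITS = 1 << 16;        // 8 KB of bits per filter
    std::array<uint64_t, BITS / 64> words{};

    static uint64_t mix(uint64_t k) {                // cheap second hash
        k ^= k >> 33; k *= 0xff51afd7ed558ccdULL; k ^= k >> 33;
        return k;
    }
    void insert(uint64_t zobrist) {                  // called on every TT store
        set(zobrist % BITS);
        set(mix(zobrist) % BITS);
    }
    bool maybe_contains(uint64_t zobrist) const {    // false positives possible,
        return test(zobrist % BITS) &&               // false negatives are not
               test(mix(zobrist) % BITS);
    }
private:
    void set(uint64_t i)        { words[i / 64] |= 1ULL << (i % 64); }
    bool test(uint64_t i) const { return (words[i / 64] >> (i % 64)) & 1; }
};
```

Since such a filter can return false positives but never false negatives, it can only be used to skip hopeless remote traffic, never to trust a hit.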
Remote TT lookups (global or distributed TT) would be WAY too slow over commodity hardware. Gigabit Ethernet gives latencies of about 200us, which wouldn't allow for more than 5000 nodes per second.
Most of the advantage of a distributed TT can be had by being smart about which work packets get sent to which node (preferably the one that has the relevant hash info).
You can actually do a global TT with a fairly significant amount of work. Nothing says you can't start the probe, then continue the search, and when you get the result back, take whatever action is necessary. This has been done in the past, by Schaeffer among others...
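A rough sketch of that overlap, assuming some non-blocking message layer (send_probe/poll_reply and the reply fields are placeholders, not Schaeffer's actual scheme): issue the request on entering the node, keep searching, and take the cutoff late if a usable answer has arrived by the time it is checked.

```cpp
#include <cstdint>

struct TTReply { bool found; int score; int depth; int flags; };

// Stubs standing in for an assumed non-blocking message layer.
static int  send_probe(uint64_t /*key*/)                 { return 0; }
static bool poll_reply(int /*ticket*/, TTReply* /*out*/) { return false; }

int search(uint64_t key, int alpha, int beta, int depth) {
    const int ticket = send_probe(key);   // start the remote probe, don't wait

    int best = -32000;
    for (int m = 0; m < /*num_moves*/ 1; ++m) {
        TTReply r{};
        if (poll_reply(ticket, &r) && r.found && r.depth >= depth
            /* && the bound type fits (alpha, beta) */) {
            return r.score;               // late TT cut: drop the work done so far
        }
        // ... make move m, score = -search(child_key, -beta, -alpha, depth - 1),
        //     unmake, update best/alpha, beta cutoff as usual ...
    }
    (void)alpha; (void)beta;              // only used in the comments above
    return best;
}
```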
If you look at the numbers, you'll see this is only possible in the first few plies near the root. So then you'd still have local TTs for most of the search tree.
What's more, by already starting the search, you're going to be doing work that's not needed, so efficiency will drop again.
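To put rough numbers on this (1 Mnps per core, a 200 µs round trip and a branching factor of about 30 are my own assumptions, not figures from the thread), a quick back-of-the-envelope calculation:

```cpp
#include <cstdio>

int main() {
    const double nps          = 1e6;     // nodes searched per second per core
    const double latency_s    = 200e-6;  // remote probe round trip
    const double branching    = 30.0;    // rough branching factor
    const int    remote_plies = 3;       // plies near the root probed remotely

    // Probing every node asynchronously keeps this many requests in flight:
    printf("in-flight probes per core : %.0f\n", nps * latency_s);      // 200

    // Probing only the top few plies costs at most this many probes per
    // iteration -- a trivial amount of traffic:
    double probes = 1;
    for (int d = 0; d < remote_plies; ++d) probes *= branching;
    printf("probes for top %d plies   : %.0f\n", remote_plies, probes); // 27000
}
```

So even fully overlapped, probing every node means a couple of hundred requests permanently in flight and on the order of a million messages per second per core, while restricting remote probes to the first few plies makes the traffic trivial and leaves the rest of the tree to the local tables.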
Thanks. I also plan to keep communication costs as low as possible.
In that paper even the history and killer tables are passed around among processors, which I do not expect to help much. Even TTs only give about 20% cut-off rates, IIRC.
They used a "Transputer system" (I don't know what that is), which I am guessing has very low latency.
Most of the advantage of a distributed TT can be had by being smart about which work packets get sent to which node (preferably the one that has the relevant hash info).
I guess there could be many ways to group processors together to work on similar trees. I certainly need to drop the peer-to-peer approach I am using for SMP and strictly follow a master-slave YBW with some smart algorithm to handle work allocation, as you suggested.
I hadn't thought of using the material signature as a way to detect similarity of search trees, but that is one way of doing it, I suppose. I think there are many other ways to detect search trees that are near one another.
Gian-Carlo Pascutto wrote: If you look at the numbers, you'll see this is only possible in the first few plies near the root. So then you'd still have local TTs for most of the search tree.
What's more, by already starting the search, you're going to be doing work that's not needed, so efficiency will drop again.
I'm not impressed.
Depends. Using 10Gb+ InfiniBand, with ultra-low latency, one can probably run this out to at least 2/3 of the maximum depth. Not on normal 10Mb/100Mb/Gb Ethernet, of course.
My approach is based on a combination as well, but with the real intent of keeping track of who searches which position (by Zobrist key match), so that a given position goes to the same node when possible, to take advantage of the existing hash entries (if they have not been totally replaced, obviously).
Distributed computing is not always "the answer". Sometimes it is "the question" and "no" is the answer...