I wanted to describe an idea I had that may or may not be worth something. I am not aware of this idea having been tried previously, but it's simple enough that it likely has been. I apologize if this is already common knowledge.
So essentially, I think Lazy SMP works well in Winter, but not as well as in other engines, and I am sure a lot of that has to do with things that will change in the future, such as the single-bucket, always-replace transposition table. This got me thinking about why Lazy SMP works as well as it does (to me it is a bit counterintuitive that it works at all). I think much of it comes down to different moves being preferred as the best move at different aspiration bounds and depths, so helper threads end up establishing better bounds and refutations for these alternative moves.
This gave me the idea of feeding helper threads random subsets of the root moves. This directly guarantees that some helper threads will focus on seemingly suboptimal moves, which may turn out to be best should our main line suddenly fail low at some high depth. My original implementation falls back to my current SMP scheme if there are 10 or fewer legal moves at the root. If there are more than that, I group the helper threads into pairs and give each thread in a pair half of the root moves. Each pair of threads gets a different random permutation of the root moves, and in my current implementation the threads do no depth skipping or reordering of moves. I call this Lazy SMP variant Lazy Ignorance SMP, as the helper threads ignore moves and the negative connotations of lazy and ignorance seem to fit well =)
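To make the pairing scheme concrete, here is a minimal sketch of how the root-move subsets could be dealt out. This is not Winter's actual code; the Move alias and the function name are hypothetical stand-ins.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the engine's real Move type.
using Move = int;

// Deal root moves to helper threads: with more than 10 root moves, threads
// are grouped in pairs, each pair draws its own random permutation of the
// root moves, and the two threads of a pair each search one half of it.
// With 10 or fewer root moves, every helper gets the full list (plain
// Lazy SMP fallback). Returns one move subset per helper thread.
std::vector<std::vector<Move>> SplitRootMoves(
    const std::vector<Move>& root_moves, std::size_t num_helpers,
    std::mt19937& rng) {
  std::vector<std::vector<Move>> subsets(num_helpers);
  if (root_moves.size() <= 10) {
    for (auto& s : subsets) s = root_moves;  // fallback: no splitting
    return subsets;
  }
  for (std::size_t t = 0; t < num_helpers; t += 2) {
    std::vector<Move> perm = root_moves;
    std::shuffle(perm.begin(), perm.end(), rng);  // fresh permutation per pair
    const std::size_t half = perm.size() / 2;
    subsets[t].assign(perm.begin(), perm.begin() + half);
    if (t + 1 < num_helpers) {
      subsets[t + 1].assign(perm.begin() + half, perm.end());
    }
  }
  return subsets;
}
```

Note that the two halves of a pair together always cover the full root-move list, so no root move is ignored by every thread of a pair.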
In my initial tests with 4 helper threads running on slow cores, but at a rather long 80+0.8 TC, I saw a gain of roughly 10 ±7 Elo over my regular Lazy SMP implementation. After that I started testing with 14 threads (main thread + 13 helpers). Here the difference to my Lazy SMP is currently at -3 ±9. In other words, we are still within the margin of error, but perhaps the idea does not scale to more cores as well as we would like. There are lots of ways one could improve on this and perhaps combine it with other Lazy SMP approaches. One idea is to rely on Wasp's and Ethereal's depth-skipping scheme, but only for threads sharing the same primary root move. Another is to have some threads use Lazy Ignorance SMP while other threads rely on other Lazy SMP schemes. Finally, this was just an initial implementation: maybe it was incorrect to limit ourselves to positions with more than 10 root moves, or perhaps we are missing some condition? Maybe we should not split the root moves into 2 equal halves, but only ignore a quarter of the moves, or perhaps split the list into quarters or thirds?
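The quarters/thirds variant mentioned above could be sketched as a generalization of the halving: shuffle once per thread group and deal the moves into k roughly equal chunks. Again, names here are hypothetical illustration, not engine code.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

using Move = int;  // hypothetical stand-in for the engine's Move type

// Shuffle the root moves once for a group of k threads, then deal them
// round-robin into k roughly equal chunks, one chunk per thread.
// k = 2 recovers the halving scheme; k = 3 or 4 gives thirds or quarters.
std::vector<std::vector<Move>> DealRootMoves(std::vector<Move> moves,
                                             std::size_t k,
                                             std::mt19937& rng) {
  std::shuffle(moves.begin(), moves.end(), rng);
  std::vector<std::vector<Move>> chunks(k);
  for (std::size_t i = 0; i < moves.size(); ++i) {
    chunks[i % k].push_back(moves[i]);  // round-robin deal
  }
  return chunks;
}
```

"Ignoring a quarter" would instead mean giving each thread three of four such chunks rather than one, which keeps more overlap between threads.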
I think it might be difficult to replace SF's SMP with this scheme directly, as SF seems to have some highly optimized machinery there, but I might try an OpenBench patch for Ethereal.
Lazy Ignorance SMP
jorose (Jonathan Rosenthal, Zurich, Switzerland)
-Jonathan
smatovic (Srdja Matovic, Hamburg, Germany)
Re: Lazy Ignorance SMP
jorose wrote: ↑Mon Jan 21, 2019 1:25 pm
...
This gave me the idea to feed helper threads random subsets of the root moves.
...

When running more than 32 threads, I suggest testing an idea I call RMO, randomized move order:
- the root thread performs a normal AB search
- helper threads pick a random move after the nullmove, TT move, IID move, etc.
- the search terminates when any thread finishes its search

Threads communicate via the TT moves and TT scores of the shared hash table.
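As I understand the scheme, the ordered prefix of the move list (TT move, IID move, and so on) is kept, and only the tail is randomized on helper threads. A minimal sketch, with a hypothetical Move alias and function name:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

using Move = int;  // hypothetical stand-in for the engine's Move type

// RMO-style ordering: the first shuffle_from slots (TT move, IID move, ...)
// keep their usual order; the rest of the list is searched in random order.
// The root thread would pass shuffle_from >= moves.size(), i.e. no
// shuffling, and keep a normal alpha-beta move order.
void RandomizeTail(std::vector<Move>& moves, std::size_t shuffle_from,
                   std::mt19937& rng) {
  if (shuffle_from >= moves.size()) return;  // nothing left to randomize
  std::shuffle(moves.begin() + shuffle_from, moves.end(), rng);
}
```

With every helper randomizing its tail independently, the threads naturally diverge while still sharing their findings through the transposition table, as described above.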
I implemented it on GPU; the first 64 threads use an ABDADA implementation, and above that, RMO:
https://github.com/smatovic/Zeta/blob/m ... esults.txt
--
Srdja