Parallel search: is deterministic or not?

Cardoso · Post by **Cardoso** » Sun Aug 29, 2010 2:50 pm

Hi, every time I restart my engine I get slightily diiferent node counts for the same position with a fixed deep search.
Is it a bug or multi core (4 threads) engines are non deterministic?

best regards,
Alvaro

Michael Sherwin · Post by **Michael Sherwin** » Sun Aug 29, 2010 4:12 pm

Logically, if cutoffs in one thread are determined by what happens in another thread and system interrupts can interrupt any thread at any time then the cutoffs can be different in every run.

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Sun Aug 29, 2010 6:08 pm

Depends entirely on the parallel algorithm you used.

rbarreira · Post by **rbarreira** » Mon Aug 30, 2010 1:25 am

If you do even something as simple as sharing a transposition table between threads, that should already be enough to get a lot of non-determinism.

bob · Post by **bob** » Mon Aug 30, 2010 2:47 am

Cardoso wrote:Hi, every time I restart my engine I get slightily diiferent node counts for the same position with a fixed deep search.
Is it a bug or multi core (4 threads) engines are non deterministic?

best regards,
Alvaro

Absolutely normal. The nodes can vary. The score can change. And at times you will even get a completely different best move, although this is less common than the first two.

bob · Post by **bob** » Mon Aug 30, 2010 8:37 pm

rbarreira wrote:If you do even something as simple as sharing a transposition table between threads, that should already be enough to get a lot of non-determinism.

Or if you have history counters for anything. Or use killer moves. And then there are issues caused by different move ordering since you can't control who does what, when, in a parallel search.

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Mon Aug 30, 2010 8:59 pm

bob wrote:
rbarreira wrote:If you do even something as simple as sharing a transposition table between threads, that should already be enough to get a lot of non-determinism.
Or if you have history counters for anything. Or use killer moves.

Only if they're shared between threads.

Anyway, I won't argue the most efficient implementations of parallel alpha-beta search are going to be non-deterministic, but it's still determined by the choice of parametrization algorithm.

bob · Post by **bob** » Mon Aug 30, 2010 11:27 pm

Gian-Carlo Pascutto wrote:
bob wrote:
rbarreira wrote:If you do even something as simple as sharing a transposition table between threads, that should already be enough to get a lot of non-determinism.
Or if you have history counters for anything. Or use killer moves.
Only if they're shared between threads.

Not quite. They don't have to be shared between threads. Since each thread will search a different set of sub-trees each time the same position is run, the counters will change, while for the single-cpu program, the counters will be identical at each point in the tree, run-to-run.

Anyway, I won't argue the most efficient implementations of parallel alpha-beta search are going to be non-deterministic, but it's still determined by the choice of parametrization algorithm.

I am not sure how. Searching things in parallel, at the very least, leads to a different tree due to searching things in a different order (we even see super-linear speedups here and there because of this). In a parallel algorithm, one of the first questions you ask when deciding how to do the parallel implementation is "if I have this big loop, and I execute each iteration in a different order, will the answers be correct?" If yes, that's a good loop to parallelize, if no, then find another way. In chess, I suppose one could do a really deterministic parallel search, if one were willing to suffer greatly reduced performance. I once gave a talk at Los Alamos Lab and a 1 hour talk turned into a 9 hour discussion, with some _really_ bright people trying to figure out an efficient and deterministic parallel algorithm for chess. Conclusion was deterministic behaviour is possible, but not in conjunction with efficiency. Or vice versa, where efficiency is possible, but not in conjunction with deterministic behaviour.

It doesn't take much of a change in a search tree to _really_ see a cascading effect that greatly influences the final result. Just a quick cutoff that normally would be found after searching the first 5 moves, but now comes first and cuts off those 5 moves before they reach critical points that will help later.

Even if you go to a completely distributed algorithm, with nothing shared at all, the shape of the tree will be changed due to the differences in the order things are searched in. Even factoring out such things as history counters, killer moves and transposition table entries. Trying to avoid that seems like a significant waste of time. It is easy enough to make a parallel search inefficient without going to excessive synchronization.

Gian-Carlo Pascutto · Tue Aug 31, 2010 12:33 pm

bob wrote: Not quite. They don't have to be shared between threads. Since each thread will search a different set of sub-trees each time the same position is run, the counters will change, while for the single-cpu program, the counters will be identical at each point in the tree, run-to-run.

You are confusing cause and effect here. The history counters will be different because (in your assumption!) the trees searched are not the same. But the history counters don't have to be the *cause* of that, if they are not in shared memory.

So, having history counters or not has nothing to do with the algorithm being deterministic.

bob · Post by **bob** » Tue Aug 31, 2010 5:50 pm

Gian-Carlo Pascutto wrote:
bob wrote: Not quite. They don't have to be shared between threads. Since each thread will search a different set of sub-trees each time the same position is run, the counters will change, while for the single-cpu program, the counters will be identical at each point in the tree, run-to-run.
You are confusing cause and effect here. The history counters will be different because (in your assumption!) the trees searched are not the same. But the history counters don't have to be the *cause* of that, if they are not in shared memory.

So, having history counters or not has nothing to do with the algorithm being deterministic.

Let's go over this again. At any point in the tree, if I have history counters, the values they contain will influence various decisions that affect the shape of the tree, correct? They might allow or disallow pruning. They might affect the order moves are searched if you use the "history heuristic" for move ordering.

Whether you share them or have a separate copy for each thread/process/node, how can you have the same history counters each time you search the same position? Each thread/process/node will search a different sub-tree each time you run the same position, since split points happen at effectively random times due to timing inconsistencies. So whether you share the values or not, it is extremely unlikely that when you search position P on thread N, that you have a set of history counters C that will be the same the next time you decide to run this test on the same parallel architecture. Can you be certain that by the time you reach position P on thread N, that the counters will be exactly C every time? Has thread N searched _exactly_ the same sub-trees that it searched the last time? With _exactly_ the same bounds so that all those sub-trees updated the history counters in exactly the same way?

What happens when one thread gets a fail-high and you have to stop all the other threads that are searching to help? Can you stop them at _exactly_ the same node each and every time? Or will a slight delay on your thread allow one of those "helpers" to search a couple of extra nodes thereby changing one or more history counters?

I don't see how you'd ever get anything but non-deterministic behaviour out of a search that uses any data that can be changed as the search progresses.

In the case of hash tables, are you always going to search the same sub-tree on the same processor, so that it always has the same hash table with the same entries? This would require searching the tree in a completely deterministic way, somehow factoring out the timing perturbations that would cause different sub-trees to default to different threads each time the same position is searched.

This is not really about shared data being updated in a "racy" way, it is about local data not being updated the same way if different sub-trees are searched using that data each time.

The problem is, that to search position P and get exactly the same result each time, when you start on P you must have the identical same C you had the last time (history counter values). How can that ever happen, much less happen each and every time?

Parallel search: is deterministic or not?

Parallel search: is deterministic or not?

Re: Parallel search: is deterministic or not?

Re: Parallel search: is deterministic or not?

Re: Parallel search: is deterministic or not?

Re: Parallel search: is deterministic or not?

Re: Parallel search: is deterministic or not?

Re: Parallel search: is deterministic or not?

Re: Parallel search: is deterministic or not?

Re: Parallel search: is deterministic or not?

Re: Parallel search: is deterministic or not?