RMO - Randomized Move Order - yet another Lazy SMP derivate

smatovic · Post by **smatovic** » Mon Dec 30, 2019 6:06 pm

I implemented ABDADA in an iterative way in Zeta, and it scales well, but for
folks who are too lazy for ABDADA, here is another Lazy SMP derivate - RMO.

With <= 64 threads it can not compete with ABDADA, but with >= 128 there is not
such a big difference, at least on my time to depth benchmarks.

RMO - basics:

- communication of TT move and TT scores between threads via Shared Hash Table
- all threads follow the same Iterative Deepening loop with same depth
- thread 0 performs a normal search with move ordering etc.
- threads >0 randomize the move order up to QSearch
- as soon as one thread finishes its search, all threads should terminate the
current ID iteration and start the next one

RMO - exceptions for threads>0:

There are some cases you may not want to randomize the move order:

- Null move
- TT move
- IID move
- Counter move
- Killer move

and some cases where you may want to perform a normal search:

- LMR
- IID

I have tested to randomize captures with an higher priority than quiet moves,
but have no results that justify to do so.

In general you can also combine RMO with an classic parallel search, to let the
first n threads search a classic way and the others via RMO. Maybe such an
approach will make more sense with thread count of > 128.

--
Srdja

smatovic · Post by **smatovic** » Mon Dec 30, 2019 7:09 pm

Ups, I forgot to mention the most important part of all parallel searches, search the oldest son first

So...

RMO - basics:

- communication of TT move and TT scores between threads via Shared Hash Table
- all threads follow the same Iterative Deepening loop with same depth
- thread 0 performs a normal search with move ordering etc.
- threads >0 perform a normal search for oldest son first
- threads >0 randomize the move order up to QSearch for other siblings
- as soon as one thread finishes its search, all threads should terminate the
current ID iteration and start the next one

--
Srdja

lucasart · Post by **lucasart** » Tue Dec 31, 2019 2:00 am

smatovic wrote: ↑Mon Dec 30, 2019 6:06 pm I implemented ABDADA in an iterative way in Zeta, and it scales well, but for
folks who are too lazy for ABDADA, here is another Lazy SMP derivate - RMO.

With <= 64 threads it can not compete with ABDADA, but with >= 128 there is not
such a big difference, at least on my time to depth benchmarks.

RMO - basics:

- communication of TT move and TT scores between threads via Shared Hash Table
- all threads follow the same Iterative Deepening loop with same depth
- thread 0 performs a normal search with move ordering etc.
- threads >0 randomize the move order up to QSearch
- as soon as one thread finishes its search, all threads should terminate the
current ID iteration and start the next one

RMO - exceptions for threads>0:

There are some cases you may not want to randomize the move order:

- Null move
- TT move
- IID move
- Counter move
- Killer move

and some cases where you may want to perform a normal search:

- LMR
- IID

I have tested to randomize captures with an higher priority than quiet moves,
but have no results that justify to do so.

In general you can also combine RMO with an classic parallel search, to let the
first n threads search a classic way and the others via RMO. Maybe such an
approach will make more sense with thread count of > 128.

--
Srdja

ABDADA is not the right benchmark. Compared to lazy SMP, it brings nothing to the table, other than polluting the code base. What scales best in my testing (up to 32 threads thanks to OpenBench) is lazy SMP, with just this simple trick:

if a thread is about to run depth=d, first check what others are doing, and if half (or more) of the threads are running at depths >= d, then skip depth=d (rule is iterative, continue until a suitable depth is found to work on).

So it's just a loosely synchronized way of saying that we want (approx.) half threads on some depth d, and half on some depth d+1. There may be some overshoots, and some threads still finishing an obsolete (already completed by another thread) iteration, but that's ok.

I've tried (many) more complicated approaches, such as:
* strict enforcement of the d/d+1 balancing, by immediately kicking threads running obsolete iterations d'<d when an iteration d is completed by one thread).
* different load ratios (instead of just half-half).
* spreading across more than 2 depths, and playing with load ratios.

PS: In my testing, I've also found IID and killers to be useless.

Edit: I have not tried yet the "Bread Crumbs" approach introduced by vondele in Stockfish. This looks promising, and much cleaner than ABDADA.

Daniel Shawul · Post by **Daniel Shawul** » Tue Dec 31, 2019 2:35 am

ABDADA is a proper algorithm though -- and it is hardly more complicated than lazy smp. If I try to figure out how lazy smp works with a pencil and paper, i have difficulty justifying it other than for the fact that one thread could provide information for another. On the other hand, with ABDADA, you can see how it does YBW, how it distributes work between workers by assigning different moves etc ... Lazy smp relies on threads searching at different depths for this. Oh well if it works it works ...

lucasart · Post by **lucasart** » Tue Dec 31, 2019 3:18 am

Daniel Shawul wrote: ↑Tue Dec 31, 2019 2:35 am ABDADA is a proper algorithm though -- and it is hardly more complicated than lazy smp. If I try to figure out how lazy smp works with a pencil and paper, i have difficulty justifying it other than for the fact that one thread could provide information for another. On the other hand, with ABDADA, you can see how it does YBW, how it distributes work between workers by assigning different moves etc ... Lazy smp relies on threads searching at different depths for this. Oh well if it works it works ...

Indeed it is hard to explain with a pencil and paper why lazy SMP is so strong. And that is why people from academia (like Bob and you) have such a hard time accepting it.

The first clue is to observe how TTD and ELO evolve. In TTD, lazy is not impressive, and on a few threads YBW will be better using this (wrong) metric. But lazy beats YBW in elo while losing in TTD.

This basically means that your whole world of pencil and paper is an illusion. Perhaps a big part of reason is the chaotic nature of modern search (with all the path dependt effects like history ordering and its effect on lmr etc.). That and the chaotic nature of thread scheduling, becoming chaos^2, a chaos feedback loop. Lazy SMP feeds on chaos, which increases with more search time and more threads, and that's why it beats the hell out of YBW on many threads.

I know, chaos sounds a like a bullshit smoke screen for what we don't understand. But sometimes it's the right way to think about a problem. Look at meteorology. They don't try to prove whether it will rain tomorrow by using pencil and paper anymore. They run huge simulations and just accept the Oracle's results as giving them confidence intervals, and forget the idea of trying to understand why.

Daniel Shawul · Post by **Daniel Shawul** » Tue Dec 31, 2019 4:06 am

lucasart wrote: ↑Tue Dec 31, 2019 3:18 am I know, chaos sounds a like a bullshit smoke screen for what we don't understand. But sometimes it's the right way to think about a problem. Look at meteorology. They don't try to prove whether it will rain tomorrow by using pencil and paper anymore. They run huge simulations and just accept the Oracle's results as giving them confidence intervals, and forget the idea of trying to understand why.

This is so wrong I feel obliged to enlighten you a bit

Nobody in their right mind trusts simulation results without some kind of validation. If a forecast says 12-inch of snow tomorrow
but we only get 1-inch, you know something is wrong. If a hurricane heads northwest, when the forecast says westwards, same thing...
Snow, hail, land-surface, and other physics modeling is hard and you can easily produce garbage results with the wrong type of model.
In fact, most of meteorology is data-driven, and accurate forecasts depend on presences of high-resolution radar/satellite data for initial
conditions. Weather forecasts beyond a couple of days are known to produce garbage results, so you need to assimilate observations as they come in every couple of hours. To counter the "butterfly effect" ( a butterfly in south-America flapping its wings can cause a tornado in europe),
you do an ensemble of simulations and average results. However, the dynamical core is still an important part of weather prediction ...

Michel · Post by **Michel** » Tue Dec 31, 2019 10:26 am

Daniel Shawul wrote: ↑Tue Dec 31, 2019 4:06 am
lucasart wrote: ↑Tue Dec 31, 2019 3:18 am I know, chaos sounds a like a bullshit smoke screen for what we don't understand. But sometimes it's the right way to think about a problem. Look at meteorology. They don't try to prove whether it will rain tomorrow by using pencil and paper anymore. They run huge simulations and just accept the Oracle's results as giving them confidence intervals, and forget the idea of trying to understand why.
This is so wrong I feel obliged to enlighten you a bit
Nobody in their right mind trusts simulation results without some kind of validation. If a forecast says 12-inch of snow tomorrow
but we only get 1-inch, you know something is wrong. If a hurricane heads northwest, when the forecast says westwards, same thing...
Snow, hail, land-surface, and other physics modeling is hard and you can easily produce garbage results with the wrong type of model.
In fact, most of meteorology is data-driven, and accurate forecasts depend on presences of high-resolution radar/satellite data for initial
conditions. Weather forecasts beyond a couple of days are known to produce garbage results, so you need to assimilate observations as they come in every couple of hours. To counter the "butterfly effect" ( a butterfly in south-America flapping its wings can cause a tornado in europe),
you do an ensemble of simulations and average results. However, the dynamical core is still an important part of weather prediction ...

Yes this is simply the "Scientific Method". Making models and validating them by experiment. If there is no match then you adapt the model but as the subject evolves the required adaptations become smaller and smaller (e.g. the passage from Newtonian mechanics to General relativity is almost unobservable).

This is something Lucas and Marco Costalba do not understand. In their mind you cannot even use 2+2=4. You have to validate it by experiment _every time_ you use it since the universe might have changed since the last time you used it.

AndrewGrant · Post by **AndrewGrant** » Tue Dec 31, 2019 2:26 pm

This basically means that your whole world of pencil and paper is an illusion. Perhaps a big part of reason is the chaotic nature of modern search (with all the path dependt effects like history ordering and its effect on lmr etc.). That and the chaotic nature of thread scheduling, becoming chaos^2, a chaos feedback loop. Lazy SMP feeds on chaos, which increases with more search time and more threads, and that's why it beats the hell out of YBW on many threads.

Apart from the "This basically means that your whole world of pencil and paper is an illusion" part, does anyone really disagree with this rationale? Its one of the better justifications for why Lazy SMP is king.

Michel · Post by **Michel** » Tue Dec 31, 2019 3:23 pm

AndrewGrant wrote: ↑Tue Dec 31, 2019 2:26 pm
This basically means that your whole world of pencil and paper is an illusion. Perhaps a big part of reason is the chaotic nature of modern search (with all the path dependt effects like history ordering and its effect on lmr etc.). That and the chaotic nature of thread scheduling, becoming chaos^2, a chaos feedback loop. Lazy SMP feeds on chaos, which increases with more search time and more threads, and that's why it beats the hell out of YBW on many threads.
Apart from the "This basically means that your whole world of pencil and paper is an illusion" part, does anyone really disagree with this rationale? Its one of the better justifications for why Lazy SMP is king.

Well at least the claim is somewhat testable. One may ask how lazy SMP performs on a primitive engine. I don't think anyone would care to do the experiment but there is one useful data point.

Toga II's lazy SMP (it was called shared hash in those days) scaled similarly as YBW engines on 4 threads. Toga II had (untuned) history, but no other modern statistical search enhancement. But even for such a primitive engine lazy SMP apparently also worked fine.

Lazy SMP versus YBW is simply a trade-off. Lazy SMP is much more lightweight than YBW. So lightweight that it scales almost linearly with the number of threads. It turns out that this is enough to compensate for YBW's more organized but also much more overhead-laden search.

It is similar to the trade-off eval versus search. Modern A/B engine make the trade-off "more search versus less eval". These engines are so successful that for years it has discouraged research into other approaches.

But now thanks to DeepMind we finally know that a different trade-off is also possible. So who knows what will happen in the future with regard to SMP search...

smatovic · Post by **smatovic** » Tue Dec 31, 2019 4:00 pm

AndrewGrant wrote: ↑Tue Dec 31, 2019 2:26 pm
This basically means that your whole world of pencil and paper is an illusion. Perhaps a big part of reason is the chaotic nature of modern search (with all the path dependt effects like history ordering and its effect on lmr etc.). That and the chaotic nature of thread scheduling, becoming chaos^2, a chaos feedback loop. Lazy SMP feeds on chaos, which increases with more search time and more threads, and that's why it beats the hell out of YBW on many threads.
Apart from the "This basically means that your whole world of pencil and paper is an illusion" part, does anyone really disagree with this rationale? Its one of the better justifications for why Lazy SMP is king.

What is an Elo increase worth if you can not figure out why it worked out?

--
Srdja

RMO - Randomized Move Order - yet another Lazy SMP derivate

RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate