RMO - Randomized Move Order - yet another Lazy SMP derivate

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Michel
Posts: 2080
Joined: Sun Sep 28, 2008 11:50 pm

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by Michel » Sun Jan 05, 2020 11:52 am

Code: Select all

  // Iterative deepening loop until requested to stop or the target depth is reached
  while (   (rootDepth += ONE_PLY) < DEPTH_MAX
         && !Threads.stop
         && !(Limits.depth && mainThread && rootDepth / ONE_PLY > Limits.depth))
Perhaps this is the culprit? If I read it correctly then the depth limit is only respected in mainThread.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

Michel
Posts: 2080
Joined: Sun Sep 28, 2008 11:50 pm

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by Michel » Sun Jan 05, 2020 12:01 pm

petero2 wrote:I modified this logic to make it never exceed the fixed depth limit and ran 8 threads vs 1 thread again at fixed depth 10. The result was +42.5 elo after 5536 games.
Thanks. Interesting! This points indeed to a bit of widening. But I would argue that the elo difference is quite small for 3 thread doublings.

Also this doesn't say that the widening is actually beneficial in timed search of course... Maybe the widening is just a necessary evil resulting from the choice of separate history tables to reduce contention (as you suggest).
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

Daniel Shawul
Posts: 3809
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by Daniel Shawul » Sun Jan 05, 2020 4:24 pm

Alayan wrote:
Sun Jan 05, 2020 8:42 am
Daniel Shawul wrote:
Sun Jan 05, 2020 1:27 am
Peter, have you seen my latest results. Stockfish mulit-threaded (8-threads) even when constrained to single-core thrashes 1-thread stockfish.
There is clearly a "bug" in fixed depth tests with LazySMP, because it happens to sneak to depth d+1 is the current most likely theory.
You ran fixed depth games. Stockfish with 8 threads on 1 core will take much more time than Stockfish with 8 cores to reach the given depth, but will have a rather similar search tree in the end. Hence, having similar results with 1 or 8 cores using fixed depth for 8 threads is expected, though some parts like breadcrumbs will not behave as expected with such a scheme.
lucasart wrote:
Sun Jan 05, 2020 6:02 am
SF uses skip depth scheme, so the reported depth is misleading and massively understated.
This is wrong. Depth skipping has been removed from Stockfish over 9 months ago, because it scales awfully at longer TCs. See https://github.com/official-stockfish/S ... /pull/1972

Current Stockfish doesn't do it.
Thank you for the confirmation! People here, a main stockfish author, has said both 8-threads and 1-thread will search a similar search tree at a fixed depth meaning there is no "widening".

Daniel Shawul
Posts: 3809
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by Daniel Shawul » Sun Jan 05, 2020 4:36 pm

petero2 wrote:
Sun Jan 05, 2020 10:55 am
Daniel Shawul wrote:
Sun Jan 05, 2020 1:27 am
Peter, have you seen my latest results.
...
You said it is a mix of ABDADA + Lazy so that d/d+1 is a potential problem ...
My implementation does not use the classical d/d+1 scheme:
petero2 wrote:
Sun Aug 13, 2017 7:52 pm
* I disabled the lazy SMP "depth + 1" trick, as I don't know if that makes sense in combination with ABDADA. Disabling it in a pure lazy SMP setting seemed to cost around 5 elo at most, so I hope this is not a big deal.
I do a different kind of extended depth logic though:
petero2 wrote:
Sun Aug 06, 2017 9:15 pm
When a helper thread finishes its search it will send the search result to the master thread. The helper thread will then restart its current search at a one ply deeper depth. It is expected that this search will be aborted rather quickly. As soon as the master thread receives the helper thread search result, it will continue its algorithm, which causes it to call negaScoutRoot() again with different parameters, and negaScoutRoot() tells all helper threads to start searching using the new parameters.
I modified this logic to make it never exceed the fixed depth limit and ran 8 threads vs 1 thread again at fixed depth 10. The result was +42.5 elo after 5536 games.
Daniel Shawul wrote:
Sun Jan 05, 2020 1:27 am
Also, my tests with my implementation of ABDADA give absolutely zero elo comparing 8-thread vs 1-thread at fixed depth.
I repost here the fixed depth = 14 result

Code: Select all

Score of scorpio-sht-8-thread vs scorpio-sht-1-thread: 78 - 90 - 132  [0.480] 300
Score of scorpio-abdada-8-thread vs scorpio-abdada-1-thread: 111 - 107 - 106  [0.506] 324
Score of scorpio-ybw-8-thread vs scorpio-ybw-1-thread: 92 - 94 - 114  [0.496] 300
Are you still using the same move ordering in all threads?
Daniel Shawul wrote:
Sat Aug 19, 2017 2:32 pm
1. It might be the case that lazy SMP (and SHT) works better if you have a per-thread history table. The late moves in ALL nodes are mostly ordered by the history heuristic, and if the history table is thread local, there is a decent chance that it becomes different in different threads. This would have the effect that different threads often don't search the same moves at the same time, and would therefore be able to pick up results from other threads from the transposition table.
Unfortunately, I share the history and refutation tables between threads, so the move list is searched in the same order in different threads.
If so, that might explain why you see different results compared to Stockfish and Texel.
Oh, I forgot that we had this discussion years ago. Yes, I still do share history tables and refutation.
I can believe that a different move ordering leading to different LMR reductions could be the culprit, and have been suggested many times before.
What I make sure for ABDADA is on the first and second pass I do reductions correctly if you skip some of them.
However, people do use dynamic tables per thread ( i do eval/pawn hash table for instance) that could lead to different move orders.
BUT shouldn't these things affect all algorithms, YBW/ABDADA/SHT equally then? Especially ABDADA should have similar behaviour as YBW
since it does YBW too.

Daniel Shawul
Posts: 3809
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by Daniel Shawul » Sun Jan 05, 2020 4:40 pm

Michel wrote:
Sun Jan 05, 2020 12:01 pm
petero2 wrote:I modified this logic to make it never exceed the fixed depth limit and ran 8 threads vs 1 thread again at fixed depth 10. The result was +42.5 elo after 5536 games.
Thanks. Interesting! This points indeed to a bit of widening. But I would argue that the elo difference is quite small for 3 thread doublings.

Also this doesn't say that the widening is actually beneficial in timed search of course... Maybe the widening is just a necessary evil resulting from the choice of separate history tables to reduce contention (as you suggest).
So it means if someone decides to share a history table or refutation, the "widening effect" goes away -- like I what am observing in my engine.
It also means YBW/ABDADA may exhibit similar behavior not only SHT or Lazy.
I will try to reproduce Peter's results by using separate history tables ...

petero2
Posts: 594
Joined: Mon Apr 19, 2010 5:07 pm
Location: Sweden
Contact:

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by petero2 » Sun Jan 05, 2020 5:56 pm

Michel wrote:
Sun Jan 05, 2020 12:01 pm
petero2 wrote:I modified this logic to make it never exceed the fixed depth limit and ran 8 threads vs 1 thread again at fixed depth 10. The result was +42.5 elo after 5536 games.
Thanks. Interesting! This points indeed to a bit of widening. But I would argue that the elo difference is quite small for 3 thread doublings.
Yes I think so too. According to earlier tests 8 threads vs 1 thread gained about 220 elo in timed games, even though those tests used more CPU time per move on average.

I repeated my 8 thread vs 1 threads fixed depth test at depth 14 and got 27.6 elo. The difference compared to d=10 might mostly be caused by increased draw rate though:

Code: Select all

d     elo    win   draw  loss
10    42.5   1701  2808  1027
14    27.6    648  1948   410
For reference I also played fixed depth 11 vs fixed depth 10 using 1 thread and got:

Code: Select all

d     elo    win   draw  loss
11-10 134.9  1879  1749   392

User avatar
Ronald
Posts: 95
Joined: Tue Jan 23, 2018 9:18 am
Location: Rotterdam
Full name: Ronald Friederich
Contact:

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by Ronald » Sun Jan 05, 2020 6:14 pm

I get different results with rofChade. rofChade is using different depths, but I modified it so that no thread exceeds the fixed depth. I ran a test for fixed depth 15, with 128Mb Hash.

Code: Select all

Score of 1 thread vs 2 threads: 39 - 123 - 338 [0.416]
Elo difference: -58.93 +/- 17.03

Score of 1 thread vs 4 threads: 29 - 162 - 309 [0.367]
Elo difference: -94.70 +/- 18.28
1 thread vs 8 threads is still running and scores -150 after 100 games.

'

Daniel Shawul
Posts: 3809
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by Daniel Shawul » Sun Jan 05, 2020 8:08 pm

Daniel Shawul wrote:
Sun Jan 05, 2020 4:40 pm
Michel wrote:
Sun Jan 05, 2020 12:01 pm
petero2 wrote:I modified this logic to make it never exceed the fixed depth limit and ran 8 threads vs 1 thread again at fixed depth 10. The result was +42.5 elo after 5536 games.
Thanks. Interesting! This points indeed to a bit of widening. But I would argue that the Elo difference is quite small for 3 thread doublings.

Also, this doesn't say that the widening is actually beneficial in timed search of course... Maybe the widening is just a necessary evil resulting from the choice of separate history tables to reduce contention (as you suggest).
So it means if someone decides to share a history table or refutation, the "widening effect" goes away -- like I what am observing in my engine.
It also means YBW/ABDADA may exhibit similar behavior not only SHT or Lazy.
I will try to reproduce Peter's results by using separate history tables ...
Ok I have made history and refutation tables thread-local now but it seems there is no change at least for depth=8 will test 14 later.

Code: Select all

Score of scorpio-sht-8-thread vs scorpio-sht-1-thread: 66 - 65 - 68  [0.503] 199
My changes are very simple: make the tables local by uncommenting static, and clear history tables for all threads at the start.

Code: Select all

/*static*/ CACHE_ALIGN int history[14][64];
/*static*/ CACHE_ALIGN MOVE refutation[14][64];
@Peter, if you can a test where you make these tables global (opposite of what i did) in the meantime, we can know for sure
it is coming from history and refutation being local ...
1 thread vs 8 threads is still running and scores -150 after 100 games.
@Ronald, can you do the same tests but this time constrain the multi-threaded searches to 1-core ?

petero2
Posts: 594
Joined: Mon Apr 19, 2010 5:07 pm
Location: Sweden
Contact:

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by petero2 » Sun Jan 05, 2020 8:39 pm

Michel wrote:
Sun Jan 05, 2020 11:52 am

Code: Select all

  // Iterative deepening loop until requested to stop or the target depth is reached
  while (   (rootDepth += ONE_PLY) < DEPTH_MAX
         && !Threads.stop
         && !(Limits.depth && mainThread && rootDepth / ONE_PLY > Limits.depth))
Perhaps this is the culprit? If I read it correctly then the depth limit is only respected in mainThread.
Yes it is easy to verify using print statements that the depth limit is not respected in fixed depth searches for threads other than the main thread. Fixing this is easy though. Just remove "&& mainThread".

With latest Stockfish from git I get +314 elo for 8 threads vs 1 thread at fixed depth 8. After fixing the above problem I instead get +251 elo. At fixed depth 14 I get +172 elo:

Code: Select all

d     elo    win   draw  loss
8+    314   3501    654   301
8     251   2266    576   314
14    172   1617   1380   169

Michel
Posts: 2080
Joined: Sun Sep 28, 2008 11:50 pm

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by Michel » Sun Jan 05, 2020 8:58 pm

petero2 wrote:
Sun Jan 05, 2020 8:39 pm
Michel wrote:
Sun Jan 05, 2020 11:52 am

Code: Select all

  // Iterative deepening loop until requested to stop or the target depth is reached
  while (   (rootDepth += ONE_PLY) < DEPTH_MAX
         && !Threads.stop
         && !(Limits.depth && mainThread && rootDepth / ONE_PLY > Limits.depth))
Perhaps this is the culprit? If I read it correctly then the depth limit is only respected in mainThread.
Yes it is easy to verify using print statements that the depth limit is not respected in fixed depth searches for threads other than the main thread. Fixing this is easy though. Just remove "&& mainThread".

With latest Stockfish from git I get +314 elo for 8 threads vs 1 thread at fixed depth 8. After fixing the above problem I instead get +251 elo. At fixed depth 14 I get +172 elo:

Code: Select all

d     elo    win   draw  loss
8+    314   3501    654   301
8     251   2266    576   314
14    172   1617   1380   169
Wow. So it seems there is indeed substantial widening in SF. As I understand it SF is actually using SHT now (like Toga II) and not Lazy as it is commonly understood.

It appears the amount of widening varies wildly from engine to engine. But there does not seem to be a lot of freedom in implementing SHT. So perhaps in the end there is a mystery....
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

Post Reply