Page 10 of 12
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Posted: Sun Jan 05, 2020 10:22 pm
by Daniel Shawul
Michel wrote: ↑Sun Jan 05, 2020 9:58 pm
petero2 wrote: ↑Sun Jan 05, 2020 9:39 pm
Michel wrote: ↑Sun Jan 05, 2020 12:52 pm
Code: Select all
// Iterative deepening loop until requested to stop or the target depth is reached
while ( (rootDepth += ONE_PLY) < DEPTH_MAX
&& !Threads.stop
&& !(Limits.depth && mainThread && rootDepth / ONE_PLY > Limits.depth))
Perhaps this is the culprit? If I read it correctly then the depth limit is only respected in mainThread.
Yes it is easy to verify using print statements that the depth limit is not respected in fixed depth searches for threads other than the main thread. Fixing this is easy though. Just remove "&& mainThread".
With latest Stockfish from git I get +314 elo for 8 threads vs 1 thread at fixed depth 8. After fixing the above problem I instead get +251 elo. At fixed depth 14 I get +172 elo:
Code: Select all
d elo win draw loss
8+ 314 3501 654 301
8 251 2266 576 314
14 172 1617 1380 169
Wow. So it seems there is indeed substantial widening in SF. As I understand it SF is actually using SHT now (like Toga II) and not Lazy as it is commonly understood.
It appears the amount of widening varies wildly from engine to engine. But there does not seem to be a lot of freedom in implementing SHT. So perhaps in the end there is a mystery....
So there is a bug AND widening wow. I rerun my fixed-depth constrained matche again after fixing the bug,
and now I get acceptable results that can be explained by hyper-threading i think.
Code: Select all
Score of stockfish-8-thread vs stockfish-1-thread: 852 - 695 - 533 [0.538] 2080
My previous result was
Code: Select all
Score of stockfish-8-thread vs stockfish-1-thread: 1113 - 467 - 497 [0.656] 2077
Lets now nail down the real causes -- thread local stuff that could change move ordering a lot ?
So if Stockfish is now SHT, it means there is nothing unique about LazySMP then.
Also, the fact that you don't get that much benefit with the constrained match BUT you do get significant benefit
when threads are running in parallel (non-constrained match) suggests dynamic tables being the culprit.
Edit My result after the fix is not as great as Peter's but it is still there
Code: Select all
Score of stockfish-8-thread vs stockfish-1-thread: 3629 - 1130 - 1380 [0.704] 6139
+151 elo compared to +251 of Peter.
Stockfish git hash I used is from Jul 14, 2019: commit 7090d2561ae9ce3803bbc04319c4c93f
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Posted: Sun Jan 05, 2020 11:36 pm
by petero2
Daniel Shawul wrote: ↑Sun Jan 05, 2020 10:22 pm
Edit My result after the fix is not as great as Peter's but it is still there
Code: Select all
Score of stockfish-8-thread vs stockfish-1-thread: 3629 - 1130 - 1380 [0.704] 6139
+151 elo compared to +251 of Peter.
Stockfish git hash I used is from Jul 14, 2019: commit 7090d2561ae9ce3803bbc04319c4c93f
I tried that version and got +168 elo after 3198 games (w=1972, d=689, l=537), so quite a bit less than the +251 I got using the latest version. I guess the difference between 151 and 168 can be explained by error margins and different opening books.
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Posted: Sun Jan 05, 2020 11:42 pm
by elcabesa
sorry, what do you mean by SHT? Shared transpositio table?
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Posted: Mon Jan 06, 2020 12:38 am
by lucasart
elcabesa wrote: ↑Sun Jan 05, 2020 11:42 pm
sorry, what do you mean by SHT? Shared transpositio table?
SHT and Lazy are the same thing.
SHT is the real name, dating back from times immemorial:
https://www.chessprogramming.org/Shared_Hash_Table
Lazy is just a silly name coined in this forum a few years ago, and was refering to a completely misguided idea, that certainly doesn't work or make any sense:
http://www.talkchess.com/forum3/viewtopic.php?t=46597
But the term "lazy" just stuck, becase people like it. Everyone wants to be lazy, and get SMP working in as little effort as possible, without spending weeks of work coding and debugging, an incomprehensible tree splitting algorithm.
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Posted: Mon Jan 06, 2020 2:08 am
by Daniel Shawul
petero2 wrote: ↑Sun Jan 05, 2020 11:36 pm
Daniel Shawul wrote: ↑Sun Jan 05, 2020 10:22 pm
Edit My result after the fix is not as great as Peter's but it is still there
Code: Select all
Score of stockfish-8-thread vs stockfish-1-thread: 3629 - 1130 - 1380 [0.704] 6139
+151 elo compared to +251 of Peter.
Stockfish git hash I used is from Jul 14, 2019: commit 7090d2561ae9ce3803bbc04319c4c93f
I tried that version and got +168 elo after 3198 games (w=1972, d=689, l=537), so quite a bit less than the +251 I got using the latest version. I guess the difference between 151 and 168 can be explained by error margins and different opening books.
Yeah, i think so too. Why this huge jump from version to version is anybody's guess. I quickly tested the history/refutation idea by disabling update_quiet_stats and update_capture_stats in stockfish. Stockfish made further gains without using history/refutation table updates.
But I must admit that I don't know what I did by putting a return statement at the top these two subroutines (could have side effects I didn't anticipate)
Code: Select all
Score of stockfish-8-thread vs stockfish-1-thread: 5534 - 1209 - 1257 [0.770] 8000
I got access to 20-core machine and run a 1 min search for my YBW and ABDADA, and ABDADA seems to go as deep if not deeper than YBW.
It is just one position though but given that I don't observe any widening in my code, it is understandable that my ABDADA goes deeper.
YBW 12mnps and depth 28
Code: Select all
28 22 4004 406735355 Ng1-f3 Ng8-f6 d2-d4 e7-e6 c2-c4 d7-d5 c4xd5 e6xd5 Nb1-c3 Nb8-c6 a2-a3 h7-h6 e2-e3 Bc8-f5 Bf1-e2 Bf8-d6 Ke1-g1 Ke8-g8 b2-b3 Rf8-e8 Bc1-b2 Qd8-d7 Be2-d3 a7-a6 h2-h3 Qd7-e6 Bd3xf5 Qe6xf5
# splits = 594927 badsplits = 44137 egbb_probes = 0
# nodes = 706700617 <68 qnodes> time = 60040ms nps = 11770496 eps = 12031188 nneps = 0
ABDADA 18mnps and depth 29
Code: Select all
29 31 5487 706625383 d2-d4 Ng8-f6 c2-c4 e7-e6 Nb1-c3 d7-d5 Bc1-g5 Bf8-e7 c4xd5 e6xd5 e2-e3 Ke8-g8 Ng1-f3 h7-h6 Bg5-h4 Nb8-c6 a2-a3 Bc8-f5 Bf1-d3 Nf6-e4 Bh4xe7 Nc6xe7 Ke1-g1 Ne4xc3 b2xc3 Bf5xd3 Qd1xd3 Ne7-c6 Qd3-c2
# splits = 0 badsplits = 0 egbb_probes = 0
# nodes = 1085996479 <66 qnodes> time = 60055ms nps = 18083362 eps = 17143516 nneps = 0
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Posted: Mon Jan 06, 2020 6:05 am
by Alayan
Daniel Shawul wrote: ↑Sun Jan 05, 2020 5:24 pm
Thank you for the confirmation! People here, a main stockfish author, has said both 8-threads and 1-thread will search a similar search tree at a fixed depth meaning there is no "widening".
You misunderstood me.
I wrote than 8-threads @ fixed depth running on a single-core and 8-threads @ fixed depth running on 8 cores will have a similar search tree, though with some differences introduced by no actual concurrent execution happening. Running fixed-depth means that the time difference to reach the given depth depending on how many cores you use is secondary.
Also, while I contributed to some SF patches and testing environment changes, I wouldn't call myself a "main stockfish author", there are many people who did much much more.
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Posted: Mon Jan 06, 2020 5:11 pm
by petero2
Daniel Shawul wrote: ↑Sun Jan 05, 2020 9:08 pm
Ok I have made history and refutation tables thread-local now but it seems there is no change at least for depth=8 will test 14 later.
Code: Select all
Score of scorpio-sht-8-thread vs scorpio-sht-1-thread: 66 - 65 - 68 [0.503] 199
My changes are very simple: make the tables local by uncommenting static, and clear history tables for all threads at the start.
Code: Select all
/*static*/ CACHE_ALIGN int history[14][64];
/*static*/ CACHE_ALIGN MOVE refutation[14][64];
@Peter, if you can a test where you make these tables global (opposite of what i did) in the meantime, we can know for sure
it is coming from history and refutation being local ...
I made my History::ht array static so that all History objects share the same state. I then re-ran the fixed depth 10 test for 8 threads vs 1 thread, first with my standard search, then after disabling the ABDADA-like part. I got:
Code: Select all
type elo win draw loss
abdada 19.7 805 1604 633
lazy 15.7 816 1520 680
Earlier (i.e. with thread local history table) I got at fixed depth 10:
Code: Select all
type elo win draw loss
abdada 42.5 1701 2808 1027
lazy 50.2 962 1525 529
So a global history table removes about 50-70% of the widening in this test.
I also measured how the global history table affects playing strength by playing against unmodified Texel using 8 threads and time control 6 seconds + 0.06 seconds / move and got -48 elo (w=373, d=1975, l=806).
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Posted: Mon Jan 06, 2020 6:05 pm
by Alayan
Vondele did an interesting experiment :
I implemented 5 versions (drafts:
https://github.com/vondele/Stockfish/co ... hreadDepth)
v1: Master version: all threads search irrespective of Limits.depth, mainThread stops search when it reaches limits depth.
v2: First thread to reach Limits.depth stops search.
v3: all threads at most Limits.depth, mainThread stops search reaching limit (simple '&& mainThread' removal)
v4: all threads search to exactly limits depth
v5: all threads search to at least limits depth (last one to reach stops search)
and had them play a depth=13 match, with 8 threads. Also in the pool: sequential runs with depth 13 ... 16. Interesting, even v2 is significantly stronger than sequential, and v1 ~ v4:
Code: Select all
Rank Name Elo +/- Games Score Draws
1 v5 141 15 1006 69.2% 48.2%
2 seq16 117 15 1005 66.2% 49.7%
3 v4 37 15 1007 55.3% 52.5%
4 v1 35 15 1008 55.0% 52.8%
5 seq15 23 15 1008 53.3% 53.9%
6 v3 22 15 1007 53.2% 53.3%
7 seq14 -86 15 1005 37.8% 49.2%
8 v2 -92 15 1008 37.1% 47.8%
9 seq13 -211 18 1006 22.9% 36.1%
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Posted: Mon Jan 06, 2020 8:30 pm
by Daniel Shawul
petero2 wrote: ↑Mon Jan 06, 2020 5:11 pm
Daniel Shawul wrote: ↑Sun Jan 05, 2020 9:08 pm
Ok I have made history and refutation tables thread-local now but it seems there is no change at least for depth=8 will test 14 later.
Code: Select all
Score of scorpio-sht-8-thread vs scorpio-sht-1-thread: 66 - 65 - 68 [0.503] 199
My changes are very simple: make the tables local by uncommenting static, and clear history tables for all threads at the start.
Code: Select all
/*static*/ CACHE_ALIGN int history[14][64];
/*static*/ CACHE_ALIGN MOVE refutation[14][64];
@Peter, if you can a test where you make these tables global (opposite of what i did) in the meantime, we can know for sure
it is coming from history and refutation being local ...
I made my History::ht array static so that all History objects share the same state. I then re-ran the fixed depth 10 test for 8 threads vs 1 thread, first with my standard search, then after disabling the ABDADA-like part. I got:
Code: Select all
type elo win draw loss
abdada 19.7 805 1604 633
lazy 15.7 816 1520 680
Earlier (i.e. with thread local history table) I got at fixed depth 10:
Code: Select all
type elo win draw loss
abdada 42.5 1701 2808 1027
lazy 50.2 962 1525 529
So a global history table removes about 50-70% of the widening in this test.
I also measured how the global history table affects playing strength by playing against unmodified Texel using 8 threads and time control 6 seconds + 0.06 seconds / move and got -48 elo (w=373, d=1975, l=806).
Well, it now seems clear local history has a role in widening of Texel. I don't know why it doesn't widen Scorpio's search.
I re-run with more games now. I had a thread_sleep(30ms) before which was slowing down my tests by 10x i think
Depth=8 global history/refutation
Code: Select all
Score of scorpio-sht-8-thread vs scorpio-sht-1-thread: 1115 - 980 - 906 [0.522] 3001
Depth=8 local history/refutation
Code: Select all
Score of scorpio-sht-8-thread vs scorpio-sht-1-thread: 1077 - 1051 - 874 [0.504] 3002
So it seems going to local history tables even decreases the widening for me.
Also for Stockfish that seems to be the case. I redid the experiment by removing all dynamic terms in MovePicker::score() ( also tested a scoring scheme that keeps same order moves were generated) and they both increased the widening effect.
There is something funky in Stockfish anyway with its big elo gains from widening so I will try to focus on that instead of Scorpio.
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Posted: Mon Jan 06, 2020 8:32 pm
by Daniel Shawul
Alayan wrote: ↑Mon Jan 06, 2020 6:05 pm
Vondele did an interesting experiment :
I implemented 5 versions (drafts:
https://github.com/vondele/Stockfish/co ... hreadDepth)
v1: Master version: all threads search irrespective of Limits.depth, mainThread stops search when it reaches limits depth.
v2: First thread to reach Limits.depth stops search.
v3: all threads at most Limits.depth, mainThread stops search reaching limit (simple '&& mainThread' removal)
v4: all threads search to exactly limits depth
v5: all threads search to at least limits depth (last one to reach stops search)
and had them play a depth=13 match, with 8 threads. Also in the pool: sequential runs with depth 13 ... 16. Interesting, even v2 is significantly stronger than sequential, and v1 ~ v4:
Code: Select all
Rank Name Elo +/- Games Score Draws
1 v5 141 15 1006 69.2% 48.2%
2 seq16 117 15 1005 66.2% 49.7%
3 v4 37 15 1007 55.3% 52.5%
4 v1 35 15 1008 55.0% 52.8%
5 seq15 23 15 1008 53.3% 53.9%
6 v3 22 15 1007 53.2% 53.3%
7 seq14 -86 15 1005 37.8% 49.2%
8 v2 -92 15 1008 37.1% 47.8%
9 seq13 -211 18 1006 22.9% 36.1%
Thanks, I think v4 ( all threads searching to same depth) is what we want.
So if I read this right, stockfish widening gives it at least +2 more plies of elo i.e. v4 at depth=13 is lightly better than seq at depth=15.