Sigh... Everyone claims "it is a mystery that Lazy works so well".
I do not see the mystery.
Occam's razor says that we should first look for the simplest possible explanation. And the simplest possible explanation is that Lazy has almost zero overhead. The overhead is so small that, combined with a few primitive tricks to keep threads from doing too much double work, it wins against (some?) smarter work-dividing algorithms that have much more overhead.
As far as I know, no one has discredited this explanation.
RMO - Randomized Move Order - yet another Lazy SMP derivate
Moderators: hgm, Rebel, chrisw
- Michel
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 201
- Joined: Thu Jun 06, 2019 8:05 pm
- Full name: Percival Tiglao
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Perhaps now is a good time for me to plug my test methodology? I have invented a "game" for my personal study (maybe others have invented this simple game themselves already, but I'm still relatively new to this community). It is the simplest possible game I could imagine, so that I can focus on the pure task of finding a better search.

Dann Corbit wrote: ↑Fri Jan 03, 2020 9:47 pm
Something is missing from the minimization of these:
1. Number of CPU instructions executed
2. Number of memory load/stores executed (in L1, L2, L3 cache, and DDR4 RAM)
3. The number of MESI messages passed between CPU cores.
Consider a program that does eval() on the root node, and then waits until time expires and returns the move choice.
1. Very few instructions
2. Very few load/store operations
3. Very few CPU messages
But it is obviously bad. Something has to measure the usefulness of the outputs and we need measures like depth achieved and Elo gained to factor in somehow.
The chess search is a function of several variables. I guess that ultimately we have time as the x axis and Elo as the y axis.
1. A move is represented by a 5-bit number, between 0 and 31.
2. Players alternate by choosing a move.
3. After a set depth, the game is over. A 10-depth game (5 moves per player) results in a 50-bit final position.
4. Hash(chosen numbers) == the precise value of the final position. Assume that the hash produces a 16-bit signed integer (say, the bottom 16 bits of MD5, Adler32, CRC16, or whatever other hash function you can think of). Negative numbers favor Player2, positive numbers favor Player1. Zero is exactly a tie.
For example, if Player1 chooses "move 5", and then Player2 chooses "move 22", the position would be the 10-bit binary number 0010110110b (hex 0x0B6).
The "Search Question" is to discover the optimal moves of the two players. AlphaBeta is simple and obvious for this simple game, and the various search strategies (YBW, ABDADA, Shared Hash Table, LazySMP) would all apply to this game as well. It's how I've been experimenting with GPU searches the past few months.
The ultimate question, therefore, is which search methodology can calculate an exhaustive depth-10 search of this game in the shortest amount of time? (Or by other benchmarks, such as the Big-O of instruction count, memory operations, or CPU messages.)
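The rules above are compact enough to sketch directly. Here is a minimal single-threaded baseline (Python; MD5 is just one of the placeholder hashes named above, and the depth is reduced from 10 so the demo finishes quickly) — any of the SMP schemes discussed in this thread would then parallelize the `alphabeta` call:

```python
import hashlib

BRANCH = 32        # moves are 5-bit numbers, 0..31
DEPTH = 4          # the game above uses depth 10; 4 keeps the demo fast
INF = 1 << 15      # scores are 16-bit signed, so this bounds everything

def evaluate(moves):
    """Leaf value: bottom 16 bits of MD5 over the concatenated 5-bit
    moves, read as a signed integer (positive favors Player1)."""
    pos = 0
    for m in moves:
        pos = (pos << 5) | m    # e.g. moves [5, 22] pack to 0010110110b == 0xB6
    digest = hashlib.md5(pos.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[-2:], "big", signed=True)

def alphabeta(moves, depth, alpha, beta, p1_to_move):
    """Plain minimax with alpha-beta cutoffs; Player1 maximizes."""
    if depth == 0:
        return evaluate(moves)
    best = -INF if p1_to_move else INF
    for m in range(BRANCH):
        score = alphabeta(moves + [m], depth - 1, alpha, beta, not p1_to_move)
        if p1_to_move:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if alpha >= beta:
            break               # cutoff: the opponent won't allow this line
    return best

if __name__ == "__main__":
    print(alphabeta([], DEPTH, -INF, INF, True))
```

Since every leaf value is an independent hash, there is no move ordering to exploit, which is exactly what makes the parallel-search comparison clean.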
- Michel
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
But this is not a realistic "chess like" tree. In chess, neighboring positions in the tree are heavily correlated. Moreover, evaluation errors are also heavily correlated. Modern chess engines (A/B and MCTS) exploit these correlations. You will not capture this with a simple random tree.

dragontamer5788 wrote: ↑Fri Jan 03, 2020 9:57 pm
Perhaps now is a good time for me to plug my test methodology? I have invented a "game" for my personal study. It is the simplest possible game I could imagine, so that I can focus on the pure task of finding a better search.
1. A move is represented by a 5-bit number, between 0 and 31.
2. Players alternate by choosing a move.
3. After a set depth, the game is over. A 10-depth game (5 moves per player) results in a 50-bit final position.
4. Hash(chosen numbers) == the precise value of the final position. Assume that the hash produces a 16-bit signed integer (say, the bottom 16 bits of MD5, Adler32, CRC16, or whatever other hash function you can think of). Negative numbers favor Player2, positive numbers favor Player1. Zero is exactly a tie.
For example, if Player1 chooses "move 5", and then Player2 chooses "move 22", the position would be the 10-bit binary number 0010110110b (hex 0x0B6).
The "Search Question" is to discover the optimal moves of the two players. AlphaBeta is simple and obvious for this simple game, and the various search strategies (YBW, ABDADA, Shared Hash Table, LazySMP) would all apply to this game as well. It's how I've been experimenting with GPU searches the past few months.
The ultimate question, therefore, is which search methodology can calculate an exhaustive depth-10 search of this game in the shortest amount of time? (Or by other benchmarks, such as the Big-O of instruction count, memory operations, or CPU messages.)
Generating realistic synthetic chess-like trees to test search algorithms would be an interesting project.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
- Daniel
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
No, that is not true. Threads work at two depths, d and d+1, simultaneously. While this separation may result in reduced overhead, each group of threads working on the same depth is surely going to result in a ton of overhead!

Michel wrote: ↑Fri Jan 03, 2020 9:50 pm
Sigh... Everyone claims "it is a mystery that Lazy works so well".
I do not see the mystery.
Occam's razor says that we should first look for the simplest possible explanation. And the simplest possible explanation is that Lazy has almost zero overhead. The overhead is so small that, combined with a few primitive tricks to keep threads from doing too much double work, it wins against (some?) smarter work-dividing algorithms that have much more overhead.
As far as I know, no one has discredited this explanation.
I think the goal should be to prove that SHT (no d/d+1 stuff) can be better than existing algorithms (YBW, ABDADA, DTS, etc.). If not, Lazy (implicitly LazySHT) is going to lose to one of LazyYBW, LazyABDADA, LazyDTS, etc., where you use the smarter algorithm for threads searching at the same depth.
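To make the d/d+1 split concrete, here is a toy sketch (Python; the function name and thread counts are made up for illustration) of how a lazy scheme might hand out depths: half the threads search the main iteration depth, the other half search one ply deeper, and only the shared hash table couples the two groups.

```python
def helper_depth(thread_id, iteration_depth):
    """Toy d/d+1 assignment: even threads (including the main
    thread, id 0) search depth d, odd threads search d+1."""
    return iteration_depth + (thread_id % 2)

# With 8 threads at iteration depth 12, two groups emerge:
depths = [helper_depth(t, 12) for t in range(8)]
print(depths)  # [12, 13, 12, 13, 12, 13, 12, 13]
```

Within each group the threads still rely on TT traffic (or randomization, as in RMO) to avoid searching identical trees, which is the overhead Daniel is pointing at.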
-
- Posts: 201
- Joined: Thu Jun 06, 2019 8:05 pm
- Full name: Percival Tiglao
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
That's a good point.

Michel wrote: ↑Fri Jan 03, 2020 10:08 pm
But this is not a realistic "chess like" tree. In chess, neighboring positions in the tree are heavily correlated. Moreover, evaluation errors are also heavily correlated. Modern chess engines (A/B and MCTS) exploit these correlations. You will not capture this with a simple random tree.

dragontamer5788 wrote: ↑Fri Jan 03, 2020 9:57 pm
Perhaps now is a good time for me to plug my test methodology? I have invented a "game" for my personal study. It is the simplest possible game I could imagine, so that I can focus on the pure task of finding a better search.
1. A move is represented by a 5-bit number, between 0 and 31.
2. Players alternate by choosing a move.
3. After a set depth, the game is over. A 10-depth game (5 moves per player) results in a 50-bit final position.
4. Hash(chosen numbers) == the precise value of the final position. Assume that the hash produces a 16-bit signed integer (say, the bottom 16 bits of MD5, Adler32, CRC16, or whatever other hash function you can think of). Negative numbers favor Player2, positive numbers favor Player1. Zero is exactly a tie.
For example, if Player1 chooses "move 5", and then Player2 chooses "move 22", the position would be the 10-bit binary number 0010110110b (hex 0x0B6).
The "Search Question" is to discover the optimal moves of the two players. AlphaBeta is simple and obvious for this simple game, and the various search strategies (YBW, ABDADA, Shared Hash Table, LazySMP) would all apply to this game as well. It's how I've been experimenting with GPU searches the past few months.
The ultimate question, therefore, is which search methodology can calculate an exhaustive depth-10 search of this game in the shortest amount of time? (Or by other benchmarks, such as the Big-O of instruction count, memory operations, or CPU messages.)
Generating realistic synthetic chess-like trees to test search algorithms would be an interesting project.
Hmm... I guess there can be no better test than simply matching engines against each other in chess. That's the ultimate test.
- Michel
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
When I said "overhead" I meant "administrative overhead" (as in YBW). Lazy trades this overhead (deregulation) for the overhead resulting from double work.

Daniel wrote:
No, that is not true. Threads work at two depths, d and d+1, simultaneously. While this separation may result in reduced overhead, each group of threads working on the same depth is surely going to result in a ton of overhead!
In SHT there is only the hash table to make the threads desynchronize a little. It seemed to work well in Toga II up to 4 threads (where, according to CCRL, its scaling was more or less on par with the YBW engines of the time), but it seems unreasonable to extend it beyond that. Your tests seem to confirm this.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 43
- Joined: Thu Oct 11, 2018 2:26 pm
- Full name: Graham Jones
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
I have been working on that... If anyone knows of previous work along these lines, please let me know. As well as correlations, I think heavy-tailed distributions are important, since even a generally accurate evaluation function will sometimes make gross errors.

Michel wrote: ↑Fri Jan 03, 2020 10:08 pm
But this is not a realistic "chess like" tree. In chess, neighboring positions in the tree are heavily correlated. Moreover, evaluation errors are also heavily correlated. Modern chess engines (A/B and MCTS) exploit these correlations. You will not capture this with a simple random tree.
Generating realistic synthetic chess-like trees to test search algorithms would be an interesting project.
Graham Jones, www.indriid.com
-
- Posts: 160
- Joined: Tue Jan 23, 2018 10:18 am
- Location: Rotterdam
- Full name: Ronald Friederich
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
I think you use Occam's razor on the wrong aspect. The only thing that is shared by Lazy SMP is the hashtable. In plain Lazy SMP every thread executes an identical search; the hashtable is what makes multiple threads behave differently, and a major part of that is that together they create a "richer" tree. Even 2 threads at a fixed depth perform much better than 1 thread, and this has nothing to do with TTD (time to depth). So the important questions are why the tree is richer, and how different this richer tree is from the 1-thread tree, so that we can try to create this richer tree with 1 thread also.

Michel wrote: ↑Fri Jan 03, 2020 9:50 pm
Occam's razor says that we should first look for the simplest possible explanation. And the simplest possible explanation is that Lazy has almost zero overhead. The overhead is so small that, combined with a few primitive tricks to keep threads from doing too much double work, it wins against (some?) smarter work-dividing algorithms that have much more overhead.
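The "only the hashtable is shared" point can be sketched in a few lines (Python; the game tree and evaluation below are invented toys, and a real engine would use a lock-free table rather than a locked dict): every thread runs the identical search, and TT entries left behind by other threads are the only thing that changes the tree each one walks.

```python
import threading

class SharedTT:
    """The single piece of shared state in plain Lazy SMP."""
    def __init__(self):
        self._table = {}
        self._lock = threading.Lock()

    def probe(self, key):
        with self._lock:
            return self._table.get(key)

    def store(self, key, depth, value):
        with self._lock:
            old = self._table.get(key)
            if old is None or depth >= old[0]:  # depth-preferred replacement
                self._table[key] = (depth, value)

def search(tt, pos, depth, evaluate, children):
    """The identical negamax search every thread runs; only TT
    traffic desynchronizes the threads."""
    hit = tt.probe(pos)
    if hit is not None and hit[0] >= depth:
        return hit[1]                    # another thread already did this work
    if depth == 0:
        return evaluate(pos)
    value = max(-search(tt, c, depth - 1, evaluate, children)
                for c in children(pos))
    tt.store(pos, depth, value)
    return value

if __name__ == "__main__":
    tt = SharedTT()
    evaluate = lambda p: (p * 2654435761) % 97 - 48        # deterministic toy eval
    children = lambda p: [3 * p + 1, 3 * p + 2, 3 * p + 3] # toy game tree
    results = []
    workers = [threading.Thread(
        target=lambda: results.append(search(tt, 0, 8, evaluate, children)))
        for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(results)  # every thread agrees on the root value
```

In this toy the threads all reach the same answer; the interesting (and still open) part of Ronald's question is characterizing how the interleaved TT writes reshape the tree each thread actually visits.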
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
I don't disagree. But so long as you use the same program with different search algorithms, the test ought to work. You might need to tune each version to each new type of architecture, but so long as basic chess knowledge remains the same... Both programs are, in reality, going to be run on all sorts of architectures, so worrying too much about tweaking for max performance is probably overkill anyway.

dragontamer5788 wrote: ↑Fri Jan 03, 2020 7:08 pm
While that's the correct general idea, there are numerous implementation details that will hamper such a test: lockless (atomic compare-and-swap) vs. spinlocks, NUMA vs. UMA computers, Intel Skylake vs. AMD Zen (vs. IBM Power9), hyperthreads vs. physical cores.
There are so many variables at play here.
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: RMO - Randomized Move Order - yet another Lazy SMP derivate
Exactly!

Ronald wrote: ↑Fri Jan 03, 2020 11:35 pm
I think you use Occam's razor on the wrong aspect. The only thing that is shared by Lazy SMP is the hashtable. In plain Lazy SMP every thread executes an identical search; the hashtable is what makes multiple threads behave differently, and a major part of that is that together they create a "richer" tree. Even 2 threads at a fixed depth perform much better than 1 thread, and this has nothing to do with TTD. So the important questions are why the tree is richer, and how different this richer tree is from the 1-thread tree, so that we can try to create this richer tree with 1 thread also.

Michel wrote: ↑Fri Jan 03, 2020 9:50 pm
Occam's razor says that we should first look for the simplest possible explanation. And the simplest possible explanation is that Lazy has almost zero overhead. The overhead is so small that, combined with a few primitive tricks to keep threads from doing too much double work, it wins against (some?) smarter work-dividing algorithms that have much more overhead.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.