Ok, so just say you haven't got a clue about lazy smp, and are looking for a very quick and dirty way to add to a non-smp program.. which is written in C.
What's the easiest and simplest way to implement? Where's the best start point?
Simplest way to implement quick and dirty lazy smp
Moderators: hgm, Rebel, chrisw
-
- Posts: 327
- Joined: Sat Mar 27, 2010 7:15 pm
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
Re: Simplest way to implement quick and dirty lazy smp
Just run the search in multiple forks using a shared hash table.
In C++11, it is much fun using std::thread/mutex/condition. Stockfish is a very simple and easy to understand implementation.
The one in Minic is derived from Stockfish and I guess quite copy/paste-able.
In C++11, it is much fun using std::thread/mutex/condition. Stockfish is a very simple and easy to understand implementation.
The one in Minic is derived from Stockfish and I guess quite copy/paste-able.
-
- Posts: 2724
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Simplest way to implement quick and dirty lazy smp
1) Make sure all threads have their own memory stacks except global hash tablesilentshark wrote: ↑Fri Jan 04, 2019 2:16 pm Ok, so just say you haven't got a clue about lazy smp, and are looking for a very quick and dirty way to add to a non-smp program.. which is written in C.
What's the easiest and simplest way to implement? Where's the best start point?
2) use the SHT, shared hash table, in all threads for TT-moves and TT-scores
3) test RMO - randomized move order in helper threads,
- root thread performs a normal AB search
- helper threads pick a random move after Nullmove, TT-move, IID-move etc.
- search terminates when any thread finishes its search
4) consider a quick but not dirty ABDADA implementation
--
Srdja
-
- Posts: 96
- Joined: Fri Jul 06, 2018 1:09 am
- Location: Chicago, IL
- Full name: Josh Odom
Re: Simplest way to implement quick and dirty lazy smp
Does randomized move order provide a benefit for lazy SMP?
-
- Posts: 2724
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Simplest way to implement quick and dirty lazy smp
I implemented RMO on gpu and it was superior to plain SHT,
i tried some lazy smp ideas like different search depths,
but these did not work for me.
My ABDADA implementation is superior to RMO,
at least up to 32 threads.
--
Srdja
-
- Posts: 327
- Joined: Sat Mar 27, 2010 7:15 pm
Re: Simplest way to implement quick and dirty lazy smp
All great suggestions, many thanks
-
- Posts: 1357
- Joined: Wed Mar 08, 2006 10:15 pm
- Location: San Francisco, California
Re: Simplest way to implement quick and dirty lazy smp
Well, if your code is not thread-safe and you REALLY want the QUICKEST and DIRTIEST implementation, you could do what I did in Myrddin (thanks to an idea from H.G. Muller). Instead of threads, I used processes which are extremely easy to launch in parallel. You just have to make your various hash tables accessible by all processes via shared memory, and then have the master process feed moves via pipes to all of the slaves while they are in analyze mode. They fill up the hash tables, which the main process uses to speed up its search.silentshark wrote: ↑Fri Jan 04, 2019 2:16 pm Ok, so just say you haven't got a clue about lazy smp, and are looking for a very quick and dirty way to add to a non-smp program.. which is written in C.
What's the easiest and simplest way to implement? Where's the best start point?
It worked a fair bit better than expected. My testing showed +90 elo for 4 CPU, and this is borne out by CCRL testing, which shows +93 (although CCRL has only played about 260 games with the 4 CPU version).
jm
-
- Posts: 27869
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Simplest way to implement quick and dirty lazy smp
Yeah, I used that in my engine HaQiKi D, also with satisfactory results (eventually). I don't connect the processes (by pipes) in a star topology, though, but as a linear chain. When a process receives the command "cores N" with N > 1 it launches a daughter process, and passes the command "cores N-1" to it. That will set up the chain.
The process that receives the "xboard" command (which is not passed along the chain) knows it is the master process communicating with the GUI. This then allocates the hash memory in a sharable way, and sends its name in a (non-standard) "attach" command (which is passed along to daughters) to the daughter, so the latter can map it in its own memory instead of allocating hash memory.
Game-state changing commands are passed to daughters, as well as any move the master makes itself. This keeps the processes in sync. Whenever the master starts searching, it sends an "analyze" command to its daughter, whenever it stops searching it sends "exit" to stop the analysis (both also passed along).
Basically the whole thing is implemented in the protocol driver loop.
The process that receives the "xboard" command (which is not passed along the chain) knows it is the master process communicating with the GUI. This then allocates the hash memory in a sharable way, and sends its name in a (non-standard) "attach" command (which is passed along to daughters) to the daughter, so the latter can map it in its own memory instead of allocating hash memory.
Game-state changing commands are passed to daughters, as well as any move the master makes itself. This keeps the processes in sync. Whenever the master starts searching, it sends an "analyze" command to its daughter, whenever it stops searching it sends "exit" to stop the analysis (both also passed along).
Basically the whole thing is implemented in the protocol driver loop.
-
- Posts: 1357
- Joined: Wed Mar 08, 2006 10:15 pm
- Location: San Francisco, California
Re: Simplest way to implement quick and dirty lazy smp
The only significant difference between your implementation and mine is that I pass the address for the hash tables as commandline parameters to the slaves. How do you pass commands down the chain if not via pipes?hgm wrote: ↑Sat Jan 05, 2019 9:47 pm Yeah, I used that in my engine HaQiKi D, also with satisfactory results (eventually). I don't connect the processes (by pipes) in a star topology, though, but as a linear chain. When a process receives the command "cores N" with N > 1 it launches a daughter process, and passes the command "cores N-1" to it. That will set up the chain.
The process that receives the "xboard" command (which is not passed along the chain) knows it is the master process communicating with the GUI. This then allocates the hash memory in a sharable way, and sends its name in a (non-standard) "attach" command (which is passed along to daughters) to the daughter, so the latter can map it in its own memory instead of allocating hash memory.
Game-state changing commands are passed to daughters, as well as any move the master makes itself. This keeps the processes in sync. Whenever the master starts searching, it sends an "analyze" command to its daughter, whenever it stops searching it sends "exit" to stop the analysis (both also passed along).
Basically the whole thing is implemented in the protocol driver loop.
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Simplest way to implement quick and dirty lazy smp
I fail to see how this is easier. Honestly, managing subprocesses, and communications via pipes, is a lot more complicated in my experience (especially on POSIX systems, with the fork logic being quite mind bending). And it's hideous, from a design/coding standpoint. Simply start threads in an id_loop() and join when done. That's all there is to it. The rest is details, which you can improve on as you go along.JVMerlino wrote: ↑Sat Jan 05, 2019 7:31 pmWell, if your code is not thread-safe and you REALLY want the QUICKEST and DIRTIEST implementation, you could do what I did in Myrddin (thanks to an idea from H.G. Muller). Instead of threads, I used processes which are extremely easy to launch in parallel. You just have to make your various hash tables accessible by all processes via shared memory, and then have the master process feed moves via pipes to all of the slaves while they are in analyze mode. They fill up the hash tables, which the main process uses to speed up its search.silentshark wrote: ↑Fri Jan 04, 2019 2:16 pm Ok, so just say you haven't got a clue about lazy smp, and are looking for a very quick and dirty way to add to a non-smp program.. which is written in C.
What's the easiest and simplest way to implement? Where's the best start point?
It worked a fair bit better than expected. My testing showed +90 elo for 4 CPU, and this is borne out by CCRL testing, which shows +93 (although CCRL has only played about 260 games with the 4 CPU version).
jm
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.