Let's say we start with the most basic implementation:
* we have 4 threads
* run an id loop starting at depth 1 with 2 threads T1, T2
* run an id loop starting at depth 2 with 2 threads T3, T4
Ok, now let's say T2 finishes depth 1:
* in this totally dumb implementation, T2 will start searching depth 2, while T1 is still searching depth 1.
* instead we want to have always 2 threads searching some depth N and the other 2 threads searching depth N+1.
* here we have T1 continuing a useless search, 3 threads at depth 2 (instead of 2), and no one searching depth 3.
Looks like we need a global array (to RW under lock protection obviously), that knows which thread is doing what, and a signaling process to abort search, so that:
* when T2 finishes, it knows that there are already 2 threads searching depth 2, so the next iteration for T2 should be depth 3
* also, T2 knows that T1 is now doing useless work, so T2 raises a signal that T1 can observe to abort the search and go back to the id loop, where the above logic will kick in as well so that T1 will start depth 3.
Another thing is the main thread. What to do with it? It seems natural to park it in a loop that reads stdin to process UCI commands (and raise a "all searching should stop" signal when stop command is received for example).
And then there's stopping the search, because we're out of time, have finished the max depth, or reached the node limit. Looks like the cleanest way would be to have another thread that manages that like:
* sleep for a few msec
* check all these conditions (nodes will be a pain as node counters will be per thread, and locking could slow things down too much), and raise the appropriate signal.
And what of aspiration? If 2 threads sear the same window same depth, one finishes with a score outside the window, then the other is now doing useless work on a refuted window and should be signald to abort and move to the next window, plus windows should be global and for each depth (with lock protection).
Seems not so lazy after all, if I want to do this right. Am I missing something?
			
			
									
						
							lazy smp questions
Moderator: Ras
- 
				lucasart
- Posts: 3242
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
lazy smp questions
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
			
						- 
				bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: lazy smp questions
I normally always respond to questions about this topic, but not in this case. You guys like to act childish and make clever insulting remarks, so rather than explaining why this is a poor approach, I'll leave it as an exercise for you to work out for yourself, WITHOUT telling you what was known about this approach 30+ years ago. And it WAS done back then by at least two different groups.lucasart wrote:Let's say we start with the most basic implementation:
* we have 4 threads
* run an id loop starting at depth 1 with 2 threads T1, T2
* run an id loop starting at depth 2 with 2 threads T3, T4
Ok, now let's say T2 finishes depth 1:
* in this totally dumb implementation, T2 will start searching depth 2, while T1 is still searching depth 1.
* instead we want to have always 2 threads searching some depth N and the other 2 threads searching depth N+1.
* here we have T1 continuing a useless search, 3 threads at depth 2 (instead of 2), and no one searching depth 3.
Looks like we need a global array (to RW under lock protection obviously), that knows which thread is doing what, and a signaling process to abort search, so that:
* when T2 finishes, it knows that there are already 2 threads searching depth 2, so the next iteration for T2 should be depth 3
* also, T2 knows that T1 is now doing useless work, so T2 raises a signal that T1 can observe to abort the search and go back to the id loop, where the above logic will kick in as well so that T1 will start depth 3.
Another thing is the main thread. What to do with it? It seems natural to park it in a loop that reads stdin to process UCI commands (and raise a "all searching should stop" signal when stop command is received for example).
And then there's stopping the search, because we're out of time, have finished the max depth, or reached the node limit. Looks like the cleanest way would be to have another thread that manages that like:
* sleep for a few msec
* check all these conditions (nodes will be a pain as node counters will be per thread, and locking could slow things down too much), and raise the appropriate signal.
And what of aspiration? If 2 threads sear the same window same depth, one finishes with a score outside the window, then the other is now doing useless work on a refuted window and should be signald to abort and move to the next window, plus windows should be global and for each depth (with lock protection).
Seems not so lazy after all, if I want to do this right. Am I missing something?
- 
				lucasart
- Posts: 3242
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: lazy smp questions
I'm asking whether Bob Hyatt thinks that lazy SMP is a good idea or not. I know already the answer to that question.
I'm asking some some specific question about how to implement it, in the lazyest, yet still effective way.
			
			
									
						
							I'm asking some some specific question about how to implement it, in the lazyest, yet still effective way.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
			
						- 
				cdani  
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: lazy smp questions
Hi.
My lazy smp implementation is the most lazy I know about Anyway seems to be good for minimum 80 elo gain at 4 threads, probably 100 in the latest implementation, pending to be evaluated at CCRL.
 Anyway seems to be good for minimum 80 elo gain at 4 threads, probably 100 in the latest implementation, pending to be evaluated at CCRL.
The thread 0 manages the id loop. It signals other threads to start thinking at search_root for each new depth. Itself calls search_root, and controls user input every n nodes (calculated on the fly), inside alpha_beta function.
How all this stop? Simply when search_root of thread 0 finishes, it stops the other threads. Some of them maybe had ended their search before, so are not doing nothing. Thats it.
So the win comes only from hash table, as thread 0 ignores the results of the other threads. The only thing directly used from other threads is the number of nodes analyzed, to calculate the NPS.
			
			
									
						
							My lazy smp implementation is the most lazy I know about
 Anyway seems to be good for minimum 80 elo gain at 4 threads, probably 100 in the latest implementation, pending to be evaluated at CCRL.
 Anyway seems to be good for minimum 80 elo gain at 4 threads, probably 100 in the latest implementation, pending to be evaluated at CCRL.The thread 0 manages the id loop. It signals other threads to start thinking at search_root for each new depth. Itself calls search_root, and controls user input every n nodes (calculated on the fly), inside alpha_beta function.
How all this stop? Simply when search_root of thread 0 finishes, it stops the other threads. Some of them maybe had ended their search before, so are not doing nothing. Thats it.
So the win comes only from hash table, as thread 0 ignores the results of the other threads. The only thing directly used from other threads is the number of nodes analyzed, to calculate the NPS.
Daniel José -    http://www.andscacs.com
  http://www.andscacs.com
			
						 http://www.andscacs.com
  http://www.andscacs.com- 
				lucasart
- Posts: 3242
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: lazy smp questions
Thanks Daniel. That is even lazyer than what I had in mind.
			
			
									
						
							Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
			
						- 
				cdani  
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: lazy smp questions
I add some more info on hashes shared between threads, apart from the main one which also serves as a evaluation hash:
* Double move refutation hash table (like discocheck).
* Triple/quadruple move refutation hash table.
* Piece hash, like killers but associated to (hash ^ hash pawns).
* Counter moves.
* Pawn hash.
Each one is supposed to help a little to increase benefices of SMP, but I did not verified if it's true.
The rest, like killers, history and others, are exclusive for each thread.
			
			
									
						
							* Double move refutation hash table (like discocheck).
* Triple/quadruple move refutation hash table.
* Piece hash, like killers but associated to (hash ^ hash pawns).
* Counter moves.
* Pawn hash.
Each one is supposed to help a little to increase benefices of SMP, but I did not verified if it's true.
The rest, like killers, history and others, are exclusive for each thread.
Daniel José -    http://www.andscacs.com
  http://www.andscacs.com
			
						 http://www.andscacs.com
  http://www.andscacs.com- 
				mar
- Posts: 2667
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: lazy smp questions
What I do:
- always finish depth 1 in main search thread no matter what (also all limits [time/nodes etc] are ignored in this case)
- main search thread (="master") starts (=notifies) helper threads (="slaves") inside id loop (let's say main search thread is thread 0 and takes part in lazy smp itself)
- whenever a slave search finishes it aborts all others (using a special flag present in main search thread)
if it's main search thread that finished first, nothing special happens, otherwise main search thread finds the "lazy helper" that finished and "borrows" its result.
After that I simply continue id loop.
("main thread" in my case in simply I/O loop in main(), because I have a separate thread that manages search and I don't actively poll for user input within search)
			
			
									
						
										
						- always finish depth 1 in main search thread no matter what (also all limits [time/nodes etc] are ignored in this case)
- main search thread (="master") starts (=notifies) helper threads (="slaves") inside id loop (let's say main search thread is thread 0 and takes part in lazy smp itself)
- whenever a slave search finishes it aborts all others (using a special flag present in main search thread)
if it's main search thread that finished first, nothing special happens, otherwise main search thread finds the "lazy helper" that finished and "borrows" its result.
After that I simply continue id loop.
("main thread" in my case in simply I/O loop in main(), because I have a separate thread that manages search and I don't actively poll for user input within search)
- 
				lucasart
- Posts: 3242
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: lazy smp questions
Thanks. But I'm wondering if you even need to have permanent threads, and manage them with locks and condition variables. Why not just create threads just when you need them, starting them directly in the ID loop and join them (with abort signal) or even kill them when you're done. Spawning a few threads is such a cheap operation nowadays (at least on linux where i tested it and was amazed by the results), why bother?
			
			
									
						
							Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
			
						- 
				cetormenter
- Posts: 170
- Joined: Sun Oct 28, 2012 9:46 pm
Re: lazy smp questions
I am pretty sure Nirvana uses the laziest of SMP approaches. I think the chess programming wiki calls it "Shared Hash Table". I simply spawn the threads at the root position and set them free. The main thread then controls when all of the threads stop their search. Not sure how well it scales to many threads but up to 4 is it pretty competitive considering how little time it takes to implement even compared to "lazy" smp. However I don't think this approach satisfies your "yet still effective" criteria since a lot of work is wasted.
			
			
									
						
										
						- 
				JVMerlino
- Posts: 1404
- Joined: Wed Mar 08, 2006 10:15 pm
- Location: San Francisco, California
Re: lazy smp questions
I did the same thing in Myrddin. Each slave process simply acts as if it is in analysis mode and fills the hash table. I got this idea from H.G Muller.cetormenter wrote:I am pretty sure Nirvana uses the laziest of SMP approaches. I think the chess programming wiki calls it "Shared Hash Table". I simply spawn the threads at the root position and set them free. The main thread then controls when all of the threads stop their search. Not sure how well it scales to many threads but up to 4 is it pretty competitive considering how little time it takes to implement even compared to "lazy" smp. However I don't think this approach satisfies your "yet still effective" criteria since a lot of work is wasted.
In my own testing, it got +69 elo at 4 CPU. Definitely not as good as other options, but very easy to implement and so I'm satisfied.
 
 jm