Parallel search

Fabio Gobbato
Posts: 217
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Parallel search

Post by Fabio Gobbato »

When I run code similar to the following, I always end up with threads blocked waiting for a lock.
Maybe it's something stupid, but I can't see where the code could leave a lock locked.
Could someone help me?

Code:

idle_loop
{
	mutex_lock(SMPLock);
	...
	cond_wait(Work,SMPLock);
	...
	mutex_unlock(SMPLock);

	SMPSearch();

}

split
{
	mutex_lock(SMPLock);
	...
	cond_broadcast(Work);
	...
	mutex_unlock(SMPLock);

	SMPSearch();

	mutex_lock(SMPLock);
	...
	if (ncpu>0) cond_wait(CloseSplit,SMPLock);
	...
	mutex_unlock(SMPLock);
}

SMPSearch
{
	while() //all the moves
	{
		mutex_lock(SMPLock);
		ExtractMove();
		mutex_unlock(SMPLock);

		Make();
		Search();
		UnMake();

		mutex_lock(SMPLock);
		// check the bounds
		mutex_unlock(SMPLock);
	}
	mutex_lock(SMPLock);
	ncpu--;
	if (ncpu==0) cond_signal(CloseSplit);
	mutex_unlock(SMPLock);
}
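
A side note on the pseudocode above: `split` waits with `if (ncpu>0) cond_wait(...)`, and condition variables are allowed to wake up spuriously, so the usual advice is to re-check the condition in a loop. The following is only a minimal C++ sketch of that pattern, not the engine's real code. `SplitPoint`, `helper_finished` and `run_split` are hypothetical names, and the helpers do no searching at all; they only count down `ncpu` the way the end of `SMPSearch` does.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins for SMPLock, CloseSplit and ncpu from the post.
struct SplitPoint {
    std::mutex lock;                      // plays the role of SMPLock
    std::condition_variable close_split;  // plays the role of CloseSplit
    int ncpu = 0;                         // helpers still working at this split
    int finished = 0;                     // dummy shared result
};

// What a helper does once it has run out of moves (end of SMPSearch).
void helper_finished(SplitPoint& sp) {
    std::lock_guard<std::mutex> guard(sp.lock);
    ++sp.finished;
    if (--sp.ncpu == 0)
        sp.close_split.notify_one();      // last one out wakes the master
}

// Master side: start the helpers, then wait until all of them have left.
int run_split(int helpers) {
    SplitPoint sp;
    sp.ncpu = helpers;
    std::vector<std::thread> pool;
    for (int i = 0; i < helpers; ++i)
        pool.emplace_back(helper_finished, std::ref(sp));
    {
        std::unique_lock<std::mutex> guard(sp.lock);
        // Predicate form of wait(): ncpu is re-checked after every wakeup,
        // and no wait happens at all if the helpers are already done.
        sp.close_split.wait(guard, [&sp] { return sp.ncpu == 0; });
    }
    for (auto& t : pool)
        t.join();
    return sp.finished;
}
```

Calling `run_split(4)` starts four helpers and returns once every one of them has decremented `ncpu`. With plain pthreads the same idea is written as `while (ncpu > 0) pthread_cond_wait(...)`.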
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Parallel search

Post by bob »

That's obviously not the real code. The most common causes of deadlock are

(1) having a complex piece of code where you acquire the lock and then have a way to slip out without releasing it, usually because the unlock sits behind an if or something similar;

(2) the same thing, but where you return before releasing the lock, leaving it set so that you deadlock the next time you try to acquire it;

(3) using more than one lock and failing to always acquire them in exactly the same order.
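
As an illustration of how these three causes can be made harder to hit, here is a small C++ sketch. It is an assumption-laden example rather than anything from a real engine: `Shared`, `bump_a_if_below`, `move_one` and `stress` are hypothetical names. A scope guard covers (1) and (2) because it unlocks on every exit path, and `std::lock` covers (3) because it acquires several mutexes with a deadlock-avoidance algorithm regardless of the order in which they are passed.

```cpp
#include <mutex>
#include <thread>

// Hypothetical shared state guarded by two separate locks.
struct Shared {
    std::mutex a_lock;
    std::mutex b_lock;
    int a = 0;
    int b = 0;
};

// Causes (1) and (2): the guard releases a_lock on every path,
// including the early return.
bool bump_a_if_below(Shared& s, int limit) {
    std::lock_guard<std::mutex> guard(s.a_lock);
    if (s.a >= limit)
        return false;   // a_lock is released here ...
    ++s.a;
    return true;        // ... and here
}

// Cause (3): std::lock takes both mutexes without deadlocking, even when
// two callers pass them in opposite order. The guards only adopt them.
void move_one(std::mutex& from_lock, int& from, std::mutex& to_lock, int& to) {
    std::lock(from_lock, to_lock);
    std::lock_guard<std::mutex> g1(from_lock, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to_lock, std::adopt_lock);
    --from;
    ++to;
}

// Two threads move units in opposite directions; returns a + b at the end.
int stress(int rounds) {
    Shared s;
    s.a = rounds;
    s.b = rounds;
    std::thread t1([&s, rounds] {
        for (int i = 0; i < rounds; ++i) move_one(s.a_lock, s.a, s.b_lock, s.b);
    });
    std::thread t2([&s, rounds] {
        for (int i = 0; i < rounds; ++i) move_one(s.b_lock, s.b, s.a_lock, s.a);
    });
    t1.join();
    t2.join();
    return s.a + s.b;
}
```

If `move_one` locked `from_lock` and then `to_lock` by hand, the two threads in `stress` would take the locks in opposite order and could block each other forever.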

I recently found another way, that being to have a structure with an array of counters in front of the lock, followed by the lock, and then promptly accessing that array with a subscript that is too big, stepping right in the middle of the lock.
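
That last failure mode can be guarded against with a bounds-checked subscript. The sketch below assumes a layout like the one described, counters first and the lock right behind them; `ThreadStats` and `bump_counter` are hypothetical names, and `std::array::at()` stands in for whatever check a real program would use.

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <stdexcept>

// Hypothetical layout matching the description: an array of counters
// placed directly in front of the lock.
struct ThreadStats {
    std::array<int, 4> counters{};
    std::mutex lock;
};

// Returns false on a bad subscript instead of writing past the end of
// `counters` and into `lock`.
bool bump_counter(ThreadStats& st, std::size_t index) {
    std::lock_guard<std::mutex> guard(st.lock);
    try {
        ++st.counters.at(index);   // at() throws std::out_of_range when index >= 4
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With a raw `int counters[4]` and `counters[index]`, an index of 4 or more would silently overwrite whatever follows the array, which in this layout is the mutex itself.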
BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: Parallel search

Post by BeyondCritics »

Hi Fabio,

compile your program so that you can inspect the stacks of your threads in the debugger. The next time your search seems to hang, stop it, open it in the debugger and go through the stack trace of each thread in turn to see which locks it holds and why. That should give you a picture of the problem: what caused the deadlock to occur?

Oliver
Fabio Gobbato
Posts: 217
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Re: Parallel search

Post by Fabio Gobbato »

I solved it by starting a new Code::Blocks project.
I'm sorry, but I wasn't able to understand what the problem was.
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Parallel search

Post by stegemma »

Fabio Gobbato wrote:I solved it by starting a new Code::Blocks project.
I'm sorry, but I wasn't able to understand what the problem was.
Maybe the problem will appear again... or never, if this really was the solution. It would be worthwhile for you to compare the two projects to find out what is different.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Parallel search

Post by bob »

stegemma wrote:
Fabio Gobbato wrote:I solved it by starting a new Code::Blocks project.
I'm sorry, but I wasn't able to understand what the problem was.
Maybe the problem will appear again... or never, if this really was the solution. It would be worthwhile for you to compare the two projects to find out what is different.
I would agree. You should always debug the old code just to be sure you understand what was wrong, otherwise you might well repeat the mistake.

If you run under Linux and use gcc, gdb is quite good at debugging threads. You can press Ctrl-C to stop a deadlocked execution, then type "thread 1" and "where" to see where that thread was, then type "thread 2" and "where" to see where the other thread was. On occasion one is waiting on a lock while the other is executing, indicating you might have failed to release a lock you had acquired.
Fabio Gobbato
Posts: 217
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Re: Parallel search

Post by Fabio Gobbato »

That solution fixed some errors related to compilation.
Then there were some problems with uninitialized variables, which I caught with valgrind.
Now I have solved the real problem: I had used similar names for two mutexes, so I was locking with one and unlocking with the other.
That was the problem, and because the names were so similar I couldn't tell them apart.
To debug my engine I use gdb with Code::Blocks, and it works well with threads.
I'm also getting help from helgrind.
Thanks to everybody for your patience, it was a stupid error.
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Parallel search

Post by stegemma »

Fabio Gobbato wrote:That solution fixed some errors related to compilation.
Then there were some problems with uninitialized variables, which I caught with valgrind.
Now I have solved the real problem: I had used similar names for two mutexes, so I was locking with one and unlocking with the other.
That was the problem, and because the names were so similar I couldn't tell them apart.
To debug my engine I use gdb with Code::Blocks, and it works well with threads.
I'm also getting help from helgrind.
Thanks to everybody for your patience, it was a stupid error.
As I always say in Italian, "c'è sempre un altro baco" (there's always another bug).

This kind of bug can be limited by using C++ encapsulation (even in C you can use a similar paradigm). If my object needs locking, I create a mutex that belongs to the object itself, so it is less likely that I call the wrong one. Beyond that, if you make an object that wraps the whole mutex, it will be destroyed when it goes out of scope, and you can release the mutex in its destructor. Even if C is preferred over C++ for its speed, you can still program "thinking in objects" and name everything as if it were an object. So the mutex for the moves array will be MovesArrayMutex, and so on. SMPLock could easily be confused with anything else.
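
To make the idea concrete, here is a rough C++ sketch under my own assumptions: `MovesArray` is a hypothetical class, moves are plain `int`s, and the standard `std::lock_guard` plays the part of the object that releases the mutex in its destructor. Because the mutex is a private member, no code outside the class can lock or unlock the wrong one.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical moves list that owns its own mutex.
class MovesArray {
public:
    void push(int move) {
        std::lock_guard<std::mutex> guard(mutex_);  // unlocked when guard goes out of scope
        moves_.push_back(move);
    }

    // Extracts the next move; returns false when no moves are left.
    bool extract(int& move) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (moves_.empty())
            return false;                           // early return still unlocks
        move = moves_.back();
        moves_.pop_back();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return moves_.size();
    }

private:
    mutable std::mutex mutex_;   // the "MovesArrayMutex", private to the object
    std::vector<int> moves_;
};
```

A search thread would then simply call `moves.extract(m)` in its loop, with no separate lock and unlock calls to mix up.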

I'm not teaching you anything... I recently made a similar error in a piece of business software, and it took more than 2 days to solve!