DTS and the call stack

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

jswaff

DTS and the call stack

Post by jswaff »

I'm working on a DTS implementation in Prophet and have become stuck on one point. I'm afraid it's going to be a sticky one.

According to Hyatt's DTS paper and a master's thesis from here:
http://www.valavan.net/mthesis.pdf , if a processor returning from
a parallel search is the last one, it's responsible for returning the evaluation
to the parent and to keep working from the parent.

This seems pretty easy provided the parent's owner is the last processor.
The call stack is such that he can just "return back up" and keep working.
But, if the last processor is one of the helper's, how would you handle
this since the call stack is not set up to "return back up"?

--
James
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: DTS and the call stack

Post by Pradu »

jswaff wrote:I'm working on a DTS implementation in Prophet and have become stuck on one point. I'm afraid it's going to be a sticky one.

According to Hyatt's DTS paper and a master's thesis from here:
http://www.valavan.net/mthesis.pdf , if a processor returning from
a parallel search is the last one, it's responsible for returning the evaluation to the parent and to keep working from the parent.

This seems pretty easy provided the parent's owner is the last processor.
The call stack is such that he can just "return back up" and keep working.
But, if the last processor is one of the helper's, how would you handle
this since the call stack is not set up to "return back up"?

--
James
Hi James! Glad to see I'm not the only one working on it 8-) . This is the reason you need iterative search for DTS, to pass the search stack. In YBW I believe the searching thread acts as a helper thread but only to the threads at the split-point but DTS passes on the search stack to another thread. I'm going to do it a bit different than Hyatt. Instead of copying the entire search stack to each thread when splitting, I make it pass on the pointer to the search stack to the one of the remaining searching threads. Once that searching thread T1 finishes and it has no work and other threads are searching the same split point, it transfers it's own search stack to one of the searching threads T2 and T1 acts as a helper thread. When T2 finishes it's search and backs up to the split point, it polls to see if any threads transferred its search stack to it. If so it acts like the thread that entered the split point, and it can transfer the search stack again to another thread under the same conditions. This will make splitting much cheaper than copying the entire search stack everytime.

BTW less than a month left! :?
EDIT: Prophet isn't playing in ACCA tourney??
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: DTS and the call stack

Post by bob »

jswaff wrote:I'm working on a DTS implementation in Prophet and have become stuck on one point. I'm afraid it's going to be a sticky one.

According to Hyatt's DTS paper and a master's thesis from here:
http://www.valavan.net/mthesis.pdf , if a processor returning from
a parallel search is the last one, it's responsible for returning the evaluation
to the parent and to keep working from the parent.

This seems pretty easy provided the parent's owner is the last processor.
The call stack is such that he can just "return back up" and keep working.
But, if the last processor is one of the helper's, how would you handle
this since the call stack is not set up to "return back up"?

--
James
It is a sticky issue. True DTS really needs an iterated search rather than a recursive search. Then you call search one time, and just increment a "ply"variable as you step down the tree. This way you can see everything and there is no "call stack" to deal with.

If you are willing to give up the true DTS implementation, what I did in Crafty was to have a function "ThreadWait()" where I park all idle processors. When a thread wants to become a parent and spawn siblings, it grabs them from the idle process loop by giving them work and they dive out of ThreadWait() and call SearchMP() which shares the move list and so forth. But what about the parent? I immediately call ThreadWait() from that search as well, which will then join in by calling SearchMP() just like the other threads. The only proviso is that eventually all threads will become idle at this split point, and the original "master" will be the only one that can return out of ThreadWait(). The rest remain stuck there waiting on more work.
jswaff

Re: DTS and the call stack

Post by jswaff »

Pradu wrote: This is the reason you need iterative search for DTS, to pass the search stack. In YBW I believe the searching thread acts as a helper thread but only to the threads at the split-point but DTS passes on the search stack to another thread.
I was afraid someone would say that. I'm not quite willing to rewrite my
search at this point. I've not put a lot of thought into it, but it seems an
iterative search would be... ugly. :)
Pradu wrote: I'm going to do it a bit different than Hyatt. Instead of copying the entire search stack to each thread when splitting, I make it pass on the pointer to the search stack to the one of the remaining searching threads. Once that searching thread T1 finishes and it has no work and other threads are searching the same split point, it transfers it's own search stack to one of the searching threads T2 and T1 acts as a helper thread. When T2 finishes it's search and backs up to the split point, it polls to see if any threads transferred its search stack to it. If so it acts like the thread that entered the split point, and it can transfer the search stack again to another thread under the same conditions. This will make splitting much cheaper than copying the entire search stack everytime.
Ok, you lost me a bit here. By "search stack" I thought you were talking
about the call stack, but when you talk about passing around pointers
to the search stack I guess you are referring to a data structure containing
move lists, etc. But, I don't see how you can get around doing some
copying there so that each thread can operate off it's own data, independent
of the other threads.
Pradu wrote: BTW less than a month left! :?
EDIT: Prophet isn't playing in ACCA tourney??
Unfortunately not. I'm going to a family reunion that weekend. They
really should schedule those things around my computer chess calendar. :-/
jswaff

Re: DTS and the call stack

Post by jswaff »

bob wrote: But what about the parent? I immediately call ThreadWait() from that search as well, which will then join in by calling SearchMP() just like the other threads. The only proviso is that eventually all threads will become idle at this split point, and the original "master" will be the only one that can return out of ThreadWait(). The rest remain stuck there waiting on more work.
So, do you just have the parent thread hanging out waiting on the helper
threads to finish, or do you put it to work helping out one of the helpers?

I have seen ThreadWait() (below is from Crafty 19.8). I don't quite follow
all of it, but it seems you are just calling Pause() and waiting. But, that
seems to directly contradict one of the stated goals of DTS - "...to
completely eliminate the cases where a processor was idle."

--
James

Code: Select all

/*
 ----------------------------------------------------------
|                                                          |
|   we can exit if our thread[i] is non-zero, or if we are |
|   waiting on others to finish a block that *we* have to  |
|   return through.  when the busy count on such a block   |
|   hits zero, we return immediately which unwinds the     |
|   search as it should be.                                |
|                                                          |
 ----------------------------------------------------------
*/
    while (!thread[tid] && !quit && (!waiting || waiting->nprocs)) Pause();
    if (quit) return(0);

Code: Select all

lock.h:#  define Pause() ({asm volatile ("pause");})
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: DTS and the call stack

Post by Pradu »

jswaff wrote:Ok, you lost me a bit here.
Yeah I talk gibberish at times...8-)
By "search stack" I thought you were talking about the call stack, but when you talk about passing around pointers to the search stack I guess you are referring to a data structure containing move lists, etc.
Yeah should have stated that.
But, I don't see how you can get around doing some copying there so that each thread can operate off it's own data, independent of the other threads.
True you will need some data. Each thread will have it's own search stack which continues from where the split point stack left off. The search thread needs from the previous thread are the hashkeys for the positions already searched for draw detection and the alpha and beta of the split-point for updating bounds. For the "drawTable" I will have it arranged like this:

Code: Select all

[pointer to the drawTable of the split point, hashkey0, hashkey1 ...]
And for the drawTable that goes all the way to the root you have a NULL pointer for the first element. So I guess there will be some overhead traversing this linked list, not quite sure if it is less than the overhead of copying the entire stack at the split point but I'm guessing so. For updating alpha and beta you can make these global variables (maybe in a split point structure?) and check these at various places in search. But the rest of the data you won't need, I don't think. (I haven't implemented a parallel search yet, just a parallel perft so far...so take my words with a grain of salt). But anyways if you need any other data (like the last move played for guys who do recapture extensions), I guess you can just access this data through the pointer to the stack.

So I guess the only thing you need to copy is the split-point state of the board position and some pointers to the split-point search stack. And the last thread that leaves the split-point inherits the split-point stack as it's own and everything works out.
jswaff wrote:
Pradu wrote: BTW less than a month left! :?
EDIT: Prophet isn't playing in ACCA tourney??
Unfortunately not. I'm going to a family reunion that weekend. They
really should schedule those things around my computer chess calendar. :-/
Have fun 8-)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: DTS and the call stack

Post by bob »

jswaff wrote:
bob wrote: But what about the parent? I immediately call ThreadWait() from that search as well, which will then join in by calling SearchMP() just like the other threads. The only proviso is that eventually all threads will become idle at this split point, and the original "master" will be the only one that can return out of ThreadWait(). The rest remain stuck there waiting on more work.
So, do you just have the parent thread hanging out waiting on the helper
threads to finish, or do you put it to work helping out one of the helpers?
As I mentioned, the parent jumps into ThreadWait() and then jumps right back into the search stuff. This lets the parent finish first, return to ThreadWait() and since the current split point has not been completed, the parent then can be used to help the other threads complete their sub-trees...


I have seen ThreadWait() (below is from Crafty 19.8). I don't quite follow
all of it, but it seems you are just calling Pause() and waiting. But, that
seems to directly contradict one of the stated goals of DTS - "...to
completely eliminate the cases where a processor was idle."
ThreadWait() is the point where a thread waits while waiting on another thread to assign it some work. The time spent there is almost nil...


--
James

Code: Select all

/*
 ----------------------------------------------------------
|                                                          |
|   we can exit if our thread[i] is non-zero, or if we are |
|   waiting on others to finish a block that *we* have to  |
|   return through.  when the busy count on such a block   |
|   hits zero, we return immediately which unwinds the     |
|   search as it should be.                                |
|                                                          |
 ----------------------------------------------------------
*/
    while (!thread[tid] && !quit && (!waiting || waiting->nprocs)) Pause();
    if (quit) return(0);

Code: Select all

lock.h:#  define Pause() ({asm volatile ("pause");})
jswaff

Re: DTS and the call stack

Post by jswaff »

bob wrote: As I mentioned, the parent jumps into ThreadWait() and then jumps right back into the search stuff. This lets the parent finish first, return to ThreadWait() and since the current split point has not been completed, the parent then can be used to help the other threads complete their sub-trees...
How do you know the parent will finish first? I can see why you would want it to finish first, but not how you know it will.

--
James
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: DTS and the call stack

Post by Pradu »

jswaff wrote:
bob wrote: As I mentioned, the parent jumps into ThreadWait() and then jumps right back into the search stuff. This lets the parent finish first, return to ThreadWait() and since the current split point has not been completed, the parent then can be used to help the other threads complete their sub-trees...
How do you know the parent will finish first? I can see why you would want it to finish first, but not how you know it will.

--
James
Sometimes it will sometimes it won't. It just turns into another helper thread (only to the other threads at the split point though) when the parent finishes first and has no more work at the split-point.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: DTS and the call stack

Post by bob »

jswaff wrote:
bob wrote: As I mentioned, the parent jumps into ThreadWait() and then jumps right back into the search stuff. This lets the parent finish first, return to ThreadWait() and since the current split point has not been completed, the parent then can be used to help the other threads complete their sub-trees...
How do you know the parent will finish first? I can see why you would want it to finish first, but not how you know it will.

--
James
I don't know it will finish first. But I have to at least handle that case. take the following scenario...

I start a search with one thread active, three hung in ThreadWait() waiting for work. As soon as I find a good split point by the YBW criterion, I split the current search into four threads and give them each work to do. Three of the threads are already hung in ThreadWait() waiting on work so they charge off immediately. The original thread then calls ThreadWait() which then lets it charge off as the fourth thread doing the search. Now if the first thread finishes first, it still returns to ThreadWait() where other active threads can give it work. If it finishes last, it just cleans up and goes back to the original search TREE struct which had no threads helping it.

Hard to explain, not that hard to make work, but the data structures are somewhat complicated... This way no thread has to stop and wait, they are always in ThreadWait() which instantly gets them new assignments when they re-enter that code...