Need help on YBWC algorithm

phhnguyen · Post by **phhnguyen** » Fri Sep 03, 2010 6:37 pm

Hi all,

I have started parallel chess programming, focusing on YBWC algorithm. Have just read some papers and took a look at Viper code. I understand the main idea but still miss a lot of technique details which some of I ask here:

1) When copy data for a new thread, should I copy all history moves? I am trying to store board and some information with history moves so worry much about the copy amount. If yes, do you know any technique to reduce the copy (such as copy only from the last capture?)?

2) In Viper code, I see the code of search function which I confuse much looks like below:

Code: Select all

for &#40;all moves&#41; &#123;
  ...
  if &#40;split&#40;...))
     break;
&#125;

Why break at split point? Should the master thread continue to search until running out of move?

3) From my understanding, I write the following pseudo code for the main search function, (can someone correct me please?):

Code: Select all

for &#40;all moves&#41; &#123;
    if &#40;move is illegal&#41; continue;
    makemove&#40;);
    search&#40;-beta, -alpha&#41;; // call recursive alphabeta
    undomove&#40;);

    /*
     * Until here the search made at least one legal move, so we call split 
     * function to check if there any available thread. If yes, copy data for 
     * it and then wake it up to get help
     */
    split&#40;); 
    // Not break, the main thread will continue as usual
&#125;

/*
  * At this point, the master thread ran out of moves, if there is any slave thread 
  * still working, the main thread will go to idle to wait. When slave threads 
  * complete their tasks, they will wake the master thread up. If there is no 
  * slave working, finish the search as usual
  */
if &#40;there is any working slave thread&#41;
   goToIdleState&#40;);

/*
 * Update hash table and return as usual here
 */

Any help will be highly appreciated.
Pham

bob · Post by **bob** » Sat Sep 04, 2010 3:34 am

phhnguyen wrote:Hi all,

I have started parallel chess programming, focusing on YBWC algorithm. Have just read some papers and took a look at Viper code. I understand the main idea but still miss a lot of technique details which some of I ask here:

1) When copy data for a new thread, should I copy all history moves? I am trying to store board and some information with history moves so worry much about the copy amount. If yes, do you know any technique to reduce the copy (such as copy only from the last capture?)?

Yes. Or use a global history table. But the best test would be to try just using your existing history data for a thread, and testing. And then copying parent's history data and testing. And even copying/combining parent's data with current data and testing. Copying is not free. Does the advantages of more current (copied) data offset the cost of copying that data? Testing is the only way to answer.

2) In Viper code, I see the code of search function which I confuse much looks like below:
Code: Select all
for &#40;all moves&#41; &#123;
  ...
  if &#40;split&#40;...))
     break;
&#125;
Why break at split point? Should the master thread continue to search until running out of move?

I have not looked at the code, but I'd bet, based on the above, that the "Split() function actually splits the search, completes it, and then returns. So the natural thing to do is break; since the search has not analyzed every move during the parallel split() function call. Just a guess, but probably correct.

3) From my understanding, I write the following pseudo code for the main search function, (can someone correct me please?):

Code: Select all

for &#40;all moves&#41; &#123;
    if &#40;move is illegal&#41; continue;
    makemove&#40;);
    search&#40;-beta, -alpha&#41;; // call recursive alphabeta
    undomove&#40;);

    /*
     * Until here the search made at least one legal move, so we call split 
     * function to check if there any available thread. If yes, copy data for 
     * it and then wake it up to get help
     */
    split&#40;); 
    // Not break, the main thread will continue as usual
&#125;

/*
  * At this point, the master thread ran out of moves, if there is any slave thread 
  * still working, the main thread will go to idle to wait. When slave threads 
  * complete their tasks, they will wake the master thread up. If there is no 
  * slave working, finish the search as usual
  */
if &#40;there is any working slave thread&#41;
   goToIdleState&#40;);

/*
 * Update hash table and return as usual here
 */

Any help will be highly appreciated.
Pham

There is a problem in the above. What if the main thread runs out of moves to search. All it can do is sit and wait until everyone else has finished. Not as efficient as sending all threads, including the current one, off to a different block of code to do the parallel search, so that if anyone runs out of work there, they can continue to help still-working threads...

phhnguyen · Post by **phhnguyen** » Sat Sep 04, 2010 6:02 am

bob wrote: Yes. Or use a global history table. But the best test would be to try just using your existing history data for a thread, and testing. And then copying parent's history data and testing. And even copying/combining parent's data with current data and testing. Copying is not free. Does the advantages of more current (copied) data offset the cost of copying that data? Testing is the only way to answer.

It seems that you are using a pointer system for storing history moves, which has some advantages over my implementation of using a simple array.

But there is a point I am not clear: why do you still need to copy (at least not too much)? Is it enough to just allocate memory and set pointers for slave threads?

There is a problem in the above. What if the main thread runs out of moves to search. All it can do is sit and wait until everyone else has finished. Not as efficient as sending all threads, including the current one, off to a different block of code to do the parallel search, so that if anyone runs out of work there, they can continue to help still-working threads...

I have planed to implement the weak form of YBWC first. After that I will improve the function goToIdleState() to return that master thread to available thread pool to help others.

If you find no error in the my pseudo code, I am going to implement it now. Thank you for your helps.

bob · Post by **bob** » Sat Sep 04, 2010 7:10 am

phhnguyen wrote:
bob wrote: Yes. Or use a global history table. But the best test would be to try just using your existing history data for a thread, and testing. And then copying parent's history data and testing. And even copying/combining parent's data with current data and testing. Copying is not free. Does the advantages of more current (copied) data offset the cost of copying that data? Testing is the only way to answer.
It seems that you are using a pointer system for storing history moves, which has some advantages over my implementation of using a simple array.

But there is a point I am not clear: why do you still need to copy (at least not too much)? Is it enough to just allocate memory and set pointers for slave threads?

Depends on your approach. There is an argument for global history counters (one set that everybody modifies and uses. There is an argument for local history for each thread. If you use a local history, you might want to "seed" the local history by copying from the parent's history when you start a search. Or you might not. Difficult to say which is best.

There is a problem in the above. What if the main thread runs out of moves to search. All it can do is sit and wait until everyone else has finished. Not as efficient as sending all threads, including the current one, off to a different block of code to do the parallel search, so that if anyone runs out of work there, they can continue to help still-working threads...
I have planed to implement the weak form of YBWC first. After that I will improve the function goToIdleState() to return that master thread to available thread pool to help others.

If you find no error in the my pseudo code, I am going to implement it now. Thank you for your helps.

It leaves out a lot of details, but looks OK from an abstract point of view. Probably the biggest issue will be to learn how to use the "volatile" keyword correctly in data declarations.

phhnguyen · Post by **phhnguyen** » Sat Sep 04, 2010 7:00 pm

bob wrote: Depends on your approach. There is an argument for global history counters (one set that everybody modifies and uses. There is an argument for local history for each thread. If you use a local history, you might want to "seed" the local history by copying from the parent's history when you start a search. Or you might not. Difficult to say which is best.

OK, I have seen one reason to have a large copy anyway even I change to use global history moves and pointer system. If I implement the strong form of YBWC and when a master thread is going to help others, it has to copy to save all history moves from ply 0 to current working one. Am I correct?

It leaves out a lot of details, but looks OK from an abstract point of view. Probably the biggest issue will be to learn how to use the "volatile" keyword correctly in data declarations.

Thanks, I will learn

I have a lot of problems of implementing details. Allow me to ask one by one from simplest ones.

At this moment I have been considering to implement two ways to call a slave thread to help:

1) Master thread checks legal and do the move first, then call slave help from that move. Advantages: make sure slave threads always have "real" tasks to do. Disadvantage: Master thread has to work harder when some threads are idle
2) Master thread call slave thread help immediately after the first legal move. Slave thread has to check legal and do the move itself. Advantage: slave threads go to help earlier. Disadvantage: master and slave threads may look each other more frequently; after all, some slave threads may have nothing (legal move) to do

Can someone give me some suggestions on above problems? Many thanks.

bob · Post by **bob** » Sat Sep 04, 2010 9:12 pm

phhnguyen wrote:
bob wrote: Depends on your approach. There is an argument for global history counters (one set that everybody modifies and uses. There is an argument for local history for each thread. If you use a local history, you might want to "seed" the local history by copying from the parent's history when you start a search. Or you might not. Difficult to say which is best.
OK, I have seen one reason to have a large copy anyway even I change to use global history moves and pointer system. If I implement the strong form of YBWC and when a master thread is going to help others, it has to copy to save all history moves from ply 0 to current working one. Am I correct?

"has to"?? No. "should"?? probably. That's the point here. Don't make any assumptions, test. Don't take what others do as "best practice". Test. You never know what works best in a specific program until you test it.

It will take a good while to get a working parallel search unless you do like some and just copy/paste what is done in some open-source program or other. First you want correctness. Only then do you want to worry about performance. Of course, as you design, you want to avoid any issues that will create significant issues later, with respect to performance. But first make it correct, then make it fast. Hopefully it will remain correct.

It leaves out a lot of details, but looks OK from an abstract point of view. Probably the biggest issue will be to learn how to use the "volatile" keyword correctly in data declarations.
Thanks, I will learn

I have a lot of problems of implementing details. Allow me to ask one by one from simplest ones.

At this moment I have been considering to implement two ways to call a slave thread to help:

1) Master thread checks legal and do the move first, then call slave help from that move. Advantages: make sure slave threads always have "real" tasks to do. Disadvantage: Master thread has to work harder when some threads are idle

This is an implementation detail. YBW already means you need to search at least one move before starting a parallel search. Yes, from the instant where a thread goes idle, to the instant that it starts helping, ought to be as short as possible.

2) Master thread call slave thread help immediately after the first legal move. Slave thread has to check legal and do the move itself. Advantage: slave threads go to help earlier. Disadvantage: master and slave threads may look each other more frequently; after all, some slave threads may have nothing (legal move) to do

Can someone give me some suggestions on above problems? Many thanks.

In Crafty, my move generator just spits out a stream of moves. Whether it does it for one process or for several is just a matter of proper locking so that two different threads don't get the same move to search...

zamar · Post by **zamar** » Sat Sep 04, 2010 9:23 pm

bob wrote:
2) In Viper code, I see the code of search function which I confuse much looks like below:
Code: Select all
for &#40;all moves&#41; &#123;
  ...
  if &#40;split&#40;...))
     break;
&#125;
Why break at split point? Should the master thread continue to search until running out of move?
I have not looked at the code, but I'd bet, based on the above, that the "Split() function actually splits the search, completes it, and then returns. So the natural thing to do is break; since the search has not analyzed every move during the parallel split() function call. Just a guess, but probably correct.

I can confirm that your guess is correct.

phhnguyen · Post by **phhnguyen** » Tue Sep 14, 2010 8:30 am

I have been designing my YBWC algorithm and have another technical question which needed some advices:

When a slave has found a cut-off (value>beta - the search is completed) I don't know what to do with master and other slave threads. First of all, I think I will simply stop them. However, I see problem that I can't stop master because it needs to work to return node's score to the parent - the master needs to abort searching until to a correct node then continue working. Furthermore, if some of those threads are working as slaves for other nodes I can't stop them either.

I have imagined a very complicated system (using some structures and pointers) to manage them but afraid of over designing. Can someone share me experience about that?

Many thanks,

bob · Post by **bob** » Tue Sep 14, 2010 4:13 pm

phhnguyen wrote:I have been designing my YBWC algorithm and have another technical question which needed some advices:

When a slave has found a cut-off (value>beta - the search is completed) I don't know what to do with master and other slave threads. First of all, I think I will simply stop them. However, I see problem that I can't stop master because it needs to work to return node's score to the parent - the master needs to abort searching until to a correct node then continue working. Furthermore, if some of those threads are working as slaves for other nodes I can't stop them either.

I have imagined a very complicated system (using some structures and pointers) to manage them but afraid of over designing. Can someone share me experience about that?

Many thanks,

What do you do if the master runs out of work before any of the slaves. Does it sit and wait?

In Crafty, I hang all "slave" (if you want to call them that) processes in a function called "ThreadWait()" where they spin until their global "pointer" becomes non-null. When I split, I copy the necessary data from the current thread to a split block to be used by each idle thread, and I also make a new copy for the master (the current process). The master then stuffs the addresses of those split blocks into the global pointers the other threads are looping on in ThreadWait() which breaks that "while (!*mypointer);" loop. Then the master calls ThreadWait() itself and uses the pointer it has already set. If the master runs out of work first, it then can sit in that same busy loop waiting on someone else to give it some work. And there is a special condition that if there is nobody else waiting at this split point, while the master is, then the master can now return and back things up correctly.

If you do that, then yes, you can stop all the threads (stop as in tell them to quit what they are doing and return back to ThreadWait() to wait for more work)...

Complicated, but it works well.

phhnguyen · Post by **phhnguyen** » Tue Sep 14, 2010 5:06 pm

Thanks Bob, getting your points. To make clear for me, I would like to ask few questions:

Suppose a split block A is ran by master thread 1, then thread 1 runs out of work and goes to help and currently works in other block, say B. Now, a slave (thread 2) of block A completes the search (found a cut off):

- Will we stop the thread 1 (running block B)? Will we wait thread 1 to complete block B, then be back to work with A?

- Or can we use any available thread to work on A as the master? If yes, can we use that slave (thread 2 - who found cut-off) to return the result to parent (play as the master)?

Need help on YBWC algorithm

Need help on YBWC algorithm

Re: Need help on YBWC algorithm

Re: Need help on YBWC algorithm

Re: Need help on YBWC algorithm

Re: Need help on YBWC algorithm

Re: Need help on YBWC algorithm

Re: Need help on YBWC algorithm

Another question

Re: Another question

Re: Another question