Inside Zeta 097x, the MC-AlphaBeta Search

smatovic · Post by **smatovic** » Tue Dec 04, 2012 11:28 pm

I want to introduce the implemented search algorithm of Zeta 097x,
afaik only Zeta and Rocinante by Antonio Torrecillas uses an MC-AB Search for Chess.

Normal Chess Engines use an parallel AlphaBeta-Depth-First Search like YBWC or DTS for computation.

I tried to implement such a framework on the GPU, but i was not able to implement an efficient load balancer to distribute and handle the work across the processes. Remember that GPUs run thousands of concurrent threads.

After some tests i decided to use an Monte-Carlo-AlphaBeta search with UCT Parameters. It is similar to the the Best-First-Minimax-Search proposed by Korf and Chickering.

MC-AB can be divided into four strategic phases, repeated as long as there is time left:

In the Selection phase the tree is traversed from the root-node until it selects a leaf-node that is not added to the tree yet
The Expansion strategy adds the leaf node to the tree
The Playout phase performs an Alpha-Beta-Quiscence search where only the captures are played.
The Backpropagation strategy propagates the result from the playout phase through the tree.

I have tested several select formulas for the MC-AB selection phase:

score = child.score
score = child.score - (child.score/100)*rand()%33
score = child.score - (child.score/100)*rand()%33 + (parent.visits / child.visits)

The first one searches deep, but misses often important positions,
the second one adds some randomized parameter to utilize better the threads of the GPU. And the third one adds some simple UCT parameters to make the search broader.

Because the game-tree is kept in memory he can be prepared and reused for the next turn.

My current test-games show that MC-AB with UCT has to search more than 10 times as much nodes as an AB-Depth-First-Search to achieve an similar game play.

--
Srdja

Edsel Apostol · Post by **Edsel Apostol** » Sat Dec 08, 2012 2:10 am

Hi Srdja,

I'm interested in making our engine work with GPUs also, because I think that it would be the future of computing. What do you think is the main implementation difference between a standard chess engine and the one working with GPU (hash tables, pawn hash tables, eval cache, search algorithms, etc)? How much work will be needed to implement such from a working standard chess engine?

smatovic · Post by **smatovic** » Sat Dec 08, 2012 9:30 am

Well,
the main differences between a GPU and CPU:

1) GPUs run thousands or ten-thousands of threads.
2) GPUs consists of tens or hundreds of SIMD Units.
3) GPUs have little register size.
4) GPUs are limited in programming.

Be sure,
you will not only have to rewrite your code, but rethink how your code is structured,
because of the SIMD-Nature.....16 to 32 threads are executed in one Warp or Wavefront,
means all these threads execute every branch or loop.

Register Size per Thread really hurts,
you will have to deal with about 512 Bytes per thread,
means there is no place to hold an complete move-list,
and global Memory (RAM), slows things significant down.

The current GPUs do not support recursion (except the upcoming K20 of Nvidia),
so you will have to write things in an iterative way.

Like Vincent made clear in his posts an SMP Search on GPUs has to be layered because of the amount of threads, at least if you wish to use an classic parallel AlphaBeta Search.

http://talkchess.com/forum/viewtopic.php?t=46277

I tried to implement different Search Algorithms, like Nagging, YBW, RBFS,
but MC-AB is the best i can come up with.

If you are going to develop for one GPU-Architecture only things are getting more relaxed...

Another thingy are APUs, CPU with an GPU on die,
no clue how fast they are and if they could be used as an coprocessor...

Zeta is GPL btw

https://github.com/smatovic/Zeta

--
Srdja

diep · Post by **diep** » Sat Dec 08, 2012 2:54 pm

smatovic wrote:I want to introduce the implemented search algorithm of Zeta 097x,
afaik only Zeta and Rocinante by Antonio Torrecillas uses an MC-AB Search for Chess.

Normal Chess Engines use an parallel AlphaBeta-Depth-First Search like YBWC or DTS for computation.

I tried to implement such a framework on the GPU, but i was not able to implement an efficient load balancer to distribute and handle the work across the processes. Remember that GPUs run thousands of concurrent threads.

After some tests i decided to use an Monte-Carlo-AlphaBeta search with UCT Parameters. It is similar to the the Best-First-Minimax-Search proposed by Korf and Chickering.

MC-AB can be divided into four strategic phases, repeated as long as there is time left:

In the Selection phase the tree is traversed from the root-node until it selects a leaf-node that is not added to the tree yet
The Expansion strategy adds the leaf node to the tree
The Playout phase performs an Alpha-Beta-Quiscence search where only the captures are played.
The Backpropagation strategy propagates the result from the playout phase through the tree.

I have tested several select formulas for the MC-AB selection phase:

score = child.score
score = child.score - (child.score/100)*rand()%33
score = child.score - (child.score/100)*rand()%33 + (parent.visits / child.visits)

The first one searches deep, but misses often important positions,
the second one adds some randomized parameter to utilize better the threads of the GPU. And the third one adds some simple UCT parameters to make the search broader.

Because the game-tree is kept in memory he can be prepared and reused for the next turn.

My current test-games show that MC-AB with UCT has to search more than 10 times as much nodes as an AB-Depth-First-Search to achieve an similar game play.

--
Srdja

Look my national rating is slowly rising. There is always a 6 months to a year delay until games get counted into the elorating. Means that i'm rising towards 2400 now national. I saw temporarly i'm at 2360 now.

FIDE runs hopelessly behind of course as one has too few games that count for FIDE over here.

In short i estimate my strength at 2360+ right now.

That means i can play moves without randomness.

We know that brute force search is going to beat me.

How is a random scoring search going to beat me?

Sure, a random search plays stronger than just selecting a random move. Yet there is a limit that randomness gives expressed in professor elo.

Todays chessprograms do not have that limit.

Why introduce this limit in elo to your program?

smatovic · Post by **smatovic** » Sun Dec 09, 2012 12:15 pm

Why introduce this limit in elo to your program?

Otherwise the search would not be able to utilize 4096 threads....

--
Srdja

diep · Post by **diep** » Sun Dec 09, 2012 12:45 pm

smatovic wrote:
Why introduce this limit in elo to your program?
Otherwise the search would not be able to utilize 4096 threads....

--
Srdja

You're using the gpu to make a limited strength chessprogram?

smatovic · Post by **smatovic** » Sun Dec 09, 2012 12:55 pm

]You're using the gpu to make a limited strength chessprogram?

Either this or,
i try to find alternatives to AB-DepthFirst-Search that scale with more than thousands of threads...

--
Srdja

diep · Post by **diep** » Sun Dec 09, 2012 2:40 pm

smatovic wrote:
]You're using the gpu to make a limited strength chessprogram?
Either this or,
i try to find alternatives to AB-DepthFirst-Search that scale with more than thousands of threads...

--
Srdja

Well i wrote down my consideration i had in 2007, after researching for months a solution how to get a SMP chessprogram to work at the GPU.

The gpu's can be very fast, they just have 1 disadvantage. You *need* a good speedup out of the layers of SMP search and otherwise the 10x higher nps that gpu can give you is not going to translate into a bigger plydepth...

Inside Zeta 097x, the MC-AlphaBeta Search

Inside Zeta 097x, the MC-AlphaBeta Search

Re: Inside Zeta 097x, the MC-AlphaBeta Search

Re: GPU vs CPU

Re: Inside Zeta 097x, the MC-AlphaBeta Search

Re: Inside Zeta 097x, the MC-AlphaBeta Search

Re: Inside Zeta 097x, the MC-AlphaBeta Search

Re: Inside Zeta 097x, the MC-AlphaBeta Search

Re: Inside Zeta 097x, the MC-AlphaBeta Search