I want to introduce the implemented search algorithm of Zeta 097x,
afaik only Zeta and Rocinante by Antonio Torrecillas uses an MC-AB Search for Chess.
Normal Chess Engines use an parallel AlphaBeta-Depth-First Search like YBWC or DTS for computation.
I tried to implement such a framework on the GPU, but i was not able to implement an efficient load balancer to distribute and handle the work across the processes. Remember that GPUs run thousands of concurrent threads.
After some tests i decided to use an Monte-Carlo-AlphaBeta search with UCT Parameters. It is similar to the the Best-First-Minimax-Search proposed by Korf and Chickering.
MC-AB can be divided into four strategic phases, repeated as long as there is time left:
In the Selection phase the tree is traversed from the root-node until it selects a leaf-node that is not added to the tree yet
The Expansion strategy adds the leaf node to the tree
The Playout phase performs an Alpha-Beta-Quiscence search where only the captures are played.
The Backpropagation strategy propagates the result from the playout phase through the tree.
I have tested several select formulas for the MC-AB selection phase:
score = child.score
score = child.score - (child.score/100)*rand()%33
score = child.score - (child.score/100)*rand()%33 + (parent.visits / child.visits)
The first one searches deep, but misses often important positions,
the second one adds some randomized parameter to utilize better the threads of the GPU. And the third one adds some simple UCT parameters to make the search broader.
Because the game-tree is kept in memory he can be prepared and reused for the next turn.
My current test-games show that MC-AB with UCT has to search more than 10 times as much nodes as an AB-Depth-First-Search to achieve an similar game play.
--
Srdja
Inside Zeta 097x, the MC-AlphaBeta Search
Moderator: Ras
-
smatovic
- Posts: 3503
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
-
Edsel Apostol
- Posts: 803
- Joined: Mon Jul 17, 2006 5:53 am
- Full name: Edsel Apostol
Re: Inside Zeta 097x, the MC-AlphaBeta Search
Hi Srdja,
I'm interested in making our engine work with GPUs also, because I think that it would be the future of computing. What do you think is the main implementation difference between a standard chess engine and the one working with GPU (hash tables, pawn hash tables, eval cache, search algorithms, etc)? How much work will be needed to implement such from a working standard chess engine?
I'm interested in making our engine work with GPUs also, because I think that it would be the future of computing. What do you think is the main implementation difference between a standard chess engine and the one working with GPU (hash tables, pawn hash tables, eval cache, search algorithms, etc)? How much work will be needed to implement such from a working standard chess engine?
-
smatovic
- Posts: 3503
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: GPU vs CPU
Well,
the main differences between a GPU and CPU:
1) GPUs run thousands or ten-thousands of threads.
2) GPUs consists of tens or hundreds of SIMD Units.
3) GPUs have little register size.
4) GPUs are limited in programming.
Be sure,
you will not only have to rewrite your code, but rethink how your code is structured,
because of the SIMD-Nature.....16 to 32 threads are executed in one Warp or Wavefront,
means all these threads execute every branch or loop.
Register Size per Thread really hurts,
you will have to deal with about 512 Bytes per thread,
means there is no place to hold an complete move-list,
and global Memory (RAM), slows things significant down.
The current GPUs do not support recursion (except the upcoming K20 of Nvidia),
so you will have to write things in an iterative way.
Like Vincent made clear in his posts an SMP Search on GPUs has to be layered because of the amount of threads, at least if you wish to use an classic parallel AlphaBeta Search.
http://talkchess.com/forum/viewtopic.php?t=46277
I tried to implement different Search Algorithms, like Nagging, YBW, RBFS,
but MC-AB is the best i can come up with.
If you are going to develop for one GPU-Architecture only things are getting more relaxed...
Another thingy are APUs, CPU with an GPU on die,
no clue how fast they are and if they could be used as an coprocessor...
Zeta is GPL btw
https://github.com/smatovic/Zeta
--
Srdja
the main differences between a GPU and CPU:
1) GPUs run thousands or ten-thousands of threads.
2) GPUs consists of tens or hundreds of SIMD Units.
3) GPUs have little register size.
4) GPUs are limited in programming.
Be sure,
you will not only have to rewrite your code, but rethink how your code is structured,
because of the SIMD-Nature.....16 to 32 threads are executed in one Warp or Wavefront,
means all these threads execute every branch or loop.
Register Size per Thread really hurts,
you will have to deal with about 512 Bytes per thread,
means there is no place to hold an complete move-list,
and global Memory (RAM), slows things significant down.
The current GPUs do not support recursion (except the upcoming K20 of Nvidia),
so you will have to write things in an iterative way.
Like Vincent made clear in his posts an SMP Search on GPUs has to be layered because of the amount of threads, at least if you wish to use an classic parallel AlphaBeta Search.
http://talkchess.com/forum/viewtopic.php?t=46277
I tried to implement different Search Algorithms, like Nagging, YBW, RBFS,
but MC-AB is the best i can come up with.
If you are going to develop for one GPU-Architecture only things are getting more relaxed...
Another thingy are APUs, CPU with an GPU on die,
no clue how fast they are and if they could be used as an coprocessor...
Zeta is GPL btw
https://github.com/smatovic/Zeta
--
Srdja
-
diep
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: Inside Zeta 097x, the MC-AlphaBeta Search
Look my national rating is slowly rising. There is always a 6 months to a year delay until games get counted into the elorating. Means that i'm rising towards 2400 now national. I saw temporarly i'm at 2360 now.smatovic wrote:I want to introduce the implemented search algorithm of Zeta 097x,
afaik only Zeta and Rocinante by Antonio Torrecillas uses an MC-AB Search for Chess.
Normal Chess Engines use an parallel AlphaBeta-Depth-First Search like YBWC or DTS for computation.
I tried to implement such a framework on the GPU, but i was not able to implement an efficient load balancer to distribute and handle the work across the processes. Remember that GPUs run thousands of concurrent threads.
After some tests i decided to use an Monte-Carlo-AlphaBeta search with UCT Parameters. It is similar to the the Best-First-Minimax-Search proposed by Korf and Chickering.
MC-AB can be divided into four strategic phases, repeated as long as there is time left:
In the Selection phase the tree is traversed from the root-node until it selects a leaf-node that is not added to the tree yet
The Expansion strategy adds the leaf node to the tree
The Playout phase performs an Alpha-Beta-Quiscence search where only the captures are played.
The Backpropagation strategy propagates the result from the playout phase through the tree.
I have tested several select formulas for the MC-AB selection phase:
score = child.score
score = child.score - (child.score/100)*rand()%33
score = child.score - (child.score/100)*rand()%33 + (parent.visits / child.visits)
The first one searches deep, but misses often important positions,
the second one adds some randomized parameter to utilize better the threads of the GPU. And the third one adds some simple UCT parameters to make the search broader.
Because the game-tree is kept in memory he can be prepared and reused for the next turn.
My current test-games show that MC-AB with UCT has to search more than 10 times as much nodes as an AB-Depth-First-Search to achieve an similar game play.
--
Srdja
FIDE runs hopelessly behind of course as one has too few games that count for FIDE over here.
In short i estimate my strength at 2360+ right now.
That means i can play moves without randomness.
We know that brute force search is going to beat me.
How is a random scoring search going to beat me?
Sure, a random search plays stronger than just selecting a random move. Yet there is a limit that randomness gives expressed in professor elo.
Todays chessprograms do not have that limit.
Why introduce this limit in elo to your program?
-
smatovic
- Posts: 3503
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Inside Zeta 097x, the MC-AlphaBeta Search
Otherwise the search would not be able to utilize 4096 threads....Why introduce this limit in elo to your program?
--
Srdja
-
diep
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: Inside Zeta 097x, the MC-AlphaBeta Search
You're using the gpu to make a limited strength chessprogram?smatovic wrote:Otherwise the search would not be able to utilize 4096 threads....Why introduce this limit in elo to your program?
--
Srdja
-
smatovic
- Posts: 3503
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: Inside Zeta 097x, the MC-AlphaBeta Search
Either this or,]You're using the gpu to make a limited strength chessprogram?
i try to find alternatives to AB-DepthFirst-Search that scale with more than thousands of threads...
--
Srdja
-
diep
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: Inside Zeta 097x, the MC-AlphaBeta Search
Well i wrote down my consideration i had in 2007, after researching for months a solution how to get a SMP chessprogram to work at the GPU.smatovic wrote:Either this or,]You're using the gpu to make a limited strength chessprogram?
i try to find alternatives to AB-DepthFirst-Search that scale with more than thousands of threads...
--
Srdja
The gpu's can be very fast, they just have 1 disadvantage. You *need* a good speedup out of the layers of SMP search and otherwise the 10x higher nps that gpu can give you is not going to translate into a bigger plydepth...