SPPS - a Simple Parallel Processing Scheme
Posted: Sat Jun 18, 2011 5:41 pm
I work currently on a GPU implementation of a Chess Engine and want to introduce a parallel processing scheme i use.
Initial situation:
- tenthousands of parallel threads
- no recursion
- no built-in communication between threads
- little registers amount
- sufficient RAM
I assume a max amount of 128 (maybe 256 in future) possible childs for each node.
So i am able to run 128 threads on one movelist of a chess position in parallel.
Because GPUs have little registers amount (private/local memory) all the movelists have to be stored in GPU-RAM (global memory).
With an array of "search depth * X * 128 * 128 * sizeof(Move)" the gpu is able to run x*128 threads for x chess positions in parallel.
Pseudo Code for 128 threads:
Initial situation:
- tenthousands of parallel threads
- no recursion
- no built-in communication between threads
- little registers amount
- sufficient RAM
I assume a max amount of 128 (maybe 256 in future) possible childs for each node.
So i am able to run 128 threads on one movelist of a chess position in parallel.
Because GPUs have little registers amount (private/local memory) all the movelists have to be stored in GPU-RAM (global memory).
With an array of "search depth * X * 128 * 128 * sizeof(Move)" the gpu is able to run x*128 threads for x chess positions in parallel.
Pseudo Code for 128 threads:
Code: Select all
// pid => Process ID, 0-127
// for each search depth
while (sd >= 0)
// for each board and a fix search depth
while (done[sd] < 128 && sd < max_depth) {
// every of the 128 parallel process uses the same initial board
board = globalboards[(sd*128)+done[sd]];
// until end of boards
if (!board)
break;
// every of the 128 parallel process uses an individual move
move = globalmoves[(sd*128*128)+((done[sd])*128)+pid];
// only if our process id gets a move
if (!move)
continue;
// domove
// move up in search tree
sd++;
// copy board to global
globalboards[(sd*128)+pid] = board;
// generate legal moves
moves = genMoves(board);
// and store in global
globalmoves[(sd*128*128) + (pid*128)+i] = move.i;
}
// move down in tree
done[sd] = 0;
sd--;
}