I would like to know a little more about your engine. Some time ago I corresponded with a company that was very interested in GPUs and chess and, at that time, was willing to allow the use of some high-end equipment, all GPU-based. Back then their interest was in generating 7-piece EGTBs. If I understand more of what you are doing, I can always touch base with them and see if their position is still available.
BTW, are you interested in coding an EGTB generator? Just curious.
Very nice to read that you have managed to create a chess-playing engine on a GPU!
smatovic wrote: With its current "SPPS" search, which is comparable to a Negamax without AlphaBeta pruning, Zeta achieves ~500,000 nodes per second with 128 threads.
No Quiescence Search, no Castling/En Passant moves. Very simple Eval.
In your blog you wrote:
As I thought, the SPPS search with 128 parallel threads has problems when the engine enters the Q-Search. In Q-Search only capture moves are considered, and the fewer moves there are, the more power I am losing with SPPS.
What is the reason for this? Could you explain how your "SPPS" scheme is related to Q-Search? Since you do not want to split a QS node over different threads there must be some other reason which I do not understand yet.
With SPPS (a simple parallel processing scheme) I use 128 threads in parallel to process one board position with a maximum of 128 children (my personal assumption). So the next iteration has up to 128*128 children, because each of these 128 threads generates at most 128 children. This means the average occupancy of the 128 threads depends on the number of children per position, i.e. if we get an average of 32 children per node during a chess game, then I will "lose" 128-32=96 idle threads with SPPS, but I win a SIMD-friendly process with no communication overhead.
Naturally, Q-Search has fewer moves (children), so I get more idle threads...
...I am going to implement AlphaBeta pruning with move ordering next; maybe this will give enough of a boost to handle Q-Search.
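To make the occupancy argument above concrete, here is a rough sketch of the arithmetic in Python. It is not Zeta's code: the function name `spps_occupancy` is made up for illustration, and the figure of 4 capture moves for a Q-Search node is only an assumed example value. The 128 threads and the average of 32 children come from the post.

```python
THREADS_PER_NODE = 128  # from the post: 128 threads work on one position


def spps_occupancy(children, threads=THREADS_PER_NODE):
    """Return (busy, idle, utilization) for one position.

    Assumes one thread handles one child, so any thread beyond the
    number of children has nothing to do (hypothetical helper).
    """
    busy = min(children, threads)
    idle = threads - busy
    return busy, idle, busy / threads


# 32 children per node is the example from the post;
# 4 captures per Q-Search node is an assumed value, not a measurement.
for label, children in [("full-width node", 32), ("Q-Search node (assumed)", 4)]:
    busy, idle, util = spps_occupancy(children)
    print(f"{label}: {busy} busy, {idle} idle, {util:.0%} utilization")
```

With 32 children this gives the 96 idle threads mentioned above (25% utilization), and with only a handful of captures almost all threads sit idle, which is why Q-Search hurts this scheme so much.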