GPUs better for chess than CPUs?

parrish · Post by **parrish** » Mon Jun 06, 2011 9:28 pm

hgm wrote:
parrish wrote:A search tree is, by it's nature, parallel, and not serial.
Not if you want to employ alpha-beta pruning. If you would search all branches from a certain node in parallel, and one of them would give you a beta cutoff, you can no longer prune all the other branches. Because you had already searched them in parallel. Which was a complete waste of time. So naively parallelizing the tree search will waste about 99.999999% of the CPU power. You will need an awful lot of parallelism to make up for that...

As to GPUs: I am not really an expert on them, but I thought GPUs were not so much parallel processors, but rathere (SIMD) array processors. Or am I completely wrong on that?

The entire concept of alpha-beta pruning was designed in the era of slow serial processing cpu's. With today's enormous, much faster, AND CHEAP parallel processing power available, alternative methods to alpha-beta pruning should be considered.

parrish · Post by **parrish** » Mon Jun 06, 2011 9:35 pm

rbarreira wrote:
marcelk wrote:
rbarreira wrote:You have to find a way to do alpha-beta search efficiently with lots of threads.

You have to shoehorn move generation / move making / evaluation into GPU code (it's not even close to a natural programming paradigm for those things... for one thing, branching and recursion aren't present).

A pipe dream basically.
My prediction is that someone new to computer chess who doesn't
yet know it is extremely hard will find a method anyway, and after
that everyone else will copy and beat him.
Why would that happen with computer chess specifically? Or do you mean that you think GPUs are universally good for all problems?

It could happen, but the odds are against it because no one thinks that GPUs are good for general problem solving. GPUs are good at applying a small set of simple operations to a large number of independent data units. That doesn't describe a tree search in any way...

"A small set of simple operations to a large number of independent data units" is precisely what a static eval is.

Daniel Shawul · Post by **Daniel Shawul** » Mon Jun 06, 2011 9:36 pm

Well that is only part of the problem. Copying data to/from gpu is very slow like in the case of cluster computing. I would guess a plain monte carlo approach is the best candidate here but. If you add trees & branches you have to __synchthread() somewhere which makes things very slow..
YMMV

wgarvin · Post by **wgarvin** » Mon Jun 06, 2011 10:15 pm

parrish wrote: "A small set of simple operations to a large number of independent data units" is precisely what a static eval is.

Not really. Static eval involves a large number of different, small calculations. What GPUs excel at is doing the same calculations across a lot of different data inputs (i.e. wide SIMD).

As others have noted, its pretty hard to adapt alpha-beta tree searching to the GPU computation model in any efficient way.

marcelk · Post by **marcelk** » Mon Jun 06, 2011 11:22 pm

rbarreira wrote:
marcelk wrote:
rbarreira wrote:You have to find a way to do alpha-beta search efficiently with lots of threads.

You have to shoehorn move generation / move making / evaluation into GPU code (it's not even close to a natural programming paradigm for those things... for one thing, branching and recursion aren't present).

A pipe dream basically.
My prediction is that someone new to computer chess who doesn't
yet know it is extremely hard will find a method anyway, and after
that everyone else will copy and beat him.
Why would that happen with computer chess specifically? Or do you mean that you think GPUs are universally good for all problems?

It could happen, but the odds are against it because no one thinks that GPUs are good for general problem solving. GPUs are good at applying a small set of simple operations to a large number of independent data units. That doesn't describe a tree search in any way...

The reason I think that is because I get or see the question asked nearly every month or so. Some of those askers will try it. One might succeed.
Reasoning why it will never work works against us old school programmers. We all think mostly in the same patterns.

Dann Corbit · Post by **Dann Corbit** » Mon Jun 06, 2011 11:50 pm

Seems to me that a workable model might be to do all the work on the GPU board and send nothing back except the answers (move and score). The only thing sent to the GPU board would be the initialization stuff and then the opponent's move for each subsequent ply.

The GPU board answers back with the move and score so the I/O becomes tiny after the setup process.

If the GPUs are highly integrated enough (IOW, if we have literally *thousands* of GPUs) we can probably work around the recursion problem by banked iteration and the communication cost will be reduced to the answer set only.

But I have not tried it, so I have no idea if the idea is any good.

hgm · Post by **hgm** » Tue Jun 07, 2011 6:40 am

parrish wrote:The entire concept of alpha-beta pruning was designed in the era of slow serial processing cpu's. With today's enormous, much faster, AND CHEAP parallel processing power available, alternative methods to alpha-beta pruning should be considered.

Well, alpha-beta pruning is nothing else than not wasting time on branchess that will not affect the score. Nothing more, nothing less. So _any_ alternative will waste time by searching branchess that do not affect the score. Unfortunately, the search tree of two-player zero-sum games with alternating turns is such that almost all branches of the tree are not affecting the score. The score-affecting branches contain only about the square root of the number of nodes of the full tree.

Dann Corbit · Post by **Dann Corbit** » Tue Jun 07, 2011 8:06 am

hgm wrote:
parrish wrote:The entire concept of alpha-beta pruning was designed in the era of slow serial processing cpu's. With today's enormous, much faster, AND CHEAP parallel processing power available, alternative methods to alpha-beta pruning should be considered.
Well, alpha-beta pruning is nothing else than not wasting time on branchess that will not affect the score. Nothing more, nothing less. So _any_ alternative will waste time by searching branchess that do not affect the score. Unfortunately, the search tree of two-player zero-sum games with alternating turns is such that almost all branches of the tree are not affecting the score. The score-affecting branches contain only about the square root of the number of nodes of the full tree.

What about a zero width search of all branches, and then strengthen the strong branches by widening? (you could also kill branches once mate is found). More promising branches would be extended and more toxic branches would be reduced.

If you have enough CPUs it might be possible to get a good effect.

smatovic · Post by **smatovic** » Tue Jun 07, 2011 6:13 pm

The point is to feed in a parallel-alpha-beta search thousands of threads with boards considering the known GPU-limitations (no recursion, no sub-threads).

"Holger Hopp" describes in his german Diplom Thesis a way to implement YBCW for an synthetic game tree on an Mas-Par SIMD computer. But the Mas-Par has an built-in communication bus, GPUs not.

In OpenCL/Cuda a thread is not able to "call" another thread, so it is more difficult to implement a Master-Slave relationship between threads for parallel alpha-beta.

--
Srdja

smatovic · Post by **smatovic** » Tue Jun 07, 2011 6:49 pm

The entire concept of alpha-beta pruning was designed in the era of slow serial processing cpu's. With today's enormous, much faster, AND CHEAP parallel processing power available, alternative methods to alpha-beta pruning should be considered.

You are right, but consider:

Modern high-end GPUs have today about 600 GFLOP in DP, a single CPU with 4 cores has about 50 GFLOP. Sounds good, but not if you consider that ab-pruning with good move ordering "speeds" the CPU up to 100x (compared to standard min-max).

You will need a non-ab-pruning algorithm with an speedup of min. 10x to compete with CPUs.

--
Srdja

GPUs better for chess than CPUs?

Re: GPUs better for chess than CPUs?

Re: GPUs better for chess than CPUs?

Re: GPUs better for chess than CPUs?

Re: GPUs better for chess than CPUs?

Re: GPUs better for chess than CPUs?

Re: GPUs better for chess than CPUs?

Re: GPUs better for chess than CPUs?

Re: GPUs better for chess than CPUs?

Re: GPUs better for chess than CPUs?

Re: GPUs better for chess than CPUs?