Ron Murawski wrote:
So I think OpenCL is trying to leverage available GPU power, which is not quite
the same as scaling program performance according to cpu cores available.
My understanding is that OpenCL uses everything it finds that can computer : either CPU cores or GPU cores. But you could be right as well, as a look at Apple's own advertising seems to show too :
But even with "just" the GPU cores, those are now growing so massively parallel that they exceed any CPU's horsepower since a few years yet and the distance is growing.
So my belief is that chess engines should use the GPUs whenever available, as it would multiply their nps by something like a quantum leap. My intention is to take a look at that for my own engine one day (when I have enough time..... *sigh*)
(actually, I'm quite surprised that nobody yet in the top-engine authors elite started to use the GPUs as a performance booster...)
phhnguyen wrote:Any reason and update for the "battle" between C and C++? Thanks.
It has not been intended to continue such a battle, even because I do not regard C++ to be significantly slower simply by using it. It merely is targeting two goals: a) to force myself to realize some parts more simple and near to a machine, e.g. by substituting C++ objects by more dense data structures fitting into word length; b) to prepare for a later intended code embedding into an Objective-C environment (which of course also might be possible using C++, but seems to be too quirky).
Ron Murawski wrote:
So I think OpenCL is trying to leverage available GPU power, which is not quite
the same as scaling program performance according to cpu cores available.
My understanding is that OpenCL uses everything it finds that can computer : either CPU cores or GPU cores. But you could be right as well, as a look at Apple's own advertising seems to show too :
But even with "just" the GPU cores, those are now growing so massively parallel that they exceed any CPU's horsepower since a few years yet and the distance is growing.
So my belief is that chess engines should use the GPUs whenever available, as it would multiply their nps by something like a quantum leap. My intention is to take a look at that for my own engine one day (when I have enough time..... *sigh*)
(actually, I'm quite surprised that nobody yet in the top-engine authors elite started to use the GPUs as a performance booster...)
Wikipedia says:
"OpenCL gives any application access to the Graphics Processing Unit for
non-graphical computing. Thus, OpenCL extends the power of the Graphics
Processing Unit beyond graphics"
It will be hard to coordinate OpenCL usage in a chess mp program. There exists
the possibility of a stall (waiting for the GPU calculation to finish) and
it's not clear what amount of communication overhead is required. If the
overhead is high, then only large calculations should be done this way...
Ron Murawski wrote:
So I think OpenCL is trying to leverage available GPU power, which is not quite
the same as scaling program performance according to cpu cores available.
My understanding is that OpenCL uses everything it finds that can computer : either CPU cores or GPU cores. But you could be right as well, as a look at Apple's own advertising seems to show too :
But even with "just" the GPU cores, those are now growing so massively parallel that they exceed any CPU's horsepower since a few years yet and the distance is growing.
So my belief is that chess engines should use the GPUs whenever available, as it would multiply their nps by something like a quantum leap. My intention is to take a look at that for my own engine one day (when I have enough time..... *sigh*)
(actually, I'm quite surprised that nobody yet in the top-engine authors elite started to use the GPUs as a performance booster...)
Wikipedia says:
"OpenCL gives any application access to the Graphics Processing Unit for
non-graphical computing. Thus, OpenCL extends the power of the Graphics
Processing Unit beyond graphics"
It will be hard to coordinate OpenCL usage in a chess mp program. There exists
the possibility of a stall (waiting for the GPU calculation to finish) and
it's not clear what amount of communication overhead is required. If the
overhead is high, then only large calculations should be done this way...
Ron
There have been several attempts to adapt GPUs to chess using OpenCL.
As yet, none have had any significant success, so far as I know.
Dann Corbit wrote:
There have been several attempts to adapt GPUs to chess using OpenCL.
As yet, none have had any significant success, so far as I know.
That's very interesting, and I had no idea it had been attempted. Would you happen to have more details on those attempts, Dann?
Surely, as Ron pointed out, such a program would have to limit data exchanges with the central memory, using only the graphic card's one (most of them have over 1GB of ram, now... but shared between hundred of cores). That could be a bottleneck for hash-tables. But maybe we could try to adapt a small engine that makes no use of hashtables? For instance, HGM's Micro-Max looks like a perfect candidate to learn to master those new technologies...
GPUs are generally useful for doing the same set of relatively simple floating point calculations on a large data set in parallel. That doesn't really reflect how a chess program works.
As mentioned, there have been several attempts to write a chess program in OpenCL as well as in CUDA. So far, none of them have been very succesful. A GPU just doesn't seem like the best architecture to build a chess program on.
Right... I found two projects of chess engines based on OpenCL, none of them seeming to make any real progress (but remember this is just the beginning of OpenCl, nobody use it yet but for toy-tests (fractals, pi-calculation...). Only Apple uses it extensively in their OS (the latest QuickTime X makes use of OpenCL for instance)... but they created it so they master it.
I have to agree even though I was optimistic before. I had to sweat to get a speed up of 5x, for a seemingly easy parallel application like conjugate gradient solver, from an Nvidia Quadro GPU with 128 cores. It has to be embarrassingly parallel with no or little communication between cpus and gpus. Copying of all necessary data should be done before solution begins and put to shared memory, which is not the case for chess. Chess on a cluster of Cpus is already hard enough and gpus doesn't seem feasible in the near future.
Dann Corbit wrote:
There have been several attempts to adapt GPUs to chess using OpenCL.
As yet, none have had any significant success, so far as I know.
That's very interesting, and I had no idea it had been attempted. Would you happen to have more details on those attempts, Dann?
Surely, as Ron pointed out, such a program would have to limit data exchanges with the central memory, using only the graphic card's one (most of them have over 1GB of ram, now... but shared between hundred of cores). That could be a bottleneck for hash-tables. But maybe we could try to adapt a small engine that makes no use of hashtables? For instance, HGM's Micro-Max looks like a perfect candidate to learn to master those new technologies...
Transfer of memory is one issue.
Complete lack of recursion is another.
JuLieN wrote:Right... I found two projects of chess engines based on OpenCL, none of them seeming to make any real progress (but remember this is just the beginning of OpenCl, nobody use it yet but for toy-tests (fractals, pi-calculation...). Only Apple uses it extensively in their OS (the latest QuickTime X makes use of OpenCL for instance)... but they created it so they master it.
We have in the comp lab one quad with 3 GPUs in it to do molecular simulations with NAMD (tailored to use GPUs). A collaborator has been down there at UofI to pick this up and beta test it. We build this machine and it is a freaking monster. This is not a toy!