GPUs are constantly changing.
Not enough is known about Pascal yet to say whether it will be a useful target for something like chess.
But we are certainly getting closer. I would guess that sometime before 2022, GPUs will become not only viable but potentially necessary targets for a top-tier chess engine.
What I want to know is can this be ported to a chess engine?
-
- Posts: 546
- Joined: Sat Aug 17, 2013 12:36 am
-
- Posts: 710
- Joined: Sat Dec 06, 2014 1:53 pm
Re: What I want to know is can this be ported to a chess eng
Chess relies heavily on UNIFORM memory access.
Even NUMA memory access on a multi-socket CPU carries a penalty.
A GPU is heavily NUMA by nature. GPU speed does not come from more advanced technology but from heavy parallelism: hundreds or thousands of threads, each working on an isolated subset of slow memory.
A brute-force minimax tree cannot be split into 1000 subsets of independent memory.
Whether it is 2022 or 2042, if GPUs have an advantage over CPUs, it will only be thanks to their heavily NUMA architecture.
Distributed computing over a PC cluster is much easier and more promising:
http://acmbulletin.fiit.stuba.sk/vol3num2/lackovic.pdf
-
- Posts: 1796
- Joined: Thu Sep 18, 2008 10:24 pm
Re: What I want to know is can this be ported to a chess eng
yurikvelo wrote: Chess relies heavily on UNIFORM memory access.
A GPU is heavily NUMA by nature. GPU speed does not come from more advanced technology but from heavy parallelism: hundreds or thousands of threads, each working on an isolated subset of slow memory.
A brute-force minimax tree cannot be split into 1000 subsets of independent memory.

What about Monte Carlo?
-
- Posts: 546
- Joined: Sat Aug 17, 2013 12:36 am
Re: What I want to know is can this be ported to a chess eng
yurikvelo wrote: Whether it is 2022 or 2042, if GPUs have an advantage over CPUs, it will only be thanks to their heavily NUMA architecture.
Distributed computing over a PC cluster is much easier and more promising.

HINT: This is wrong.
-
- Posts: 182
- Joined: Thu Jul 19, 2007 3:13 am
Re: What I want to know is can this be ported to a chess eng
What about this, for neural network chess?
http://www.extremetech.com/extreme/2257 ... onic-brain
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: What I want to know is can this be ported to a chess eng
Quote: What caught my eye is the "double precision" calculations.

What caught my eye is the support for native "half precision" calculations.
src: Mixed-Precision in GPUs
Looks like AMD, Intel and Nvidia are adding native 16-bit float and integer support to their devices:
AMD with GCN 1.2 (GCN Gen3)
http://www.anandtech.com/show/8460/amd- ... 5-review/2
Intel with Skylake IGP
https://software.intel.com/sites/defaul ... 9-v1d0.pdf
Nvidia with the upcoming Pascal architecture
http://blogs.nvidia.com/blog/2015/03/17/pascal/
http://web.archive.org/web/201602131457 ... s.app26.de
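To make "half precision" concrete, here is a minimal sketch of what a 16-bit float can and cannot represent. It uses Python's standard `struct` module, whose `"e"` format is IEEE 754 binary16. The helper name `to_half_and_back` and the idea of storing centipawn-like scores are assumptions for illustration only; nothing above says how an engine would actually use 16-bit values.

```python
import struct

def to_half_and_back(x):
    """Round x to IEEE 754 binary16 (half precision) and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Hypothetical centipawn-style scores, chosen only to show the rounding behaviour.
for score in (100.0, 1000.0, 2048.0, 2049.0, 0.1):
    stored = to_half_and_back(score)
    status = "exact" if stored == score else "rounded"
    print(f"{score!r:>8} -> {stored!r} ({status})")
```

Half precision has an 11-bit significand, so whole numbers are exact only up to 2048; beyond that, neighbouring representable values are 2 or more apart. Small integer scores survive the round trip, while larger or fractional ones get rounded.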
--
Srdja
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: What I want to know is can this be ported to a chess eng
Quote: 5 million nps? That's good for those old cards.

I guess with some further tuning it could be 50+ Mnps;
my implementation relies on unoptimized memory access patterns.
--
Srdja
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: What I want to know is can this be ported to a chess eng
Quote: Isn't there ANYTHING that can be done on the GPU?

just my 2 cents...
src:YBWC vs. RBFMS vs. MCTS vs. MCAB
Porting a classic chess engine approach with a parallel alpha-beta algorithm like YBWC to a GPU architecture would take a significant amount of time, if it is even possible to port all the well-known computer chess techniques in a straightforward way. And the Elo gained from more computed nodes per second might be eaten up again by the higher branching factor of a simpler implementation.
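The nodes-per-second versus branching-factor trade-off can be put into rough numbers. The sketch below assumes the usual simplification that a search to depth d visits about b^d nodes for an effective branching factor b, so the reachable depth is log(nodes) / log(b). The function name `reachable_depth` and all figures are hypothetical, picked only to illustrate the effect, and are not measurements of any engine.

```python
import math

def reachable_depth(nps, seconds, ebf):
    """Approximate depth d such that ebf**d equals the node budget nps * seconds."""
    return math.log(nps * seconds) / math.log(ebf)

# Hypothetical: a well-pruned search at 5 Mnps with effective branching factor 2.0
tuned_depth = reachable_depth(nps=5e6, seconds=10, ebf=2.0)

# Hypothetical: a simpler port that is 10x faster but prunes worse (factor 3.0)
simple_depth = reachable_depth(nps=50e6, seconds=10, ebf=3.0)

print(f"tuned:  {tuned_depth:.1f} plies")
print(f"simple: {simple_depth:.1f} plies")
```

With these made-up numbers the tenfold speedup still ends up several plies shallower (roughly 18.2 versus 25.6), which is the sense in which raw speed can be eaten up by a worse branching factor.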
Zeta 098 and 097 use a Randomized Best-First Minimax Search, but my implementation makes excessive use of global memory and scales poorly.
At the very beginning of the project it was clear that a Monte Carlo Tree Search would fit GPUs best. But until now there is no known engine that has made MCTS work well for chess.
What is left, except to try to port a classic approach?
I could improve the performance of the best-first search significantly by switching from global memory to local memory, and I could remove the randomness... Another alternative would be to switch to MCAB, Monte Carlo Alpha-Beta...
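For readers unfamiliar with best-first minimax, here is a minimal single-threaded sketch of the plain, non-randomized idea: repeatedly follow the current best line down to a leaf, expand that leaf, and back the new values up the path. It is not Zeta's code. The toy `expand` function, which invents child scores with a random number generator, is a hypothetical stand-in for move generation plus static evaluation, and no GPU memory layout is modelled.

```python
import random

class Node:
    def __init__(self, value, maximizing):
        self.value = value            # static or backed-up score, from the max player's view
        self.maximizing = maximizing  # True if the max player is to move here
        self.children = []

def best_child(node):
    """Child the side to move prefers under the current values."""
    pick = max if node.maximizing else min
    return pick(node.children, key=lambda c: c.value)

def expand(node, branching, rng):
    """Hypothetical stand-in for move generation and static evaluation."""
    node.children = [Node(node.value + rng.uniform(-1.0, 1.0), not node.maximizing)
                     for _ in range(branching)]

def best_first_minimax(root, expansions, branching=3, seed=0):
    rng = random.Random(seed)
    for _ in range(expansions):
        # Descend along the current principal variation to a leaf.
        path = [root]
        while path[-1].children:
            path.append(best_child(path[-1]))
        expand(path[-1], branching, rng)
        # Back up minimax values from the expanded leaf to the root.
        for node in reversed(path):
            node.value = best_child(node).value
    return root.value

def full_minimax(node):
    """Reference minimax over the stored tree, used as a cross-check."""
    if not node.children:
        return node.value
    pick = max if node.maximizing else min
    return pick(full_minimax(c) for c in node.children)

root = Node(0.0, maximizing=True)
score = best_first_minimax(root, expansions=50)
print(score == full_minimax(root))
```

Every expansion starts at the root and rewrites values along one path, so all workers would contend for the same shared tree. That gives a rough idea of why a version that keeps the tree in global memory might scale poorly.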
http://web.archive.org/web/201602131457 ... s.app26.de
--
Srdja