Recently I was convinced that by using local memory (memory shared by all work-items of a work-group, i.e. a thread block)
and by coupling 64 threads to work on the same chess position, a fast but simple engine could be implemented.
In practice two issues hampered this idea:
1) Benchmarks showed that I have to run multiple work-groups to utilize the GPU.
But when running multiple work-groups, the local memory per work-group is not enough to store a move-list stack and additional search-tree information.
2) Coupling threads to work on the same chess position in parallel is possible, but as soon as I want to sync the work or share information between the threads I lose a lot of cycles, so a "one thread, one board" approach looks more promising.
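
To make issue 1) concrete, here is a rough sketch of the budget arithmetic. All sizes are hypothetical placeholders (local memory per compute unit, move encoding, moves per ply, stack depth), not values taken from the engine or from any particular GPU:

```python
# Hypothetical figures for illustration only, not measurements.
LOCAL_MEM_PER_CU = 32 * 1024  # bytes of local memory per compute unit (assumed)
MOVE_BYTES = 4                # one move packed into 32 bits (assumed)
MAX_MOVES_PER_PLY = 256       # assumed cap on generated moves per position
MAX_PLY = 32                  # assumed maximum search depth


def local_bytes_per_group(groups_per_cu):
    """Local memory left for each work-group when several share one compute unit."""
    return LOCAL_MEM_PER_CU // groups_per_cu


def move_stack_bytes(max_ply=MAX_PLY, moves_per_ply=MAX_MOVES_PER_PLY,
                     move_bytes=MOVE_BYTES):
    """Size of a move-list stack holding one full move list per ply."""
    return max_ply * moves_per_ply * move_bytes


def fits(groups_per_cu):
    """True if the move-list stack alone fits into the per-group budget."""
    return move_stack_bytes() <= local_bytes_per_group(groups_per_cu)


for groups in (1, 2, 4):
    print(groups, local_bytes_per_group(groups), fits(groups))
```

With these placeholder numbers the stack just fits for one work-group per compute unit and no longer fits once two or more share it, and that is before any additional search-tree information is stored.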
--
Srdja
GPU chess update, local memory....
Moderators: hgm, Rebel, chrisw
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: GPU chess update, local memory....atomics
Playing around with local atomic functions,
I get about 50 Knps to 100 Knps per work-group (running 64 threads) for a vanilla alpha-beta implementation.
I hope I can quadruple the nps and then try a Lazy SMP approach with up to 512 work-groups...
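
As a back-of-the-envelope projection of what these figures would mean for the planned setup, here is a small sketch. It assumes that throughput scales linearly with the number of work-groups, which is an optimistic assumption and not a measurement:

```python
# Projection only; linear scaling across work-groups is an assumption.
KNPS_PER_GROUP = (50, 100)  # measured range per work-group, from the post
WORK_GROUPS = 512           # planned upper bound, from the post
SPEEDUP = 4                 # hoped-for per-work-group speedup, from the post


def projected_mnps(knps_per_group, work_groups, speedup=1):
    """Aggregate Mnps if every work-group ran at full single-group speed."""
    return knps_per_group * speedup * work_groups / 1000.0


current = [projected_mnps(k, WORK_GROUPS) for k in KNPS_PER_GROUP]
hoped = [projected_mnps(k, WORK_GROUPS, SPEEDUP) for k in KNPS_PER_GROUP]
print(current)  # [25.6, 51.2]
print(hoped)    # [102.4, 204.8]
```

Raw nps is only an upper bound here: in a Lazy SMP search the work-groups partly visit the same nodes, so the effective speedup would be lower than the node count suggests.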
--
Srdja
-
- Posts: 31
- Joined: Fri Nov 25, 2016 10:14 am
- Location: Singapore
Re: GPU chess update, local memory....atomics
Brahim HAMADICHAREF
Singapore
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: GPU chess update, local memory....atomics
thx, any help is appreciated.
--
Srdja
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: GPU chess update, local memory....atomics
51M NPS would raise some eyebrows. I guess it would become the world's most cost-effective chess engine (in terms of nodes per dollar).
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: GPU chess update, local memory....atomics
This looks interesting:
https://github.com/GPUOpen-Professional ... -Tools/HIP
It is a toolset that works on OpenCL and CUDA and includes atomics.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: GPU chess update, local memory....atomics
Thanks, I will take a look at it.
--
Srdja
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: GPU chess update, local memory....atomics
Hmm, it seems not to pay off.
I am stuck at 40 Knps (on an AMD HD 7750) to 100 Knps (on an Nvidia GTX 750) for a single work-group running 64 threads.
And it looks like register usage prevents the GPU from running under full load,
so I end up with 1.5 Mnps with 64 work-groups on the AMD HD 7750 and 2 Mnps with 32 work-groups on the Nvidia GTX 750.
Maybe someone smarter than me can squeeze more juice out of it...
Anyway, here is the code:
https://github.com/smatovic/Zeta/tree/v099a
--
Srdja
*** edit ***
PS: parallel search is not yet implemented
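
For reference, the scaling implied by the numbers in this post can be worked out as follows. This is only arithmetic on the quoted figures; the interpretation that the gap comes from register pressure is the post's own guess and is not verified here:

```python
# Figures from the post:
# device -> (single work-group Knps, number of work-groups, measured total Mnps)
RESULTS = {
    "AMD HD 7750": (40, 64, 1.5),
    "Nvidia GTX 750": (100, 32, 2.0),
}


def scaling(single_knps, groups, total_mnps):
    """Return (Knps per work-group under load, fraction of single-group speed)."""
    per_group_knps = total_mnps * 1000.0 / groups
    efficiency = per_group_knps / single_knps
    return per_group_knps, efficiency


for name, row in RESULTS.items():
    per_group, eff = scaling(*row)
    print(f"{name}: {per_group:.1f} Knps per work-group, "
          f"{eff:.1%} of single-group speed")
# AMD HD 7750: 23.4 Knps per work-group, 58.6% of single-group speed
# Nvidia GTX 750: 62.5 Knps per work-group, 62.5% of single-group speed
```

So on both cards each work-group runs at roughly 60% of its stand-alone speed once the device is loaded with 32 or 64 work-groups.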
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: GPU chess update, local memory....atomics
I have an AMD GPU card on this machine.
When I ran with --guessconfig it configured itself for the I-7 CPU.
How do I tell it to examine the GPU?
I have AMD Radeon 6900 series with 2 GB GPU RAM.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: GPU chess update, local memory....atomics
Graphics Card Manufacturer - Powered by AMD
Graphics Chipset - AMD Radeon HD 6900 Series
Device ID - 6719
Vendor ID - 1002
SubSystem ID - 3121
SubSystem Vendor ID - 1682
Revision ID - 00
Bus Type - PCI Express 2.0
Current Bus Settings - PCI Express 2.0 x8
BIOS Version - 013.010.000.007
BIOS Part Number - 113-695CDF80-113-C2050200-100
BIOS Date - 2011/01/06 03:01
Memory Size - 2048 MB
Memory Type - GDDR5
Memory Clock - 1250 MHz
Core Clock - 800 MHz
Total Memory Bandwidth - 160 GByte/s
2D Driver File Path - /REGISTRY/MACHINE/SYSTEM/ControlSet001/Control/Class/{4d36e968-e325-11ce-bfc1-08002be10318}/0000
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.