Daniel Shawul
Joined: 14 Mar 2006 Posts: 2185 Location: Ethiopia
|
Post subject: Re: uct for chess Posted: Tue Mar 13, 2012 9:43 pm |
|
|
| Quote: |
Yep, achieving full occupancy of the device is nearly impossible with such complex chess move generation and so few registers.
With Zeta I use something above 60 registers and can run 16*2*32 threads on my device; full occupancy would mean running 16*24*32 threads.
|
My GPU is an old one, so I can't get full occupancy even for Hex (50% only). Even though I use 60 registers for chess, I do not read any variables from global memory, so I am not all that unhappy about it. I can control latency by simply running more Monte Carlo simulations, which would be impossible if those also used global memory.
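The thread counts in the quote can be checked with a few lines of arithmetic. This is only a sketch: it reads the quoted products as multiprocessors * resident warps * threads per warp, which is my interpretation and not something stated in the thread. The function names are made up for illustration, and the register-limited estimate ignores allocation granularity and every other limit the hardware applies.

```cpp
#include <algorithm>
#include <cstdint>

// Threads in flight across the whole device.
inline std::uint32_t threads_in_flight(std::uint32_t multiprocessors,
                                       std::uint32_t warps_per_multiprocessor,
                                       std::uint32_t threads_per_warp) {
    return multiprocessors * warps_per_multiprocessor * threads_per_warp;
}

// Fraction of the hardware warp slots that are actually occupied.
inline double occupancy(std::uint32_t resident_warps, std::uint32_t max_warps) {
    return static_cast<double>(resident_warps) / static_cast<double>(max_warps);
}

// Upper bound on resident warps when the register file is the only limit.
// Simplification: ignores register allocation granularity, block size,
// and shared-memory limits, all of which can lower the real figure.
inline std::uint32_t register_limited_warps(std::uint32_t registers_per_multiprocessor,
                                            std::uint32_t registers_per_thread,
                                            std::uint32_t threads_per_warp,
                                            std::uint32_t max_warps) {
    const std::uint32_t registers_per_warp = registers_per_thread * threads_per_warp;
    return std::min(max_warps, registers_per_multiprocessor / registers_per_warp);
}
```

With the quoted numbers, `threads_in_flight(16, 2, 32)` is 1024 and `threads_in_flight(16, 24, 32)` is 12288, so `occupancy(2, 24)` comes out at roughly 8%. The register file size passed to `register_limited_warps` has to be looked up for the specific card; it is not given in the thread.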
| Quote: |
I ordered an AMD HD 7750. The new GCN architecture has 32 KB of registers for each SIMD unit, and each SIMD can run 10*16 threads. So you get 204.8 bytes per thread, great! But unfortunately there are still no drivers for Linux.
|
That is great. Hopefully someday I will test it on a Fermi, which has 3x more registers.
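For reference, the per-thread budget in the quote works out as below. This is a sketch that takes the quoted 32 KB and 10*16 threads at face value; I have not checked them against the vendor documentation, and the helper names are hypothetical.

```cpp
#include <cstdint>

// Bytes of register file available to each thread if the file is split evenly.
inline double register_bytes_per_thread(std::uint32_t register_file_bytes,
                                        std::uint32_t resident_threads) {
    return static_cast<double>(register_file_bytes) /
           static_cast<double>(resident_threads);
}

// Whole 32-bit registers that fit in a per-thread byte budget.
inline std::uint32_t whole_registers32(double bytes_per_thread) {
    return static_cast<std::uint32_t>(bytes_per_thread / 4.0);
}
```

`register_bytes_per_thread(32 * 1024, 10 * 16)` gives the quoted 204.8 bytes, which is 51 whole 32-bit registers. If the quoted figures are right, a kernel using about 60 registers would still fall a little short of the full 10*16 threads per SIMD.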
| Quote: |
Maybe you could switch to a quad-bitboard board representation, which uses only 32 bytes.
|
I have 9 bitboards in total (18 registers), so it is not a lot, but the calculations to generate one random legal move are intensive, and before you know it you hit 60 registers. Also, quad-bitboards require SIMD operations for decoding, if I am not mistaken. GPUs are SIMT and as such do not have SSE instructions.
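To make the size comparison concrete, here is a rough sketch of both layouts. Everything in it is an assumption for illustration: the struct names, the 4-bit piece code, and the plane ordering are hypothetical and are not taken from either engine. The decode below uses only scalar 64-bit bitwise operations; it says nothing about how fast that is next to an SSE version, and each lookup touches all four words.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical board with nine separate bitboards; the post does not
// say which nine sets are stored.
struct NineBitboards {
    std::uint64_t bb[9];
};

// Quad-bitboard: bit `sq` of plane[k] is bit k of the 4-bit piece code
// on square `sq` (assumed encoding, code 0 meaning an empty square).
struct QuadBitboard {
    std::uint64_t plane[4];
};

// Number of 32-bit registers needed to hold `bytes` bytes.
inline unsigned registers32(std::size_t bytes) {
    return static_cast<unsigned>((bytes + 3) / 4);
}

// Write a 4-bit code to one square.
inline void put_piece(QuadBitboard& q, unsigned square, unsigned code) {
    const std::uint64_t bit = std::uint64_t{1} << square;
    for (unsigned k = 0; k < 4; ++k) {
        if ((code >> k) & 1u) q.plane[k] |= bit;
        else                  q.plane[k] &= ~bit;
    }
}

// Read the 4-bit code on one square.
inline unsigned piece_code(const QuadBitboard& q, unsigned square) {
    unsigned code = 0;
    for (unsigned k = 0; k < 4; ++k)
        code |= static_cast<unsigned>((q.plane[k] >> square) & 1u) << k;
    return code;
}

// Ordinary bitboard of all squares that hold `code`.
inline std::uint64_t squares_with_code(const QuadBitboard& q, unsigned code) {
    std::uint64_t mask = ~std::uint64_t{0};
    for (unsigned k = 0; k < 4; ++k)
        mask &= ((code >> k) & 1u) ? q.plane[k] : ~q.plane[k];
    return mask;
}
```

Under these assumptions the nine-bitboard struct is 72 bytes (18 registers) and the quad-bitboard is 32 bytes (8 registers). The price is that recovering any one piece set takes a few extra AND/NOT operations per lookup.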
| Quote: |
In CUDA there should be some kind of shared memory, called local memory in OpenCL. It is fast and big enough to hold the board for each thread, so you could save some registers for computation.
|
I in fact have ample shared memory that I can spend, but so far there is no improvement. I put the Board struct in shared memory, but the register usage did not decrease, nor did it run any faster. Theoretically, though, that is the only option left. Btw, what CUDA calls "local memory" is global memory in disguise, slow as a tortoise (much like thread-local storage on a CPU). I can spill some registers to local memory during compilation, but it runs slower. Also, as I mentioned before, manually moving some variables to shared memory didn't improve performance for some people working on fast libraries. But I will keep on digging.
-----
cheers
https://sites.google.com/site/dshawul/
https://github.com/dshawul |
|