Daniel Shawul
Joined: 14 Mar 2006 Posts: 2187 Location: Ethiopia
|
Post subject: Re: uct on gpu Posted: Fri Feb 24, 2012 1:00 pm |
|
|
Hi Srdja,
Thanks for the thumbs up.
| Quote: |
I didn't get it... if you use spinlocks block-wise, you cannot access a node by one thread alone?
|
Yes, I use block-wise spinlocks, and there is no need for a thread-wise spinlock. At first I was afraid that tree growth would be too slow,
but after reducing the number of cycles (simulations) that a thread does before checking the tree, I found that the tree in fact
grows so fast that I am going to have to find a way to control it.
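To make the block-wise locking idea concrete, here is a minimal host-side C++ sketch. It is not the kernel code, which is not shown in this post. It assumes one lock word per tree node and assumes that only one thread per block (a "leader") touches the lock after the block's results have been pooled. The names Node, try_acquire, release and update_node are hypothetical. On the device the same compare-and-swap would be done with atomicCAS on an int in global memory.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical tree node: one lock word per node, standing in for an
// int in GPU global memory.
struct Node {
    std::atomic<int> lock{0};  // 0 = free, 1 = held
    int visits = 0;
    int wins = 0;
};

// Host-side analogue of "atomicCAS(&node.lock, 0, 1) == 0".
// Returns true only for the caller that flips the lock from 0 to 1.
bool try_acquire(Node& n) {
    int expected = 0;
    return n.lock.compare_exchange_strong(expected, 1,
                                          std::memory_order_acquire);
}

void release(Node& n) {
    n.lock.store(0, std::memory_order_release);
}

// Assumed to be called once per block, by its leader thread, with the
// totals the block accumulated over its cycles.
void update_node(Node& n, int block_visits, int block_wins) {
    assert(block_visits >= 0 && block_wins >= 0);
    while (!try_acquire(n)) {
        // spin until the block currently holding the node releases it
    }
    n.visits += block_visits;
    n.wins += block_wins;
    release(n);
}
```

With one acquirer per block, the number of contenders for a node is the number of blocks rather than the number of threads, which is why no thread-wise lock is needed in this scheme.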
| Quote: |
CUDA devices with compute capability 2.0 should have 8 blocks per MP and 256 threads per block, you should be able to run 8*32 threads per block... if you are not running out of registers...
|
Yes, but in my case there is little need to share information between threads, unlike what you do with YBW for alpha-beta. My basic Monte Carlo kernel uses:
| Code: |
ptxas info : Compiling entry function '_Z7playouti' for 'sm_11'
ptxas info : Used 20 registers, 186+16 bytes smem, 67 bytes cmem[0], 84 bytes cmem[1], 36 bytes cmem[14]
|
So I can fit roughly 352 threads per block. The maximum allowed is 512, but the 20 registers used per thread limit it. Due to some other avoidable constraint,
I can only use a power-of-two number of threads, so I would use at most 256 threads per block. In fact I don't need to cram all those threads into one block (even though I could), because
I don't share much between the threads. The device has 14 multiprocessors, so I launch 8 active blocks per MP, i.e. 8 x 14 = 112 blocks, with 1 warp (32 threads) per block.
Those threads are always active (no batching), so I keep them busy until the specified number of simulations is reached. Increasing the number of threads or blocks
doesn't increase performance, because the device would then start time-slicing to accommodate all the threads. I just have to make sure that there are enough warps (possibly from different
blocks resident on the same MP) to hide the latency of global memory reads and writes.
Here is the tree growth rate for different numbers of cycles with a 112x32 setup:
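The launch arithmetic above can be written down as a small host-side C++ helper. This is only a sketch of the numbers quoted in this post (14 multiprocessors, 8 blocks per MP, 1 warp per block). The struct and function names are made up for illustration, and the warp size of 32 is the one used in the post.

```cpp
#include <cassert>

// Hypothetical container for the launch geometry described in the post.
struct LaunchConfig {
    int blocks;
    int threads_per_block;
    int total_threads;
};

constexpr int kWarpSize = 32;  // threads per warp, as used in the post

// Derive grid and block sizes from the per-multiprocessor figures.
LaunchConfig make_launch_config(int multiprocessors,
                                int blocks_per_mp,
                                int warps_per_block) {
    assert(multiprocessors > 0 && blocks_per_mp > 0 && warps_per_block > 0);
    LaunchConfig cfg{};
    cfg.blocks = multiprocessors * blocks_per_mp;
    cfg.threads_per_block = warps_per_block * kWarpSize;
    cfg.total_threads = cfg.blocks * cfg.threads_per_block;
    return cfg;
}
```

Calling make_launch_config(14, 8, 1) gives 112 blocks of 32 threads, i.e. 3584 resident threads in total.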
| Code: |
cycles   Nodes    Time(sec)
128      4200     13
64       5833     13
32       14327    13
16       45202    13.5
8        80858    14
4        286716   15.6
|
As you can see, at 4 cycles, i.e. 4 x 32 = 128 simulations per block, the tree growth is really high with only a slight increase in simulation time.
So letting each thread grow the tree would blow up the tree. My Windows "watchdog" is timing out the kernel, so I could not test lower numbers of cycles.
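For anyone who wants the growth rate as nodes per second, the table can be turned into a few lines of C++. The figures are copied from the table above; the struct and function names are hypothetical, and 32 threads per block is the 112x32 setup.

```cpp
#include <cassert>
#include <vector>

// One row of the measurement table.
struct Run {
    int cycles;      // simulations per thread between tree checks
    long nodes;      // tree size at the end of the run
    double seconds;  // total time
};

// Rows copied from the table in this post (112x32 setup).
std::vector<Run> measured_runs() {
    return {
        {128, 4200, 13.0},
        {64, 5833, 13.0},
        {32, 14327, 13.0},
        {16, 45202, 13.5},
        {8, 80858, 14.0},
        {4, 286716, 15.6},
    };
}

// Average tree growth over the whole run.
double nodes_per_second(const Run& r) {
    assert(r.seconds > 0.0);
    return static_cast<double>(r.nodes) / r.seconds;
}

// Simulations one block completes between two tree checks.
int simulations_per_check(const Run& r, int threads_per_block) {
    return r.cycles * threads_per_block;
}
```

This gives about 323 nodes/sec at 128 cycles and about 18379 nodes/sec at 4 cycles: the tree ends up roughly 68 times larger while the run time goes up by only 20% (13 to 15.6 seconds).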
---
cheers
https://sites.google.com/site/dshawul/
https://github.com/dshawul |