One other thing: there is now no synchronization at all (no __syncthreads) apart from the inherent warp-level sync. A warp, rather than a block as before, is now the smallest working unit that fetches a node from DRAM. The threads in a warp can diverge, since each plays its own Monte Carlo game from the same initial board. That has sped up the code and simplified it as well.
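To make the warp-per-node idea concrete, here is a minimal sketch, not the actual engine code. Everything in it is an assumption made for illustration: the `Node` layout, the `xorshift32` generator, the `simulate` kernel and the `runPlayouts` host helper are hypothetical names, and `playout` is a placeholder loop of random length rather than a real Hex game. It also uses `__shfl_down_sync` (CUDA 9 or later) to add up the lanes' results, which is just one way to combine them without `__syncthreads`.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-node statistics; field names are illustrative only.
struct Node {
    unsigned int wins;
    unsigned int visits;
};

// Small xorshift32 generator so every lane has its own random stream.
__device__ unsigned int xorshift32(unsigned int &s) {
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    return s;
}

// Placeholder playout, NOT a Hex simulation: a loop of random length
// stands in for one game, so lanes of the same warp diverge.
__device__ unsigned int playout(unsigned int &rng) {
    unsigned int moves = 8 + xorshift32(rng) % 56;
    unsigned int acc = 0;
    for (unsigned int i = 0; i < moves; ++i) acc ^= xorshift32(rng);
    return acc & 1u;  // 1 = win, 0 = loss
}

// One warp per node, one playout per lane. No __syncthreads anywhere.
// Assumes a 1-D block whose size is a multiple of 32.
__global__ void simulate(Node *nodes, int numNodes, unsigned int seed) {
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int warp = tid / 32;           // node index handled by this warp
    int lane = threadIdx.x % 32;   // position inside the warp
    if (warp >= numNodes) return;  // uniform per warp: all 32 lanes exit together

    unsigned int rng = seed ^ (0x9E3779B9u * (unsigned int)(tid + 1));
    if (rng == 0) rng = 1;         // xorshift must not start at zero

    unsigned int win = playout(rng);  // lanes may finish at different times

    // Warp-level sum of the 32 results; the *_sync shuffle reconverges lanes.
    for (int off = 16; off > 0; off >>= 1)
        win += __shfl_down_sync(0xffffffffu, win, off);

    if (lane == 0) {               // lane 0 holds the warp total
        atomicAdd(&nodes[warp].wins, win);
        atomicAdd(&nodes[warp].visits, 32u);
    }
}

// Host helper: runs one round of playouts and returns the node statistics.
std::vector<Node> runPlayouts(int numNodes, unsigned int seed) {
    if (numNodes <= 0) return {};
    std::vector<Node> host(numNodes);
    size_t bytes = numNodes * sizeof(Node);
    Node *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return {};
    cudaMemset(dev, 0, bytes);
    int threads = 128;  // 4 warps per block
    int blocks  = (numNodes * 32 + threads - 1) / threads;
    simulate<<<blocks, threads>>>(dev, numNodes, seed);
    cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dev);
    return host;
}
```

Calling `runPlayouts(4, 1234u)` should give four nodes with 32 visits each, one per lane. The only coordination is inside a warp, so warps never wait on each other within a block.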
On the initial position, e4! (this is Hex, by the way!) is chosen as the best move, and the tree was expanded to depth 7 after about 46 million simulations. Time dropped to 5.375 s from about 5.5 s previously. The tree is also very selective.
0.h8 22323755 45989376 0.485411
1.a1 148364 291840 0.508374
1.b1 88821 177152 0.501383
1.c1 10350 20992 0.493045
1.d1 7756 15872 0.488659
1.e1 8021 16384 0.489563
1.f1 36382 75264 0.483392
1.g1 54957 116224 0.472854
1.h1 947 2048 0.462402
1.a2 58208 116224 0.500826
1.b2 254948 501248 0.508626
1.c2 174206 344064 0.506319
1.d2 59118 116736 0.506425
1.e2 58413 116224 0.502590
1.f2 57629 116736 0.493669
1.g2 1467 3072 0.477539
1.h2 680 1536 0.442708
1.a3 58347 117760 0.495474
1.b3 239661 470528 0.509345
1.c3 185939 364032 0.510777
1.d3 238775 470528 0.507462
1.e3 174534 344576 0.506518
1.f3 53076 105472 0.503224
1.g3 57767 117248 0.492691
1.h3 1972 4096 0.481445
1.a4 54223 111616 0.485799
1.b4 156572 313344 0.499681
1.c4 259251 508928 0.509406
1.d4 402122 789504 0.509335
1.e4 1890305 3686400 0.512778
1.f4 120209 241152 0.498478
1.g4 54437 108544 0.501520
1.h4 57571 119296 0.482590
1.a5 37182 76800 0.484141
1.b5 58508 116736 0.501199
1.c5 172019 337408 0.509825
1.d5 149083 292352 0.509943
1.e5 15977301 30803968 0.518677
1.f5 256793 503296 0.510223
1.g5 59556 117248 0.507949
1.h5 57704 118784 0.485789
1.a6 1699 3584 0.474051
1.b6 58560 118784 0.492996
1.c6 59423 117760 0.504611
1.d6 225991 443392 0.509687
1.e6 177571 348160 0.510027
1.f6 211449 414720 0.509860
1.g6 115356 231424 0.498462
1.h6 57993 117760 0.492468
1.a7 2211 4608 0.479818
1.b7 1935 4096 0.472412
1.c7 57972 118272 0.490158
1.d7 58431 117760 0.496187
1.e7 60376 119296 0.506102
1.f7 225979 445952 0.506734
1.g7 130554 256000 0.509977
1.h7 59264 118784 0.498922
1.a8 1896 4096 0.462891
1.b8 1966 4096 0.479980
1.c8 2192 4608 0.475694
1.d8 57759 119296 0.484165
1.e8 1983 4096 0.484131
1.f8 58658 119296 0.491701
1.g8 58889 117760 0.500076
1.h8 127233 249856 0.509225
Total nodes : 31499
Leaf nodes : 30985
Maximum depth : 7
Average depth : 3.72
Errors: no error
time 5375