Yes, negamax w/ alpha-beta pruning.
Right now the CPU is doing most of the tree, but the lowest couple of levels are lopped off and done asynchronously on the GPU (the quiescence search too). The search is implemented iteratively rather than recursively, so the "call stack" on the CPU side is just a data structure that can be edited, and when the GPU results come back, I patch the search in progress to reflect what has been learned.
Turns out that's trickier than it sounds. :/
My code is doing something with it, but it's not working yet. Scores are getting munged somewhere, and the end result is poor.
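To give an idea of the shape of it: below is a toy sketch of iterative negamax with alpha-beta over a fake game tree. This is not Pigeon's actual code, and all the names (Frame, staticEval, negamaxIterative, and so on) are made up for illustration, but it shows why the explicit stack matters: because the "call stack" is a plain vector, an async GPU completion handler can walk it later and substitute real scores into frames that are still in flight.

Code:

// Toy illustration of iterative negamax with alpha-beta, with the
// "call stack" reified as a std::vector<Frame>. Simplified sketch over
// a fake uniform game tree, not Pigeon's code.
#include <algorithm>
#include <cstdio>
#include <vector>

static const int INF    = 1 << 20;
static const int BRANCH = 4;  // fake uniform branching factor

// Fake "position": an integer id, with a hashed static eval in [-100, 100]
static int staticEval(unsigned node) {
    return (int)((node * 2654435761u) % 201u) - 100;
}

struct Frame {
    unsigned node;
    int depth, alpha, beta;
    int best;   // best score found at this node so far
    int child;  // index of the next child to visit
};

static int negamaxIterative(unsigned root, int depth) {
    std::vector<Frame> stack;
    stack.push_back({root, depth, -INF, +INF, -INF, 0});
    int result = 0;
    while (!stack.empty()) {
        Frame& f = stack.back();
        if (f.depth == 0) {
            result = staticEval(f.node);              // leaf: static eval
        } else if (f.best < f.beta && f.child < BRANCH) {
            unsigned c = f.node * BRANCH + f.child++; // descend into child
            stack.push_back({c, f.depth - 1, -f.beta, -f.alpha, -INF, 0});
            continue;
        } else {
            result = f.best;                          // beta cutoff, or out of moves
        }
        stack.pop_back();
        if (!stack.empty()) {                         // negamax back-propagation
            Frame& p = stack.back();
            p.best  = std::max(p.best, -result);
            p.alpha = std::max(p.alpha, p.best);
        }
    }
    return result;
}

int main() {
    printf("root score: %d\n", negamaxIterative(1, 4));
    return 0;
}

In the real thing, a frame whose subtree went to the GPU parks with a placeholder score, and the completion handler has to find it in the vector and splice in the real (negated!) value, which is exactly the kind of place scores can get munged.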
Pigeon is now running on the GPU
StuartRiffle
Re: Pigeon is now running on the GPU
-Stuart
(Pigeon)
StuartRiffle
Re: Pigeon is now running on the GPU
Still not all working on the CPU side, but I've got a debug trace with some numbers to share, for those who like that sort of thing.
This run is using blocks of 128 threads to process batches of 4096 searches, so each GPU thread is doing 32 searches sequentially. Each search here is one level deep, plus quiescence. Where it says "most steps", that is the number of passes through the iterated negamax loop made by the longest-running thread (it usually takes about 2 steps per node). Over that many searches the thread runtimes tend to balance out, but sometimes one or two threads get unlucky and end up doing a lot more work while the rest of the warp is already done and spinning idle. That shows up as horrible spikes in the trace below (there's a sketch of the per-thread loop after it):
Code:
4096 jobs, 87632 nodes, GPU time 74.7ms, CPU latency 384.3ms, most steps 416, nps 1173k
4096 jobs, 106149 nodes, GPU time 92.2ms, CPU latency 472.9ms, most steps 524, nps 1151k
4096 jobs, 70716 nodes, GPU time 41.0ms, CPU latency 429.4ms, most steps 299, nps 1722k
4096 jobs, 80498 nodes, GPU time 49.9ms, CPU latency 400.9ms, most steps 434, nps 1614k
4096 jobs, 67654 nodes, GPU time 58.9ms, CPU latency 431.0ms, most steps 614, nps 1148k
4096 jobs, 78862 nodes, GPU time 68.6ms, CPU latency 458.0ms, most steps 753, nps 1150k
4096 jobs, 79425 nodes, GPU time 63.5ms, CPU latency 486.6ms, most steps 422, nps 1250k
4096 jobs, 80506 nodes, GPU time 86.5ms, CPU latency 528.6ms, most steps 1204, nps 930k
4096 jobs, 92748 nodes, GPU time 68.1ms, CPU latency 518.8ms, most steps 624, nps 1362k
4096 jobs, 82805 nodes, GPU time 106.5ms, CPU latency 532.4ms, most steps 1636, nps 777k
4096 jobs, 84516 nodes, GPU time 33.5ms, CPU latency 522.8ms, most steps 287, nps 2525k
4096 jobs, 107581 nodes, GPU time 64.1ms, CPU latency 523.3ms, most steps 660, nps 1679k
4096 jobs, 114540 nodes, GPU time 61.5ms, CPU latency 525.2ms, most steps 373, nps 1861k
4096 jobs, 109486 nodes, GPU time 49.0ms, CPU latency 501.7ms, most steps 344, nps 2234k
4096 jobs, 70930 nodes, GPU time 36.2ms, CPU latency 495.9ms, most steps 507, nps 1956k
4096 jobs, 90644 nodes, GPU time 222.1ms, CPU latency 630.8ms, most steps 1989, nps 408k <-- :(
4096 jobs, 84284 nodes, GPU time 54.9ms, CPU latency 612.0ms, most steps 809, nps 1534k
4096 jobs, 73404 nodes, GPU time 39.6ms, CPU latency 554.2ms, most steps 222, nps 1853k
4096 jobs, 61860 nodes, GPU time 29.0ms, CPU latency 548.1ms, most steps 172, nps 2135k
4096 jobs, 84171 nodes, GPU time 65.7ms, CPU latency 549.8ms, most steps 527, nps 1280k
4096 jobs, 85119 nodes, GPU time 53.0ms, CPU latency 529.1ms, most steps 433, nps 1605k
4096 jobs, 59988 nodes, GPU time 33.3ms, CPU latency 513.0ms, most steps 292, nps 1800k
4096 jobs, 73250 nodes, GPU time 61.9ms, CPU latency 552.2ms, most steps 561, nps 1183k
4096 jobs, 103216 nodes, GPU time 66.6ms, CPU latency 381.6ms, most steps 842, nps 1549k
4096 jobs, 89989 nodes, GPU time 31.3ms, CPU latency 597.0ms, most steps 203, nps 2872k
4096 jobs, 75139 nodes, GPU time 54.3ms, CPU latency 593.3ms, most steps 352, nps 1384k
4096 jobs, 76158 nodes, GPU time 33.7ms, CPU latency 527.8ms, most steps 325, nps 2258k
4096 jobs, 98841 nodes, GPU time 45.9ms, CPU latency 433.1ms, most steps 530, nps 2155k
4096 jobs, 73504 nodes, GPU time 53.4ms, CPU latency 449.4ms, most steps 450, nps 1377k
4096 jobs, 87670 nodes, GPU time 37.0ms, CPU latency 370.6ms, most steps 248, nps 2371k
4096 jobs, 60932 nodes, GPU time 28.1ms, CPU latency 351.3ms, most steps 420, nps 2167k
4096 jobs, 83921 nodes, GPU time 93.4ms, CPU latency 363.2ms, most steps 712, nps 898k
4096 jobs, 71281 nodes, GPU time 188.0ms, CPU latency 520.2ms, most steps 901, nps 379k
4096 jobs, 61097 nodes, GPU time 48.7ms, CPU latency 516.4ms, most steps 516, nps 1255k
4096 jobs, 73361 nodes, GPU time 69.2ms, CPU latency 554.3ms, most steps 635, nps 1060k
4096 jobs, 61694 nodes, GPU time 45.7ms, CPU latency 553.1ms, most steps 467, nps 1348k
4096 jobs, 44942 nodes, GPU time 35.8ms, CPU latency 536.2ms, most steps 409, nps 1255k
4096 jobs, 67500 nodes, GPU time 66.7ms, CPU latency 569.8ms, most steps 640, nps 1012k
4096 jobs, 65230 nodes, GPU time 47.9ms, CPU latency 586.5ms, most steps 368, nps 1361k
4096 jobs, 21984 nodes, GPU time 36.1ms, CPU latency 533.9ms, most steps 576, nps 609k
4096 jobs, 46154 nodes, GPU time 96.3ms, CPU latency 438.3ms, most steps 605, nps 479k
4096 jobs, 70606 nodes, GPU time 124.4ms, CPU latency 515.4ms, most steps 683, nps 567k
4096 jobs, 51403 nodes, GPU time 53.8ms, CPU latency 494.8ms, most steps 660, nps 954k
4096 jobs, 60762 nodes, GPU time 123.3ms, CPU latency 573.2ms, most steps 2673, nps 492k <-- :(
4096 jobs, 40601 nodes, GPU time 69.6ms, CPU latency 608.5ms, most steps 457, nps 583k
4096 jobs, 58776 nodes, GPU time 41.8ms, CPU latency 583.9ms, most steps 247, nps 1405k
4096 jobs, 56152 nodes, GPU time 35.1ms, CPU latency 570.4ms, most steps 235, nps 1601k
4096 jobs, 56891 nodes, GPU time 27.1ms, CPU latency 565.2ms, most steps 217, nps 2099k
4096 jobs, 61622 nodes, GPU time 44.5ms, CPU latency 513.6ms, most steps 428, nps 1384k
4096 jobs, 56363 nodes, GPU time 47.0ms, CPU latency 434.2ms, most steps 369, nps 1198k
4096 jobs, 63393 nodes, GPU time 39.0ms, CPU latency 420.3ms, most steps 306, nps 1626k
4096 jobs, 73533 nodes, GPU time 77.5ms, CPU latency 374.5ms, most steps 570, nps 948k
4096 jobs, 39181 nodes, GPU time 30.7ms, CPU latency 347.3ms, most steps 240, nps 1277k
4096 jobs, 52712 nodes, GPU time 39.0ms, CPU latency 334.4ms, most steps 316, nps 1351k
4096 jobs, 46191 nodes, GPU time 41.7ms, CPU latency 346.8ms, most steps 415, nps 1106k
4096 jobs, 44395 nodes, GPU time 45.4ms, CPU latency 368.5ms, most steps 456, nps 977k
4096 jobs, 40598 nodes, GPU time 23.5ms, CPU latency 334.4ms, most steps 274, nps 1727k
4096 jobs, 58922 nodes, GPU time 38.3ms, CPU latency 326.5ms, most steps 380, nps 1539k
4096 jobs, 63757 nodes, GPU time 66.8ms, CPU latency 350.6ms, most steps 786, nps 954k
4096 jobs, 79772 nodes, GPU time 115.8ms, CPU latency 352.1ms, most steps 698, nps 688k
4096 jobs, 55011 nodes, GPU time 61.3ms, CPU latency 408.4ms, most steps 540, nps 896k
4096 jobs, 51189 nodes, GPU time 54.5ms, CPU latency 331.1ms, most steps 590, nps 939k
4096 jobs, 85050 nodes, GPU time 75.0ms, CPU latency 402.3ms, most steps 410, nps 1133k
4096 jobs, 73920 nodes, GPU time 90.3ms, CPU latency 486.9ms, most steps 595, nps 818k
4096 jobs, 41490 nodes, GPU time 73.7ms, CPU latency 556.6ms, most steps 850, nps 563k
4096 jobs, 42782 nodes, GPU time 58.2ms, CPU latency 588.3ms, most steps 726, nps 734k
4096 jobs, 52909 nodes, GPU time 39.0ms, CPU latency 556.2ms, most steps 196, nps 1358k
4096 jobs, 40576 nodes, GPU time 92.5ms, CPU latency 537.6ms, most steps 456, nps 438k
4096 jobs, 61760 nodes, GPU time 73.9ms, CPU latency 549.1ms, most steps 845, nps 835k
4096 jobs, 44468 nodes, GPU time 44.3ms, CPU latency 537.0ms, most steps 443, nps 1004k
4096 jobs, 49727 nodes, GPU time 50.1ms, CPU latency 513.3ms, most steps 354, nps 992k
4096 jobs, 44698 nodes, GPU time 32.9ms, CPU latency 455.7ms, most steps 386, nps 1360k
4096 jobs, 53483 nodes, GPU time 45.1ms, CPU latency 427.3ms, most steps 500, nps 1184k
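For the curious, the per-thread loop in the kernel has roughly this shape. This is a hand-wavy sketch, not the real kernel: Job, SearchState, initSearch, and searchStep are stand-in names, with dummy definitions so the thing actually compiles and runs. The point is the grid-stride loop over 32 jobs per thread, and the per-thread step counter that produces the "most steps" number.

Code:

// Sketch of the batch kernel shape: 128 threads, 4096 jobs per batch,
// so each thread grinds through 32 searches sequentially. The host
// reports the worst thread's loop count as "most steps". All types and
// search functions below are stand-ins, not Pigeon's code.
#include <cstdio>

struct Job         { int dummy; };                 // packed position (elided)
struct Result      { int score; };
struct SearchState { int bestScore; int remaining; };

// Stand-in for setting up a depth-1 + quiescence search of one job
__device__ void initSearch(SearchState* s, const Job* job) {
    s->bestScore = 0;
    s->remaining = 1 + (job->dummy & 7);           // fake variable workload
}

// Stand-in for one pass of the iterated negamax loop (~2 per node in
// practice); returns true when this search is finished
__device__ bool searchStep(SearchState* s) {
    return --s->remaining == 0;
}

__global__ void searchBatch(const Job* jobs, Result* results,
                            int jobCount, int* mostSteps) {
    int tid     = blockIdx.x * blockDim.x + threadIdx.x;
    int threads = gridDim.x * blockDim.x;
    int steps   = 0;

    // Grid-stride loop: with 4096 jobs and 128 threads, each thread
    // handles 32 searches back to back
    for (int i = tid; i < jobCount; i += threads) {
        SearchState state;
        initSearch(&state, &jobs[i]);
        do { steps++; } while (!searchStep(&state));
        results[i].score = state.bestScore;
    }

    // One unlucky thread with a huge step count stalls its whole warp:
    // the other 31 lanes sit idle until it finishes. That worst case is
    // the "most steps" number in the trace above.
    atomicMax(mostSteps, steps);
}

int main() {
    const int N = 4096;
    Job*    jobs;      cudaMallocManaged(&jobs,      N * sizeof(Job));
    Result* results;   cudaMallocManaged(&results,   N * sizeof(Result));
    int*    mostSteps; cudaMallocManaged(&mostSteps, sizeof(int));
    for (int i = 0; i < N; i++) jobs[i].dummy = i;
    *mostSteps = 0;

    searchBatch<<<1, 128>>>(jobs, results, N, mostSteps);
    cudaDeviceSynchronize();
    printf("most steps: %d\n", *mostSteps);
    return 0;
}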
Working on it...

-Stuart
(Pigeon)
StuartRiffle
Re: Pigeon is now running on the GPU
Some progress... below is a cut-and-paste of what Nsight shows for one kernel launch:
[image: Nsight statistics for one kernel launch]

-Stuart
(Pigeon)
tttony
Re: Pigeon is now running on the GPU
Excellent!!
But I can't test it, I have an AMD card.
I remember testing ZetaDva, but Srdja has put that project on standby.
Skiull http://skiull.blogspot.com
AdminX
Re: Pigeon is now running on the GPU
Any more news on Pigeon 1.6.0?
"Good decisions come from experience, and experience comes from bad decisions."
Ted Summers