It can be used with any multi-threaded search algorithm without forcing you to rewrite your algorithm.
Note that Lc0 currently uses a single-threaded batching method that forces you to rewrite your search, while A0 probably uses
TensorFlow Serving with multi-threaded batching. The AlphaGo Zero paper mentions that they only need to batch 8 positions on a 20x256 or 40x256 network
to be efficient. What Lc0 does is the following: a single search thread "starts" multiple Monte Carlo simulations without actually finishing them (i.e. the NN eval at the tip is not yet complete). This is repeated until the specified mini-batch size (128 or so) is reached; the network then evaluates
all positions simultaneously, and finally the tree is updated with the results of the evaluations at the tips.
This is very involved and requires rewriting the MCTS algorithm; in fact, I got tired of trying to make this work in my code,
which has all sorts of MCTS + alpha-beta algorithms.
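To make the three phases of the single-threaded scheme concrete, here is a minimal sketch in Python. It is not Lc0's code: the Node class, the fake_network placeholder and the "fewest visits plus pending" selection rule are all assumptions chosen to keep the example short (a real search would use PUCT, node expansion and sign flipping between plies).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0
    pending: int = 0  # simulations started through here but not yet evaluated


def fake_network(batch):
    """Hypothetical placeholder for one batched NN call (one value per position)."""
    return [0.5 for _ in batch]


def start_simulation(root):
    """Descend to a leaf, mark the path as pending, and return the path."""
    path, node = [root], root
    while node.children:
        # Count in-flight simulations too, so unfinished ones spread out
        # over the tree instead of all piling onto the same leaf.
        node = min(node.children, key=lambda c: c.visits + c.pending)
        path.append(node)
    for n in path:
        n.pending += 1
    return path


def run_batch(root, batch_size):
    # 1. Start batch_size simulations without finishing any of them.
    paths = [start_simulation(root) for _ in range(batch_size)]
    # 2. Evaluate all tip positions in a single network call.
    values = fake_network([p[-1].name for p in paths])
    # 3. Finish each simulation by backing its value up the stored path.
    for path, value in zip(paths, values):
        for n in path:
            n.pending -= 1
            n.visits += 1
            n.value_sum += value


root = Node("root", [Node("e2e4"), Node("d2d4"), Node("g1f3"), Node("c2c4")])
run_batch(root, batch_size=8)
print(root.visits, [c.visits for c in root.children])  # 8 [2, 2, 2, 2]
```

Even in this toy form the search has to be split into "start", "evaluate" and "finish" steps with extra bookkeeping for in-flight simulations, which is the part that is painful to retrofit into an existing search.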
On the other hand, multi-threaded batching doesn't require me to rewrite algorithms and can be used even with alpha-beta,
as well as being more efficient at collecting batches for evaluation. Each searcher thread requests evaluation of a single
position (eval()) and then blocks until it gets the result. The server (which in my case is egbbdll) batches requests from multiple
threads (one from each thread), does the batch evaluation, and returns the result to each thread. The thread blocking is
actually done by egbbdll, so the chess playing program doesn't have to do anything special.
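The blocking scheme described above can be sketched with a condition variable. This is only an illustration in Python, not egbbdll's implementation: BatchingEvaluator, evaluate() and fake_network are hypothetical names, and the sketch assumes the batch size equals the number of searcher threads (one request from each thread), so it never has to flush a partial batch.

```python
import threading


class BatchingEvaluator:
    """Collects one request per searcher thread and evaluates them together."""

    def __init__(self, batch_size, network):
        self.batch_size = batch_size
        self.network = network      # callable: list of positions -> list of values
        self.cond = threading.Condition()
        self.requests = []          # positions waiting in the current batch
        self.results = {}           # batch id -> list of values
        self.batch_id = 0
        self.batch_sizes = []       # size of every batch that was evaluated

    def evaluate(self, position):
        with self.cond:
            my_batch = self.batch_id
            my_slot = len(self.requests)
            self.requests.append(position)
            if len(self.requests) == self.batch_size:
                # The last thread to arrive runs the network for everyone.
                batch, self.requests = self.requests, []
                self.results[my_batch] = self.network(batch)
                self.batch_sizes.append(len(batch))
                self.batch_id += 1
                self.cond.notify_all()
            else:
                # Block until the batch this request belongs to is evaluated.
                self.cond.wait_for(lambda: my_batch in self.results)
            return self.results[my_batch][my_slot]


def fake_network(batch):
    # Hypothetical placeholder for the batched NN call: scores by string length.
    return [float(len(p)) for p in batch]


evaluator = BatchingEvaluator(batch_size=4, network=fake_network)
out = {}


def searcher(tid):
    # The searcher just calls evaluate() and blocks, as in an ordinary search.
    out[tid] = [evaluator.evaluate("p" * (tid + 1)) for _ in range(3)]


threads = [threading.Thread(target=searcher, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(evaluator.batch_sizes, out[2])  # [4, 4, 4] [3.0, 3.0, 3.0]
```

The searcher function contains nothing batch-specific, which is the point: the same call would work from an alpha-beta search. A real server would also need some way to flush a batch when fewer threads are active than the batch size; the sketch leaves that out.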
It turns out I can launch significantly more threads than the number of available cores and be almost as efficient as when
each searcher thread has its own core. For example, on a network of size 12x128, a search done on 4 cores oversubscribed with
128 threads gives roughly the same nps as using 32 cores with 128 threads. The GPU is a Tesla P100, and the CPU is a 32-core Intel Xeon (two 16-core CPUs in two sockets).
Results:
Playouts per second on the CPU (batching doesn't help here)
1-thread = 33
n-threads = <33
Playouts per second on the GPU (with multi-threaded batching)
1-thread = 226
128-threads using all 64 cores = 2872
128-threads using just 4 cores = 2575
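As a quick check on what these numbers imply, the ratios can be worked out directly. The snippet below only restates the pps figures listed above; the variable names are mine.

```python
# Playouts per second (pps) copied from the summary above.
pps_1_thread = 226
pps_128_threads_all_cores = 2872
pps_128_threads_4_cores = 2575

# Gain from batching 128 searcher threads instead of evaluating one position at a time.
batching_gain = pps_128_threads_all_cores / pps_1_thread
four_core_gain = pps_128_threads_4_cores / pps_1_thread
# How much throughput survives when the 128 threads share only 4 cores.
four_core_fraction = pps_128_threads_4_cores / pps_128_threads_all_cores

print(f"128 threads vs 1 thread, all cores: {batching_gain:.1f}x")       # 12.7x
print(f"128 threads vs 1 thread, 4 cores:   {four_core_gain:.1f}x")      # 11.4x
print(f"4 cores as a fraction of all cores: {four_core_fraction:.0%}")   # 90%
```

So restricting the 128 threads to 4 cores keeps about 90% of the playout rate, since most threads spend their time blocked waiting for the batch to come back.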
Detailed results
============
1-core CPU
$ ./scorpio use_nn 1 mt 1 montecarlo 1 frac_alphabeta 0 backup_type 4 book off go quit
feature done=0
ht 4194304 X 16 = 64.0 MB
eht 524288 X 8 = 8.0 MB
pht 32768 X 24 = 0.8 MB
treeht 419430400 X 32 = 12800.0 MB
processors [1]
processors [1]
EgbbProbe 4.1 by Daniel Shawul
180 egbbs loaded !
Loading neural network....
Neural network loaded !
loading_time = 1s
[st = 11114ms, mt = 29250ms , hply = 0 , moves_left 10]
63 0 111 1071 e2-e4 e7-e5
64 0 225 2707 e2-e4 e7-e5
65 0 339 4479 d2-d4 d7-d5 Nb1-c3
66 0 452 6164 d2-d4 d7-d5 Nb1-c3 Ng8-f6
67 0 565 8093 d2-d4 d7-d5 Nb1-c3 Ng8-f6
68 0 677 11687 e2-e4 e7-e5 Ng1-f3 Nb8-c6
69 0 788 16331 d2-d4 d7-d5 Nb1-c3 Ng8-f6 e2-e3
70 0 899 20810 e2-e4 e7-e5 Ng1-f3 Ng8-f6 d2-d3
71 0 1012 26998 e2-e4 e7-e5 Ng1-f3 Ng8-f6 Nb1-c3 Nb8-c6
# 1 0 62 e2-e4 e7-e5 d2-d4 d7-d5 Ng1-f3 d5xe4 Nf3xe5 Ng8-f6 Nb1-c3
# 2 0 62 d2-d4 d7-d5 Nb1-c3 Ng8-f6 Ng1-f3 Nb8-c6 h2-h3 e7-e6 e2-e3
# 3 0 24 Nb1-c3 d7-d5 e2-e4 d5xe4 Nc3xe4 e7-e5 Ng1-f3 Nb8-c6
# 4 0 32 Ng1-f3 d7-d5 d2-d3 e7-e6 e2-e4 Nb8-c6 Nb1-c3
# 5 0 23 e2-e3 d7-d5 Ng1-f3 Ng8-f6 Nb1-c3 e7-e6 d2-d4 Nb8-c6 h2-h3
# 6 0 28 d2-d3 e7-e5 Ng1-f3 Nb8-c6 e2-e4 Ng8-f6 Nb1-c3 d7-d5 Nc3xd5
# 7 0 23 g2-g3 e7-e5 e2-e4 d7-d5 e4xd5 Qd8xd5 f2-f3 Nb8-c6 Nb1-c3 Qd5-d4 d2-d3
# 8 -3 12 b2-b3 d7-d5 Ng1-f3 Ng8-f6 d2-d3
# 9 -5 10 f2-f3 d7-d5 d2-d4 Ng8-f6
# 10 -9 8 h2-h3 d7-d5 d2-d4 Ng8-f6
# 11 -7 9 c2-c3 d7-d5 d2-d4
# 12 -11 8 a2-a3 d7-d5 d2-d4
# 13 -11 8 Nb1-a3 d7-d5 d2-d4 e7-e6
# 14 -6 10 Ng1-h3 e7-e5 e2-e4 Ng8-f6
# 15 -11 8 f2-f4 Ng8-f6 Ng1-f3 d7-d5
# 16 0 37 c2-c4 e7-e5 d2-d3 Ng8-f6 Ng1-f3 Nb8-c6 Nb1-c3 d7-d6 e2-e4 Bc8-e6 Bc1-e3
# 17 -15 7 h2-h4 d7-d5 d2-d4
# 18 -13 7 a2-a4 d7-d5 d2-d4
# 19 -15 7 g2-g4 d7-d5 e2-e3
# 20 -22 6 b2-b4 e7-e5 Bc1-b2
nodes = 37778 <95% qnodes> time = 11149ms nps = 3388 eps = 2865 nneps = 30
Tree: nodes = 10070 depth = 10 pps = 33 visits = 372
qsearch_calls = 9702 search_calls = 0
move e2e4
Bye Bye
1-thread GPU
$ ./scorpio use_nn 1 mt 1 montecarlo 1 frac_alphabeta 0 backup_type 4 go quit
feature done=0
ht 4194304 X 16 = 64.0 MB
eht 524288 X 8 = 8.0 MB
pht 32768 X 24 = 0.8 MB
treeht 419430400 X 32 = 12800.0 MB
processors [1]
processors [1]
EgbbProbe 4.1 by Daniel Shawul
0 egbbs loaded !
Loading neural network....
Neural network loaded !
loading_time = 6s
[st = 11114ms, mt = 29250ms , hply = 0 , moves_left 10]
63 0 111 8032 d2-d4 d7-d5 e2-e3
64 0 222 16376 d2-d4 d7-d5 Nb1-c3 Ng8-f6
65 0 333 28681 d2-d4 d7-d5 Nb1-c3 Ng8-f6
66 0 445 41847 e2-e4 e7-e5 Ng1-f3 Nb8-c6
67 0 557 55091 d2-d4 d7-d5 Nb1-c3 Ng8-f6
68 0 668 74011 d2-d4 d7-d5 Nb1-c3 Ng8-f6 e2-e3
69 0 779 95658 e2-e4 e7-e5 Ng1-f3 Ng8-f6 Nb1-c3
70 0 891 123039 e2-e4 e7-e5 Ng1-f3 Ng8-f6 Nb1-c3
71 0 1002 166551 e2-e4 e7-e5 Ng1-f3 Ng8-f6 d2-d3
# 1 0 127 e2-e4 e7-e5 d2-d4 d7-d5 Ng1-f3 d5xe4 Nf3xe5 Ng8-f6 Nb1-c3
# 2 0 127 d2-d4 d7-d5 Nb1-c3 Ng8-f6 Ng1-f3 Nb8-c6 h2-h3 e7-e6 e2-e3 h7-h6 a2-a3
# 3 0 127 Nb1-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3 Nb8-c6 h2-h3 e7-e6 e2-e3 h7-h6
# 4 0 127 Ng1-f3 d7-d5 d2-d3 e7-e6 Nb1-c3 Nb8-c6 e2-e4 Ng8-f6 h2-h3 Bf8-d6
# 5 0 127 e2-e3 d7-d5 Ng1-f3 Ng8-f6 Nb1-c3 e7-e6 d2-d4 Nb8-c6 h2-h3 h7-h6 a2-a3 a7-a6 Bf1-d3
# 6 0 127 d2-d3 e7-e5 Ng1-f3 Nb8-c6 e2-e4 Ng8-f6 Nb1-c3 d7-d5
# 7 0 127 g2-g3 e7-e5 Ng1-f3 Nb8-c6 d2-d3 Ng8-f6 Bf1-g2 d7-d5 Ke1-g1 Bc8-e6 Bc1-d2 h7-h6 Nb1-c3 a7-a6 e2-e4 d5-d4 Nc3-e2 Bf8-d6
# 8 0 127 b2-b3 d7-d5 d2-d4 Ng8-f6 Ng1-f3 Nb8-c6 e2-e3 e7-e6 Bc1-b2
# 9 0 127 f2-f3 e7-e5 e2-e4 Nb8-c6 Nb1-c3 Ng8-f6 a2-a3 d7-d5 Nc3xd5
# 10 0 127 h2-h3 e7-e5 Ng1-f3 Nb8-c6 e2-e4 d7-d5 e4xd5 Qd8xd5 Nb1-c3 Qd5-d8 Bf1-d3
# 11 0 127 c2-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3 Nb8-c6 h2-h3 Bc8-f5 e2-e3 e7-e6 Nb1-d2
# 12 0 127 a2-a3 e7-e5 e2-e4 d7-d6 Nb1-c3 Ng8-f6 d2-d4 e5xd4 Qd1xd4 Nb8-c6 Qd4-d1
# 13 0 126 Nb1-a3 e7-e5 e2-e4 d7-d5 e4xd5 Qd8xd5 d2-d3 Bf8xa3 b2xa3 Nb8-c6 Bc1-e3
# 14 0 126 Ng1-h3 e7-e5 e2-e4 Ng8-f6 Nb1-c3 d7-d5 e4xd5 Bc8xh3 g2xh3 Nf6xd5 d2-d4 Nd5xc3 b2xc3
# 15 0 126 f2-f4 Ng8-f6 Nb1-c3 d7-d5 e2-e3 Nb8-c6 d2-d4 a7-a6 a2-a3 e7-e6 Ng1-f3 h7-h6 Bf1-d3 Bf8-d6 Ke1-g1 Ke8-g8 Bc1-d2
# 16 0 126 c2-c4 e7-e5 Ng1-f3 e5-e4 Nf3-e5 d7-d6 Qd1-a4 Nb8-d7 Ne5-g4
# 17 0 126 h2-h4 e7-e5 Ng1-f3 Bf8-d6 d2-d3 Nb8-c6 e2-e4 Ng8-f6 Nb1-c3 Ke8-g8
# 18 0 126 a2-a4 e7-e5 e2-e4 Ng8-f6 d2-d3 d7-d5 e4xd5 Nf6xd5 Ng1-f3 Bf8-b4 Bc1-d2 f7-f6
# 19 0 126 g2-g4 d7-d5 d2-d4 Nb8-c6 e2-e3 e7-e5 Nb1-c3 Ng8-f6 d4xe5 Nf6xg4 Qd1xd5 Nc6xe5 Qd5-d4 Qd8-f6 Qd4-f4
# 20 0 126 b2-b4 e7-e5 Bc1-b2 Ng8-f6 e2-e3 d7-d5 Bb2xe5 Bf8xb4 Ng1-f3 Nb8-d7 Bf1-b5 a7-a6 a2-a3 a6xb5 a3xb4 Ra8-a4 Nb1-c3 Ra4xa1 Qd1xa1
nodes = 231367 <94% qnodes> time = 11118ms nps = 20810 eps = 17422 nneps = 191
Tree: nodes = 68047 depth = 18 pps = 226 visits = 2513
qsearch_calls = 65549 search_calls = 0
move e2e4
Bye Bye
128-threads GPU, all cores
$ ./scorpio use_nn 1 mt 128 montecarlo 1 frac_alphabeta 0 backup_type 4 go quit
feature done=0
ht 4194304 X 16 = 64.0 MB
eht 524288 X 8 = 8.0 MB
pht 32768 X 24 = 0.8 MB
treeht 419430400 X 32 = 12800.0 MB
processors [1]
processors [128]
EgbbProbe 4.1 by Daniel Shawul
0 egbbs loaded !
Loading neural network....
Neural network loaded !
loading_time = 7s
[st = 11114ms, mt = 29250ms , hply = 0 , moves_left 10]
63 0 148 115 e2-e4 e7-e5 d2-d3
64 0 261 1583 e2-e4 Ng8-f6 e4-e5 Nf6-e4 Nb1-c3
65 0 374 2702 e2-e4 d7-d5 d2-d3 Ng8-f6 Nb1-c3 d5xe4 d3xe4 Qd8xd1 Ke1xd1
66 0 489 4878 e2-e4 e7-e5 Ng1-f3 Ng8-f6 Nb1-c3
67 0 605 7938 e2-e4 e7-e5 d2-d3 Nb8-c6 Ng1-f3
68 0 719 10648 e2-e4 e7-e5 d2-d3 Nb8-c6 Ng1-f3
69 0 831 13836 e2-e4 e7-e5 d2-d4 d7-d5 Ng1-f3 d5xe4 Nf3xe5 Ng8-f6
70 0 945 18123 e2-e4 e7-e5 d2-d4 d7-d5 Ng1-f3 d5xe4 Nf3xe5 Ng8-f6 Nb1-c3
71 0 1061 22783 e2-e4 e7-e5 d2-d4 d7-d5 Ng1-f3 d5xe4 Nf3xe5 Ng8-f6 Bf1-c4 Bf8-b4 c2-c3 Bb4-d6 Bc4xf7 Ke8-e7
# 1 0 1626 e2-e4 e7-e5 d2-d4 d7-d5 Ng1-f3 d5xe4 Nf3xe5 Ng8-f6 Bf1-c4 Bf8-b4 c2-c3 Bb4-d6 Ne5xf7
# 2 0 1588 d2-d4 d7-d5 Nb1-c3 Ng8-f6 Ng1-f3 Nb8-c6 h2-h3 e7-e6 e2-e3 h7-h6 a2-a3 a7-a6
# 3 0 1655 Nb1-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3 Nb8-c6 h2-h3 e7-e6 e2-e3 h7-h6 a2-a3 a7-a6 Bf1-d3 Bf8-d6 Ke1-g1
# 4 0 1577 Ng1-f3 d7-d5 d2-d4 Ng8-f6 c2-c3 Nb8-c6 h2-h3 Bc8-f5 e2-e3 e7-e6 Nb1-d2
# 5 0 1610 e2-e3 d7-d5 Ng1-f3 Nb8-c6 Nb1-c3 e7-e5 d2-d4 e5-e4 Nf3-d2 Ng8-f6 Bf1-e2
# 6 0 1606 d2-d3 e7-e5 e2-e4 Nb8-c6 Bc1-e3 d7-d6 Nb1-c3 Ng8-f6 Ng1-f3 Bc8-e6 a2-a3 h7-h6 h2-h3
# 7 0 1645 g2-g3 d7-d5 d2-d4 Nb8-c6 Ng1-f3 Ng8-f6 Bf1-g2 h7-h6 Nb1-c3 e7-e6 Ke1-g1 a7-a6 a2-a3 Bf8-d6
# 8 0 1621 b2-b3 e7-e5 e2-e4 d7-d6 Bc1-b2 Ng8-f6 Nb1-c3 Nb8-c6 Ng1-f3
# 9 0 1645 f2-f3 e7-e5 e2-e4 Bf8-c5 d2-d3 d7-d5
# 10 0 1645 h2-h3 e7-e5 e2-e4 d7-d5 d2-d4 e5xd4 e4xd5 Bf8-b4 Nb1-d2 Qd8xd5 Ng1-e2 Qd5-e4
# 11 0 1631 c2-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3 Nb8-c6 h2-h3 e7-e6
# 12 0 1620 a2-a3 e7-e5 e2-e4 d7-d6 Nb1-c3 Ng8-f6 Ng1-f3 Nb8-c6
# 13 0 1607 Nb1-a3 e7-e5 e2-e4 d7-d5 e4xd5 Qd8xd5 d2-d3 Bf8-b4 c2-c3
# 14 0 1566 Ng1-h3 d7-d5 d2-d4 Nb8-c6 Nh3-g5 Ng8-f6 Nb1-c3 e7-e6
# 15 0 1619 f2-f4 d7-d5 d2-d3 g7-g6 e2-e4
# 16 0 1634 c2-c4 e7-e5 d2-d3 Ng8-f6 Nb1-c3 d7-d6 Ng1-f3 Nb8-c6
# 17 0 1544 h2-h4 e7-e5 e2-e4 Ng8-f6 Ng1-f3 d7-d6
# 18 0 1631 a2-a4 e7-e5 e2-e4 d7-d5 e4xd5 Qd8xd5 Nb1-c3 Qd5-d8
# 19 0 1561 g2-g4 e7-e5 e2-e4 d7-d5 e4xd5 Qd8xd5 f2-f3 Bc8xg4 Nb1-c3
# 20 0 1635 b2-b4 d7-d5 d2-d3 Ng8-f6 Ng1-f3 e7-e5 b4-b5 Nb8-d7 Bc1-b2 Bf8-b4 Nb1-c3 Ke8-g8 h2-h3 c7-c6 b5xc6
nodes = 10380412 <29% qnodes> time = 11226ms nps = 924675 eps = 234964 nneps = 2857
Tree: nodes = 884071 depth = 21 pps = 2872 visits = 32247
qsearch_calls = 6731 search_calls = 0
move e2e4
Bye Bye
128-threads GPU, restricted to 4 cores with taskset
$ taskset f0000000 ./scorpio use_nn 1 mt 128 montecarlo 1 frac_alphabeta 0 backup_type 4 go quit
feature done=0
ht 4194304 X 16 = 64.0 MB
eht 524288 X 8 = 8.0 MB
pht 32768 X 24 = 0.8 MB
treeht 419430400 X 32 = 12800.0 MB
processors [1]
processors [128]
EgbbProbe 4.1 by Daniel Shawul
0 egbbs loaded !
Loading neural network....
Neural network loaded !
loading_time = 6s
[st = 11114ms, mt = 29250ms , hply = 0 , moves_left 10]
63 0 112 96 Nb1-c3 d7-d5 Ng1-f3
64 0 226 1150 e2-e4 e7-e5 Ng1-f3 Nb8-c6
65 0 344 2541 e2-e4 d7-d5 d2-d3 Ng8-f6 Nb1-c3
66 0 466 5004 e2-e4 d7-d5 d2-d3 Ng8-f6 Nb1-c3 d5xe4 d3xe4 Qd8xd1 Ke1xd1
67 0 580 6356 e2-e4 e7-e5 d2-d3 Nb8-c6 Ng1-f3
68 0 693 8331 e2-e4 e7-e5 d2-d3 Nb8-c6 Ng1-f3
69 0 808 10945 e2-e4 e7-e5 d2-d3 Nb8-c6 Ng1-f3 Ng8-f6
70 0 920 14370 e2-e4 e7-e5 d2-d4 d7-d5 Ng1-f3 d5xe4 Nf3xe5
71 0 1031 18653 e2-e4 e7-e5 d2-d4 e5xd4 Qd1xd4 Nb8-c6 Qd4-d1 d7-d6 Ng1-f3 Ng8-f6 Nb1-c3
# 1 0 1371 e2-e4 e7-e5 d2-d4 e5xd4 Qd1xd4 Nb8-c6 Qd4-d1 Bf8-d6 Ng1-f3 Ng8-f6 Nb1-c3 Ke8-g8 Bc1-e3 a7-a6 a2-a3 Rf8-e8 Bf1-d3 b7-b6
# 2 0 1091 d2-d4 d7-d5 Nb1-c3 Ng8-f6 Ng1-f3 Nb8-c6 h2-h3 e7-e6 e2-e3 h7-h6 a2-a3 a7-a6 Bf1-d3 Bf8-d6
# 3 0 968 Nb1-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3 Nb8-c6 h2-h3
# 4 0 1700 Ng1-f3 d7-d5 d2-d4 Ng8-f6 h2-h3 Nb8-c6 Nb1-c3 e7-e6 e2-e3 h7-h6 a2-a3 a7-a6 Bf1-d3 Bf8-d6 Ke1-g1 Ke8-g8 Bc1-d2 Bc8-d7 e3-e4 Nf6xe4
# 5 0 1595 e2-e3 d7-d5 d2-d4 Ng8-f6 Bf1-d3 Nb8-c6 Ng1-f3 e7-e6
# 6 0 1085 d2-d3 e7-e5 e2-e4 Nb8-c6 Bc1-e3 d7-d6 Nb1-c3 Ng8-f6 Ng1-f3 Bc8-e6
# 7 0 1684 g2-g3 d7-d5 d2-d4 Nb8-c6 Ng1-f3 Ng8-f6 Bf1-g2 h7-h6 Nb1-c3 e7-e6 Ke1-g1 a7-a6
# 8 0 1970 b2-b3 e7-e5 e2-e4 d7-d6 Bc1-b2 Ng8-f6 Nb1-c3 Nb8-c6 Ng1-f3 Bc8-e6 d2-d4 Nc6xd4
# 9 0 1364 f2-f3 e7-e5 e2-e4 Bf8-c5 d2-d3 d7-d5
# 10 0 1678 h2-h3 e7-e5 e2-e4 d7-d5 e4xd5 Qd8xd5 Nb1-c3
# 11 0 1692 c2-c3 d7-d5 d2-d4 Ng8-f6 Ng1-f3 Nb8-c6 h2-h3
# 12 0 1600 a2-a3 e7-e5 e2-e4 d7-d6 Nb1-c3 Ng8-f6 Ng1-f3 Nb8-c6 d2-d4
# 13 0 1932 Nb1-a3 e7-e5 e2-e4 d7-d5 e4xd5 Qd8xd5 d2-d3 Ng8-f6 Bc1-d2 Bf8xa3 b2xa3 Ke8-g8 Ng1-e2 Bc8-g4 f2-f3
# 14 0 1478 Ng1-h3 d7-d5 d2-d4 Nb8-c6 Nb1-c3 e7-e6 e2-e3 a7-a6 Nh3-f4
# 15 0 1745 f2-f4 d7-d5 d2-d3 Ng8-f6 Ng1-f3 Nb8-c6
# 16 0 1383 c2-c4 e7-e5 Ng1-f3 e5-e4 Nf3-e5 d7-d6 Qd1-a4 Nb8-d7 Ne5-g4 h7-h5
# 17 0 1323 h2-h4 e7-e5 e2-e4 Ng8-f6 Ng1-f3 Nb8-c6 Nb1-c3
# 18 0 1352 a2-a4 e7-e5 e2-e4 d7-d5 d2-d4 d5xe4 d4xe5
# 19 0 1839 g2-g4 d7-d5 d2-d4 Nb8-c6 e2-e3 e7-e5 Nb1-c3 Ng8-f6 d4xe5 Nf6xg4 Qd1xd5 Nc6xe5 e3-e4 Qd8-f6 Qd5-d4 c7-c6
# 20 0 1356 b2-b4 e7-e5 Bc1-b2 e5-e4 e2-e3 d7-d5 Ng1-e2 Ng8-f6 b4-b5 Bf8-d6 d2-d4 Ke8-g8 Nb1-c3 c7-c6 f2-f3 Rf8-e8 f3xe4 d5xe4
nodes = 8396172 <35% qnodes> time = 11722ms nps = 716274 eps = 218633 nneps = 2538
Tree: nodes = 821310 depth = 22 pps = 2575 visits = 30187
qsearch_calls = 6038 search_calls = 0
move e2e4
Bye Bye
I expected that they would be the same. If I use the former, I don't get any benefit from batching. Does anybody
know the exact difference between sched_yield() and usleep(0), or on Windows between Sleep(0) and SwitchToThread()?
regards,
Daniel