Measuring SMP "effciency"

zullil · Post by **zullil** » Sun Jul 26, 2015 8:05 pm

Here are data from Stockfish. The engine searched the position shown to depth 36 (with a 16 GB hash), first with 20 threads and then with 1 thread. (I understand that the 20 thread result is statistically meaningless; I just want an example to frame my questions.)

[d]r2q1rk1/pp1nbppb/2p1pn1p/3pN3/4P3/1P1P2P1/PBPN1PBP/R2Q1RK1 w - -

Code: Select all

Threads = 20

info depth 36 seldepth 48 multipv 1 score cp 0 nodes 5677110155 nps 26636217 hashfull 998 tbhits 0 time 213135 pv e5d7 f6d7 d1e2 f8e8 f1c1 h7g6 f2f4 d8b6 g1h1 e7c5 e4e5 a7a5 a2a4 c5d4 b2d4 b6d4 d2f3 d4b6 f3h4 g6h7 c1d1 a8c8 e2d2 c8a8 d2e2

Code: Select all

Threads =  1

info depth 36 seldepth 50 multipv 1 score cp 0 nodes 3822231208 nps 1604340 hashfull 960 tbhits 0 time 2382432 pv e5d7 f6d7 f1e1 f8e8 d1e2 h7g6 e2e3 d8a5 a2a3 a8d8 a1c1 a5a6 h2h4 e7f6 d3d4 f6e7 a3a4 a6b6 e4e5 b6a5 b2c3 a5c7 c3b2 c7a5

Code: Select all

speedup = 2382432 / 213135 = 11.2

nps speedup = 26636217 / 1604340 = 16.6

tree "bloating" = 5677110155 / 3822231208 = 1.49

Am I correct that ideally the search tree in the parallel search should be identical to the search tree in the deterministic search?

Am I correct that searching 49% more nodes in the parallel search is problematic?

If so, what might be done to lessen the "bloating" of the parallel search?

Layman's questions, but I'm still trying to understand these issues.

Joerg Oster · Post by **Joerg Oster** » Sun Jul 26, 2015 9:13 pm

zullil wrote:Here are data from Stockfish. The engine searched the position shown to depth 36 (with a 16 GB hash), first with 20 threads and then with 1 thread. (I understand that the 20 thread result is statistically meaningless; I just want an example to frame my questions.)

[d]r2q1rk1/pp1nbppb/2p1pn1p/3pN3/4P3/1P1P2P1/PBPN1PBP/R2Q1RK1 w - -
Code: Select all
Threads = 20

info depth 36 seldepth 48 multipv 1 score cp 0 nodes 5677110155 nps 26636217 hashfull 998 tbhits 0 time 213135 pv e5d7 f6d7 d1e2 f8e8 f1c1 h7g6 f2f4 d8b6 g1h1 e7c5 e4e5 a7a5 a2a4 c5d4 b2d4 b6d4 d2f3 d4b6 f3h4 g6h7 c1d1 a8c8 e2d2 c8a8 d2e2
Code: Select all
Threads =  1

info depth 36 seldepth 50 multipv 1 score cp 0 nodes 3822231208 nps 1604340 hashfull 960 tbhits 0 time 2382432 pv e5d7 f6d7 f1e1 f8e8 d1e2 h7g6 e2e3 d8a5 a2a3 a8d8 a1c1 a5a6 h2h4 e7f6 d3d4 f6e7 a3a4 a6b6 e4e5 b6a5 b2c3 a5c7 c3b2 c7a5
Code: Select all
speedup = 2382432 / 213135 = 11.2

nps speedup = 26636217 / 1604340 = 16.6

tree "bloating" = 5677110155 / 3822231208 = 1.49
Am I correct that ideally the search tree in the parallel search should be identical to the search tree in the deterministic search?

Am I correct that searching 49% more nodes in the parallel search is problematic?

If so, what might be done to lessen the "bloating" of the parallel search?

Layman's questions, but I'm still trying to understand these issues.

Yes, you are correct.
Search overhead of 49% sounds like a lot.

Ideally, you want to have a good speedup (time to depth ratio), good scaling (nps ratio), and at the same time reduce the search overhead to a minimum. Though it cannot become zero, as almost every parallel search widens the tree.

What might be done? Good question.
First, we need to know what is causing this overhead, I guess.

Edit: See also here https://chessprogramming.wikispaces.com/Parallel+Search

bob · Post by **bob** » Mon Jul 27, 2015 2:14 am

zullil wrote:Here are data from Stockfish. The engine searched the position shown to depth 36 (with a 16 GB hash), first with 20 threads and then with 1 thread. (I understand that the 20 thread result is statistically meaningless; I just want an example to frame my questions.)

[d]r2q1rk1/pp1nbppb/2p1pn1p/3pN3/4P3/1P1P2P1/PBPN1PBP/R2Q1RK1 w - -
Code: Select all
Threads = 20

info depth 36 seldepth 48 multipv 1 score cp 0 nodes 5677110155 nps 26636217 hashfull 998 tbhits 0 time 213135 pv e5d7 f6d7 d1e2 f8e8 f1c1 h7g6 f2f4 d8b6 g1h1 e7c5 e4e5 a7a5 a2a4 c5d4 b2d4 b6d4 d2f3 d4b6 f3h4 g6h7 c1d1 a8c8 e2d2 c8a8 d2e2
Code: Select all
Threads =  1

info depth 36 seldepth 50 multipv 1 score cp 0 nodes 3822231208 nps 1604340 hashfull 960 tbhits 0 time 2382432 pv e5d7 f6d7 f1e1 f8e8 d1e2 h7g6 e2e3 d8a5 a2a3 a8d8 a1c1 a5a6 h2h4 e7f6 d3d4 f6e7 a3a4 a6b6 e4e5 b6a5 b2c3 a5c7 c3b2 c7a5
Code: Select all
speedup = 2382432 / 213135 = 11.2

nps speedup = 26636217 / 1604340 = 16.6

tree "bloating" = 5677110155 / 3822231208 = 1.49
Am I correct that ideally the search tree in the parallel search should be identical to the search tree in the deterministic search?

Am I correct that searching 49% more nodes in the parallel search is problematic?

If so, what might be done to lessen the "bloating" of the parallel search?

Layman's questions, but I'm still trying to understand these issues.

99% of those extra nodes result from poor move ordering. Or if you want to be more specific, satisfying the YWBC and splitting at a node after having searched the first move, only to discover that you fail high on some other node there, while other threads are searching moves that didn't need to be searched.

There is little help here other than to improve move ordering, and that is a daunting take once you go beyond hash, good captures, killers, perhaps counter-moves (never did much for me) and such.

Measuring SMP "effciency"

Measuring SMP "effciency"

Re: Measuring SMP "effciency"

Re: Measuring SMP "effciency"