Measuring SMP "effciency"

Discussion of chess software programming and technical issues.

Moderators: hgm, Dann Corbit, Harvey Williamson

zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Measuring SMP "effciency"

Post by zullil »

Here are data from Stockfish. The engine searched the position shown to depth 36 (with a 16 GB hash), first with 20 threads and then with 1 thread. (I understand that the 20 thread result is statistically meaningless; I just want an example to frame my questions.)

[D]r2q1rk1/pp1nbppb/2p1pn1p/3pN3/4P3/1P1P2P1/PBPN1PBP/R2Q1RK1 w - -

Code: Select all

Threads = 20

info depth 36 seldepth 48 multipv 1 score cp 0 nodes 5677110155 nps 26636217 hashfull 998 tbhits 0 time 213135 pv e5d7 f6d7 d1e2 f8e8 f1c1 h7g6 f2f4 d8b6 g1h1 e7c5 e4e5 a7a5 a2a4 c5d4 b2d4 b6d4 d2f3 d4b6 f3h4 g6h7 c1d1 a8c8 e2d2 c8a8 d2e2

Code: Select all

Threads =  1

info depth 36 seldepth 50 multipv 1 score cp 0 nodes 3822231208 nps 1604340 hashfull 960 tbhits 0 time 2382432 pv e5d7 f6d7 f1e1 f8e8 d1e2 h7g6 e2e3 d8a5 a2a3 a8d8 a1c1 a5a6 h2h4 e7f6 d3d4 f6e7 a3a4 a6b6 e4e5 b6a5 b2c3 a5c7 c3b2 c7a5

Code: Select all

speedup = 2382432 / 213135 = 11.2

nps speedup = 26636217 / 1604340 = 16.6

tree "bloating" = 5677110155 / 3822231208 = 1.49
Am I correct that ideally the search tree in the parallel search should be identical to the search tree in the deterministic search?

Am I correct that searching 49% more nodes in the parallel search is problematic?

If so, what might be done to lessen the "bloating" of the parallel search?

Layman's questions, but I'm still trying to understand these issues.
Joerg Oster
Posts: 931
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: Measuring SMP "effciency"

Post by Joerg Oster »

zullil wrote:Here are data from Stockfish. The engine searched the position shown to depth 36 (with a 16 GB hash), first with 20 threads and then with 1 thread. (I understand that the 20 thread result is statistically meaningless; I just want an example to frame my questions.)

[D]r2q1rk1/pp1nbppb/2p1pn1p/3pN3/4P3/1P1P2P1/PBPN1PBP/R2Q1RK1 w - -

Code: Select all

Threads = 20

info depth 36 seldepth 48 multipv 1 score cp 0 nodes 5677110155 nps 26636217 hashfull 998 tbhits 0 time 213135 pv e5d7 f6d7 d1e2 f8e8 f1c1 h7g6 f2f4 d8b6 g1h1 e7c5 e4e5 a7a5 a2a4 c5d4 b2d4 b6d4 d2f3 d4b6 f3h4 g6h7 c1d1 a8c8 e2d2 c8a8 d2e2

Code: Select all

Threads =  1

info depth 36 seldepth 50 multipv 1 score cp 0 nodes 3822231208 nps 1604340 hashfull 960 tbhits 0 time 2382432 pv e5d7 f6d7 f1e1 f8e8 d1e2 h7g6 e2e3 d8a5 a2a3 a8d8 a1c1 a5a6 h2h4 e7f6 d3d4 f6e7 a3a4 a6b6 e4e5 b6a5 b2c3 a5c7 c3b2 c7a5

Code: Select all

speedup = 2382432 / 213135 = 11.2

nps speedup = 26636217 / 1604340 = 16.6

tree "bloating" = 5677110155 / 3822231208 = 1.49
Am I correct that ideally the search tree in the parallel search should be identical to the search tree in the deterministic search?

Am I correct that searching 49% more nodes in the parallel search is problematic?

If so, what might be done to lessen the "bloating" of the parallel search?

Layman's questions, but I'm still trying to understand these issues.
Yes, you are correct.
Search overhead of 49% sounds like a lot.

Ideally, you want to have a good speedup (time to depth ratio), good scaling (nps ratio), and at the same time reduce the search overhead to a minimum. Though it cannot become zero, as almost every parallel search widens the tree.

What might be done? Good question.
First, we need to know what is causing this overhead, I guess. :lol:

Edit: See also here https://chessprogramming.wikispaces.com/Parallel+Search
Jörg Oster
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Measuring SMP "effciency"

Post by bob »

zullil wrote:Here are data from Stockfish. The engine searched the position shown to depth 36 (with a 16 GB hash), first with 20 threads and then with 1 thread. (I understand that the 20 thread result is statistically meaningless; I just want an example to frame my questions.)

[D]r2q1rk1/pp1nbppb/2p1pn1p/3pN3/4P3/1P1P2P1/PBPN1PBP/R2Q1RK1 w - -

Code: Select all

Threads = 20

info depth 36 seldepth 48 multipv 1 score cp 0 nodes 5677110155 nps 26636217 hashfull 998 tbhits 0 time 213135 pv e5d7 f6d7 d1e2 f8e8 f1c1 h7g6 f2f4 d8b6 g1h1 e7c5 e4e5 a7a5 a2a4 c5d4 b2d4 b6d4 d2f3 d4b6 f3h4 g6h7 c1d1 a8c8 e2d2 c8a8 d2e2

Code: Select all

Threads =  1

info depth 36 seldepth 50 multipv 1 score cp 0 nodes 3822231208 nps 1604340 hashfull 960 tbhits 0 time 2382432 pv e5d7 f6d7 f1e1 f8e8 d1e2 h7g6 e2e3 d8a5 a2a3 a8d8 a1c1 a5a6 h2h4 e7f6 d3d4 f6e7 a3a4 a6b6 e4e5 b6a5 b2c3 a5c7 c3b2 c7a5

Code: Select all

speedup = 2382432 / 213135 = 11.2

nps speedup = 26636217 / 1604340 = 16.6

tree "bloating" = 5677110155 / 3822231208 = 1.49
Am I correct that ideally the search tree in the parallel search should be identical to the search tree in the deterministic search?

Am I correct that searching 49% more nodes in the parallel search is problematic?

If so, what might be done to lessen the "bloating" of the parallel search?

Layman's questions, but I'm still trying to understand these issues.
99% of those extra nodes result from poor move ordering. Or if you want to be more specific, satisfying the YWBC and splitting at a node after having searched the first move, only to discover that you fail high on some other node there, while other threads are searching moves that didn't need to be searched.

There is little help here other than to improve move ordering, and that is a daunting take once you go beyond hash, good captures, killers, perhaps counter-moves (never did much for me) and such.