Here are data from Stockfish. The engine searched the position shown to depth 36 (with a 16 GB hash), first with 20 threads and then with 1 thread. (I understand that the 20 thread result is statistically meaningless; I just want an example to frame my questions.)
[D]r2q1rk1/pp1nbppb/2p1pn1p/3pN3/4P3/1P1P2P1/PBPN1PBP/R2Q1RK1 w - -
Am I correct that ideally the search tree in the parallel search should be identical to the search tree in the deterministic search?
Am I correct that searching 49% more nodes in the parallel search is problematic?
If so, what might be done to lessen the "bloating" of the parallel search?
Layman's questions, but I'm still trying to understand these issues.
Yes, you are correct.
Search overhead of 49% sounds like a lot.
Ideally, you want a good speedup (time-to-depth ratio) and good scaling (nps ratio), while at the same time keeping the search overhead to a minimum. The overhead cannot become zero, though, as almost every parallel search widens the tree.
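To make the three quantities concrete, here is a small sketch of how they could be computed from two runs to the same depth. The function name and the node and time figures are made up for illustration (only the 49% overhead matches the number quoted in this thread); they are not the actual Stockfish output.

```python
def parallel_search_metrics(nodes_1, time_1, nodes_n, time_n):
    """Compare a 1-thread and an N-thread search to the same depth.

    Returns (speedup, nps_scaling, overhead):
      speedup     = time-to-depth ratio
      nps_scaling = ratio of nodes per second
      overhead    = fraction of extra nodes searched by the parallel run
    """
    speedup = time_1 / time_n
    nps_scaling = (nodes_n / time_n) / (nodes_1 / time_1)
    overhead = nodes_n / nodes_1 - 1.0
    return speedup, nps_scaling, overhead


# Hypothetical figures, chosen only so that the overhead comes out at 49%.
speedup, nps_scaling, overhead = parallel_search_metrics(
    nodes_1=1_000_000_000, time_1=1000.0,
    nodes_n=1_490_000_000, time_n=100.0,
)
print(f"speedup {speedup:.2f}x, nps scaling {nps_scaling:.2f}x, overhead {overhead:.0%}")
```

With these definitions, speedup equals nps scaling divided by (1 + overhead), so every extra node searched eats directly into the time-to-depth gain.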
What might be done? Good question.
First, we need to know what is causing this overhead, I guess.
99% of those extra nodes result from poor move ordering. Or, if you want to be more specific: you satisfy the YBWC (Young Brothers Wait Concept) and split at a node after having searched the first move, only to discover that some other move there fails high while other threads are searching moves that didn't need to be searched.
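A toy model of that situation at a single node, to show where the wasted nodes come from. Everything here is an illustrative assumption, not how any particular engine does it: the function name is hypothetical, the subtree sizes are invented, the moves after the first are handed out in batches of one per thread, and a batch is assumed to run to completion (a real engine would abort the siblings once the fail high is found, so the waste would be smaller).

```python
def nodes_at_cut_node(subtree_sizes, cutoff_index, threads):
    """Toy model of one node where move number `cutoff_index` fails high.

    subtree_sizes[i] is the number of nodes below move i.
    Returns (serial_nodes, parallel_nodes).
    """
    # A serial search stops right after the move that fails high.
    serial = sum(subtree_sizes[:cutoff_index + 1])

    # The first move is always searched alone before any split.
    parallel = subtree_sizes[0]
    if cutoff_index == 0:
        return serial, parallel  # perfect ordering: no split, no waste

    # After the first move, split: hand out the rest in batches of `threads`.
    i = 1
    while i < len(subtree_sizes):
        parallel += sum(subtree_sizes[i:i + threads])
        if i <= cutoff_index < i + threads:
            break  # the fail high was inside this batch; the rest of it was wasted
        i += threads
    return serial, parallel


# Ten moves of 100 nodes each, 4 threads.
print(nodes_at_cut_node([100] * 10, cutoff_index=0, threads=4))  # (100, 100)
print(nodes_at_cut_node([100] * 10, cutoff_index=1, threads=4))  # (200, 500)
```

When the first move is the one that fails high, both searches visit the same nodes. When the cutoff move is second, the serial search stops after it, while the threads that were handed moves three to five did work that was never needed.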
There is little help here other than to improve move ordering, and that is a daunting task once you go beyond hash, good captures, killers, perhaps counter-moves (never did much for me) and such.
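For readers who have not seen that ordering scheme, here is a minimal sketch of the usual staging: hash move, then good captures, then killers, then the counter-move, then everything else. The `Move` class, the score constants and the crude "captured piece is worth at least as much as the capturing piece" test for a good capture are all assumptions made for the example; real engines typically use a static exchange evaluation for that test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    name: str
    captured_value: int = 0  # value of the captured piece, 0 for a quiet move
    attacker_value: int = 0  # value of the capturing piece


def order_moves(moves, hash_move=None, killers=(), counter_move=None):
    """Sort moves: hash move, good captures, killers, counter-move, the rest."""
    def score(m):
        if m.name == hash_move:
            return 4_000_000
        if m.captured_value > 0 and m.captured_value >= m.attacker_value:
            # Most valuable victim first, least valuable attacker as tie-break.
            return 3_000_000 + 10 * m.captured_value - m.attacker_value
        if m.name in killers:
            return 2_000_000
        if m.name == counter_move:
            return 1_000_000
        return 0  # remaining quiet moves and losing captures keep their order
    return sorted(moves, key=score, reverse=True)


moves = [
    Move("h2h3"), Move("Nf3"), Move("exd5", 100, 100),
    Move("Qxa7", 100, 900), Move("Re1"), Move("Qd2"),
]
ordered = order_moves(moves, hash_move="Qd2", killers=("Nf3",), counter_move="Re1")
print([m.name for m in ordered])
# ['Qd2', 'exd5', 'Nf3', 'Re1', 'h2h3', 'Qxa7']
```

The better this ordering puts the eventual fail-high move first, the less often a split happens at a node where the sibling moves turn out to be unnecessary.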