Stockfish and serious hardware: 384 threads



Jouni

Stockfish and serious hardware: 384 threads

Post by Jouni »

In the SF forum they have run some tests with 384 threads. The machine is 8x Intel Xeon Platinum 8168, reaching 431,403,814 nodes/s. Way to go! One test, 384 vs 64 threads, gave Elo: +93.95 +-11.9 (95%) LOS: 100.0%. Hyperthreading also seems to be beneficial for SF.
Jouni
mjlef

Re: Stockfish and serious hardware: 384 threads

Post by mjlef »

Jouni wrote: Sun Jul 08, 2018 8:48 pm In the SF forum they have run some tests with 384 threads. The machine is 8x Intel Xeon Platinum 8168, reaching 431,403,814 nodes/s. Way to go! One test, 384 vs 64 threads, gave Elo: +93.95 +-11.9 (95%) LOS: 100.0%. Hyperthreading also seems to be beneficial for SF.
Wow! I'm quite surprised. Thread-doubling experiments on an earlier Stockfish showed just a 6 Elo gain between 16 and 32 cores. Since going from 64 to 384 threads is between 2 and 3 doublings, I would have expected a much smaller gain. There is something unexpected to learn here.
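The arithmetic behind that expectation can be sketched in a few lines of Python. This is only a back-of-the-envelope check: it assumes, as the naive extrapolation does, that every doubling of threads is worth a fixed number of Elo, and it uses only the figures quoted in the thread (6 Elo for the older 16 vs 32 core test, +93.95 Elo for 384 vs 64 threads).

```python
import math

def doublings(threads_from: int, threads_to: int) -> float:
    """Number of thread-count doublings between two configurations."""
    return math.log2(threads_to / threads_from)

# Figures quoted in the thread.
d = doublings(64, 384)            # log2(6), about 2.58 doublings
observed_gain = 93.95             # Elo, 384 vs 64 threads
old_gain_per_doubling = 6.0       # Elo, 16 -> 32 cores on an earlier Stockfish

# Naive extrapolation: assumes every doubling is worth the same fixed amount.
expected_gain = d * old_gain_per_doubling
implied_per_doubling = observed_gain / d

print(f"doublings:            {d:.2f}")
print(f"expected gain:        {expected_gain:.1f} Elo")
print(f"implied per doubling: {implied_per_doubling:.1f} Elo")
```

Under that assumption the older figure predicts only about 15.5 Elo for going from 64 to 384 threads, while the new result works out to roughly 36 Elo per doubling.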
Jouni

Re: Stockfish and serious hardware: 384 threads

Post by Jouni »

But I find this hard to believe. How can they test this:

scaling result going from 192 threads to 384 hyperthreads
ELO: 22.27 +-9.7 (95%) LOS: 100.0%
Total: 1000 W: 134 L: 70 D: 796

when the Google server they used ALWAYS has hyperthreading enabled?
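The posted result line can be reproduced from the W/L/D counts with the usual logistic Elo model. The sketch below assumes the error margin is computed the way tools such as cutechess-cli appear to do it (a normal approximation on the mean score, converted to Elo at both ends of the interval) and uses the common draw-free formula for LOS. The function names are hypothetical.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score, using the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_stats(wins: int, losses: int, draws: int, z: float = 1.96):
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + 0.5 * d                          # mean score per game
    # Per-game score variance: possible outcomes are 1, 0 and 0.5.
    var = w * (1 - mu) ** 2 + l * (0 - mu) ** 2 + d * (0.5 - mu) ** 2
    dev = z * math.sqrt(var / n)              # 95% half-width on the mean score
    elo = elo_from_score(mu)
    margin = (elo_from_score(mu + dev) - elo_from_score(mu - dev)) / 2
    # Likelihood of superiority from decisive games only (draws ignored).
    los = 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * (wins + losses))))
    return elo, margin, los

elo, margin, los = match_stats(134, 70, 796)
print(f"ELO: {elo:.2f} +-{margin:.1f} (95%) LOS: {los:.1%}")
# prints: ELO: 22.27 +-9.7 (95%) LOS: 100.0%
```

The 796 draws out of 1000 games are what keep the error bar this narrow: they pull the per-game variance well below that of a decisive-only match.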
Jouni
noobpwnftw

Re: Stockfish and serious hardware: 384 threads

Post by noobpwnftw »

It is quite straightforward:

With ponder off, while the 192-thread engine is running, the worst-case scenario is that it has 96 threads on physical cores and 96 threads on HT siblings. Whether or not a clever OS scheduling strategy prevents this from happening, you still see a fairly consistent scaling factor for doubling the number of threads, regardless of how they are distributed. One could argue that this measure of HT effectiveness is only an upper bound; in that case, people could always run 8 cores vs. 4 cores + 4 HT tests on two machines, since we now know roughly how that should scale.
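The best and worst thread placements described above can be counted with a small helper. This is a sketch under stated assumptions: the machine is taken to have 192 physical cores (8 sockets of 24 cores, the core count of the Xeon Platinum 8168) with two hardware threads per core, nothing else is running (ponder off), and `split` is a hypothetical helper, not part of Stockfish.

```python
def split(threads: int, cores: int) -> dict:
    """Best- and worst-case (real-core, HT-sibling) thread counts on a 2-way SMT machine.

    Each physical core runs at most two threads; when two share a core,
    one is counted as 'real' and the other as 'HT', as in the post.
    """
    if not 0 <= threads <= 2 * cores:
        raise ValueError("thread count must fit in the logical CPUs")
    # Best case: the scheduler fills every physical core before doubling up.
    best_ht = max(0, threads - cores)
    # Worst case: threads are packed two per core on as few cores as possible.
    worst_ht = threads // 2
    return {
        "best": (threads - best_ht, best_ht),
        "worst": (threads - worst_ht, worst_ht),
    }

CORES = 192  # assumed: 8 sockets x 24 cores (Xeon Platinum 8168), 2 threads per core

for n in (192, 384):
    print(n, split(n, CORES))
```

The same helper shows the suggested small-scale control: in the best case `split(8, 8)` puts all 8 threads on physical cores, while `split(8, 4)` can only ever give 4 on cores and 4 on HT siblings.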