SMP SF Formula

Werewolf
Posts: 2064
Joined: Thu Sep 18, 2008 10:24 pm

SMP SF Formula

Post by Werewolf »

Hi,

In the old days we had the formula N^0.76 for multi-core speedup in Rybka.

With SF, do we have a rough equivalent? I'm wondering how much faster 8, 16, 32 threads are than 1 thread - assuming clock speed stays constant and the number of physical cores is never less than the number of threads.
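
For reference, here is a minimal sketch of what the old N^0.76 rule predicts for the thread counts asked about. The function name `rule_of_thumb_speedup` is made up for illustration, and the 0.76 exponent is only the Rybka-era figure quoted above, not a measured value for Stockfish.

```python
def rule_of_thumb_speedup(threads: int, exponent: float = 0.76) -> float:
    """Effective speedup predicted by the N**exponent rule of thumb.

    The default exponent is the Rybka-era value from the post above;
    it is an assumption here, not a Stockfish measurement.
    """
    return threads ** exponent


if __name__ == "__main__":
    for n in (1, 8, 16, 32):
        print(f"{n:>2} threads -> about {rule_of_thumb_speedup(n):.2f}x one thread")
```

Under that rule, 8 threads come out at a bit under 5x, 16 threads at a bit over 8x, and 32 threads at just under 14x.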
Paloma
Posts: 1219
Joined: Thu Dec 25, 2008 9:07 pm
Full name: Herbert L

Re: SMP SF Formula

Post by Paloma »

lkaufman
Posts: 6281
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: SMP SF Formula

Post by lkaufman »

Paloma wrote: Sat Oct 30, 2021 4:35 pm Maybe this:
http://www.fastgm.de/threads3.html
That answers the question as worded, but that is the wrong question to ask. If we run eight threads (for example) on eight cores, we may get close to an 8 to 1 speedup in NPS, but that is useless information. What we should ask is: what time odds can eight threads give to one thread and still come out even? I would guess the answer is about 5 to 1 (for Stockfish or Dragon), nowhere near 8 to 1, because many of the nodes searched with multiple threads are redundant. I do remember doing such tests a decade ago with four cores and four threads, and back then most programs got somewhere around 2.8 to 1 effective speedup; I think Stockfish might have been the first to reach 3 to 1. I don't know if any engines are much better than 3 to 1 with four threads now.
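
To compare such time-odds figures with the old N^x style of formula, a break-even ratio can be converted into an equivalent exponent. This is only a sketch: `implied_exponent` is a hypothetical helper, and the inputs are the guesses and recollections quoted in this post, not test results.

```python
import math


def implied_exponent(threads: int, time_odds: float) -> float:
    """Exponent x such that threads**x equals the break-even time odds."""
    if threads < 2 or time_odds <= 0:
        raise ValueError("need at least 2 threads and a positive time-odds ratio")
    return math.log(time_odds) / math.log(threads)


if __name__ == "__main__":
    # (threads, break-even time odds) as mentioned in the post above
    for threads, odds in ((8, 5.0), (4, 2.8), (4, 3.0)):
        x = implied_exponent(threads, odds)
        print(f"{odds} to 1 with {threads} threads -> exponent of about {x:.2f}")
```

On these inputs the exponents land between roughly 0.74 and 0.79, which is in the same neighbourhood as the old 0.76.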
Komodo rules!
Werewolf
Posts: 2064
Joined: Thu Sep 18, 2008 10:24 pm

Re: SMP SF Formula

Post by Werewolf »

My question was not about asking for a NPS speedup, but rather an effective search speedup which included losses due to SMP search inefficiency.

I was expecting the answer for SF to be about 6 to 1 for 8 threads under the conditions cited.
lkaufman
Posts: 6281
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: SMP SF Formula

Post by lkaufman »

Werewolf wrote: Sat Oct 30, 2021 10:05 pm My question was not about asking for a NPS speedup, but rather an effective search speedup which included losses due to SMP search inefficiency.

I was expecting the answer for SF to be about 6 to 1 for 8 threads under the conditions cited.
That is equivalent to the question I posed, which proposes a test that would answer your question. 5 to 1 would have been the normal answer some years ago; if it has reached 6 to 1, that would be a very good result.
petero2
Posts: 733
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: SMP SF Formula

Post by petero2 »

I agree Larry's proposed test is a good way to measure how much SMP improves the strength of an engine.

Turbo boost is however an additional complication. If you are an engine programmer, you would probably be most interested in the result you get by performing Larry's test with turbo boost disabled, but if you are an engine user, there would be no reason to cripple the hardware by disabling turbo boost, so you would probably be most interested in the result of Larry's test with turbo boost enabled.
Werewolf
Posts: 2064
Joined: Thu Sep 18, 2008 10:24 pm

Re: SMP SF Formula

Post by Werewolf »

lkaufman wrote: Sat Oct 30, 2021 10:55 pm
That is equivalent to the question I posed, which proposes a test that would answer your question. 5 to 1 would have been the normal answer some years ago; if it has reached 6 to 1, that would be a very good result.
Assuming no search "thickening", time to depth tested with a different number of threads would also work?
lkaufman
Posts: 6281
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: SMP SF Formula

Post by lkaufman »

Werewolf wrote: Sun Oct 31, 2021 5:43 am
Assuming no search "thickening", time to depth tested with a different number of threads would also work?
The assumption is wrong for almost all engines of the past 5 years, so that will not work or even come close to working properly.
Jouni
Posts: 3786
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: SMP SF Formula

Post by Jouni »

Stockfish wiki before NNUE:

Playing 8 threads vs 1 thread at LTC (60+0.6, 8moves_v3.pgn):

Score of t8 vs seq: 476 - 3 - 521 [0.737] 1000
Elo difference: 178.6 +/- 14.0, LOS: 100.0 %, DrawRatio: 52.1 %

Playing 1 thread at 8xLTC (480+4.8) vs (60+0.6) (8moves_v3.pgn):

Score of seq8 vs seq: 561 - 5 - 434 [0.778] 1000
Elo difference: 217.9 +/- 15.8, LOS: 100.0 %, DrawRatio: 43.4 %

Which is roughly 82% efficiency (178/218).
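
The Elo differences and the 82% figure can be rechecked from the win/loss/draw counts with the usual logistic Elo formula. This is a sketch for verification only; `elo_from_wld` is a name chosen here, and the ratio of two Elo gains is the efficiency measure used in the quoted text, which is not the same thing as a time-odds ratio.

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match score under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    smp_gain = elo_from_wld(476, 3, 521)   # 8 threads vs 1 thread
    time_gain = elo_from_wld(561, 5, 434)  # 1 thread at 8x time vs 1 thread
    print(f"8 threads: {smp_gain:.1f} Elo")
    print(f"8x time:   {time_gain:.1f} Elo")
    print(f"ratio:     {smp_gain / time_gain:.2f}")
```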

ProDeo forum after NNUE:

 #  Engine                        Elo  +/-  Games  Wins  Losses  Draws  Points  Score  Draw %
 1  Stockfish 17/09/21 - 8 CPU     11    9    160     5       0    155    82.5  51.6%   96.9%
 2  Stockfish 17/09/21 - 16 CPU     9    8    160     4       0    156    82.0  51.2%   97.5%
 3  Stockfish 17/09/21 - 4 CPU      7    9    160     4       1    155    81.5  50.9%   96.9%
 4  Stockfish 17/09/21 - 2 CPU     -2    9    160     2       3    155    79.5  49.7%   96.9%
 5  Stockfish 17/09/21 - 1 CPU    -24   14    160     0      11    149    74.5  46.6%   93.1%

+33 Elo from 16 cores!
Jouni
Werewolf
Posts: 2064
Joined: Thu Sep 18, 2008 10:24 pm

Re: SMP SF Formula

Post by Werewolf »

Is this right? What could cause this?