2950x SMT ON and SMT OFF

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: 2950x SMT ON and SMT OFF

Post by mwyoung »

jjoshua2 wrote: Thu Feb 28, 2019 2:29 am
if threadcount and NPS are the same

mwyoung wrote: Thu Feb 28, 2019 4:15 am
NPS only holds true if you are using the same number of threads or cores.
...

This is why, if you look at both screenshots and at the total nodes searched to reach depth 36 ply, the 32-thread Stockfish had to search more than 1.1 billion more nodes to reach the same depth as the 16-thread Stockfish.

jjoshua2 wrote: Mon Mar 04, 2019 4:10 am
We seem to agree that NPS and threads are important, and that NPS only works for the same number of threads, but you seem to have missed the part where I said threads have to be the same when you give an example with a different thread count.

mwyoung wrote: Mon Mar 04, 2019 4:31 am
I really don't think NPS is important at all when determining the speed of a chess program. It is like looking at your car's tachometer to judge how fast you are going.

This only works if you have one gear or CPU core.

If you want to know how fast you are going, or who is the fastest, you use the speedometer or odometer. Time over distance, like time to depth.

I am all for people using whatever makes them feel better. Self-delusion can be a wonderful thing. Look, Ma, at my NPS.

Laskos wrote: Mon Mar 04, 2019 9:41 am
Your time-to-depth argument was valid with YBW SMP, where engines had the same "ply value" independently of the number of threads. Nowadays SF and many other top engines use Lazy SMP: the tree widens with the number of threads, so each "ply value" is larger with a larger number of threads. You can test that easily even in 100 games (the difference is large enough): play SF on 1 thread to depth 18 against SF on 16 or 32 threads to depth 18. You will see that the 1-thread SF loses heavily, although the depth is the same. Ronald already mentioned a similar thing. Also, scaling even to 32 threads is still good with Lazy SMP Stockfish (with the old YBW SF, 16 -> 32 threads scaling was very bad), and the effective strength (time-to-strength) scaling might be even closer to NPS scaling than to time-to-depth scaling, somewhere in between anyway, for 16 -> 32 threads.

Why would you want to use a measure that is not very accurate and, depending on the program tested, can be very inaccurate?

I will use the measure that works every time across all programs.

Another reason not to use NPS is hash table (HT) size.

In some programs a larger HT can decrease the NPS yet improve the search speed. This performance gain is only visible if time to depth is used as the measure.
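
To make the two measures concrete, here is a minimal Python sketch that computes both speed-ups from the numbers an engine prints at a fixed depth. The `SearchRun` type and the node and time figures are hypothetical, chosen only so that the 32-thread run searches 1.1 billion more nodes, as described above; they are not the values from the screenshots.

```python
from dataclasses import dataclass


@dataclass
class SearchRun:
    """One fixed-depth search, as read off the engine output."""
    threads: int
    nodes: int       # total nodes searched to reach the target depth
    seconds: float   # wall-clock time to reach the target depth

    @property
    def nps(self) -> float:
        # Nodes per second, the figure most GUIs display.
        return self.nodes / self.seconds


def nps_speedup(base: SearchRun, other: SearchRun) -> float:
    """How much faster `other` looks if only NPS is compared."""
    return other.nps / base.nps


def time_to_depth_speedup(base: SearchRun, other: SearchRun) -> float:
    """How much sooner `other` reaches the same depth."""
    return base.seconds / other.seconds


# HYPOTHETICAL figures for illustration only (same position, same depth).
run16 = SearchRun(threads=16, nodes=4_000_000_000, seconds=200.0)
run32 = SearchRun(threads=32, nodes=5_100_000_000, seconds=190.0)

print(f"NPS speed-up:           {nps_speedup(run16, run32):.2f}x")
print(f"Time-to-depth speed-up: {time_to_depth_speedup(run16, run32):.2f}x")
```

With these made-up inputs the NPS ratio comes out well above the time-to-depth ratio, because the extra nodes searched by the 32-thread run count toward NPS but not toward reaching the depth sooner. A single position is noisy, so in practice one would average over many positions.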
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: 2950x SMT ON and SMT OFF

Post by Laskos »

mwyoung wrote: Mon Mar 04, 2019 2:25 pm
Why would you want to use a measure that is not very accurate and, depending on the program tested, can be very inaccurate?

I will use the measure that works every time across all programs.
What measure did I propose? What measure do you want to use that "works every time across all programs"? I repeat: if you think it's time-to-depth, you are WRONG about modern top engines using Lazy SMP. There is no clear measure; we are just pretty sure that it lies somewhere between time-to-depth and NPS. If you want something more precise, you need to play real games and measure real strength. This is not that easy, as in your example you will need at least several hundred, if not thousands, of games on the full CPU (concurrency=1) to discern effective speed-up differences of some 10%.
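
As a rough illustration of why so many games are needed, the following Python sketch estimates the number of games required before a small Elo edge stands out from noise. The Elo-per-doubling value and the draw ratio are assumptions picked for the example, not measurements from this thread; both vary a lot with engine and time control. The score-to-Elo relation is the standard logistic model.

```python
import math

# ASSUMPTIONS (not from the thread): Elo gained per doubling of effective
# speed, and the fraction of drawn games. Adjust both to your own setup.
ELO_PER_DOUBLING = 50.0
DRAW_RATIO = 0.6


def score_from_elo(elo_diff: float) -> float:
    """Expected score for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_speedup(speedup: float,
                     elo_per_doubling: float = ELO_PER_DOUBLING) -> float:
    """Elo gain implied by an effective speed-up, assuming a fixed gain per doubling."""
    return elo_per_doubling * math.log2(speedup)


def games_needed(elo_diff: float, draw_ratio: float = DRAW_RATIO,
                 z: float = 1.96) -> int:
    """Games until a z-sigma interval around the score excludes 50%."""
    s = score_from_elo(elo_diff)
    wins = s - draw_ratio / 2.0
    if wins < 0.0 or wins + draw_ratio > 1.0:
        raise ValueError("draw ratio is inconsistent with the expected score")
    # Per-game variance for outcomes 1, 0.5 and 0.
    variance = wins + 0.25 * draw_ratio - s * s
    margin = s - 0.5
    if margin == 0.0:
        raise ValueError("no Elo difference to detect")
    return math.ceil(z * z * variance / (margin * margin))


elo = elo_from_speedup(1.10)   # a 10% effective speed-up
n = games_needed(elo)
print(f"10% speed-up ~ {elo:.1f} Elo under the assumed scaling")
print(f"Games needed at ~95% confidence: {n}")
```

Under these assumed values the estimate lands in the thousands of games, and since the requirement grows with the inverse square of the score margin, halving the edge roughly quadruples the number of games.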
Raj Gupta
Posts: 7
Joined: Sun Dec 02, 2018 10:44 pm
Full name: Rajen B Gupta

Re: 2950x SMT ON and SMT OFF

Post by Raj Gupta »

So in practical terms, which CPU would be more powerful, i.e. give better chess performance with the modern top-of-the-line programs such as the top 10 (not counting NN engines):

a 9700K with 8 cores and 8 threads, or a 9900K with 8 cores and 16 threads (assuming the all-core clock is set to the same value and all threads are enabled in the relevant engine settings)?

raj
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: 2950x SMT ON and SMT OFF

Post by mwyoung »

Given your example, the 8-core, 16-thread 9900K will be fastest. To know how much faster, use time to depth, not NPS. HT does increase the real performance of chess engines, but not as much as the NPS would suggest. This is because the workload is split across 16 threads, and there is an inefficiency in splitting the workload across more threads.