2950x SMT ON and SMT OFF

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

2950x SMT ON and SMT OFF

Post by mwyoung »

Just a quick-and-dirty example of the speedup from SMT ON (16 cores + 32 threads) vs. SMT OFF (16 cores + 16 threads):

SMT On: NPS 32702, 39.9% better than SMT Off.
SMT On: Time to depth 238s, 19.3% better than SMT Off.

Time to depth shows the real performance increase, and it is why you should never use NPS to measure a chess engine's performance when comparing different numbers of CPU cores or threads.
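To make the two percentages above concrete, here is a minimal Python sketch of how they can be computed from raw measurements. It assumes that "X% better" means on/off minus 1 for NPS and, for time to depth, that the SMT-off run took X% longer than the SMT-on run (off/on minus 1). The function names and sample numbers are hypothetical, not the values behind the screenshot.

```python
def nps_gain(nps_on: float, nps_off: float) -> float:
    """Fractional NPS increase of the SMT-on run over the SMT-off run."""
    return nps_on / nps_off - 1.0


def ttd_gain(ttd_on_s: float, ttd_off_s: float) -> float:
    """Fractional time-to-depth improvement: how much longer the SMT-off run
    needs to reach the same depth (assumed meaning of "X% better")."""
    return ttd_off_s / ttd_on_s - 1.0


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    print(f"NPS gain: {nps_gain(14_000_000, 10_000_000):.1%}")
    print(f"TTD gain: {ttd_gain(200.0, 240.0):.1%}")
```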
[Screenshot: SMT ON Time to depth 36.jpg]
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: 2950x SMT ON and SMT OFF

Post by mwyoung »

[Screenshot: SMT ON Time to depth 36.jpg]
mbabigian
Posts: 204
Joined: Tue Oct 15, 2013 2:34 am
Location: US
Full name: Mike Babigian

Re: 2950x SMT ON and SMT OFF

Post by mbabigian »

How many positions did you use and how many times did you run each position to calculate the time to depth figure?
“Censorship is telling a man he can't have a steak just because a baby can't chew it.” ― Mark Twain
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: 2950x SMT ON and SMT OFF

Post by mwyoung »

mbabigian wrote: Wed Feb 27, 2019 9:52 pm How many positions did you use and how many times did you run each position to calculate the time to depth figure?
As I said, this is quick and dirty, just to post an example.
I have tested this over many positions, averaging 10 runs per position.
I have also tested engine play, to see whether the measured SMT advantage shows up in real games.
My conclusion is that SMT gives the best performance.
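The post mentions averaging 10 runs per position over many positions. One way to aggregate such measurements is sketched below in Python. Taking the geometric mean of the per-position speedups is an assumed choice (a common one for ratios), not necessarily how these numbers were computed, and the function names are hypothetical.

```python
from math import exp, log
from statistics import mean


def position_speedup(times_off: list[float], times_on: list[float]) -> float:
    """Speedup for one position: mean SMT-off time to depth divided by the
    mean SMT-on time to depth (each list holds repeated runs, e.g. 10)."""
    return mean(times_off) / mean(times_on)


def overall_speedup(positions: list[tuple[list[float], list[float]]]) -> float:
    """Geometric mean of the per-position speedups, so that a single position
    with a very long search does not dominate the result."""
    ratios = [position_speedup(off, on) for off, on in positions]
    return exp(mean(log(r) for r in ratios))


if __name__ == "__main__":
    # Hypothetical timings in seconds: (SMT-off runs, SMT-on runs) per position.
    positions = [
        ([10.0, 12.0], [5.0, 6.0]),  # SMT on is 2x faster here
        ([8.0, 8.0], [8.0, 8.0]),    # no difference here
    ]
    print(overall_speedup(positions))
```

With one position at 2x and one at 1x, the geometric mean is the square root of 2, whereas an arithmetic mean would report 1.5.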
jjoshua2
Posts: 99
Joined: Sat Mar 10, 2018 6:16 am

Re: 2950x SMT ON and SMT OFF

Post by jjoshua2 »

I think NPS and thread count are what matter, not whether SMT is on or off. A modern computer could have a similar threadcount and NPS to a 5-year-old computer that used actual cores instead of SMT, and it shouldn't really matter if threadcount and NPS are the same. It's just cheaper hardware to get that performance now.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: 2950x SMT ON and SMT OFF

Post by mwyoung »

jjoshua2 wrote: Thu Feb 28, 2019 2:29 am I think NPS and thread count are what matter, not whether SMT is on or off. A modern computer could have a similar threadcount and NPS to a 5-year-old computer that used actual cores instead of SMT, and it shouldn't really matter if threadcount and NPS are the same. It's just cheaper hardware to get that performance now.
SMT is what allows you to split each core into 2 logical cores, so it matters. And NPS is not a good way to measure the speed of a chess program if you are using different thread or core counts.

NPS only holds true if you are using the same number of threads or cores.

Once you compare C cores against C+x, NPS fails to give accurate performance results, and time to depth must be used.

The reason is simple. NPS always goes up, but the workload is now split, and there is no great way of splitting it. So a large percentage of the same work gets processed by both threads. The result is a loss of true speed, but not of NPS, because to the computer a node is a node, even if it is the same node already processed by the other thread.

And the more threads, the worse this effect becomes.

Example: Computer (a) runs Stockfish 10 at 1,000,000 NPS on one thread. Computer (b) runs Stockfish 10 at 1,010,000 NPS on 2 threads.

Which computer is faster running Stockfish 10? Computer (a).
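The example above can be made quantitative with a deliberately simple toy model that, as the post argues, treats every duplicated node as wasted work: useful NPS = total NPS × (1 − duplicate fraction). The function names and the duplicate fraction are hypothetical; engines do not report such a number directly, and real duplicated nodes may not be entirely wasted.

```python
def useful_nps(total_nps: float, duplicate_fraction: float) -> float:
    """Nodes per second that do not repeat another thread's work
    (toy model: duplicated nodes contribute nothing)."""
    return total_nps * (1.0 - duplicate_fraction)


def break_even_duplicate_fraction(nps_single: float, nps_multi: float) -> float:
    """Largest duplicate fraction at which the multi-threaded run still
    matches the single-threaded run in useful nodes per second."""
    return 1.0 - nps_single / nps_multi


if __name__ == "__main__":
    a = 1_000_000  # computer (a): 1 thread
    b = 1_010_000  # computer (b): 2 threads
    print(f"(b) breaks even at {break_even_duplicate_fraction(a, b):.2%} duplicates")
    # With a hypothetical 10% duplicate rate, (b) does less useful work than (a).
    print(useful_nps(b, 0.10) < a)
```

In this model computer (b) only comes out ahead if fewer than about 1% of its nodes duplicate work, which is the intuition behind the answer "Computer (a)".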

This shows in both screenshots: look at the total nodes searched to reach depth 36. The 32-thread Stockfish had to search more than 1.1 billion more nodes to reach the same depth as the 16-thread Stockfish.
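The link between the two metrics is that time to depth equals the nodes needed to reach that depth divided by NPS, so the time-to-depth speedup is the NPS ratio divided by the ratio of nodes searched. The Python sketch below shows this; reading the thread's "39.9%" and "19.3%" as the ratios 1.399 and 1.193 is an assumption, and the resulting node ratio is derived arithmetic, not a value read from the screenshots. Function names are hypothetical.

```python
def ttd_speedup(nps_ratio: float, node_ratio: float) -> float:
    """Time-to-depth speedup t_off / t_on, given
    nps_ratio  = NPS_on / NPS_off and
    node_ratio = nodes_on / nodes_off needed to reach the same depth.
    Follows from t = nodes / nps for each configuration."""
    return nps_ratio / node_ratio


def implied_node_ratio(nps_ratio: float, ttd_ratio: float) -> float:
    """Inverse of ttd_speedup: how many more nodes the higher-NPS run must
    have searched to explain the observed time-to-depth speedup."""
    return nps_ratio / ttd_ratio


if __name__ == "__main__":
    ratio = implied_node_ratio(1.399, 1.193)
    print(f"SMT on searched about {ratio - 1:.1%} more nodes to reach the same depth")
```

Under that reading, a 39.9% NPS gain that shrinks to a 19.3% time-to-depth gain implies roughly 17% more nodes searched by the 32-thread run, the same effect the post points to.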
Last edited by mwyoung on Thu Feb 28, 2019 4:18 am, edited 1 time in total.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: 2950x SMT ON and SMT OFF

Post by syzygy »

mwyoung wrote: Wed Feb 27, 2019 6:51 pm Just a quick-and-dirty example of the speedup from SMT ON (16 cores + 32 threads) vs. SMT OFF (16 cores + 16 threads):

SMT On: NPS 32702, 39.9% better than SMT Off.
SMT On: Time to depth 238s, 19.3% better than SMT Off.

Time to depth shows the real performance increase, and it is why you should never use NPS to measure a chess engine's performance when comparing different numbers of CPU cores or threads.
The truth is probably in the middle (but closer to time to depth than to NPS). The raw NPS increase is too high an estimate (as the number of threads increases, more of the searched nodes are wasted). The raw time-to-depth speedup is too low an estimate (as the number of threads increases, each ply is worth more).
jjoshua2
Posts: 99
Joined: Sat Mar 10, 2018 6:16 am

Re: 2950x SMT ON and SMT OFF

Post by jjoshua2 »

mwyoung wrote: Thu Feb 28, 2019 4:15 am
jjoshua2 wrote: Thu Feb 28, 2019 2:29 am if threadcount and NPS are the same
NPS only holds true if you are using the same number of threads or cores.
...

This shows in both screenshots: look at the total nodes searched to reach depth 36. The 32-thread Stockfish had to search more than 1.1 billion more nodes to reach the same depth as the 16-thread Stockfish.
We seem to agree that NPS and threads are important, and that NPS only works for the same number of threads, but you seem to have missed the part where I said the thread count has to be the same when you give an example with different thread counts.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: 2950x SMT ON and SMT OFF

Post by mwyoung »

jjoshua2 wrote: Mon Mar 04, 2019 4:10 am
mwyoung wrote: Thu Feb 28, 2019 4:15 am
jjoshua2 wrote: Thu Feb 28, 2019 2:29 am if threadcount and NPS are the same
NPS only holds true if you are using the same number of threads or cores.
...

This shows in both screenshots: look at the total nodes searched to reach depth 36. The 32-thread Stockfish had to search more than 1.1 billion more nodes to reach the same depth as the 16-thread Stockfish.
We seem to agree that NPS and threads are important, and that NPS only works for the same number of threads, but you seem to have missed the part where I said the thread count has to be the same when you give an example with different thread counts.
I really don't think NPS is important at all when determining the speed of a chess program. It is like looking at your car's tachometer to judge how fast you are going.

This only works if you have one gear, or one CPU core.

If you want to know how fast you are, or who is the fastest, you use the speedometer or odometer: time over distance, like time to depth.

I am all for people using whatever makes them feel better. Self-delusion can be a wonderful thing. "Look, Ma, at my NPS!"
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: 2950x SMT ON and SMT OFF

Post by Laskos »

mwyoung wrote: Mon Mar 04, 2019 4:31 am
jjoshua2 wrote: Mon Mar 04, 2019 4:10 am
mwyoung wrote: Thu Feb 28, 2019 4:15 am
jjoshua2 wrote: Thu Feb 28, 2019 2:29 am if threadcount and NPS are the same
NPS only holds true if you are using the same number of threads or cores.
...

This shows in both screenshots: look at the total nodes searched to reach depth 36. The 32-thread Stockfish had to search more than 1.1 billion more nodes to reach the same depth as the 16-thread Stockfish.
We seem to agree that NPS and threads are important, and that NPS only works for the same number of threads, but you seem to have missed the part where I said the thread count has to be the same when you give an example with different thread counts.
I really don't think NPS is important at all when determining the speed of a chess program. It is like looking at your car's tachometer to judge how fast you are going.

This only works if you have one gear, or one CPU core.

If you want to know how fast you are, or who is the fastest, you use the speedometer or odometer: time over distance, like time to depth.

I am all for people using whatever makes them feel better. Self-delusion can be a wonderful thing. "Look, Ma, at my NPS!"
Your time-to-depth argument was valid with YBW SMP: the engines had the same "ply value" independently of the number of threads. Nowadays, SF and many other top engines use Lazy SMP, where the tree widens with the number of threads, so each "ply value" is larger with a larger number of threads. You can test that easily, even in 100 games (the difference is large enough): play SF on 1 thread to depth 18 against SF on 16 or 32 threads to depth 18. You will see that 1-threaded SF loses heavily, although the depth is the same. Ronald already mentioned a similar thing. Also, the scaling to even 32 threads is still good with Lazy SMP Stockfish (with YBW, old SF's 16 -> 32 thread scaling was very bad), and for 16 -> 32 threads the effective strength (time-to-strength) scaling might be even closer to NPS scaling than to time-to-depth scaling; it is somewhere in between, anyway.
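The fixed-depth match described above (1 thread vs. 16 or 32 threads, both to depth 18, about 100 games) is usually summarized as an Elo difference. A minimal Python sketch using the standard logistic Elo formula follows; the function names are hypothetical and the win/draw/loss counts are placeholders, not results from this thread.

```python
from math import log10


def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Match score from one side's perspective (a draw counts as half a point)."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model.
    Undefined for a 0% or 100% score."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # Hypothetical 100-game result from the multi-threaded side's perspective.
    s = score_fraction(wins=50, draws=40, losses=10)
    print(f"score {s:.0%}, about {elo_difference(s):+.0f} Elo")
```

A 100-game sample leaves a wide error margin, so this only resolves large gaps like the one the post expects between 1 thread and 16 or 32 threads at equal depth.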