New AMD Ryzen™ Threadripper™ PRO 3995WX (Windows and Multithreading Problem)

Discussion of chess software programming and technical issues.

Moderator: Ras

Zerbinati
Posts: 122
Joined: Mon Aug 18, 2014 7:12 pm
Location: Trento (Italy)

Post by Zerbinati »

Joerg Oster wrote: Tue Apr 27, 2021 9:43 am
Zerbinati wrote: Mon Apr 26, 2021 10:21 pm Thanks so much Joerg!
Marco, do you get better performance now?
Yes, Joerg, by reducing the hash size.
Of course, I would still like to find a better solution that lets me assign RAM freely, as I do on my other systems.
A quick test of both of my systems on the Playchess server (endgame):
Image

Image
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Post by MikeB »

Zerbinati wrote: Tue Apr 27, 2021 11:02 am
Joerg Oster wrote: Tue Apr 27, 2021 9:43 am
Zerbinati wrote: Mon Apr 26, 2021 10:21 pm Thanks so much Joerg!
Marco, do you get better performance now?
Yes, Joerg, by reducing the hash size.
Of course, I would still like to find a better solution that lets me assign RAM freely, as I do on my other systems.
A quick test of both of my systems on the Playchess server (endgame):
Image

Image
They look like pretty decent knps to me... ;>)
Image
Dann Corbit
Posts: 12797
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Post by Dann Corbit »

When you reduce the hash size to get higher NPS, is the time to depth shorter or longer on average?
I think NPS may well be a red herring.
Depending on how nodes are counted: if a hash hit is not counted as a node, then the smaller hash might actually be a big slowdown.
If the hash table is large and pinned in memory, it makes no sense to me that reducing its size would make the computation more efficient.
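One way to answer this without worrying about how hash hits are counted is to compare time to depth directly. The minimal Python sketch below assumes you have recorded, for the same set of positions, the seconds each hash setting needed to reach one fixed depth; the function name `mean_ttd_ratio` is hypothetical. It takes the geometric mean of the per-position ratios, so a 2x speedup on one position and a 2x slowdown on another cancel out.

```python
from math import exp, log


def mean_ttd_ratio(ttd_a, ttd_b):
    """Geometric mean of per-position time-to-depth ratios (config B / config A).

    ttd_a, ttd_b: seconds each configuration needed to reach the same fixed
    depth on the same positions, listed in the same order. A result below 1.0
    means B reaches the depth faster on average, whatever its NPS says.
    """
    if len(ttd_a) != len(ttd_b) or not ttd_a:
        raise ValueError("need the same non-empty list of positions for both configs")
    log_ratios = [log(b / a) for a, b in zip(ttd_a, ttd_b)]
    return exp(sum(log_ratios) / len(log_ratios))


if __name__ == "__main__":
    # Made-up timings, only to show the call; these are not measurements.
    large_hash = [10.0, 20.0, 40.0]
    small_hash = [12.0, 22.0, 50.0]
    print(f"small/large time-to-depth ratio: {mean_ttd_ratio(large_hash, small_hash):.3f}")
```

With many search threads, time to depth varies from run to run, so in practice each position would need several runs per setting before the ratio means much.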
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Post by MikeB »

Image
Ronald
Posts: 161
Joined: Tue Jan 23, 2018 10:18 am
Location: Rotterdam
Full name: Ronald Friederich

Post by Ronald »

MikeB wrote: Fri Apr 30, 2021 2:35 am this might be of interest ...

https://www.windowscentral.com/windows- ... 28-threads
This article is no longer correct. There was a bug in older Windows 10 Pro versions that caused the problem. Windows 10 Pro supports up to 128 threads; the Workstation and Enterprise editions support up to 256 threads, but they still have the same "problem": a single Windows processor group can contain at most 64 hardware threads, so larger systems are split into several processor groups. Pro supports 2 groups, Workstation and Enterprise support 4. The 128-thread Threadrippers therefore need 2 groups, which show up in Task Manager as 2 "NUMA" nodes. Windows places 32 cores of the Threadripper in each group; the other 32 threads in each group are the hyperthreads of those cores.
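The group arithmetic above can be written down as a small Python sketch. It assumes groups are packed with the full 64 hardware threads each and uses the per-edition group limits stated in the post; Windows can also form groups along NUMA-node boundaries, so the real group count may differ. `groups_needed`, `fits_edition` and `MAX_GROUPS` are illustrative names, not a Windows API.

```python
import math

# Assumed limits, taken from the post: 64 hardware threads per group,
# 2 groups on Windows 10 Pro, 4 on Workstation/Enterprise.
THREADS_PER_GROUP = 64
MAX_GROUPS = {"Pro": 2, "Workstation": 4, "Enterprise": 4}


def groups_needed(logical_processors: int) -> int:
    """Minimum number of processor groups needed to expose all hardware threads."""
    return math.ceil(logical_processors / THREADS_PER_GROUP)


def fits_edition(logical_processors: int, edition: str) -> bool:
    """True if the edition supports enough groups for this many hardware threads."""
    return groups_needed(logical_processors) <= MAX_GROUPS[edition]


if __name__ == "__main__":
    # 64 cores x 2 SMT threads = 128 hardware threads (3990X / 3995WX)
    print(groups_needed(128))        # 2
    print(fits_edition(128, "Pro"))  # True
    print(fits_edition(256, "Pro"))  # False
```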

If a multithreaded program is not group-aware, Windows will always keep it within one group, so the program can use at most 64 hardware threads (i.e. 32 physical cores) and the other group stays unused. Setting such a program to 128 threads is of no use.

Stockfish and many other chess programs are group-aware: they first place threads on the physical cores of each group (32 threads in group 0 and 32 in group 1), and any remaining threads are divided over the 2 groups. So with 80 threads, each group runs 40 threads.
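The placement policy described above can be modeled as follows. This is a hedged Python sketch, not Stockfish's actual code: it assumes 2 groups of 32 cores with 2 hardware threads each, that group 0's cores are filled before group 1's, and that a non-group-aware process lives entirely in group 0 (in reality Windows picks the group). The function `threads_per_group` is hypothetical.

```python
def threads_per_group(n_threads, groups=2, cores_per_group=32, smt=2, group_aware=True):
    """Return how many of n_threads end up in each processor group (hypothetical model)."""
    counts = [0] * groups
    if not group_aware:
        # Assumption: the whole process stays in one group (group 0 here);
        # its threads can only share that group's 64 hardware threads.
        counts[0] = n_threads
        return counts
    physical = groups * cores_per_group
    # Threads beyond the total number of hardware threads are left to the OS
    # and are not counted here.
    for i in range(min(n_threads, physical * smt)):
        if i < physical:
            # First pass: one thread per physical core, filling group 0 first.
            counts[i // cores_per_group] += 1
        else:
            # Second pass: SMT siblings, alternating between the groups.
            counts[(i - physical) % groups] += 1
    return counts


if __name__ == "__main__":
    print(threads_per_group(80))                      # [40, 40]
    print(threads_per_group(128, group_aware=False))  # [128, 0]: group 1 stays idle
```

Filling the physical cores first means hyperthread siblings are only used once every core already has a search thread, which matches the 80-thread example in the post.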

One problem remains, however: when you run many concurrent single-threaded games in, for instance, cutechess, Windows decides which group each engine instance runs in. With more than 32 concurrent games, more than 32 instances may end up running in the same group, so some instances will share a physical core via hyperthreading and run slower, which makes the test results less reliable. For now I only test with hyperthreading disabled on my 3990X.
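A rough way to quantify this: count how many single-threaded engine instances end up sharing a physical core when the instances are unevenly split over the groups. This Python sketch assumes 32 cores per group and a scheduler that gives each instance its own core before doubling up on hyperthreads; `instances_sharing_cores` is a hypothetical helper, and the 36/28 split in the example is invented for illustration.

```python
def instances_sharing_cores(instances_per_group, cores_per_group=32):
    """Count single-threaded instances that must share a physical core.

    Assumes one instance per physical core first, then SMT doubling
    (2 hardware threads per core).
    """
    shared = 0
    for k in instances_per_group:
        if k <= cores_per_group:
            continue  # every instance has a core to itself
        if k >= 2 * cores_per_group:
            shared += k  # every core is doubled up (or worse)
        else:
            # each instance beyond one-per-core pairs with an existing one
            shared += 2 * (k - cores_per_group)
    return shared


if __name__ == "__main__":
    print(instances_sharing_cores([36, 28]))  # 8
```

Under these assumptions, 64 games split 36/28 would leave 8 of them running on shared cores even though the machine has 64 physical cores.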

In Task Manager you can see what is happening inside the Threadripper when you view the CPU performance graph as "NUMA nodes".
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Post by MikeB »

Ronald wrote: Fri Apr 30, 2021 10:23 am
MikeB wrote: Fri Apr 30, 2021 2:35 am this might be of interest ...

https://www.windowscentral.com/windows- ... 28-threads
This article is no longer correct. There was a bug in older Windows 10 Pro versions that caused the problem. Windows 10 Pro supports up to 128 threads; the Workstation and Enterprise editions support up to 256 threads, but they still have the same "problem": a single Windows processor group can contain at most 64 hardware threads, so larger systems are split into several processor groups. Pro supports 2 groups, Workstation and Enterprise support 4. The 128-thread Threadrippers therefore need 2 groups, which show up in Task Manager as 2 "NUMA" nodes. Windows places 32 cores of the Threadripper in each group; the other 32 threads in each group are the hyperthreads of those cores.

If a multithreaded program is not group-aware, Windows will always keep it within one group, so the program can use at most 64 hardware threads (i.e. 32 physical cores) and the other group stays unused. Setting such a program to 128 threads is of no use.

Stockfish and many other chess programs are group-aware: they first place threads on the physical cores of each group (32 threads in group 0 and 32 in group 1), and any remaining threads are divided over the 2 groups. So with 80 threads, each group runs 40 threads.

One problem remains, however: when you run many concurrent single-threaded games in, for instance, cutechess, Windows decides which group each engine instance runs in. With more than 32 concurrent games, more than 32 instances may end up running in the same group, so some instances will share a physical core via hyperthreading and run slower, which makes the test results less reliable. For now I only test with hyperthreading disabled on my 3990X.

In Task Manager you can see what is happening inside the Threadripper when you view the CPU performance graph as "NUMA nodes".
Thanks for the update and the additional clarity; good to know and very helpful...
Image
syzygy
Posts: 5780
Joined: Tue Feb 28, 2012 11:56 pm

Post by syzygy »

Joerg Oster wrote: Mon Apr 26, 2021 10:16 pm It is probably advisable to limit the number of threads in this case to 16 or 32 at most.
Why would you want to limit a CPU with 64 physical cores to 16 or 32 threads?
Joerg Oster
Posts: 982
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Post by Joerg Oster »

syzygy wrote: Sat May 01, 2021 6:59 pm
Joerg Oster wrote: Mon Apr 26, 2021 10:16 pm It is probably advisable to limit the number of threads in this case to 16 or 32 at most.
Why would you want to limit a CPU with 64 physical cores to 16 or 32 threads?
For hash clearing purposes only, not for search.
Clearing, for instance, 16 GB with 16 threads should be fast enough.
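The idea of clearing the hash with a fixed number of helper threads can be sketched as follows, assuming the table is one contiguous buffer and each helper zeroes one contiguous slice. `clear_in_parallel` is a hypothetical name, and the group-affinity step discussed in this thread is left out. Because of Python's GIL these threads will not actually speed up the clear; the sketch only shows the partitioning, which native threads in a C++ engine would run truly in parallel.

```python
import threading


def clear_in_parallel(table: bytearray, n_threads: int) -> None:
    """Zero `table` with n_threads workers, each clearing one contiguous slice."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    stride = len(table) // n_threads

    def worker(idx: int) -> None:
        start = idx * stride
        # The last worker also takes the remainder if the size is not divisible.
        end = len(table) if idx == n_threads - 1 else start + stride
        table[start:end] = bytes(end - start)  # same length, so no resize

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()


if __name__ == "__main__":
    tt = bytearray(b"\xff" * (1 << 20))  # 1 MiB stand-in for a 16 GB hash table
    clear_in_parallel(tt, 16)
    print(tt.count(0) == len(tt))  # True
```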

My concern was that setting group affinity for the newly created threads in TT.clear() might destroy the thread binding of the search threads.
Jörg Oster
syzygy
Posts: 5780
Joined: Tue Feb 28, 2012 11:56 pm

Post by syzygy »

Joerg Oster wrote: Sat May 01, 2021 7:21 pm
syzygy wrote: Sat May 01, 2021 6:59 pm
Joerg Oster wrote: Mon Apr 26, 2021 10:16 pm It is probably advisable to limit the number of threads in this case to 16 or 32 at most.
Why would you want to limit a CPU with 64 physical cores to 16 or 32 threads?
For hash clearing purposes only, not for search.
Clearing, for instance, 16 GB with 16 threads should be fast enough.
But the Threads variable controls the number of search threads.

16 threads might be enough for TT.clear(), but I see no problem with using more.
My concern was that setting group affinity for the newly created threads in TT.clear() might destroy the thread binding of the search threads.
How could it?
Joerg Oster
Posts: 982
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Post by Joerg Oster »

syzygy wrote: Sat May 01, 2021 9:04 pm
Joerg Oster wrote: Sat May 01, 2021 7:21 pm My concern was that setting group affinity for the newly created threads in TT.clear() might destroy the thread binding of the search threads.
How could it?
I don't know. :lol:
Using the same ids for both raised some suspicion on my part.
But I realized they are only used to distribute the threads among the available groups. Right?
Jörg Oster