What is the value of logical cores ( HT) for chess ?

MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: What is the value of logical cores ( HT) for chess ?

Post by MikeB »

lkaufman wrote: Mon Feb 24, 2020 3:38 am
MikeB wrote: Sun Feb 23, 2020 7:52 pm Back story - Hyper-threading (HT; AMD calls it something else, but it's the use of the logical cores in addition to the real cores) was very primitive when it was first released and was clearly a detriment to most chess engines (if not all) at the time.
Over time some engines seem to have adapted to HT quite well, SF being one of them, as the use of the additional logical cores increases nps by approximately 50% or a little more. Hey - that must be worth some Elo, right?

I have run hundreds of games of SF on 30 real cores versus SF on 60 cores and, despite the hype (pun intended), I see no measurable Elo gain.
What I do find is a 25% increase (or a little more) in power consumption. Furthermore, when using hyperthreading in testing, I have found I need to run games at a tc of 5 min with a 3 second increment to get consistent, meaningful results (sorry, but 10 second games with a 0.1 second increment have no meaning to me unless you want an engine that is really good at 10 second games with a 0.1 second increment). So when I cut the concurrent games down to 30 from 60, nps increases 50% and games run at 2 min plus a one second increment now give consistent results with a higher degree of correlation to longer time controls.

I am not saying this is fact; all I'm saying is that this is what it looks like to me based on my 30 years involved with computer chess.
What do others think? I'm interested in all opinions, especially from those who have looked at this perhaps a little more scientifically than I have. Thanks.

PS - In summary, I am now thinking that running HT for chess is a waste of money, since you can get to the same place with a 25%+ reduction in energy costs. Also, Fat Fritz running on two RTX 2060 Supers (roughly 30K nps) is about equal to SF running on the 3970x (using all cores, whether real or logical). Two RTX 2060 Supers cost about $800 ($400 each), one 3970x costs about $2000, so it looks to me like NN engines have surpassed AB engines in Elo/$ - comments?

Edit: Also, if you make a conscious decision to use just real cores for chess, you can run the 3970x at a higher clock speed - maybe 0.1 to 0.15 GHz higher, roughly 2 to 3% faster.
When running many single or four thread tests at once on one machine, we do it with HT off when we have control over this, using 15 threads on a 16 core for example, but if HT is on we use all but one thread (so 31 threads on a 16 core machine) or as close to that as possible. We believe this is best, but it's not certain. For optimum performance running just one game using the full power of the machine, on my new 3970x I'm convinced that 48 threads (with HT on, 32 cores) is better than 32 or 60, but of course some other number in that range may be even better. I don't suppose there is anything "magical" about using 3 threads for every 2 cores, but you never know.
I will have to test this - good thing I just added the macOS Catalina wallpaper in desktop slide show mode, so that it changes every 6 hours, as I wait for the testing to complete.

Yes, I went through the hassle of converting Apple's Catalina HEIC file to 8 jpg files so that I can have a more Apple-looking Windows 10 Pro desktop.

This download is huge - about 1/4 GB, but if you have a UHD monitor (4K), it may be worth your while.

https://www.dropbox.com/s/z53ib307reruq ... w.zip?dl=0
Apple's wallpaper (desktop pictures) is the best.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: What is the value of logical cores ( HT) for chess ?

Post by MikeB »

MikeB wrote: Mon Feb 24, 2020 5:51 am ...
OK, now I know I'm getting way off topic and hijacking my own thread, but it's just so weird that the MS desktop slideshow started with one monitor on picture #1 and the second monitor on picture #3 ... and I actually like it.

https://www.dropbox.com/s/xrbu0qmtibktb ... 7.jpg?dl=0

ok - end of being off topic, back to computer chess I go...
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: What is the value of logical cores ( HT) for chess ?

Post by MikeB »

lkaufman wrote: Mon Feb 24, 2020 3:38 am
When running many single or four thread tests at once on one machine, we do it with HT off when we have control over this, using 15 threads on a 16 core for example, but if HT is on we use all but one thread (so 31 threads on a 16 core machine) or as close to that as possible. We believe this is best, but it's not certain. For optimum performance running just one game using the full power of the machine, on my new 3970x I'm convinced that 48 threads (with HT on, 32 cores) is better than 32 or 60, but of course some other number in that range may be even better. I don't suppose there is anything "magical" about using 3 threads for every 2 cores, but you never know.
I just kicked off a match between two Stockfishes - one with 48 threads and one with 32 threads, at 2 min plus 1 second per game.

I can already tell it's not going to be a big difference, except for the temp spike of 8C every time the 48 thread engine is making its move.

Even though it's 50% more threads, the NPS only goes up about 25% - logical cores are not as productive as real cores. We might see a 10 Elo difference at best at this level. There are going to be a lot of draws with identical engines. I see my energy usage has also gone up: it's about 25-30 more watts when the 48 thread SF is thinking.
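As a rough check on those figures, the arithmetic can be sketched in Python. It assumes the 32 thread run puts one thread on each real core and that the extra 16 threads land on logical (SMT) siblings. The function names are illustrative, and the 50% and 25% inputs are the numbers quoted in the post, not new measurements.

```python
def per_thread_efficiency(thread_ratio: float, nps_ratio: float) -> float:
    """NPS per thread relative to the baseline run (1.0 = perfect scaling)."""
    return nps_ratio / thread_ratio


def marginal_thread_value(base_threads: int, new_threads: int, nps_ratio: float) -> float:
    """Average NPS added by each extra thread, in units of one baseline thread."""
    added = new_threads - base_threads
    return (nps_ratio - 1.0) * base_threads / added


# 48 threads vs 32 threads, with NPS up about 25% (figures from the post).
eff = per_thread_efficiency(48 / 32, 1.25)
marginal = marginal_thread_value(32, 48, 1.25)
print(f"per-thread efficiency: {eff:.2f}")
print(f"each added logical thread is worth about {marginal:.2f} of a real-core thread")
```

On those numbers each extra logical thread adds roughly half of what a real-core thread does in raw NPS, before any search overhead from the extra threads is counted.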
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: What is the value of logical cores ( HT) for chess ?

Post by MikeB »

MikeB wrote: Mon Feb 24, 2020 6:30 am ...
The first 3 games were drawn, but the 32 thread SF will score the first win in the 4th game.
Dann Corbit
Posts: 12828
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: What is the value of logical cores ( HT) for chess ?

Post by Dann Corbit »

Hyperthreads are more useful for other applications that are only partly compute bound such as database queries.
I use my computers for other things, so I leave it on.
Another thing that is nice is (for example) getting a large percentage of compute power while still leaving the machine responsive.
For instance, with a 6 core system, with hyperthreading you can give 11 threads and the machine will stay responsive.
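The thread-count rules of thumb mentioned in this thread (leave one logical CPU free; lkaufman's 3 threads for every 2 cores) can be written down as a small Python sketch. The function names are made up for illustration. Note that `os.cpu_count()` reports logical CPUs, so it already includes hyperthreads when HT/SMT is on.

```python
import os


def threads_leaving_one_free(logical_cpus: int) -> int:
    """Use all logical CPUs but one, so the machine stays responsive."""
    return max(1, logical_cpus - 1)


def three_threads_per_two_cores(physical_cores: int) -> int:
    """The 3-threads-per-2-cores ratio floated earlier in the thread."""
    return (3 * physical_cores) // 2


print(threads_leaving_one_free(12))      # 6 cores with HT on
print(threads_leaving_one_free(32))      # 16 cores with HT on
print(three_threads_per_two_cores(32))   # 32 core 3970x
print(threads_leaving_one_free(os.cpu_count() or 1))  # this machine
```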

It is true that LC0 on 2 GPUs is about equal to SF. But they have different domains of excellence.
Tactical positions are analyzed better by SF, and very quiet positions are analyzed better by LC0 and there are positions in between where you want both opinions.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: What is the value of logical cores ( HT) for chess ?

Post by corres »

MikeB wrote: Mon Feb 24, 2020 6:30 am ...
Even though it's 50% more threads, the NPS only goes up about 25% - logical cores are not as productive as real cores. We might see a 10 Elo difference at best at this level. There are going to be a lot of draws with identical engines. I see my energy usage has also gone up: it's about 25-30 more watts when the 48 thread SF is thinking.
Logical cores (HT for Intel, SMT for AMD) use only the leftover resources of the CPU. They add minimal calculation power but increase the power consumption and heat production of the CPU. If somebody wants to run repeatable, serious tests, it is very advisable to turn them off, together with every automated frequency tuning (OC) feature of the CPU.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: What is the value of logical cores ( HT) for chess ?

Post by MikeB »

corres wrote: Mon Feb 24, 2020 8:46 am ...
This is slow going, as only one game can be played at a time.
Score of Stockfish-022220-32 vs Stockfish-022220-48: 7 - 2 - 54 [0.540]
Elo difference: 27.6 +/- 32.0, LOS: 95.2 %, DrawRatio: 85.7 %

63 of 100 games finished.

I will let it finish the 100 game set, but at least on my setup it appears to be counter-productive. One artifact that attracted my attention and made me suspicious about logical cores was that, on some benches I was running with 32 threads vs 64 threads, the 32 thread benches often completed quicker than the 64 thread benches, although the 64 thread benches showed noticeably higher (50%) nps.
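One possible reading of that bench observation: the time to finish a fixed-depth bench is nodes searched divided by NPS, so higher NPS only helps if the extra threads do not inflate the node count by even more. The Python sketch below uses made-up node counts purely for illustration; only the 50% NPS gain comes from the post.

```python
def bench_time(nodes: float, nps: float) -> float:
    """Seconds needed to search `nodes` positions at `nps` nodes per second."""
    return nodes / nps


# Hypothetical numbers: the 64 thread run is 50% faster in raw NPS
# but needs 60% more nodes to reach the same depth.
base_nodes, base_nps = 1_000_000_000, 50_000_000
t32 = bench_time(base_nodes, base_nps)
t64 = bench_time(base_nodes * 1.6, base_nps * 1.5)

print(f"32 threads: {t32:.1f} s, 64 threads: {t64:.1f} s")
```

Whenever the node inflation exceeds the NPS gain, the run with more threads finishes later despite the higher NPS.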

Anyway, perhaps my machine is an outlier. I would be very interested for others who have a Threadripper 3970x, 3990x or even the 2990WX to run a similar type of test. With cutechess-GUI it's very easy to set up different numbers of threads. As an aside, I did run 32 threads versus 16 threads on the 32 core Threadripper, and those results were as expected, with the 32 thread SF dominating the 16 thread SF, showing about a plus 90 Elo superiority. But that's 32 real cores vs 16 real cores. If this is true, it would not make sense to use logical cores at all for computer chess (with CPUs similar to the Threadripper 3970X), as there is no chess benefit and it costs more (higher power consumption => higher electric bills).
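For anyone who wants to repeat the test from the command line rather than the GUI, a match like this can probably be set up with cutechess-cli along the following lines. This Python sketch only builds and prints the command. The engine path, engine names and output file name are placeholders, the time control mirrors the 2 min plus 1 second games described above, and the flags should be checked against your cutechess-cli version.

```python
import shlex


def match_command(engine_path, threads_a, threads_b, rounds=100, tc="120+1"):
    """Build a cutechess-cli command for one engine at two thread counts."""
    return [
        "cutechess-cli",
        "-engine", f"cmd={engine_path}", f"name=SF-{threads_a}",
        f"option.Threads={threads_a}",
        "-engine", f"cmd={engine_path}", f"name=SF-{threads_b}",
        f"option.Threads={threads_b}",
        "-each", "proto=uci", f"tc={tc}",
        "-rounds", str(rounds),
        "-concurrency", "1",  # one game at a time: each engine wants the whole machine
        "-pgnout", "smt_test.pgn",  # placeholder output file
    ]


cmd = match_command("./stockfish", 32, 48)  # "./stockfish" is a placeholder path
print(shlex.join(cmd))
```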
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: What is the value of logical cores ( HT) for chess ?

Post by MikeB »

MikeB wrote: Mon Feb 24, 2020 1:06 pm ...
I'm killing it here, as SMT is just not for my system.

Score of Stockfish-022220-32 vs Stockfish-022220-64: 8 - 4 - 60 [0.528]
Elo difference: 19.3 +/- 32.6, LOS: 87.6 %, DrawRatio: 83.3 %

72 of 100 games finished.
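The Elo, LOS and error figures in the score lines posted in this thread (7 - 2 - 54 and 8 - 4 - 60) can be reproduced approximately from the win/loss/draw counts. The Python sketch below uses the usual logistic Elo formula, a normal approximation for LOS, and a 95% interval on the mean score. It is one common way to compute such numbers, not a copy of cutechess's code, so the last digit can differ slightly.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an average score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


def match_stats(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (score, elo, error_margin, los) for a W-L-D result."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result around the mean score.
    var = (wins * (1.0 - score) ** 2
           + losses * (0.0 - score) ** 2
           + draws * (0.5 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    # Half-width of the 95% interval on the score, converted to Elo.
    margin = (elo_from_score(score + z * stderr)
              - elo_from_score(score - z * stderr)) / 2.0
    # Likelihood of superiority: draws carry no information here.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return score, elo_from_score(score), margin, los


for w, l, d in [(7, 2, 54), (8, 4, 60)]:
    score, elo, margin, los = match_stats(w, l, d)
    print(f"{w}-{l}-{d}: score {score:.3f}, Elo {elo:.1f} +/- {margin:.1f}, "
          f"LOS {100 * los:.1f}%")
```

With a margin of about 32 Elo on either result, the 95% interval still includes zero, so neither score line separates the two settings on its own.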
Now, in my quest to find out how to turn off SMT, this looks very interesting, as it may be possible to run at even greater frequencies with SMT turned off.
https://www.amd.com/system/files/docume ... -guide.pdf
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: What is the value of logical cores ( HT) for chess ?

Post by corres »

MikeB wrote: Mon Feb 24, 2020 1:53 pm ...
Now, in my quest to find out how to turn off SMT, this looks very interesting, as it may be possible to run at even greater frequencies with SMT turned off.
https://www.amd.com/system/files/docume ... -guide.pdf
That is quite possible, because the max. frequency depends on the max. temperature of the CPU.
Generally, AMD SMT can be turned on/off in the BIOS.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: What is the value of logical cores ( HT) for chess ?

Post by Zenmastur »

MikeB wrote: Mon Feb 24, 2020 1:53 pm ...
I have seen similar things on my machine. Much higher CPU temps, watts drawn, for an approximately 50% increase in NPS.

There are so many variables in play that no simple analysis can be made. Thread switching on a core is a very expensive and time-consuming process. If the task switch is triggered by a “relatively” low latency event, like a memory access during a refresh cycle, then it will switch, do very little work, and then switch back to the original thread. This “thrashing” of threads is what you are experiencing. When a thread switch is triggered by a long latency event, say a request to an SSD etc., then much more useful work can be done before the thread switches back. In the latter case the thread switch is worth it, as a large amount of work can be done before the other thread becomes unblocked.

I think one of the problems is that there isn't enough control over what the second thread is allowed to work on. If it starts working on a search where most of the TT hits are found in the cache of another core, or even worse on a core on another die, then the amount of work the thread can get done in a given time is limited due to the extra latency involved.

Even something as simple as TT size can be an issue. Having small TT's when using SMT may have enough of an effect on latency to have an effect on thread switching performance. When playing test games, it might be beneficial to have both engines locked to the same core. Having one engine on one core, and the second engine on a core on another die could create problems.
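To experiment with where threads run, one starting point on Linux is to list one logical CPU per physical core from sysfs and pin a process to that set. This is a sketch under assumptions: the sysfs path and `os.sched_setaffinity` are Linux-only (Windows would need a different tool), the helper names are made up, and pinning a process does not by itself control which engine thread lands on which core.

```python
import glob
import os


def parse_cpu_list(text: str) -> list:
    """Parse a sysfs CPU list such as '0,32' or '0-1' into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists) -> list:
    """Keep the lowest-numbered logical CPU from each group of SMT siblings."""
    return sorted({min(parse_cpu_list(s)) for s in sibling_lists if s.strip()})


def read_sibling_lists() -> list:
    """Read each CPU's SMT sibling list from Linux sysfs (empty elsewhere)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as fh:
            lists.append(fh.read())
    return lists


if __name__ == "__main__":
    cores = one_cpu_per_core(read_sibling_lists())
    print("one logical CPU per physical core:", cores)
    if cores and hasattr(os, "sched_setaffinity"):
        # Only request CPUs this process is actually allowed to use.
        target = set(cores) & os.sched_getaffinity(0)
        if target:
            os.sched_setaffinity(0, target)  # applies to children started later too
```

An engine started from this process afterwards inherits the affinity mask, which gives a way to compare "real cores only" against "all logical CPUs" without a trip to the BIOS.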

There are so many things to consider that “trial and error” tests seem to be in order.

Regards,

Zenmastur