SF3: Using Sleeping Threads makes big difference

Gusev · Post by **Gusev** » Sun May 12, 2013 7:42 pm

syzygy wrote:Simply comparing nps between 4 threads with "sleeping threads" enabled and 4 threads with "sleeping threads" disabled should already be sufficient to determine what works best. Same with 8 threads.

I just ran a quick test on my (6-core) computer with 6 and 12 threads (on Linux). With 6 threads, Stockfish seems to give slightly higher nps with "Use Sleeping Threads" set to false. With 12 threads, Stockfish gives higher nps with "Use Sleeping Threads" set to true.

You seem to disagree with Houdart, http://www.cruxis.com/chess/manual/inde ... gement.htm:

The architecture of Houdini (and of chess engines in general) is not very well suited for hyper-threading; using more threads than physical cores will usually degrade the performance of the engine. Although the hyper-threads often produce a slightly higher node speed, the increased inefficiency of the parallel alpha-beta search more than offsets the speed gain obtained with the additional hyper-threads.

To give a practical example, it's more efficient to use 4 threads running at 2,000 kN/s each than 8 threads running at 1,100 kN/s each, although the latter situation produces a higher total node speed.

For this reason it's best to set the number of threads not higher than the number of physical cores of your hardware.

In the quote above, Houdart clearly states that higher nps does not guarantee more engine strength with hyperthreading.
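To put numbers on the quoted example, here is a small sketch of the arithmetic. The per-thread speeds are the ones from the quote. The "break-even efficiency" figure is my own simplifying assumption (it treats useful work as raw node speed times a single search-efficiency factor) and is not something Houdart's manual states.

```python
def total_knps(threads, knps_per_thread):
    """Total node speed in kN/s for a given thread count."""
    return threads * knps_per_thread

# Figures taken from the quoted example.
four = total_knps(4, 2000)    # 4 threads on 4 physical cores
eight = total_knps(8, 1100)   # 8 hyper-threads on the same 4 cores

# Assumption (not from the quote): the 8-thread search only pays off
# if its parallel efficiency, relative to the 4-thread search, stays
# above this ratio.
break_even = four / eight

print(f"4 threads: {four} kN/s, 8 threads: {eight} kN/s")
print(f"8 threads must keep at least {break_even:.1%} of the 4-thread efficiency")
```

So the 8-thread setup searches 10% more nodes in total, and under that assumption any extra search overhead above roughly 9% wipes out the gain.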

hgm · Post by **hgm** » Sun May 12, 2013 8:31 pm

That is completely beside the point. Syzygy is comparing sleeping threads on vs sleeping threads off. Houdart says something about HT on vs HT off. You might as well have quoted a report that stated flying is faster than driving...

syzygy · Post by **syzygy** » Sun May 12, 2013 8:50 pm

Gusev wrote:
syzygy wrote:Simply comparing nps between 4 threads with "sleeping threads" enabled and 4 threads with "sleeping threads" disabled should already be sufficient to determine what works best. Same with 8 threads.

I just ran a quick test on my (6-core) computer with 6 and 12 threads (on Linux). With 6 threads, Stockfish seems to give slightly higher nps with "Use Sleeping Threads" set to false. With 12 threads, Stockfish gives higher nps with "Use Sleeping Threads" set to true.
You seem to disagree with Houdart, http://www.cruxis.com/chess/manual/inde ... gement.htm:
The architecture of Houdini (and of chess engines in general) is not very well suited for hyper-threading; using more threads than physical cores will usually degrade the performance of the engine. Although the hyper-threads often produce a slightly higher node speed, the increased inefficiency of the parallel alpha-beta search more than offsets the speed gain obtained with the additional hyper-threads.

To give a practical example, it's more efficient to use 4 threads running at 2,000 kN/s each than 8 threads running at 1,100 kN/s each, although the latter situation produces a higher total node speed.

For this reason it's best to set the number of threads not higher than the number of physical cores of your hardware.
In the quote above, Houdart clearly states that higher nps does not guarantee more engine strength with hyperthreading.

There is no disagreement at all. I am talking about comparing 4 threads to 4 threads and 8 threads to 8 threads. I am not talking about comparing 4 threads to 8 threads.

If you keep the number of threads constant (whether it is 4 threads on 4 cores or 8 hyperthreads on 4 cores), then higher nps means increased playing strength.

The "increased inefficiency of the parallel alpha-beta search" refers to the doubling of threads. If you don't double the number of threads, there is no increased inefficiency.

Gusev · Post by **Gusev** » Sun May 12, 2013 10:46 pm

syzygy wrote:
Gusev wrote:
syzygy wrote:Simply comparing nps between 4 threads with "sleeping threads" enabled and 4 threads with "sleeping threads" disabled should already be sufficient to determine what works best. Same with 8 threads.

I just ran a quick test on my (6-core) computer with 6 and 12 threads (on Linux). With 6 threads, Stockfish seems to give slightly higher nps with "Use Sleeping Threads" set to false. With 12 threads, Stockfish gives higher nps with "Use Sleeping Threads" set to true.
You seem to disagree with Houdart, http://www.cruxis.com/chess/manual/inde ... gement.htm:
The architecture of Houdini (and of chess engines in general) is not very well suited for hyper-threading; using more threads than physical cores will usually degrade the performance of the engine. Although the hyper-threads often produce a slightly higher node speed, the increased inefficiency of the parallel alpha-beta search more than offsets the speed gain obtained with the additional hyper-threads.

To give a practical example, it's more efficient to use 4 threads running at 2,000 kN/s each than 8 threads running at 1,100 kN/s each, although the latter situation produces a higher total node speed.

For this reason it's best to set the number of threads not higher than the number of physical cores of your hardware.
In the quote above, Houdart clearly states that higher nps does not guarantee more engine strength with hyperthreading.
There is no disagreement at all. I am talking about comparing 4 threads to 4 threads and 8 threads to 8 threads. I am not talking about comparing 4 threads to 8 threads.

If you keep the number of threads constant (whether it is 4 threads on 4 cores or 8 hyperthreads on 4 cores), then higher nps means increased playing strength.

The "increased inefficiency of the parallel alpha-beta search" refers to the doubling of threads. If you don't double the number of threads, there is no increased inefficiency.

There may be more than one way to increase inefficiency. It's conceivable that switching from sleeping threads to spinning threads would do that and thus offset the benefit of higher nps, in a full analogy with what Houdart described. However, I will follow up by measuring the nps for my system and report.

Gusev · Post by **Gusev** » Sun May 12, 2013 10:54 pm

hgm wrote:Houdart says something about HT on vs HT off.

Not exactly. Hyperthreaded architecture remains hyperthreaded, 4 threads or 8 threads. Houdart merely asserts that there is no benefit of running 8 threads on 4 hyperthreaded cores, despite higher nps. Generalizing that example, I conclude that higher nps does not always guarantee stronger engine performance.

hgm · Post by **hgm** » Sun May 12, 2013 11:14 pm

Then you generalize it the wrong way. If the number of threads is the same, then higher nps will guarantee stronger engine performance. And switching the option 'sleeping threads' on or off will not alter the number of threads. So Houdart's observation is completely irrelevant for the topic under discussion.

That switching the option could 'conceivably have the same effect as changing the number of threads' is about as ridiculous as assuming that the color of your tie could have that effect. Have you already tested if Stockfish becomes stronger when you wear a pink tie?

Gusev · Post by **Gusev** » Sun May 12, 2013 11:28 pm

And switching the option 'sleeping threads' on or off will not alter the number of threads.

However, it will alter the threads' behavior. And this change may be making a difference, according to my test results.

hgm · Post by **hgm** » Sun May 12, 2013 11:47 pm

So you bungled the test...

Remember that to the ignorant anything is conceivable. This is known as 'superstition'.

Gusev · Post by **Gusev** » Mon May 13, 2013 12:24 am

hgm wrote:So you bungled the test...

Remember that to the ignorant anything is conceivable. This is known as 'superstition'.

Four times? I ran a full matrix, there is nothing to bungle there. There were four pairs of cases of sleeping threads ON/OFF. Besides, Owl reported the same phenomenon before me, that's why I checked. You are being rude for no obvious reason. You must agree that sleeping threads and spinning threads are not doing the same exact thing.

hgm · Post by **hgm** » Mon May 13, 2013 11:28 pm

Control-flow-wise they are exactly the same thing. The only difference is whether they waste CPU time while they are waiting or not, and thereby slow down the threads that are running (if the threads are competing for CPU, because you have more threads than cores).
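A minimal sketch of that distinction, in Python for brevity. This is not Stockfish's code (Stockfish is written in C++ and has its own thread-waiting logic); the function names and the dummy "work" are hypothetical. Both waiters do exactly the same thing once released; the only measured difference is the CPU time consumed while waiting.

```python
import threading
import time

def wait_spinning(flag):
    """Busy-wait: poll the flag in a loop, burning CPU until it is set."""
    while not flag.is_set():
        pass

def wait_sleeping(flag):
    """Block in the OS until the flag is set; uses almost no CPU."""
    flag.wait()

def run_waiter(wait_fn, delay=0.2):
    """Start a waiting thread, release it after `delay` seconds, and
    return (result, CPU seconds the thread consumed)."""
    flag = threading.Event()
    out = {}

    def worker():
        start = time.thread_time()          # per-thread CPU clock
        wait_fn(flag)                       # idle phase: spin or sleep
        out["result"] = sum(range(1000))    # stand-in for search work
        out["cpu"] = time.thread_time() - start

    t = threading.Thread(target=worker)
    t.start()
    time.sleep(delay)                       # the thread has nothing to do yet
    flag.set()                              # hand it work
    t.join()
    return out["result"], out["cpu"]

spin_result, spin_cpu = run_waiter(wait_spinning)
sleep_result, sleep_cpu = run_waiter(wait_sleeping)
print(f"spinning: result={spin_result}, cpu={spin_cpu:.3f}s")
print(f"sleeping: result={sleep_result}, cpu={sleep_cpu:.3f}s")
```

The results are identical; the spinning waiter just uses CPU time during the idle phase, which only matters if another thread could have used that CPU.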

There are many ways to mess up test results. For one, the result is not very significant: it is not as if you see a 10-sigma deviation; it is barely above 1-sigma, so it can very easily be nothing but noise. And perhaps the games are not as independent as you think, increasing the standard deviation beyond what the simple sqrt(N) formula assumes? E.g. what if one engine in one of the tests happened to be mapped into memory in such a way that there were more collisions in the Level-2 cache in one run than in the other (slowing it down consistently for the entire match)? What steps did you take to exclude that?
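For reference, a rough sketch of how the "simple sqrt(N)" significance estimate can be computed from a match result. The win/draw/loss counts below are hypothetical, made up purely for illustration; they are not the figures from the test discussed in this thread. The calculation assumes independent games, which is exactly the assumption questioned above.

```python
import math

def score_sigma(wins, draws, losses):
    """Return (score, standard error, deviation from 50% in sigmas),
    assuming every game is an independent sample."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    std_err = math.sqrt(var / n)            # the sqrt(N) scaling
    return score, std_err, (score - 0.5) / std_err

# Hypothetical 100-game match: +40 =30 -30.
score, std_err, sigmas = score_sigma(40, 30, 30)
print(f"score={score:.3f}, std_err={std_err:.4f}, deviation={sigmas:.2f} sigma")
```

With these made-up numbers a 55% score over 100 games is only about 1.2 sigma from an even result, which is well within what noise alone can produce. If the games are correlated, the true uncertainty is larger than this estimate.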

When you know one thing from well-founded first principles, and you see another thing from an inherently difficult, temperamental and noisy test method, the logical conclusion is to reject the test, not to doubt the principles. Just like you would do when a scientist claims to have measured cold fusion, or to have measured effects of chemicals that have been diluted 10^100 times, or to have discovered perpetual motion...
