TCEC resurrection - need to decide between ponder ON or OFF

Don · Post by **Don** » Wed Jan 02, 2013 8:28 pm

Lavir wrote:
Laskos wrote: Why not 4 cores with HT on? You will still have some CPU time available for other small tasks.
That would surely be the best solution if the PC is not used for much else. However, if he wants to run some other application whose CPU use is semi-permanent rather than limited to short spikes, using 4 cores with HT can produce some unpredictability, as Windows will switch physical cores around (and probably assign some threads to logical processors instead), especially if the engines are set to lower priority. Using affinities will prevent this, but in that case it will be just like using the full 4 cores without HT, and could produce the same problems that would arise in that case.

So it all depends on how the PC is used. If it is only surfing etc., then 4 cores/HT would probably be the best solution, but if the PC is used for something more demanding (for example Photoshop or similar applications that need CPU time constantly), then it's better to use 3 cores; that way you are sure that the engines are always running on physical cores.

CPUs don't work that way. We pretend that a 4-core machine is like 4 separate 1-core machines, but that isn't the way it works.

I did some studies a while back which seemed to indicate that threads do not get a very even allocation. If you have a quad and run 4 equal jobs, things will be relatively balanced. If you run 3, for some reason there was no balance; one or two would get more resources. Even in the presence of additional jobs such as a GUI, we have found that this is the best formula for fairness: 4 cores, 4 tests.

Don't take my word for it. Set up a test script. Start up 1, 2, 3 or 4 programs at the same time, have each do a deep fixed-depth search and exit, and look at the times of each run. CPUs have advanced since I last performed this basic test, so instead of guessing, give it a try!
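A minimal sketch of such a test script, in Python. The workload here is a placeholder CPU-bound one-liner, not an engine; the function names `timed_run` and `run_concurrent` are made up for illustration. Substitute whatever command makes your engine do a fixed-depth search and exit.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(cmd):
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def run_concurrent(cmd, copies):
    """Start `copies` instances of cmd at nearly the same time; return each one's time."""
    # One thread per copy, so every process is launched without waiting for the others.
    with ThreadPoolExecutor(max_workers=copies) as pool:
        return list(pool.map(timed_run, [cmd] * copies))


if __name__ == "__main__":
    # Placeholder workload (an assumption): replace with your engine's fixed-depth command.
    cmd = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
    for copies in (1, 2, 3, 4):
        times = run_concurrent(cmd, copies)
        print(f"{copies} copies:", " ".join(f"{t:.3f}s" for t in times))
```

If the times within one row differ a lot, the copies are not getting equal resources. Repeating each row a few times helps separate real imbalance from noise.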

We also have discovered that running 2 copies of the same binary gives that binary an advantage. This is in spite of the fact that our tester loads each program from scratch before each game. Maybe Bob or someone else can explain this but I assume that if you load the same binary it will share the same address space and perhaps gain something from caching.

The memory footprint of each program affects the other programs too. So testing on a single CPU is never going to be completely fair, but there is not much that can be done about this. The proper way to test is to run each program on its own CPU and give it dedicated access, but for SP programs that is a big waste of resources since it leaves a lot of cores idling.

Don · Post by **Don** » Wed Jan 02, 2013 8:30 pm

Martin Thoresen wrote:
Lavir wrote: That would surely be the best solution if the PC is not used for much else. However, if he wants to run some other application whose CPU use is semi-permanent rather than limited to short spikes, using 4 cores with HT can produce some unpredictability, as Windows will switch physical cores around (and probably assign some threads to logical processors instead), especially if the engines are set to lower priority. Using affinities will prevent this, but in that case it will be just like using the full 4 cores without HT, and could produce the same problems that would arise in that case.

So it all depends on how the PC is used. If it is only surfing etc., then 4 cores/HT would probably be the best solution, but if the PC is used for something more demanding (for example Photoshop or similar applications that need CPU time constantly), then it's better to use 3 cores; that way you are sure that the engines are always running on physical cores.
Fabio, spot on. That's exactly why I am limiting the number of cores to 3 physical instead of all 4/HT.

As I hinted at in my previous post, that may be an unnecessary superstition that could make matters worse. I will do a test to see how this comes out on my machine - maybe it's better your way but maybe it's not.

Don

Lavir · Post by **Lavir** » Wed Jan 02, 2013 8:49 pm

Don wrote: The memory footprint of each program affects the other programs too. So testing on a single CPU is never going to be completely fair, but there is not much that can be done about this. The proper way to test is to run each program on its own CPU and give it dedicated access, but for SP programs that is a big waste of resources since it leaves a lot of cores idling.

Oh, this I know. It is for exactly this reason that every time I hear people say something like "using ponder ON is bad because engines share resources" I facepalm; with only 1 PC there will always be sharing of resources, sadly, and there's no way around this.

Even the hash memory will be "shared" in part (and this can be easily tested), and so PB ON/OFF makes no difference at all. The only way not to share is to use 2 PCs, so to some extent all the tests done by testers with only 1 PC have "dirty" results.

As far as threads are concerned, you are right there too, on the whole. But in this specific case, if you run 2 engines that do 100% of their work in 3 threads (a permanent load), then the OS will have allocated those threads appropriately, so other work goes to whatever thread capacity is not allocated. Whether that is part of one thread or a full one doesn't matter; the important thing is that this capacity is available to be allocated. Otherwise some will have to be freed, and it will be taken from what is already allocated (and so in use by the engines). If the unallocated capacity comes only from logical cores, the OS will try to free capacity from the physical ones when the other work has higher priority and/or is semi-permanent in nature (hence my example of the difference between surfing and using something like Photoshop).

Don · Post by **Don** » Wed Jan 02, 2013 9:10 pm

Lavir wrote:
Don wrote: The memory footprint of each program affects the other programs too. So testing on a single CPU is never going to be completely fair, but there is not much that can be done about this. The proper way to test is to run each program on its own CPU and give it dedicated access, but for SP programs that is a big waste of resources since it leaves a lot of cores idling.
Oh, this I know. It is for exactly this reason that every time I hear people say something like "using ponder ON is bad because engines share resources" I facepalm; with only 1 PC there will always be sharing of resources, sadly, and there's no way around this.

Even the hash memory will be "shared" in part (and this can be easily tested), and so PB ON/OFF makes no difference at all.

As far as threads are concerned, you are right there too, on the whole. But in this specific case, if you run 2 engines that do 100% of their work in 3 threads (a permanent load), then the OS will have allocated those threads appropriately, so other work goes to whatever thread capacity is not allocated. Whether that is part of one thread or a full one doesn't matter; the important thing is that this capacity is available to be allocated. Otherwise some will have to be freed, and it will be taken from what is already allocated (and so in use by the engines).

I did a quick test and the problem is not like it used to be. I'm using a relatively new i5 - 4 cores but no hyper-threading.

In a script I start 1, 2, 3 or 4 copies of "Komodo -bench" at the same time and list the nodes per second of each copy. The node counts are deterministic, so this basically measures the relative speed:

1 core : 1.5223
2 cores: 1.4763 1.4763
3 cores: 1.4618 1.4610 1.4601
4 cores: 1.4048 1.3866 1.3908 1.4084

So it turns out that Martin's idea is correct: use 3 threads for testing and leave one free for the GUI. With 3 cores I get very stable numbers compared to using 4 cores.
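To put numbers on "stable", here is a small Python sketch that takes the figures posted above and computes, for each row, the mean speed relative to the single-copy run and the spread between the fastest and slowest copy. The helper name `summarize` and the choice of (max - min) / mean as the spread measure are illustrative assumptions, not part of the original test.

```python
# Nodes-per-second figures copied from the table above (units as posted).
results = {
    1: [1.5223],
    2: [1.4763, 1.4763],
    3: [1.4618, 1.4610, 1.4601],
    4: [1.4048, 1.3866, 1.3908, 1.4084],
}


def summarize(rates, baseline):
    """Return (mean speed relative to baseline, spread between copies as a fraction of the mean)."""
    mean = sum(rates) / len(rates)
    spread = (max(rates) - min(rates)) / mean
    return mean / baseline, spread


baseline = results[1][0]
for copies, rates in results.items():
    relative, spread = summarize(rates, baseline)
    print(f"{copies} copies: {relative:.3f} of single-copy speed, spread {spread:.2%}")
```

On these figures the spread between copies is roughly 0.1% with 3 copies and about 1.6% with 4, while each copy runs at about 96% and 92% of the single-copy speed respectively.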

What is also interesting is something you probably already know. In a quad the separate cores are far from independent. Each utilized core has an impact on the other cores.

As far as ponder is concerned, I see no problem with pondering if you allocate an additional core for each program. That restricts you to 2 matches on a 4-core machine for pondering. I would be pretty leery of leaning on hyperthreading, but hyperthreading clearly does work. We generally allocate 8 matches to 4 cores on our hyperthreaded machines and it definitely boosts the throughput. If you are doing fixed-depth searches you will get more games per second by over-provisioning them this way. Some of this boost could be related to I/O overhead when playing really fast games. When doing I/O the program is basically idling, and over-provisioning can take advantage of this idle time.

Testing with ponder, however, is a waste of resources unless you are specifically testing the ponder algorithm itself. The reason should be obvious: you utilize an entire extra core to ponder but you get a relatively small advantage for doing so. Pondering is roughly equivalent to getting 30% more time on average, but at 100% of the cost. It's a really bad deal! If you do business this way you are probably poor!

For watching games I especially don't like pondering, but that is an individual matter. I don't like to see a move responded to instantly if I am focused on watching the progress of the games and trying to understand them. It's a little thing, I admit, but it can be annoying to me. Komodo sometimes responds quickly anyway, even without pondering, but it is usually not an instant thing and you have some time to mentally digest the change. It has happened with pondering that I get confused about who is to move if I was not looking at the board the instant both moves came flying in.
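The arithmetic behind that can be sketched in a few lines of Python. This assumes single-threaded engines, one core per game with ponder off and two cores per game with ponder on (one per engine), and takes the 30% figure above at face value; the function name `ponder_tradeoff` is made up for illustration.

```python
def ponder_tradeoff(cores, time_gain=0.30):
    """Compare concurrent games and thinking time per core with and without pondering.

    Assumptions (not measured): one core per game with ponder off, two cores
    per game with ponder on, and pondering worth `time_gain` extra time.
    """
    games_off = cores        # ponder off: one game per core
    games_on = cores // 2    # ponder on: each game occupies two cores
    # Effective thinking time obtained per core spent, relative to ponder off.
    time_per_core = (1.0 + time_gain) / 2.0
    return games_off, games_on, time_per_core


games_off, games_on, time_per_core = ponder_tradeoff(4)
print(f"ponder off: {games_off} games at once")
print(f"ponder on : {games_on} games at once, {time_per_core:.2f}x thinking time per core")
```

Under these assumptions a quad runs half as many games at once with ponder on, and each core buys only about 0.65 of the thinking time it would with ponder off.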

Don

Modern Times · Post by **Modern Times** » Wed Jan 02, 2013 9:24 pm

Martin Thoresen wrote: Even though people have 4 cores (or the more "bogus" 6-8 cores from AMD), one should consider the speed throughput. The latest Intel (Ivy Bridge) is still far superior to anything AMD have, clock for clock.

Well, one thing is for sure: Ivy Bridge's 8 cores (4 cores + HT) are way more bogus than AMD's 8 cores on the Piledriver CPU. This comment is in relation to the new proposal to use 7 threads on the Intel!!
