TalkChess.com

Posted: **Mon May 08, 2017 5:54 pm**

Zenmastur wrote:...I see I didn't answer one of the questions very well.
Namely this one:

Milos wrote: I also doubt you could gain much by running RAM at a lower frequency. Usually total latency is pretty much constant over quite a range of frequencies.

I don't know why you think "total latency" is constant.

An extreme example might help clarify this. Running DDR4-2400 at 10-12-12-28 1T versus DDR4-2666 at 16-18-18-35 2T will show a significant increase in NPS even though the 2400 is running at a lower speed. The difference is purely a function of latency. In less extreme cases, using the exact same DIMMs, small advantages can be had IN SOME CASES by running high-speed DIMMs at lower frequencies and much lower latencies. This is mostly for those who are on a budget and want to get the highest possible NPS from their equipment. I've done this on a couple of my systems. It can take quite a bit of time tweaking the RAM settings to get them just right, which is why not many people do it.

Regards,

Forrest

Your example is meaningless because it doesn't reflect physical reality.
You can't run DDR4 at 2666 with 16-18-18-35 2T and at the same time at 2400 with 10-12-12-28. You can't magically cut latency by 33% while reducing clock speed by 10%. It doesn't work that way.
Total latency, or total delay, is the product of the clock period and the number of latency cycles, and this total delay is a physical property of the process node used to fabricate the memory. The memory controller on the DIMM is usually so well tuned to the actual dies used on the DIMM that no matter what frequency (<= the maximum frequency) you run it at, you get more or less the same total delay.
Here is a little explanation:
http://www.crucial.com/usa/en/memory-pe ... ed-latency
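
The relation described above can be written out as a short worked equation. This is only a sketch: the symbols $N$ (latency cycles), $t_{CK}$ (clock period) and $R$ (data rate in MT/s) are names chosen here for illustration, and the timings are the ones quoted in this thread. It shows how many cycles DDR4-2400 would need if the total delay really stayed at the DDR4-2666 CL16 value, and what delay CL10 would imply instead.

```latex
\begin{align*}
t_{\text{total}} &= N \cdot t_{CK}, \qquad t_{CK} = \frac{2000}{R}\ \text{ns} \\
R = 2666:\quad t_{CK} &\approx 0.750\ \text{ns}, \quad N = 16 \;\Rightarrow\; t_{\text{total}} \approx 12.0\ \text{ns} \\
R = 2400:\quad t_{CK} &\approx 0.833\ \text{ns}, \quad t_{\text{total}} = 12.0\ \text{ns} \;\Rightarrow\; N \approx 14.4 \\
R = 2400:\quad N &= 10 \;\Rightarrow\; t_{\text{total}} \approx 8.33\ \text{ns}
\end{align*}
```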

Posted: **Mon May 08, 2017 6:15 pm**

[quote="shrapnel"]
...
One guy even went so far as to suggest that in the case of the 1800X, given very fast RAM, there would almost be no need to overclock the CPU !!
...
[/quote]

Where does that man live?
Writing something is one thing; proving it is another...
In reality the chess performance of a CPU depends on RAM speed by only a few percent.
Naturally there are other tasks that are more sensitive to RAM speed than chess engines.
But there is nothing that can replace overclocking the CPU!

Posted: **Mon May 08, 2017 6:28 pm**

[quote="Milos"]
...
The memory controller on the DIMM is usually so well tuned to the actual dies used on the DIMM that no matter what frequency (<= the maximum frequency) you run it at, you get more or less the same total delay.
Here is a little explanation:
http://www.crucial.com/usa/en/memory-pe ... ed-latency[/quote]

At last, here are the correct answers to our questions.
My experiments agree with what is written there.

Posted: **Mon May 08, 2017 6:31 pm**

Zenmastur wrote: Oh really???

The fact is, any time the page count for the TT exceeds the TLB size there will be misses. A 16 GB TT has a total of 2^22 4 KB pages. A Haswell CPU has a TLB of 1024 entries. This number assumes 4 KB pages, as the number of entries decreases as page size is increased. TT probes access pages in a pseudo-random manner. This means that on average the referenced page will only be in the TLB about 1/(2^22/2^10) = 0.0244% of the time. The other 99.976% of the time a TLB miss will occur. So I have absolutely no idea why you think “Most TT probes can never be a TLB miss”!
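
The arithmetic in that paragraph can be checked with a few lines of Python. This is a minimal sketch that assumes uniformly random probes and a TLB that simply holds `tlb_entries` distinct pages; the function name is made up for illustration, and real TLBs are multi-level and set-associative, so treat the result as a rough estimate only.

```python
KIB = 1 << 10
GIB = 1 << 30

def tlb_hit_rate(table_bytes: int, page_bytes: int, tlb_entries: int) -> float:
    """Chance that a uniformly random probe lands on a page already in the TLB."""
    pages = table_bytes // page_bytes
    # If every page fits in the TLB at once, probes always hit (ignoring cold misses).
    return min(1.0, tlb_entries / pages)

# Figures from the post: 16 GB table, 4 KB pages, 1024 TLB entries.
pages = (16 * GIB) // (4 * KIB)
rate = tlb_hit_rate(16 * GIB, 4 * KIB, 1024)
print(f"pages: {pages} (2^22 = {2**22})")
print(f"hit rate: {rate:.4%}, miss rate: {1 - rate:.4%}")
```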

As I noted earlier, as page size increases so does the entry size in the TLB. This yields fewer large-page entries in the TLB and even fewer huge-page entries. So while large and huge pages do help, they don't help as much as you might think. In fact, I have read several research papers that claim that for general-purpose use large and huge pages can actually hurt performance.

Again, non-realistic numbers. People normally use values such as 256MB, 512MB or 1GB for the TT, not 16GB. Most engines would not even work with the TT set to 16GB.
Second, large pages are usually 4MB. At worst the number of TLB entries will be halved, since the size of a TLB entry would grow by a factor of 2 (in reality it increases from 12 to 22 bits, so a factor of 1.83), while the total number of pages drops by a factor of 1000.
Those articles you've read are probably tackling the problem of memory fragmentation when large pages are used, but that's a completely different issue.
So in your example, if we use a TT of 1GB and large pages, we will need only 256 entries in the TLB, and the TLB could hold 512 entries, so a TLB miss would practically never happen.
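
To see how the page counts in this exchange depend on table size and page size, here is a small sketch. The TLB capacities (1024 entries for 4 KB pages, 512 for large pages) are simply the figures used in the posts and are treated as assumptions, not as specifications of any particular CPU. The 4 MB page size is the one assumed in the post; 2 MB is included as well because that is the large-page size on x86-64 in 64-bit mode.

```python
KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30

def pages_needed(table_bytes: int, page_bytes: int) -> int:
    """Number of pages required to map the table (rounded up)."""
    return -(-table_bytes // page_bytes)

def tlb_coverage(table_bytes: int, page_bytes: int, tlb_entries: int) -> float:
    """Fraction of the table's pages that can be resident in the TLB at once."""
    return min(1.0, tlb_entries / pages_needed(table_bytes, page_bytes))

# Assumed TLB capacities, taken from the posts rather than from a datasheet.
configs = ((4 * KIB, 1024), (2 * MIB, 512), (4 * MIB, 512))

for tt_bytes in (1 * GIB, 16 * GIB):
    for page_bytes, entries in configs:
        n = pages_needed(tt_bytes, page_bytes)
        cov = tlb_coverage(tt_bytes, page_bytes, entries)
        print(f"TT {tt_bytes // GIB:>2} GiB, page {page_bytes // KIB:>4} KiB: "
              f"{n:>8} pages, TLB covers {cov:.2%}")
```

Under these assumptions a 1 GB table fits entirely in 512 large-page entries, while a 16 GB table does not, which is where the two posters' conclusions diverge.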

Posted: **Mon May 08, 2017 10:04 pm**

Milos wrote: Your example is meaningless because it doesn't reflect physical reality. You can't run DDR4 at 2666 with 16-18-18-35 2T and at the same time at 2400 with 10-12-12-28. You can't magically cut latency by 33% while reducing clock speed by 10%. It doesn't work that way.

https://www.newegg.com/Product/Product. ... 6820233998

So it should be clear that you can buy and run DDR4-2400 at 10-12-12-28, which has much lower latency than your average DDR4. Some of these DIMMs will run at 2666 with slightly slower timings. I know because I've done it. In this particular case the CAS latency when running at 2400 is 8.33 ns. When running at 2666 at CL 12 the latency is ~9.00 ns. Of course you can run the CL at 16 (as in the example I gave), in which case the latency would be ~12.00 ns. The DIMMs I tested ran fine at 2666 12-14-14-31. Of course, I could have run them at 2666 16-18-18-35 just as easily. So your “reality” and mine don't seem to correspond very well.
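
For anyone who wants to reproduce the CAS latency figures for the timings mentioned here, a minimal sketch follows. It relies only on the fact that DDR memory makes two transfers per clock, so the clock period in nanoseconds is 2000 divided by the data rate in MT/s; the function name is made up for illustration, and only the CAS term (the first number of the timing set) is considered.

```python
def cas_latency_ns(data_rate_mts: float, cl: int) -> float:
    """CAS latency in nanoseconds for a given DDR data rate (MT/s) and CL."""
    # DDR transfers twice per clock: clock period (ns) = 2000 / data rate (MT/s).
    clock_period_ns = 2000.0 / data_rate_mts
    return cl * clock_period_ns

# The three settings discussed in the post.
for rate, cl in [(2400, 10), (2666, 12), (2666, 16)]:
    print(f"DDR4-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```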

Milos wrote: Total latency, or total delay, is the product of the clock period and the number of latency cycles, and this total delay is a physical property of the process node used to fabricate the memory. The memory controller on the DIMM is usually so well tuned to the actual dies used on the DIMM that no matter what frequency (<= the maximum frequency) you run it at, you get more or less the same total delay.
Here is a little explanation:
http://www.crucial.com/usa/en/memory-pe ... ed-latency

I don't think this changes anything. It's a paper written for people who don't know anything and aren't likely to go looking on their own, much less experimenting, i.e. they will accept whatever they're told regardless of evidence to the contrary.

Regards,

Forrest

Posted: **Mon May 08, 2017 10:26 pm**

Milos wrote: Again, non-realistic numbers. People normally use values such as 256MB, 512MB or 1GB for the TT, not 16GB. Most engines would not even work with the TT set to 16GB.

I'm wondering where you're getting this information. I don't think your opinion of what “most people” do is accurate or relevant. And I don't use most engines. I use Stockfish, Komodo, Houdini, etc. All of these programs can use TTs of 16 GB or larger.

Milos wrote: Second, large pages are usually 4MB. At worst the number of TLB entries will be halved, since the size of a TLB entry would grow by a factor of 2 (in reality it increases from 12 to 22 bits, so a factor of 1.83), while the total number of pages drops by a factor of 1000.
Those articles you've read are probably tackling the problem of memory fragmentation when large pages are used, but that's a completely different issue. So in your example, if we use a TT of 1GB and large pages, we will need only 256 entries in the TLB, and the TLB could hold 512 entries, so a TLB miss would practically never happen.

Fragmentation is an issue if the system runs 24/7 for months at a time, which mine do, and I can't remember the last time I used a TT as small as 1 GB.

Regards,

Zen

Posted: **Mon May 08, 2017 11:22 pm**

[quote="Zenmastur"]
[quote="Milos"]Your example is meaningless because it doesn't reflect physical reality. You can't run DDR4 at 2666 with 16-18-18-35 2T and at the same time at 2400 with 10-12-12-28. You can't magically cut latency by 33% while reducing clock speed by 10%. It doesn't work that way. [/quote]

https://www.newegg.com/Product/Product. ... 6820233998

So it should be clear that you can buy and run DDR4-2400 at 10-12-12-28, which has much lower latency than your average DDR4. Some of these DIMMs will run at 2666 with slightly slower timings. I know because I've done it. In this particular case the CAS latency when running at 2400 is 8.33 ns. When running at 2666 at CL 12 the latency is ~9.00 ns. Of course you can run the CL at 16 (as in the example I gave), in which case the latency would be ~12.00 ns. The DIMMs I tested ran fine at 2666 12-14-14-31. Of course, I could have run them at 2666 16-18-18-35 just as easily. So your “reality” and mine don't seem to correspond very well.

[quote="Milos"]
Total latency, or total delay, is the product of the clock period and the number of latency cycles, and this total delay is a physical property of the process node used to fabricate the memory. The memory controller on the DIMM is usually so well tuned to the actual dies used on the DIMM that no matter what frequency (<= the maximum frequency) you run it at, you get more or less the same total delay.
Here is a little explanation:
http://www.crucial.com/usa/en/memory-pe ... ed-latency[/quote]

I don't think this changes anything. It's a paper written for people who don't know anything and aren't likely to go looking on their own, much less experimenting, i.e. they will accept whatever they're told regardless of evidence to the contrary.

Regards,

Forrest
[/quote]

A question to you both:
In your opinion, what percentage of a system's performance can be gained from RAM tuning alone?
In my experiments, chess engines were not very sensitive to RAM speed.
A note:
In very complicated systems like PCs, practice sometimes contradicts theory.

Posted: **Tue May 09, 2017 12:42 am**

corres wrote: A question to you both:
In your opinion, what percentage of a system's performance can be gained from RAM tuning alone?
In my experiments, chess engines were not very sensitive to RAM speed.
A note:
In very complicated systems like PCs, practice sometimes contradicts theory.

I'm not sure you gave enough specifics for this question to be answered with much clarity. E.g., are you talking about the difference between buying a computer off the shelf, like a Dell, versus building one from scratch where I get to specify every component? Or are you talking about taking an existing PC and tweaking the RAM timings only? Or taking an existing PC and upgrading the RAM to maximize performance given a particular CPU running at its maximum speed?

Each of these will have completely different answers. The worst case scenario would be buying a name brand PC and then trying to make it run fast. Good luck with that!

The best case is when you're building a new PC and can specify every component.

In the end it depends on your applications. Many applications aren't sensitive to how fast the CPU or RAM is. Some are very sensitive to both. Most chess engines seem to be CPU bound with a moderate dependence on RAM speed. Since a chess engine's access to the TT or any other caching structure is pseudo-random, latency is likely the biggest problem. Therefore RAM with the lowest possible latency will likely produce better results than RAM with the highest bandwidth, assuming the two are mutually exclusive. In real life higher-bandwidth RAM will generally have lower latency than low-bandwidth RAM, but not always. So, if I had to choose between the two, I would choose low-latency RAM over high-bandwidth RAM. But it's probably slightly better to have moderately high-bandwidth RAM with very low latencies. If you're building a system and money is an issue, then it's not clear what would be best unless all quantities are known, i.e. how much more money it takes to get lower latencies versus higher bandwidth, etc.
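
A crude back-of-the-envelope model shows why latency tends to dominate for isolated random probes. Everything below is an assumption made for illustration: a probe is modelled as one fixed access latency plus the time to stream a single 64-byte cache line at peak bandwidth, and the 65 ns and 70 ns latencies are hypothetical round numbers, not measurements. The only hard fact used is that a 64-bit DDR channel moves 8 bytes per transfer.

```python
CACHE_LINE_BYTES = 64

def peak_bandwidth_gbs(data_rate_mts: float, channels: int = 1, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return data_rate_mts * 1e6 * bus_bytes * channels / 1e9

def probe_time_ns(latency_ns: float, data_rate_mts: float, channels: int = 1) -> float:
    """Simplified cost of one random probe: access latency plus one cache-line transfer."""
    # bytes divided by GB/s gives nanoseconds.
    transfer_ns = CACHE_LINE_BYTES / peak_bandwidth_gbs(data_rate_mts, channels)
    return latency_ns + transfer_ns

# Hypothetical comparison: slower but lower-latency RAM vs faster but higher-latency RAM.
low_latency = probe_time_ns(latency_ns=65.0, data_rate_mts=2400)
high_bandwidth = probe_time_ns(latency_ns=70.0, data_rate_mts=2666)
print(f"DDR4-2400, 65 ns latency: {low_latency:.2f} ns per probe")
print(f"DDR4-2666, 70 ns latency: {high_bandwidth:.2f} ns per probe")
```

In this model the transfer term is only a few nanoseconds either way, so the gap between the two setups comes almost entirely from the latency term. It ignores prefetching, overlapping requests and caches, so it says nothing about how large the NPS difference would be in practice.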

Another factor to consider is that RAM latencies change as more RAM is added to the system, due to increased capacitance on the bus. Servers avoid this problem by inserting other electronics between the bus and the RAM, but this has its drawbacks. So if you only need 8 GB of RAM, it's completely different than if you need 256 GB.

I could go on, but without specifics it seems pointless.

Regards,

Forrest

Posted: **Tue May 09, 2017 9:47 am**

[quote="Zenmastur"]

[quote="corres"]A question to you both:
In your opinion, what percentage of a system's performance can be gained from RAM tuning alone?
In my experiments, chess engines were not very sensitive to RAM speed.
A note:
In very complicated systems like PCs, practice sometimes contradicts theory.
[/quote]

I'm not sure you gave enough specifics for this question to be answered with much clarity. E.g., are you talking about the difference between buying a computer off the shelf, like a Dell, versus building one from scratch where I get to specify every component? Or are you talking about taking an existing PC and tweaking the RAM timings only? Or taking an existing PC and upgrading the RAM to maximize performance given a particular CPU running at its maximum speed?

Each of these will have completely different answers. The worst case scenario would be buying a name brand PC and then trying to make it run fast. Good luck with that!

The best case is when you're building a new PC and can specify every component.

In the end it depends on your applications. Many applications aren't sensitive to how fast the CPU or RAM is. Some are very sensitive to both. Most chess engines seem to be CPU bound with a moderate dependence on RAM speed. Since a chess engine's access to the TT or any other caching structure is pseudo-random, latency is likely the biggest problem. Therefore RAM with the lowest possible latency will likely produce better results than RAM with the highest bandwidth, assuming the two are mutually exclusive. In real life higher-bandwidth RAM will generally have lower latency than low-bandwidth RAM, but not always. So, if I had to choose between the two, I would choose low-latency RAM over high-bandwidth RAM. But it's probably slightly better to have moderately high-bandwidth RAM with very low latencies. If you're building a system and money is an issue, then it's not clear what would be best unless all quantities are known, i.e. how much more money it takes to get lower latencies versus higher bandwidth, etc.

Another factor to consider is that RAM latencies change as more RAM is added to the system, due to increased capacitance on the bus. Servers avoid this problem by inserting other electronics between the bus and the RAM, but this has its drawbacks. So if you only need 8 GB of RAM, it's completely different than if you need 256 GB.

I could go on, but without specifics it seems pointless.

Regards,

Forrest
[/quote]

Thanks for the detailed answer.
Based on the many tests that can be read on this subject, I think that:
1. The behavior of a given PC is decisively determined by the CPU and the motherboard used, not by the RAM.
2. Even for very memory-sensitive tasks, the effect of RAM parameters on the performance of a given PC is no more than 10%.
3. Chess programs are more sensitive to the parameters of the CPU and CPU caches than to the parameters of RAM. The only RAM parameter that matters to a chess program user is its capacity in GB.

Best regards

Robert

Posted: **Tue May 09, 2017 10:11 am**

[quote="shrapnel"]
And you should "see" properly what I've written.
I'm building a SECOND System, more out of curiosity really.
My PRIMARY System will always be INTEL !
[/quote]

I eagerly await your test results.
Except for results like an i7 5960X producing a Fritzmark of 60 at 3500(!) MHz...

RAM speed and engine strength

Re: After re-reading my previous post...
