Re: Removing Large Arrays
Posted: Thu Mar 12, 2020 10:00 pm
Fighting against patches like this in Stockfish is a long-lost fight, and pointless. A waste of time. Just a bit sad that noob's resources are wasted on this.
Why is it irrelevant?

Dann Corbit wrote: ↑Wed Mar 11, 2020 11:13 pm
The new code is an abomination {to me, which means little}.
Stockfish is so crammed with magic numbers that it should be a Disney movie with wands and sorcerer's apprentices and bippity-boppity-boos.
My only point was that a 20 line program to compare the two methods is not valid. You have to profile both versions.
I do not defend the new code, and my personal preference that there should be meaningful constants or comments is also irrelevant.
Only thing I disagree with is "- ... to strive and keep every file under 200 lines of code if at all possible"

mvanthoor wrote: ↑Thu Mar 12, 2020 10:14 pm
Why is it irrelevant?
Dann Corbit wrote: ↑Wed Mar 11, 2020 11:13 pm
The new code is an abomination {to me, which means little}.
Stockfish is so crammed with magic numbers that it should be a Disney movie with wands and sorcerer's apprentices and bippity-boppity-boos.
My only point was that a 20 line program to compare the two methods is not valid. You have to profile both versions.
I do not defend the new code, and my personal preference that there should be meaningful constants or comments is also irrelevant.
It's a long-standing pet peeve of mine. Everywhere I've worked, and in many open-source engines where I've had a glance through the code, I've seen "magic numbers" and "magic strings" floating around; not to mention functions 200-300 lines long.
I refuse to do it in my own engine. I want...
- ... every magic number removed and put into a constant.
- ... variable names to be consistent (as much as possible)
- ... functions to be as short as possible, at least with regard to "active" code (i.e. code that is not declaring variables)
- ... to strive and keep every file under 200 lines of code if at all possible
- ... every function to be commented
and so on.
My engine will be open source at some point, and I'll be rejecting any patch that doesn't follow Rust's and my code standards. Not saying my coding standards are the best in the world (they probably aren't), but I like to think I'm writing software that is understandable; at least to people who have some experience in programming. And, I'd like to have a program I can actually understand without having to calculate my way through every function, even if I happen to not look at it for two years at some point and then pick it up again.
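To put the first bullet in concrete terms, here's a minimal C++ sketch. The function name, the constant name RazorMargin, and the value 531 are all invented for the illustration, not taken from any particular engine:

```cpp
namespace before {
    // A "magic number" buried in an expression: nobody remembers what 531 means.
    bool razor_cutoff(int eval, int alpha) { return eval + 531 < alpha; }
}

namespace after {
    // Named once: the identifier documents the intent, and there is
    // a single place to change the value when tuning.
    constexpr int RazorMargin = 531;
    bool razor_cutoff(int eval, int alpha) { return eval + RazorMargin < alpha; }
}
```

Both versions compile to identical code; the difference is purely for the human reading it two years later.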
NOTHING goes "ram -> register". If you start a pre-fetch and then try to access the data before it makes it into cache, the processor treats this just like a normal reference that resulted in a cache miss. The only difference is that this reference has already been partially loaded into cache, so you gain some time, but less than you could gain by prefetching early enough.

D Sceviour wrote: ↑Wed Mar 11, 2020 7:55 pm
It should be possible to emulate the internal processor timers. In the simplified code, the cache demand is singular, so there is no interference from other threads. I get the same ratio results if threads are run in the background to overload the memory and processors. The system does not recognize "Stockfish"; it recognizes thread calls.
Dann Corbit wrote: ↑Wed Mar 11, 2020 7:04 pm
Without demonstration inside the Stockfish code, using a profiler to see where the time is really going, the trivial tests are meaningless.
When there are a hundred arrays and large data objects all competing for the cache, the code will behave very differently from when things are all sitting nicely in the cache.
It will also vary a lot from machine to machine. My 3970x has a very large cache, so many arrays will not be a bother. But on a machine with a small cache, it can be a big problem.
A fetch from main memory is very expensive, a fetch from the outer cache is far less expensive, and a fetch from the inner cache is almost free. So accessing a table is fast or slow, and the reason why is "it depends."
There is of course the prefetch() command to demand cache loading. I use it because it makes things go faster, but I have never been clear on how it works. For example, how does it move the memory?
ram -> L3 -> L2 -> L1 -> register
And what happens if the loop calls for the information before the memory has been loaded into the cache? Does it wait in the queue for the main bus transfer, or does the system just jump to:
ram -> register
Or does all I/O go through one of the caches? It depends.
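For what it's worth, on GCC/Clang the usual spelling is __builtin_prefetch (MSVC has _mm_prefetch), and the pattern is to issue it far enough ahead of the use. A minimal sketch; the look-ahead distance of 8 elements is a made-up tuning knob, not a recommended value:

```cpp
#include <cstddef>

// Sketch: prefetch a fixed distance ahead while walking a table.
// (A linear walk like this is usually caught by the hardware prefetcher
// anyway; the technique really pays off for unpredictable addresses,
// e.g. prefetching a hash-table entry as soon as its key is known.)
long long sum_with_prefetch(const long long* table, std::size_t n) {
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + 8 < n)
            __builtin_prefetch(&table[i + 8]);  // start ram -> L3 -> L2 -> L1 early
        sum += table[i];                        // by now the line is (hopefully) cached
    }
    return sum;
}
```

If the prefetch was issued too late, the load still stalls, just for less time, which matches the "partially loaded" behavior described above.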
Some Xeons allow devices to DMA into the L3 cache (Intel DDIO).

No I/O goes through caches. The I/O hardware has no access to the cache; it is purely device -> memory and the reverse.