sedicla wrote:In my engine I tried to reduce computing operations by using tables. Now I'm not sure if this was a good idea after all.
I think I'll leave this way for now. Maybe someday I'll review or write a new engine with fewer tables and compare. As long it is fun, why not.
I also expect that newer cpu's will take care of this.

Cache misses didn't use to be so expensive, say 10 years ago. The problem is that CPUs have been getting faster and better at doing lots of calculations simultaneously, while the performance of DRAM has not kept up. Yes, we have gigabytes of it now (yay!), but the downside is that it can take many, many CPU cycles (hundreds, even) to request data from DRAM and wait for it to arrive; the request has to go through memory controllers and address decoders and refresh logic and all kinds of crazy stuff. Unless someone invents a new miracle storage technology that is both very dense and very fast to access, this situation is not going to improve much anytime soon.
At the other end of the spectrum is SRAM, which is fast but takes up much more chip space per bit than DRAM. So the smallest, fastest level of cache (L1) is made of that stuff. A typical size for that might be 64 KB, divided into cache lines of 64 bytes, and data that's already in the L1 cache can be accessed in just a few cycles. Then there are typically one or two larger cache levels (L2, and often a shared L3) that take the CPU a little longer to reach, but are still much faster than DRAM: an L2 or L3 access probably takes a few tens of cycles. Those caches might be a few MB in size.
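As a tiny illustration of that 64-byte line size (my own sketch, not from the post): if an array holds 8-byte entries, eight consecutive entries share one cache line, so a miss on one of them pulls its seven neighbours into L1 for free. The names CACHE_LINE_BYTES and line_of are made up here:

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64  /* typical line size, as described above */

/* Which cache line a given element of a uint64_t array lands in.
 * Elements 0..7 share line 0, elements 8..15 share line 1, and so on,
 * so touching one element effectively loads eight of them. */
static size_t line_of(size_t index)
{
    return (index * sizeof(uint64_t)) / CACHE_LINE_BYTES;
}
```

This is why scanning an array front-to-back only misses once per eight elements, while hopping around misses almost every time.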
I guess the bottom line is: try to avoid doing random/unpredictable memory accesses, because they will probably cause cache misses and be slooooooow as molasses. (Or if you know the address you need to access in advance, you can do a prefetch and then calculate something else while it's fetching the data.) Very small lookup tables are no problem, and medium-sized ones (a few dozen KB) are usually no problem either, because if you're using them heavily they will mostly stay cached. Very predictable accesses (like touching each element of an array in order, from first to last) are usually efficient because the CPU detects the pattern of accesses and starts automatically prefetching data.
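The "prefetch, then calculate something else" idea might look like this in GCC/Clang C. __builtin_prefetch is a real compiler builtin on those toolchains; the table size, the key-to-index mapping, and the busy-work are all illustrative:

```c
#include <stddef.h>
#include <stdint.h>

enum { TABLE_SIZE = 1 << 20 };      /* 1M entries, 8 MB: too big to stay cached */

static uint64_t table[TABLE_SIZE];

uint64_t lookup_with_prefetch(uint64_t key, uint64_t other_work_input)
{
    size_t idx = (size_t)(key & (TABLE_SIZE - 1));

    /* Start pulling the entry's cache line toward L1 now... */
    __builtin_prefetch(&table[idx]);

    /* ...and overlap the DRAM latency with unrelated computation
     * (here just some made-up mixing arithmetic). */
    uint64_t busy = other_work_input * 0x9E3779B97F4A7C15ULL;
    busy ^= busy >> 29;

    /* By the time we actually read the entry, it is hopefully cached. */
    return table[idx] ^ busy;
}
```

The prefetch is only a hint: it costs almost nothing if the line is already cached, and it can't help if there's no independent work to overlap with the fetch.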
Random accesses to large tables (e.g. transposition table) are practically always going to be hella slow, unless you just accessed that same entry recently.
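One common way engines soften that cost (a hedged sketch of the technique, not anything specific from this thread): issue the prefetch as soon as the position's hash key is known, so the DRAM fetch overlaps with the rest of move setup before the probe actually happens. The names TTEntry, tt_prefetch, and tt_probe are illustrative:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t key;       /* full hash key, to verify the entry matches */
    int16_t  score;
    uint8_t  depth;
    uint8_t  flags;
} TTEntry;              /* 16 bytes with padding */

enum { TT_ENTRIES = 1 << 22 };      /* 4M entries, 64 MB: dwarfs every cache level */
static TTEntry tt[TT_ENTRIES];

/* Call this the moment the new hash key is known (e.g. right after
 * making a move), long before the entry is actually needed. */
static inline void tt_prefetch(uint64_t key)
{
    __builtin_prefetch(&tt[key & (TT_ENTRIES - 1)]);
}

/* The probe then hits a cache line that is hopefully already in flight. */
static TTEntry *tt_probe(uint64_t key)
{
    TTEntry *e = &tt[key & (TT_ENTRIES - 1)];
    return e->key == key ? e : NULL;
}
```

Keeping an entry within one 64-byte cache line also matters here: a probe then costs exactly one memory access instead of two.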