Code: Select all
#define RATT1(f) rays[(f << 6) | key000(BOARD, f)]           // rook, rank attacks
#define RATT2(f) rays[(f << 6) | key090(BOARD, f) | 0x1000]  // rook, file attacks
#define BATT3(f) rays[(f << 6) | key045(BOARD, f) | 0x2000]  // bishop, diagonal attacks
#define BATT4(f) rays[(f << 6) | key135(BOARD, f) | 0x3000]  // bishop, anti-diagonal attacks
On very modern CPUs the first-level cache may be large enough to hold all of this data, but that is not guaranteed.
So, to squeeze the array into a more manageable size such as u64 rays[0x800] (16 KByte), additional operations are needed and the symmetry is broken:
Code: Select all
#define RATT1(f) rays[((f&7) << 6) | key000(BOARD, f)] & rmask0[f]          // mask selects the rank
#define RATT2(f) (rays[((f>>3) << 6) | key090(BOARD, f) | 0x200]) << (f&7)  // shift selects the file
#define BATT3(f) rays[((f&7) << 6) | key045(BOARD, f)] & bmask45[f]         // mask selects the diagonal
#define BATT4(f) rays[((f&7) << 6) | key135(BOARD, f)] & bmask135[f]        // mask selects the anti-diagonal
Now, what is the best way to go here?
Maintain readability and symmetry?
Or chase those few Elo points, and perform better in tournaments on machines with smaller caches?
Another observation: splitting one large array into several smaller arrays seems to change performance. Is this a known property of the CPU cache?