Codeman wrote:
According to Pete Isensee's paper (http://www.tantalon.com/pete/cppopt/app ... ativeCosts) about optimization tricks, doing various operations with int variables is a lot faster than with bytes.

Gerd Isenberg wrote:
Homogeneous char arrays combined with zero- or sign-extension (movzx, movsx) are fine, because they are smaller and may save L1 space. A 64-byte board array occupies one 64-byte cacheline, while a 256-byte board array occupies four (if cacheline-aligned). The board array is heavily used, so all four cachelines of an int array will usually be fetched into L1. To make room for them, three other lines must be evicted, and those lines could otherwise have stayed in the cache alongside the byte array.

So the marginally slower zero- or sign-extension needed to load chars into a native 32-bit register likely pays off in fewer cache misses elsewhere. Whether it does, and by how much, depends on your memory and cache footprint and may vary from program to program.

For the stack, that is, locals and parameters (most are passed in registers on x64 anyway), and for very frequently used scalar globals, it is better to stay with int and native register width. This also avoids alignment problems, store-to-load forwarding issues, and partial register stalls.

Don't do any local char stuff in Crafty. But I do have a few global arrays of type char: the previously mentioned 64-element board array, plus a few other small arrays. I changed them all to ints and saw this kind of speed difference. The first line is the original (chars), the second is with the chars replaced by ints:
log.001: time=37.90 mat=-1 n=105649503 fh=91% nps=2.8M
log.002: time=38.09 mat=-1 n=105649503 fh=91% nps=2.8M
Results were similar on several tests. The difference is very small (the int version is slightly slower), but it is measurable and repeatable. Note that this was run on my Core 2 with 4 MB of L2. The difference is a little more significant on my office PIV with 512 KB of L2.