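1/ When I call the GCC intrinsic

```c
__builtin_prefetch(void *addr, ...)
```

and addr is 64-byte aligned, will a 64-byte cache line be prefetched from that address? I absolutely need 64 bytes (from addr to addr+63 inclusive).

2/ What kind of speed increase should I expect from prefetching? I've done it for the TT and Pawn Cache entries, and only measured a tiny +0.4% speed increase.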
lucasart wrote:
> 1/ When I call the GCC intrinsic __builtin_prefetch(void *addr, ...) and addr is 64-byte aligned, will a 64-byte cache line be prefetched from that address? I absolutely need 64 bytes (from addr to addr+63 inclusive).

You can't prefetch anything OTHER than 64-byte cache lines, unless you are on hardware that uses a different cache line/block size...
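A minimal C++ sketch of bob's point, assuming GCC or Clang (for __builtin_prefetch) and a 64-byte cache line. The Bucket struct and its fields are hypothetical; the idea is only that an object that is both 64-byte aligned and 64 bytes long occupies exactly one cache line, so a single prefetch of its address covers addr through addr+63.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed line size: 64 bytes is typical on current x86-64 CPUs. It is a
// property of the hardware, not something __builtin_prefetch lets you choose.
constexpr std::size_t kCacheLine = 64;

// Hypothetical hash-table bucket padded to exactly one cache line.
struct alignas(kCacheLine) Bucket {
    std::uint64_t keys[4];
    std::uint64_t data[4];
};
static_assert(sizeof(Bucket) == kCacheLine, "Bucket should fill one cache line");
static_assert(alignof(Bucket) == kCacheLine, "Bucket should start on a line boundary");

inline bool is_line_aligned(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % kCacheLine == 0;
}

// Hint that *b will be read soon. Because b is line-aligned and the bucket
// is exactly one line long, this single prefetch covers [b, b + 64).
// rw = 0 means "read", locality = 3 means "keep in all cache levels";
// both arguments must be compile-time constants.
inline void prefetch_bucket(const Bucket *b) {
    assert(is_line_aligned(b));
    __builtin_prefetch(b, 0, 3);
}
```

The prefetch is only a hint: the CPU may drop it, and it never faults, so an alignment check like the one above is about getting the intended line, not about safety.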
bob wrote:
> You can't prefetch anything OTHER than 64-byte cache lines, unless you are on hardware that uses a different cache line/block size...

Thank you. That confirms my intuition. I just couldn't find an answer in the GCC doc, so I assumed it would do that. The thing that confused me is that Stockfish does:

```c
__builtin_prefetch(addr);
__builtin_prefetch(addr + 64);
```

bob wrote:
> Speed increases are difficult to predict. When you prefetch A, you replace B. If you need B before you need A, you get yet another cache line fill and hurt performance. If you don't need A, but prefetch it anyway, you replaced something you might need, and burned some memory bandwidth unnecessarily. It can be a mixed bag unless done carefully.

The problem I have is that it's very hard to measure. Prefetching TT and Pawn Cache entries gains 0.4% speed in total, but I can't isolate the prefetch effect (and if I tried, the result would be meaningless, as it depends on how much work is done between the prefetch and the use of the cache line).
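One plausible reason for a second prefetch at addr + 64 (an assumption, not something stated in the thread) is that the looked-up data can straddle two 64-byte lines, either because it is larger than 64 bytes or because it is not aligned. A hedged C++ sketch of a helper that prefetches every line a byte range touches; the names lines_touched and prefetch_range are hypothetical, and GCC or Clang is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t kLine = 64;  // assumed cache line size (power of two)

// Number of distinct cache lines overlapped by the byte range [p, p + size).
inline std::size_t lines_touched(const void *p, std::size_t size) {
    assert(size > 0);
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    return static_cast<std::size_t>((a + size - 1) / kLine - a / kLine + 1);
}

// Issue one read prefetch per cache line overlapped by [p, p + size).
// 64 aligned bytes -> one prefetch; 64 unaligned bytes -> two, which is
// what prefetching both addr and addr + 64 would also cover.
inline void prefetch_range(const void *p, std::size_t size) {
    assert(size > 0);
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    for (std::uintptr_t line = a & ~(kLine - 1); line < a + size; line += kLine)
        __builtin_prefetch(reinterpret_cast<const void *>(line));
}
```

If the entries are guaranteed to be 64 bytes and 64-byte aligned, the second prefetch is redundant and, per bob's caution above, may only cost bandwidth.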
lucasart wrote:
> ```cpp
> TTable::Entry *p = new TTable::Entry[count];
> ```
> Can I assume (void *)p to be divisible by 64? Is this specified by the C++ standard, or compiler-specific? If the latter, is there a portable way to make sure? 64-byte alignment is crucial here; not doing it defeats the purpose of prefetching in the first place.

You cannot assume. In practice it's often, but not always, the case (in my experience, particularly with that other operating system).
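Evert's answer still holds for an ordinary type, but the language has moved on since: from C++17, new and new[] on an over-aligned type call the aligned operator new overloads (taking std::align_val_t), so if the entry type itself is declared alignas(64), the array returned by new is 64-byte aligned. A minimal sketch assuming a C++17 compiler; Entry is a hypothetical stand-in for TTable::Entry, not its real layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical 64-byte transposition-table entry, aligned to a cache line.
struct alignas(64) Entry {
    std::uint64_t key;
    std::uint64_t payload[7];
};
static_assert(sizeof(Entry) == 64 && alignof(Entry) == 64);

// Since C++17, new[] on an over-aligned type uses
// operator new[](std::size_t, std::align_val_t), so every element honors
// alignof(Entry). Before C++17 this was not guaranteed. The trailing ()
// value-initializes, i.e. zeroes, the entries.
inline std::unique_ptr<Entry[]> make_table(std::size_t count) {
    return std::unique_ptr<Entry[]>(new Entry[count]());
}

inline bool aligned_to(const void *p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With an older compiler or without alignas on the type, the runtime check is still useful, but the fix has to come from the allocation side, as discussed next.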
Evert wrote:
> The only portable way that I know of (and this is C rather than C++) is to allocate the memory, with a bit extra, then test whether the pointer is aligned properly. If it isn't, increment the pointer to the point where it is aligned (this is why you allocate extra memory).
> As I said, this works with C malloc'ed memory. I don't think you can (should) do the same with memory allocated with C++'s new.
> Having said that, there are probably compiler pragmas or platform-specific functions that do it for you.

Yes, that would work (and it can be done the same way with new and delete, but it takes an awful lot of pointer casting, which is verbose and ugly in C++, though still possible). Possible, but horrible... I'll have a look at the pragma option.
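A minimal C++ sketch of the over-allocate-and-round-up approach Evert describes, assuming std::malloc and a power-of-two alignment. The names AlignedBuffer, alloc_aligned and free_aligned are hypothetical. The detail that is easy to get wrong is keeping the pointer malloc returned, because that one, not the rounded-up one, must be passed to free. Where available, std::aligned_alloc (C11/C++17) or POSIX posix_memalign does the same job directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical holder: 'raw' is exactly what std::malloc returned and is the
// only pointer that may be passed to std::free; 'aligned' is the rounded-up
// pointer the caller actually uses.
struct AlignedBuffer {
    void *raw;
    void *aligned;
};

// Over-allocate by (alignment - 1) bytes, then round the start up to the next
// multiple of 'alignment'. 'alignment' must be a power of two. The usable
// bytes are zeroed (convenient for a hash table that should start empty).
// Returns {nullptr, nullptr} if the allocation fails.
inline AlignedBuffer alloc_aligned(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    void *raw = std::malloc(size + alignment - 1);
    if (raw == nullptr)
        return {nullptr, nullptr};
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    const std::uintptr_t rounded = (addr + mask) & ~mask;
    // Step forward from 'raw' instead of casting 'rounded' back to a pointer.
    void *aligned = static_cast<char *>(raw) + (rounded - addr);
    std::memset(aligned, 0, size);
    return {raw, aligned};
}

inline void free_aligned(AlignedBuffer &buf) {
    std::free(buf.raw);  // never free buf.aligned
    buf.raw = nullptr;
    buf.aligned = nullptr;
}
```

For a table of count entries one would request count * sizeof(entry) bytes with alignment 64 and cast buf.aligned to the entry pointer type; that cast is the kind of pointer juggling the reply above calls ugly.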