Different performance of equal executables


syzygy
Posts: 5730
Joined: Tue Feb 28, 2012 11:56 pm

Re: Different performance of equal executables

Post by syzygy »

OliverBr wrote: Sun Oct 25, 2020 8:44 pm
syzygy wrote: Sun Oct 25, 2020 1:26 pm Type the following as root:
# echo 1 > /proc/sys/vm/drop_caches
Yes, this solved it. Afterwards "bin/olithink589" performs exactly the same as "./olithink589", at 2220 knps.

Now this should explain the issue: which cache probably went wrong, and how can this be avoided in the future?
Nothing went really wrong. One of the copies just happened to be laid out in physical memory in a way that decreased caching performance.

When you execute a binary, it is mmap()ed into virtual memory (without actually being loaded) and its pages are then paged in on demand. When a page is paged in, it ends up in the page cache. The next time you execute the binary, its pages are mapped from the page cache into the address space of the process. So the way its pages are spread out in physical memory does not change until the system for some reason releases those pages from the page cache (or you force it to by dropping the caches).
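To make the demand-paging part concrete, here is a minimal Python sketch. It is only an illustration, not what the kernel's ELF loader does: it maps an ordinary temporary file (a stand-in for the binary) and reads one byte per page. The helper name `touch_pages` and the four-page file are made up for the example. Creating the mapping reads nothing; a page that is not yet resident is only brought in when it is first accessed.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # page size reported by the OS, typically 4096


def touch_pages(path):
    """Map the file read-only and read the first byte of every page.

    Creating the mapping does not read the file; a page that is not
    already resident is faulted in the first time it is accessed.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return [m[off] for off in range(0, len(m), PAGE)]


# Hypothetical stand-in for a binary: four pages, page i filled with byte i.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for i in range(4):
        tmp.write(bytes([i]) * PAGE)
    path = tmp.name

first_bytes = touch_pages(path)
os.unlink(path)
print(first_bytes)
```

A second call to `touch_pages` on the same file would normally be served from the page cache, from the same physical page frames, which is why the layout stays fixed between runs.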

Probably N+1 frequently used pages of the binary were placed in physical pages whose addresses were some multiple of 2^K apart for some K, so that they competed for the same cache sets and the N-way associativity of the CPU cache was overloaded.
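A rough sketch of that arithmetic in Python. The cache parameters below (256 KiB, 8-way, 64-byte lines, 4 KiB pages) are assumptions chosen for illustration, not the actual CPU from this thread, and the page addresses are invented. Real caches may also hash the address instead of using a plain modulo.

```python
from collections import Counter

# Assumed cache geometry, for illustration only.
LINE_SIZE = 64            # bytes per cache line
WAYS = 8                  # N-way associativity
CACHE_SIZE = 256 * 1024   # total cache size in bytes
PAGE_SIZE = 4096          # bytes per memory page

NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)   # 512 sets
ALIAS_STRIDE = LINE_SIZE * NUM_SETS           # 32 KiB = 2^15: same set again


def set_index(phys_addr):
    """Cache set selected by a physical address (simple modulo indexing)."""
    return (phys_addr // LINE_SIZE) % NUM_SETS


def max_lines_per_set(page_addrs):
    """Largest number of lines from the given pages that share one set."""
    counts = Counter()
    for base in page_addrs:
        for offset in range(0, PAGE_SIZE, LINE_SIZE):
            counts[set_index(base + offset)] += 1
    return max(counts.values())


# N+1 pages spaced a multiple of 2^K apart: all land in the same sets.
unlucky = [i * ALIAS_STRIDE for i in range(WAYS + 1)]
# N+1 physically consecutive pages: spread over all sets.
lucky = [i * PAGE_SIZE for i in range(WAYS + 1)]

print(max_lines_per_set(unlucky), max_lines_per_set(lucky))
```

In the unlucky layout some sets have to hold more lines than there are ways, so hot lines keep evicting each other. In the consecutive layout no set comes close to that limit.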
OliverBr
Posts: 852
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Different performance of equal executables

Post by OliverBr »

syzygy wrote: Sun Oct 25, 2020 9:03 pm Nothing went really wrong. One of the copies just happened to be laid out in physical memory in a way that decreased caching performance.

Thank you for the elaboration. Although this is a stronger effect than the alignment issue (see http://talkchess.com/forum3/viewtopic.php?f=7&t=75308), at least it can be solved by dropping the caches.
OliThink GitHub: https://github.com/olithink
Nice article about OliThink: https://www.chessengeria.eu/post/olithink-oldie-goldie
Chess Engine OliThink Homepage: http://brausch.org/home/chess