An AMD compiling hunch

schack
Posts: 172
Joined: Thu May 27, 2010 3:32 am

Re: An AMD compiling hunch

Post by schack »

Right, that much I knew. I'm asking because I'm getting what appears to me to be poor performance with my msys2 / gcc 9.2.0 compiles of SF dev -modern on my new 3700X. The abrok compiles are better by 4 or 5%, and the core frequencies are behaving strangely.

From another post:

I was looking at per-core performance in Ryzen Master while running Stockfish, a multi-threaded chess engine. Two of the cores - C07 and C08 - run at a markedly slower speed than the others when I run SF on 8 threads. Is this normal? (Screenshot attached.) My expectation was that it would run at the same speed on all cores.

Screenshot: https://preview.redd.it/00kxj9wl5c441.p ... 9a069cbd4f

At 12 threads: C01-06 are at roughly 4100 MHz, while C07 is at 3700 and C08 is at 1235.

At 16 threads: all are at 4011 MHz.

CPU is at stock. RAM is at 2666, the XMP setting.
Joost Buijs
Posts: 1564
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: An AMD compiling hunch

Post by Joost Buijs »

schack wrote: Fri Dec 13, 2019 4:07 pm Right, that much I knew. I'm asking because I'm getting what appears to me to be poor performance with my msys2 / gcc 9.2.0 compiles of SF dev -modern on my new 3700X. The abrok compiles are better by 4 or 5%, and the core frequencies are behaving strangely.

From another post:

I was looking at per-core performance in Ryzen Master while running Stockfish, a multi-threaded chess engine. Two of the cores - C07 and C08 - run at a markedly slower speed than the others when I run SF on 8 threads. Is this normal? (Screenshot attached.) My expectation was that it would run at the same speed on all cores.

Screenshot: https://preview.redd.it/00kxj9wl5c441.p ... 9a069cbd4f

At 12 threads: C01-06 are at roughly 4100 MHz, while C07 is at 3700 and C08 is at 1235.

At 16 threads: all are at 4011 MHz.

CPU is at stock. RAM is at 2666, the XMP setting.
You're right. I would also expect that running Stockfish with 8 threads on an 8-core CPU would use all cores, but I get the impression that Stockfish is only using 6 of the 8 cores. The performance is way too low: on my 5-year-old i7-5960X @ 3.8 GHz I get about 18-19 Mnps.

Maybe the GUI initializes Stockfish with the wrong number of threads? Otherwise there really must be something weird going on with these new Zen2 CPUs.
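One way to rule out the GUI would be to compare the thread count it sends with what the OS reports. The following is only a rough sketch: `logical_processors` and `uci_threads_command` are hypothetical helper names, not part of Stockfish or any GUI. The only things taken as given are the standard `std::thread::hardware_concurrency()` call and the UCI `Threads` option.

```cpp
#include <string>
#include <thread>

// Number of logical processors the OS reports. On an 8-core CPU with SMT
// enabled this is typically 16, not 8. The standard allows a return value
// of 0 when the count is unknown, so fall back to 1 in that case.
unsigned logical_processors() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}

// Build the UCI command a GUI would send to set the engine's thread count.
// (Hypothetical helper for illustration only.)
std::string uci_threads_command(unsigned threads) {
    return "setoption name Threads value " + std::to_string(threads);
}
```

Sending the resulting command to the engine by hand from a console, and then comparing the reported nps with what the GUI shows, should tell whether the GUI is the one passing a wrong value.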
Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: An AMD compiling hunch

Post by Sesse »

DustyMonkey wrote: Thu Dec 12, 2019 5:54 am Are we also going to call instructions like XLAT "phony baloney" because they perform poorly (even on Intel)?
XLAT is pretty phony baloney! It's a relic of the 8086 days that wasn't really a good idea even back then.
The idea that GCC doesn't have a say is wrong. The goal of the compiler should be to produce the fastest binary given the information it has.
OK, so what would you have GCC do with -march=native in this case? Stockfish asks whether the target supports BMI2, and if GCC says yes, Stockfish forces the use of PEXT via an intrinsic. So should it lie, since some (not all) instructions in the implementation are slow? Rewrite the PEXT intrinsic to generate magic tables by itself? Recognize Stockfish's code via pattern matching?
If the information spawns from the "native" switch, then it should be doing more than just asking what instruction sets are supported.
What is “it” here? Stockfish?

In general, I would assume the best choice for AMD CPUs is to enable BMI2 but disable -DHAS_PEXT.
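To illustrate the distinction between "the target supports BMI2" and "the program opts into PEXT", here is a minimal sketch. It is not Stockfish's actual code: `USE_HW_PEXT` is a hypothetical macro, and `pext_soft` and `extract_bits` are made-up names. Only `_pext_u64` from `<immintrin.h>` and the `__BMI2__` macro that GCC defines under -mbmi2 are real.

```cpp
#include <cstdint>

// The hardware path needs both the compiler's BMI2 flag (__BMI2__) and an
// explicit opt-in from the build (USE_HW_PEXT, a hypothetical macro).
#if defined(USE_HW_PEXT) && defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_HW_PEXT 1
#else
#define HAVE_HW_PEXT 0
#endif

// Portable bit-by-bit equivalent of PEXT: gather the bits of `value`
// selected by `mask` into the low bits of the result.
inline std::uint64_t pext_soft(std::uint64_t value, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t out = 1; mask != 0; out <<= 1) {
        std::uint64_t lowest = mask & (~mask + 1); // isolate lowest set bit
        if (value & lowest)
            result |= out;
        mask &= mask - 1;                          // clear lowest set bit
    }
    return result;
}

// Use the instruction only when the build opted in; otherwise fall back.
inline std::uint64_t extract_bits(std::uint64_t value, std::uint64_t mask) {
#if HAVE_HW_PEXT
    return _pext_u64(value, mask);
#else
    return pext_soft(value, mask);
#endif
}
```

With this layout, compiling with -mbmi2 (or -march=native on a BMI2 machine) but without -DUSE_HW_PEXT still lets the compiler use the other BMI2 instructions while keeping PEXT out of the hot path. A real engine would fall back to magic bitboards rather than a bit loop; the loop here only keeps the sketch self-contained.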