Right, that much I knew. I'm asking because I'm seeing what looks like poor performance from my msys2 / gcc 9.2.0 compiles of SF dev -modern on my new 3700X. The abrok compiles are 4 or 5% faster, and the core frequencies are behaving strangely.
From another post:
I was looking at per-core performance in Ryzen Master while running Stockfish, a multi-threaded chess engine. Two of the cores, C07 and C08, run at a markedly lower speed than the others when I run SF on 8 threads. Is this normal? (Screenshot attached.) My expectation was that all cores would run at the same speed.
Screenshot: https://preview.redd.it/00kxj9wl5c441.p ... 9a069cbd4f
At 12 threads: C01-06 are at roughly 4100 MHz, while C07 is at 3700 MHz and C08 is at 1235 MHz.
At 16 threads: all are at 4011 MHz.
CPU is at stock. RAM is at 2666, the XMP setting.
An AMD compiling hunch
Re: An AMD compiling hunch
schack wrote: ↑Fri Dec 13, 2019 4:07 pm
Right, that much I knew. I'm asking because I'm seeing what looks like poor performance from my msys2 / gcc 9.2.0 compiles of SF dev -modern on my new 3700X. The abrok compiles are 4 or 5% faster, and the core frequencies are behaving strangely.
From another post:
I was looking at per-core performance in Ryzen Master while running Stockfish, a multi-threaded chess engine. Two of the cores, C07 and C08, run at a markedly lower speed than the others when I run SF on 8 threads. Is this normal? (Screenshot attached.) My expectation was that all cores would run at the same speed.
Screenshot: https://preview.redd.it/00kxj9wl5c441.p ... 9a069cbd4f
At 12 threads: C01-06 are at roughly 4100 MHz, while C07 is at 3700 MHz and C08 is at 1235 MHz.
At 16 threads: all are at 4011 MHz.
CPU is at stock. RAM is at 2666, the XMP setting.

You're right. I would also expect Stockfish at 8 threads on an 8-core CPU to use all cores, but I get the impression that it only uses 6 of the 8; the performance is way too low. On my 5-year-old i7-5960X @ 3.8 GHz I get about 18-19 Mnps.
Maybe the GUI initializes Stockfish with the wrong number of threads? Otherwise there really must be something weird going on with these new Zen 2 CPUs.
Re: An AMD compiling hunch
DustyMonkey wrote: ↑Thu Dec 12, 2019 5:54 am
Are we also going to call instructions like XLAT "phony baloney" because they perform poorly (even on Intel)?

XLAT is pretty phony baloney! It's a relic of the 8086 days that wasn't really a good idea even back then.

Quote:
The idea that GCC doesn't have a say is wrong. The goal of the compiler should be to produce the fastest binary given the information it has.

OK, so what would you have GCC do with -march=native in this case? Stockfish asks whether the target supports BMI2, and if GCC says yes, Stockfish forces the use of PEXT via an intrinsic. So should GCC lie, since some (not all) instructions in the implementation are slow? Rewrite the PEXT intrinsic to generate magic tables by itself? Recognize Stockfish's code via pattern matching?

Quote:
If the information spawns from the "native" switch, then it should be doing more than just asking what instruction sets are supported

What is "it" here? Stockfish?
In general, I would assume the best choice for AMD CPUs is to enable BMI2 but disable -DHAS_PEXT.
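
To illustrate the idea of compiling with BMI2 enabled while keeping PEXT out of the hot path, here is a minimal C++ sketch. It is not Stockfish's code: the macro USE_HW_PEXT and the function names are made up for this example, and the fallback is a plain bit loop rather than the magic-multiplication tables a chess engine would normally use. The only real API assumed is the _pext_u64 intrinsic from <immintrin.h>, which GCC provides when BMI2 is enabled (it defines __BMI2__ in that case).

```cpp
#include <cassert>
#include <cstdint>

// Only pull in the intrinsic header when the hardware path is requested
// AND the compiler reports BMI2 support (GCC defines __BMI2__ for -mbmi2).
// USE_HW_PEXT is a hypothetical opt-in macro for this sketch.
#if defined(USE_HW_PEXT) && defined(__BMI2__)
#include <immintrin.h>
#endif

// Portable bit extract: gathers the bits of `value` selected by `mask`
// into the low bits of the result, the same contract as the PEXT instruction.
constexpr std::uint64_t software_pext(std::uint64_t value, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        const std::uint64_t lowest = mask & (~mask + 1);  // lowest set bit of mask
        if (value & lowest) {
            result |= out_bit;
        }
        mask &= mask - 1;  // clear that bit and move on to the next one
    }
    return result;
}

// Dispatch chosen at compile time. Building with -mbmi2 alone still takes the
// software path; the hardware instruction is used only with -DUSE_HW_PEXT too.
inline std::uint64_t extract_bits(std::uint64_t value, std::uint64_t mask) {
#if defined(USE_HW_PEXT) && defined(__BMI2__)
    return _pext_u64(value, mask);
#else
    return software_pext(value, mask);
#endif
}
```

With this layout, the choice of instruction set and the choice of PEXT become two separate build switches, which is the split suggested above for CPUs where PEXT is slow.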