Deberger wrote: Sun Mar 08, 2020 9:54 pm
Apparently I misunderstood the error you are reporting.
A copy of a binary, executed on the same machine in the same way, gives consistently different results?
I would compare the file sizes, file ownerships, file permissions and md5sums.
If everything is the same, I would back up any valuable data and check the file system with fsck.
I haven't checked the md5sums, but a reboot cures the problem, so I don't believe the disk or the file is at fault.
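Deberger's checklist (size, ownership, permissions, md5sum) can be run in one go. This is a minimal sketch, assuming GNU coreutils (`stat -c`, `md5sum`); `compare_files` is a hypothetical helper name, not something from the thread:

```shell
# Hypothetical helper: print size, owner, group, permissions and md5sum of
# two files, then report whether all of those match.
# Assumes GNU coreutils (stat -c, md5sum).
compare_files() {
    a=$1
    b=$2
    meta_a=$(stat -c '%s %U %G %a' "$a") || return 2
    meta_b=$(stat -c '%s %U %G %a' "$b") || return 2
    # md5sum reading stdin prints "<hash>  -"; keep only the hash.
    sum_a=$(md5sum < "$a" | cut -d' ' -f1)
    sum_b=$(md5sum < "$b" | cut -d' ' -f1)
    printf '%s: %s %s\n' "$a" "$meta_a" "$sum_a"
    printf '%s: %s %s\n' "$b" "$meta_b" "$sum_b"
    if [ "$meta_a" = "$meta_b" ] && [ "$sum_a" = "$sum_b" ]; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}
```

Usage, with the file names from the bench session later in this thread: `compare_files ./5e75296cceb27470d0e986e8d886688a78887064 ./5e-copy`. Timestamps are deliberately not compared, since `cp` without `-p` always gives the copy a new one.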
Strange sporadic speed limitation in engine running in Linux on Ryzen
- Full name: Andreas Matthies
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
bob wrote: Sun Mar 08, 2020 9:41 pm
Very first thing: run "top" and keep it active. See whether, during the match, the CPU utilization jumps up due to something in your Linux distro. I had this happen to me years ago with SUSE, which I always considered overloaded/bloated anyway. If you don't want to watch top, you might try this:
#!/bin/csh
# Every 60 seconds, append a timestamp and the busiest processes to "logfile".
while (1)
    date >> logfile
    # First 10 lines of the process list, highest CPU first (Linux procps syntax).
    ps aux --sort=-%cpu | head -n 10 >> logfile
    sleep 60
end
When you find a strange slowdown, look in the log file for that time frame and see whether something unexpected is going on.
I once ran top interactively in a second terminal and couldn't see anything special.
- Location: Birmingham, AL
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
I don't know about your processor, but I have ALWAYS had to turn off Turbo Boost on Intel processors. This lets all cores run at the maximum rated (non-boosted) clock speed, so the processors won't change speed during a test. Every high-performance machine we run at UAB has it disabled.
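On Linux the turbo/boost switch can usually also be flipped from the OS instead of the BIOS. This is a sketch, assuming the kernel uses either the `intel_pstate` driver (`no_turbo` file) or `acpi-cpufreq` (the global `cpufreq/boost` file, as on many AMD systems); other frequency drivers may expose different knobs. `disable_turbo` and its optional root argument are hypothetical:

```shell
# Sketch: turn off turbo/boost via cpufreq sysfs. Must run as root.
# Assumed interfaces:
#   intel_pstate: <root>/intel_pstate/no_turbo  (write 1 to disable turbo)
#   acpi-cpufreq: <root>/cpufreq/boost          (write 0 to disable boost)
# The optional argument (hypothetical) overrides the sysfs root for dry runs.
disable_turbo() {
    root=${1:-/sys/devices/system/cpu}
    if [ -w "$root/intel_pstate/no_turbo" ]; then
        echo 1 > "$root/intel_pstate/no_turbo"
        echo "intel_pstate: turbo disabled"
    elif [ -w "$root/cpufreq/boost" ]; then
        echo 0 > "$root/cpufreq/boost"
        echo "cpufreq: boost disabled"
    else
        echo "no writable turbo/boost control found under $root" >&2
        return 1
    fi
}
```

Call it from a root shell before a test run. Like any sysfs write, the setting does not survive a reboot.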
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
bob wrote: Tue Mar 10, 2020 6:57 pm
I don't know about your processor, but I have ALWAYS had to turn off Turbo Boost on Intel processors. This lets all cores run at the maximum rated (non-boosted) clock speed, so the processors won't change speed during a test. Every high-performance machine we run at UAB has it disabled.
Intel actually has two speed modes: EIST (SpeedStep) and Turbo Boost.
On my CPU, SpeedStep varies the multiplier between 8 and 40, so the CPU runs anywhere from 800 MHz up to 4000 MHz. If I disable it, the CPU always runs at 4 GHz.
By default, Turbo Boost raises a single core to 4.2 GHz if the other cores stay below a certain load.
My mainboard has several options for handling this kind of speed change:
- Off: Never boost anything.
- Default: Boost only one core if there's one thread that requires a lot of performance, and the other cores are only lightly loaded.
- Multi: Boost 1-4 cores up to 4.2 GHz if needed.
With SpeedStep disabled and Turbo Boost set to Multi, the CPU runs all of its cores at 4.2 GHz.
Even with Turbo Boost disabled, there is no guarantee that an Intel CPU will never change speed if EIST/SpeedStep stays enabled.
Personally (under Windows) I have never had a problem with this. The mainboard is set to have EIST/SpeedStep enabled, and Turbo Boost is set to Multi. If I run a chess engine with 1-4 threads, then 1-4 cores boost to 4.2 GHz. If I don't run anything, the CPU runs at 800 MHz. To be honest, I've never tested this under Linux, because my Linux usage has always been for embedded systems.
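Under Linux, one way to see what each core is actually clocked at during a match is the cpufreq sysfs interface. This is a sketch, assuming a cpufreq driver that exposes `scaling_cur_freq` (reported in kHz); `show_core_mhz` and its optional root argument are hypothetical:

```shell
# Sketch: print each core's current clock as reported by cpufreq.
# sysfs reports kHz; this prints MHz. The optional argument (hypothetical)
# overrides the sysfs root for dry runs.
show_core_mhz() {
    root=${1:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        [ -r "$f" ] || continue          # glob matched nothing, or unreadable
        cpu=${f#"$root"/}                # "cpuN/cpufreq/scaling_cur_freq"
        cpu=${cpu%%/*}                   # "cpuN"
        echo "$cpu $(( $(cat "$f") / 1000 )) MHz"
    done
}
```

Polling it during a match, e.g. `while :; do show_core_mhz; sleep 1; done`, shows whether cores move between low, nominal and boosted clocks.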
- Full name: Alayan Feh
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
Variable turbo speed means more noise in the results. The increased throughput of turbo isn't worth it compared to a high fixed clock when it comes to engine testing.
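One way to put a number on that noise is to repeat the same bench several times and look at the spread of the nps figure. This is a sketch that parses the `Overall: <sec> sec. <nodes> nodes <nps> nps` summary format from the bench session in this thread; `nps_stats` is a hypothetical name:

```shell
# Sketch: mean, sample standard deviation and relative spread of the nps
# value over repeated bench runs, read from "Overall: ... nps" lines on stdin.
nps_stats() {
    awk '
        $1 == "Overall:" && $NF == "nps" { v[++n] = $(NF - 1) }
        END {
            if (n < 2) { print "need at least 2 runs" | "cat 1>&2"; exit 1 }
            for (i = 1; i <= n; i++) s += v[i]
            mean = s / n
            # Two-pass variance avoids cancellation with large nps values.
            for (i = 1; i <= n; i++) d2 += (v[i] - mean) ^ 2
            sd = sqrt(d2 / (n - 1))
            printf "runs=%d mean=%.0f sd=%.0f (%.2f%% of mean)\n", n, mean, sd, 100 * sd / mean
        }'
}
```

In the bench session in this thread the `Overall:` line still appears with stdout sent to /dev/null, so it presumably goes to stderr; hence the redirection order in `for i in 1 2 3 4 5; do ./engine -bench -depth 23 2>&1 >/dev/null; done | nps_stats` (`./engine` is a placeholder).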
- Location: Birmingham, AL
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
Two things here.
(1) If I am testing, I am going to do my best to make the entire test run at the same constant CPU speed. No turbo/speedstep whatsoever.
(2) In a game, I would use whatever provides the best overall average speed. With a chess engine running on multiple cores of a single chip, it is most likely going to settle into no turbo/speedstep anyway, since all cores will be 100% busy, keeping temperatures high. But the speeds can and will fiddle up and down, which adds a little random noise to testing.
The biggest problem I have had is trying to measure parallel speedup. Run using 1 core, then 2, then 4... As you ramp up above one, you start to see speed degradation as core speeds are throttled back, without your knowing it. So maybe your raw NPS scales to 360% at 4 cores, and you start trying to debug which locks or cache invalidation traffic are causing the problem. The answer can be "none of the above".
So this is much more important when testing speeds and efficiency, less important when running lots of games in parallel, knowing there is some random noise tossed in, and unimportant if all you do is play games one at a time and want optimal performance with lots of random noise included.
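bob's parallel-speedup example comes down to two ratios: NPS with N threads divided by NPS with one thread, and that speedup divided by N. A sketch with a hypothetical `nps_scaling` helper (arguments: single-thread NPS, N-thread NPS, N):

```shell
# Sketch: NPS speedup and per-core efficiency from a 1-thread and an
# N-thread measurement. Arguments: nps_1 nps_N N.
nps_scaling() {
    awk -v a="$1" -v b="$2" -v n="$3" 'BEGIN {
        s = b / a                      # speedup over one thread
        printf "speedup=%.2fx efficiency=%.0f%%\n", s, 100 * s / n
    }'
}
```

`nps_scaling 1000000 3600000 4` prints `speedup=3.60x efficiency=90%`, which is bob's "360% at 4 cores". With turbo active, part of the missing 10% can simply be a lower all-core clock rather than locks or cache traffic.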
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
RubiChess wrote: Sat Mar 07, 2020 5:26 pm
It isn't a bad compilation, because if I copy the slow binary and run the copy, speed is fine. So it seems that "something in the Linux system" slows down the binary. I tried "lsof" to see whether some process holds a handle on the slow binary, but without success.
Last time it happened, a reboot cured the slowness. This time the system is still running and waiting for your ideas to analyse the problem...
Does the problem go away after "echo 3 > /proc/sys/vm/drop_caches" as root?
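syzygy's suggestion, wrapped as a reusable snippet. Per the kernel's vm sysctl documentation, writing 1 drops the page cache, 2 drops reclaimable slab objects (dentries and inodes) and 3 drops both; only clean data is dropped, which is why `sync` comes first. `drop_caches` is a hypothetical name, and its optional argument exists only so it can be dry-run against an ordinary file:

```shell
# Sketch: flush dirty data, then drop page cache plus dentries/inodes.
# Must run as root. The optional argument (hypothetical) overrides the
# target file for dry runs.
drop_caches() {
    target=${1:-/proc/sys/vm/drop_caches}
    sync                  # write back dirty pages so they can be dropped too
    echo 3 > "$target"    # 1 = page cache, 2 = slab objects, 3 = both
}
```

From a normal user shell, `sudo echo 3 > /proc/sys/vm/drop_caches` fails, because the redirection is performed by the unprivileged shell; `sync; echo 3 | sudo tee /proc/sys/vm/drop_caches` works.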
- Full name: Andreas Matthies
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
syzygy wrote: Thu Mar 12, 2020 9:04 pm
Does the problem go away after "echo 3 > /proc/sys/vm/drop_caches" as root?
Thanks for helping.
I just ran another test with a slow binary, and indeed "echo 3 > /proc/sys/vm/drop_caches" seems to (almost) cure the problem:
$ ./5e75296cceb27470d0e986e8d886688a78887064 -bench -depth 23 > /dev/null
Overall: 30.567137 sec. 62572010 nodes 2047035 nps
$ cp 5e75296cceb27470d0e986e8d886688a78887064 5e-copy
$ ./5e-copy -bench -depth 23 > /dev/null
Overall: 29.015491 sec. 62572010 nodes 2156503 nps
$ ./5e75296cceb27470d0e986e8d886688a78887064 -bench -depth 23 > /dev/null
Overall: 30.556252 sec. 62572010 nodes 2047764 nps
$ echo 3 > /proc/sys/vm/drop_caches
$ ./5e75296cceb27470d0e986e8d886688a78887064 -bench -depth 23 > /dev/null
Overall: 28.989655 sec. 62572010 nodes 2158425 nps
$ ./5e-copy -bench -depth 23 > /dev/null
Overall: 28.822634 sec. 62572010 nodes 2170933 nps
So the next question for the Linux experts: is there a global setting that prevents this caching from biasing the binary's speed?
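The A/B check in the session above (original binary versus a fresh copy) can be scripted. This is a sketch, assuming the engine accepts `-bench -depth N` as shown and that the `Overall:` line goes to stderr (it still appears with stdout redirected to /dev/null); `bench_nps` and `compare_with_copy` are hypothetical helpers:

```shell
# Sketch: run a binary's bench and print only the nps value from the
# "Overall: ... nps" line. stderr goes to the pipe, stdout is discarded.
bench_nps() {
    bin=$1
    depth=${2:-23}
    "$bin" -bench -depth "$depth" 2>&1 >/dev/null |
        awk '$1 == "Overall:" && $NF == "nps" { print $(NF - 1) }'
}

# Sketch: bench a binary and a fresh copy of it, report the difference
# of the original relative to the copy. The copy is removed afterwards.
compare_with_copy() {
    bin=$1
    cp "$bin" "$bin.copy" || return 1
    a=$(bench_nps "$bin")
    b=$(bench_nps "$bin.copy")
    rm -f "$bin.copy"
    [ -n "$a" ] && [ -n "$b" ] || return 1
    awk -v a="$a" -v b="$b" 'BEGIN {
        printf "original=%s copy=%s diff=%.1f%%\n", a, b, 100 * (a - b) / b
    }'
}
```

Applied to the first two runs above (2047035 vs 2156503 nps), the same formula gives `diff=-5.1%` for the original binary.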
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
My explanation for what you are seeing is that the binary is sometimes loaded into memory in a way that gives poor L1 caching behaviour.
This may happen when too many frequently used cache lines lie at distances that are multiples of some power of 2, so that they all map to the same few L1 cache sets.
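To make the power-of-2 argument concrete: in a set-associative cache, a line's set index is its address divided by the line size, modulo the number of sets. This is a sketch assuming a 32 KiB, 8-way L1 data cache with 64-byte lines (the L1D geometry of Zen 2 Ryzen parts), i.e. 32768 / (8 * 64) = 64 sets; `l1_set` is a hypothetical name:

```shell
# Sketch: set index of a byte address in a set-associative cache.
# Defaults assume 64-byte lines and 64 sets (32 KiB, 8-way), so addresses
# that differ by a multiple of 64 * 64 = 4096 bytes share a set.
# Arguments: address (decimal or 0x hex), [line size], [number of sets].
l1_set() {
    addr=$1
    line=${2:-64}
    sets=${3:-64}
    echo $(( ($addr / $line) % $sets ))
}
```

`l1_set 0` and `l1_set 4096` both print 0: addresses 4 KiB apart share a set, and since that set has only 8 ways, a ninth frequently used line at the same stride forces evictions.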
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
syzygy's explanation makes sense, and it's easy to check whether it holds. Run the binary under perf stat -d and look at the ratio of L1 dcache misses to total accesses. If it is much higher in the slow runs, you are likely seeing an L1 cache aliasing effect (bank conflicts).
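When comparing many slow and fast runs, the miss ratio can be pulled out of perf's machine-readable output. This is a sketch, assuming `perf stat -x,` CSV lines whose third field is the event name, and the generic `L1-dcache-loads` / `L1-dcache-load-misses` events that `perf stat -d` also reports; event naming can differ on some CPUs or kernels, and `l1d_miss_ratio` is a hypothetical name:

```shell
# Sketch: L1 data-cache miss ratio from "perf stat -x," CSV output on stdin.
# Field 1 = counter value, field 3 = event name; other lines are ignored.
l1d_miss_ratio() {
    awk -F, '
        $3 == "L1-dcache-loads"       { loads = $1 }
        $3 == "L1-dcache-load-misses" { misses = $1 }
        END {
            # Missing, "<not counted>" or "<not supported>" loads -> error.
            if (loads + 0 == 0) { print "no L1-dcache-loads count found" | "cat 1>&2"; exit 1 }
            printf "loads=%s misses=%s miss_ratio=%.2f%%\n", loads, misses + 0, 100 * misses / loads
        }'
}
```

Usage: `perf stat -x, -o perf.csv -e L1-dcache-loads,L1-dcache-load-misses -- ./engine -bench -depth 23`, then `l1d_miss_ratio < perf.csv`, once for a slow run and once after dropping caches (`./engine` is a placeholder).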