Hi.
Deep Linux system expertise badly needed...
While testing my engine on Andrew Grant's OpenBench framework, I got several very unbalanced test results.
For example, when testing an eval parameter change (http://chess.grantnet.us/test/4856/), I got a +20 Elo result on my own worker, which runs the OpenBench client under Ubuntu 19.10 on a Ryzen 3700X. That is absolute nonsense. After stopping the OpenBench worker and running a bench on the two engine binaries, it turned out that one of them was extremely slow (1767721 nps vs. 2122860 nps, about 17% fewer nodes per second).
It isn't a bad compile, because if I copy the slow binary and run the copy, the speed is fine. So it seems that "something in the Linux system" slows down this particular binary. I tried "lsof" to see whether some process holds a handle on the slow binary, but found nothing.
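The original-versus-copy comparison can be scripted so that the runs alternate (original, copy, original, ...), which also keeps slow drift such as temperature from favouring one binary. A minimal POSIX sh sketch, assuming the engine accepts a "bench" argument (as in "./RubiChess bench") and prints its speed as a plain integer on a line containing "nps"; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: bench an engine binary and a fresh copy of it, alternating
# original/copy several times so drift (e.g. temperature) affects both equally.
# ASSUMPTION: "<engine> bench" prints a line containing "nps" whose last
# number is the speed, written as a plain integer without separators.

# Print the nps value from one bench run of the binary given as $1.
bench_nps() {
    "$1" bench 2>&1 | grep -i 'nps' | tail -n 1 | grep -o '[0-9][0-9]*' | tail -n 1
}

# Alternate runs of the original ($1) and a temporary copy, $2 rounds (default 3).
alternate_bench() {
    orig=$1
    rounds=${2:-3}
    copy=$(mktemp "${orig}.copy.XXXXXX") || return 1
    cp -p "$orig" "$copy"    # -p keeps the mode (and, where permitted, owner/times)
    i=1
    while [ "$i" -le "$rounds" ]; do
        echo "original $(bench_nps "$orig")"
        echo "copy     $(bench_nps "$copy")"
        i=$((i + 1))
    done
    rm -f "$copy"
}
```

For example, "alternate_bench ./RubiChess 5" run from the engine's directory prints five original/copy pairs; if the original is consistently slower in every pair, drift over time is an unlikely explanation.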
Last time it happened, a reboot cured the slowness. This time the system is still running and waiting for your ideas to analyse the problem...
More description of the problem can be found here: https://github.com/AndyGrant/OpenBench/issues/50
Any idea is welcome.
./Andreas
Strange sporadic speed limitation in engine running in Linux on Ryzen
-
- Posts: 625
- Joined: Fri Mar 30, 2018 7:20 am
- Full name: Andreas Matthies
-
- Posts: 91
- Joined: Sat Nov 02, 2019 6:42 pm
- Full name: ɹǝƃɹǝqǝᗡ ǝɔnɹꓭ
-
- Posts: 434
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
I recommend watching the following video about what can affect program performance:
https://www.youtube.com/watch?v=r-TLSBdHe1A
Many things can affect the memory layout of the program and affect its performance, including the directory where it runs from.
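One cheap way to probe this layout effect on a given machine is to grow the environment: on Linux the environment strings are placed on the initial process stack, so their total size can shift the stack addresses the program starts with (the working directory enters the same way, through the PWD variable). A minimal POSIX sh sketch, assuming the engine accepts "bench" and ignores unknown environment variables; the variable name PAD and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: run the same bench with environment blocks of different sizes to see
# whether speed depends on where the initial stack ends up.
# ASSUMPTIONS: "<engine> bench" exists; PAD is an arbitrary, unused variable.

# Run "$1 bench" with an extra environment variable PAD holding $2 bytes.
bench_with_env_pad() {
    pad=$(head -c "$2" /dev/zero | tr '\0' 'x')
    env PAD="$pad" "$1" bench
}

# Repeat the bench for PAD sizes 0, step, 2*step, ... up to max.
pad_sweep() {
    engine=$1
    max=${2:-4096}
    step=${3:-256}
    n=0
    while [ "$n" -le "$max" ]; do
        echo "pad=$n"
        bench_with_env_pad "$engine" "$n"
        n=$((n + step))
    done
}
```

If the reported speed jumps between padding sizes, the binary is layout-sensitive in the way the video describes; if it stays flat while one particular file remains slow, layout alone is a less likely explanation.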
Richard Delorme
-
- Posts: 625
- Joined: Fri Mar 30, 2018 7:20 am
- Full name: Andreas Matthies
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
Nope. I ran slow engine, fast copy, slow engine in this order many times, so temperature is not the problem.
-
- Posts: 406
- Joined: Sat May 05, 2012 2:48 pm
- Full name: Oliver Roese
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
RubiChess wrote: "Last time it happened, a reboot cured the slowness."
Did you investigate that hint? For example, run "top" before and after starting a test and check for irregularities, such as high system load or a thread hogging the CPU.
Or try booting into an older kernel. Or run some diagnostic system performance tests and check for irregularities.
-
- Posts: 625
- Joined: Fri Mar 30, 2018 7:20 am
- Full name: Andreas Matthies
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
I can exclude causes like other processes stressing the CPU. As I said, it is exactly "this" binary file that runs slowly, nothing else. Running a copy of it the next moment gives full speed; running the slow original right after that gives another slow result. So the system itself and every other engine binary run fast; only the binary that caused the bad result in OpenBench keeps running slowly until the system is rebooted. On Windows I would probably blame the virus scanner, but on Linux??
BeyondCritics wrote: ↑Sun Mar 08, 2020 12:51 pm: "Or do some diagnostic system performance tests and check for irregularities."
What exactly do you mean by "do some diagnostic system performance tests"?
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
Very first thing: run "top" and keep it active. See whether, during the match, CPU utilization jumps because of something in your Linux distro. This happened to me years ago with SUSE, which I always considered overloaded/bloated anyway. If you don't want to watch top, you might try this:
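#!/bin/csh
# Every 60 seconds, append a timestamp and the top 10 processes by %CPU to logfile.
# (Linux procps syntax; "ps -r" sorts by CPU only on BSD/macOS.)
while (1)
date >> logfile
ps -eo pid,user,pcpu,comm --sort=-pcpu | head -n 11 >> logfile
sleep 60
end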
When you find a strange slowdown, look in the log file for that time-frame and see if something unexpected is going on.
-
- Posts: 91
- Joined: Sat Nov 02, 2019 6:42 pm
- Full name: ɹǝƃɹǝqǝᗡ ǝɔnɹꓭ
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
Apparently I misunderstood the error you are reporting.
A copy of a binary, executed on the same machine in the same way, gives consistently different results?
I would compare the file sizes, ownership, permissions and md5sums.
If everything is the same, I would back up any valuable data and check the file system with fsck.
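The file checks suggested above can be bundled into one shell function. A minimal sketch, assuming GNU coreutils ("stat -c" and "md5sum", as shipped with Ubuntu); the function names are hypothetical. It deliberately ignores timestamps, since a plain cp gives the copy a new modification time anyway.

```shell
#!/bin/sh
# Sketch: compare size, owner:group, permissions and MD5 of two files.
# ASSUMPTION: GNU coreutils "stat -c" and "md5sum" (as on Ubuntu).

# Print "size owner:group mode md5" for the file $1 (no name, no timestamps).
file_fingerprint() {
    printf '%s %s\n' "$(stat -c '%s %U:%G %a' "$1")" \
        "$(md5sum < "$1" | cut -d ' ' -f 1)"
}

# Print both fingerprints, then "identical" (status 0) or "different" (status 1).
compare_binaries() {
    fa=$(file_fingerprint "$1")
    fb=$(file_fingerprint "$2")
    echo "$1: $fa"
    echo "$2: $fb"
    if [ "$fa" = "$fb" ]; then
        echo identical
    else
        echo different
        return 1
    fi
}
```

Identical output for the slow original and its fast copy would point away from the file itself and toward how the system is serving it.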
-
- Posts: 701
- Joined: Mon Apr 19, 2010 7:07 pm
- Location: Sweden
- Full name: Peter Osterlund
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
abulmo2 wrote: ↑Sat Mar 07, 2020 9:05 pm
Many things can affect the memory layout of the program and affect its performance, including the directory where it runs from.
In particular, for your situation:
- Does the copied engine run faster even if the copy is in the same directory as the original and the filename of the copy has the same length as the original filename?
- If you rename the original engine file (instead of copying it), does it still run slow?
- Does this slowdown happen when you use a single search thread?
- Does this slowdown happen even if you lock the program to a single CPU core, e.g. something like:
taskset 1 ./RubiChess bench
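These questions translate into a few concrete runs. A minimal POSIX sh sketch, assuming the engine accepts "bench" and that taskset from util-linux is installed; the function names and the naming scheme for the same-length copy (last character of the file name replaced by X, or by Y if it already is X) are hypothetical. Setting the thread count is engine-specific and is left out.

```shell
#!/bin/sh
# Sketch of the experiments above for an engine binary at path $1.
# ASSUMPTIONS: "<engine> bench" exists; taskset (util-linux) may be installed.

# Build a sibling path in the same directory with a file name of equal length.
same_length_name() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    stem=${base%?}                 # file name without its last character
    last=${base#"$stem"}           # the last character itself
    if [ "$last" = "X" ]; then new="${stem}Y"; else new="${stem}X"; fi
    printf '%s/%s\n' "$dir" "$new"
}

peter_experiments() {
    engine=$1
    copy=$(same_length_name "$engine")
    if [ -e "$copy" ]; then
        echo "refusing to overwrite $copy" >&2
        return 1
    fi

    echo "== copy in same directory, same name length: $copy"
    cp -p "$engine" "$copy"
    "$copy" bench
    rm -f "$copy"

    echo "== original renamed"
    mv "$engine" "$engine.renamed"
    "$engine.renamed" bench
    mv "$engine.renamed" "$engine"   # restore the original name

    echo "== original pinned to CPU 0 (taskset mask 1)"
    if command -v taskset >/dev/null 2>&1; then
        taskset 1 "$engine" bench
    else
        echo "taskset not found" >&2
    fi
}
```

Comparing the three speeds against a normal "./RubiChess bench" run separates effects of the file name and location from effects of the particular file and of CPU scheduling.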
-
- Posts: 625
- Joined: Fri Mar 30, 2018 7:20 am
- Full name: Andreas Matthies
Re: Strange sporadic speed limitation in engine running in Linux on Ryzen
Thanks for your response. Here are some answers to petero2's questions (Sun Mar 08, 2020 10:45 pm):
- Yes. I even moved the fast copy to exactly the same folder and file name as the slow original (after renaming the original), and it still ran fast. Renaming or moving the slow original file doesn't change its slow speed.
- Yes. Already answered before. So this doesn't seem to fit the explanations given in the video.
- Yes. All tests were done with a single thread.
- I have used taskset. I don't remember exactly, but I'm pretty sure it didn't change the speed.
I had two "little" reproductions where the binary consistently reached only ~2.09 Mnps instead of ~2.15 Mnps. A small difference, but noticeable. Both reproductions (and also the original problem I reported) happened after I had been running Windows (the computer is dual-boot) and then rebooted directly into Linux (warm reboot), so this may be related. Maybe some hardware was configured by Windows and not completely reinitialized by Linux. But it is still strange that single binary files are affected.
From now on I will always do a clean cold boot before running the OpenBench client, and we'll see if it happens again. But I fear that I have seen other results from other OpenBench clients that are also heavily biased.
Thanks for your advice.
Andreas
Last edited by RubiChess on Tue Mar 10, 2020 6:45 pm, edited 1 time in total.