Dann Corbit wrote: ↑Wed Jun 05, 2019 2:25 am
flok wrote: ↑Tue Jun 04, 2019 9:06 pm
Hi,
tpoppins from CCRL noticed that Embla slows down when the number of threads increases (when using Lazy SMP).
I thought I had seen this happen only on Windows, but that is not correct: it also happens on Linux.
Graph: number of threads versus nps on a Threadripper 1950X
The dramatic slow-down is probably because other things were running on it (e.g. the chrome browser).
Now my question is: what are strategies for finding what causes this slowdown?
The threads share no common variables apart from the transposition table. That TT has no locks; it uses the XOR trick.
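For readers unfamiliar with the XOR trick mentioned above, here is a minimal C++17 sketch of the general idea (lockless hashing as it is commonly described for chess engines). This is not Embla's Tpt.cpp: the names `Entry`, `LocklessTable`, `store` and `probe`, and the two-word entry layout, are assumptions made for illustration. Each slot keeps `key ^ data` next to `data`, so an entry torn by two racing writers fails the check on probe and is simply treated as a miss.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical two-word entry; a real engine packs move/score/depth into `data`.
struct Entry {
    std::atomic<std::uint64_t> key_xor_data{0};  // holds key ^ data
    std::atomic<std::uint64_t> data{0};
};

class LocklessTable {
public:
    explicit LocklessTable(std::size_t n) : slots_(n) { assert(n > 0); }

    // No lock: two threads may interleave these two stores on the same slot.
    void store(std::uint64_t key, std::uint64_t data) {
        Entry& e = slots_[key % slots_.size()];
        e.key_xor_data.store(key ^ data, std::memory_order_relaxed);
        e.data.store(data, std::memory_order_relaxed);
    }

    // A torn entry (words from different writers) does not XOR back to `key`,
    // so it is rejected here instead of being used.
    std::optional<std::uint64_t> probe(std::uint64_t key) const {
        const Entry& e = slots_[key % slots_.size()];
        const std::uint64_t d = e.data.load(std::memory_order_relaxed);
        const std::uint64_t k = e.key_xor_data.load(std::memory_order_relaxed);
        if ((k ^ d) == key) return d;
        return std::nullopt;
    }

private:
    std::vector<Entry> slots_;
};
```

Note that even without locks, all threads still read and write the same cache lines of the table, so the table can contribute to memory traffic even though it causes no lock contention.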
Interesting that there are 16 cores and 32 active threads for that CPU.
Yes, this CPU has 2 threads per core.
There is a huge nosedive at 33 cores.
That's at 32 actually.
I think that the graph is exactly what we would expect.
Is it? Because Stockfish, for example, shows noise in the nps but no nosedive (well, a tiny one, but it had to share that laptop with a browser and other mess):
# threads      nps   nps/thread
1:         1924447      1924447
2:         3516980      1758490
3:         5726175      1908725
4:         7295187      1823796
5:         9950080      1990016
6:         9704585      1617430
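Since the thread goes on to discuss whether a graph shows total nps or nps per thread, a small C++ sketch of how the two relate may help. It recomputes the nps/thread column from the Stockfish figures above and adds a scaling efficiency relative to the 1-thread run. The names `Sample`, `Scaling` and `per_thread_scaling` are made up for this example, and it assumes the first sample is the single-thread measurement.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Sample  { unsigned threads; std::uint64_t total_nps; };
struct Scaling { unsigned threads; std::uint64_t nps_per_thread; double efficiency; };

// efficiency = (total_nps / threads) / single-thread nps; 1.0 means perfect scaling.
std::vector<Scaling> per_thread_scaling(const std::vector<Sample>& samples) {
    std::vector<Scaling> out;
    if (samples.empty()) return out;
    assert(samples.front().threads == 1);  // assumption: first row is the baseline
    const double base = static_cast<double>(samples.front().total_nps);
    for (const Sample& s : samples) {
        assert(s.threads > 0);
        const double per_thread = static_cast<double>(s.total_nps) / s.threads;
        out.push_back({s.threads, s.total_nps / s.threads, per_thread / base});
    }
    return out;
}

// The Stockfish measurements quoted in the table above.
const std::vector<Sample> kStockfish = {
    {1, 1924447}, {2, 3516980}, {3, 5726175},
    {4, 7295187}, {5, 9950080}, {6, 9704585},
};
```

A healthy engine keeps the efficiency roughly flat (with noise) as threads are added; a steady decline in nps per thread points at a shared resource.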
Dann Corbit wrote: ↑Wed Jun 05, 2019 2:27 am
Err, I have a question.
I assumed that the graph was NPS per core. Is that correct?
If that is NPS for the program, then something is totally broken.
bob wrote: ↑Wed Jun 05, 2019 5:27 am
Good observation. Either this is (a) total-NPS divided by threads or else (b) it is broken. About all NPS is useful for is to detect architectural issues, such as cache thrashing / false sharing or bandwidth issues, processor throttling due to heat, memory bottlenecks, etc. The number of cores is getting large enough that it becomes interesting to figure out what is going on sometimes.
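To illustrate the false-sharing case mentioned above: if per-thread data such as node counters sit next to each other in memory, every increment by one thread invalidates the cache line for its neighbours. A common remedy is to give each thread's data its own cache line. The C++17 sketch below shows only the layout; the 64-byte line size and the names `PaddedCounter`, `make_counters`, `cache_line_of` and `total_nodes` are assumptions for illustration, not anything taken from Embla.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size in bytes

// One counter per thread, padded and aligned so that no two share a line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t nodes = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "counter must fill one line");

// C++17 allocators honour the over-alignment of PaddedCounter.
std::vector<PaddedCounter> make_counters(unsigned n_threads) {
    std::vector<PaddedCounter> c(n_threads);
    assert(n_threads == 0 ||
           reinterpret_cast<std::uintptr_t>(c.data()) % kCacheLine == 0);
    return c;
}

// Index of the cache line an address falls in (under the assumed line size).
std::size_t cache_line_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}

// Total node count, read after the search threads have finished.
std::uint64_t total_nodes(const std::vector<PaddedCounter>& counters) {
    std::uint64_t sum = 0;
    for (const PaddedCounter& c : counters) sum += c.nodes;
    return sum;
}
```

Thread t would then update only `counters[t].nodes` during the search, and the totals are summed once at the end.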
That graph showed the nps for 1 thread.
This new graph shows the average nps for all threads:
The version on GitHub is a new rewrite, not the one I am currently working on. I'm going to drop that rewrite as its movegen is slower than the previous version.
Anyway, this is the code:
https://vanheusden.com/Embla/files/embla-2.0.8.tgz
Brain.cpp contains search and eval and threading.
There's a "thread()" function and a "calculateMove()" method which do the searching. calculateMove() invokes search() and starts n - 1 threads via the thread() function.
Tpt.cpp is the hashtable. As you can see, the transposition table has been disabled for the tests.
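The structure described here (the main thread searches, and n - 1 helpers search alongside it) can be sketched roughly as follows. This is not the code from Brain.cpp: `search_stub` and `calculate_move_sketch` are hypothetical stand-ins, the "search" merely counts up to a node budget, and the stop flag is an assumption about how helpers get told to quit.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Stand-in for a real search: "visits" nodes until the budget is used up
// or another thread raises the stop flag.
std::uint64_t search_stub(const std::atomic<bool>& stop, std::uint64_t budget) {
    std::uint64_t nodes = 0;
    while (nodes < budget && !stop.load(std::memory_order_relaxed)) ++nodes;
    return nodes;
}

// Starts n_threads - 1 helpers, searches on the calling thread as well,
// then stops the helpers and returns the total node count of all threads.
std::uint64_t calculate_move_sketch(unsigned n_threads, std::uint64_t budget) {
    assert(n_threads >= 1);
    std::atomic<bool> stop{false};
    std::vector<std::uint64_t> helper_nodes(n_threads - 1, 0);
    std::vector<std::thread> helpers;
    for (std::size_t i = 0; i < helper_nodes.size(); ++i)
        helpers.emplace_back([&, i] { helper_nodes[i] = search_stub(stop, budget); });

    std::uint64_t total = search_stub(stop, budget);  // main thread's own search
    stop.store(true, std::memory_order_relaxed);      // main is done: stop helpers
    for (std::thread& h : helpers) h.join();
    for (std::uint64_t n : helper_nodes) total += n;
    return total;
}
```

Dividing the returned total by the elapsed time gives the program's nps; dividing that again by the thread count gives the per-thread figure.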