strategies for finding slowdowns in lazy smp

flok
Posts: 481
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: strategies for finding slowdowns in lazy smp

Post by flok »

Hi Dann,
Dann Corbit wrote: Wed Jun 05, 2019 10:12 am
That graph showed the nps for 1 thread.
This new graph shows the average nps for all threads:
Something is very wrong with the calculation.
The aggregate NPS is the sum of the NPS for all threads.
How can it be less than the NPS for one thread?
In that graph it is not the aggregate, it is the average :D

Here's a combined graph of the average and the sum:

[Image: combined graph of the average and the summed NPS over all threads]
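
To make the distinction between the two curves concrete, here is a minimal sketch of how both numbers could be derived from per-thread node counters. The names `NpsStats` and `compute_nps` are made up for this illustration and are not taken from any engine in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: aggregate and per-thread average NPS.
struct NpsStats {
    double aggregate;  // nodes per second summed over all threads
    double average;    // aggregate divided by the number of threads
};

NpsStats compute_nps(const std::vector<std::uint64_t>& nodes_per_thread,
                     double elapsed_seconds) {
    assert(!nodes_per_thread.empty() && elapsed_seconds > 0.0);
    // Sum the node counters of all threads.
    const std::uint64_t total = std::accumulate(
        nodes_per_thread.begin(), nodes_per_thread.end(), std::uint64_t{0});
    const double aggregate = static_cast<double>(total) / elapsed_seconds;
    return {aggregate,
            aggregate / static_cast<double>(nodes_per_thread.size())};
}
```

With perfect scaling the average stays flat as threads are added and the aggregate grows linearly, so a falling average is the number that exposes a per-thread slowdown.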
smatovic
Posts: 2639
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: strategies for finding slowdowns in lazy smp

Post by smatovic »

flok wrote: Tue Jun 04, 2019 9:06 pm Now my question is: what are strategies for finding what causes this slow down?
- implement a benchsmp command to reproduce results quickly on the command line
- as always in engine debugging, turn every extension off, bench only with a
basic engine and turn the extensions back on step by step; you can also bench smp nps
with TT off

***edit***
- if it's not TT or extensions, then the IDF loop and starting/terminating threads are left
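
A benchsmp command along these lines could be sketched as follows. This is only an illustration under assumptions: `bench_smp` and `BenchResult` are hypothetical names, and the `search` callback stands in for whatever fixed-depth search the engine runs per thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical result type for the benchmark.
struct BenchResult {
    std::uint64_t total_nodes;
    double seconds;
    double nps() const { return seconds > 0.0 ? total_nodes / seconds : 0.0; }
};

// `search` is a stand-in for the engine's search: it receives the thread
// index and returns the number of nodes that thread visited.
BenchResult bench_smp(unsigned n_threads,
                      const std::function<std::uint64_t(unsigned)>& search) {
    assert(n_threads > 0);
    std::vector<std::uint64_t> nodes(n_threads, 0);
    std::vector<std::thread> workers;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < n_threads; ++i)
        workers.emplace_back([&nodes, &search, i] { nodes[i] = search(i); });
    for (auto& w : workers) w.join();  // wait until every thread is done
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    std::uint64_t total = 0;
    for (std::uint64_t n : nodes) total += n;
    return {total, elapsed.count()};
}
```

Running it with 1, 2, 4, ... threads on the same position and comparing `nps()` gives the scaling curve; repeating the run with the TT or individual extensions disabled narrows down which component costs the most.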

--
Srdja
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: strategies for finding slowdowns in lazy smp

Post by mar »

flok wrote: Tue Jun 04, 2019 9:06 pm The dramatic slow-down is probably because other things were running on it (e.g. the chrome browser).
First of all, don't mess with affinity (especially if you don't understand how it works).
Let's say your CPU has 2 logical cores per physical core. If you set the affinity mask for one worker to bit 0 and for another to bit 1, you force them to run on a single physical core, which is certainly not what you want.
So unless you know exactly what you're doing, simply trust the scheduler.
Martin Sedlak
flok
Posts: 481
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: strategies for finding slowdowns in lazy smp

Post by flok »

mar wrote: Wed Jun 05, 2019 12:34 pm
flok wrote: Tue Jun 04, 2019 9:06 pm The dramatic slow-down is probably because other things were running on it (e.g. the chrome browser).
First of all, don't mess with affinity (especially if you don't understand how it works).
Let's say your CPU has 2 logical cores per physical core. If you set the affinity mask for one worker to bit 0 and for another to bit 1, you force them to run on a single physical core, which is certainly not what you want.
But: let's say I have a system with 32 hardware threads (16 physical cores) on which I want to run 32 threads. In that case there are always 2 threads on the same physical core.
Or are you suggesting not to use hyperthreading, but only 1 thread per core?
flok
Posts: 481
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: strategies for finding slowdowns in lazy smp

Post by flok »

smatovic wrote: Wed Jun 05, 2019 10:44 am
flok wrote: Tue Jun 04, 2019 9:06 pm Now my question is: what are strategies for finding what causes this slow down?
- implement a benchsmp command to reproduce results quickly on the command line
- as always in engine debugging, turn every extension off, bench only with a
basic engine and turn the extensions back on step by step; you can also bench smp nps
with TT off

***edit***
- if it's not TT or extensions, then the IDF loop and starting/terminating threads are left
What is an IDF loop? My googling did not turn up anything on that.

Starting/terminating threads: I start them once at the start of the whole calculation and stop them when time is up.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: strategies for finding slowdowns in lazy smp

Post by mar »

flok wrote: Wed Jun 05, 2019 12:42 pm
mar wrote: Wed Jun 05, 2019 12:34 pm
flok wrote: Tue Jun 04, 2019 9:06 pm The dramatic slow-down is probably because other things were running on it (e.g. the chrome browser).
First of all, don't mess with affinity (especially if you don't understand how it works).
Let's say your CPU has 2 logical cores per physical core. If you set the affinity mask for one worker to bit 0 and for another to bit 1, you force them to run on a single physical core, which is certainly not what you want.
But: let's say I have a system with 32 hardware threads (16 physical cores) on which I want to run 32 threads. In that case there are always 2 threads on the same physical core.
Or are you suggesting not to use hyperthreading, but only 1 thread per core?
Of course I'm not, I'm suggesting you don't mess with affinity and let the scheduler do its job!
Let's say I have 8 logical cores and 4 physical:

L0 L1 L2 L3 L4 L5 L6 L7
P0 P0 P1 P1 P2 P2 P3 P3
And I want to run a 4-CPU tournament. The way you allocate the logical cores, you end up with the thread masks
L0, L1, L2 and L3, which restricts the threads to only two physical cores instead of four. A better choice would be the mask
L0+L1 for thread 0, L2+L3 for thread 1, and so on. (Of course, you could have more than 2 logical cores per physical core, so this is just an example.)

So simply let the OS scheduler handle it (plus it's less code :)
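
For readers who want to see the masks from the example spelled out, here is a small sketch that only computes them; it does not set any affinity. It assumes the sibling numbering shown in the example (logical cores of one physical core are adjacent), which is not guaranteed on every OS or machine. `sibling_masks` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds one bitmask per physical core that covers all of its logical cores.
// Assumption: logical core index / smt_per_core == physical core index.
std::vector<std::uint64_t> sibling_masks(unsigned physical_cores,
                                         unsigned smt_per_core) {
    assert(smt_per_core > 0 && physical_cores * smt_per_core <= 64);
    std::vector<std::uint64_t> masks;
    for (unsigned p = 0; p < physical_cores; ++p) {
        std::uint64_t mask = 0;
        for (unsigned s = 0; s < smt_per_core; ++s)
            mask |= std::uint64_t{1} << (p * smt_per_core + s);
        masks.push_back(mask);
    }
    return masks;
}
```

For 4 physical cores with 2 logical cores each, this yields the masks L0+L1, L2+L3, L4+L5 and L6+L7. Since the real topology has to be queried from the OS anyway, leaving placement to the scheduler remains the simpler option.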
Martin Sedlak
smatovic
Posts: 2639
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: strategies for finding slowdowns in lazy smp

Post by smatovic »

flok wrote: Wed Jun 05, 2019 12:45 pm
smatovic wrote: Wed Jun 05, 2019 10:44 am
flok wrote: Tue Jun 04, 2019 9:06 pm Now my question is: what are strategies for finding what causes this slow down?
- implement a benchsmp command to reproduce results quickly on the command line
- as always in engine debugging, turn every extension off, bench only with a
basic engine and turn the extensions back on step by step; you can also bench smp nps
with TT off

***edit***
- if it's not TT or extensions, then the IDF loop and starting/terminating threads are left
What is an IDF loop? My googling did not turn up anything on that.

Starting/terminating threads: I start them once at the start of the whole calculation and stop them when time is up.
IDF - Iterative Deepening Framework

https://www.chessprogramming.org/Iterative_Deepening

Not sure what a lazy smp implementation looks like without Iterative Deepening,
but if you have ID implemented, then maybe you want to implement a termination
strategy for all threads, for the case where a thread finishes the search of the
current ID iteration... but this stuff may vary between lazy smp derivatives.
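
One possible shape of such a termination strategy, purely as a sketch: a shared counter records the deepest iteration any thread has completed, and the other threads poll it to abandon iterations that are already done. `IterationTracker` and its methods are invented for this example; actual lazy smp implementations differ.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state for all search threads.
class IterationTracker {
public:
    // True if some thread already completed `depth`, so a thread still
    // searching that depth can abort and move on.
    bool should_abort(int depth) const {
        return finished_depth_.load(std::memory_order_acquire) >= depth;
    }

    // Called by a thread that completed `depth`; keeps the maximum seen.
    void report_finished(int depth) {
        assert(depth > 0);
        int cur = finished_depth_.load(std::memory_order_relaxed);
        while (cur < depth &&
               !finished_depth_.compare_exchange_weak(
                   cur, depth, std::memory_order_release,
                   std::memory_order_relaxed)) {
            // cur was refreshed by the failed exchange; retry if still smaller
        }
    }

    // First depth that no thread has completed yet.
    int next_depth() const {
        return finished_depth_.load(std::memory_order_acquire) + 1;
    }

private:
    std::atomic<int> finished_depth_{0};
};
```

A search thread would check `should_abort(depth)` at the same places where it already polls for a time-out, and pick `next_depth()` when it starts a new iteration.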

--
Srdja
flok
Posts: 481
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: strategies for finding slowdowns in lazy smp

Post by flok »

smatovic wrote: Wed Jun 05, 2019 1:20 pm IDF - Iterative Deepening Framework
https://www.chessprogramming.org/Iterative_Deepening
Not sure what a lazy smp implementation looks like without Iterative Deepening,
Oh, it has IDF, I just didn't know it was called IDF. I thought it was just called ID. But never mind.
but if you have ID implemented, then maybe you want to implement a termination
strategy for all threads, for the case where a thread finishes the search of the
current ID iteration... but this stuff may vary between lazy smp derivatives.
Currently my main thread is the master-thread. If that one decides the search is finished, then all others terminate as well.
smatovic
Posts: 2639
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: strategies for finding slowdowns in lazy smp

Post by smatovic »

flok wrote: Wed Jun 05, 2019 1:27 pm Currently my main thread is the master-thread. If that one decides the search is finished, then all others terminate as well.
And what happens if a helper finishes its search?

--
Srdja
flok
Posts: 481
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: strategies for finding slowdowns in lazy smp

Post by flok »

smatovic wrote: Wed Jun 05, 2019 1:33 pm
flok wrote: Wed Jun 05, 2019 1:27 pm Currently my main thread is the master-thread. If that one decides the search is finished, then all others terminate as well.
And what happens if a helper finishes its search?
It goes on with the next iteration if applicable. Else it'll busy-loop :oops: until the main-thread catches up.
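
A busy-loop like this keeps a core spinning while it does no useful work. One alternative, sketched here under assumptions, is to block the helper on a condition variable until the main thread has caught up or the search is stopped. `DepthGate` and its methods are hypothetical names, not part of the engine discussed here.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical gate that lets helpers sleep instead of spinning.
class DepthGate {
public:
    // Main thread: announce the depth it has reached.
    void publish(int depth) {
        assert(depth >= 0);
        {
            std::lock_guard<std::mutex> lock(m_);
            main_depth_ = depth;
        }
        cv_.notify_all();
    }

    // Main thread: end the search and wake every waiting helper.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopped_ = true;
        }
        cv_.notify_all();
    }

    // Helper: sleep until the main thread reached `depth` or the search was
    // stopped. Returns false if the search was stopped.
    bool wait_for_depth(int depth) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return stopped_ || main_depth_ >= depth; });
        return !stopped_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int main_depth_ = 0;
    bool stopped_ = false;
};
```

Whether waiting is the right choice at all is a separate question: a helper could also simply continue with a deeper iteration and keep filling the TT.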