Running multiple instances of Laser at once

AndrewGrant
Posts: 1750
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Post by AndrewGrant »

So I've got a Python script that will launch X copies of a given engine, have each copy search a set of positions to a given depth, and then print the time each copy took along with the average.
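A minimal sketch of what such a script could look like, not the actual script used here. It assumes the engine accepts the standard UCI commands `position fen`, `go depth` and `quit` on stdin and ends each search with a `bestmove` line; the function names `run_engine` and `benchmark` are made up for illustration, and the clock includes process start-up.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def run_engine(cmd, fens, depth):
    """Run one engine process over every position; return elapsed milliseconds."""
    start = time.perf_counter()
    with subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                          universal_newlines=True, bufsize=1) as proc:
        for fen in fens:
            proc.stdin.write("position fen {}\ngo depth {}\n".format(fen, depth))
            proc.stdin.flush()
            # A UCI engine finishes each search by printing a "bestmove" line.
            while True:
                line = proc.stdout.readline()
                if not line or line.startswith("bestmove"):
                    break
        proc.stdin.write("quit\n")
        proc.stdin.flush()
    # Leaving the "with" block closes the pipes and waits for the process.
    return (time.perf_counter() - start) * 1000.0


def benchmark(cmd, fens, depth, copies):
    """Run `copies` engine processes at once; return (times_ms, average_ms)."""
    # The threads only babysit the pipes; each engine is its own OS process.
    with ThreadPoolExecutor(max_workers=copies) as pool:
        futures = [pool.submit(run_engine, cmd, fens, depth) for _ in range(copies)]
        times = [f.result() for f in futures]
    return times, sum(times) / len(times)
```

Calling something like `benchmark(["./laser"], fens, 12, 8)` would then give the eight per-process times and their average; the engine path, the position list and the depth are placeholders.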

For Ethereal (and every other engine I've tried so far except Laser), the output looks like this:
Running Benchmark For Engines/Ethereal8.28 With 1 Cores
5262.051820755005
Average Time : 5262.051820755005ms

Running Benchmark For Engines/Ethereal8.28 With 4 Cores
5408.526182174683
5508.90851020813
5591.909646987915
5784.143686294556
Average Time : 5573.372006416321ms

Running Benchmark For Engines/Ethereal8.28 With 8 Cores
5541.480302810669
5553.623199462891
5567.0154094696045
5566.6139125823975
5601.499795913696
5640.672922134399
5673.544883728027
5709.365606307983
Average Time : 5606.7270040512085ms
So the time goes up a little as we run more processes at once (on a 12-core machine).

Here is what Laser (freshly cloned current master branch) does:
Running Benchmark For laser With 1 Cores
6014.405250549316
Average Time : 6014.405250549316ms

Running Benchmark For laser With 4 Cores
9185.718774795532
9195.00470161438
9180.82308769226
9456.510543823242
Average Time : 9254.514276981354ms

Running Benchmark For laser With 8 Cores
18168.219804763794
18136.425733566284
18175.54473876953
18172.247648239136
18206.12907409668
18272.027492523193
18279.447078704834
18317.724466323853
Average Time : 18215.970754623413ms
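To put the two sets of numbers side by side, the averages can be reduced to slowdown factors relative to each engine's single-process run. A small sketch using only the averages printed above; the `slowdown` helper is just for illustration.

```python
# Average times in milliseconds, copied from the runs above.
averages_ms = {
    "Ethereal8.28": {1: 5262.051820755005, 4: 5573.372006416321, 8: 5606.7270040512085},
    "laser": {1: 6014.405250549316, 4: 9254.514276981354, 8: 18215.970754623413},
}


def slowdown(times_by_copies):
    """Ratio of each average to the single-process average."""
    base = times_by_copies[1]
    return {n: t / base for n, t in times_by_copies.items()}


for engine, times in averages_ms.items():
    factors = slowdown(times)
    print(engine, {n: round(f, 2) for n, f in factors.items()})
```

Ethereal stays within about 7% of its single-process time at 8 processes, while Laser takes roughly 1.5 times as long at 4 processes and 3 times as long at 8.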
My first thought was that maybe Laser causes extreme memory bottlenecks. So I dropped the Hash and EvalCache sizes from 16MB to 1MB through the UCI options. Same scaling issues.
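For reference, UCI options are changed by sending `setoption name <id> value <x>` lines to the engine before the search starts. A minimal sketch, assuming the option names are spelled exactly `Hash` and `EvalCache` as written above; the helper name `setoption_commands` is hypothetical.

```python
def setoption_commands(options):
    """Build UCI 'setoption' lines from a {name: value} mapping."""
    return ["setoption name {} value {}".format(name, value)
            for name, value in options.items()]


# Both sizes are in MB, matching the 1MB setting described above.
commands = setoption_commands({"Hash": 1, "EvalCache": 1})
print("\n".join(commands))
```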

Also, this is only occurring (as far as I know) on this one Intel Xeon machine, which I just proved has some very slow system calls. But that would explain the engine being slow in general, not the CPU failing to manage contexts.

Any thoughts would be appreciated. I need to resolve this.
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Post by Ras »

AndrewGrant wrote: Which I just proved has some very slow system calls. But that would explain the engine being slow in general, not the CPU failing to manage contexts.
Except if these system calls are slow because they are doing some kind of locking.
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Post by lucasart »

Indeed, Laser has some SMP performance problems, as shown in TCEC (appalling NPS on 24 cores). But I think Peter Osterlund found the problem and notified the author of Laser already.
jeffreyan11
Posts: 46
Joined: Sat Sep 12, 2015 5:23 am
Location: United States

Post by jeffreyan11 »

The latest master is indeed broken on > 1 core. I can't give an exact time as to when a fixed version will be put onto GitHub, but hopefully in the next month. In the meantime, if you want the latest working version, take anything from July 22nd or before.

Peter Osterlund found the bug for Nemorino, although Laser's bug is somewhat similar (but much worse, since it affects more than just speed). I accidentally made some evaluation tables global instead of wrapping them in per-thread class objects. I don't understand why the slowdown only seems to occur on NUMA machines, though. I would expect it to be slower on all machines.

Post by AndrewGrant »

The same thing happens in Laser 1.2, so I'm not sure your July 22nd comment will help me.

My problem is with running 20 copies of your engine, with Threads=1 on each copy.

My problem is NOT running one copy of your engine, with Threads=20.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Post by mar »

jeffreyan11 wrote: (but much worse, since it affects more than just speed). I accidentally made some evaluation tables global instead of wrapping them in per-thread class objects.
Why would you do that?! Reading the same location from multiple threads is perfectly fine; it's writing and reading at the same time that causes problems.

Post by Ras »

mar wrote: Reading same location from multiple threads is perfectly fine
Depends. The compiler may not be able to determine whether some unrelated write access has aliased to the tables.

Post by mar »

Ras wrote:
mar wrote: Reading same location from multiple threads is perfectly fine
Depends. The compiler may not be able to determine whether some unrelated write access has aliased to the tables.
this is not a task for the compiler, but for the programmer. if you're obsessed, cache-align your tables (or use dumb padding)

initialized data is usually grouped in a separate section, so unless your tables are generated on the fly, the probability should be literally zero

Post by jeffreyan11 »

Andrew: Ah, I see. Sorry for misunderstanding. I'm not sure what the problem is then, but I'll look into it.

Martin: By evaluation tables, I meant attack map tables for the eval that are calculated on the fly, so they're both written and read. But I agree that read-only data shouldn't be a problem.
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Post by petero2 »

jeffreyan11 wrote: Andrew: Ah, I see. Sorry for misunderstanding. I'm not sure what the problem is then, but I'll look into it.
I used git bisect on my 24 core computer to find when the slowdown (when using 24 threads) started, and got:

Code:

8863ee8844b9832e754066dc88d30fafcd81f95a is the first bad commit
commit 8863ee8844b9832e754066dc88d30fafcd81f95a
Author: Jeffrey An <jeffreyan07@gmail.com>
Date:   Sun Jul 23 19:12:42 2017 -0700

    Move evaluation code to a new file eval.cpp. No functional change.
That commit is pretty big though so I have not tried to determine what part of it is causing the slowdown.

I don't know if the slowdown when running 24 processes is related to this problem.
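A bisect like this can be automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The following is only a sketch of a check script under assumptions: it is not the procedure used above, and the build command, benchmark command and time threshold passed to `main` are placeholders to be chosen per machine.

```python
import subprocess
import time

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def classify(elapsed_ms, threshold_ms):
    """Call a commit bad when the benchmark took longer than the threshold."""
    return BAD if elapsed_ms > threshold_ms else GOOD


def time_command(cmd):
    """Run a command to completion and return its wall time in milliseconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return (time.perf_counter() - start) * 1000.0


def main(build_cmd, bench_cmd, threshold_ms):
    """Build the current checkout, time the benchmark, return an exit code."""
    # A commit that does not build cannot be judged, so ask git to skip it.
    if subprocess.run(build_cmd).returncode != 0:
        return SKIP
    return classify(time_command(bench_cmd), threshold_ms)
```

Saved as a script that ends with `raise SystemExit(main(...))`, this could be driven by `git bisect start <bad> <good>` followed by `git bisect run python3 <script>`. Picking a threshold between the known-good and known-bad timings is what makes the good/bad call reliable.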