Running multiple instances of Laser at once

AndrewGrant · Post by **AndrewGrant** » Tue Nov 14, 2017 10:02 pm

So I've got a python script that will launch X copies of a given engine, have the engine search a set of positions to a given depth, and then print the total time.
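A minimal sketch of what such a script could look like (this is not the actual script used for the numbers in this post). It assumes the engine speaks standard UCI on stdin/stdout and answers each `go depth` with a `bestmove` line. The names `run_engine` and `benchmark` are made up for illustration.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def run_engine(cmd, fens, depth):
    """Drive one engine process over UCI; return elapsed wall time in ms."""
    start = time.perf_counter()
    proc = subprocess.Popen(
        cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
        text=True, bufsize=1,
    )
    for fen in fens:
        proc.stdin.write(f"position fen {fen}\ngo depth {depth}\n")
        proc.stdin.flush()
        # Read engine output until the search for this position finishes.
        for line in proc.stdout:
            if line.startswith("bestmove"):
                break
    proc.stdin.write("quit\n")
    proc.stdin.flush()
    proc.wait()
    return (time.perf_counter() - start) * 1000.0


def benchmark(cmd, copies, fens, depth):
    """Run `copies` engine processes at once; return (times_ms, average_ms)."""
    # Threads are only used to wait on the child processes in parallel;
    # the actual work happens in the separate engine processes.
    with ThreadPoolExecutor(max_workers=copies) as pool:
        times = list(pool.map(lambda _: run_engine(cmd, fens, depth),
                              range(copies)))
    return times, sum(times) / len(times)
```

Calling `benchmark(["Engines/Ethereal8.28"], 4, fens, depth)` with a list of FEN strings would then give four per-process times plus their average, in the same shape as the output below.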

For Ethereal (and every other engine I've tried so far except Laser), the output looks roughly like this:

Running Benchmark For Engines/Ethereal8.28 With 1 Cores
5262.051820755005
Average Time : 5262.051820755005ms

Running Benchmark For Engines/Ethereal8.28 With 4 Cores
5408.526182174683
5508.90851020813
5591.909646987915
5784.143686294556
Average Time : 5573.372006416321ms

Running Benchmark For Engines/Ethereal8.28 With 8 Cores
5541.480302810669
5553.623199462891
5567.0154094696045
5566.6139125823975
5601.499795913696
5640.672922134399
5673.544883728027
5709.365606307983
Average Time : 5606.7270040512085ms

So the time goes up slightly as we run more processes at once (on a 12-core machine).

Here is what Laser (freshly cloned from the current master branch) gives:

Running Benchmark For laser With 1 Cores
6014.405250549316
Average Time : 6014.405250549316ms

Running Benchmark For laser With 4 Cores
9185.718774795532
9195.00470161438
9180.82308769226
9456.510543823242
Average Time : 9254.514276981354ms

Running Benchmark For laser With 8 Cores
18168.219804763794
18136.425733566284
18175.54473876953
18172.247648239136
18206.12907409668
18272.027492523193
18279.447078704834
18317.724466323853
Average Time : 18215.970754623413ms
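To make the comparison easier to read, the averages above can be turned into slowdown factors relative to the single-process run. This is just arithmetic on the numbers already posted; the helper name `slowdown` is made up for illustration.

```python
# Average times in ms, copied from the benchmark output above.
averages_ms = {
    "Ethereal8.28": {1: 5262.051820755005,
                     4: 5573.372006416321,
                     8: 5606.7270040512085},
    "laser": {1: 6014.405250549316,
              4: 9254.514276981354,
              8: 18215.970754623413},
}


def slowdown(times_by_copies):
    """Return each average divided by the single-process average."""
    base = times_by_copies[1]
    return {n: t / base for n, t in times_by_copies.items()}


for engine, times in averages_ms.items():
    factors = slowdown(times)
    print(engine, {n: round(f, 2) for n, f in factors.items()})
```

Ethereal stays within roughly 7% of its single-process time at 8 copies, while Laser takes about three times as long.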

My first thought was that maybe Laser causes extreme memory bottlenecks. So I dropped the Hash and EvalCache sizes from 16MB to 1MB through the UCI options. Same scaling issues.
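For reference, a small sketch of the UCI commands involved, assuming both options take a plain size in MB (`Hash` is a standard UCI option; `EvalCache` is the option name as given in this post). The helper `setoption` is hypothetical and only formats the command line.

```python
def setoption(name, value):
    """Format a UCI 'setoption' command line for the given option."""
    return f"setoption name {name} value {value}"


# Lines to send to the engine's stdin before starting the searches.
commands = [setoption("Hash", 1), setoption("EvalCache", 1)]
print("\n".join(commands))
```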

Also, this is only occurring (as far as I know) on this one Intel Xeon machine.

Which I just proved has some very slow system calls. But that would explain the engine being slow in general, not the CPU failing to manage contexts.

Any thoughts would be appreciated. I need to resolve this.

Ras · Post by **Ras** » Tue Nov 14, 2017 10:41 pm

AndrewGrant wrote: Which I just proved has some very slow system calls. But that would explain the engine being slow in general, not the CPU failing to manage contexts.

Except if these system calls are slow because they are doing some kind of locking.

lucasart · Post by **lucasart** » Wed Nov 15, 2017 2:10 am

Indeed, Laser has some SMP performance problems, as shown in TCEC (appalling NPS on 24 cores). But I think Peter Osterlund found the problem and notified the author of Laser already.

jeffreyan11 · Post by **jeffreyan11** » Wed Nov 15, 2017 2:53 am

The latest master is indeed broken on > 1 core. I can't give an exact time as to when a fixed version will be put onto GitHub, but hopefully in the next month. In the meantime, if you want the latest working version, take anything from July 22nd or before.

Peter Osterlund found the bug for Nemorino, although Laser's bug is somewhat similar (but much worse, since it affects more than just speed). I accidentally made some evaluation tables global instead of wrapping them in per-thread class objects. I don't understand why the slowdown only seems to occur on NUMA machines, though. I would expect it to be slower on all machines.

AndrewGrant · Post by **AndrewGrant** » Wed Nov 15, 2017 6:09 am

The same thing happens in Laser 1.2, so I'm not sure your July 22nd comment will help me.

My problem is with running 20 copies of your engine, with Threads=1 on each copy.

My problem is NOT running one copy of your engine, with Threads=20

mar · Post by **mar** » Wed Nov 15, 2017 11:22 am

jeffreyan11 wrote: (but much worse, since it affects more than just speed). I accidentally made some evaluation tables global instead of wrapping them in per-thread class objects.

Why would you do that?! Reading same location from multiple threads is perfectly fine, it's writing + reading at the same time that causes problems.

Ras · Post by **Ras** » Wed Nov 15, 2017 6:18 pm

mar wrote: Reading same location from multiple threads is perfectly fine

Depends. The compiler may not be able to determine whether some unrelated write access has aliased to the tables.

mar · Post by **mar** » Wed Nov 15, 2017 7:36 pm

Ras wrote:
mar wrote:Reading same location from multiple threads is perfectly fine
Depends. The compiler may not be able to determine whether some unrelated write access has aliased to the tables.

this is not a task for the compiler, but for the programmer. if you're obsessed, cache-align your tables (or use dumb padding)

initialized data is usually grouped in a separate section, so unless your tables are generated on the fly, the probability should be literally zero

jeffreyan11 · Post by **jeffreyan11** » Thu Nov 16, 2017 7:44 am

Andrew: Ah, I see. Sorry for misunderstanding. I'm not sure what the problem is then, but I'll look into it.

Martin: By evaluation tables, I meant attack map tables for the eval that are calculated on the fly, so they're both written and read. But I agree that read-only data shouldn't be a problem.

petero2 · Post by **petero2** » Thu Nov 16, 2017 11:17 pm

jeffreyan11 wrote: Andrew: Ah, I see. Sorry for misunderstanding. I'm not sure what the problem is then, but I'll look into it.

I used git bisect on my 24 core computer to find when the slowdown (when using 24 threads) started, and got:

8863ee8844b9832e754066dc88d30fafcd81f95a is the first bad commit
commit 8863ee8844b9832e754066dc88d30fafcd81f95a
Author: Jeffrey An <jeffreyan07@gmail.com>
Date:   Sun Jul 23 19:12:42 2017 -0700

    Move evaluation code to a new file eval.cpp. No functional change.

That commit is pretty big though so I have not tried to determine what part of it is causing the slowdown.

I don't know if the slowdown when running 24 processes is related to this problem.
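For anyone unfamiliar with the technique: a bisect is essentially a binary search for the first bad commit. The sketch below shows the idea on a simplified linear history; real git history is a graph and `git bisect` handles that itself, so this is an illustration only. The function `first_bad_commit` and the `is_bad` callback (for example "build this commit and check whether 24-thread NPS is low") are hypothetical.

```python
def first_bad_commit(commits, is_bad):
    """Return the earliest commit for which is_bad(commit) is true.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # first bad commit is at mid or earlier
        else:
            lo = mid  # first bad commit is after mid
    return commits[hi]
```

Each step halves the remaining range, so even a history of a few hundred commits needs fewer than ten builds and test runs.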
