TalkChess.com

Posted: **Wed Aug 22, 2018 12:48 pm**

kranium wrote: ↑Wed Aug 22, 2018 12:34 pm Ethereal is scaling quite poorly and Stockfish is doing much better...perhaps due to Lazy SMP as some are suggesting, but even with that 27% increase, we're talking about a relatively insignificant net Elo gain of approx +25 Elo. Not really an efficient use of the resources if you ask me, especially since some engines may not benefit at all.

I don't know if you are aware, but Lc0 gains around that same 27% (maybe a bit more) when you go from 2 to 4 Tesla V100s.
And that is $18,000 more in investment.
Quite a disproportion if you ask me.

Posted: **Wed Aug 22, 2018 12:50 pm**

kranium wrote: ↑Wed Aug 22, 2018 12:46 pm
Guenther wrote: ↑Wed Aug 22, 2018 12:38 pm
kranium wrote: ↑Wed Aug 22, 2018 11:57 am
...
Perhaps this is something we can tweak in the coming weeks.
I hope pgn download will be 'improved' too later for the real tournament.
Currently it contains just the plain moves. No eval/depth/time, the games are quite pointless this way.
Hi Guenther,
The PGN does include those things:

1. e4 { (e2e4 c7c5 Ng1f3 e7e6 Nb1c3 Nb8c6 Bf1e2 Ng8f6 d2d4 c5xd4 Nf3xd4
Bf8b4 Qd1d3 OO OO e6e5 Nd4xc6 b7xc6 Rf1d1 Qd8e7 Bc1e3 Bc8b7 h2h3 Bb4c5
Nc3a4 Bc5xe3 Qd3xe3 d7d6 Na4c3) +0.15/25 94 }

How did you download?
If that data has been stripped by the front end somehow, I'll definitely look into it.
Thx for the heads up!
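
For anyone who wants to check whether a downloaded PGN still carries this data, here is a minimal Python sketch that pulls the fields out of a comment shaped like the one quoted above. It assumes the trailing `+0.15/25 94` means eval/depth followed by time used (the unit is not stated in the thread), and the names `COMMENT_RE` and `parse_move_comment` are made up for illustration. Mate scores and other comment layouts are not handled.

```python
import re

# Matches a move comment shaped like "{ (pv moves) +0.15/25 94 }".
# Assumption: the fields are eval/depth followed by time used.
COMMENT_RE = re.compile(
    r"\{\s*\((?P<pv>[^)]*)\)\s*"
    r"(?P<score>[+-]?\d+(?:\.\d+)?)/(?P<depth>\d+)\s+"
    r"(?P<time>\d+(?:\.\d+)?)\s*\}"
)


def parse_move_comment(text):
    """Return pv, score, depth and time from the first matching comment, or None."""
    match = COMMENT_RE.search(text)
    if match is None:
        return None  # plain moves only, or an unexpected comment layout
    return {
        "pv": match.group("pv").split(),
        "score": float(match.group("score")),
        "depth": int(match.group("depth")),
        "time": float(match.group("time")),
    }


# The sample comment from the post, copied as-is.
SAMPLE = """1. e4 { (e2e4 c7c5 Ng1f3 e7e6 Nb1c3 Nb8c6 Bf1e2 Ng8f6 d2d4 c5xd4 Nf3xd4
Bf8b4 Qd1d3 OO OO e6e5 Nd4xc6 b7xc6 Rf1d1 Qd8e7 Bc1e3 Bc8b7 h2h3 Bb4c5
Nc3a4 Bc5xe3 Qd3xe3 d7d6 Na4c3) +0.15/25 94 }"""

if __name__ == "__main__":
    print(parse_move_comment(SAMPLE))
```

A file that has been stripped down to plain moves makes `parse_move_comment` return `None`, which is a quick way to tell the two kinds of download apart.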

I never saw anything besides the download symbol which is a link to 'download pgn'
and this always resulted in plain moves.

Posted: **Wed Aug 22, 2018 12:53 pm**

Milos wrote: ↑Wed Aug 22, 2018 12:48 pm I don't know if you are aware, but Lc0 gains around that same 27% (maybe a bit more) when you go from 2 to 4 Tesla V100s.
And that is $18,000 more in investment.
Quite a disproportion if you ask me.

Where is this information from?
I don't have 4x V100 to test, but given the typical batch size that Lc0 can gather and the mutex contention we currently have, I'd expect that Lc0 currently doesn't scale at all above 2 GPUs.

Posted: **Wed Aug 22, 2018 12:58 pm**

Milos wrote: ↑Wed Aug 22, 2018 12:48 pm
kranium wrote: ↑Wed Aug 22, 2018 12:34 pm Ethereal is scaling quite poorly and Stockfish is doing much better...perhaps due to Lazy SMP as some are suggesting, but even with that 27% increase, we're talking about a relatively insignificant net Elo gain of approx +25 Elo. Not really an efficient use of the resources if you ask me, especially since some engines may not benefit at all.
I don't know if you are aware, but Lc0 gains around that same 27% (maybe a bit more) when you go from 2 to 4 Tesla V100s.
And that is $18,000 more in investment.
Quite a disproportion if you ask me.

Wow, I wasn't aware those graphics cards were so expensive.
Yes, that's quite a disproportionate bang for the buck.

Chess.com purchased the CPU server, but at the moment we're renting the GPU server from a top-tier provider (but looking to purchase).
A 24-engine round robin at 15 + 5 means Lc0 will only play approx. once per day, so we're using a CLI API to start and stop the instance automatically, which saves a bundle.

Posted: **Wed Aug 22, 2018 12:59 pm**

Guenther wrote: ↑Wed Aug 22, 2018 12:50 pm
I never saw anything besides the download symbol which is a link to 'download pgn'
and this always resulted in plain moves.

I'll look into it and try to take care of it asap.

Posted: **Wed Aug 22, 2018 2:07 pm**

kranium wrote: ↑Wed Aug 22, 2018 12:34 pm

I've taken some time to run some tests and gather NPS for Ethereal (as you requested) and Stockfish.
I ran them each with 46 and 92 threads for 10 seconds:
(I'm presenting only the last PV output before 'bestmove')
Code:
Ethereal 10.86 (bmi2)

setoption name Threads value 46
info string set Threads to 46
go movetime 10000
info depth 25 seldepth 32 score cp 24 time 3125 nodes 206966264 nps 66208000 tbhits 0 hashfull 996 pv d2d4 g8f6 c2c4 e7e6 g1f3 d7d5 b1c3 f8b4 c4d5 e6d5 c1g5 e8g8 e2e3 b8d7 f1d3 c7c5 e1g1 b4c3 b2c3 c5c4 d3c2 d8a5 g5f4 f6e4 c2e4
bestmove d2d4 ponder g8f6

setoption name Threads value 92
info string set Threads to 92
go movetime 10000
info depth 22 seldepth 31 score cp 26 time 7984 nodes 533074064 nps 66759000 tbhits 0 hashfull 1000 pv d2d4 d7d5 g1f3 e7e6 c2c4 g8f6 b1c3 c7c5 c1g5 c5d4 f3d4 d5c4 e2e3 b8d7 f1c4 f8e7 e1g1 e8g8 c4e2 a7a6 d4f3 h7h6 g5f4 g7g5
bestmove d2d4 ponder d7d5

nps +1%
Code:
Stockfish 130818 (bmi2)

setoption name Threads value 46
go movetime 10000
info depth 27 seldepth 26 multipv 1 score cp 46 upperbound nodes 720732280 nps 72066021 hashfull 999 tbhits 0 time 10001 pv d2d4 g8f6
bestmove d2d4 ponder g8f6

setoption name Threads value 92
go movetime 10000
info depth 26 seldepth 32 multipv 1 score cp 45 nodes 916696131 nps 91660447 hashfull 999 tbhits 0 time 10001 pv d2d4 g8f6
bestmove d2d4 ponder g8f6

nps +27%
As you can see there's a big difference in how effectively they scale above 40-50 threads.

Ethereal is scaling quite poorly and Stockfish is doing much better...perhaps due to Lazy SMP as some are suggesting, but even with that 27% increase, we're talking about a relatively insignificant net Elo gain of approx +25 Elo. Not really an efficient use of the resources if you ask me, especially since some engines may not benefit at all.

What is the OS of the machine? If Windows, could this be related to "processor groups"? I believe SF has enough NUMA-awareness to deal with this Windows-related issue. I think Ethereal does too, but that was a very recent addition, and might not be in version 10.86.

Only half know what I'm talking about here, but perhaps others can chime in.

[EDIT] I see the OS is Windows Server 2016. It seems that if hyperthreading is enabled, there will be two processor groups. Will all engines even be able to make use of both? I assume you've already considered this issue?

[EDIT] Will the first engine started be assigned to a processor group with 64 logical cores, while the second is assigned to a second group with only 32? Have no idea how Windows does scheduling.

Posted: **Wed Aug 22, 2018 2:22 pm**

crem wrote: ↑Wed Aug 22, 2018 12:53 pm
Milos wrote: ↑Wed Aug 22, 2018 12:48 pm I don't know if you are aware, but Lc0 gains around that same 27% (maybe a bit more) when you go from 2 to 4 Tesla V100s.
And that is $18,000 more in investment.
Quite a disproportion if you ask me.
Where is this information from?
I don't have 4x V100 to test, but given the typical batch size that Lc0 can gather and the mutex contention we currently have, I'd expect that Lc0 currently doesn't scale at all above 2 GPUs.

I was generous. There is a result in the benchmarks for 4xV100 and Titan V. At least there was, but someone deleted it from the table (IIRC it was a value of 80k or 85k). A single V100 and a Titan V should show no difference. So that's 4xV100 at 85 knps vs. a Titan V at 31k (btw, someone messed up the results in the table a bit). To me this looks like 15-25% when going from 2 to 4xV100.

Posted: **Wed Aug 22, 2018 2:29 pm**

kranium wrote: ↑Wed Aug 22, 2018 12:58 pm
Milos wrote: ↑Wed Aug 22, 2018 12:48 pm
kranium wrote: ↑Wed Aug 22, 2018 12:34 pm Ethereal is scaling quite poorly and Stockfish is doing much better...perhaps due to Lazy SMP as some are suggesting, but even with that 27% increase, we're talking about a relatively insignificant net Elo gain of approx +25 Elo. Not really an efficient use of the resources if you ask me, especially since some engines may not benefit at all.
I don't know if you are aware, but Lc0 gains around that same 27% (maybe a bit more) when you go from 2 to 4 Tesla V100s.
And that is $18,000 more in investment.
Quite a disproportion if you ask me.
Wow, I wasn't aware those graphics cards were so expensive.
Yes, that's quite a disproportionate bang for the buck.

Chess.com purchased the CPU server, but at the moment we're renting the GPU server from a top-tier provider (but looking to purchase).
A 24-engine round robin at 15 + 5 means Lc0 will only play approx. once per day, so we're using a CLI API to start and stop the instance automatically, which saves a bundle.

Judging from the config, you seem to be renting a DGX Station, actually. To buy one, one needs $70k. Super expensive hardware. Kind of similar in price to an 8x Platinum 8168 config.

Posted: **Wed Aug 22, 2018 5:05 pm**

zullil wrote: ↑Wed Aug 22, 2018 2:07 pm
kranium wrote: ↑Wed Aug 22, 2018 12:34 pm

I've taken some time to run some tests and gather NPS for Ethereal (as you requested) and Stockfish.
I ran them each with 46 and 92 threads for 10 seconds:
(I'm presenting only the last PV output before 'bestmove')
Code:
Ethereal 10.86 (bmi2)

setoption name Threads value 46
info string set Threads to 46
go movetime 10000
info depth 25 seldepth 32 score cp 24 time 3125 nodes 206966264 nps 66208000 tbhits 0 hashfull 996 pv d2d4 g8f6 c2c4 e7e6 g1f3 d7d5 b1c3 f8b4 c4d5 e6d5 c1g5 e8g8 e2e3 b8d7 f1d3 c7c5 e1g1 b4c3 b2c3 c5c4 d3c2 d8a5 g5f4 f6e4 c2e4
bestmove d2d4 ponder g8f6

setoption name Threads value 92
info string set Threads to 92
go movetime 10000
info depth 22 seldepth 31 score cp 26 time 7984 nodes 533074064 nps 66759000 tbhits 0 hashfull 1000 pv d2d4 d7d5 g1f3 e7e6 c2c4 g8f6 b1c3 c7c5 c1g5 c5d4 f3d4 d5c4 e2e3 b8d7 f1c4 f8e7 e1g1 e8g8 c4e2 a7a6 d4f3 h7h6 g5f4 g7g5
bestmove d2d4 ponder d7d5

nps +1%
Code:
Stockfish 130818 (bmi2)

setoption name Threads value 46
go movetime 10000
info depth 27 seldepth 26 multipv 1 score cp 46 upperbound nodes 720732280 nps 72066021 hashfull 999 tbhits 0 time 10001 pv d2d4 g8f6
bestmove d2d4 ponder g8f6

setoption name Threads value 92
go movetime 10000
info depth 26 seldepth 32 multipv 1 score cp 45 nodes 916696131 nps 91660447 hashfull 999 tbhits 0 time 10001 pv d2d4 g8f6
bestmove d2d4 ponder g8f6

nps +27%
As you can see there's a big difference in how effectively they scale above 40-50 threads.

Ethereal is scaling quite poorly and Stockfish is doing much better...perhaps due to Lazy SMP as some are suggesting, but even with that 27% increase, we're talking about a relatively insignificant net Elo gain of approx +25 Elo. Not really an efficient use of the resources if you ask me, especially since some engines may not benefit at all.
What is the OS of the machine? If Windows, could this be related to "processor groups"? I believe SF has enough NUMA-awareness to deal with this Windows-related issue. I think Ethereal does too, but that was a very recent addition, and might not be in version 10.86.

Only half know what I'm talking about here, but perhaps others can chime in.

[EDIT] I see the OS is Windows Server 2016. It seems that if hyperthreading is enabled, there will be two processor groups. Will all engines even be able to make use of both? I assume you've already considered this issue?

[EDIT] Will the first engine started be assigned to a processor group with 64 logical cores, while the second is assigned to a second group with only 32? Have no idea how Windows does scheduling.

Much too late to edit, but it seems that there will be two processor groups. One group should include all 48 logical cores on one CPU. The second group should include all 48 logical cores on the second CPU. As long as the two engines are assigned to different groups at the start, each engine will basically run its 46 threads on its own CPU. Or so it seems, based on some quick reading.

A single engine running 92 threads would likewise be stuck on one CPU, unless it has enough awareness to request otherwise.
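
If someone with access to the machine wants to check the group layout rather than guess, a rough Python sketch along these lines should report it. It calls the Win32 functions `GetActiveProcessorGroupCount` and `GetActiveProcessorCount` through `ctypes` and returns `None` on anything that isn't Windows. The helper names `processor_groups` and `needs_multiple_groups` are my own, and the 48 + 48 layout in the example is only the guess from this post, not something measured.

```python
import ctypes
import sys


def processor_groups():
    """Return the logical-processor count of each Windows processor group.

    Returns None on non-Windows systems, where processor groups do not exist.
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort
    kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]
    kernel32.GetActiveProcessorCount.restype = ctypes.c_uint32
    group_count = kernel32.GetActiveProcessorGroupCount()
    return [kernel32.GetActiveProcessorCount(g) for g in range(group_count)]


def needs_multiple_groups(threads, group_sizes):
    """True if `threads` cannot all get their own logical core within one group."""
    return threads > max(group_sizes)


if __name__ == "__main__":
    groups = processor_groups()
    if groups is None:
        groups = [48, 48]  # assumed layout from the post, not measured
    for threads in (46, 92):
        print(threads, "threads need more than one group:",
              needs_multiple_groups(threads, groups))
```

With the assumed 48 + 48 layout, 46 threads fit inside one group and 92 do not, which is the case where an engine has to ask for the second group explicitly.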

Posted: **Wed Aug 22, 2018 10:05 pm**

kranium wrote: ↑Wed Aug 22, 2018 12:34 pm
DrCliche wrote: ↑Tue Aug 21, 2018 6:33 am
kranium wrote: ↑Sun Aug 19, 2018 1:08 pm It's fairly well known, Carlos, that most engines scale very poorly above 30-40 threads, with Elo gains almost flatlining, so allocating 90+ threads would be the real joke...a huge waste of resources.
I believe this is outdated information. There's been a fair amount of testing over the past couple of years (as well as improvements to Lazy SMP implementations) that shows CPU engines have good scaling as far as anyone has been able to test:
Not all engines have good scaling...and it probably can vary from system to system.
DrCliche wrote: ↑Wed Aug 22, 2018 2:01 am What was Ethereal's average NPS in your tests? One thing I appreciate about Andrew Grant is that when he tests, he puts numbers to paper rather than making vague claims like "should perform well". Are you outperforming Andrew Grant's $720 processor? Are you outperforming TCEC or the now defunct YLCET? One would expect some pretty gaudy NPS numbers from the CPU engines if the claim that the CCCC will "generate the best possible chess" is to be taken seriously. From what I can tell, most reasonable and knowledgeable people believe that claim to be laughable.
The quote "generate the best possible chess", which is causing you so much dismay, is simply a catchphrase used in the announcement by the marketing team. Believe me, if they had known it was going to cause you so much grief, they would likely have omitted it.
I have no idea if we outperform TCEC or YLCET...that was not high on our list of goals for the events, and has not been measured.

I've taken some time to run some tests and gather NPS for Ethereal (as you requested) and Stockfish.
I ran them each with 46 and 92 threads for 10 seconds:
(I'm presenting only the last PV output before 'bestmove')
Code:
Ethereal 10.86 (bmi2)

setoption name Threads value 46
info string set Threads to 46
go movetime 10000
info depth 25 seldepth 32 score cp 24 time 3125 nodes 206966264 nps 66208000 tbhits 0 hashfull 996 pv d2d4 g8f6 c2c4 e7e6 g1f3 d7d5 b1c3 f8b4 c4d5 e6d5 c1g5 e8g8 e2e3 b8d7 f1d3 c7c5 e1g1 b4c3 b2c3 c5c4 d3c2 d8a5 g5f4 f6e4 c2e4
bestmove d2d4 ponder g8f6

setoption name Threads value 92
info string set Threads to 92
go movetime 10000
info depth 22 seldepth 31 score cp 26 time 7984 nodes 533074064 nps 66759000 tbhits 0 hashfull 1000 pv d2d4 d7d5 g1f3 e7e6 c2c4 g8f6 b1c3 c7c5 c1g5 c5d4 f3d4 d5c4 e2e3 b8d7 f1c4 f8e7 e1g1 e8g8 c4e2 a7a6 d4f3 h7h6 g5f4 g7g5
bestmove d2d4 ponder d7d5

nps +1%
Code:
Stockfish 130818 (bmi2)

setoption name Threads value 46
go movetime 10000
info depth 27 seldepth 26 multipv 1 score cp 46 upperbound nodes 720732280 nps 72066021 hashfull 999 tbhits 0 time 10001 pv d2d4 g8f6
bestmove d2d4 ponder g8f6

setoption name Threads value 92
go movetime 10000
info depth 26 seldepth 32 multipv 1 score cp 45 nodes 916696131 nps 91660447 hashfull 999 tbhits 0 time 10001 pv d2d4 g8f6
bestmove d2d4 ponder g8f6

nps +27%
As you can see there's a big difference in how effectively they scale above 40-50 threads.

Ethereal is scaling quite poorly and Stockfish is doing much better...perhaps due to Lazy SMP as some are suggesting, but even with that 27% increase, we're talking about a relatively insignificant net Elo gain of approx +25 Elo. Not really an efficient use of the resources if you ask me, especially since some engines may not benefit at all.

Wait, there's something very wrong with the math in Ethereal. Don't just look at the nps, look at the nodes. With 92 threads it's more than 2x, yet the nps is almost the same. Also, besides the 2x number of nodes, the depth is lower... doesn't make sense.
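
One way to sanity-check the quoted figures is to recompute nps from the `nodes` and `time` fields of each line. A minimal Python sketch, assuming `time` is in milliseconds as in the UCI protocol; the info lines are trimmed copies of the ones quoted above (pv dropped), and the helper names are made up for illustration.

```python
def parse_info(line):
    """Pick the integer depth/time/nodes/nps fields out of a UCI 'info' line."""
    tokens = line.split()
    fields = {}
    for key in ("depth", "time", "nodes", "nps"):
        if key in tokens:
            fields[key] = int(tokens[tokens.index(key) + 1])
    return fields


def recomputed_nps(info):
    """Nodes per second derived from nodes and elapsed milliseconds."""
    return info["nodes"] * 1000 / info["time"]


def nps_gain_percent(before, after):
    """Relative change in the engine-reported nps, in percent."""
    return (after["nps"] / before["nps"] - 1) * 100


# Trimmed copies of the last info lines quoted in the post.
RUNS = {
    "Ethereal 46": "info depth 25 seldepth 32 score cp 24 time 3125 nodes 206966264 nps 66208000",
    "Ethereal 92": "info depth 22 seldepth 31 score cp 26 time 7984 nodes 533074064 nps 66759000",
    "Stockfish 46": "info depth 27 seldepth 26 nodes 720732280 nps 72066021 time 10001",
    "Stockfish 92": "info depth 26 seldepth 32 nodes 916696131 nps 91660447 time 10001",
}

if __name__ == "__main__":
    parsed = {name: parse_info(line) for name, line in RUNS.items()}
    for name, info in parsed.items():
        print(name, "time(ms):", info["time"], "reported nps:", info["nps"],
              "nodes/time:", round(recomputed_nps(info)))
    for engine in ("Ethereal", "Stockfish"):
        gain = nps_gain_percent(parsed[engine + " 46"], parsed[engine + " 92"])
        print(engine, "nps change 46 -> 92 threads: %.1f%%" % gain)
```

Note the `time` fields: the two Ethereal lines were printed at 3125 ms and 7984 ms, while both Stockfish lines were printed at 10001 ms. So the Ethereal node totals cover different stretches of the search and can't be compared directly, even though the reported nps agrees with nodes/time in each run.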

Chess.com 2018 computer chess championship
