kranium wrote: Wed Aug 22, 2018 12:34 pm
Ethereal is scaling quite poorly and Stockfish doing much better...perhaps due to Lazy SMP as some are suggesting, but even with that 27% increase, we're talking about a relatively insignificant net Elo gain of approx. +25 Elo. Not really an efficient use of the resources if you ask me, especially since some engines may not benefit at all.
I don't know if you are aware, but Lc0 gains around that same 27% (maybe a bit more) when you go from 2 to 4 Tesla V100s.
And that is $18,000 more investment.
Quite a disproportion if you ask me.
kranium wrote: Wed Aug 22, 2018 11:57 am
...
Perhaps this is something we can tweak in the coming weeks.
I hope the PGN download will also be 'improved' later for the real tournament.
Currently it contains just the plain moves. No eval/depth/time, so the games are quite pointless this way.
Milos wrote: Wed Aug 22, 2018 12:48 pm
I don't know if you are aware, but Lc0 gains around that same 27% (maybe a bit more) when you go from 2 to 4 Tesla V100s.
And that is $18,000 more investment.
Quite a disproportion if you ask me.
Where is this information from?
I don't have 4x V100 to test, but given the typical batch size that Lc0 can gather and the mutex contention we currently have, it's expected that Lc0 currently doesn't scale at all above 2 GPUs.
kranium wrote: Wed Aug 22, 2018 12:34 pm
Ethereal is scaling quite poorly and Stockfish doing much better...perhaps due to Lazy SMP as some are suggesting, but even with that 27% increase, we're talking about a relatively insignificant net Elo gain of approx. +25 Elo. Not really an efficient use of the resources if you ask me, especially since some engines may not benefit at all.
I don't know if you are aware, but Lc0 gains around that same 27% (maybe a bit more) when you go from 2 to 4 Tesla V100s.
And that is $18,000 more investment.
Quite a disproportion if you ask me.
Wow, I wasn't aware those graphics cards were so expensive.
Yes, that's quite a disproportionate bang for the buck.
Chess.com purchased the CPU server, but at the moment we're renting the GPU server from a top-tier provider (but looking to purchase).
A 24-engine round robin at 15+5 means Lc0 will only play approximately once per day, so we're using a CLI API to start and stop the instance automatically, which saves a bundle.
Last edited by kranium on Wed Aug 22, 2018 1:07 pm, edited 1 time in total.
Guenther wrote: Wed Aug 22, 2018 12:50 pm
I never saw anything besides the download symbol, which is a link to 'download pgn', and this always resulted in plain moves.
I'll look into it and try to take care of it ASAP.
I've taken some time to run some tests and gather NPS for Ethereal (as you requested) and Stockfish.
I ran them each with 46 and 92 threads for 10 seconds:
(I'm presenting only the last PV output before 'bestmove')
Stockfish 130818 (bmi2)
setoption name Threads value 46
go movetime 10000
info depth 27 seldepth 26 multipv 1 score cp 46 upperbound nodes 720732280 nps 72066021 hashfull 999 tbhits 0 time 10001 pv d2d4 g8f6
bestmove d2d4 ponder g8f6
setoption name Threads value 92
go movetime 10000
info depth 26 seldepth 32 multipv 1 score cp 45 nodes 916696131 nps 91660447 hashfull 999 tbhits 0 time 10001 pv d2d4 g8f6
bestmove d2d4 ponder g8f6
nps +27%
As you can see there's a big difference in how effectively they scale above 40-50 threads.
Ethereal is scaling quite poorly and Stockfish doing much better...perhaps due to Lazy SMP as some are suggesting, but even with that 27% increase, we're talking about a relatively insignificant net Elo gain of approx. +25 Elo. Not really an efficient use of the resources if you ask me, especially since some engines may not benefit at all.
What is the OS of the machine? If Windows, could this be related to "processor groups"? I believe SF has enough NUMA-awareness to deal with this Windows-related issue. I think Ethereal does too, but that was a very recent addition, and might not be in version 10.86.
I only half know what I'm talking about here, but perhaps others can chime in.
[EDIT] I see the OS is Windows Server 2016. It seems that if hyperthreading is enabled, there will be two processor groups. Will all engines even be able to make use of both? I assume you've already considered this issue?
[EDIT] Will the first engine started be assigned to a processor group with 64 logical cores, while the second is assigned to a second group with only 32? I have no idea how Windows does scheduling.
Last edited by zullil on Wed Aug 22, 2018 2:26 pm, edited 1 time in total.
Milos wrote: Wed Aug 22, 2018 12:48 pm
I don't know if you are aware, but Lc0 gains around that same 27% (maybe a bit more) when you go from 2 to 4 Tesla V100s.
And that is $18,000 more investment.
Quite a disproportion if you ask me.
Where is this information from?
I don't have 4x V100 to test, but given the typical batch size that Lc0 can gather and the mutex contention we currently have, it's expected that Lc0 currently doesn't scale at all above 2 GPUs.
I was generous. There is a result in the benchmarks for 4xV100 and for a Titan V. At least there was, but someone deleted it from the table (IIRC the value was 80k or 85k). A single V100 and a Titan V should perform the same. And 4xV100 gets 85k nps vs. 31k for the Titan V (btw, someone messed up the results in the table a bit). To me this looks like 15-25% when going from 2 to 4xV100.
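The arithmetic behind this estimate can be sketched as follows. It uses the figures exactly as recalled in the post (31k nps for a Titan V, treated as equivalent to one V100, and about 85k nps for 4xV100); these are not independently verified benchmark results, and the 15-25% range for going from 2 to 4 GPUs is the post's own estimate, so the 2-GPU values printed are only what that estimate would imply.

```python
# Figures as recalled in the post above; not independently verified.
SINGLE_GPU_NPS = 31_000   # Titan V, assumed equivalent to a single V100
FOUR_GPU_NPS = 85_000     # 4x V100 (the post recalls "80 or 85k")


def speedup(nps_n: float, nps_1: float) -> float:
    """Raw speedup of an n-GPU setup over a single GPU."""
    return nps_n / nps_1


def efficiency(nps_n: float, nps_1: float, n: int) -> float:
    """Fraction of ideal linear scaling achieved with n GPUs."""
    return speedup(nps_n, nps_1) / n


def implied_two_gpu_nps(four_gpu_nps: float, gain_2_to_4: float) -> float:
    """2-GPU nps that a given 2->4 GPU relative gain would imply."""
    return four_gpu_nps / (1.0 + gain_2_to_4)


if __name__ == "__main__":
    print(f"4 GPUs vs 1: {speedup(FOUR_GPU_NPS, SINGLE_GPU_NPS):.2f}x, "
          f"{efficiency(FOUR_GPU_NPS, SINGLE_GPU_NPS, 4):.0%} of linear scaling")
    for gain in (0.15, 0.25):
        two = implied_two_gpu_nps(FOUR_GPU_NPS, gain)
        print(f"2->4 GPU gain of {gain:.0%} implies ~{two / 1000:.0f}k nps on 2 GPUs")
```

Comparing the implied 2-GPU values with a naive 2 x 31k baseline is a quick way to see how sensitive the conclusion is to the recalled single-GPU number.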
Judging from the config, you seem to be renting a DGX Station, actually. To buy one, you need $70k. Super expensive hardware, roughly the same price as an 8x Platinum 8168 config.
Much too late to edit, but it seems that there will be two processor groups. One group should include all 48 logical cores on one CPU, and the second group all 48 logical cores on the second CPU. As long as the two engines are assigned to different groups at the start, each engine will basically run its 46 threads on its own CPU. Or so it seems, based on some quick reading.
A single engine running 92 threads would likewise be stuck on one CPU, unless it has enough awareness to request otherwise.
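A simplified model of the grouping described above, assuming (as the post does) two sockets with 48 logical processors each, and assuming Windows keeps each NUMA node whole inside one processor group of at most 64 logical processors. The greedy packing is an illustration only, not the actual Windows algorithm, and `usable_parallelism` is a hypothetical helper that assumes a group-unaware engine whose threads all stay in the group it started in.

```python
MAX_GROUP_SIZE = 64  # Windows limit on logical processors per processor group


def pack_numa_nodes(node_sizes: list[int], max_group: int = MAX_GROUP_SIZE) -> list[list[int]]:
    """Greedily pack whole NUMA nodes into processor groups (simplified model)."""
    groups: list[list[int]] = []
    current: list[int] = []
    used = 0
    for node, size in enumerate(node_sizes):
        if size > max_group:
            raise ValueError("this sketch does not model NUMA nodes larger than one group")
        # Start a new group when the next whole node would not fit.
        if current and used + size > max_group:
            groups.append(current)
            current, used = [], 0
        current.append(node)
        used += size
    if current:
        groups.append(current)
    return groups


def group_sizes(node_sizes: list[int], groups: list[list[int]]) -> list[int]:
    """Number of logical processors in each group."""
    return [sum(node_sizes[n] for n in g) for g in groups]


def usable_parallelism(threads: int, group_size: int) -> int:
    """Threads that can run simultaneously if the engine never leaves its starting group."""
    return min(threads, group_size)


if __name__ == "__main__":
    nodes = [48, 48]  # two sockets with 48 logical processors each, per the post
    groups = pack_numa_nodes(nodes)
    sizes = group_sizes(nodes, groups)
    print(groups, sizes)  # [[0], [1]] [48, 48]
    for threads in (46, 92):
        print(threads, "threads ->", usable_parallelism(threads, sizes[0]), "running in parallel")
```

Under these assumptions two 48-processor nodes cannot share a 64-processor group, which is why the model ends up with two groups of 48 rather than 64 + 32.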
kranium wrote: Sun Aug 19, 2018 1:08 pm
It's fairly well known, Carlos, that most engines scale very poorly above 30-40 threads, with Elo gains almost flatlining, so allocating 90+ threads would be the real joke...a huge waste of resources.
I believe this is outdated information. There's been a fair amount of testing over the past couple of years (as well as improvements to Lazy SMP implementations) showing that CPU engines scale well as far as anyone has been able to test:
Not all engines have good scaling...and it probably can vary from system to system.
DrCliche wrote: Wed Aug 22, 2018 2:01 am
What was Ethereal's average NPS in your tests? One thing I appreciate about Andrew Grant is that when he tests, he puts numbers to paper rather than making vague claims like "should perform well". Are you outperforming Andrew Grant's $720 processor? Are you outperforming TCEC or the now defunct YLCET? One would expect some pretty gaudy NPS numbers from the CPU engines if the claim that the CCCC will "generate the best possible chess" is to be taken seriously. From what I can tell, most reasonable and knowledgeable people believe that claim to be laughable.
The quote "generate the best possible chess", which is causing you so much dismay, is simply a catchphrase used by the marketing team in the announcement. Believe me, if they had known it was going to cause you so much grief, they would likely have omitted it.
I have no idea if we outperform TCEC or YLCET...that was not high on our list of goals for the events, and has not been measured.
I've taken some time to run some tests and gather NPS for Ethereal (as you requested) and Stockfish.
I ran them each with 46 and 92 threads for 10 seconds:
(I'm presenting only the last PV output before 'bestmove')
Stockfish 130818 (bmi2)
setoption name Threads value 46
go movetime 10000
info depth 27 seldepth 26 multipv 1 score cp 46 upperbound nodes 720732280 nps 72066021 hashfull 999 tbhits 0 time 10001 pv d2d4 g8f6
bestmove d2d4 ponder g8f6
setoption name Threads value 92
go movetime 10000
info depth 26 seldepth 32 multipv 1 score cp 45 nodes 916696131 nps 91660447 hashfull 999 tbhits 0 time 10001 pv d2d4 g8f6
bestmove d2d4 ponder g8f6
nps +27%
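The +27% figure can be reproduced directly from the two final `info` lines above. This small Python sketch pulls the standard UCI `depth`, `seldepth`, `nodes`, `nps` and `time` fields out of each line; dividing the NPS gain by the doubling of threads also gives the share of ideal scaling that the extra 46 threads delivered.

```python
import re

# The final "info" lines reported for Stockfish 130818 above.
INFO_46 = ("info depth 27 seldepth 26 multipv 1 score cp 46 upperbound "
           "nodes 720732280 nps 72066021 hashfull 999 tbhits 0 time 10001 pv d2d4 g8f6")
INFO_92 = ("info depth 26 seldepth 32 multipv 1 score cp 45 "
           "nodes 916696131 nps 91660447 hashfull 999 tbhits 0 time 10001 pv d2d4 g8f6")


def parse_info(line: str) -> dict[str, int]:
    """Extract selected integer fields from a UCI info line."""
    fields = {}
    for key in ("depth", "seldepth", "nodes", "nps", "time"):
        # \b keeps "depth" from matching inside "seldepth".
        m = re.search(rf"\b{key} (\d+)", line)
        if m:
            fields[key] = int(m.group(1))
    return fields


FIELDS_46 = parse_info(INFO_46)
FIELDS_92 = parse_info(INFO_92)
GAIN = FIELDS_92["nps"] / FIELDS_46["nps"] - 1.0
# Share of perfect scaling: actual NPS ratio divided by the thread ratio (92 / 46 = 2).
EFFICIENCY = (1.0 + GAIN) / (92 / 46)

print(f"NPS gain 46 -> 92 threads: {GAIN:+.1%}")    # +27.2%
print(f"Share of ideal 2x scaling: {EFFICIENCY:.0%}")  # 64%
```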
As you can see there's a big difference in how effectively they scale above 40-50 threads.
Ethereal is scaling quite poorly and Stockfish doing much better...perhaps due to Lazy SMP as some are suggesting, but even with that 27% increase, we're talking about a relatively insignificant net Elo gain of approx. +25 Elo. Not really an efficient use of the resources if you ask me, especially since some engines may not benefit at all.
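A rough sketch of how a speedup might translate into Elo, using the common rule of thumb that each doubling of effective speed is worth a roughly constant number of Elo. The 50 and 70 Elo per doubling below are assumed values, not figures from this thread, and treating the NPS gain as the effective speedup is optimistic, since parallel search adds overhead that raw NPS does not show.

```python
import math

NPS_RATIO = 91660447 / 72066021  # 92 vs 46 threads, Stockfish figures above


def elo_from_speedup(speedup: float, elo_per_doubling: float) -> float:
    """Rule-of-thumb Elo gain: a constant number of Elo per doubling of effective speed."""
    return elo_per_doubling * math.log2(speedup)


if __name__ == "__main__":
    for per_doubling in (50, 70):  # assumed values, not measured on this machine
        measured = elo_from_speedup(NPS_RATIO, per_doubling)
        ideal = elo_from_speedup(2.0, per_doubling)  # perfect 2x scaling for comparison
        print(f"{per_doubling} Elo/doubling: ~{measured:+.0f} Elo measured, {ideal:+.0f} if ideal")
```

With 70 Elo per doubling this lands near the +25 Elo quoted above; the gap to the "ideal" column is the cost of the poor scaling.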
Wait, there's something very wrong with the math in Ethereal. Don't just look at the nps, look at the nodes. With 92 threads the node count is more than 2x, yet the nps is almost the same. Also, despite twice the nodes, the depth is lower... that doesn't make sense.
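The sanity check suggested here (nodes, time and nps should agree) can be automated. A minimal sketch, assuming the engine reports `time` in milliseconds as UCI specifies and computes `nps` over roughly the same interval: the implied rate `nodes * 1000 / time` should then match the reported `nps`. Both Stockfish lines above pass; a line where nodes double while nps stays flat at the same movetime would fail. The 1% tolerance is an arbitrary choice.

```python
import re

LINE_46 = ("info depth 27 seldepth 26 multipv 1 score cp 46 upperbound "
           "nodes 720732280 nps 72066021 hashfull 999 tbhits 0 time 10001 pv d2d4 g8f6")
LINE_92 = ("info depth 26 seldepth 32 multipv 1 score cp 45 "
           "nodes 916696131 nps 91660447 hashfull 999 tbhits 0 time 10001 pv d2d4 g8f6")


def parse_fields(line: str) -> dict[str, int]:
    """Extract nodes, nps and time (milliseconds) from a UCI info line."""
    out = {}
    for key in ("nodes", "nps", "time"):
        m = re.search(rf"\b{key} (\d+)", line)
        if m is None:
            raise ValueError(f"missing '{key}' in: {line}")
        out[key] = int(m.group(1))
    return out


def is_consistent(line: str, tolerance: float = 0.01) -> bool:
    """True if nodes / elapsed seconds agrees with the reported nps within tolerance."""
    f = parse_fields(line)
    implied_nps = f["nodes"] * 1000 / f["time"]
    return abs(implied_nps / f["nps"] - 1.0) <= tolerance


if __name__ == "__main__":
    for line in (LINE_46, LINE_92):
        print(is_consistent(line))
```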