Strange Lc0 TCEC performance


Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Strange Lc0 TCEC performance

Post by Laskos »

Milos wrote: Thu Aug 16, 2018 2:03 pm
Laskos wrote: Thu Aug 16, 2018 10:28 am Probably in these conditions, Lc0 is already close to SF8 on 2 cores. When it reaches the level of SF8 on 4 cores, I could casually say "Lc0 is at the level of SF8", the way people usually talk about engines, meaning on a reasonable, balanced home CPU/GPU hardware configuration and at blitz time controls.
You keep repeating that but it doesn't make it true. A price-balanced CPU/GPU configuration is a 16-thread, 8-core Ryzen vs. a 1060 (plus a cheap CPU).
A power (TDP) balanced one is a 32-thread (16-core) Ryzen Threadripper vs. a 1060 (plus a lower-power CPU).
The Nvidia 10x series is 2 years old, and the release of the significantly better 11x series is delayed only by crypto-maniacs digging for fairy tales. It will be out soon anyway, a matter of a few months, so my flawed "reasonability" still seems better than your trolling.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Strange Lc0 TCEC performance

Post by Milos »

Laskos wrote: Thu Aug 16, 2018 2:54 pm
Milos wrote: Thu Aug 16, 2018 2:03 pm
Laskos wrote: Thu Aug 16, 2018 10:28 am Probably in these conditions, Lc0 is already close to SF8 on 2 cores. When it reaches the level of SF8 on 4 cores, I could casually say "Lc0 is at the level of SF8", the way people usually talk about engines, meaning on a reasonable, balanced home CPU/GPU hardware configuration and at blitz time controls.
You keep repeating that but it doesn't make it true. A price-balanced CPU/GPU configuration is a 16-thread, 8-core Ryzen vs. a 1060 (plus a cheap CPU).
A power (TDP) balanced one is a 32-thread (16-core) Ryzen Threadripper vs. a 1060 (plus a lower-power CPU).
The Nvidia 10x series is 2 years old, and the release of the significantly better 11x series is delayed only by crypto-maniacs digging for fairy tales. It will be out soon anyway, a matter of a few months, so my flawed "reasonability" still seems better than your trolling.
You are the one who is trolling, comparing your 5+ year old CPU (4 generations old) to a 2 year old GPU (current state of the art).
When you have no arguments (against my spot-on comparison), you yell "trolling", rant about crypto-maniacs, and make hypothetical comparisons with hardware that is not yet out and whose price and performance are uncertain.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Strange Lc0 TCEC performance

Post by Milos »

jp wrote: Thu Aug 16, 2018 2:51 pm
Milos wrote: Thu Aug 16, 2018 2:03 pm A price-balanced CPU/GPU configuration is a 16-thread, 8-core Ryzen vs. a 1060 (plus a cheap CPU).
A power (TDP) balanced one is a 32-thread (16-core) Ryzen Threadripper vs. a 1060 (plus a lower-power CPU).
What sort of Lc0:SF nps ratio do these two balanced configs give?
(I know it depends on many things, but just something typical e.g. for larger NN, no TB hits, etc.)
From the start position, with the latest testnet and SFdev, the first one is 3.5 knps vs. 14 Mnps (a ratio of about 1:4000).
The second one is 3.5 knps vs. 25 Mnps (about 1:7100).
Translated to Elo, that is very roughly 350 Elo in the first case and around 400 Elo in the second, at a CCRL 40/4 equivalent TC.
Werewolf
Posts: 1795
Joined: Thu Sep 18, 2008 10:24 pm

Re: Strange Lc0 TCEC performance

Post by Werewolf »

Another small sample to toss into the air:

Lc0 Big Net, August 8th release, Nvidia 1060

Houdini 6, 8 physical cores, Broadwell 6950X @ 4.2 GHz

Noomen short lines, 20 game match @ G5 min + 30 sec increment/move (designed to negate issues with Leela's time management)

4 wins each, 12 draws.

If anything I'd say Lc0 was outgunned in the hardware department here...
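
For anyone who wants to put a number on a sample this small, here is a minimal sketch. It assumes the usual logistic Elo model and a normal approximation for the error bar. The function names are purely illustrative and not taken from any rating tool.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an expected score, under the logistic model."""
    # Clamp so the formula stays finite for 0% or 100% scores.
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return 400.0 * math.log10(score / (1.0 - score))

def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, elo_low, elo_high) for a match result.

    Assumption: games are independent and the mean score is roughly normal,
    which is only a crude approximation for a 20-game sample.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - margin),
            elo_from_score(score + margin))

if __name__ == "__main__":
    elo, elo_low, elo_high = match_elo(wins=4, draws=12, losses=4)
    print(f"Elo diff: {elo:+.0f} (95% interval {elo_low:+.0f} to {elo_high:+.0f})")
```

With 4 wins, 12 draws and 4 losses the score is exactly 50%, so the point estimate is 0 Elo, but under these assumptions the 95% interval is roughly plus or minus 100 Elo. In other words, a 20-game match cannot separate "equal" from "clearly stronger".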
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: Strange Lc0 TCEC performance

Post by George Tsavdaris »

Laskos wrote: Tue Aug 14, 2018 11:27 pm As you can all see, the TCEC Div 3 performance of Lc0 with both nets, ID10520 and the Deus one, is severely below par compared with what we know from ordinary hardware, say a reasonable home CPU and GPU.

So, what is happening? TM, "too few games" and other things seem like lame excuses. Is there a serious bug affecting both Lc0 participants? Is there a hardware misconfiguration, invisible in NPS?
Or, something I have started to suspect: does Lc0 scale badly in this 50x time * hardware configuration?
They have finally found the issue: apparently the GPUs were severely overheating (the case was too small and had just 1 fan for 2 GTX 1080 Ti cards, 500 W with 1 fan in a small case!), which hit the nodes per second of the NNs hard (in some cases a 75% reduction from normal nodes per second), and with it their performance.
The problem was even more serious for Leela, as she had an aggressive pruning setting that made the situation worse. Both NNs were crippled by the throttling, and Leela even more so because of the aggressive pruning.

In the last 6 games, played since they fixed this (not that elegantly, since they had to reduce the speed of the GPUs to 70%, so the NNs get 70% of their true power, but at least there are no more throttling issues), Leela has scored 5.5/6! In the previous 19 games she had just 2 wins.

Too bad for TCEC that they didn't fix this sooner.
After his son's birth they asked him:
"Is it a boy or a girl?"
"YES!" he replied.....
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: Strange Lc0 TCEC performance

Post by JJJ »

TCEC should replay all the games played by the NNs then!
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: Strange Lc0 TCEC performance

Post by George Tsavdaris »

JJJ wrote: Thu Aug 16, 2018 10:15 pm TCEC should replay all the games played by the NNs then!
That would be a parody.
So better to stick with unfairness (to the 2 NNs) than a parody.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Strange Lc0 TCEC performance

Post by chrisw »

George Tsavdaris wrote: Thu Aug 16, 2018 10:10 pm
Laskos wrote: Tue Aug 14, 2018 11:27 pm As you can all see, the TCEC Div 3 performance of Lc0 with both nets, ID10520 and the Deus one, is severely below par compared with what we know from ordinary hardware, say a reasonable home CPU and GPU.

So, what is happening? TM, "too few games" and other things seem like lame excuses. Is there a serious bug affecting both Lc0 participants? Is there a hardware misconfiguration, invisible in NPS?
Or, something I have started to suspect: does Lc0 scale badly in this 50x time * hardware configuration?
They have finally found the issue: apparently the GPUs were severely overheating (the case was too small and had just 1 fan for 2 GTX 1080 Ti cards, 500 W with 1 fan in a small case!), which hit the nodes per second of the NNs hard (in some cases a 75% reduction from normal nodes per second), and with it their performance.
The problem was even more serious for Leela, as she had an aggressive pruning setting that made the situation worse. Both NNs were crippled by the throttling, and Leela even more so because of the aggressive pruning.

In the last 6 games, played since they fixed this (not that elegantly, since they had to reduce the speed of the GPUs to 70%, so the NNs get 70% of their true power, but at least there are no more throttling issues), Leela has scored 5.5/6! In the previous 19 games she had just 2 wins.

Too bad for TCEC that they didn't fix this sooner.
Well, it's the latest "we found the issue".
mirek
Posts: 52
Joined: Sat Mar 24, 2018 4:18 pm

Re: Strange Lc0 TCEC performance

Post by mirek »

George Tsavdaris wrote: Thu Aug 16, 2018 10:10 pm The problem was even more serious for Leela, as she had an aggressive pruning setting that made the situation worse.
Yes, the TCEC chat is buzzing with details about the HW issue: it seems there is a part of the code that determines whether the current best move (the one with the most visits) can be overtaken by some other move within the allocated time, and if it can't, the current best move is simply played.

The problem is that the nps estimated while the GPU was throttled was distorting the estimate of how many visits the second-best move could still get within the allocated time. As a result, shallowly explored best moves were played, which produced the observed "fast" play, a style quite untypical of Leela, and overall weak play. That's why swings between 100% speed and, e.g., 20% speed were much worse than a constant 60% speed.
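
To make the mechanism concrete, here is a rough sketch of such an early-stop check. This is not Lc0's actual code: the function name, parameters and numbers are all made up for illustration, and it assumes the simplest possible rule (every remaining visit goes to the runner-up).

```python
def can_stop_early(best_visits, second_visits, est_nps, time_left_s):
    """Hypothetical early-stop rule: return True if the runner-up cannot
    catch the leader even if it received every remaining visit."""
    remaining_visits = est_nps * time_left_s
    return second_visits + remaining_visits < best_visits

# Illustrative numbers only: visit counts built up during a full-speed burst,
# with 2 seconds of the allocated time still left on this move.
best, second, time_left = 8000, 3000, 2.0
full_speed_nps = 5000

for label, factor in [("100%", 1.0), ("constant 60%", 0.6), ("throttled 20%", 0.2)]:
    est_nps = full_speed_nps * factor
    decision = "play now" if can_stop_early(best, second, est_nps, time_left) else "keep searching"
    print(f"{label:>14}: est. {est_nps:.0f} nps -> {decision}")
```

In this toy setup only the 20% estimate triggers the early move: the lead was built up at full speed, but the nps sampled during the throttled phase says the runner-up can no longer catch up, so the search is cut short.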
George Tsavdaris wrote: Thu Aug 16, 2018 10:25 pm
JJJ wrote: Thu Aug 16, 2018 10:15 pm TCEC should replay all the games played by the NNs then!
That would be a parody.
So better to stick with unfairness (to the 2 NNs) than a parody.
But it was stated upfront that TCEC season 13 is experimental, so maybe parody is a new thing TCEC could try :D
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Strange Lc0 TCEC performance

Post by Milos »

Werewolf wrote: Thu Aug 16, 2018 7:53 pm Another small sample to toss into the air:

Lc0 Big Net, August 8th release, Nvidia 1060

Houdini 6, 8 physical cores, Broadwell 6950X @ 4.2 GHz

Noomen short lines, 20 game match @ G5 min + 30 sec increment/move (designed to negate issues with Leela's time management)

4 wins each, 12 draws.

If anything I'd say Lc0 was outgunned in the hardware department here...
Another small sample. TC 10'+5''. Lc0 with net 10780 on an OC'ed 1060 vs. Droidfish (SF9) on a Samsung Galaxy S6: a 3-year-old smartphone and an old version of the engine vs. a reasonably strong GPU and the newest, strongest Lc0. Moves were entered manually (with sufficient move overhead for both engines). Score: 2 wins for Droidfish, 2 draws, 0 wins for Lc0!!!