Strange Lc0 TCEC performance

Milos · Post by **Milos** » Wed Aug 15, 2018 10:20 am

Werner wrote: ↑Wed Aug 15, 2018 9:00 am I wonder why the 15/192 network using CPUs produces much better results here than the new 20/256 network, even when I use more CPUs.
Is it possible the new network is only stronger using a GPU?

The strongest 15x192 network from the main server is around 100 Elo weaker than the strongest 20x256 net from the test server, and that is not from self-play but verified vs SF.
First, the DeusX net is not stronger than Lc0 (that is just the randomness of a small sample in Div 3), and second, it could be stronger than the 15x192 net from the main server due to supervised learning.
Another possible issue is that 15x192 nets scale better with powerful hardware than 20x256 nets, but this needs to be proven first.

Milos · Post by **Milos** » Wed Aug 15, 2018 10:24 am

Uri Blass wrote: ↑Wed Aug 15, 2018 10:19 am I read that Lc0 changed pruning, and I wonder whether your tests used the same value, which I read in the TCEC chat to be 0.604.

They almost certainly did this because they are getting unrealistically high nps with 2x 1080 Ti compared with, for example, a 1060.
They get 6x higher nps, whereas a normal 1080 Ti vs 1060 comparison gives slightly over 2x, and multiplexing 2 cudnn backends in Lc0 brings only 30-40% more nps.
Adam ran quite a few experiments with changing MCTS parameters, and he would usually get higher nps and even slightly better performance at very fast TCs but much worse performance at longer TCs, so that might also be the case here.
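The mismatch between the reported and the expected nps can be put into numbers with a small sketch. It assumes that the single-GPU ratio and the multiplexing gain simply multiply, and it takes "slightly over 2x" as exactly 2.0; the function name is made up for illustration.

```python
def expected_nps_ratio(single_gpu_ratio, multiplex_gain):
    """Expected nps ratio vs the baseline GPU.

    single_gpu_ratio: nps of one fast GPU relative to the baseline (about 2x here).
    multiplex_gain:   extra nps from multiplexing a second backend (0.30 to 0.40 here).
    Assumption: the two effects combine multiplicatively.
    """
    return single_gpu_ratio * (1.0 + multiplex_gain)


low = expected_nps_ratio(2.0, 0.30)
high = expected_nps_ratio(2.0, 0.40)
observed = 6.0  # ratio reported for the TCEC setup vs a 1060

print(f"expected: {low:.1f}x to {high:.1f}x, observed: {observed:.1f}x")
print(f"unexplained factor: {observed / high:.2f}x to {observed / low:.2f}x")
```

Under these assumptions the reported nps is roughly twice what the hardware alone would explain, which is why a parameter change is the natural suspect.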

Laskos · Post by **Laskos** » Wed Aug 15, 2018 10:27 am

jdart wrote: ↑Wed Aug 15, 2018 5:18 am I don't think you can make assumptions (w/o testing) about how Arasan or any engine scales from 4 cores to 43 cores.

Plus, Arasan is maybe not a typical engine because it is getting about half the NPS of most of the other a/b searchers in the 43 core setup (I am not sure why).

--Jon

First, the subpar results are against all engines in Div 4 and Div 3. I just picked Arasan 21 because it was one of the engines I used earlier in Lc0 gauntlets and I like it. Second, the NPS of Arasan 21 on my CPU is about 1.8 times lower than that of SF dev even on one core, so it just seems to be a slower engine. I assumed an SMP efficiency (effective speed-up) of 60-70% for Arasan 21 on 43 cores, which is very high. Say, 5 years ago, with the old YBW implementations, it was at most 35%-40% on 43 cores. With the newest and best SMP implementations (say, those of Komodo and SF), it can reach at most 60%-70% on 43 cores, that is, about a 26x-30x effective speed-up, and I took that. If SMP scaling in Arasan is lower, it only strengthens my case. Again, the subpar performance of Lc0 is across regular engines in Div 4 and Div 3, so I doubt it is a quirk of not taking their scaling into account. I have more concerns about the GPU scaling of Lc0 itself.
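The 26x-30x figure is just cores times efficiency. A minimal sketch of that arithmetic, using only the percentages quoted above (the function name is illustrative):

```python
def effective_speedup(cores, smp_efficiency):
    """Effective speed-up over one core, given SMP efficiency as a fraction."""
    return cores * smp_efficiency


CORES = 43
cases = [
    ("old YBW, low end", 0.35),
    ("old YBW, high end", 0.40),
    ("modern SMP, low end", 0.60),
    ("modern SMP, high end", 0.70),
]
for label, efficiency in cases:
    speedup = effective_speedup(CORES, efficiency)
    print(f"{label}: {speedup:.1f}x on {CORES} cores")
```

The two modern cases round to 26x and 30x, matching the range used in the estimate.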

Laskos · Post by **Laskos** » Wed Aug 15, 2018 10:33 am

corres wrote: ↑Wed Aug 15, 2018 8:08 am
Laskos wrote: ↑Wed Aug 15, 2018 12:35 am
Both Lc0 (testnet and Deus) in both Div 4 and Div 3 perform consistently below expectations (although people cheered their sore promotion from Div 4). "Too few games" in some total of 80+ games (yes, different nets whatever, matters less) and under-performance of some 200 Elo points is a marginal argument. ID10520 was never weak in my normal tests. Maybe different nets exhibit different scaling and other weird behavior; I don't have much knowledge of that (I only know that 6x64 nets scale worse than current 20x256 nets).
Maybe the doubled GTX 1080 Ti does not give the expected power on the TCEC hardware, or Leela (and so DeusX) cannot use more than one GPU effectively.
I think the second case may be true.
As we know from CMcanavessy, CPU power has no effect on the chess strength of Leela and its derivative DeusX.

Yes, it is a possibility. First, I don't know how they get about 6x the speed of my GTX 1060; I would have expected more like 4x. Maybe that 2 x GPU set-up is working no better than 1 x GPU, but I have no means to check that. The NPS shown in TCEC is certainly impressive.

Laskos · Post by **Laskos** » Wed Aug 15, 2018 10:36 am

Pio wrote: ↑Wed Aug 15, 2018 3:13 am
Hi Kai, I really like your posts. I think one big problem for the neural networks could be that they are forced to play openings they do not like. I really do not like that they do not play from the start position, and I also think that learning should be allowed between the games. It could be that the neural networks are highly optimised to play from the opening position. If during self-play they would never reach the openings given in TCEC, because they could avoid those positions, why should they play them well? If my hypothesis is right, Lc0 should be affected much more than DeusX.

I am also using a 3-mover book, but my results at 3' + 1'', mimicking TCEC conditions at a 50 times smaller time * hardware budget, are MUCH better. I don't think it's an issue.

Laskos · Post by **Laskos** » Wed Aug 15, 2018 10:47 am

Branko Radovanovic wrote: ↑Wed Aug 15, 2018 1:38 am
Laskos wrote: ↑Wed Aug 15, 2018 12:35 am Both Lc0 (testnet and Deus) in both Div 4 and Div 3 perform consistently below expectations (although people cheered their sore promotion from Div 4). "Too few games" in some total of 80+ games (yes, different nets whatever, matters less) and under-performance of some 200 Elo points is a marginal argument. ID10520 was never weak in my normal tests. Maybe different nets exhibit different scaling and other weird behavior; I don't have much knowledge of that (I only know that 6x64 nets scale worse than current 20x256 nets).
Just did a quick calculation - if I'm not too much off, Lc0's performance in Div4 was at an upper-half-of-Div1 level. While I wasn't surprised by this, I must say that - having run no tests of my own - I have no idea whether that result is "normal" or not, so I can't really comment on that.

My argument was, rather, that the Div3-Div4 discrepancy by itself did have a precedent, and whatever caused it for Ethereal, could have caused it for Leela too. If anything, Ethereal's case should be easier to explain (straight A-B stuff, no NN mumbo jumbo). If Ethereal was not hurt by "too few games" (and possibly accidentally bolstered in Div4 by the same circumstance), what was it then?

Div 4 and Div 3 results are consistent and weak. Lc0 scored about 70% in those hilarious conditions, with almost half of the remaining engines not even working properly with 43 cores and hash usage. IvanHoe was running on 1 core, and if one extrapolates its performance in Div 4 to 43 cores, it would have ended 1st-2nd, so one Leela would have been out even in Div 4.
It's not only Ethereal's performance, it's the general performance of the combined Leelas in Div 4 and Div 3: about 90 games with very meager results. Maybe 150-200 Elo points weaker than what I get mimicking TCEC conditions.
About Ethereal, it does seem to be a strong engine, maybe Div 1, close to Andscacs level (which IIRC was Div P).

megamau · Post by **megamau** » Wed Aug 15, 2018 10:56 am

I think the only scientific way to test is to retrieve the node counts from TCEC (which should be available in the PGN) and reproduce them (with a slower time control and time odds if necessary to match the hardware).
Then run a gauntlet (let's say 20x) with Leela net 520, the changed parameters, and the opponents in Division 3.

There could be many reasons for the underperformance:

* Simple statistical deviation on a small sample size
* Bad scaling (in terms of NPS) of the TCEC GPU server
* Bad scaling (in terms of Elo) of Leela at very high node counts
* "Style" issues (speculative sacrifices being less strong at high node counts)
* Effect of parameter changes
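The time-odds part of the proposal above can be sketched as a simple scaling rule. This assumes nodes searched grow linearly with time at a constant nps, which is only an approximation; the function names and the nps and time-control numbers in the example are hypothetical placeholders, not TCEC measurements.

```python
def time_odds_factor(reference_nps, local_nps):
    """Factor by which local time must be multiplied to match reference node counts.

    Assumption: nodes = nps * time, with nps roughly constant during a game.
    """
    if reference_nps <= 0 or local_nps <= 0:
        raise ValueError("nps values must be positive")
    return reference_nps / local_nps


def scaled_time_control(base_seconds, increment_seconds, factor):
    """Scale both the base time and the increment by the same factor."""
    return base_seconds * factor, increment_seconds * factor


# Hypothetical numbers for illustration only.
factor = time_odds_factor(reference_nps=30000, local_nps=5000)
base, inc = scaled_time_control(base_seconds=180, increment_seconds=1, factor=factor)
print(f"time odds factor: {factor:.1f}x -> local TC {base:.0f}s + {inc:.0f}s")
```

Each engine would get its own factor, since the GPU engine and the CPU engines lose different amounts of speed on weaker local hardware.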

Laskos · Post by **Laskos** » Wed Aug 15, 2018 10:58 am

Uri Blass wrote: ↑Wed Aug 15, 2018 10:19 am
I read that Lc0 changed pruning, and I wonder whether your tests used the same value, which I read in the TCEC chat to be 0.604.

I don't even know what this parameter is. I used Lc0 v16, and I think they use a newer version. It's possible they introduced a wrong pruning configuration, or the engine is outright buggy. Did they check it thoroughly before sending it to TCEC? Also, pruning settings might change scaling too, so they should have been very careful about behavior at high node counts (TCEC time * hardware).

Laskos · Post by **Laskos** » Wed Aug 15, 2018 11:03 am

chrisw wrote: ↑Wed Aug 15, 2018 12:33 am
Laskos wrote: ↑Wed Aug 15, 2018 12:27 am
chrisw wrote: ↑Wed Aug 15, 2018 12:20 am Count the LCZero actual results, 3 results, 15 games

From winner to loser order, 15 games, actual results

9
7
6
7
3 ** LC0
10
7
11

Outlier, neither losing, nor winning, is LZ0. Why that?
Yes, weird, more so because Lc0 is known to have a LOW draw rate even at a fairly strong level. Deus, OTOH, has 7. I do not know what this is.
Dull plodder, but that's not how it's meant to be

Look at the result of this "dull plodder" under mimicked TCEC conditions: 10 times shorter TC, 5 times weaker hardware (both CPU- and GPU-wise):

Score of lc0_v16 10520 vs Arasan 21: +26  -2  =12 [0.800]
Elo difference: 240.82 +/- 104.12

40 of 40 games finished.
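For reference, the Elo figure in that output follows from the score fraction through the usual logistic formula. The sketch below reproduces it; the error margin uses a plain normal approximation with a 95% interval, which is an assumption about how such margins are typically computed and lands close to, but not exactly on, the reported value.

```python
import math


def score_fraction(wins, losses, draws):
    """Points scored divided by games played (a draw counts as half a point)."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_from_score(score):
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin_95(wins, losses, draws):
    """Approximate 95% error margin in Elo (normal approximation, an assumption)."""
    games = wins + losses + draws
    mu = score_fraction(wins, losses, draws)
    variance = (
        wins * (1.0 - mu) ** 2 + losses * (0.0 - mu) ** 2 + draws * (0.5 - mu) ** 2
    ) / games
    half_width = 1.959964 * math.sqrt(variance / games)
    return (elo_from_score(mu + half_width) - elo_from_score(mu - half_width)) / 2.0


wins, losses, draws = 26, 2, 12
print(f"score: {score_fraction(wins, losses, draws):.3f}")
print(f"Elo difference: {elo_from_score(score_fraction(wins, losses, draws)):.2f}")
print(f"approx. 95% margin: {elo_margin_95(wins, losses, draws):.1f}")
```

With only 40 games the margin is around 100 Elo either way, so the result shows a clear advantage but not a precise rating gap.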

Laskos · Post by **Laskos** » Wed Aug 15, 2018 11:07 am

Milos wrote: ↑Wed Aug 15, 2018 12:40 am And already after 4'+2.4'' TC no more scaling advantage. In all cases 200 games were played.

That's an interesting observation, as I saw the same (with fewer games) up to 10' + 10'', but thought it was a fluke. Now corroborated, it might indicate that scaling does not hold up at LTC and on big hardware.
