Strange Lc0 TCEC performance

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Strange Lc0 TCEC performance

Post by Laskos »

As you can all see, the TCEC Div 3 performance of Lc0 with both nets, ID10520 and the Deus one, is severely below par compared with what we know from ordinary hardware, say a reasonable home CPU and GPU.

 N Engine            Rtng  Pts Gm    SB X  Elo Perf Et  Pe  Ar  De  Lc  Ne  Ha  Bo 

 1 Ethereal 10.81    3176 12.0 15 79.25 0 +127 80.0 ··· ==  1=  1== 1=  11  11  11 
 2 Pedone 1.8        3104  9.0 15 58.25 0  +80 60.0 ==  ··· ==  10  ==  0=  11  =11
 3 Arasan TCEC13     3142  8.5 15 56.75 0  +37 56.7 0=  ==  ··· 01  ==  1=1 ==  1= 
 4 DeusX 1.0         3200  8.0 15 56.75 0  -20 53.3 0== 01  10  ··· ==  =1  ==  1= 
 5 lc0 16.10520      3219  7.0 15 48.25 0  -65 46.7 0=  ==  ==  ==  ··· =0  === 1= 
 6 Nemorino 5.01     3104  6.5 15 44.25 2   +4 43.3 00  1=  0=0 =0  =1  ··· 1=  01 
 7 Hannibal 20180806 3193  6.0 15 36.25 0  -77 40.0 00  00  ==  ==  === 0=  ··· 11 
 8 Bobcat 8          3072  3.0 15 22.75 0  -86 20.0 00  =00 0=  0=  0=  10  00  ···
 
One might say that TCEC conditions are simply hard to reproduce, that the GPUs are not working properly, that bugs were introduced in Div 3, and so on. Few of these arguments stand. The performance in Div 4 was bad too, considering that almost half of the engines there were not working properly on 43 cores and that the engines were generally pretty weak (aside from IvanHoe, which was running... on one core). Few took the Div 4 results at face value; people were already thinking of an almost certain promotion of Lc0 to Div 2 and Div 1, although it really had trouble already in Div 4. By now, it is likely that Lc0 ID10520 will not be promoted to Div 2.
I also checked the Lc0 ID10520 time management in TCEC: is it really so terrible? It might not be optimal, but it is not so terrible as to completely wreck the performance. I guess it might cost some 30-40 Elo points at most compared to a better TM, that's all. Are there other bugs? In both ID10520 and Deus?
The remaining thing is the TCEC conditions, which are really hard to reproduce (for me at least, impossible). So I took another approach: match the CPU side against the GPU side in the same proportion as in TCEC, based on the displayed NPS and an assumed SMP scaling.
I took the Arasan 21 chess engine, which should be very close in strength to Arasan TCEC, and which I was already using in my gauntlets against AB engines. On my 4 cores, its NPS is about 8 times lower than the TCEC NPS. SMP efficiency on 43 cores is, even with the best SMP implementation, no higher than 60%-70% (which is very high, by the way, for 43 cores). So, all in all, the "effective speed" (inverse of time-to-strength) of the TCEC CPU for Arasan 21 is about 5.0-5.5 times that of my CPU. For the GPU part, NPS seems to be about 6 times higher in TCEC than on my GPU, giving an "effective speed" about 5 times higher (correct me on that one if you know better how to get from the NPS speed-up to the effective speed-up with 2 GPUs). All in all, I can mimic TCEC conditions to some degree by running Lc0 on my GPU and Arasan 21 on 4 cores (maybe 5 cores would be even better, but it's not that relevant).
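
To make the CPU-side arithmetic explicit, here is a minimal sketch. It assumes the effective speed-up is simply the raw NPS ratio multiplied by the SMP efficiency, which is a simplification for illustration and not a measured model; the function name is hypothetical. With the figures above it gives a range of roughly 4.8 to 5.6, which brackets the quoted 5.0-5.5.

```python
def effective_speedup(nps_ratio, smp_efficiency):
    """Assumed model: effective speed-up = raw NPS ratio * SMP efficiency."""
    return nps_ratio * smp_efficiency


# Figures taken from the post: TCEC NPS is about 8x the 4-core NPS,
# and SMP efficiency on 43 cores is at best 60%-70%.
cpu_low = effective_speedup(8.0, 0.60)
cpu_high = effective_speedup(8.0, 0.70)

# GPU side: the post estimates about 5x effective speed-up (from ~6x NPS).
gpu_effective = 5.0

print(f"CPU effective speed-up: {cpu_low:.1f}x to {cpu_high:.1f}x")
print(f"CPU/GPU balance vs TCEC: {cpu_low / gpu_effective:.2f} to {cpu_high / gpu_effective:.2f}")
```

A CPU/GPU balance close to 1.0 means the home setup keeps roughly the same proportion between the two sides as TCEC does, only scaled down.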

I have chosen time control to be 10 times faster than in TCEC: 3m + 1s.
Partial result:

Score of lc0_v16 10520 vs Arasan 21: +13  -2  =7 [0.750]
Elo difference: 190.85 +/- 139.14

22 of 40 games finished.
As I fully expected, Lc0 ID10520 destroys Arasan 21 in mimicked TCEC conditions, but on about 5 times slower hardware and at 10 times faster games (a total hardware * time factor of 50). I will let the match run to 40 games, but I have no doubts about the shape of the result. Sure, in TCEC conditions the draw rate is higher and the Elo difference compresses. Nevertheless, judging from this, Lc0 ID10520 should be at least at the level of Ethereal in Div 3.
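
For reference, the Elo figure in the partial result can be recomputed from the win/loss/draw counts with the usual logistic formula. The sketch below is one common way to get the difference and a 95% error margin; the margin formula is an assumption about how such tools typically work, so it may not reproduce the reported +/- 139.14 to the last digit. Function names are hypothetical.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, losses, draws, z=1.959963984540054):
    """Return (Elo difference, approximate 95% error margin).

    Assumes independent games and a normal approximation for the mean
    score. Only valid while mean +/- z * standard error stays inside (0, 1).
    """
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    # Per-game variance of the score around the mean.
    var = (wins * (1.0 - mu) ** 2
           + losses * (0.0 - mu) ** 2
           + draws * (0.5 - mu) ** 2) / n
    se = math.sqrt(var / n)
    low = elo_diff(mu - z * se)
    high = elo_diff(mu + z * se)
    return elo_diff(mu), (high - low) / 2.0


diff, margin = elo_with_margin(wins=13, losses=2, draws=7)
print(f"Elo difference: {diff:.2f} +/- {margin:.2f}")
```

With only 22 games the margin is wide, but even the lower end of the interval stays clearly above zero.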

So, what is happening? TM, "too few games" and other things seem like lame excuses. Is there a serious bug affecting both Lc0 participants? Is there a hardware misconfiguration, invisible in NPS?
Or, something I have started to suspect: does Lc0 scale badly in this 50x time * hardware configuration?
Ratosh
Posts: 77
Joined: Mon Apr 16, 2018 6:56 pm

Re: Strange Lc0 TCEC performance

Post by Ratosh »

Maybe AB engines have better scaling.
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Strange Lc0 TCEC performance

Post by zullil »

Laskos wrote: Tue Aug 14, 2018 11:27 pm Nevertheless, from this, Lc0 ID10520 should be at least the level of Ethereal in DIv 3.

So, what happens? TM, "too few games" and other things seem lame excuses. Is there a serious a bug affecting both Lc0 participants? Is there a hardware misconfiguration, invisible in NPS?
Or something I started to suspect: Lc0 scales badly in this 50x time * hardware configuration?
I haven't followed Lc0 development at all.

Has the engine previously competed against traditional engines running on top end hardware, at time controls like those in TCEC?

I guess I'm asking, is there evidence that something is suddenly wrong with Lc0? Or maybe the engine is not nearly as strong as some believe, at least under conditions like those at TCEC?
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Strange Lc0 TCEC performance

Post by Laskos »

Ratosh wrote: Tue Aug 14, 2018 11:50 pm Maybe AB engines have better scaling.
zullil wrote: Tue Aug 14, 2018 11:52 pm
Laskos wrote: Tue Aug 14, 2018 11:27 pm Nevertheless, from this, Lc0 ID10520 should be at least the level of Ethereal in DIv 3.

So, what happens? TM, "too few games" and other things seem lame excuses. Is there a serious a bug affecting both Lc0 participants? Is there a hardware misconfiguration, invisible in NPS?
Or something I started to suspect: Lc0 scales badly in this 50x time * hardware configuration?
I haven't followed Lc0 development at all.

Has the engine previously competed against traditional engines running on top end hardware, at time controls like those in TCEC?

I guess I'm asking, is there evidence that something is suddenly wrong with Lc0? Or maybe the engine is not nearly as strong as some believe, at least under conditions like those at TCEC?
It is generally known, and I know from personal experience, that on reasonable home CPU/GPU hardware Lc0 scales better than regular AB engines from, say, 0.2s/move to 4s/move, improving by maybe 150 Elo points or so. What happens with much bigger hardware and a much longer time control, I do not know, and I am not sure anybody does (there is a community dealing with Lc0; I am not aware whether they have any clues). But aside from some terrible hardware misconfiguration, options misconfiguration, time management, etc. in TCEC, for which I see no evidence in the TCEC files and numbers, there seem to be very few other things to consider, one of the main ones being that at really long TC and on big hardware Lc0 scales badly, in contrast to what it does in normal conditions.
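
To put the scaling figures on a common footing, it can help to express them per doubling of time. The sketch below assumes the "150 Elo or so" is read as Lc0's gain relative to AB engines over the 0.2s to 4s range, and that the gain is spread evenly over the doublings; both are assumptions for illustration only, and the function name is hypothetical.

```python
import math


def doublings(factor):
    """Number of time (or hardware) doublings contained in a speed factor."""
    return math.log2(factor)


# 0.2 s/move -> 4 s/move is a factor of 20 in thinking time.
home_doublings = doublings(4.0 / 0.2)
elo_per_doubling = 150.0 / home_doublings

# The mimicked match is about 50x away from TCEC in time * hardware.
tcec_gap_doublings = doublings(50.0)

print(f"Home range: {home_doublings:.2f} doublings, "
      f"about {elo_per_doubling:.0f} Elo relative gain per doubling")
print(f"Gap to TCEC conditions: {tcec_gap_doublings:.2f} doublings")
```

So the jump to TCEC conditions is a bit larger, in doublings, than the whole home range over which the better scaling was observed.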
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Strange Lc0 TCEC performance

Post by chrisw »

Count the decisive results (wins and losses) for LCZero: 3 in 15 games.

Decisive results per engine, in standings order from first to last, 15 games each:

9
7
6
7
3 ** LC0
10
7
11

The outlier, neither losing nor winning, is Lc0. Why is that?
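
The counts above can be reproduced from the Div 3 crosstable in the first post. In this small sketch the result strings are copied from that table with the self-pairing cell left out; the helper names are hypothetical.

```python
# Head-to-head results per engine, copied from the crosstable
# ('1' = win, '0' = loss, '=' = draw), in standings order.
ROWS = {
    "Ethereal 10.81":    "== 1= 1== 1= 11 11 11",
    "Pedone 1.8":        "== == 10 == 0= 11 =11",
    "Arasan TCEC13":     "0= == 01 == 1=1 == 1=",
    "DeusX 1.0":         "0== 01 10 == =1 == 1=",
    "lc0 16.10520":      "0= == == == =0 === 1=",
    "Nemorino 5.01":     "00 1= 0=0 =0 =1 1= 01",
    "Hannibal 20180806": "00 00 == == === 0= 11",
    "Bobcat 8":          "00 =00 0= 0= 0= 10 00",
}


def decisive_games(cells):
    """Count wins plus losses in a row of result cells."""
    return sum(ch in "10" for ch in cells)


def points(cells):
    """Total points: 1 per win, 0.5 per draw."""
    return cells.count("1") + 0.5 * cells.count("=")


decisive = {name: decisive_games(cells) for name, cells in ROWS.items()}
for name, count in decisive.items():
    print(f"{name:18s} decisive: {count:2d}  points: {points(ROWS[name]):4.1f}")
```

The points column is only there as a sanity check that the rows were copied correctly against the Pts column of the table.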
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Strange Lc0 TCEC performance

Post by chrisw »

Laskos wrote: Wed Aug 15, 2018 12:11 am
Ratosh wrote: Tue Aug 14, 2018 11:50 pm Maybe AB engines have better scaling.
zullil wrote: Tue Aug 14, 2018 11:52 pm
Laskos wrote: Tue Aug 14, 2018 11:27 pm Nevertheless, from this, Lc0 ID10520 should be at least the level of Ethereal in DIv 3.

So, what happens? TM, "too few games" and other things seem lame excuses. Is there a serious a bug affecting both Lc0 participants? Is there a hardware misconfiguration, invisible in NPS?
Or something I started to suspect: Lc0 scales badly in this 50x time * hardware configuration?
I haven't followed Lc0 development at all.

Has the engine previously competed against traditional engines running on top end hardware, at time controls like those in TCEC?

I guess I'm asking, is there evidence that something is suddenly wrong with Lc0? Or maybe the engine is not nearly as strong as some believe, at least under conditions like those at TCEC?
It is generally known, and I myself know from personal experience that, for example on home CPU/GPU reasonable hardware, Lc0 scales better than regular AB engines, from say 0.2s/move to 4s/move, improving by maybe 150 Elo points or so. What happens with very much larger hardware and very much increased time control I do not know if some people know (there is a community dealing with Lc0, I am not aware if they have some clues). But aside some terrible hardware misconfiguration, options misconfiguration, time management, etc. in TCEC, for which I do not see evidence in TCEC files and numbers, there seem to be very few other things to consider, one of the main being that at really long TC and big hardware, Lc0 scales badly, in contrast to what it does in normal conditions.
What are the opening conditions at TCEC? I read that openings are 8 deep and played back to back, but is there a fixed set of selected opening sequences that all engines play, or are they just randomly selected (but back to back)?
Branko Radovanovic
Posts: 89
Joined: Sat Sep 13, 2014 4:12 pm
Location: Zagreb, Croatia
Full name: Branko Radovanović

Re: Strange Lc0 TCEC performance

Post by Branko Radovanovic »

Not unprecedented; it happens to A-B engines too. In S12, Ethereal got 23.5 points in Div4 (counting Scorpio's games, but still!) - that was outrageous, something like mid-DivP performance if I'm not mistaken. Yet, as many will remember, Div3 was another story entirely, in which Ethereal was eclipsed by no fewer than three other engines, none of which were anywhere near DivP level.

There are three explanations I can think of:
  1. Ethereal 9.64 was a regression compared to 9.60.
  2. Div4 engines are really that much weaker.
  3. Too few games, as already suggested. Could have both overperformed in Div4 and underperformed in Div3 - not that unlikely.
Of course, could have also been any combination of the above, but now when I think of it, #3 appears most likely, and could have happened to Leela too (although, in her particular case, we may also have a bit of #1 thrown in).
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Strange Lc0 TCEC performance

Post by Laskos »

chrisw wrote: Wed Aug 15, 2018 12:20 am Count the LCZero actual results, 3 results, 15 games

From winner to loser order, 15 games, actual results

9
7
6
7
3 ** LC0
10
7
11

Outlier, neither losing, nor winning, is LZ0. Why that?
Yes, weird, all the more so because Lc0 is known to have a LOW draw rate even at a fairly strong level. Deus, OTOH, has 7. I do not know what this is.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Strange Lc0 TCEC performance

Post by chrisw »

Laskos wrote: Wed Aug 15, 2018 12:27 am
chrisw wrote: Wed Aug 15, 2018 12:20 am Count the LCZero actual results, 3 results, 15 games

From winner to loser order, 15 games, actual results

9
7
6
7
3 ** LC0
10
7
11

Outlier, neither losing, nor winning, is LZ0. Why that?
Yes, weird, more so because Lc0 is known to have LOW draw rate even at fairly strong level. Deus, OTOH has 7. I do not know what is this.
Dull plodder, but that's not how it's meant to be
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Strange Lc0 TCEC performance

Post by Laskos »

Branko Radovanovic wrote: Wed Aug 15, 2018 12:26 am Not unprecedented, happens to A-B engines too. In S12, Ethereal got 23.5 points in Div4 (counting Scorpio's games, but still!) - that was outrageous, something like mid-DivP performance if I'm not mistaken. Yet, as many will remember, Div3 was another story entirely, in which Ethereal was eclipsed by no less than three other engines, none of which were anywhere near DivP level.

There are three explanations I can think of:
  1. Ethereal 9.64 was a regression compared to 9.60.
  2. Div4 engines are really that much weaker.
  3. Too few games, as already suggested. Could have both overperformed in Div3 and underperformed in Div4 - not that unlikely.
Of course, could have also been any combination of the above, but now when I think of it, #3 appears most likely, and could have happened to Leela too (although, in her particular case, we may also have a bit of #1 thrown in).
Both Lc0 entries (testnet and Deus) have performed consistently below expectations in both Div 4 and Div 3 (although people cheered their sore promotion from Div 4). "Too few games" is a marginal argument for a total of some 80+ games (yes, different nets, whatever, that matters less) and an under-performance of some 200 Elo points. ID10520 was never weak in my normal tests. Maybe different nets exhibit different scaling and other weird behavior; I do not have much knowledge of that (I only know that 6x64 nets scale worse than the current 20x256 nets).
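
A rough way to see why "too few games" looks marginal here: compare the score shortfall implied by 200 Elo with the largest statistical error that 80 games can produce. The sketch assumes independent games, an expected score near 50% as the baseline, and the worst-case per-game standard deviation of 0.5 (draws only make it smaller); these are assumptions for illustration, and the function names are hypothetical.

```python
import math


def expected_score(elo_diff):
    """Expected score fraction for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def max_standard_error(games):
    """Upper bound on the standard error of the mean score over `games` games."""
    return 0.5 / math.sqrt(games)


games = 80
elo_gap = 200.0

# Score shortfall if an engine expected to be 200 Elo ahead only scores 50%.
shortfall = expected_score(elo_gap) - 0.5
se = max_standard_error(games)
sigmas = shortfall / se

print(f"Shortfall: {shortfall:.3f}, max standard error: {se:.3f}, "
      f"at least {sigmas:.1f} standard errors")
```

Even with the most pessimistic variance, a 200 Elo shortfall over 80 games is several standard errors away from chance, as long as the games can be pooled the way the post does.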