Some words on Leela Go Zero

Laskos · Post by **Laskos** » Mon Oct 28, 2019 10:12 am

First, the strength progress:

I checked a Leela Go Zero net from May and a very recent one from the end of October. In self rating (the project's own Elo scale) they differ by 700 Elo points. Both played against elfv2, the best public Facebook network and, until recently, probably the strongest net around at fixed time. The time control was 600 6 1 or, in our "language", roughly 600'' + 6''.

GPU: RTX 2070

May network versus elfv2:
10:10

October network versus elfv2:
12:8

So, the "700 Elo points" in self ratings is hugely inflated, and the progress is probably no more than 100 Elo points in 5 months, quite modest for Go. They use some SPRT gating and no anchors, so the inflation is expected, but maybe not to such a degree.
But in any case, it seems that by now the Leela Zero weights are the strongest publicly available weights on a good GPU. Being a big 40x256 net as well, they probably scale better to longer time controls and stronger hardware.
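
For readers who want to redo the arithmetic, here is a minimal sketch of how a match score maps to an Elo difference, assuming the usual logistic Elo model and no draws. The function names are made up for this illustration, and the sketch ignores the large statistical error of a 20-game match.

```python
import math

def elo_from_score(wins: int, losses: int) -> float:
    """Elo difference implied by a win/loss score under the logistic Elo model."""
    score = wins / (wins + losses)
    return 400.0 * math.log10(score / (1.0 - score))

def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side for a given Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Match results quoted in the post (each net against elfv2).
may_elo = elo_from_score(10, 10)
october_elo = elo_from_score(12, 8)

print(f"May net vs elfv2:     {may_elo:+.1f} Elo")
print(f"October net vs elfv2: {october_elo:+.1f} Elo")
print(f"Expected score at a true 700 Elo gap: {expected_score(700):.3f}")
```

Taken at face value, 10:10 and 12:8 put the two nets about 70 Elo apart against this opponent, while a real 700 Elo gap would correspond to an expected score of roughly 98%.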

Two issues about the games themselves. I used Sabaki as the interface. The engine was always the latest Leela Zero 0.17. I also have the KataGo engine, but it requires its own weights, and that engine-plus-weights combination is a bit weaker (not by much).

1/ About 60%+ of the games are decided by wild swings in late midgames and endgames. Here are the eval graphs in Sabaki for two typical games (white line is the alternating eval of both engines):

I am not sure whether this reflects an instability of Go itself towards the end of the game at a high level of play or, more likely, relatively poor handling of these late phases, compared to openings, by the "Zero" approach. My naive proposal would be either to depart from "Zero" and hand-craft a patch for endgames, or to build a specific late-game network that helps the main net.

2/ Here something ridiculous happened (in just 20 games played). White is the famed Facebook elfv2 network:

A ladder patch would be very easy to hand-craft, and here a departure from pure "Zero" approach is mandatory. I mean, these things usually play much above 9-dan pros, but here ridiculously below 5-kyu patzers.

Laskos · Post by **Laskos** » Mon Oct 28, 2019 7:48 pm

The first of the two long time control (LTC) games, at 6000'' + 60''; Leela is Black, Facebook elfv2 is White. Leela won, but the instability is even more apparent. The evals stayed fairly balanced and quite stable for almost the whole game, until a large sudden swing occurred in the endgame. I am not sure what this is. This is Go at a very high level: either Go at this level is volatile and gets decided in the endgame, or the "Zero" approach handles endgames badly.

jhellis3 · Post by **jhellis3** » Mon Oct 28, 2019 9:02 pm

What you are experiencing is likely due to a couple of reasons...

Initial search space for Go is massive, and NNs are (relatively) slow to eval, which means resolving tactics of any significant depth by search is rather hopeless. One might try something similar to what I use in Chess for fortress detection (pathological extensions), but that would require recognition of the situation (which is not trivial as in the former case), as well as some sort of dummy or inherited fast eval to avoid wasting eval time on the in between moves.

The other issue (with LZ) is that, to my knowledge, the value signal is still incredibly noisy due to temp moves. That is actually quite easy to fix.

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Tue Oct 29, 2019 9:03 am

Laskos wrote: ↑Mon Oct 28, 2019 10:12 am
GPU: RTX 2070

May network versus elfv2:
10:10

October network versus elfv2:
12:8

So, the "700 Elo points" in self ratings is hugely inflated, and the progress is probably no more than 100 Elo points in 5 months, quite modest for Go.

It's hard to say anything from such a short match, no? I think there are still regular matches scheduled to check progress, and indeed progress is minimal now. Due to the procedure, there will be a new promotion (with a higher Elo) every x nets even if we are fully stalled, so inflation will continue to rise.
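
A rough sketch of why gating alone keeps pushing the self rating up. It assumes a fixed-length gating match of 400 games with a 220-win (55%) promotion bar; both numbers are illustrative assumptions rather than figures from this thread, and a real SPRT stops early, so its error rates differ. The function name is hypothetical.

```python
from math import comb, log10

def tail_probability(games: int, wins_needed: int, p: float = 0.5) -> float:
    """P(at least wins_needed wins in `games` games) at per-game win probability p."""
    return sum(
        comb(games, k) * p**k * (1.0 - p) ** (games - k)
        for k in range(wins_needed, games + 1)
    )

GAMES = 400        # assumed gating match length
WINS_NEEDED = 220  # assumed promotion bar (55% of 400)

# Chance that a candidate exactly as strong as the current best passes anyway.
p_fluke = tail_probability(GAMES, WINS_NEEDED)

# Elo credited to a net that passes with exactly the bar score.
elo_step = 400.0 * log10(WINS_NEEDED / (GAMES - WINS_NEEDED))

print(f"Chance promotion probability for an equal net: {p_fluke:.4f}")
print(f"Elo credited per such promotion: about {elo_step:.0f}")
```

Under these assumptions a few percent of equally strong candidates get promoted by luck, and each such promotion adds roughly 35 Elo to the self rating with no real gain in strength.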

There should be a new attempt with an improved setup and net architecture, but so far no-one has stepped forward.

2/ Here something ridiculous happened (in just 20 games played). White is the famed Facebook elfv2 network:

A ladder patch would be very easy to hand-craft, and here a departure from pure "Zero" approach is mandatory. I mean, these things usually play much above 9-dan pros, but here ridiculously below 5-kyu patzers.

This has already been observed a lot for the ELF network, i.e. it is very weak with ladders compared to LZ. There were some theories that this was because Facebook used a higher playout count: the search has more chance to resolve the problem tactically, so the net doesn't have to learn it, and ladders happen less often in training games.

Laskos · Post by **Laskos** » Thu Oct 31, 2019 11:05 am

Gian-Carlo Pascutto wrote: ↑Tue Oct 29, 2019 9:03 am
Laskos wrote: ↑Mon Oct 28, 2019 10:12 am
GPU: RTX 2070

May network versus elfv2:
10:10

October network versus elfv2:
12:8

So, the "700 Elo points" in self ratings is hugely inflated, and the progress is probably no more than 100 Elo points in 5 months, quite modest for Go.

It's hard to say anything from such a short match, no? I think there are still regular matches scheduled to check progress, and indeed progress is minimal now. Due to the procedure, there will be a new promotion (with a higher Elo) every x nets even if we are fully stalled, so inflation will continue to rise.

There should be a new attempt with an improved setup and net architecture, but so far no-one has stepped forward.

2/ Here something ridiculous happened (in just 20 games played). White is the famed Facebook elfv2 network:

A ladder patch would be very easy to hand-craft, and here a departure from pure "Zero" approach is mandatory. I mean, these things usually play much above 9-dan pros, but here ridiculously below 5-kyu patzers.

This has already been observed a lot for the ELF network, i.e. it is very weak with ladders compared to LZ. There were some theories that this was because Facebook used a higher playout count: the search has more chance to resolve the problem tactically, so the net doesn't have to learn it, and ladders happen less often in training games.

In fact, the latest Leela is extremely powerful at LTC now. It smacked Facebook ELFv2 10:0 in an LTC match at 6000'' + 60'' on an RTX 2070. You previously said "It's hard to say anything from such a short match, no?" about the 20-game matches, which is wrong. People go to extremes: 20 years ago they drew firm conclusions from 20-game results, and now they say "you always need thousands of games".

Here (and previously) I applied the methodology correctly: I intended to play 10 games no matter what (crashes aside) and did not stop at pleasure. The 10:0 result would be valid evidence of superiority even if I had stopped at pleasure (8:0 is the threshold when stopping at pleasure for a Type I error below 5%). With my methodology the LOS can be taken literally, and for 10:0 with a uniform prior LOS = 99.95%. There is almost no doubt that at LTC the latest Leela net is stronger than ELFv2. There is not much doubt either that at LTC it is FAR stronger than ELFv2. Using

1/ Very general (very weak assumptions) prior for performance
2/ Continuum approximation of the binomial distribution (Beta and Gamma functions, no big fuss and very accurate), because the normal approximation is useless here.

I can even give the range of the latest Leela net's superiority over ELFv2 at LTC in understandable Elo terms:

Median superiority at LTC: 350 Elo points.
95% confidence that Leela net is more than 110 Elo points stronger than ELFv2 at LTC.
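
As a cross-check of the LOS figure, here is a minimal sketch assuming a uniform prior on the per-game win probability and no draws, so that the posterior after W wins and L losses is Beta(W+1, L+1). The function names are invented for this sketch. The Elo quantiles it prints use that same uniform prior, whereas the 350 and 110 Elo figures above come from a different, more general prior that is not spelled out in the post, so the sketch is not expected to reproduce them.

```python
import math

def los_uniform_prior(wins: int, losses: int) -> float:
    """P(p > 0.5) under the Beta(wins+1, losses+1) posterior.

    Uses the identity I_x(a, b) = P(Binomial(a+b-1, x) >= a) for integer a, b.
    """
    a, b = wins + 1, losses + 1
    n = a + b - 1
    below_half = sum(math.comb(n, k) for k in range(a, n + 1)) / 2**n
    return 1.0 - below_half

def sweep_quantile(wins: int, q: float) -> float:
    """q-quantile of Beta(wins+1, 1), the posterior after a clean sweep (CDF p**(wins+1))."""
    return q ** (1.0 / (wins + 1))

def elo(p: float) -> float:
    """Elo difference corresponding to win probability p (logistic model)."""
    return 400.0 * math.log10(p / (1.0 - p))

los_sweep = los_uniform_prior(10, 0)   # the 10:0 LTC match
los_short = los_uniform_prior(12, 8)   # the earlier 12:8 match
median_elo = elo(sweep_quantile(10, 0.50))
lower_elo = elo(sweep_quantile(10, 0.05))

print(f"LOS for 10:0: {los_sweep:.4%}")
print(f"LOS for 12:8: {los_short:.1%}")
print(f"Uniform-prior median after 10:0: {median_elo:.0f} Elo")
print(f"Uniform-prior 5% lower bound after 10:0: {lower_elo:.0f} Elo")
```

The 10:0 LOS comes out at 1 - 1/2048, matching the 99.95% quoted above, while 12:8 gives only about 81%. With a uniform prior the median and the 5% lower bound land near 475 and 200 Elo, higher than the more conservative figures given in the post.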

===========================

Second point.

For the first time I managed to meaningfully analyze AlphaGo's loss to Lee Sedol. The overnight analysis was performed in Lizzie with the latest Leela net and engine at 300k playouts per move. The eval graph for the whole game and the position after the "divine move" 78 are here:

This move is nothing special: AlphaGo was still at 93.5% in Leela's terms, which is a decisive win.

The commentary in the SGF file on that move is:

-----
"At last, Lee Sedol launched his attack. Like an earthquake, the wedge at 78 tore apart the cracks in Black's fortress! None of us had anticipated this. When Gu Li saw White 78 from his broadcasting studio in China, he shouted: "The divine move!" All of Lee's painstaking preparations were finally about to bear fruit.

Actually, Lee spent very little time on this move itself. Later, during the press conference, he told the assembled reporters that he had not spent much time calculating. He had simply played what felt right.

Without a doubt, this move is a spectacular flash of insight - but does it really work? See the variation for an explanation of White's plan and Black's best response.

Regardless, this move cast AlphaGo into complete confusion."
-----

The only correct thing here is the last sentence. It was AlphaGo blundering, and not once: it blundered seriously 5 times. The very next move, 79, and then moves 83, 87, 93 and 97 are major blunders according to Leela. I checked them later with 1 million playouts, and yes, they are serious blunders. So there was no "divine move" by Lee Sedol, just a shot at the poor tactics of NN + MCTS.

I am linking to the ZIP file containing SGFs of the 10 LTC games together with the analysis of that game at 300k playouts.

http://s000.tinyupload.com/?file_id=044 ... 5795332376

I am not sure what the progress of Leela since the month of May is at LTC, but the current strength is impressive.
