Gian-Carlo Pascutto wrote (Tue Oct 29, 2019 9:03 am):
Laskos wrote (Mon Oct 28, 2019 10:12 am):
GPU: RTX 2070
May network vs. ELFv2: 10:10
October network vs. ELFv2: 12:8
So the "700 Elo points" gained in self-play ratings are hugely inflated; the real progress is probably no more than 100 Elo points in 5 months, which is quite modest for Go.
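For reference, such Elo numbers come from the usual logistic formula, Elo = 400·log10(s/(1−s)) for a score fraction s. A minimal Python sketch, assuming no draws and using only a rough normal-approximation range (which is poor for lopsided scores and undefined for a sweep like 10:0):

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_interval(wins: int, losses: int, z: float = 1.96):
    """Point estimate plus a rough normal-approximation range (draws ignored).

    Not meant for sweeps such as 10:0, where the score is 1.0 and the logit is undefined.
    """
    n = wins + losses
    p = wins / n
    se = math.sqrt(p * (1.0 - p) / n)  # standard error of the score fraction
    # Clamp so the logit stays finite; the approximation is poor near 0 or 1 anyway.
    lo = min(max(p - z * se, 1e-9), 1 - 1e-9)
    hi = min(max(p + z * se, 1e-9), 1 - 1e-9)
    return elo_from_score(p), elo_from_score(lo), elo_from_score(hi)


for name, w, l in [("May vs ELFv2", 10, 10), ("October vs ELFv2", 12, 8)]:
    est, lo, hi = elo_interval(w, l)
    print(f"{name}: {est:+.0f} Elo (rough 95% range {lo:+.0f} to {hi:+.0f})")
```

The two point estimates differ by about 70 Elo, in line with "no more than 100", while the ranges for 20-game samples are wide.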
It's hard to say anything from such a short match, no? I think matches are still regularly scheduled to check progress, and indeed progress is now minimal. Because of the promotion procedure, there will be a new promotion (with a higher Elo) every x nets even if we are fully stalled, so the inflation will keep rising.
There should be a new attempt with an improved setup and net architecture, but so far no-one has stepped forward.
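The promotion effect can be illustrated with a small simulation. A minimal sketch under loudly hypothetical assumptions: every candidate net is exactly as strong as the current best, each candidate plays a fixed 400-game gating match, it is promoted at a score of 55% or more, and its measured match Elo is then added to the self-rating. The real Leela Zero gating may differ in details (for example, sequential testing), so the numbers are only illustrative:

```python
import math
import random


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def simulate_inflation(candidates: int = 1000, games: int = 400,
                       threshold: float = 0.55, seed: int = 1):
    """Hypothetical gating model: true strength never changes (each game is a coin flip)."""
    rng = random.Random(seed)
    rating = 0.0
    promotions = []
    for _ in range(candidates):
        wins = sum(rng.random() < 0.5 for _ in range(games))
        score = wins / games
        if score >= threshold:
            # Only lucky candidates pass the gate, and their lucky result is booked as gain.
            gain = elo_from_score(score)
            rating += gain
            promotions.append(gain)
    return rating, promotions


rating, promotions = simulate_inflation()
print(f"{len(promotions)} promotions, self-rating +{rating:.0f} Elo, true progress 0 Elo")
```

Because each promoted net is selected for a lucky match result and that result is then counted as real gain, the self-rating drifts upward even though true strength is constant.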
2/ Here something ridiculous happened (in just 20 games played). White is the famed Facebook ELFv2 network:
A ladder patch would be very easy to hand-craft, and here a departure from the pure "Zero" approach is mandatory. I mean, these things usually play far above 9-dan pro level, but here they play ridiculously below 5-kyu patzers.
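To give an idea of what such a hand-crafted patch would have to compute, here is a minimal Python sketch of a naive ladder reader. All names (`SIZE`, `play`, `ladder_captures`, etc.) are hypothetical, and it deliberately ignores ko as well as the defender's option of escaping by capturing some other attacker chain instead of extending; a real engine patch would have to handle those and be wired into the search or the network inputs:

```python
from typing import Dict, Optional, Set, Tuple

Point = Tuple[int, int]  # (row, col)
Board = Dict[Point, str]  # occupied points -> "B" or "W"; empty points are absent

SIZE = 9  # hypothetical board size for this sketch


def neighbors(p: Point):
    r, c = p
    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= nr < SIZE and 0 <= nc < SIZE:
            yield (nr, nc)


def chain_and_liberties(board: Board, p: Point) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the chain containing p and collect its liberties."""
    color = board[p]
    stones, libs, stack = {p}, set(), [p]
    while stack:
        for n in neighbors(stack.pop()):
            if n not in board:
                libs.add(n)
            elif board[n] == color and n not in stones:
                stones.add(n)
                stack.append(n)
    return stones, libs


def play(board: Board, p: Point, color: str) -> Optional[Board]:
    """Place a stone, remove captured opponent chains; None if the move is suicide."""
    new = dict(board)
    new[p] = color
    for n in neighbors(p):
        if n in new and new[n] != color:
            stones, libs = chain_and_liberties(new, n)
            if not libs:
                for s in stones:
                    del new[s]
    if not chain_and_liberties(new, p)[1]:
        return None  # suicide is treated as illegal in this sketch
    return new


def ladder_captures(board: Board, p: Point, depth: int = 0, max_depth: int = 200) -> bool:
    """True if the chain at p, currently in atari, dies in a simple ladder (no ko handling)."""
    if depth > max_depth:
        return False  # give up and assume the chain escapes
    defender = board[p]
    attacker = "B" if defender == "W" else "W"
    _, libs = chain_and_liberties(board, p)
    if len(libs) != 1:
        return False  # not in atari, nothing to read
    (escape,) = libs
    after = play(board, escape, defender)  # the defender extends (may capture attackers)
    if after is None:
        return True  # extending is suicide, so the chain is lost
    _, libs = chain_and_liberties(after, p)
    if len(libs) <= 1:
        return True  # still in atari: captured next move
    if len(libs) >= 3:
        return False  # too many liberties: the ladder is broken
    for lib in libs:  # exactly two liberties: try each atari for the attacker
        nxt = play(after, lib, attacker)
        if nxt is not None and p in nxt and ladder_captures(nxt, p, depth + 1, max_depth):
            return True
    return False


# Usage: a White stone at (3, 3) in atari, chased toward the left edge.
board = {(3, 3): "W", (2, 3): "B", (3, 2): "B", (3, 4): "B", (4, 4): "B"}
print(ladder_captures(board, (3, 3)))
```

Adding a White ladder breaker on the escape path, for example at (6, 1), makes the same call report that the chain escapes.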
This has already been observed a lot for the ELF network: it is very weak at ladders compared to LZ. One theory was that Facebook used a higher playout count, so the search had a better chance of resolving ladders tactically; the net therefore did not have to learn them, and ladders came up less often in the training games.
In fact, the latest Leela is now extremely strong at LTC. It beat Facebook's ELFv2 10:0 in an LTC match at 6000'' + 60'' on an RTX 2070. You previously said "It's hard to say anything from such a short match, no?" about 20-game matches, which is wrong. People go to extremes: 20 years ago they drew confident conclusions from 20-game results, and now they say "you always need thousands of games".
Here (and previously) I applied the methodology correctly: I intended to play 10 games no matter what (barring crashes) and did not stop at will. The 10:0 result would establish superiority even if I had stopped at will (8:0 is the threshold for stopping at will with less than 5% Type I error). With my methodology the LOS can be taken literally, and for 10:0 with a uniform prior, LOS = 99.95%. There is almost no doubt that at LTC the latest Leela net is stronger than ELFv2, and little doubt that at LTC it is FAR stronger. Using:
1/ a very general prior for performance (very weak assumptions),
2/ a continuum approximation of the binomial distribution (Beta and Gamma functions; no big fuss and very accurate), because the normal approximation is useless here,
I can even give the range of the latest Leela net's superiority over ELFv2 at LTC in understandable Elo terms:
Median superiority at LTC: 350 Elo points.
95% confidence that the Leela net is more than 110 Elo points stronger than ELFv2 at LTC.
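Both kinds of figures can be checked with a few lines of Python. A minimal sketch, assuming a uniform prior on the win probability p and no draws, so that after W wins and L losses the posterior is Beta(W+1, L+1). It reproduces the 99.95% LOS exactly; the "very general prior" used for the Elo range above is not specified, so this uniform-prior version will not match the 350 and 110 Elo figures (with only 10 games, the choice of prior matters a lot):

```python
import math


def los_uniform_prior(wins: int, losses: int) -> float:
    """P(p > 1/2) for a Beta(wins+1, losses+1) posterior (uniform prior, no draws).

    Uses the identity P(Beta(a, b) > 1/2) = P(Binomial(a+b-1, 1/2) <= a-1)
    for integer a, b.
    """
    n = wins + losses + 1
    return sum(math.comb(n, k) for k in range(wins + 1)) / 2**n


def beta_cdf(x: float, wins: int, losses: int) -> float:
    """CDF of the Beta(wins+1, losses+1) posterior, via the same binomial identity."""
    n = wins + losses + 1
    return sum(math.comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(wins + 1, n + 1))


def posterior_quantile(q: float, wins: int, losses: int, tol: float = 1e-12) -> float:
    """Invert the (increasing) posterior CDF by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, wins, losses) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def elo_from_p(p: float) -> float:
    """Logistic Elo difference for a win probability strictly between 0 and 1."""
    return 400 * math.log10(p / (1 - p))


wins, losses = 10, 0
print(f"LOS: {los_uniform_prior(wins, losses):.2%}")
print(f"median: {elo_from_p(posterior_quantile(0.5, wins, losses)):+.0f} Elo")
print(f"95% lower bound: {elo_from_p(posterior_quantile(0.05, wins, losses)):+.0f} Elo")
```

For 10:0 the posterior CDF is simply p^11, so the LOS is 1 − 0.5^11 = 2047/2048.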
===========================
Second point.
For the first time, I managed to meaningfully analyze AlphaGo's loss to Lee Sedol. The overnight analysis was performed in Lizzie with the latest Leela net and engine at 300k playouts per move. The eval graph for the whole game and the position after the "divine move" 78 are here:
This move is nothing special: AlphaGo was still at 93.5% by Leela's evaluation, which is a decisively winning position.
The commentary in the SGF file on that move is:
-----
"At last, Lee Sedol launched his attack. Like an earthquake, the wedge at 78 tore apart the cracks in Black's fortress! None of us had anticipated this. When Gu Li saw White 78 from his broadcasting studio in China, he shouted: "The divine move!" All of Lee's painstaking preparations were finally about to bear fruit.
Actually, Lee spent very little time on this move itself. Later, during the press conference, he told the assembled reporters that he had not spent much time calculating. He had simply played what felt right.
Without a doubt, this move is a spectacular flash of insight - but does it really work? See the variation for an explanation of White's plan and Black's best response.
Regardless, this move cast AlphaGo into complete confusion."
-----
The only correct thing here is the last sentence. AlphaGo blundered, not once but five times, and seriously: the very next move, 79, and then moves 83, 87, 93 and 97 are major blunders according to Leela. I later checked them with 1 million playouts, and yes, they are serious blunders. So there was no "divine move" by Lee Sedol, just a shot that exploited the poor tactics of NN + MCTS.
Here is a link to a ZIP file containing the SGFs of the 10 LTC games, together with the 300k-playout analysis of that game:
http://s000.tinyupload.com/?file_id=044 ... 5795332376
I am not sure how much Leela has progressed at LTC since May, but its current strength is impressive.