Laskos wrote: ↑Mon Sep 24, 2018 12:58 pm
So, as far as methodology goes, it doesn't matter: 1200 games from only 12 starting positions usually give skewed results. And, hardly accidentally, results about 100 Elo points better for Lc0 and A0 than from varied openings. My English is bad, so maybe you had to explain it to me more simply: yes, there are only 12 openings, but they are the most popular human openings. I would still have replied that the methodology for determining the strength compared to SF8 was wrong, and by now it hardly seems accidental.

Tobber wrote: ↑Mon Sep 24, 2018 12:48 pm
So the 12 most popular openings in human play favor Lc0 and A0? The idea was to show that A0 by self-play could figure out how to play the most popular openings. What is it you don't understand?

Laskos wrote: ↑Mon Sep 24, 2018 11:22 am
I don't understand what you are saying, and I won't re-read the paper right now. How many positions did they use for the 1200-game match? The fact is that the 12 provided positions are very favorable for Lc0 and probably for A0. Similarly, using no openings at all heavily favors Lc0 and A0.

Tobber wrote: ↑Mon Sep 24, 2018 11:14 am
Please read the paper and then check for yourself. The openings for the 1200-game match were selected as the 12 most popular openings in an online database.

Laskos wrote: ↑Mon Sep 24, 2018 9:45 am
Our differences probably come from using different sets of openings, as Lc0 and A0 seem sensitive to the choice. 12 openings for a 1200-game match and 1 opening for a 100-game match is bad testing methodology by the A0 team. And probably bad on purpose, as it bolsters A0 (or Lc0) by about 100 Elo points.
/John
leela is official(?) better than sf9
- Full name: Kai Laskos
Re: leela is official(?) better than sf9
corres
Location: hungary
Re: leela is official(?) better than sf9
corres wrote: ↑Mon Sep 24, 2018 10:09 am
AB engines are tested mainly for the middle game. NN engines play their learning games from the start position, so their knowledge decreases the further the game moves away from the start position. As a result, starting games from a short opening book, or with no book at all, is favorable for NN engines and disadvantageous for AB engines.

chrisw wrote: ↑Mon Sep 24, 2018 10:24 am
The fewer pieces there are, especially pieces that can combine in mating nets, the stronger the AB search is (the search width is naturally smaller, so AB is less likely to prune away important lines, and the opportunities to create imbalances are fewer; if there is an imbalance, it is probably already there). NN-MCTS is better at finding imbalances and potential mating-net structures, and the opportunity for that is early on. Your point of greater learnt knowledge at the start (e.g. most pieces still on the board) is also true. What's also true is that there is no reason why more training and larger nets can't overcome the weaknesses in later parts of the game. Just give it training time.

...and a bigger NN. If the NN is too small, then during a longer training run its good content will be overwritten.
jkiliani
Re: leela is official(?) better than sf9
corres wrote: ↑Mon Sep 24, 2018 1:08 pm
...and a bigger NN.

A bigger NN is a part of it, but this can be substituted or augmented with a "better NN". What I'm talking about here is changing the neural-net architecture from the standard residual network (ResNet) to a higher-performing one. https://arxiv.org/pdf/1709.01507.pdf shows that significant improvements to ResNet are feasible for image recognition by using SE-ResNet (Squeeze-and-Excitation), or even by combining this technique with ResNeXt, a generalisation of the ResNet idea. The SE-ResNet architecture was tried on the standard dataset for Lc0 and achieved a significantly lower MSE and policy loss compared to our regular networks.
Long story short, our current neural nets are nowhere near the top performing network architectures, and when we get a working implementation of an improved architecture, we can expect a significant jump in performance.
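For anyone wondering what a Squeeze-and-Excitation block actually adds on top of a plain residual stack, here is a minimal numpy sketch. The channel count, patch size, and reduction ratio are illustrative assumptions on my part, not Lc0's actual configuration:

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation: reweight the channels of a (C, H, W) feature map.

    Squeeze: global average pool each channel down to one number.
    Excite: two tiny fully connected layers produce a per-channel
    gate in (0, 1), which rescales the original channels.
    """
    z = x.mean(axis=(1, 2))                    # squeeze: (C,)
    h = np.maximum(0.0, w1 @ z + b1)           # excitation, ReLU: (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))   # sigmoid gate: (C,)
    return x * s[:, None, None]                # rescale each channel

# Toy example: 8 channels, a 3x3 board patch, reduction ratio r = 4.
rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.standard_normal((C, 3, 3))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
y = se_block(x, w1, b1, w2, b2)
assert y.shape == x.shape
```

The whole idea is the per-channel sigmoid gate: the block learns which feature planes matter in the current position and scales the others down, at the cost of only two tiny fully connected layers per residual block.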
Re: leela is official(?) better than sf9
jkiliani wrote: ↑Mon Sep 24, 2018 2:41 pm
A bigger NN is a part of it, but this can be substituted or augmented with a "better NN".

Sounds more than brilliant. I've been hunting around, not very successfully, for some kind of source in C or its variants where the heavy work of defining architectures and GPU support is already done. Fann and gann are OK, but it is a bit daunting to assemble them into architectures more complex than fully connected, and they don't do GPU anyway.
- Location: Seville (SPAIN)
- Full name: Javier Ros
Re: leela is official(?) better than sf9
Tobber wrote: ↑Mon Sep 24, 2018 11:14 am
Please read the paper and then check for yourself. The openings for the 1200-game match were selected as the 12 most popular openings in an online database.

Page 6 of the paper:
Table 2: Analysis of the 12 most popular human openings (played more than 100,000 times
in an online database (1)). Each opening is labelled by its ECO code and common name. The
plot shows the proportion of self-play training games in which AlphaZero played each opening,
against training time. We also report the win/draw/loss results of 100 game AlphaZero vs.
Stockfish matches starting from each opening, as either white (w) or black (b), from AlphaZero’s
perspective. Finally, the principal variation (PV) of AlphaZero is provided from each opening.
When you force a program to play an opening variant that it would not have played on its own you are forcing it to play outside its style and the performance goes down a lot. This is particularly important for lc0 that has only trained starting from the initial position.
Last edited by Javier Ros on Mon Sep 24, 2018 3:52 pm, edited 5 times in total.
- Location: Seville (SPAIN)
- Full name: Javier Ros
Re: leela is official(?) better than sf9
Laskos wrote: ↑Mon Sep 24, 2018 12:58 pm
So, as far as methodology goes, it doesn't matter: 1200 games from only 12 starting positions usually give skewed results.

I agree.
Tobber
Location: Sweden
Re: leela is official(?) better than sf9
Javier Ros wrote: ↑Mon Sep 24, 2018 3:42 pm
I agree.

I see, but what exactly do you agree with? That the 12 openings, selected to favor A0, happen to be the 12 most popular openings played by humans? A remarkable coincidence, no doubt.
/John
Re: leela is official(?) better than sf9
Did you even check the openings, or are you just repeating, like a parrot, the BS from that PR manifesto?
Simple example: take the Réti from the paper. The actual opening used is 5 moves deep. After 2.c4, most of the further moves by both Black and White are only the 2nd or 3rd most popular choice, so from 70k possible games after 2.c4 we end up with only 6k after 5...O-O.
The most popular human openings, my ass. These were 12 deliberately chosen positions from the 12 most popular openings, and they have nothing to do with popularity among humans or with being representative of chess. They were selected only to give A0 the advantage it needed for that PR stunt to succeed.
- Location: Germany
Re: leela is official(?) better than sf9
That's pretty much an imputation, from my personal point of view, and my experience tells me not to make such claims when there is no evidence for them. To tell people that Google manipulated the openings to get the result it intended, because otherwise it would not have been possible or likely, is not the right way for talkchess, imho. If you present indicators, then it's normal to discuss things like that, but if you have NOTHING in hand, then it's pure assumption that discredits Google without evidence. If you ask me, I'd prefer to have something in hand before discrediting Google.
Truths are illusions of which we have forgotten that they are illusions.
- Location: Seville (SPAIN)
- Full name: Javier Ros
Re: leela is official(?) better than sf9
Tobber wrote: ↑Mon Sep 24, 2018 4:16 pm
I see, but what exactly do you agree with? That the 12 openings, selected to favor A0, happen to be the 12 most popular openings played by humans? A remarkable coincidence, no doubt.
"So, it doesn't matter as methodology goes, 1200 games from only 12 starting positions usually give skewed results." I agree; I think they should have chosen a wider sample of openings.
I have played a lot of games with lc0 using these 12 positions, and also using other initial positions such as the Nunn positions or Noomen Sharp Gambits, getting a completely different result; see
http://talkchess.com/forum3/viewtopic.p ... 40#p774276
Playing all these games starting from the 12 A0 positions, I also noticed that the positions reached were positional in style, which lc0 understands better than classic alpha-beta engines, and that similar games often occurred, though never as exact repetitions.
When you force a program to play an opening variant that it would not have played on its own you are forcing it to play outside its style and the performance goes down a lot.
So what is the real level of lc0?
The 35% obtained from the Noomen Sharp Openings that I forced it to play, or the 40%-42% obtained from the other positions?
The rules of chess say that you have to start from the initial position and not from other artificially chosen ones.
From the 35% obtained from the Noomen Sharp Openings, against Stockfish's 65%, you can conclude that Stockfish is better in that type of position, but these positions would never be played by lc0 starting from the initial position.
So I think that the real level of lc0 is reached starting from the initial position or opening variants that lc0 plays naturally by itself.
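As a rough sanity check on what those score percentages mean in rating terms, here is a small Python sketch using the standard logistic Elo model (my illustration, not anything from the paper; the error bars on a few hundred games are large):

```python
import math

def elo_diff(score):
    """Rating difference implied by a score fraction (0 < score < 1),
    from the logistic Elo model: expected score = 1 / (1 + 10^(-d/400))."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Javier's reported lc0 vs Stockfish scores:
print(round(elo_diff(0.35)))  # Noomen Sharp Openings -> -108
print(round(elo_diff(0.41)))  # midpoint of the 40%-42% range -> -63
```

So the gap between forced sharp openings and openings closer to lc0's own choices is on the order of 45 Elo points in these samples, which is the kind of opening-dependent swing being argued about in this thread.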