LCZero: Progress and Scaling. Relation to CCRL Elo

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by jkiliani »

syzygy wrote:
Albert Silver wrote:80,000 NPS didn't hurt either...
But they don't help if the search simply never looks at the key move.
I think that's the key point of the whole tactics discussion. Leela misses tactics whenever the policy priors on one of the moves of the tactical line are low. But this just proves that Leela is no AlphaZero (yet), not that she can never reach this level: the ResNet architecture with enough layers and filters is assuredly capable of resolving such tactics once it has been shown enough examples. Dirichlet noise in self-play games provides the momentum to learn deeper tactics by exploring moves not yet favoured by the policy head; this process just takes a while, and its performance ceiling is limited by the size of the network. Once that is increased, improvement resumes.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

jkiliani wrote:
syzygy wrote:
Albert Silver wrote:80,000 NPS didn't hurt either...
But they don't help if the search simply never looks at the key move.
I think that's the key point to the whole tactics discussion. Leela misses tactics whenever the policy priors on one of the moves of the tactical line are low. But this just proves that Leela is no AlphaZero (yet), not that she can never reach this level: The ResNet architecture with enough layers and filters is assuredly capable of resolving such tactics, once it has been shown enough examples. Dirichlet noise in self-play games provides the momentum to learn deeper tactics in self-play by exploring moves not known by the policy head, this process just takes a while and its performance ceiling is limited by the size of the network. Once that is increased, improvement resumes.
Still, I don't see how it will improve dramatically on, say, WAC positions, which most reasonable AB engines solve overwhelmingly. The current-size network, which is surely not saturated yet, shows no progress on tactical shots; on WAC it even shows a regression:

6s/position 4 CPU threads (equivalent to 1s on GTX 1060)
ID227: 120/300
ID252: 113/300

After more than 1 million games with the bigger net. This is worrying. These WAC positions are a piece of cake for reasonable AB engines, and an exploit could be crafted to make this sort of position occur more often in games. Don't you suspect that even A0 was severely sub-par compared to the 280+/300 WAC results of reasonable AB engines? Also, it seems that the network itself at 1 playout improves tactically (visible when playing against it), but the search doesn't get better at solving tactics with a better network.

It was interesting to see LC0 trashing an AB engine in normal games from normal starting positions:

Openings: 3moves_GM.epd (side and reversed)
Score of LC0_245 vs Predateur 2.2.1: 93 - 1 - 6 [0.960] 100
ELO difference: 552.08 +/- 173.83
Finished match

But from WAC starting positions (side and reversed), i.e. in games containing one tactical shot, it manages to lose to an engine considered, under normal conditions, almost 600 Elo points weaker:

Openings: WAC300.epd (side and reversed)
Score of LC0_245 vs Predateur 2.2.1: 42 - 52 - 6 [0.450] 100
ELO difference: -34.86 +/- 66.75
Finished match
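For reference, the Elo differences reported by the match tool are presumably the standard logistic conversion of the score fraction, Elo = 400 * log10(s / (1 - s)). A quick check of the two results above (my own helper, not the tool's code):

```python
import math

def elo_diff(wins, losses, draws):
    """Elo difference implied by a match result under the standard
    logistic model: 400 * log10(s / (1 - s)), where s is the score
    fraction with draws counted as half a point."""
    games = wins + losses + draws
    s = (wins + 0.5 * draws) / games
    return 400 * math.log10(s / (1 - s))

print(round(elo_diff(93, 1, 6), 2))   # the 93-1-6 match: prints 552.08
print(round(elo_diff(42, 52, 6), 2))  # the 42-52-6 match: prints -34.86
```

Both reproduce the reported figures exactly, so the quoted +/- values are just the confidence interval on this same conversion.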

Again, I suspect such a thing could happen even to A0 against SF, or against much weaker AB engines.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by syzygy »

jkiliani wrote:
syzygy wrote:
Albert Silver wrote:80,000 NPS didn't hurt either...
But they don't help if the search simply never looks at the key move.
I think that's the key point to the whole tactics discussion. Leela misses tactics whenever the policy priors on one of the moves of the tactical line are low. But this just proves that Leela is no AlphaZero (yet), not that she can never reach this level: The ResNet architecture with enough layers and filters is assuredly capable of resolving such tactics, once it has been shown enough examples.
The tactics it missed here it will probably learn eventually, but my point is that it is indeed the NN that *must* have learned the tactic before its search will ever start to pay attention to such a bad-looking move.

There are so many possible tactics that I doubt it can learn enough of them to ever hope to measure itself, tactically, against SF.

Maybe I will turn out to be wrong on this, or maybe Leela's better positional understanding will outweigh its tactical shortcomings. We'll see.
Jhoravi
Posts: 291
Joined: Wed May 08, 2013 6:49 am

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Jhoravi »

The way I understand it, the net does a positional evaluation of every position by means of probability. But sharp tactics are not meant to be evaluated positionally; they are meant to be searched until a quiescent state is reached. Even humans don't do positional evaluation in the midst of captures. I wish it were possible to exclude captures and checks from the net's learning and just let quiescence search handle the rest.
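Resolving captures by search rather than by static evaluation is exactly what quiescence search does in AB engines. A minimal sketch of the idea, using a toy stand-in class (`ToyPosition` and its methods are hypothetical, purely for illustration, not a real board representation):

```python
class ToyPosition:
    """Minimal stand-in for a board: a static score (from the side to
    move's perspective) plus the positions reachable by capture."""
    def __init__(self, score, captures=()):
        self.score = score
        self.captures = list(captures)

    def evaluate(self):
        return self.score

    def capture_moves(self):
        return self.captures

def quiescence(pos, alpha=-10**9, beta=10**9):
    """Negamax quiescence search: explore only captures until the
    position is quiet, so static evaluation is never applied in the
    middle of an exchange."""
    stand_pat = pos.evaluate()      # static eval serves as a lower bound
    if stand_pat >= beta:
        return beta
    alpha = max(alpha, stand_pat)
    for child in pos.capture_moves():
        score = -quiescence(child, -beta, -alpha)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha

# Static eval says equal (0), but a capture leaves the opponent down a
# queen (-9 from their perspective); quiescence resolves it to +9.
leaf = ToyPosition(-9)
root = ToyPosition(0, captures=[leaf])
```

In the scheme suggested above, the net's evaluation would play the role of `evaluate()` and be consulted only at quiet positions, with the capture sequences delegated to this kind of search.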
User avatar
mhull
Posts: 13447
Joined: Wed Mar 08, 2006 9:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by mhull »

Laskos wrote: Sun May 06, 2018 2:50 pm
jkiliani wrote:
syzygy wrote: But they don't help if the search simply never looks at the key move.
I think that's the key point to the whole tactics discussion. Leela misses tactics whenever the policy priors on one of the moves of the tactical line are low. But this just proves that Leela is no AlphaZero (yet), not that she can never reach this level: The ResNet architecture with enough layers and filters is assuredly capable of resolving such tactics, once it has been shown enough examples. Dirichlet noise in self-play games provides the momentum to learn deeper tactics in self-play by exploring moves not known by the policy head, this process just takes a while and its performance ceiling is limited by the size of the network. Once that is increased, improvement resumes.
Still, I don't see how it will be improved dramatically, say on WAC positions, which most reasonable AB engines solve overwhelmingly. The current size network, which is surely not saturated yet, shows no progress on tactical shots, on WAC it shows even a regression:

6s/position 4 CPU threads (equivalent to 1s on GTX 1060)
ID227: 120/300
ID252: 113/300

After more than 1 million games with the bigger net. This is worrying. These WAC positions are piece of cake for reasonable AB engines, and an exploit can be made in order for this sort of positions to occur more often in games. Don't you suspect that even A0 was severely sub-par compared to 280+/300 results in WAC of reasonable AB engines? Also, it seems that network itself at 1 playout improves tactically (visible playing against it), but the search doesn't improve with better network in solving the tactics.

It was interesting to see LC0 trashing an AB engine in normal games from normal starting positions:

Openings: 3moves_GM.epd (side and reversed)
Score of LC0_245 vs Predateur 2.2.1: 93 - 1 - 6 [0.960] 100
ELO difference: 552.08 +/- 173.83
Finished match

But from WAC starting position (side and reversed), so in games having 1 tactical shot, it manages to lose to an engine considered in normal conditions almost 600 Elo points weaker:

Openings: WAC300.epd (side and reversed)
Score of LC0_245 vs Predateur 2.2.1: 42 - 52 - 6 [0.450] 100
ELO difference: -34.86 +/- 66.75
Finished match

Again, I suspect such a thing can happen even to A0 against SF or even much weaker AB engines.
This seems normal to me. As an example, Chest excels at chess problems, and at finding cooks in them, but not at chess itself. WAC is, in a way, a set of puzzles, and obviously represents positions LC0 doesn't encounter very often in self-play. Just as A/B chess programs don't excel at composed puzzles, LC0 doesn't excel at tactics, at least not at the beginning.

People are concerned about tactical holes, but consider that Leela is learning positional chess: principles of position that lead to favorable tactics. If we think of positional chess as deep rather than shallow tactics, then her knowledge base is being filled in, as it were, in the reverse order from that of an A/B program, which is, after all, a human teaching a program in the order that humans tend to learn chess. LC0 is learning the game under a different guidance regime, one dictated by the nature of neural-network training. We shouldn't expect it to learn the game the way we learn, and thus the way we teach our A/B programs. So I see nothing of concern here.
Matthew Hull
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Albert Silver »

Laskos wrote: Sun May 06, 2018 2:50 pm
jkiliani wrote:
syzygy wrote: But they don't help if the search simply never looks at the key move.
I think that's the key point to the whole tactics discussion. Leela misses tactics whenever the policy priors on one of the moves of the tactical line are low. But this just proves that Leela is no AlphaZero (yet), not that she can never reach this level: The ResNet architecture with enough layers and filters is assuredly capable of resolving such tactics, once it has been shown enough examples. Dirichlet noise in self-play games provides the momentum to learn deeper tactics in self-play by exploring moves not known by the policy head, this process just takes a while and its performance ceiling is limited by the size of the network. Once that is increased, improvement resumes.
Still, I don't see how it will be improved dramatically, say on WAC positions, which most reasonable AB engines solve overwhelmingly. The current size network, which is surely not saturated yet, shows no progress on tactical shots, on WAC it shows even a regression:

6s/position 4 CPU threads (equivalent to 1s on GTX 1060)
ID227: 120/300
ID252: 113/300
Were both runs done with the same LCZero version?
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

Albert Silver wrote: Mon May 07, 2018 5:01 pm
Laskos wrote: Sun May 06, 2018 2:50 pm
jkiliani wrote: I think that's the key point to the whole tactics discussion. Leela misses tactics whenever the policy priors on one of the moves of the tactical line are low. But this just proves that Leela is no AlphaZero (yet), not that she can never reach this level: The ResNet architecture with enough layers and filters is assuredly capable of resolving such tactics, once it has been shown enough examples. Dirichlet noise in self-play games provides the momentum to learn deeper tactics in self-play by exploring moves not known by the policy head, this process just takes a while and its performance ceiling is limited by the size of the network. Once that is increased, improvement resumes.
Still, I don't see how it will be improved dramatically, say on WAC positions, which most reasonable AB engines solve overwhelmingly. The current size network, which is surely not saturated yet, shows no progress on tactical shots, on WAC it shows even a regression:

6s/position 4 CPU threads (equivalent to 1s on GTX 1060)
ID227: 120/300
ID252: 113/300
Were both runs done with the same LCZero version?
Yes, v0.8.
And now ID258 has 110/300, consistent with the worsening trend on WAC.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

Laskos wrote: Mon May 07, 2018 5:13 pm
Albert Silver wrote: Mon May 07, 2018 5:01 pm
Laskos wrote: Sun May 06, 2018 2:50 pm

Still, I don't see how it will be improved dramatically, say on WAC positions, which most reasonable AB engines solve overwhelmingly. The current size network, which is surely not saturated yet, shows no progress on tactical shots, on WAC it shows even a regression:

6s/position 4 CPU threads (equivalent to 1s on GTX 1060)
ID227: 120/300
ID252: 113/300
Were both runs done with the same LCZero version?
Yes, v0.8.
And, now ID258 has 110/300. Seems consistent with worsening on WAC.
Interesting: in a positional opening suite, the trend is exactly the opposite. I show the results for the first 15x192 net compared to the last one:

LCZero v0.8
6s/position 4 CPU threads (equivalent to 1s on GTX 1060)

WAC300 tactical:
ID227: 120/300
ID258: 110/300
performance below that of 1800-Elo AB engines, and worsening

Openings200 positional:
ID227: 98/200
ID258: 111/200
performance above that of 3200-Elo AB engines, and improving


There seems to be some conflict between these two aspects, at least in the net+search combination.
User avatar
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by CMCanavessi »

Kai, can you try some nets in the 231-236 range? Particularly 231, 232 and 236. Those are the ones that several of us consider the strongest.
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Daniel Shawul »

Laskos wrote: Mon May 07, 2018 6:40 pm
Laskos wrote: Mon May 07, 2018 5:13 pm
Albert Silver wrote: Mon May 07, 2018 5:01 pm

Were both runs done with the same LCZero version?
Yes, v0.8.
And, now ID258 has 110/300. Seems consistent with worsening on WAC.
Interesting: in a positional opening suite, the trend is exactly the opposite. I show the results for the first 15x192 net compared to the last one:

LCZero v0.8
6s/position 4 CPU threads (equivalent to 1s on GTX 1060)

WAC300 tactical:
ID227: 120/300
ID258: 110/300
performance below that of 1800-Elo AB engines, and worsening

Openings200 positional:
ID227: 98/200
ID258: 111/200
performance above that of 3200-Elo AB engines, and improving


There seems to be some conflict between these two aspects, at least in the net+search combination.
It is going to be a massive heartbreak for many who believe the NN is going to solve tactics :)

Hardware + cherry-picking seems to be the only explanation left so far ...

syzygy also gets it: judge only on the evidence presented so far on tactics, which is none.

Daniel