Asking to people who believe Leela NN is a book, what they think about SF NN now?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Asking to people who believe Leela NN is a book, what they think about SF NN now?

Post by mmt »

Milos wrote: Mon Sep 14, 2020 11:40 am
mmt wrote: Mon Sep 14, 2020 7:29 am Overfitting is basically memorization. But I don't think LC0 overfits much. Using modern methods you can now have huge neural nets that are not overfitting and LC0's nets are not even large.
That is a huge simplification that is also logically problematic. While overfitting can be an indication of memory, there is zero indication that nets that don't overfit don't have memory. Assuming that is plain wrong.
And where did I say that nets that don't overfit don't have memory? If you want to talk about logic, A->B doesn't imply ~A->~B.
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: Asking to people who believe Leela NN is a book, what they think about SF NN now?

Post by mmt »

jhellis3 wrote: Mon Sep 14, 2020 1:20 am Oh, and btw... I actually did outline a way to test if opening knowledge was in the NNs in the google groups post I made many moons ago. A pity nobody seems to have read it.
What was your test? Comparing the size of a maximally compressed net to the size of a maximally compressed list of the positions with their moves could be one indicator. Or maybe just checking whether the net has seen this exact position in training? Or checking positions modified by changing some insignificant detail (when possible)? During supervised training of a NN, the gap between the loss on the test set and on the training set is a good indicator.

Here is a paper about this: "The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks" https://www.usenix.org/system/files/sec19-carlini.pdf
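To make the compressed-size idea concrete, here is a minimal sketch; the file names are placeholders I made up, and this is only a rough plausibility check, not a definitive test of memorization:

```python
# Rough sketch of the compressed-size comparison: if the compressed net is far
# smaller than a compressed list of the opening positions and moves in question,
# it cannot be storing all of them verbatim. File names below are hypothetical.
import zlib

def compressed_size(path):
    with open(path, "rb") as f:
        return len(zlib.compress(f.read(), level=9))

net_bytes = compressed_size("net_weights.pb")              # hypothetical net file
book_bytes = compressed_size("positions_with_moves.txt")   # hypothetical position list

print(f"net: {net_bytes} bytes, position list: {book_bytes} bytes")
print(f"ratio: {net_bytes / book_bytes:.2f}")
```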
jhellis3
Posts: 546
Joined: Sat Aug 17, 2013 12:36 am

Re: Asking to people who believe Leela NN is a book, what they think about SF NN now?

Post by jhellis3 »

What was your test?
Nothing like that. But those ideas bring up a good point which may or may not be clear to others...

While I can only speak for myself, I do not personally have any issue whatsoever with the idea of biases in weights or internal structures of the NN provided the information available to it is completely "kosher".

Most of the features provided to the NN in Lc0 are perfectly fine and indeed critical to being able to play the game of chess. I also don't have any issue with certain non-zero features of a general nature. The one and only question mark, as far as I am concerned, is the existence of pre-root history, especially for the first 8 ply.

Let p be a position.
Let h be history positions.

Most of the data is going to look something like: [h7, h6, h5, h4, h3, h2, h1, h0, p].

Except the single most trained on position in the data set: [x, x, x, x, x, x, x, x, p].
Then you have: [x, x, x, x, x, x, x, h0, p].
Then: [x, x, x, x, x, x, h1, h0, p].

This continues until the history is properly filled after 8 ply. So the NN has fairly unique identifiers here. These are not just unique positions, but a unique class of positions, which also happen to be some of the (if not the) most heavily trained-on data. Of course, one might correctly contend that endgame positions are also going to be over-represented, but to my knowledge Lc0 employs resign to avoid this at that end of the spectrum.
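As a simplified illustration of that padding (this is not the actual Lc0 plane encoding, which uses per-piece bit planes; it only mimics the [h7, ..., h0, p] structure described above):

```python
# Toy model of the history input: the last 8 positions plus the current one,
# with empty slots (None) when the game is fewer than 8 plies old.
import chess

def history_stack(board, length=8):
    """Return [h7, ..., h0, p] as board FENs, padding missing history with None."""
    temp = board.copy()
    stack = [temp.board_fen()]            # current position p
    for _ in range(length):
        if temp.move_stack:
            temp.pop()
            stack.insert(0, temp.board_fen())
        else:
            stack.insert(0, None)          # slot from before the game started
    return stack

board = chess.Board()
print(history_stack(board))   # 8 empty slots + start position
board.push_san("e4")
print(history_stack(board))   # 7 empty slots, h0 = start position, p = after 1. e4
```

In this toy model, only positions within the first 8 plies of a game ever produce empty slots, which is exactly the "unique class of positions" described above.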

Anyway, the tests I had in mind are of a fairly straightforward nature. First, one needs to build an executable with history disabled. Then one can conduct a series of tests measuring the relative changes in outcomes under various conditions.

A couple of examples: a very short opening book (say 2-3 ply) from which the two engines are matched against each other, except that in one instance the openings are preceded by an 8-ply knight dance whose end position is the starting position. Such tests could be carried out both with just policy and with conditions mirroring those of training. With the policy test, it should be immediately telling whether it is dependent on history or not. However, even in the case that it is, I would not consider this evidence of book-like features courtesy of information outside the rules of chess. Establishing that is significantly more difficult and involves measuring the relative increase, decrease, or static behavior of the performance of the engine without history as the distance from the starting position increases.
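One possible 8-ply dance (my own choice of move order, picked so the starting position does not recur mid-dance, which could otherwise allow a threefold-repetition claim) can be checked with python-chess:

```python
# An 8-ply "knight dance" that returns to the normal starting position.
import chess

board = chess.Board()
dance = ["Nf3", "Nc6", "Nc3", "Nf6", "Nb1", "Ng8", "Ng1", "Nb8"]
for san in dance:
    board.push_san(san)

# Piece placement, side to move and castling rights all match the initial
# position; only the move counters (and the engine's history) differ.
assert board.board_fen() == chess.Board().board_fen()
assert board.turn == chess.WHITE
assert board.castling_rights == chess.Board().castling_rights
print(board.fen())   # same placement as the start position, fullmove number 5
```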

Another idea could be to use chess 960 and measure the relative differences between a variety of nets & exes in comparison to standard chess. If the Elo disparities significantly & consistently compress with 960 vs standard chess, that could be cause for concern.
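A sketch of the comparison I mean, using the usual logistic Elo model; the match scores below are placeholders, not real results:

```python
# Convert match score fractions to implied Elo differences and check whether
# the gap between two nets/executables compresses in Chess960 relative to
# standard chess. Scores are hypothetical placeholders.
import math

def elo_diff(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

standard_score = 0.62   # hypothetical: net A vs net B, standard chess
frc_score = 0.55        # hypothetical: same pairing, Chess960

print(f"standard: {elo_diff(standard_score):+.0f} Elo")
print(f"chess960: {elo_diff(frc_score):+.0f} Elo")
if elo_diff(frc_score) < elo_diff(standard_score):
    print("Elo disparity compresses under Chess960")
```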

And this brings up another good point. 960 in itself is a good safeguard against such behavior (when used in training) because it makes the underlying math much less favorable to any sort of otherwise non-intuitive knowledge being baked in. It has been a while since I did my estimates, but I seem to recall that useful data for standard chess could possibly be generated up to somewhere in the 6-12 ply range, while 960 doesn't really make it past 3 ply (and more likely 2).

Anyway, I had planned to carry out some of these tests this fall/winter after I build my new PC. Recent developments with NNUE have quelled my interest in Lc0 for the most part, but I may still carry them out just to satisfy my own curiosity. Or maybe someone else will beat me to the punch...

I will say that immediately drawing conclusions one way or the other from any tests done is likely foolhardy, as there are a number of subtleties which are easy to overlook. However, overall it should be easier to establish the lack of history-based opening knowledge than to definitively prove its existence. Potentially positive results in this regard are more likely to be considered suspicious but unclear unless the signal is overwhelming (which I doubt it will be).

Again, just to be clear, for my part I personally have no problem with any form of internal net optimization which arises as a result of sampling bias (or lack thereof), necessary information provided as input, or even unnecessary information provided it is of a wholly general nature.
cucumber
Posts: 144
Joined: Sun Oct 14, 2018 8:21 pm
Full name: JSmith

Re: Asking to people who believe Leela NN is a book, what they think about SF NN now?

Post by cucumber »

jhellis3 wrote: Tue Sep 15, 2020 7:18 pm
What was your test?
Nothing like that. But those ideas bring up a good point which may or may not be clear to others...

...

Of course, one might correctly contend that endgame positions are also going to be overly represented, but to my knowledge, Lc0 employs resign to avoid this at that end of the spectrum.

...
I'll revisit the rest of your post later, but I think you'd be surprised. On June 23 of last year, Crem noted that "there are more RKvsRK positions in training data than positions after 1. c4, 1. d4 and 1. e4 combined." Opening diversity has increased substantially since then, so it's possible that rkvrk is even more overrepresented relative to c4, d4, and e4 now.

This reply isn't meant to be a refutation of anything you're saying, just a neat tidbit that I've always found pretty interesting. Thought it was interesting enough to share.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Asking to people who believe Leela NN is a book, what they think about SF NN now?

Post by corres »

cucumber wrote: Tue Sep 15, 2020 7:30 pm ...
I'll revisit the rest of your post later, but I think you'd be surprised. On June 23 of last year, Crem noted that "there are more RKvsRK positions in training data than positions after 1. c4, 1. d4 and 1. e4 combined." Opening diversity has increased substantially since then, so it's possible that rkvrk is even more overrepresented relative to c4, d4, and e4 now.
This reply isn't meant to be a refutation of anything you're saying, just a neat tidbit that I've always found pretty interesting. Thought it was interesting enough to share.
From the viewpoint of endgame play, the most important knowledge for Leela is knowing the transitions from endgames with many pieces into simple endgames. Teaching Leela the endgame databases themselves would obviously be superfluous, and I am afraid there is not enough room in Leela's net for that knowledge anyway.
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Asking to people who believe Leela NN is a book, what they think about SF NN now?

Post by jp »

cucumber wrote: Tue Sep 15, 2020 7:30 pm On June 23 of last year, Crem noted that "there are more RKvsRK positions in training data than positions after 1. c4, 1. d4 and 1. e4 combined."
Did he mean that those endgames were fed into training directly? If so, how were the positions chosen?
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Asking to people who believe Leela NN is a book, what they think about SF NN now?

Post by corres »

Laskos wrote: Tue Sep 15, 2020 12:23 am ...
According to my observations and tests, positionally in openings Leela > SF NNUE > SF classical.
It is not an astonishing result.
Classical Stockfish mainly searches for positions with tactical opportunities, while Leela mainly searches for calm, positionally sound positions. The number of lines that lead to a tactical position is relatively small compared to the number of lines leading to strategically sound positions. Moreover, the search speed of classical Stockfish is much higher than Leela's, so classical Stockfish can find its own lines more easily than Leela can find hers. But this behavior does not prove that only classical Stockfish has a (good) opening book while Leela has none.
I am afraid that if somebody investigated the situation in the middlegame, it would come to light that the built-in knowledge of Leela is better and deeper than that of classical Stockfish, so in this sense Leela has the better mid(!)-game book.