Can Leela CPU train from a PGN file instead of selfplay?

Albert Silver
Posts: 2894
Joined: Wed Mar 08, 2006 8:57 pm
Location: Rio de Janeiro, Brazil

Re: Can Leela CPU train from a PGN file instead of selfplay?

Post by Albert Silver » Thu Mar 05, 2020 5:19 pm

phhnguyen wrote:
Thu Mar 05, 2020 5:10 am
dkappe wrote:
Thu Mar 05, 2020 3:49 am
A small number of games will create a net that doesn’t play well, but illegal moves aren’t in the picture.
From what I have read, recent versions of Lc0 have some chess knowledge built in, including the move rules (thus it is not completely Zero), to speed up learning and avoid the above issue.
Not recent versions, all versions, and the knowledge is only the rules of chess. Exactly as AlphaZero did.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."

jjoshua2
Posts: 67
Joined: Sat Mar 10, 2018 5:16 am

Re: Can Leela CPU train from a PGN file instead of selfplay?

Post by jjoshua2 » Thu Mar 05, 2020 10:27 pm

Ovyron wrote:
Thu Mar 05, 2020 4:08 am
I wonder what would happen if the entire Lichess database was used to train a neural network. It wouldn't play very strongly but could possibly play in a way indistinguishable from humans.
That is actually what the Darkqueen net is, although the latest versions have added a little bit of other data as well.

jjoshua2
Posts: 67
Joined: Sat Mar 10, 2018 5:16 am

Re: Can Leela CPU train from a PGN file instead of selfplay?

Post by jjoshua2 » Thu Mar 05, 2020 10:32 pm

Jimbo I wrote:
Wed Mar 04, 2020 5:06 pm
I'm hoping that it's possible for Leela CPU to train from a large PGN file instead of selfplay. If so, does anyone know how to explain it simply enough for an average Joe to understand it?
There are many different levels at which you can do this. If you just train on straight PGN, you get a very weak policy head (like Darkqueen has), because the only signal is that the move played is "good". The played move is not always the best one, which is acceptable because with a large enough sample size the noise cancels out, but every other move is weighted as zero, so the network has no idea which alternative moves are worth looking at.
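To illustrate what "one-hot" means here, this is a minimal sketch (hypothetical helper names, not actual lc0 training code) of the policy target you get from a bare PGN: the played move receives probability 1.0 and every other legal move receives 0.0.

```python
# Sketch: the policy target a plain PGN gives you. Only the played
# move carries any signal; all alternatives are zeroed out.

def one_hot_policy(legal_moves, played_move):
    """Return a one-hot probability distribution over legal moves."""
    return {m: (1.0 if m == played_move else 0.0) for m in legal_moves}

# Example position with four candidate moves, where e2e4 was played:
moves = ["e2e4", "d2d4", "g1f3", "c2c4"]
target = one_hot_policy(moves, "e2e4")
# target -> {"e2e4": 1.0, "d2d4": 0.0, "g1f3": 0.0, "c2c4": 0.0}
```

The problem jjoshua2 describes is visible directly: three of the four entries are exactly zero, so the policy head is actively trained to ignore moves that may be nearly as good as the one played.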

If you throw in a few million PGNs plus a few million games of selfplay data with a score for each move, you can get a very strong network like Leelenstein. You can then go back and re-annotate those PGNs with the newly created, strongest version of the network, so that each position gets a full probability distribution over moves instead of the "one-hot encoded" version, and get another jump in strength once it completes, which can take months of GPU time.

There are other techniques in between these two extremes, like policy smoothing, which redistributes some of the 100% training target on the played move to the other moves, using either a simple fixed distribution or, for example, a short Stockfish search. dkappe has used this for some of his networks, such as Gyal.
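The simple-distribution variant of policy smoothing can be sketched as follows. This is a hypothetical illustration (the function name and epsilon value are my own, not from any Leela training pipeline): keep most of the probability mass on the played move and spread the remainder uniformly over the other legal moves.

```python
# Sketch: uniform policy smoothing. With epsilon = 0.1, the played
# move keeps 0.9 and the remaining 0.1 is split evenly among the
# other legal moves, so no legal move is trained toward exactly zero.

def smooth_policy(legal_moves, played_move, epsilon=0.1):
    """Return a smoothed policy target over legal moves."""
    others = [m for m in legal_moves if m != played_move]
    if not others:
        return {played_move: 1.0}  # only one legal move: nothing to smooth
    share = epsilon / len(others)
    target = {m: share for m in others}
    target[played_move] = 1.0 - epsilon
    return target

moves = ["e2e4", "d2d4", "g1f3", "c2c4"]
t = smooth_policy(moves, "e2e4", epsilon=0.1)
# e2e4 keeps 0.9; each of the other three moves gets 0.1 / 3
```

A Stockfish-based variant would replace the uniform `share` with weights derived from a short search of each alternative, which concentrates the smoothed mass on moves that are actually reasonable.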
