Is Allie a 'Leela net player'?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

hgm
Posts: 27814
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Is Allie a 'Leela net player'?

Post by hgm »

[Moderation] Please stick to the subject of teaching NNs. Specifics on the tuning of Ethereal or other AB engines can be discussed in the Engine Origins section.
hgm
Posts: 27814
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Is Allie a 'Leela net player'?

Post by hgm »

To make my own contribution to this discussion:

These neural networks basically add a new layer to the computer architecture, which complicates the discussion. If someone wrote a Chess program in JavaScript, and I wrote a better JavaScript interpreter, or even a compiler, and used it to interpret/compile that existing engine, I think everyone would agree I did not write a chess engine. Even though I might have contributed greatly to its strength, if my compiled code was a lot faster than the available interpreters. But I would have been working in the 'layer' that had nothing to do with chess, and all chessic properties come from the interpreted program.

Writing C code to run a NN is a bit similar. Not completely, because a NN is just an evaluation (or an extension/reduction oracle, if you want), and a chess engine is more than just evaluation. It also needs search. That search is not done by the NN. So writing code for running a NN or NNUE chess net does involve a tiny bit of effort specific to chess; you could make your own variation on PUCT, which would provide a different engine, even if you used the same net as evaluation. Just like you can use Fruit's evaluation function with Stockfish's search, etc.

Search is not the most creative part of engines, though. It is highly standardized. In A-B engines, move sorting and the amount of extension/reduction still give you a little freedom. For PUCT there are no doubt different parameters you could tune, but the situation is a bit similar. If you just implement the standard PUCT algorithm, you haven't made a new search, you have just re-implemented the same search. If you then also use the same network (i.e. trained the same way), you haven't made anything new at all, even if you completely re-wrote the code. It would be just like rewriting Stockfish in plain C, assembler or JavaScript. It would still be Stockfish.
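For concreteness, the 'standard PUCT' selection rule discussed above can be sketched as follows; the function names and the c_puct value are illustrative, not taken from any particular engine:

```python
import math

def puct_score(child_q, child_prior, child_visits, parent_visits, c_puct=1.5):
    """Standard PUCT selection score: the value estimate Q plus an
    exploration bonus scaled by the net's policy prior, which shrinks
    as the child accumulates visits."""
    u = c_puct * child_prior * math.sqrt(parent_visits) / (1 + child_visits)
    return child_q + u

def select_child(children, parent_visits):
    """Descend the tree by picking the child maximizing the PUCT score."""
    return max(children,
               key=lambda ch: puct_score(ch.q, ch.prior, ch.visits, parent_visits))
```

An engine author's freedom here is essentially limited to tuning c_puct (or reshaping the exploration term), which is why two implementations of this rule with the same net behave like the same engine.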
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: Is Allie a 'Leela net player'?

Post by Tony P. »

Alayan wrote: Wed Sep 30, 2020 6:55 pm Realistically, any NNUE engine author could take some old sergio net, put it through a short SL phase that doesn't change its strength much but changes all the floats making up the weights so the source can't be traced, claim it as his own achievement, and get a net that's a few elo short of SF's. This is trivial, and simex has proven to be unable to detect this kind of similarity anyway.
The risk of this kind of cloning is what open source authors face. In a closed source engine, the only way to find out how a nonstandard protobuf is used would be RE.
Rebel
Posts: 6997
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Is Allie a 'Leela net player'?

Post by Rebel »

Tony P. wrote: Fri Oct 02, 2020 6:32 pm
Alayan wrote: Wed Sep 30, 2020 6:55 pm Realistically, any NNUE engine author could take some old sergio net, put it through a short SL phase that doesn't change its strength much but changes all the floats making up the weights so the source can't be traced, claim it as his own achievement, and get a net that's a few elo short of SF's. This is trivial, and simex has proven to be unable to detect this kind of similarity anyway.
The risk of this kind of cloning is what open source authors face. In a closed source engine, the only way to find out how a nonstandard protobuf is used would be RE.
Or a new util.
90% of coding is debugging, the other 10% is writing bugs.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Is Allie a 'Leela net player'?

Post by Daniel Shawul »

hgm wrote: Fri Oct 02, 2020 5:55 pm To make my own contribution to this discussion:

These neural networks basically add a new layer to the computer architecture, which complicates the discussion. If someone wrote a Chess program in JavaScript, and I wrote a better JavaScript interpreter, or even a compiler, and used it to interpret/compile that existing engine, I think everyone would agree I did not write a chess engine. Even though I might have contributed greatly to its strength, if my compiled code was a lot faster than the available interpreters. But I would have been working in the 'layer' that had nothing to do with chess, and all chessic properties come from the interpreted program.

Writing C code to run a NN is a bit similar. Not completely, because a NN is just an evaluation (or an extension/reduction oracle, if you want), and a chess engine is more than just evaluation. It also needs search. That search is not done by the NN. So writing code for running a NN or NNUE chess net does involve a tiny bit of effort specific to chess; you could make your own variation on PUCT, which would provide a different engine, even if you used the same net as evaluation. Just like you can use Fruit's evaluation function with Stockfish's search, etc.
The new paradigm does take away a lot of the "art", but this art is something programmers should not miss, and in fact should enjoy seeing gone. After all, the goal of AI is to automate everything. Deep NNs and NNUE take care of evaluation, and even a modern search could be fully automated, like the MCTSnet algorithm I mentioned somewhere in this thread. The backup (i.e. minimax or averaging), the exploration formula, and other components of the search are all learned using sub-networks (not even the PUCT formula needs to be there), so basically the programmer doesn't have any input into the evaluation() or search() of an engine. He does code the framework, which btw is more involved for NNs than for AB engines, but that is about it.
Search is not the most creative part of engines, though. It is highly standardized. In A-B engines, move sorting and the amount of extension/reduction still give you a little freedom. For PUCT there are no doubt different parameters you could tune, but the situation is a bit similar. If you just implement the standard PUCT algorithm, you haven't made a new search, you have just re-implemented the same search. If you then also use the same network (i.e. trained the same way), you haven't made anything new at all, even if you completely re-wrote the code. It would be just like rewriting Stockfish in plain C, assembler or JavaScript. It would still be Stockfish.
So why should we hate MCTS? It is a beautiful algorithm that is not domain-specific and performs really well compared to domain-specific search algorithms. Imagine trying to replicate MCTSnet with AB search as well, i.e. automatically learning all aspects of a modern AB search with neural networks: one neural network for determining LMR reductions, another for the null-move threshold, another for futility pruning, etc. Once you are done, the "art" of the AB search is no more, but this ABnet would probably be useless in other domains. So the fact that MCTS or MCTSnet doesn't leave a lot of room for the engine author's "creativity" should not be held against it. Btw, implementing MCTS+NN involves a lot of work as well: there is now a new platform (the GPU), batching and multi-threading are needed for anything other than toy programs such as A0lite, and storing the whole tree in memory is alien to AB programmers. The PUCT formula "cooked up" by DeepMind generally follows the more mathematically sound UCB formula, so there is really not a lot you can do there.
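For comparison, the classic UCB1 rule that PUCT descends from can be sketched like this; the constant and names are illustrative:

```python
import math

def ucb1(child_value, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 bandit formula: value estimate plus an exploration bonus.
    Unlike PUCT, there is no policy prior; the bonus depends only on
    visit counts, with log(parent) growing slower than PUCT's sqrt."""
    return child_value + c * math.sqrt(math.log(parent_visits) / child_visits)
```

The point being made above is that once you adopt this well-founded family of formulas, the remaining design space is small: swapping log for sqrt and weighting by a prior is essentially the whole step from UCB1 to PUCT.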

About supervised learning, there is no difference between training a NN, a NNUE net, or a hand-crafted evaluation function, other than that the hand-crafted eval needs you to pick the features yourself, and that you have only a handful of parameters. If you wanted a PSQT for king-piece combinations to emulate NNUE in your hand-crafted evaluation, the number of parameters would balloon up to the point where you would rather train a NN, so no difference there.
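A minimal sketch of this kind of supervised tuning of a hand-crafted eval (Texel-style logistic fitting of weights against game results); the data layout and names here are made up for illustration:

```python
def sigmoid(score, k=1.0 / 400):
    """Map a centipawn score to an expected game result in [0, 1]."""
    return 1.0 / (1.0 + 10 ** (-k * score))

def tuning_error(params, positions):
    """Mean squared error between predicted and actual results.
    Each position is (features, result) where the eval is a dot
    product of hand-picked features with the weights being tuned;
    an optimizer would minimize this over params."""
    total = 0.0
    for features, result in positions:
        score = sum(p * f for p, f in zip(params, features))
        total += (result - sigmoid(score)) ** 2
    return total / len(positions)
```

Training a NN or NNUE net minimizes essentially the same loss; the only difference is that the features and their interactions are learned rather than hand-picked.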

This thread about Allie using lc0 games for training belongs in the EO forums as well. It was clearly started by a person keen on bad-mouthing Allie based on the fact that its net used lc0 games, but later stumped by the fact that a lot of AB engines do the same. How the turn tables :) After this realization, the argument is now "I have been doing 'reinforcement learning' kind of tuning from the beginning", when in reality this is not the case for most. How many people used CCRL games for training (if you use high-rated games, you are basically learning from Lc0, Stockfish and Komodo)? How many used Stockfish-scored test sets such as Zurichess's for tuning? Of course some engines did TD-lambda and other forms of reinforcement learning. But putting a taboo on one form of learning, i.e. supervised training, is not a good idea at all. So why is Allie treated differently when many AB engines do the same anyway?
Madeleine Birchfield
Posts: 512
Joined: Tue Sep 29, 2020 4:29 pm
Location: Dublin, Ireland
Full name: Madeleine Birchfield

Re: Is Allie a 'Leela net player'?

Post by Madeleine Birchfield »

Daniel Shawul wrote: Fri Oct 02, 2020 8:44 pm This thread about Allie using lc0 games for training belongs in the EO forums as well. It was clearly started by a person keen on bad-mouthing Allie based on the fact that its net used lc0 games, but later stumped by the fact that a lot of AB engines do the same. How the turn tables :) After this realization, the argument is now "I have been doing 'reinforcement learning' kind of tuning from the beginning", when in reality this is not the case for most. How many people used CCRL games for training (if you use high-rated games, you are basically learning from Lc0, Stockfish and Komodo)? How many used Stockfish-scored test sets such as Zurichess's for tuning? Of course some engines did TD-lambda and other forms of reinforcement learning. But putting a taboo on one form of learning, i.e. supervised training, is not a good idea at all. So why is Allie treated differently when many AB engines do the same anyway?
Actually, my original post was about the rules regarding NNUE vs the ankan CUDA NN backend used by Allie and Leela, and this thread seems to have devolved into different reinforcement learning techniques in training and tuning, which wasn't helped by Andrew Grant and Alayan contributing to the thread, both of whom have a strong opinion against Allie and NNUE. I said that Allie was like an ankan NN version of a NNUE player, but didn't imply that it was a bad thing; I would be perfectly fine if such a thing were accepted as valid and adopted widely in the community, only that rules applying to one should be applied to the other as well, both in TCEC and in the computer chess community in general. But putting a taboo on one neural network architecture, i.e. NNUE, is not a good idea at all. So why is NNUE treated differently from Allie, Leela and the other AB engines?

Regarding the entire reinforcement learning discussion, I would rather that had been a separate conversation on a different thread, but it seems not to have panned out.
dkappe
Posts: 1631
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Is Allie a 'Leela net player'?

Post by dkappe »

Madeleine Birchfield wrote: Fri Oct 02, 2020 11:01 pm
Actually, my original post was about the rules regarding NNUE vs the ankan CUDA NN backend used by Allie and Leela...
The code in question is not a backend but a small bit of custom linear algebra transformation in CUDA.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
smatovic
Posts: 2663
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Is Allie a 'Leela net player'?

Post by smatovic »

Madeleine Birchfield wrote: Fri Oct 02, 2020 11:01 pm
Daniel Shawul wrote: Fri Oct 02, 2020 8:44 pm This thread about Allie using lc0 games for training belongs in the EO forums as well. It was clearly started by a person keen on bad-mouthing Allie based on the fact that its net used lc0 games, but later stumped by the fact that a lot of AB engines do the same. How the turn tables :) After this realization, the argument is now "I have been doing 'reinforcement learning' kind of tuning from the beginning", when in reality this is not the case for most. How many people used CCRL games for training (if you use high-rated games, you are basically learning from Lc0, Stockfish and Komodo)? How many used Stockfish-scored test sets such as Zurichess's for tuning? Of course some engines did TD-lambda and other forms of reinforcement learning. But putting a taboo on one form of learning, i.e. supervised training, is not a good idea at all. So why is Allie treated differently when many AB engines do the same anyway?
Actually, my original post was about the rules regarding NNUE vs the ankan CUDA NN backend used by Allie and Leela, and this thread seems to have devolved into different reinforcement learning techniques in training and tuning, which wasn't helped by Andrew Grant and Alayan contributing to the thread, both of whom have a strong opinion against Allie and NNUE. I said that Allie was like an ankan NN version of a NNUE player, but didn't imply that it was a bad thing; I would be perfectly fine if such a thing were accepted as valid and adopted widely in the community, only that rules applying to one should be applied to the other as well, both in TCEC and in the computer chess community in general. But putting a taboo on one neural network architecture, i.e. NNUE, is not a good idea at all. So why is NNUE treated differently from Allie, Leela and the other AB engines?

Regarding the entire reinforcement learning discussion, I would rather that had been a separate conversation on a different thread, but it seems not to have panned out.
Just to clarify, TCEC = Top Chess Engines *Competition*

So this thread (and maybe a couple of others) is about the rules of TCEC?

What is allowed in TCEC concerning NNs, what is fair in TCEC in terms of
hardware, what is a fair competition concerning copying ideas and copying code?
And, how does TCEC sanction this?

Other people have stated it already: chess program authors are free to come
up with whatever they like. If I figure out a way to use the DSPs of sound
cards, why not use them for chess; what entitles the community to put any
restrictions on me? The rating lists and tournament organizers set up their
own rules, but these rules cannot be generalized and applied to me as an
individual programmer. If CCRL offers one GPU in their hardware, fine, I am
free to run 4 of them; if TCEC says my approach is a clone, okay, I have to
live with that.

In motor sports there are various leagues with various rules; of course
these rules can be a matter of discussion, but they apply only in the
context of the specific tournament, not to me as an individual.

Looking back into history, maybe ICGA was the old thing and now TCEC seems
to be the new thing... anyway, their rules and definitions apply neither to
me nor to Computer Chess as a whole.

--
Srdja

PS: maybe this all seems crystal-clear to the majority anyway.