NNUE accessible explanation

Discussion of chess software programming and technical issues.


Gian-Carlo Pascutto
Posts: 1204
Joined: Sat Dec 13, 2008 6:00 pm

Re: NNUE accessible explanation

Post by Gian-Carlo Pascutto » Tue Aug 04, 2020 5:59 pm

Milos wrote:
Mon Aug 03, 2020 6:21 am
On CPU, LC0 relies on BLAS. AFAIK, BLAS backends can be used interchangeably and share the same basic interface.
There's an Eigen backend as well, but it is just used for basic matrix-matrix multiplies, i.e. as an alternative BLAS library.
Giraffe relies on Eigen. I am not really familiar with Eigen aside from knowing that it is a general-purpose linear algebra library. If we are lucky, Matthew Lai could pop in and explain its benefits and whether he would recommend using it.
Eigen is very good for more complex operations (beyond Winograd convolutions, matrix-matrix multiplies and dot products) on a single thread. With multiple threads, nothing much except general matrix-matrix products is implemented.
In general, using BLAS as the backend is a much safer bet.
One reason Eigen is nice is that it is a header-only library: you don't need to compile and link it separately, which simplifies the build process. For small matrices, Eigen's operations can be inlined and perform faster because the compiler's optimizer has more visibility. Also, the free BLAS libraries like OpenBLAS can be rather buggy.
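For reference, the core operation both libraries compete on here is the general matrix-matrix product (GEMM). A naive pure-Python rendering of what a BLAS `sgemm` call or Eigen's `operator*` computes (illustrative only; real implementations block for cache and vectorize):

```python
def gemm(A, B):
    """Naive reference GEMM: C = A @ B for row-major nested lists."""
    n, k = len(A), len(A[0])
    k2, m = len(B), len(B[0])
    assert k == k2, "inner dimensions must match"
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            a = A[i][p]  # hoist A[i][p]; inner loop walks B's row p
            for j in range(m):
                C[i][j] += a * B[p][j]
    return C

C = gemm([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])  # [[19.0, 22.0], [43.0, 50.0]]
```

The loop order (i, p, j) keeps the innermost accesses contiguous in memory, which is the same consideration that drives the blocked kernels in BLAS and Eigen.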

Gian-Carlo Pascutto
Posts: 1204
Joined: Sat Dec 13, 2008 6:00 pm

Re: NNUE accessible explanation

Post by Gian-Carlo Pascutto » Tue Aug 04, 2020 6:30 pm

jorose wrote:
Thu Jul 23, 2020 2:12 pm
Notably, my understanding is nodchip also has as input for each square whether a capture occurred there.
Ha, interesting, there are a lot of tricks one can do here. You can also cheat with piece types. It's not clear to me whether NNUE can distinguish between castling states right now, but you can fix such issues by making separate "king" and "king that can never castle" pieces. Or "pawn" and "passed pawn" pieces, if you want to "help" the neural network a bit.

Pio
Posts: 205
Joined: Sat Feb 25, 2012 9:42 pm
Location: Stockholm

Re: NNUE accessible explanation

Post by Pio » Tue Aug 04, 2020 8:32 pm

Gian-Carlo Pascutto wrote:
Tue Aug 04, 2020 6:30 pm
jorose wrote:
Thu Jul 23, 2020 2:12 pm
Notably, my understanding is nodchip also has as input for each square whether a capture occurred there.
Ha, interesting, there are a lot of tricks one can do here. You can also cheat with piece types. It's not clear to me whether NNUE can distinguish between castling states right now, but you can fix such issues by making separate "king" and "king that can never castle" pieces. Or "pawn" and "passed pawn" pieces, if you want to "help" the neural network a bit.
Actually you need to have the castling rights in the rooks, so that a rook with castling opportunity is a castlingRook piece. I do that and it works. Whenever the king moves, both castlingRooks (if they exist) are transformed to normal rooks, and whenever a castlingRook moves it becomes a normal rook. I also treat the en-passant pawn as a special piece. That way I have eight piece types, which can be encoded in three bits.

It would be interesting to make a neural network that is symmetric under vertical flips, plus a separate part for when castling is available. If no pawns are present, it could be symmetric under vertical, horizontal and diagonal flips.
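Pio's encoding can be sketched as follows; the constant names and the colorless single-side board are my own simplifications, not his actual code:

```python
# 8 piece types fit in 3 bits; CASTLE_ROOK and EP_PAWN absorb the castling
# and en-passant state that would otherwise need separate inputs.
PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, CASTLE_ROOK, EP_PAWN = range(8)

def on_king_move(board):
    """When the king moves, any remaining CASTLE_ROOK reverts to ROOK.
    (Toy: one side's pieces only, so no color filtering is needed.)"""
    return [ROOK if p == CASTLE_ROOK else p for p in board]

def on_castle_rook_move(piece):
    """A CASTLE_ROOK that moves becomes a normal ROOK."""
    return ROOK if piece == CASTLE_ROOK else piece

board = [CASTLE_ROOK, KING, CASTLE_ROOK]
board = on_king_move(board)  # both rooks demoted to plain ROOK
```

The payoff is that castling rights and the en-passant square never appear as separate features: they ride along in the 3-bit piece code that the network already sees.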

Andrew
Posts: 158
Joined: Wed Mar 08, 2006 11:51 pm
Location: Australia

Re: NNUE accessible explanation

Post by Andrew » Tue Nov 17, 2020 8:21 am

fierz wrote:
Fri Jul 24, 2020 9:14 pm


3) When it comes to training, all that I know so far is Texel's tuning method, which I recently used to improve my checkers engine (http://www.fierz.ch/cake186.php) by using logistic regression on win-loss-draw information for a few million (N) positions to improve the weights of my handwritten eval function. So essentially I do some kind of gradient descent for a rather small number of parameters (a few hundred). When training an NN, I read that the ReLU activation has the advantage that its derivative is easy to compute, but I'm not sure if/how I would need to use that. If I think of Texel's tuning method, I would set up some small NN to start with and try to do the same as I did there: calculate the output of the NN for all N positions, calculate the error vs. the game results, then change the weights of all parameters layer by layer, starting from the last one, by a small amount to estimate a gradient, and do the same as I did before? Is this totally wrong (because I don't calculate any derivatives here)?

4) From reading about Stockfish NNUE I get the impression that they are not doing a regression vs. game results, but rather vs. the evaluation of a Stockfish search to X ply, and try to learn that rather than the game result, which is different from the Texel tuning method. Is this distinction (trying to learn a search eval vs. trying to learn game results) actually relevant, or are the two more or less equal?

Sorry for the really stupid questions... but perhaps other people have them too...

best regards
Martin-who-didn't-realize-you-were-in-Indiana! Hope you are doing well there!
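The Texel-style procedure in the quoted question can be written down with an explicit gradient, which answers the "do I need derivatives?" part: the finite-difference nudging described there approximates exactly this. A toy sketch with a linear eval over hand-picked features (names and shapes are illustrative, not any engine's actual code):

```python
import math

def sigmoid(x):
    """Map an eval score onto an expected game result in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-x))

def texel_step(weights, positions, results, lr=0.1):
    """One gradient-descent step on the mean squared error between
    sigmoid(eval) and the game result (0 = loss, 0.5 = draw, 1 = win).
    Each position is a feature vector; eval is its dot product with weights."""
    grad = [0.0] * len(weights)
    for feats, r in zip(positions, results):
        e = sum(w * f for w, f in zip(weights, feats))
        p = sigmoid(e)
        # d/dw_i of (p - r)^2, using sigmoid'(e) = p * (1 - p)
        common = 2.0 * (p - r) * p * (1.0 - p)
        for i, f in enumerate(feats):
            grad[i] += common * f
    n = len(positions)
    return [w - lr * g / n for w, g in zip(weights, grad)]

# Two toy one-feature positions: the first game was won, the second lost.
weights = texel_step([0.0], [[1.0], [-1.0]], [1.0, 0.0])
```

Applying the same chain rule layer by layer through a network is backpropagation, which is where the cheap ReLU derivative (0 or 1) comes in.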
A follow-up to this post: is anyone aware of any NNUE engines that have been developed for Checkers/Draughts?

The 8x8 version has of course been proven to be a draw, but 10x10 and other variants might be a nice challenge!

In fact 8x8 might be an interesting experiment to see how simple an initial evaluation function you can use to train the network.

I found this interesting paper from 1999 using networks for checkers but in a different way.

https://www.researchgate.net/publicatio ... _knowledge

Andrew

Rein Halbersma
Posts: 698
Joined: Tue May 22, 2007 9:13 am

Re: NNUE accessible explanation

Post by Rein Halbersma » Tue Nov 17, 2020 9:58 am

Andrew wrote:
Tue Nov 17, 2020 8:21 am
A follow-up to this post: is anyone aware of any NNUE engines that have been developed for Checkers/Draughts?

The 8x8 version has of course been proven to be a draw, but 10x10 and other variants might be a nice challenge!

In fact 8x8 might be an interesting experiment to see how simple an initial evaluation function you can use to train the network.

I found this interesting paper from 1999 using networks for checkers but in a different way.

https://www.researchgate.net/publicatio ... _knowledge

Andrew
Perhaps you missed this post by Jonathan Kreuzer: http://talkchess.com/forum3/viewtopic.p ... 20#p872289
It's not the same as NNUE, but similar in spirit: just the raw board representation (121 bytes per position, including side to move) fed into three fully connected layers of 192, 32, and 32 neurons, squashed with a sigmoid into an eval score. Works like a charm.
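The shape described above can be sketched as a plain forward pass. The hidden-layer activation is my assumption (ReLU), since the post only specifies the final sigmoid, and the weights here are random placeholders, not the real net:

```python
import math
import random

def dense(x, W, b):
    """Fully connected layer: y[j] = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj
            for row, bj in zip(W, b)]

def relu(v):
    return [max(0.0, x) for x in v]

def forward(board_bits, layers):
    """board_bits: 121 inputs (120 piece bits plus side to move).
    layers: (W, b) pairs for the 192, 32, 32 and final 1-neuron layers."""
    x = board_bits
    for W, b in layers[:-1]:
        x = relu(dense(x, W, b))  # ReLU in hidden layers is an assumption
    W, b = layers[-1]
    return 1.0 / (1.0 + math.exp(-dense(x, W, b)[0]))  # final sigmoid

# Random placeholder weights with the stated sizes: 121 -> 192 -> 32 -> 32 -> 1.
random.seed(0)
sizes = [121, 192, 32, 32, 1]
layers = [([[random.uniform(-0.1, 0.1) for _ in range(n_in)]
            for _ in range(n_out)], [0.0] * n_out)
          for n_in, n_out in zip(sizes, sizes[1:])]
score = forward([1.0] * 121, layers)
```

At roughly 121*192 + 192*32 + 32*32 + 32 weights this is a tiny net by LC0 standards, which is why it can run fast enough inside an alpha-beta search.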

Rein Halbersma
Posts: 698
Joined: Tue May 22, 2007 9:13 am

Re: NNUE accessible explanation

Post by Rein Halbersma » Tue Nov 17, 2020 12:15 pm

Note that Jonathan's network has a PieceSq x PieceType input layer (i.e. 32 x 4 inputs; but since men can't sit on the promotion row, he used (32 + 28) x 2 = 120 inputs, plus 1 more for the side to move).

An NNUE for checkers / draughts would have to feature a fully quadratic PieceSq x PieceSq input layer, since there is no unique piece to anchor on. Fortunately, there are fewer squares on a chequered board and only 2 piece types per color. So for 8x8 checkers there would be essentially (32 * 4)^2 = 16K inputs, and for 10x10 draughts there'd be (50 * 4)^2 = 40K inputs, almost the same as for chess. It should be doable to keep a high NPS for this type of network.
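The input-size arithmetic above can be checked directly. The `feature_index` flattening below is one hypothetical layout, not anything published:

```python
def nnue_checkers_inputs(num_squares, piece_types=4):
    """Fully quadratic PieceSq x PieceSq input count: (squares * types)^2."""
    return (num_squares * piece_types) ** 2

def feature_index(anchor_sq, anchor_pc, other_sq, other_pc,
                  num_squares, piece_types=4):
    """Flatten an ordered pair of (square, piece) features into one index.
    This particular layout is an illustrative choice, not a spec."""
    a = anchor_sq * piece_types + anchor_pc
    b = other_sq * piece_types + other_pc
    return a * (num_squares * piece_types) + b

print(nnue_checkers_inputs(32))   # 8x8 checkers: 16384
print(nnue_checkers_inputs(50))   # 10x10 draughts: 40000
```

With 4 piece types (man and king per color), any such flattening yields the 16K / 40K totals from the post, so the accumulator stays in the same size class as the chess HalfKP layer.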

Andrew
Posts: 158
Joined: Wed Mar 08, 2006 11:51 pm
Location: Australia

Re: NNUE accessible explanation

Post by Andrew » Wed Nov 18, 2020 7:05 am

Thanks Rein, I hadn't seen that. Having a look now!

Andrew
