Hi Jamie,

JacquesRW wrote: ↑Sun Oct 06, 2024 9:24 pm
You can see Orion falls into one of the classic beginner traps of immediately trying to use more than one hidden layer. The author may have non-elo driven reasons to do this, but I don't think it is particularly helpful to include it in what looks like a tutorial for a basic NNUE.
I'm not sure that "Orion falls into one of the classic beginner traps", as I was one of the early adopters of NNUE back in 2020 (see this post), and the current NN architecture is just derived from those early experiments.
I'm fascinated by the "compression" of chess knowledge allowed by these NNs, and I have of course quickly tested smaller networks (as simple as 768x64x1). With v1.0, my goal was to see whether it was possible to use "weak" and "dirty" labels (i.e. only game results) to train a "decent" network (in terms of performance). Keeping a second hidden layer gave me (slightly) better results, so I've kept that scheme.
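
For readers wondering what these two shapes look like in practice, here is a minimal sketch (in PyTorch, purely illustrative): a single-hidden-layer 768x64x1 network next to a variant with a second hidden layer, trained against game results mapped to [0, 1]. The size of the second hidden layer and the clipped-ReLU/sigmoid choices are my assumptions for the example, not Orion's exact topology.

```python
import torch
import torch.nn as nn

# Single hidden layer: 768 inputs (12 piece types x 64 squares),
# 64 hidden units, 1 output squashed to a win probability.
class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(768, 64)
        self.output = nn.Linear(64, 1)

    def forward(self, x):
        h = torch.clamp(self.hidden(x), 0.0, 1.0)  # clipped ReLU, common in NNUE nets
        return torch.sigmoid(self.output(h))       # compared against a target in [0, 1]

# Variant with a second hidden layer (sizes here are illustrative only).
class TwoLayerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(768, 64)
        self.hidden2 = nn.Linear(64, 8)
        self.output = nn.Linear(8, 1)

    def forward(self, x):
        h1 = torch.clamp(self.hidden1(x), 0.0, 1.0)
        h2 = torch.clamp(self.hidden2(h1), 0.0, 1.0)
        return torch.sigmoid(self.output(h2))

# Training with "weak" labels: the target is simply the game result
# (0 = loss, 0.5 = draw, 1 = win) for every position sampled from that game, e.g.:
# loss = nn.MSELoss()(net(positions), game_results)
```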

But you're right: if you have good labels for training, it's possible to build smaller and more efficient networks (see the current trend with small language models)!
David