https://www.chessprogramming.org/Stockfish_NNUE. In the NNUE network architecture, I can see 256 neurons in the first hidden layer, and int16 is used to store the weights.
Can we reduce the number of hidden-layer neurons from, say, 256 to 128, and would that reduce the size of the weights to int8? Also, what is the reason for keeping it at 256 and not any other number? It looks like reducing the neurons by just one makes the weight vectors sharply drop to int8.
256 in NNUE?
- Posts: 276
- Joined: Sat Mar 04, 2017 12:24 pm
- Location: Hungary
Re: 256 in NNUE?
kinderchocolate wrote: ↑Thu Jan 28, 2021 8:16 pm
https://www.chessprogramming.org/Stockfish_NNUE. In the NNUE network architecture, I can see 256 neurons in the first hidden layer, and int16 is used to store the weights. Can we reduce the number of hidden-layer neurons from, say, 256 to 128, and would that reduce the size of the weights to int8? Also, what is the reason for keeping it at 256 and not any other number?
Yes, you can reduce it. 256 is only a tested number (I think). There are some recommendations for choosing the number of neurons, but there is no perfect recipe.
- Posts: 406
- Joined: Sat May 05, 2012 2:48 pm
- Full name: Oliver Roese
Re: 256 in NNUE?
kinderchocolate wrote: ↑Thu Jan 28, 2021 8:16 pm
... Can we reduce the number of hidden-layer neurons from, say, 256 to 128, and would that reduce the size of the weights to int8? ...
The number of weights tells you something about the dimension of your problem, the size of the ints something about your precision. Obviously there is no correspondence between them. You can of course try to vary them, as already mentioned. AFAIK there is still no recommendable guideline for preferring one over another, except for corner cases.
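To make that concrete, here is a minimal sketch (hypothetical type and layer names, not Stockfish's actual code) showing that the layer width and the integer type used to store each weight are two independent choices:

Code:

#include <cstdint>

// Sketch only: "Width" is the number of neurons in the layer, "WeightT" is
// the storage type of each weight. Changing one does not force the other.
template <typename WeightT, int Width, int InputDim>
struct AffineLayer {
    WeightT weights[Width][InputDim];   // Width x InputDim weights of type WeightT
    std::int32_t biases[Width];
};

// All of these are valid layer shapes; which one is best is purely empirical.
using SfLikeFirstLayer = AffineLayer<std::int16_t, 256, 41024>; // 2x256 net, int16 weights
using NarrowerInt16    = AffineLayer<std::int16_t, 128, 41024>; // 2x128 net, still int16
using NarrowerInt8     = AffineLayer<std::int8_t,  128, 41024>; // 2x128 net, quantized to int8

Going from 2x256 to 2x128 only changes Width; storing the weights as int8 instead of int16 is a separate quantization decision.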
- Posts: 1789
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: 256 in NNUE?
You can have a network with 2x128 neurons in L1 as opposed to 2x256 like SF. It's what I do in Ethereal, and it works fine.
The number of weights has nothing to do with the size of the individual weights.
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
- Posts: 454
- Joined: Mon Nov 01, 2010 6:55 am
- Full name: Ted Wong
Re: 256 in NNUE?
Thanks! Follow-up questions:
1.) Why int16 for the weights? Shouldn't the weights be floating-point numbers?
2.) Why int16 in the first layer but int8 in the rest of the network?
- Posts: 27869
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: 256 in NNUE?
My guess is that this is just trial and error, heavily influenced by what the hardware can do in terms of SIMD instructions. The weights of the first layer never have to be multiplied (as the inputs are just 0/1), and are furthermore updated incrementally, which in practice means only adding / subtracting a very small fraction of them (those for which the input toggled) to the KPST sums. So I suppose there is very little gain in reducing the precision to 8 bits. Which of course would not be enough of a reason to refrain from doing it if making them 16 bits served no purpose at all.
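A minimal sketch of that incremental update, with hypothetical names and HalfKP-sized arrays (not Stockfish's actual code): since every input is 0 or 1, the first layer's pre-activation sums are just a sum of weight columns, so when a feature toggles you add or subtract one column.

Code:

#include <cstdint>

constexpr int kHidden   = 256;    // first-layer width discussed in this thread
constexpr int kFeatures = 41024;  // HalfKP input features per perspective

// Hypothetical first-layer weights: one int16 column per input feature.
std::int16_t firstLayerWeights[kFeatures][kHidden];

// Accumulator holding the first layer's pre-activation sums for one perspective.
struct Accumulator {
    std::int16_t sums[kHidden];
};

// A feature switched from 0 to 1 (a piece appeared on a square): add its column.
void addFeature(Accumulator& acc, int feature) {
    for (int i = 0; i < kHidden; ++i)
        acc.sums[i] += firstLayerWeights[feature][i];
}

// A feature switched from 1 to 0: subtract its column. There are no
// multiplications anywhere in this layer, only additions and subtractions.
void removeFeature(Accumulator& acc, int feature) {
    for (int i = 0; i < kHidden; ++i)
        acc.sums[i] -= firstLayerWeights[feature][i];
}

Only the handful of features that actually toggle on a move are applied, so the 256-wide layer is kept up to date with a few hundred int16 additions rather than a full matrix multiply, which is why squeezing these particular weights down to 8 bits buys relatively little.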
- Posts: 16
- Joined: Fri Dec 27, 2019 8:47 pm
- Full name: Jacek Dermont
Re: 256 in NNUE?
It's called neural network quantization. You convert to int16 or int8 and lose some precision, which may worsen the net a little bit, but SIMD calculations can then be faster because more weights fit into the registers. As for int16 for the first layer: this reduces memory by 2x (floats are 32 bits), so more of the neural network, or the entire network, can fit into cache, and thus it again becomes faster.
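For illustration, here is roughly what that quantization step looks like (function names, scale factors, and clamping are arbitrary choices for this sketch, not what any particular trainer uses):

Code:

#include <algorithm>
#include <cmath>
#include <cstdint>

// Map a trained float weight to int8: scale, round, and saturate so that
// out-of-range values clamp instead of wrapping around.
std::int8_t quantizeToInt8(float w, float scale) {
    float q = std::round(w * scale);
    return static_cast<std::int8_t>(std::clamp(q, -128.0f, 127.0f));
}

// Same idea with int16 (used for the first layer): half the memory of a
// 32-bit float, with correspondingly less rounding error than int8.
std::int16_t quantizeToInt16(float w, float scale) {
    float q = std::round(w * scale);
    return static_cast<std::int16_t>(std::clamp(q, -32768.0f, 32767.0f));
}

The scale is chosen so the useful range of trained weights fills the integer range; whatever rounds away in w * scale is the precision loss mentioned above.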
- Posts: 1789
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: 256 in NNUE?
kinderchocolate wrote: ↑Sat Jan 30, 2021 11:06 am
Thanks! Follow-up questions:
1.) Why int16 for the weights? Shouldn't the weights be floating-point numbers?
2.) Why int16 in the first layer but int8 in the rest of the network?
https://chess.stackexchange.com/questio ... 3736#33736
That answers why the input is 16 bits and the others 8 bits. TL;DR: due to the way it's computed _without_ multiplication.
As for floats -- you can pack ints more heavily. Also, you remove the possibility of variance between how compilers and platforms treat floats. Not all floating-point expressions will be evaluated the same way on all compilers / platforms.
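To illustrate the packing point (plain scalar code with hypothetical names; real engines use SIMD intrinsics): an int8 dot product accumulates exactly in int32, and a 256-bit vector register holds 32 int8 values versus 8 floats.

Code:

#include <cstddef>
#include <cstdint>

// Later-layer dot product sketch: int8 activations times int8 weights,
// accumulated in int32 so individual products can never overflow.
std::int32_t dotInt8(const std::int8_t* act, const std::int8_t* w, std::size_t n) {
    std::int32_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += static_cast<std::int32_t>(act[i]) * static_cast<std::int32_t>(w[i]);
    return sum;
}

Vectorized, each register carries four times as many int8 weights as floats, and the integer result is bit-identical on every compiler and platform, which is the determinism point.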
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
- Posts: 300
- Joined: Mon Apr 30, 2018 11:51 pm
Re: 256 in NNUE?
AndrewGrant wrote: ↑Sat Jan 30, 2021 3:21 pm
As for floats -- you can pack ints more heavily. Also, you remove the possibility of variance between how compilers and platforms treat floats. Not all floating-point expressions will be evaluated the same way on all compilers / platforms.
But most of them certainly will be, especially now that x87 is effectively dead and gone. (The biggest differences are in library functions like exp() and the like.)
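A tiny illustration of why float evaluation order matters at all (not specific to any engine): floating-point addition is not associative, so a compiler that regroups a sum, keeps 80-bit x87 intermediates, or fuses a multiply-add can change the result, whereas the integer path in a quantized net cannot.

Code:

#include <cstdio>

int main() {
    float a = 1e8f, b = -1e8f, c = 1.0f;
    // Same mathematical sum, two groupings, two different float results:
    std::printf("%g\n", (a + b) + c);  // prints 1
    std::printf("%g\n", a + (b + c));  // prints 0: b + c rounds back to -1e8
    return 0;
}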