dkappe wrote: ↑Mon Jul 27, 2020 5:50 am
Ovyron wrote: ↑Mon Jul 27, 2020 5:04 am
corres wrote: ↑Sat Jul 25, 2020 4:59 pm
It is a pity, but the NN of NNUE is improving faster than (default) Stockfish is.
This can't go on forever, not with the exponential nature of chess. NNUE will plateau, and it will need a better eval to improve; that's going to come from someone who translates the concepts of NNUE back into Stockfish's own eval (if the former scores a position at 1.50 while the latter says 0.00, and NNUE wins, you've got to figure out the cause of the discrepancy). Then a new NNUE based on this improved eval can arise.
Luckily NNUE has a lot of fuel, since any position's eval can improve with more search depth. Perhaps what we need is a method to separate positions whose eval is already good from those that are still bad, and increase the depth only on the bad ones. Otherwise, more depth is wasted on positions where it doesn't yield a better eval (if the eval is 0.15 at depth 8 and still 0.15 at depth 9, you just wasted the time getting there).
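Something like this triage (a rough python-chess sketch; the depths and the 15cp threshold echo the 0.15 example above but are otherwise numbers I picked) would flag the positions still worth deepening:

```python
import chess
import chess.engine

# Rough sketch with python-chess: deepen only positions whose eval is still
# moving between successive depths. Depths and threshold are assumptions.
def needs_more_depth(engine, board, d, threshold_cp=15):
    lo = engine.analyse(board, chess.engine.Limit(depth=d))
    hi = engine.analyse(board, chess.engine.Limit(depth=d + 1))
    lo_cp = lo["score"].white().score(mate_score=100000)
    hi_cp = hi["score"].white().score(mate_score=100000)
    return abs(hi_cp - lo_cp) > threshold_cp  # still unstable -> worth deepening

with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
    board = chess.Board()
    if needs_more_depth(engine, board, d=8):
        deep = engine.analyse(board, chess.engine.Limit(depth=16))
        print(deep["score"])
```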
NNUE is a relatively simple beast whose main innovation is that it’s efficiently computable when there’s a minor change in the inputs. So efficient, in fact, that you don’t need a GPU to run it quickly.
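For anyone who hasn’t looked inside: the “efficiently updatable” part just means the first-layer sums are kept in an accumulator and patched with the weight columns of the few features a move flips, rather than recomputed. A toy numpy sketch of the idea (the sizes are illustrative, not Stockfish’s actual layout):

```python
import numpy as np

# Toy numpy sketch of NNUE's incremental first layer. Feature/hidden sizes
# are placeholders (the 256 below matches the nets discussed here).
N_FEATURES = 41024
N_HIDDEN = 256

rng = np.random.default_rng(0)
W1 = rng.standard_normal((N_FEATURES, N_HIDDEN)).astype(np.float32)
b1 = np.zeros(N_HIDDEN, dtype=np.float32)

def full_refresh(active_features):
    # From-scratch accumulator: O(len(active) * N_HIDDEN).
    acc = b1.copy()
    for f in active_features:
        acc += W1[f]
    return acc

def incremental_update(acc, added, removed):
    # A move flips only a handful of features, so this touches a few rows
    # of W1 instead of redoing the whole first layer.
    for f in added:
        acc += W1[f]
    for f in removed:
        acc -= W1[f]
    return acc

acc = full_refresh([10, 523, 40000])        # e.g. features active before a move
acc = incremental_update(acc, added=[77], removed=[523])
```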
Now anyone who’s worked with shallow, fully connected networks knows that they’re relatively limited. Approximating a real-valued function over a domain of bit strings is a great use; in fact it’s been a lively topic of research for over a decade. But how good the approximation is depends very much on the function being approximated. Eval at depth 8 may be a good candidate, but a higher-depth search may become increasingly difficult to approximate, especially if it has bigger and more frequent discontinuities. An educated guess is that for the 256 and 384 networks there is a depth beyond which the approximation no longer improves.
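To make that concrete, here’s a throwaway numpy example (mine, not from any NNUE code): a shallow fully connected net fitted by plain gradient descent to a real-valued function of 64-bit strings. Swap the smooth target for the commented-out steppy one and the fit should get visibly worse, which is the discontinuity worry in miniature.

```python
import numpy as np

# Throwaway demo: one hidden layer, plain gradient descent, MSE loss.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(4096, 64)).astype(np.float32)  # random bit strings
w_true = rng.standard_normal(64).astype(np.float32)
y = X @ w_true                 # smooth target: easy to approximate
# y = np.sign(X @ w_true)      # steppy target: noticeably harder

H = 32
W1 = 0.1 * rng.standard_normal((64, H)).astype(np.float32)
b1 = np.zeros(H, dtype=np.float32)
W2 = 0.1 * rng.standard_normal((H, 1)).astype(np.float32)
b2 = np.zeros(1, dtype=np.float32)

lr = 1e-3
for step in range(2000):
    h = np.maximum(X @ W1 + b1, 0.0)          # ReLU hidden layer
    err = (h @ W2 + b2).ravel() - y
    g_pred = (2.0 / len(y)) * err[:, None]    # dMSE/dprediction
    g_h = (g_pred @ W2.T) * (h > 0)           # backprop through the ReLU
    W2 -= lr * (h.T @ g_pred); b2 -= lr * g_pred.sum(0)
    W1 -= lr * (X.T @ g_h);    b1 -= lr * g_h.sum(0)

print("final mse:", float((err ** 2).mean()))
```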
Let’s disentangle this a bit. You’re actually training on a meld of the game result and SF’s depth-8 eval. E.g. if SF says 0.6 and the game result is 1.0 (a win), the training target is c. 0.8; if the result is 0.5 (a draw), the target is c. 0.55.
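In code, assuming a plain linear interpolation with weight lam (the actual trainer’s mix may differ) and treating the 0.6 as already on the same 0–1 expectation scale as the result:

```python
# Hedged sketch of the blended training target described above; the 50/50
# lam is an assumption chosen to reproduce the numbers in this post.
def train_target(sf_eval, game_result, lam=0.5):
    return lam * sf_eval + (1 - lam) * game_result

print(train_target(0.6, 1.0))  # win  -> 0.8
print(train_target(0.6, 0.5))  # draw -> 0.55
```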
But let’s say you train only on SF eval from depth D, with no results data. The NN is slower and it’s only an approximation, so at D = 1 it’s worse.
As D increases you gain from the forward knowledge contained in the improved eval, but you still carry the NN’s approximation error. At a certain D the search-knowledge gain overcomes the approximation error and the overall evaluation “improves”.
However, as above, you’re not training on SF eval alone: you’re training on eval plus result, so the training targets already contain forward knowledge. In our little example we’d just be nudging the targets from 0.55 / 0.8 to something a little either side: noisy/steppy for low D, smoother/more consistent for high D. All of which points to more D being good, and to the paradox that if game results were “perfect”, even infinite D would not be helpful, because your training targets would already know everything anyway.
So, what’s really being trained on is a meld of an imperfect game result and an imperfect evaluation (hopefully less imperfect with increasing D), and you end up with an NN approximation of the two. That NN approximation is then inserted into another search, and hopefully that search copes with it all. Apparently it does.
The inputs are:
- A relatively poor bean-count SF eval plus SF’s dubiously pruning search produce a noisy, non-optimal position eval.
- That noisy, non-optimal position eval, plus imperfect results data and a relatively small, primitive NN, produce a (slightly less?) non-optimal NNUE eval.
- That noisy, non-optimal NNUE eval, plus an even more dubious non-tuned imperfect SF search, produce something supposedly improved on the original.
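Put as a runnable toy (my construction; search(), blend() and fit() are stand-ins to show the data flow, not any real engine or trainer API):

```python
import random

# Toy of the three-stage chain above; every function here is a placeholder.
random.seed(0)

def search(true_val, depth):            # stage 1: noisy, pruned "SF" eval
    return true_val + random.gauss(0, 0.3 / depth)

def blend(sf_eval, result, lam=0.5):    # meld imperfect eval with imperfect result
    return lam * sf_eval + (1 - lam) * result

def fit(samples):                       # stage 2: a crude one-parameter "NN"
    bias = sum(t - x for x, t in samples) / len(samples)
    return lambda x: x + bias           # the (slightly less?) non-optimal approximation

positions = [random.uniform(0, 1) for _ in range(1000)]             # true win chances
results = [1.0 if random.random() < p else 0.0 for p in positions]  # imperfect outcomes
net = fit([(p, blend(search(p, 8), r)) for p, r in zip(positions, results)])
print(net(0.5))  # ~0.5: the approximation roughly recovers the underlying eval
# stage 3: this net's eval would then be dropped into SF's (non-tuned) search.
```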
Fun!