alvinypeng wrote: ↑ Wed May 24, 2023 3:41 am
A deep neural network (like a Lc0 network) acts like a shallow search. And a deep AB search has a bunch of shallow searches at the ends. So why not replace the shallow searches at the ends with a deep neural network? After all, a deep neural network optimized with gradient descent might yield better results than a bunch of human-written search code.
Because of how AB works, as already discussed. "Replacing shallow searches at the ends" is just another way of saying you call the deep NN to evaluate the position in leaf nodes.
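To make that concrete, here is a minimal alpha-beta sketch (my own illustration, not Lc0 or Stockfish code) in which the leaf evaluation is simply a call into a deep network, so every node at the horizon pays the full cost of a forward pass. All types and functions (Position, Move, nn_evaluate, ...) are hypothetical placeholders.

```cpp
// Minimal alpha-beta sketch: the only change from a classical engine is that
// the leaf evaluation is a (slow) deep-NN forward pass instead of a fast
// hand-written/NNUE eval. Position, Move, generate_moves(), make/unmake and
// nn_evaluate() are hypothetical placeholders, not real Lc0 or Stockfish APIs.
#include <algorithm>
#include <vector>

struct Position { /* board state (placeholder) */ };
struct Move     { /* move encoding (placeholder) */ };

std::vector<Move> generate_moves(const Position&) { return {}; }  // stub
void make_move(Position&, const Move&) {}                         // stub
void unmake_move(Position&, const Move&) {}                       // stub
int  nn_evaluate(const Position&) { return 0; }  // stub: deep-NN forward pass, in centipawns

int alpha_beta(Position& pos, int depth, int alpha, int beta) {
    if (depth == 0)
        return nn_evaluate(pos);   // the NN plays the role of the "shallow search"

    std::vector<Move> moves = generate_moves(pos);
    if (moves.empty())
        return nn_evaluate(pos);   // mate/stalemate handling omitted in this sketch

    for (const Move& m : moves) {
        make_move(pos, m);
        int score = -alpha_beta(pos, depth - 1, -beta, -alpha);
        unmake_move(pos, m);
        if (score >= beta)
            return beta;           // fail-hard beta cutoff
        alpha = std::max(alpha, score);
    }
    return alpha;
}

int main() {
    Position root;
    return alpha_beta(root, 4, -100000, 100000) == 0 ? 0 : 1;
}
```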
But here I am (and have been) assuming you use the GPU or some other kind of massively parallel hardware to do the evaluation.
If you mean why not use the CPU to evaluate Lc0-type NNs, then the answer is that this is too slow to be competitive.
I suppose an AB search plus Lc0-type evaluation entirely run on CPU would play stronger than Lc0 entirely run on CPU, but it would be a lot weaker than SF on the same hardware. (I would be happy to be proven wrong.)
GPUs (and the DeepMind papers) are what enabled Lc0, and the AlphaZero/Lc0 approach is what allows us to utilise GPUs for chess.
alvinypeng wrote:
I think the reason why deep neural networks haven't replaced shallow searches is because they are very inefficient - over 90% of connections in many neural networks are useless and can be pruned away with little to no accuracy loss (as claimed in "The Lottery Ticket Hypothesis" paper).
If an Lc0-type network can be reduced in size by 90% without loss of quality, then that is an easy way to improve Lc0. But this sounds like low-hanging fruit which, if it exists at all, has most likely already been picked by the Lc0 developers long ago. (If not, then it is worth a try.)
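For what such an experiment would look like mechanically, here is a minimal magnitude-pruning sketch (my own illustration, not anything from the Lc0 codebase): zero the smallest-magnitude weights of a layer and count what is left. The hard part (retraining and verifying that playing strength survives) is precisely what this does not show.

```cpp
// Minimal magnitude-pruning sketch: zero out the fraction of weights with the
// smallest absolute value. This only shows the mechanical part of the "90% of
// connections can be pruned" experiment; whether Lc0-quality play survives such
// pruning (with or without retraining) is the open question, not shown here.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Zero the 'sparsity' fraction (e.g. 0.9) of weights with smallest |w|.
void prune_by_magnitude(std::vector<float>& weights, double sparsity) {
    std::vector<float> mags(weights.size());
    for (std::size_t i = 0; i < weights.size(); ++i)
        mags[i] = std::fabs(weights[i]);

    // Find the magnitude threshold below which weights are dropped.
    std::size_t k = static_cast<std::size_t>(sparsity * weights.size());
    if (k == 0) return;
    std::nth_element(mags.begin(), mags.begin() + (k - 1), mags.end());
    float threshold = mags[k - 1];

    for (float& w : weights)
        if (std::fabs(w) <= threshold)
            w = 0.0f;
}

int main() {
    // Toy "layer" standing in for one weight matrix of a real network.
    std::vector<float> layer = {0.01f, -0.70f, 0.03f, 0.55f, -0.02f,
                                0.90f, -0.04f, 0.05f, 0.60f, -0.01f};
    prune_by_magnitude(layer, 0.9);

    std::size_t zeros =
        static_cast<std::size_t>(std::count(layer.begin(), layer.end(), 0.0f));
    std::printf("zeroed %zu of %zu weights\n", zeros, layer.size());
    return 0;
}
```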
If you compare Elo per operations/second, then SF on a CPU is far more "efficient" than Lc0 on a GPU.
But GPU-type hardware is much simpler and can be made to go much faster. CPUs have more or less hit their peak, while GPUs are still improving.
(It is an interesting question whether AI would be a big thing right now if single-threaded CPU speeds had not hit a wall.)
alvinypeng wrote:
GPUs are good at dense-dense matrix multiplications. But if you have a sparse neural network with sparse matrix multiplications, perhaps you could evaluate a deep neural network on CPU quickly, thus eliminating any GPU-CPU latency. One big hurdle for this idea is that the neural network would have to be extremely sparse in order to be evaluated on CPU quickly, and I'm not sure how one would train such a sparse network.
Agreed (and I have no idea how to create such a sparse network either).
It seems unlikely that NNUE is the best we can do on CPU, but I don't know how to do it better.
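To illustrate why the sparse idea is attractive on CPU at all, here is a small sketch (my own, with made-up toy numbers) of a sparse matrix-vector product in CSR format: the work is proportional to the number of nonzero weights only, so a sufficiently sparse layer is cheap to evaluate with no GPU and no GPU-CPU latency involved. Whether an Lc0-quality network can actually be trained to that level of sparsity is, as said, the open question.

```cpp
// Sketch of the CPU-side idea from the sparse-network suggestion above:
// store a layer's weight matrix in CSR (compressed sparse row) format so a
// forward pass only touches the nonzero weights. With, say, 95% sparsity the
// multiply-add count drops to roughly 5% of the dense cost. This shows only
// the inner loop; training a network that sparse is the unsolved part.
#include <cstdio>
#include <vector>

struct CsrMatrix {
    int rows = 0;
    std::vector<int>   row_start;  // size rows+1; nonzeros of row r are [row_start[r], row_start[r+1])
    std::vector<int>   col;        // column index of each nonzero
    std::vector<float> val;        // value of each nonzero
};

// y = W * x, cost proportional to the number of nonzeros only.
std::vector<float> sparse_matvec(const CsrMatrix& W, const std::vector<float>& x) {
    std::vector<float> y(W.rows, 0.0f);
    for (int r = 0; r < W.rows; ++r)
        for (int i = W.row_start[r]; i < W.row_start[r + 1]; ++i)
            y[r] += W.val[i] * x[W.col[i]];
    return y;
}

int main() {
    // Toy 3x4 layer with only 4 nonzeros out of 12 weights:
    //   [ 0.5  0    0    0   ]
    //   [ 0    0    1.0  0   ]
    //   [ 0   -2.0  0    0.5 ]
    CsrMatrix W;
    W.rows      = 3;
    W.row_start = {0, 1, 2, 4};
    W.col       = {0, 2, 1, 3};
    W.val       = {0.5f, 1.0f, -2.0f, 0.5f};

    std::vector<float> x = {1.0f, 2.0f, 3.0f, 4.0f};
    std::vector<float> y = sparse_matvec(W, x);
    std::printf("y = [%g, %g, %g]\n", y[0], y[1], y[2]);  // expect [0.5, 3, -2]
    return 0;
}
```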