
CPU NN question

Posted: Fri Nov 13, 2020 5:46 pm
by Leo
I don't remember people saying NNs on CPUs were going to happen. I don't know for sure. Does anyone have comments on NNs being used on CPUs? It seems like it's a big surprise. It seemed that only GPUs could do that. FYI, I am an amateur hack on these questions.

Re: CPU NN question

Posted: Fri Nov 13, 2020 6:17 pm
by smatovic
Leo wrote: Fri Nov 13, 2020 5:46 pm I don't remember people saying NNs on CPUs were going to happen. I don't know for sure. Does anyone have comments on NNs being used on CPUs? It seems like it's a big surprise. It seemed that only GPUs could do that. FYI, I am an amateur hack on these questions.
IMO Peter Osterlund's 2017 Texel/Giraffe experiment showed the potential of NNs in AB search on the CPU...

http://talkchess.com/forum3/viewtopic.p ... 10#p719539
...
From the results it can be seen that the giraffe evaluation function makes texel around 250-350 elo weaker depending on time control. This is caused by the giraffe evaluation function being very slow. If it was somehow possible to make the giraffe evaluation function run as fast as the texel evaluation function, the giraffe eval version would actually be around 100-120 elo stronger than the texel eval version.

Whether future hardware and software improvements will make it possible to run an ANN evaluator as quickly as a traditional evaluator remains to be seen.
--
Srdja

Re: CPU NN question

Posted: Fri Nov 13, 2020 7:13 pm
by Leo
smatovic wrote: Fri Nov 13, 2020 6:17 pm
Leo wrote: Fri Nov 13, 2020 5:46 pm I don't remember people saying NNs on CPUs were going to happen. I don't know for sure. Does anyone have comments on NNs being used on CPUs? It seems like it's a big surprise. It seemed that only GPUs could do that. FYI, I am an amateur hack on these questions.
IMO Peter Osterlund's 2017 Texel/Giraffe experiment showed the potential of NNs in AB search on the CPU...

http://talkchess.com/forum3/viewtopic.p ... 10#p719539
...
From the results it can be seen that the giraffe evaluation function makes texel around 250-350 elo weaker depending on time control. This is caused by the giraffe evaluation function being very slow. If it was somehow possible to make the giraffe evaluation function run as fast as the texel evaluation function, the giraffe eval version would actually be around 100-120 elo stronger than the texel eval version.

Whether future hardware and software improvements will make it possible to run an ANN evaluator as quickly as a traditional evaluator remains to be seen.
--
Srdja
Nice. Thanks.

Re: CPU NN question

Posted: Fri Nov 13, 2020 7:46 pm
by brianr
There are many different NN sizes and architectures.

The larger ones, like those Lc0 uses, are only really practical on GPUs.

The smaller ones, like SF-NNUE's, run on CPUs (and are very cleverly updated incrementally to make them even faster).
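
Roughly, that incremental update works like this (a minimal sketch with hypothetical sizes and names, not Stockfish's actual code): the first-layer accumulator is kept in sync with the board, so a quiet move only subtracts and adds the weight columns of the few input features that changed, instead of recomputing the whole layer.

#include <cstdint>
#include <vector>

// Hypothetical sizes; real NNUE nets differ.
constexpr int kFeatures = 41024;  // input features (piece, square, king bucket)
constexpr int kHidden   = 256;    // width of the first (largest) layer

// One weight column per input feature, plus the first-layer bias.
int16_t weights[kFeatures][kHidden];
int16_t bias[kHidden];

// Accumulator: the first-layer pre-activation, kept in sync with the board.
struct Accumulator {
    int16_t values[kHidden];
};

// Full refresh: sum the columns of every active feature. Done rarely,
// e.g. when the king moves and the whole feature set changes.
void refresh(Accumulator& acc, const std::vector<int>& active_features) {
    for (int j = 0; j < kHidden; ++j) acc.values[j] = bias[j];
    for (int f : active_features)
        for (int j = 0; j < kHidden; ++j) acc.values[j] += weights[f][j];
}

// Incremental update: a quiet move toggles only a couple of features,
// so we subtract/add a couple of columns instead of recomputing everything.
void update(Accumulator& acc, const std::vector<int>& removed,
            const std::vector<int>& added) {
    for (int f : removed)
        for (int j = 0; j < kHidden; ++j) acc.values[j] -= weights[f][j];
    for (int f : added)
        for (int j = 0; j < kHidden; ++j) acc.values[j] += weights[f][j];
}

The layers after the accumulator are tiny, so evaluating the rest of the net per node stays cheap.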

Re: CPU NN question

Posted: Fri Nov 13, 2020 7:50 pm
by Leo
brianr wrote: Fri Nov 13, 2020 7:46 pm There are many different NN sizes and architectures.

The larger ones, like those Lc0 uses, are only really practical on GPUs.

The smaller ones, like SF-NNUE's, run on CPUs (and are very cleverly updated incrementally to make them even faster).
Was it a matter of programming it to be clever?

Re: CPU NN question

Posted: Sat Nov 14, 2020 12:22 am
by yurikvelo
Leo wrote: Fri Nov 13, 2020 5:46 pm I don't remember people saying NNs on CPUs were going to happen.
Maybe they were talking about MCTS, not about static evaluation.
GPU engines run MCTS; CPU NN engines run alpha-beta minimax, but the static evaluation is not handcrafted.
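
In other words (a rough sketch with hypothetical names, not any particular engine's real API): the CPU engines keep an ordinary alpha-beta minimax search, and the only thing the network replaces is the static evaluation at the leaves.

#include <algorithm>
#include <vector>

// Hypothetical interfaces -- not any particular engine's real API.
struct Position {};                                 // board state
struct Move {};                                     // one move
std::vector<Move> generate_moves(const Position&);  // legal move generation
void make_move(Position&, const Move&);
void unmake_move(Position&, const Move&);
int  nn_evaluate(const Position&);  // NN static eval, side to move, centipawns

// Plain negamax alpha-beta on the CPU; the only change versus a
// "traditional" engine is that the leaf score comes from the network
// instead of handcrafted material/positional terms.
int alpha_beta(Position& pos, int depth, int alpha, int beta) {
    if (depth == 0)
        return nn_evaluate(pos);        // the NN replaces the handcrafted eval
    for (const Move& m : generate_moves(pos)) {
        make_move(pos, m);
        int score = -alpha_beta(pos, depth - 1, -beta, -alpha);
        unmake_move(pos, m);
        if (score >= beta)
            return beta;                // fail-high cutoff
        alpha = std::max(alpha, score);
    }
    return alpha;
}

So the search side looks unchanged from a traditional engine; only the leaf evaluation differs.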

Re: CPU NN question

Posted: Sat Nov 14, 2020 3:25 am
by Leo
yurikvelo wrote: Sat Nov 14, 2020 12:22 am
Leo wrote: Fri Nov 13, 2020 5:46 pm I don't remember people saying NNs on CPUs were going to happen.
Maybe they were talking about MCTS, not about static evaluation.
GPU engines run MCTS; CPU NN engines run alpha-beta minimax, but the static evaluation is not handcrafted.
OK. Interesting.

Re: CPU NN question

Posted: Sat Nov 14, 2020 6:32 am
by Madeleine Birchfield
Leo wrote: Fri Nov 13, 2020 5:46 pm I don't remember people saying NNs on CPUs were going to happen.
yurikvelo wrote: Sat Nov 14, 2020 12:22 am Maybe they were talking about MCTS, not about static evaluation.
GPU engines run MCTS; CPU NN engines run alpha-beta minimax, but the static evaluation is not handcrafted.
Leo wrote: Sat Nov 14, 2020 3:25 am OK. Interesting.
Leo's original post was correct; people didn't say that neural networks on CPUs were going to happen, because at the time the neural networks (like those in Leela and Allie) were too slow to evaluate on a CPU. The development of NNUE in computer shogi happened without much attention from the computer chess community, and when Hisayori Noda ported NNUE to his Stockfish fork in autumn 2019, it went unnoticed and was left untouched for almost an entire year, until people like Henk Drost, Mark Jordan, Sergio Vieri, and others started experimenting with it around the beginning of June 2020. When it succeeded, it shocked the computer chess community, as many had previously believed that neural networks were too slow to work on CPUs.

MCTS is completely unrelated to neural network evaluation, as shown by Komodo and ShashChess, both engines that (have the option to) use MCTS with their handcrafted evaluation on the CPU.

Re: CPU NN question

Posted: Sat Nov 14, 2020 8:10 am
by smatovic
Another one: according to this TC poll from 2018, most programmers simply underestimated NNs for chess:

http://talkchess.com/forum3/viewtopic.php?f=7&t=67121

but at least one person voted for "take Giraffe and tune it", so I guess a minority was aware of the potential, or something like that.

--
Srdja

Re: CPU NN question

Posted: Sat Nov 14, 2020 8:40 am
by Madeleine Birchfield
smatovic wrote: Sat Nov 14, 2020 8:10 am Another one: according to this TC poll from 2018, most programmers simply underestimated NNs for chess:

http://talkchess.com/forum3/viewtopic.php?f=7&t=67121

but at least one person voted for "take Giraffe and tune it", so I guess a minority was aware of the potential, or something like that.

--
Srdja
I was a big Leela fan back in 2018 and believed it was going to become the strongest chess engine on the scene, superior to traditional engines thanks to a strong positional evaluation that handcrafted evaluations simply cannot replicate. But I largely saw neural networks as incompatible with the traditional alpha-beta search paradigm, since no traditional engine supported the use of GPUs. So for me it was more a question of how many people were willing to shift away from the traditional paradigm over to the Leela-style paradigm using GPUs, and since nobody aside from Daniel Shawul with Scorpio seemed interested in making that move, I concluded that it would be a very long time before traditional engines adopted neural networks.