CPU NN question

Leo · Post by **Leo** » Fri Nov 13, 2020 5:46 pm

I don't remember people saying NNs on CPUs were going to happen. I don't know for sure. Does anyone have comments on NNs being used on CPUs? It seems like it's a big surprise. It seemed that only GPUs could do that. FYI, I am an amateur hack on these questions.

smatovic · Post by **smatovic** » Fri Nov 13, 2020 6:17 pm

Leo wrote: ↑Fri Nov 13, 2020 5:46 pm I don't remember people saying NNs on CPUs were going to happen. I don't know for sure. Does anyone have comments on NNs being used on CPUs? It seems like it's a big surprise. It seemed that only GPUs could do that. FYI, I am an amateur hack on these questions.

IMO Peter Osterlund's 2017 Texel/Giraffe experiment showed the potential of NN in AB on CPU...

http://talkchess.com/forum3/viewtopic.p ... 10#p719539

...
From the results it can be seen that the giraffe evaluation function makes texel around 250-350 elo weaker depending on time control. This is caused by the giraffe evaluation function being very slow. If it was somehow possible to make the giraffe evaluation function run as fast as the texel evaluation function, the giraffe eval version would actually be around 100-120 elo stronger than the texel eval version.

Whether future hardware and software improvements will make it possible to run an ANN evaluator as quickly as a traditional evaluator remains to be seen.

--
Srdja

Leo · Post by **Leo** » Fri Nov 13, 2020 7:13 pm

smatovic wrote: ↑Fri Nov 13, 2020 6:17 pm
Leo wrote: ↑Fri Nov 13, 2020 5:46 pm I don't remember people saying NNs on CPUs were going to happen. I don't know for sure. Does anyone have comments on NNs being used on CPUs? It seems like it's a big surprise. It seemed that only GPUs could do that. FYI, I am an amateur hack on these questions.
IMO Peter Osterlund's 2017 Texel/Giraffe experiment showed the potential of NN in AB on CPU...

http://talkchess.com/forum3/viewtopic.p ... 10#p719539

...
From the results it can be seen that the giraffe evaluation function makes texel around 250-350 elo weaker depending on time control. This is caused by the giraffe evaluation function being very slow. If it was somehow possible to make the giraffe evaluation function run as fast as the texel evaluation function, the giraffe eval version would actually be around 100-120 elo stronger than the texel eval version.

Whether future hardware and software improvements will make it possible to run an ANN evaluator as quickly as a traditional evaluator remains to be seen.

--
Srdja

Nice. Thanks.

brianr · Post by **brianr** » Fri Nov 13, 2020 7:46 pm

There are many different NN sizes and architectures.

The larger ones, like those Lc0 uses, are only really practical on GPUs.

The smaller ones, like SF-NNUE, run on CPUs (and are very cleverly incrementally updated to make them even faster).
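
To illustrate what "incrementally updated" means, here is a minimal sketch, not the actual SF-NNUE code. It assumes a toy input encoding (one feature per piece type and square) and a tiny first layer; the real feature set and layer sizes are different, and every name below is hypothetical. The idea it shows: the first-layer sums (the "accumulator") are kept between positions, and a move only subtracts the weight rows of the features it turns off and adds the rows of the features it turns on, instead of recomputing the whole layer.

```python
import random

# Hypothetical toy sizes, chosen only for illustration.
NUM_FEATURES = 12 * 64   # one input feature per (piece type, square)
HIDDEN = 8               # tiny first layer

random.seed(0)
# Integer weights, so incremental and full results match exactly.
W = [[random.randint(-8, 8) for _ in range(HIDDEN)] for _ in range(NUM_FEATURES)]
BIAS = [random.randint(-8, 8) for _ in range(HIDDEN)]


def feature_index(piece, square):
    """Map a (piece type, square) pair to an input feature index."""
    return piece * 64 + square


def full_refresh(active_features):
    """Recompute the accumulator from scratch: bias plus one weight row per active feature."""
    acc = list(BIAS)
    for f in active_features:
        for j in range(HIDDEN):
            acc[j] += W[f][j]
    return acc


def incremental_update(acc, removed, added):
    """Update an existing accumulator by touching only the features a move changed."""
    new_acc = list(acc)
    for f in removed:
        for j in range(HIDDEN):
            new_acc[j] -= W[f][j]
    for f in added:
        for j in range(HIDDEN):
            new_acc[j] += W[f][j]
    return new_acc


# Usage: a knight (piece type 1 in this toy encoding) moves from square 1 to square 18.
KNIGHT = 1
before = {feature_index(0, 8), feature_index(KNIGHT, 1), feature_index(6, 52)}
acc = full_refresh(before)

removed = [feature_index(KNIGHT, 1)]
added = [feature_index(KNIGHT, 18)]
acc_after = incremental_update(acc, removed, added)

# The cheap update gives the same result as recomputing everything.
after = (before - set(removed)) | set(added)
print(acc_after == full_refresh(after))
```

A quiet move changes only two features here, so the update costs two weight rows instead of one row per piece on the board. That saving is paid on every node of the search.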

Leo · Post by **Leo** » Fri Nov 13, 2020 7:50 pm

brianr wrote: ↑Fri Nov 13, 2020 7:46 pm There are many different NN sizes and architectures.

The larger ones, like those Lc0 uses, are only really practical on GPUs.

The smaller ones, like SF-NNUE, run on CPUs (and are very cleverly incrementally updated to make them even faster).

Was it a matter of programming it to be clever?

yurikvelo · Post by **yurikvelo** » Sat Nov 14, 2020 12:22 am

Leo wrote: ↑Fri Nov 13, 2020 5:46 pm I don't remember people saying NNs on CPUs were going to happen.

Maybe they were talking about MCTS, not about static evaluation.
GPU engines run MCTS; CPU NN engines run alpha-beta minimax, but their static evaluation is not handcrafted.

Leo · Post by **Leo** » Sat Nov 14, 2020 3:25 am

yurikvelo wrote: ↑Sat Nov 14, 2020 12:22 am
Leo wrote: ↑Fri Nov 13, 2020 5:46 pm I don't remember people saying NNs on CPUs were going to happen.
Maybe they were talking about MCTS, not about static evaluation.
GPU engines run MCTS; CPU NN engines run alpha-beta minimax, but their static evaluation is not handcrafted.

OK. Interesting.

Madeleine Birchfield · Sat Nov 14, 2020 6:32 am

Leo wrote: ↑Fri Nov 13, 2020 5:46 pm I don't remember people saying NNs on CPUs were going to happen.

yurikvelo wrote: ↑Sat Nov 14, 2020 12:22 am Maybe they were talking about MCTS, not about static evaluation.
GPU engines run MCTS; CPU NN engines run alpha-beta minimax, but their static evaluation is not handcrafted.

Leo wrote: ↑Sat Nov 14, 2020 3:25 am OK. Interesting.

Leo's original post was correct; people didn't say that neural networks on CPUs were going to happen because at the time the neural networks (like those in Leela and Allie) were too slow to be calculated on a CPU. The development of NNUE in computer shogi happened without much attention from the computer chess community. When Hisayori Noda ported NNUE to his Stockfish fork in autumn 2019, it went unnoticed and was left untouched for almost an entire year, until people like Henk Drost, Mark Jordan, Sergio Vieri, and so on started experimenting with it around the beginning of June 2020. When it succeeded, it shocked the computer chess community, as many had previously believed that neural networks were too slow to work on CPUs.

MCTS search is completely unrelated to neural network evaluation, as shown by Komodo and ShashChess, both engines that (have an option to) use MCTS with their handcrafted eval on CPU.
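
The same independence holds for alpha-beta: the search only needs some function that scores a position, and it does not care whether that function is handcrafted or learned. A minimal sketch of this, assuming a toy game tree where each leaf is a made-up `(material, mobility)` pair; both evaluation functions and their weights are hypothetical stand-ins, not taken from any engine:

```python
def negamax(node, alpha, beta, evaluate):
    """Alpha-beta search in negamax form; `evaluate` is any static evaluation function."""
    if not isinstance(node, list):
        # Leaf: score it from the point of view of the side to move.
        return evaluate(node)
    best = float("-inf")
    for child in node:
        score = -negamax(child, -beta, -alpha, evaluate)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cutoff: the opponent will avoid this line anyway
    return best


def handcrafted_eval(leaf):
    """Stand-in for a handcrafted eval: counts material only."""
    material, _mobility = leaf
    return material


def learned_style_eval(leaf):
    """Stand-in for a learned eval: fixed made-up weights in place of a trained network."""
    material, mobility = leaf
    return 2 * material + 3 * mobility


# Toy tree: inner nodes are lists of children, leaves are (material, mobility) pairs.
TREE = [
    [(1, 0), (0, 2)],
    [(-1, 1), (2, -1)],
]

# Same search code, two different evaluation functions plugged in.
for name, fn in [("handcrafted", handcrafted_eval), ("learned-style", learned_style_eval)]:
    print(name, negamax(TREE, float("-inf"), float("inf"), fn))
```

Swapping the evaluation changes the scores the search returns but not a single line of the search itself.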

smatovic · Post by **smatovic** » Sat Nov 14, 2020 8:10 am

Another one, according to this TC poll from 2018, most programmers simply underestimated NNs for chess:

http://talkchess.com/forum3/viewtopic.php?f=7&t=67121

but at least one voted for "take Giraffe and tune it", hence I guess a minority was aware of the potential, or something like that.

--
Srdja

Madeleine Birchfield · Sat Nov 14, 2020 8:40 am

smatovic wrote: ↑Sat Nov 14, 2020 8:10 am Another one, according to this TC poll from 2018, most programmers simply underestimated NNs for chess:

http://talkchess.com/forum3/viewtopic.php?f=7&t=67121

but at least one voted for "take Giraffe and tune it", hence I guess a minority was aware of the potential, or something like that.

--
Srdja

I was a big Leela fan back in 2018. I believed it was going to become the strongest chess engine on the scene, and that it was superior to traditional engines because of its strong positional evaluation, which handcrafted evaluations simply cannot replicate. But I saw neural networks as largely incompatible with the traditional alpha-beta search paradigm, since no traditional engine supported the use of GPUs. So for me it was more a question of how many people were willing to shift from the traditional paradigm to the Leela-style paradigm using GPUs. Nobody, aside from Daniel Shawul with Scorpio, seemed interested in making that move, so I concluded that it would be a very long time until traditional engines adopted neural networks.
