SF-NNUE going forward...

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: SF-NNUE going forward...

Post by Zenmastur »

ChickenLogic wrote: Thu Jul 30, 2020 12:30 am Haven't read all of this thread so excuse me if it was already mentioned: tttak's fork has a lot of different architectures to try out. I'm interested in the "Gameply40x4" one, which effectively holds 4 NNs in one file; they are used sequentially, switching every 40 plies, until the 4th is loaded. This sacrifices no speed over the regular halfkp nets with the same input layer size. It'll require at least 4x more training data though.
Well, considering Sergio is generating 1,000,000,000 fens several times per day, the amount of training data doesn't seem to be a huge problem. So what if only one new net is released every day or two on average? The nets are coming out so fast they can't be tested properly before the next one is ready for testing. Slowing down a bit on the releases wouldn't be a bad thing. He's released 15 nets in the last two days; this would be reduced to about 3 if each release takes 4 times as much data.

I think splitting it by number of plies played is a mistake. You can be in an endgame in 40 plies. Better to do it on the number of pieces and pawns left on the board: every capture decrements the piece count by one, so it's easy to keep track of. The other advantage is that the depth of search can be varied between opening, middlegame, endgame, TB positions, etc.
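A minimal sketch of the dispatch I have in mind (the thresholds and names are invented for illustration, not taken from any actual fork):

[code]
#include <cstdio>

// Which of the four nets to use, chosen by pieces + pawns remaining
// (kings excluded). The thresholds are invented for illustration.
enum Phase { OPENING = 0, MIDDLEGAME, ENDGAME, TB_RANGE };

Phase select_net(int piece_count) {
    if (piece_count > 24) return OPENING;
    if (piece_count > 12) return MIDDLEGAME;
    if (piece_count > 6)  return ENDGAME;
    return TB_RANGE;  // few enough men left for tablebase probes
}

int main() {
    // The count is trivial to maintain: start at 30 (all pieces and
    // pawns, kings excluded) and decrement by one on every capture.
    int piece_count = 30;
    --piece_count;  // a capture happened somewhere
    std::printf("net index: %d\n", (int)select_net(piece_count));
}
[/code]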
ChickenLogic wrote: Thu Jul 30, 2020 12:30 am With weaker nets this will introduce more eval instability of course but I think a "more correct" eval will benefit search more than eval instability will hurt it. Maybe it is also possible to make NNUE viable on really old hw with "Gameply40x4" nets with smaller input layers.
I think this concept of multiple nets will work better for SF than for Leela. The most obvious reason is that it takes fewer positions to saturate NNUE nets, and thus we can experiment much more.
The nets will eventually saturate regardless of size, so weaker nets would only be a temporary condition.

Regards,

Zenmastur
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
cma6
Posts: 219
Joined: Thu May 29, 2014 5:58 pm

Recommended hash size for NNUE.exe

Post by cma6 »

Zenmastur, considering that Sergio's nets are about 20 MB, what would be the recommended hash size for the NNUE engine, assuming one has ample RAM on the system?
ChickenLogic
Posts: 154
Joined: Sun Jan 20, 2019 11:23 am
Full name: kek w

Re: SF-NNUE going forward...

Post by ChickenLogic »

Zenmastur wrote: Thu Jul 30, 2020 3:30 am [...]
I think splitting it by number of plies played is a mistake. You can be in an endgame in 40 plies. Better to do it on the number of pieces and pawns left on the board: every capture decrements the piece count by one, so it's easy to keep track of. The other advantage is that the depth of search can be varied between opening, middlegame, endgame, TB positions, etc.
[...]
Very recently tttak added an arch that supports switching based on pieces left rather than game ply.
ChickenLogic
Posts: 154
Joined: Sun Jan 20, 2019 11:23 am
Full name: kek w

Re: Recommended hash size for NNUE.exe

Post by ChickenLogic »

cma6 wrote: Thu Jul 30, 2020 3:54 am Zenmastur, considering that Sergio's nets are about 20 MB, what would be the recommended hash size for the NNUE engine, assuming one has ample RAM on the system?
Nodes take up the same amount of RAM as in regular SF. I think there is no need for extra RAM - just use what you would normally use for SF.
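For example, a UCI session looks the same as for regular SF; the ~20 MB net is loaded into its own buffer at startup and doesn't come out of the hash allocation. "Hash" and "Threads" are standard UCI options; the values below are just placeholders:

[code]
uci
setoption name Hash value 4096
setoption name Threads value 8
isready
go movetime 10000
[/code]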
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: SF-NNUE going forward...

Post by Zenmastur »

ChickenLogic wrote: Thu Jul 30, 2020 4:48 am
Zenmastur wrote: Thu Jul 30, 2020 3:30 am [...]
I think splitting it by number of plies played is a mistake. You can be in an endgame in 40 plies. Better to do it on the number of pieces and pawns left on the board: every capture decrements the piece count by one, so it's easy to keep track of. The other advantage is that the depth of search can be varied between opening, middlegame, endgame, TB positions, etc.
[...]
Very recently tttak added an arch that supports switching based on pieces left rather than game ply.
Is an EXE of this fork available?
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
mmt
Posts: 343
Joined: Sun Aug 25, 2019 8:33 am
Full name: .

Re: SF-NNUE going forward...

Post by mmt »

ChickenLogic wrote: Thu Jul 30, 2020 4:48 am Very recently tttak added an arch that supports switching based on pieces left rather than game ply.
Having weights for each piece might be even better. But the best would be to have a small net trained to select the best evaluation net.
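Conceptually something like this toy sketch (the feature set, sizes, and names are all invented; real weights would come from training):

[code]
#include <array>
#include <cstdio>
#include <cstddef>

// Toy selector: one linear layer over coarse position features
// (piece counts, game ply, ...) whose argmax picks the eval net.
constexpr std::size_t FEATURES = 16;
constexpr std::size_t NETS     = 4;

struct Selector {
    std::array<std::array<float, FEATURES>, NETS> w{};  // trained weights
    std::array<float, NETS> b{};                        // trained biases

    std::size_t pick(const std::array<float, FEATURES>& f) const {
        std::size_t best = 0;
        float best_s = -1e30f;
        for (std::size_t n = 0; n < NETS; ++n) {
            float s = b[n];
            for (std::size_t i = 0; i < FEATURES; ++i)
                s += w[n][i] * f[i];
            if (s > best_s) { best_s = s; best = n; }
        }
        return best;  // index of the evaluation net to run
    }
};

int main() {
    Selector sel{};                   // zero weights: placeholder only
    std::array<float, FEATURES> f{};  // features extracted from the position
    std::printf("use net %zu\n", sel.pick(f));
}
[/code]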
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: SF-NNUE going forward...

Post by chrisw »

Zenmastur wrote: Thu Jul 30, 2020 3:30 am
ChickenLogic wrote: Thu Jul 30, 2020 12:30 am Haven't read all of this thread so excuse me if it was already mentioned: tttak's fork has a lot of different architectures to try out. I'm interested in the "Gameply40x4" one, which effectively holds 4 NNs in one file; they are used sequentially, switching every 40 plies, until the 4th is loaded. This sacrifices no speed over the regular halfkp nets with the same input layer size. It'll require at least 4x more training data though.
Well, considering Sergio is generating 1,000,000,000 fens several times per day
Rough maths suggests he has at least 1000 cores to do that. If we call several = 5:

5 billion fens a day ≈ 58,000 fens per second - call it 50,000 - which, if he samples at 25%, implies about 12,500 games per second: one game every 0.08 ms in aggregate. Massively fast bullet at that rate needs at least 1000 cores, I think.
1000 cores would allow one game per core every 80 ms.
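Spelling the arithmetic out as a quick sanity check (reading "samples at 25%" as roughly 4 sampled fens per game, which is what makes the step from fens to games come out):

[code]
#include <cstdio>

int main() {
    const double fens_per_day  = 5e9;                   // "several" = 5 billion
    const double fens_per_sec  = fens_per_day / 86400;  // ~57,870; call it 50,000
    const double games_per_sec = 50000 * 0.25;          // 12,500 games/s
    const double cores         = 1000;

    std::printf("fens/s   ~ %.0f\n", fens_per_sec);
    std::printf("games/s  ~ %.0f (one every %.2f ms in aggregate)\n",
                games_per_sec, 1000 / games_per_sec);
    std::printf("per core: one game every %.0f ms\n",
                cores * 1000 / games_per_sec);
}
[/code]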

the amount of training data doesn't seem to be a huge problem. So what if only one new net is released every day or two on average? The nets are coming out so fast they can't be tested properly before the next one is ready for testing. Slowing down a bit on the releases wouldn't be a bad thing. He's released 15 nets in the last two days; this would be reduced to about 3 if each release takes 4 times as much data.

I think splitting it by number of plies played is a mistake. You can be in an endgame in 40 plies. Better to do it on the number of pieces and pawns left on the board: every capture decrements the piece count by one, so it's easy to keep track of. The other advantage is that the depth of search can be varied between opening, middlegame, endgame, TB positions, etc.
ChickenLogic wrote: Thu Jul 30, 2020 12:30 am With weaker nets this will introduce more eval instability of course but I think a "more correct" eval will benefit search more than eval instability will hurt it. Maybe it is also possible to make NNUE viable on really old hw with "Gameply40x4" nets with smaller input layers.
I think this concept of multiple nets will work better for SF than for Leela. The most obvious reason is that it takes fewer positions to saturate NNUE nets, and thus we can experiment much more.
The nets will eventually saturate regardless of size, so weaker nets would only be a temporary condition.

Regards,

Zenmastur
User avatar
towforce
Posts: 11542
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK

Re: SF-NNUE going forward...

Post by towforce »

Dann Corbit wrote: Wed Jul 29, 2020 7:54 am
Ovyron wrote: Tue Jul 28, 2020 10:22 pm
marsell wrote: Tue Jul 28, 2020 11:01 am People need several expensive and power-hungry GPUs to beat Stockfish
What is interesting is that those people got the GPUs for Leela. They still have the GPUs. The GPUs didn't just disappear.
GPUs are marvelous, incredible things. I bought GPUs long before I found out they can be useful for chess. I have GPUs in all of my machines and boxes of GPUs besides.

You can test Mersenne Prime candidates.
You can fold proteins and cure diseases.
You can look for gravitational waves.
It's not a case of what you can do with a GPU, it's what you can't do. I guess you can even stop a herd of stampeding yaks with GPUs.

The uses are endless. I think GPUs are the most fabulous thing (hardware-wise) on the surface of the earth. I can't wait to buy some more, and some really smoking hot ones are just around the corner.

Per the previous post, GPUs are everything Dann says - the equivalent of the top supercomputers of just 19 years ago - which is absolutely astonishing!

Very quickly, a couple of alternatives:

1. rent time in the cloud (thread link)

2. if you're only going to be training NNs, and not, say, modelling stampeding yaks, you'd probably be better off buying a TPU (tensor processing unit - a bit like a GPU, but all arithmetic in low precision). Google sell their edge TPU under the "Coral" brand for some reason, and you can buy it as either a board or a USB device - link
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
Your brain doesn't work the way you want, so train it!
User avatar
towforce
Posts: 11542
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK

Re: SF-NNUE going forward...

Post by towforce »

towforce wrote: Thu Jul 30, 2020 5:33 pm 2. if you're only going to be training NNs, and not, say, modelling stampeding yaks, you'd probably be better off buying a TPU (tensor processing unit - a bit like a GPU, but all arithmetic in low precision). Google sell their edge TPU under the "Coral" brand for some reason, and you can buy it as either a board or a USB device - link

Apologies - those devices are for running a trained and compiled net - not for training. :oops:

Could still be useful for running chess engines with large, trained nets.
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
Your brain doesn't work the way you want, so train it!
cma6
Posts: 219
Joined: Thu May 29, 2014 5:58 pm

Re: Recommended hash size for NNUE.exe

Post by cma6 »

Nodes take up the same amount of RAM as in regular SF. I think there is no need for extra RAM - just use what you would normally use for SF.


Thanks, CL.