Surprise: Official release version of Stockfish 14.1

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Albert Silver
Posts: 3026
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: Surprise: Official release version of Stockfish 14.1

Post by Albert Silver »

dkappe wrote: Thu Oct 28, 2021 6:29 pm
Paloma wrote: Thu Oct 28, 2021 5:54 pm What the heck is RL ?
Two abbreviations important to game neural network training: SL = supervised learning, RL = reinforcement learning. What Leela, AlphaZero and Dragon were/are doing is RL, so the training data from the improving nets is feeding back into the training. Stockfish is currently doing SL, so just a big batch of data from leela that they try to massage into a better net.
Although no one is really doing it now, there is also SSL, semi-supervised learning, which is actually a combination of the two. The first Fat Fritz worked that way. It was trained on 100% human games ranging from 2200 Elo to World Champion caliber, plus engine games. This supervised learning took it past the first stage, after which it then built on that through RL.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
Uri Blass
Posts: 11135
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Surprise: Official release version of Stockfish 14.1

Post by Uri Blass »

bmp1974 wrote: Thu Oct 28, 2021 8:54 am I feel there was nothing wrong in releasing it as Stockfish 15. After all, 17 elo gain over SF 14 is not a small thing at this level.
I doubt there is really a 17 Elo gain.

17 Elo at bullet time control may translate to clearly less at long time control, so I am not even sure 14.1 is going to be better than 14 at the long time controls of CCRL or CEGT.
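For context on why 17 Elo is a small margin, the standard logistic Elo model relates a rating difference to an expected score rate. A quick sketch (function names are just for illustration, not from any engine's code):

```python
import math

def expected_score(elo: float) -> float:
    """Expected score rate for a given Elo advantage (logistic model)."""
    return 1 / (1 + 10 ** (-elo / 400))

def elo_diff(score: float) -> float:
    """Inverse: Elo difference implied by a score rate, 0 < score < 1."""
    return -400 * math.log10(1 / score - 1)

# A 17 Elo edge corresponds to only about a 52.4% expected score,
# and the edge actually measured can differ with the time control.
print(round(expected_score(17) * 100, 1))
```

This is why a fixed Elo figure is only meaningful together with the conditions it was measured under.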
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Surprise: Official release version of Stockfish 14.1

Post by Sopel »

Uri Blass wrote: Thu Oct 28, 2021 9:48 pm
bmp1974 wrote: Thu Oct 28, 2021 8:54 am I feel there was nothing wrong in releasing it as Stockfish 15. After all, 17 elo gain over SF 14 is not a small thing at this level.
I doubt there is really a 17 Elo gain.

17 Elo at bullet time control may translate to clearly less at long time control, so I am not even sure 14.1 is going to be better than 14 at the long time controls of CCRL or CEGT.
Obviously, you could always say the same thing regardless of how the Elo was measured... because it's not an absolute metric and depends on the testing conditions. And that's why Stockfish's testing conditions are very transparent.
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Surprise: Official release version of Stockfish 14.1

Post by dkappe »

Albert Silver wrote: Thu Oct 28, 2021 6:36 pm
Although no one is really doing it now, there is also SSL, semi-supervised learning, which is actually a combination of the two. The first Fat Fritz worked that way. It was trained on 100% human games ranging from 2200 Elo to World Champion caliber, plus engine games. This supervised learning took it past the first stage, after which it then built on that through RL.
Although it’s pretty far in the rear-view mirror, Dragon started out with data from Komodo 14 and Komodo 14.1, so technically semi-supervised. But it’s been all RL for quite some time now.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Surprise: Official release version of Stockfish 14.1

Post by MikeB »

dkappe wrote: Thu Oct 28, 2021 6:29 pm
Paloma wrote: Thu Oct 28, 2021 5:54 pm What the heck is RL ?
Two abbreviations important to game neural network training: SL = supervised learning, RL = reinforcement learning. What Leela, AlphaZero and Dragon were/are doing is RL, so the training data from the improving nets is feeding back into the training. Stockfish is currently doing SL, so just a big batch of data from leela that they try to massage into a better net.

Note that SF started out with RL which produced their initial strong nets.
Please define RL - I suppose it might mean training one net on top of another or is it something else?
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Surprise: Official release version of Stockfish 14.1

Post by dkappe »

MikeB wrote: Fri Oct 29, 2021 3:58 am
dkappe wrote: Thu Oct 28, 2021 6:29 pm
Paloma wrote: Thu Oct 28, 2021 5:54 pm What the heck is RL ?
Two abbreviations important to game neural network training: SL = supervised learning, RL = reinforcement learning. What Leela, AlphaZero and Dragon were/are doing is RL, so the training data from the improving nets is feeding back into the training. Stockfish is currently doing SL, so just a big batch of data from leela that they try to massage into a better net.

Note that SF started out with RL which produced their initial strong nets.
Please define RL - I suppose it might mean training one net on top of another or is it something else?
Oof, that’s a broad topic. In this case we are using approaches similar to what AlphaZero did with TD-leaf. https://kstatic.googleusercontent.com/f ... 45072dc7bd

We are taking a network and using it to generate training data, then training or retraining a stronger network with that data and using that second network to generate new training data. Rinse, repeat until you no longer improve.
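The generate-train-repeat loop described above can be sketched as a toy Python model. Everything here is a hypothetical stand-in (`selfplay`, `train`, `match_elo` model the data-generation, training, and testing stages; the numbers are invented), meant only to show the shape of the iteration:

```python
def selfplay(net, games):
    # Toy stand-in: the quality of the data tracks the strength
    # of the net that generated it.
    return net

def train(base, data):
    # Toy stand-in: training on better data helps, with diminishing returns.
    return base + max(0.0, (100 - base) * 0.2)

def match_elo(candidate, net):
    # Toy stand-in for testing the candidate against the current net.
    return candidate - net

def reinforcement_loop(net, min_gain=1.0):
    """Generate data with the current net, train a candidate on it,
    and feed the candidate back in while it keeps gaining strength."""
    while True:
        data = selfplay(net, games=100_000)   # net labels its own training data
        candidate = train(base=net, data=data)
        gain = match_elo(candidate, net)      # measured under fixed conditions
        if gain < min_gain:                   # no longer improving: stop
            return net
        net = candidate

print(reinforcement_loop(0.0))
```

The diminishing-returns term is what produces the "rinse, repeat until you no longer improve" stopping point.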
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Surprise: Official release version of Stockfish 14.1

Post by MikeB »

dkappe wrote: Fri Oct 29, 2021 4:22 am
MikeB wrote: Fri Oct 29, 2021 3:58 am
dkappe wrote: Thu Oct 28, 2021 6:29 pm
Paloma wrote: Thu Oct 28, 2021 5:54 pm What the heck is RL ?
Two abbreviations important to game neural network training: SL = supervised learning, RL = reinforcement learning. What Leela, AlphaZero and Dragon were/are doing is RL, so the training data from the improving nets is feeding back into the training. Stockfish is currently doing SL, so just a big batch of data from leela that they try to massage into a better net.

Note that SF started out with RL which produced their initial strong nets.
Please define RL - I suppose it might mean training one net on top of another or is it something else?
Oof, that’s a broad topic. In this case we are using approaches similar to what AlphaZero did with TD-leaf. https://kstatic.googleusercontent.com/f ... 45072dc7bd

We are taking a network and using it to generate training data, then training or retraining a stronger network with that data and using that second network to generate new training data. Rinse, repeat until you no longer improve.
Thanks. The current SF net went through a series of trainings, for the most part always using the prior net as the base, and at the end it went through a series of SPSA tunings that look like this: https://github.com/official-stockfish/S ... a5a85a66b9
It eventually reaches a point where further training or tuning does not add any additional Elo (enough to pass, anyway).
One aspect that is quite evident: when compared to pre-NNUE Stockfish, it is hundreds and hundreds of Elo stronger at low depths 1 to 7.
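For readers unfamiliar with SPSA, the idea behind the tuning runs linked above is that the gradient is estimated from just two evaluations of the objective, perturbing every parameter simultaneously. A minimal sketch (the quadratic objective is a stand-in for a measured Elo loss, not anything from Stockfish's tuner):

```python
import random

random.seed(0)

def spsa_minimize(f, theta, steps=2000, a=0.1, c=0.1):
    """Minimal SPSA: two evaluations of f per step, all parameters
    perturbed at once by a random +/-1 direction."""
    theta = list(theta)
    for k in range(1, steps + 1):
        ak = a / k ** 0.602            # standard SPSA gain schedules
        ck = c / k ** 0.101
        delta = [random.choice((-1, 1)) for _ in theta]
        plus = [t + ck * d for t, d in zip(theta, delta)]
        minus = [t - ck * d for t, d in zip(theta, delta)]
        g = (f(plus) - f(minus)) / (2 * ck)   # one shared scalar difference
        # gradient estimate for parameter i is g * delta_i (delta_i = +/-1)
        theta = [t - ak * g * d for t, d in zip(theta, delta)]
    return theta

# Toy objective standing in for "Elo loss": minimum at (3, -2).
f = lambda x: (x[0] - 3) ** 2 + (x[1] + 2) ** 2
theta = spsa_minimize(f, [0.0, 0.0])
print([round(t, 2) for t in theta])
```

The appeal for engine tuning is that each "evaluation of f" can be a noisy match result, and the cost per step does not grow with the number of parameters.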
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Surprise: Official release version of Stockfish 14.1

Post by dkappe »

MikeB wrote: Fri Oct 29, 2021 4:32 am
Thanks. The current SF net went through a series of trainings, for the most part always using the prior net as the base, and at the end it went through a series of SPSA tunings that look like this: https://github.com/official-stockfish/S ... a5a85a66b9
It eventually reaches a point where further training or tuning does not add any additional Elo (enough to pass, anyway).
One aspect that is quite evident: when compared to pre-NNUE Stockfish, it is hundreds and hundreds of Elo stronger at low depths 1 to 7.
The old, pre SF12 nets did go through this process, with dozens or even hundreds of rounds of data generation and training.

The current ones do not (though there is an SPSA optimization of the final layer at the end). If you look at the current model, you’ll see there are some basic things missing, which cause the model to run out of gas after 3-4 rounds of training. So the current process uses a net trained on Leela data as its base (possibly starting from an earlier net), followed by some Stockfish and corrective data. Doing a few hundred rounds of training and generation just doesn’t work with the current model.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Surprise: Official release version of Stockfish 14.1

Post by MikeB »

dkappe wrote: Fri Oct 29, 2021 4:45 am
MikeB wrote: Fri Oct 29, 2021 4:32 am
Thanks. The current SF net went through a series of trainings, for the most part always using the prior net as the base, and at the end it went through a series of SPSA tunings that look like this: https://github.com/official-stockfish/S ... a5a85a66b9
It eventually reaches a point where further training or tuning does not add any additional Elo (enough to pass, anyway).
One aspect that is quite evident: when compared to pre-NNUE Stockfish, it is hundreds and hundreds of Elo stronger at low depths 1 to 7.
The old, pre SF12 nets did go through this process, with dozens or even hundreds of rounds of data generation and training.

The current ones do not (though there is an SPSA optimization of the final layer at the end). If you look at the current model, you’ll see there are some basic things missing, which cause the model to run out of gas after 3-4 rounds of training. So the current process uses a net trained on Leela data as its base (possibly starting from an earlier net), followed by some Stockfish and corrective data. Doing a few hundred rounds of training and generation just doesn’t work with the current model.
Interesting, so do you think generating new data, using the current net and the process you describe, should push it higher?
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Surprise: Official release version of Stockfish 14.1

Post by dkappe »

MikeB wrote: Fri Oct 29, 2021 7:57 am
dkappe wrote: Fri Oct 29, 2021 4:45 am Doing a few hundred rounds of training and generation just doesn’t work with the current model.
Interesting, so do you think generating new data, using the current net and the process you describe, should push it higher?
Nope. The current model just isn’t set up for RL. Compare with the leela models or some of the fully connected models from OpenAI Gym. You’ll note there are some things missing.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".