NNUE - only from own engine?

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

MartinBryant
Posts: 87
Joined: Thu Nov 21, 2013 12:37 am
Location: Manchester, UK
Full name: Martin Bryant

Re: NNUE - only from own engine?

Post by MartinBryant »

This is a difficult subject and I'm sure there are as many opinions as programmers.
Personally, at the moment, I am still enjoying working on other things in Colossus (including tweaking the HCE) and so I haven't had to make any moral decisions yet regarding NNUEs. My current thinking is that I may kick that can down the road for a while yet! :)

I certainly understand the strong feelings against using other programs' NNUEs. It just feels wrong somehow.
However, NNUEs also feel very similar to endgame tablebases... a black box containing a bunch of numbers with some probing code.

I wonder if there would be the same 'concerns' about NNUEs if they only gave a handful of Elo (like EGTBs) rather than several hundred?
Also consider the reverse... if EGTBs had proven to give several hundred Elo, I imagine they would have been much more jealously guarded in the past, and would we now be berating people who just include someone else's endgame tables?

I'm sure this debate will rage on for some time and perhaps the status quo in a few years will be different than it is now?
connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: NNUE - only from own engine?

Post by connor_mcmonigle »

MartinBryant wrote: Mon Oct 25, 2021 5:00 pm This is a difficult subject and I'm sure there are as many opinions as programmers.
Personally, at the moment, I am still enjoying working on other things in Colossus (including tweaking the HCE) and so I haven't had to make any moral decisions yet regarding NNUEs. My current thinking is that I may kick that can down the road for a while yet! :)

I certainly understand the strong feelings against using other programs' NNUEs. It just feels wrong somehow.
However, NNUEs also feel very similar to endgame tablebases... a black box containing a bunch of numbers with some probing code.

I wonder if there would be the same 'concerns' about NNUEs if they only gave a handful of Elo (like EGTBs) rather than several hundred?
Also consider the reverse... if EGTBs had proven to give several hundred Elo, I imagine they would have been much more jealously guarded in the past, and would we now be berating people who just include someone else's endgame tables?

I'm sure this debate will rage on for some time and perhaps the status quo in a few years will be different than it is now?
I think the distinction between EGTB and copying Stockfish's evaluation function lies in the fact that the evaluation function has an outsized impact on the engine's style of play relative to EGTB. There is no notion of style, no subjectivity, in 7-man-or-fewer positions, nor any real subjectivity in the simple late endgame positions easily solved with EGTB. For basically all other position types, EGTB has little to no bearing on playing strength or style.
ChickenLogic
Posts: 154
Joined: Sun Jan 20, 2019 11:23 am
Full name: kek w

Re: NNUE - only from own engine?

Post by ChickenLogic »

MartinBryant wrote: Mon Oct 25, 2021 5:00 pm This is a difficult subject and I'm sure there are as many opinions as programmers.
Personally, at the moment, I am still enjoying working on other things in Colossus (including tweaking the HCE) and so I haven't had to make any moral decisions yet regarding NNUEs. My current thinking is that I may kick that can down the road for a while yet! :)

I certainly understand the strong feelings against using other programs' NNUEs. It just feels wrong somehow.
However, NNUEs also feel very similar to endgame tablebases... a black box containing a bunch of numbers with some probing code.

I wonder if there would be the same 'concerns' about NNUEs if they only gave a handful of Elo (like EGTBs) rather than several hundred?
Also consider the reverse... if EGTBs had proven to give several hundred Elo, I imagine they would have been much more jealously guarded in the past, and would we now be berating people who just include someone else's endgame tables?

I'm sure this debate will rage on for some time and perhaps the status quo in a few years will be different than it is now?
Tablebases are anything but black boxes. They're not human-readable without 'translation', but that doesn't make them a black box.
Regardless, it's not about whether using freely available data and code is immoral (hint: it's not). It is simply about how people present their work. In the case of Fire, the author was dishonest for the longest time and didn't even spell the name of the NN creator correctly, even after he had been told to correct it multiple times.
It's all about people not having the balls to admit that their work is just a small increment (if any) on top of someone else's work.
connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: NNUE - only from own engine?

Post by connor_mcmonigle »

What tournaments, testers, etc. want to see from engines is their doing something at least somewhat unique with respect to their evaluation functions. It's no great wonder that using Stockfish's inference code, Stockfish's network architecture, Stockfish's training code and the same training data as Stockfish yields a boring engine from the perspective of testers. You've just recreated or copied (in the case of engines like Fire) Stockfish's evaluation function and paired it with an inferior search.

If you're having fun doing this, then more power to you, but no one is obligated to test or care about your engine. While TCEC has made their guidelines less explicit, I fully expect them to maintain similar uniqueness requirements for participation. I think this is a good change as it allows for more subjectivity in what we mean by uniqueness. Already, we have engines participating at TCEC such as Winter, which uses a neural network trained on Stockfish self play game results. However, Winter employs a radically different network architecture, incorporating many hand crafted features and a king+pawn specific convolutional subnetwork. It's certainly a very interesting engine and certainly sufficiently unique, with respect to its evaluation function, to participate. Unique training data is just one of many paths to sufficient uniqueness for participation (and the most boring imho). Even a radically different approach to search, under certain circumstances, could be sufficient grounds for participation on its own.

On the other hand, generating your own training data, starting from a randomly initialized network, is not as difficult as many seem to think and yields very competitive results. While large networks can require prohibitively copious amounts of data, generating data for smaller networks (such as those used in the likes of Halogen, Koivisto, Berserk, Bit-Genie, Zahak and many others) should be very accessible. Seer, like many other OpenBench engines, will remain unique in training code, inference code, network architecture/input features, and the self-play data it is trained on.
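For illustration, the self-play data generation described above can be sketched in heavily simplified form. Everything below is a toy stand-in (an abstract 'position', a stub evaluation, a fake terminal condition), not any engine's actual code; it only shows the shape of the loop: play games, record each position with a search score, then attach the final game result to every recorded position.

```python
import random

def toy_eval(position):
    # Stand-in for a fixed-depth search score from the current network.
    return sum(position) / len(position)

def play_selfplay_game(max_plies=40):
    # Play one toy "game", recording a (position, score) pair per ply.
    position = [random.choice([-1, 0, 1]) for _ in range(8)]
    records = []
    for _ in range(max_plies):
        records.append((tuple(position), toy_eval(position)))
        # "Make a move": perturb one feature (stand-in for a real move).
        position[random.randrange(len(position))] = random.choice([-1, 0, 1])
        if all(v == 0 for v in position):  # toy terminal condition
            break
    # Toy win/draw/loss result derived from the final position.
    final = sum(position)
    result = 1.0 if final > 0 else 0.0 if final < 0 else 0.5
    # NNUE-style training targets typically blend the recorded search
    # score with the final game outcome, so keep both per position.
    return [(pos, score, result) for pos, score in records]

def generate_dataset(num_games=100):
    dataset = []
    for _ in range(num_games):
        dataset.extend(play_selfplay_game())
    return dataset
```

In a real pipeline the perturbation step is a searched move, the evaluation is the network being trained, and the records are written to disk in a compact binary format; the loop structure, however, is essentially this simple.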
ChickenLogic
Posts: 154
Joined: Sun Jan 20, 2019 11:23 am
Full name: kek w

Re: NNUE - only from own engine?

Post by ChickenLogic »

connor_mcmonigle wrote: Mon Oct 25, 2021 5:39 pm What tournaments, testers, etc. want to see from engines is their doing something at least somewhat unique with respect to their evaluation functions. It's no great wonder that using Stockfish's inference code, Stockfish's network architecture, Stockfish's training code and the same training data as Stockfish yields a boring engine from the perspective of testers. You've just recreated or copied (in the case of engines like Fire) Stockfish's evaluation function and paired it with an inferior search.

If you're having fun doing this, then more power to you, but no one is obligated to test or care about your engine. While TCEC has made their guidelines less explicit, I fully expect them to maintain similar uniqueness requirements for participation. I think this is a good change as it allows for more subjectivity in what we mean by uniqueness. Already, we have engines participating at TCEC such as Winter, which uses a neural network trained on Stockfish self play game results. However, Winter employs a radically different network architecture, incorporating many hand crafted features and a king+pawn specific convolutional subnetwork. It's certainly a very interesting engine and certainly sufficiently unique, with respect to its evaluation function, to participate. Unique training data is just one of many paths to sufficient uniqueness for participation (and the most boring imho). Even a radically different approach to search, under certain circumstances, could be sufficient grounds for participation on its own.

On the other hand, generating your own training data, starting from a randomly initialized network, is not as difficult as many seem to think and yields very competitive results. While large networks can require prohibitively copious amounts of data, generating data for smaller networks (such as those used in the likes of Halogen, Koivisto, Berserk, Bit-Genie, Zahak and many others) should be very accessible. Seer, like many other OpenBench engines, will remain unique in training code, inference code, network architecture/input features, and the self-play data it is trained on.
And again it shows that people here talking about NNUE know very little about how much it takes to do properly. Generating your own data that is even halfway decent, and confirming it as such, takes Stockfish well over a couple of weeks, and that is with multiple V100s and Threadrippers. You also need multiple full training runs to confirm your new method isn't a fluke. If you don't happen to have multiple PCs costing several thousand dollars each, there is no way you produce anything close to the top engines without taking 'their' data. There are people in the Stockfish project who exclusively train neural nets. You, sir, just see that Ethereal and Halogen progress quickly but fail to see the insane amount of hardware they have in the background aside from OpenBench. OpenBench is also 'sponsored' by noob. You really think a single engine author can compete with a guy who has his own data center? And even if you join OpenBench, you still need insane amounts of hardware for training on your own.
Madeleine Birchfield
Posts: 512
Joined: Tue Sep 29, 2020 4:29 pm
Location: Dublin, Ireland
Full name: Madeleine Birchfield

Re: NNUE - only from own engine?

Post by Madeleine Birchfield »

ChickenLogic wrote: Mon Oct 25, 2021 5:58 pm And again it shows that people here talking about NNUE know very little about how much it takes to do properly. Generating your own data that is even halfway decent, and confirming it as such, takes Stockfish well over a couple of weeks, and that is with multiple V100s and Threadrippers. You also need multiple full training runs to confirm your new method isn't a fluke. If you don't happen to have multiple PCs costing several thousand dollars each, there is no way you produce anything close to the top engines without taking 'their' data. There are people in the Stockfish project who exclusively train neural nets. You, sir, just see that Ethereal and Halogen progress quickly but fail to see the insane amount of hardware they have in the background aside from OpenBench. OpenBench is also 'sponsored' by noob. You really think a single engine author can compete with a guy who has his own data center? And even if you join OpenBench, you still need insane amounts of hardware for training on your own.
Halogen progressed quickly because Andrew Grant wanted an engine to test his net trainer and ended up using Halogen as his guinea pig. The hardware used to train Halogen and Ethereal was indeed provided by noobpwnftw. In December 2020, noobpwnftw shut down and restarted his server, and Andrew Grant lost weeks of training for Ethereal; I remember him being very upset over that loss.
connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: NNUE - only from own engine?

Post by connor_mcmonigle »

ChickenLogic wrote: Mon Oct 25, 2021 5:58 pm ...

And again it shows that people here talking about NNUE know very little about how much it takes to do properly. Generating your own data that is even halfway decent, and confirming it as such, takes Stockfish well over a couple of weeks, and that is with multiple V100s and Threadrippers. You also need multiple full training runs to confirm your new method isn't a fluke. If you don't happen to have multiple PCs costing several thousand dollars each, there is no way you produce anything close to the top engines without taking 'their' data. There are people in the Stockfish project who exclusively train neural nets. You, sir, just see that Ethereal and Halogen progress quickly but fail to see the insane amount of hardware they have in the background aside from OpenBench. OpenBench is also 'sponsored' by noob. You really think a single engine author can compete with a guy who has his own data center? And even if you join OpenBench, you still need insane amounts of hardware for training on your own.
I guess you didn't read my post very carefully.
I've generated about 2B d8 self-play games which I use for training Seer on an R5 3600. I train solely using a GTX 950. Seer is a top-15 engine.
Koivisto is trained on 500M mixed-depth self-play games generated using comparatively modest hardware. Koivisto's training code is CPU-based. Koivisto is a top-10 engine (at least).
Bit-Genie is a 3100-Elo engine. Its self-play data was generated on a 4-thread laptop, and it was also trained on said laptop.

So no, it doesn't take weeks of compute with "multiple V100s and threadrippers" to obtain competitive results. Maybe that's what it would take to beat Stockfish, but that's not what I claimed.
Last edited by connor_mcmonigle on Mon Oct 25, 2021 6:16 pm, edited 1 time in total.
Madeleine Birchfield
Posts: 512
Joined: Tue Sep 29, 2020 4:29 pm
Location: Dublin, Ireland
Full name: Madeleine Birchfield

Re: NNUE - only from own engine?

Post by Madeleine Birchfield »

connor_mcmonigle wrote: Mon Oct 25, 2021 5:39 pm While TCEC has made their guidelines less explicit, I fully expect them to maintain similar uniqueness requirements for participation. I think this is a good change as it allows for more subjectivity in what we mean by uniqueness.
This is what should always have been the case with TCEC: less explicit guidelines that could allow for things such as Stockfish's use of Leela's data.
Madeleine Birchfield
Posts: 512
Joined: Tue Sep 29, 2020 4:29 pm
Location: Dublin, Ireland
Full name: Madeleine Birchfield

Re: NNUE - only from own engine?

Post by Madeleine Birchfield »

I suppose I should mention here that Connor McMonigle has used Lichess games to train Seer. As Lichess games are not the engine's own games, should the rating lists not test Seer?

A good compromise could be that an engine should not be trained with a majority of its data from a single other engine. Most people's complaints about using external data are about somebody taking the data entirely from Stockfish and using it to train their nets, as that would just create another Stockfish net. However, restricting data to that generated from an engine's own games is too restrictive, and would eliminate half of the NNUE engines on the rating lists (Nemorino, Stockfish, Seer, et cetera).
connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: NNUE - only from own engine?

Post by connor_mcmonigle »

Madeleine Birchfield wrote: Mon Oct 25, 2021 6:25 pm I suppose I should mention here that Connor McMonigle has used Lichess games to train Seer. As Lichess games are not the engine's own games, should the rating lists not test Seer?

A good compromise could be that an engine should not be trained with a majority of its data from a single other engine. Most people's complaints about using external data are about somebody taking the data entirely from Stockfish and using it to train their nets, as that would just create another Stockfish net. However, restricting data to that generated from an engine's own games is too restrictive, and would eliminate half of the NNUE engines on the rating lists (Nemorino, Stockfish, Seer, et cetera).
You don't seem to know what you're talking about and, therefore, don't seem to have much to add to this conversation. My retrograde learning approach didn't use the game results from human games. Rather, it just sampled positions randomly from human games to get a reasonable distribution over 'normal' chess positions. All evaluations originated from Seer + EGTB. Subsequently, the network produced by way of this process was used to initialize the network later used to generate the aforementioned 2B position dataset through multiple self-play iterations.

Furthermore, Seer doesn't even use Stockfish's training code, inference code, architecture, etc. In fact, Seer was "nnue" before Stockfish.
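For illustration, the bootstrap described in this thread (sample positions from external games, label them with the engine's own evaluation, train an initial network, then iterate with self play) might be sketched as follows. The linear 'network', the stub oracle, and the random position sampler are all hypothetical stand-ins, not Seer's actual training code.

```python
import random

FEATURES = 8
ORACLE_W = [0.5, -0.2, 0.1, 0.3, -0.4, 0.2, 0.0, 0.6]

def oracle_eval(position):
    # Stand-in for an evaluation produced by the existing engine + EGTB.
    # In the described approach, only the *positions* come from external
    # games; every label originates from the engine itself.
    return sum(f * w for f, w in zip(position, ORACLE_W))

def sample_positions(n):
    # Stand-in for sampling positions at random from a corpus of games.
    return [[random.choice([-1, 0, 1]) for _ in range(FEATURES)]
            for _ in range(n)]

def train(weights, labeled, lr=0.01, epochs=50):
    # Plain SGD on a linear evaluator (toy stand-in for a real network).
    for _ in range(epochs):
        for position, target in labeled:
            pred = sum(f * w for f, w in zip(position, weights))
            err = pred - target
            for i, f in enumerate(position):
                weights[i] -= lr * err * f
    return weights

def mse(weights, labeled):
    # Mean squared error of the evaluator against the oracle labels.
    return sum((sum(f * w for f, w in zip(p, weights)) - t) ** 2
               for p, t in labeled) / len(labeled)

def bootstrap(n=200):
    # Initialize a fresh network and fit it to engine-labeled positions.
    positions = sample_positions(n)
    labeled = [(p, oracle_eval(p)) for p in positions]
    return train([0.0] * FEATURES, labeled)
```

The weights returned by `bootstrap()` would then seed the first round of self-play data generation, with subsequent self-play iterations progressively refining the network.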