3. My preference is that any programmer creates his own network, either from human game collections or from his own engine.
This is the ideal. Otherwise, you and a group of others essentially generate the same networks and then use them in your engines, which is a roundabout way to get to option 2, and option 2 is effectively equal to option 1. If everyone now trains a network on Leela data ... well, congrats, everyone now has a disgusting level of similarity.
1. My preference is that I am allowed to use any existing network for my engine.
This option is disgusting. HCE hardly matters in engines now, so it's down to search and eval. For a long time now, search has differed less between engines than eval has. Eval was -- and still could be -- the major driver of diversity in engine development.
I know that, but what about starters? Must they first write a good HCE before they can move on to the real thing, networks? Seems unfair to me.
Out of curiosity, can you offer evidence for your "disgusting level of similarity" claim? I am pretty sure you are aware SIMEX is dead when it comes to NNUE.
90% of coding is debugging, the other 10% is writing bugs.
You can start with zero. Halogen's author right now is starting from essentially nothing and building up. A material eval and a hand-made PSQT are enough to kick-start a neural network. So I think there is no special case for starters.
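A minimal sketch of the kind of material + PSQT seed evaluation described above, usable to label positions for a first net. The piece values and the pawn table here are illustrative assumptions, not any engine's actual numbers.

```python
# Material + piece-square-table (PSQT) evaluation, from White's point of view.
# Values are illustrative, not tuned.
PIECE_VALUE = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900, "K": 0}

# Toy PSQT for pawns only: a small bonus for advancement, indexed by rank.
PAWN_PSQT = [0, 5, 10, 20, 35, 60, 90, 0]

def evaluate(pieces):
    """pieces: list of (piece_letter, is_white, square 0..63). Returns centipawns."""
    score = 0
    for piece, is_white, square in pieces:
        value = PIECE_VALUE[piece]
        if piece == "P":
            # rank from the moving side's perspective (a1 = square 0)
            rank = square // 8 if is_white else 7 - square // 8
            value += PAWN_PSQT[rank]
        score += value if is_white else -value
    return score

# a white pawn on e4 (square 28) vs a black pawn on e7 (square 52)
print(evaluate([("P", True, 28), ("P", False, 52)]))  # -> 15
```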
As for your second statement: no, I don't have some metric to pull out of my ass. However, what is the goal of an NN trained with gradient descent? To find weights for a function such that it best mimics the data you are training on. Sure, the networks will be different and behave differently. But the goal is to mimic Leela's evaluation patterns, so I would think that, if the training is any good, many attempts will converge in some regards.
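The convergence argument above can be sketched on a toy scale: two independently initialized models trained by gradient descent on the same data end up with nearly the same weights, because both are fit to mimic that data. A 1-D linear model stands in for the network here; everything below is illustrative.

```python
import random

# Two models with different random starts, trained on the SAME data
# (targets generated from w=3, b=1), converge to nearly identical weights.
random.seed(0)
data = [(x, 3.0 * x + 1.0) for x in range(-10, 11)]

def train(w, b, lr=0.01, epochs=500):
    n = len(data)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            err = (w * x + b) - y          # prediction error on one sample
            gw += 2 * err * x / n          # gradient of mean squared error wrt w
            gb += 2 * err / n              # gradient wrt b
        w -= lr * gw
        b -= lr * gb
    return w, b

w1, b1 = train(random.uniform(-1, 1), random.uniform(-1, 1))
w2, b2 = train(random.uniform(-1, 1), random.uniform(-1, 1))
print(round(w1, 3), round(b1, 3))          # both runs recover ~(3.0, 1.0)
print(abs(w1 - w2) < 1e-3, abs(b1 - b2) < 1e-3)
```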
Handcrafted evaluations are just degenerate one-node neural networks, so option 3 would essentially ban just about any programmer from tuning their engine with data from outside their own engine. You would be banning a lot of free and commercial engines from the past 20 years, because they used CCRL data that had some Crafty vs Fruit games, or Lichess games with a Stockfish bot vs another Stockfish bot mixed in among the human games. That is nowhere near ideal.
While I'd agree that the distinction between neural networks and linear models isn't super clear, HCEs definitely exhibit far more "character", dependent solely upon the respective author's chosen evaluation terms. If everyone were using neural networks as varied in topology as evaluation functions are in their terms and layout, there would be no issue with everyone using exactly the same data. But as of today, we have a bunch of engines copying the halfkp architecture and code from Stockfish. It's subjective, but copy both the data and the NN code+architecture from another engine and you've really gone too far, imho (what's the point of even training your own net at that point?). This is the stance TCEC has more or less adopted, and it seems a reasonable compromise to me.
Ultimately, do whatever you want (besides being dishonest/misleading or violating licenses) and have fun with computer chess, but don't whine if tournaments and rating lists aren't interested in your engine if you're taking so much from another engine.
I think people really overestimate the difficulty of achieving option 3. Generating the ~100 million labeled positions from fixed low-depth games needed to basically saturate a 768->512->1 network, and even training a reasonable halfkp net, takes maybe two days on modern hardware. Worst case, if your hardware is ancient, game generation will take a week of your time.
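For scale, a forward pass of the small 768->512->1 topology mentioned above can be sketched in a few lines. The random weights, ReLU activation, and feature ordering (12 piece types x 64 squares) are assumptions for illustration; a real engine would load trained weights and update the hidden layer incrementally.

```python
import random

# 768 -> 512 -> 1 fully connected network with random weights (illustrative).
random.seed(1)
IN, HID = 768, 512
hidden_w = [[random.uniform(-0.1, 0.1) for _ in range(IN)] for _ in range(HID)]
hidden_b = [0.0] * HID
out_w = [random.uniform(-0.1, 0.1) for _ in range(HID)]
out_b = 0.0

def forward(x):
    # hidden layer with ReLU, then a single linear output neuron
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    return sum(w * h for w, h in zip(out_w, hidden)) + out_b

# sparse input: only occupied piece-square features are 1
# (assumed encoding: white pawn on e2, black pawn on e7)
x = [0.0] * IN
for idx in (0 * 64 + 12, 6 * 64 + 52):
    x[idx] = 1.0
score = forward(x)
print(isinstance(score, float))
```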
connor_mcmonigle wrote: ↑Sat Jul 03, 2021 4:36 am
I would rather loosen the requirements on the source of data to option 2, and require the inference code, and ideally the neural network architecture as well, to be written from scratch by the engine author -- rather than the other way around, as it currently is on rating lists and at tournaments. This also follows the tradition of handcrafted evaluations: they tend to be written from scratch, but are allowed to be tuned using whatever data their authors want.
Okay, I take your word for it.
So far (emphasis added), my conclusion is that (your) theory and practice collide. In HCE, when you make a minor change in the evaluation, the move similarity will be in the 90%+ area. Not so with NNUE: a minor change in the network will cause 40-55% move similarity. Hence, at the time, I declared SIMEX dead for NNUE.
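The move-similarity figures quoted above are just agreement percentages over a shared set of test positions, which can be sketched directly. The move lists here are made-up stand-ins for two engines' outputs, not real SIMEX data.

```python
# SIMEX-style move similarity: the percentage of positions on which
# two engines choose the same move. Inputs are illustrative.
def similarity(moves_a, moves_b):
    """moves_a, moves_b: best moves of two engines on the same position set."""
    assert len(moves_a) == len(moves_b)
    same = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * same / len(moves_a)

engine1 = ["e2e4", "g1f3", "d2d4", "c2c4", "b1c3"]
engine2 = ["e2e4", "g1f3", "d2d4", "g2g3", "b1c3"]
print(similarity(engine1, engine2))  # 4 of 5 moves agree -> 80.0
```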
Perhaps my theory is wrong. But the basic premise is clear, I feel. If I have everyone on talkchess draw me a picture of a rabbit, I will get hundreds of different pictures. None of them will be the same, and there will be sufficient differences that no one would claim one image was made in the likeness of another; however, all the drawings would share similar features. Not just the fact that they are rabbits, but basic things: everyone would probably draw the ears upright and shade the inside. The more nuanced things would differ.
Is this similar? I'm positing that it is, but as I admit, my theory is only as good as my presuppositions.
connor_mcmonigle wrote: ↑Sat Jul 03, 2021 4:47 am
On my 8-core i7 and GTX 1060, here are the figures:
- data generation for 1B "good enough" positions (d5 self-play, qsearch leaf of the PV move, d8 to d12 search depending on game phase): 3 weeks
- training (around 300 epochs of 100M fens, current Minic net topology): a week
- testing (every net saved at each epoch is tested against the previous master net): a week (this generates this kind of picture: https://github.com/tryingsomestuff/NNUE ... losity.png)
So basically, 1 net == 1 month for Minic.
Fortunately, I'm able to rent some extra hardware in the cloud, which generates data from the latest version of Minic while my own computer trains and tests.
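As a sanity check, the figures above imply a rough generation throughput (simple arithmetic on the numbers reported: 1B positions in 3 weeks on 8 cores):

```python
# Throughput implied by the reported figures: 1e9 positions in 3 weeks.
positions = 1_000_000_000
seconds = 3 * 7 * 24 * 3600       # three weeks in seconds
per_second = positions / seconds
print(round(per_second))           # ~551 positions/second overall
print(round(per_second / 8))       # ~69 positions/second per core
```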