connor_mcmonigle wrote: ↑Thu Feb 11, 2021 6:45 pm
The culmination of all your efforts implementing the underlying tools to train a network is the ability to "press go" and then sit back and watch as your training tools produce networks for you. Pressing go is the easiest part in all of this! Sure, you'll end up having to fiddle/experiment with some hyperparameters and likely "press go" several times to produce a strong network. This requires some perseverance, but using the tools is the easy part and that's all Albert has done to the extent that I am aware.
The effort invested by Albert in training a network using tools written by other people is not remotely comparable to the effort invested by those having written the training tools, let alone the effort invested by those developing an engine such as Stockfish.
As such, I'm firmly of the opinion that CCRL should not list FF2 as a separate engine. It should be listed as what it is: a version of Stockfish with the parameters of the evaluation function changed. Listing FF2 as a separate engine misinforms users of CCRL's data, falsely leading them to believe that FF2 is a separate engine.
I have a somewhat different perspective.
Tools:
I’ve written tools to train networks, generate data, etc., in chess and other domains. My latest experiment for Bad Gyal took me about 30 minutes to write with PyTorch Lightning; most of the effort went into writing a dataset class to transform the raw data into tensors.
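To make that concrete, here's a minimal sketch of the sort of thing I mean (not my actual Bad Gyal code): a Dataset that turns raw (position, score) records into tensors, plus a toy Lightning module. The encoding helper, feature size, and loss are placeholder choices for illustration.

```python
# Minimal sketch: dataset class + tiny Lightning module.
# encode_position() is a hypothetical helper (e.g. fen -> feature tensor);
# feature size, architecture and loss are arbitrary placeholder choices.
import torch
from torch.utils.data import Dataset
import pytorch_lightning as pl

class EvalDataset(Dataset):
    """Turns raw (position, score) records into training tensors."""
    def __init__(self, records, encode_position):
        self.records = records
        self.encode = encode_position

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        position, score = self.records[idx]
        x = self.encode(position)                       # input features
        y = torch.tensor([score], dtype=torch.float32)  # eval target
        return x, y

class EvalNet(pl.LightningModule):
    """Tiny value head: features -> scalar eval."""
    def __init__(self, n_features=768, hidden=256, lr=1e-3):
        super().__init__()
        self.save_hyperparameters()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_features, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)
```

After that, "pressing go" is essentially pl.Trainer(max_epochs=...).fit(model, DataLoader(dataset, batch_size=...)).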
The lc0 and sf teams had a harder time of things as they were trying to reimplement something (AlphaZero and nodchip, respectively) rather than create something new from scratch.
Data:
Generating data is expensive, and if you don’t capture everything you want the first time, you may have to go back and regenerate it. That’s why lots of people use existing data: t60, sfen datasets from the Stockfish project, CCRL game data, etc. But if you want new training targets and/or inputs, or a different kind of data (say, from an MCTS/NN engine rather than an AB engine), you can’t get around generating your own.
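For a feel of what "generating your own" involves, here's a rough sketch that labels positions with a UCI engine via python-chess. The engine path, search depth, and random-move sampling are arbitrary illustration choices, not anyone's actual recipe.

```python
# Rough sketch of labelling self-generated positions with a UCI engine
# through python-chess. Engine path, depth and the random-move sampling
# are placeholder choices for illustration only.
import random
import chess
import chess.engine

def generate_records(n_games=10, max_plies=120, depth=8, engine_path="stockfish"):
    records = []  # (fen, score_in_centipawns) pairs
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    try:
        for _ in range(n_games):
            board = chess.Board()
            for _ in range(max_plies):
                if board.is_game_over():
                    break
                info = engine.analyse(board, chess.engine.Limit(depth=depth))
                score = info["score"].pov(board.turn).score(mate_score=32000)
                records.append((board.fen(), score))
                board.push(random.choice(list(board.legal_moves)))  # diversify positions
    finally:
        engine.quit()
    return records
```

Scale a loop like that to hundreds of millions of positions at real search depths and the expense (and the pain of regenerating after a change of mind) becomes obvious.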
Engines:
An engine has to be fast enough to run reasonable tests, yet flexible enough to try new net types. Lc0 supports many backends optimized for speed, and changing them to support a new net type has been somewhat expensive. SF seems a little more flexible in this regard, so it may have an advantage there. I have moved to TorchScript for my backend: it’s reasonably fast, and the networks can be a black box aside from their inputs and outputs.
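As a sketch of what I mean by a black box, assuming nothing beyond stock PyTorch: the training side exports a TorchScript file once, and the engine side only needs torch.jit.load plus the input and output shapes.

```python
# Sketch of the TorchScript "black box" idea: export once, and the
# engine-side code only needs the input and output shapes.
import torch

# Any trained nn.Module will do; a tiny stand-in network here.
model = torch.nn.Sequential(
    torch.nn.Linear(768, 256), torch.nn.ReLU(), torch.nn.Linear(256, 1)
).eval()

example = torch.zeros(1, 768)               # dummy input with the right shape
scripted = torch.jit.trace(model, example)  # record the forward pass
scripted.save("eval.pt")

# Engine side: load and call, no knowledge of the architecture needed.
backend = torch.jit.load("eval.pt")
with torch.no_grad():
    value = backend(torch.zeros(1, 768))    # -> tensor of shape (1, 1)
```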
A simple MCTS is easier to write than a simple AB search, in my experience, given all the search tricks and techniques you have to layer on in the AB case. Most of the complexity in the MCTS case has to do with the GPU, i.e. batching network evaluations to keep it fed.
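For comparison, here is roughly what I'd call a "simple MCTS": a bare-bones PUCT loop over python-chess with a stand-in evaluation function. All of the awkward GPU batching is left out, which is exactly the point.

```python
# Bare-bones PUCT-style MCTS over python-chess, with a stand-in evaluate()
# returning (uniform policy, value). The GPU batching that makes real
# implementations messy is deliberately omitted.
import math
import chess

C_PUCT = 1.5

class Node:
    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}   # move -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def evaluate(board):
    """Stand-in for a network call: uniform policy, drawish value."""
    moves = list(board.legal_moves)
    policy = {m: 1.0 / len(moves) for m in moves} if moves else {}
    return policy, 0.0

def select_child(node):
    total = math.sqrt(node.visits)
    def puct(item):
        move, child = item
        u = C_PUCT * child.prior * total / (1 + child.visits)
        return -child.value() + u   # child's value is from the opponent's view
    return max(node.children.items(), key=puct)

def mcts(board, n_simulations=200):
    root = Node(prior=1.0)
    policy, _ = evaluate(board)
    for move, p in policy.items():
        root.children[move] = Node(prior=p)

    for _ in range(n_simulations):
        node, scratch, path = root, board.copy(), [root]
        while node.children:
            move, node = select_child(node)
            scratch.push(move)
            path.append(node)

        if scratch.is_game_over():
            result = scratch.result()
            value = 0.0 if result == "1/2-1/2" else -1.0  # side to move has lost
        else:
            policy, value = evaluate(scratch)
            for move, p in policy.items():
                node.children[move] = Node(prior=p)

        # Backpropagate, flipping the sign at each ply.
        for n in reversed(path):
            n.visits += 1
            n.value_sum += value
            value = -value

    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

mcts(chess.Board()) picks the most-visited root move; in a real engine the evaluate() call is where all the batching and GPU plumbing ends up.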
Training:
Training NNs continues to be more of an engineering discipline than a science. Tweaking hyperparameters and trying new RL data-generation schemes are driven by experience, guesswork, and expensive, time-consuming experiments.
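In practice the "tweaking" often just looks like a grid of full training runs. This toy sweep (random data, arbitrary values) only shows the shape of the loop; each call stands in for an experiment that really costs hours or days.

```python
# Toy hyperparameter sweep: each train_once() call stands in for a full,
# expensive training run. Data, values and metric are placeholders.
import itertools
import torch
from torch.utils.data import DataLoader, TensorDataset

def train_once(lr, hidden, epochs=3):
    x = torch.randn(4096, 768)     # stand-in data
    y = torch.randn(4096, 1)
    loader = DataLoader(TensorDataset(x, y), batch_size=256, shuffle=True)
    model = torch.nn.Sequential(
        torch.nn.Linear(768, hidden), torch.nn.ReLU(), torch.nn.Linear(hidden, 1)
    )
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(model(xb), yb)
            loss.backward()
            opt.step()
    return loss.item()

results = {}
for lr, hidden in itertools.product([1e-2, 1e-3, 1e-4], [128, 256, 512]):
    results[(lr, hidden)] = train_once(lr, hidden)
print(min(results, key=results.get))   # "best" setting by final loss
```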
Conclusion:
I’ve found training, with all its tweaking and failures, to be the hardest and most time-consuming part. For evidence, just go through the last few years of the Leela Chess Discord.