Update on my earlier post:
Alright, I have spent the majority of my afternoon and evening finishing the training algorithm, and I am finally done.
The files can all be found here:
https://github.com/BimmerBass/Loki/tree/neural/Loki/lnn
The nice thing about it, IMO, is that it is completely independent of Loki. This means that the whole subdirectory can simply be copied and inserted into another engine. The only changes needed are the incremental updates in make/unmake move, the initialization/UCI setup, and the insertion into the existing evaluation function. All of this is already done in Loki, so it should be really easy for others to implement.
With all that said, I do not think this network will be state-of-the-art in any way, and the primary reason for making it engine-independent was the challenge.
Now, on to the final implementation.
The implementation is rather simple compared to other deep learning models.
The network itself:
The network is a normal multilayer perceptron with the architecture 768x256x32x32x1. This was chosen fairly arbitrarily as a kind of mix of Halogen's and SF NNUE's architectures; I think it is a good fit since it keeps a reasonable amount of complexity without having thousands of inputs like SF NNUE. The activation function of the hidden layers is a simple ReLU (R(x) = max(0, x)), and the output has no activation function.
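To make the structure concrete, here is a minimal, hypothetical C++ sketch of the forward pass. This is not the actual Loki code; the names and types are made up for illustration, and the real implementation keeps the first layer's output in an accumulator and updates it incrementally, but the math is the same:

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch of one dense layer and the 768x256x32x32x1 forward pass.
struct Layer {
    std::vector<std::vector<double>> weights; // [outputs][inputs]
    std::vector<double> biases;               // [outputs]
};

std::vector<double> propagate(const Layer& l, const std::vector<double>& in, bool relu) {
    std::vector<double> out(l.biases);
    for (std::size_t o = 0; o < out.size(); ++o) {
        for (std::size_t i = 0; i < in.size(); ++i)
            out[o] += l.weights[o][i] * in[i];
        if (relu)
            out[o] = std::max(0.0, out[o]); // R(x) = max(0, x)
    }
    return out;
}

// 768 binary inputs (presumably one per piece/square combination) -> 256 -> 32 -> 32 -> 1
double evaluate(const Layer net[4], const std::vector<double>& input768) {
    std::vector<double> x = propagate(net[0], input768, true);
    x = propagate(net[1], x, true);
    x = propagate(net[2], x, true);
    x = propagate(net[3], x, false); // no activation on the output neuron
    return x[0];
}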
The networks can be read from either a CSV file or a binary file with a ".lnn" extension. Both file formats are written by the training framework I have developed.
Regarding the use of the network in an engine, it is meant to be used in the following way:
- At startup, the network will be initialized with all parameters set to zero. This is done in the constructor, so no extra methods have to be called. One thing to note is that for multithreaded engines, the network needs to live inside the engine's thread class, since a shared instance would risk being partly overwritten while forward propagating. Additionally, use of the network should be disabled at engine startup.
- A UCI function like "Use_LNN" should be implemented, taking a file path as its parameter. At the moment, the network implementation aborts the program's execution if the file can't be opened or doesn't exist, but this will probably be changed in the near future to returning a boolean value indicating whether the file read was successful or not.
- When using the network, the initial position given by the GUI should be loaded into the network before searching. Then, when making and unmaking moves, the do_incremental and undo_incremental methods should be called from those functions. Note: the incremental updates are heavily inspired by Halogen's implementation.
- The network's evaluation can be used in the evaluation function however the user likes. I think a proper way to do it is to just take the network's score, add a tempo bonus, and scale by the fifty-move rule; a small sketch of this integration follows below.
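To show where these hooks would go, here is a small hypothetical skeleton. Only do_incremental and undo_incremental are names from the actual network class; the surrounding types, the tempo value and the scaling formula are just illustrative:

struct Move { int from = 0, to = 0; };

// Made-up skeleton showing where the network calls would live in an engine.
struct Network {
    void do_incremental(const Move&) {}   // add/remove the changed input features
    void undo_incremental(const Move&) {} // roll the accumulator back
    int  forward() const { return 0; }    // full propagation through the hidden layers
};

struct Board {
    int fifty = 0;                 // halfmove clock for the fifty-move rule
    void do_move(const Move&) {}
    void undo_move(const Move&) {}
};

struct SearchThread {
    Board   board;
    Network net;            // one network per thread to avoid data races
    bool    use_lnn = false;

    void make_move(const Move& m)   { board.do_move(m);   if (use_lnn) net.do_incremental(m); }
    void unmake_move(const Move& m) { board.undo_move(m); if (use_lnn) net.undo_incremental(m); }

    int evaluate() const {
        const int tempo = 10;                     // arbitrary tempo bonus in centipawns
        const int score = net.forward() + tempo;
        return score * (100 - board.fifty) / 100; // scale towards zero as a draw approaches
    }
};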
The training framework:
This is also quite simple at the moment. It works in the following way:
- I initialize a Trainer class which inherits from the Network class. In the constructor, a CSV data file is loaded which, for each row, holds 768 boolean values as the position representation and a score. All hyperparameters are also set up here. These include:
- As said, a path to a file with the training data.
- The number of epochs to run the algorithm for.
- The batch size to use.
- The loss function to use. There are currently two options: mean squared error, 1/n * sum((a_i - y_i)^2), and average absolute error, 1/n * sum(|a_i - y_i|).
- The number of threads to use (default: 1).
- The initial learning rate (default: 0.01).
- The learning rate decay factor (default: 0.0001).
- The minimum and maximum values that weights and biases can be initialized to if the network is randomly initialized (default: +/- 2).
- The output file format (default: Binary).
- The output file path and name (default: the same directory as the executable, with the name "LokiNet-{date and time}.csv/.lnn").
- After the trainer has been initialized, the method Trainer::run() can be executed; it can take an existing network file path as input and train that network further. If none is given, the network will be randomly initialized. The dataset will then be divided into batches, and a vector of gradient containers will be set up, one per thread.
- During each epoch, all the batches are run. When a batch is loaded, it is distributed to the different threads, which each calculate the average gradients of their own sub-batch. The average of all these gradients is then calculated, and the weights are updated. For the weight update, I have implemented a standard Adam optimization (beta_one = 0.9, beta_two = 0.999 and epsilon = 1e-10); a sketch of the update is shown below.
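For reference, here is a hypothetical sketch of the per-parameter Adam step with the hyperparameters mentioned above, plus the mean squared error loss from the list; the actual trainer applies this update to every weight and bias after averaging the gradients over all threads:

#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-parameter Adam update (beta_one = 0.9, beta_two = 0.999, epsilon = 1e-10).
struct AdamState { double m = 0.0, v = 0.0; };

double adam_step(double param, double grad, AdamState& s, long t, double lr) {
    const double beta1 = 0.9, beta2 = 0.999, eps = 1e-10;
    s.m = beta1 * s.m + (1.0 - beta1) * grad;              // first moment (mean of gradients)
    s.v = beta2 * s.v + (1.0 - beta2) * grad * grad;       // second moment (uncentered variance)
    const double m_hat = s.m / (1.0 - std::pow(beta1, t)); // bias correction, t starts at 1
    const double v_hat = s.v / (1.0 - std::pow(beta2, t));
    return param - lr * m_hat / (std::sqrt(v_hat) + eps);
}

// Mean squared error over a batch: 1/n * sum((a_i - y_i)^2)
double mse(const std::vector<double>& a, const std::vector<double>& y) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += (a[i] - y[i]) * (a[i] - y[i]);
    return s / static_cast<double>(a.size());
}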
Additionally, I have tested that the trainer will overfit on small datasets, but only when Adam is not used. With Adam, the model goes straight past the minimum on small datasets, but on bigger ones, convergence is much faster.
Data generation:
I have also created a rather simple training data generator. At the moment, it takes in a file of FENs and evaluates them statically; a rough outline is sketched below. This will be improved. Right now, my main goal is to get the LNN to Loki's current strength. I think I will use data from zurichess, Ethereal and lichess to do this, which should hopefully give me around 200M positions.
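Roughly, the current generator amounts to something like the following; static_evaluate and encode_768 are placeholders for engine-specific code, not real function names:

#include <fstream>
#include <string>

// Hypothetical outline: read FENs, score them with the static evaluation,
// and emit one "768 features + score" CSV row per position.
extern int static_evaluate(const std::string& fen);    // placeholder for the engine's eval
extern std::string encode_768(const std::string& fen); // placeholder: "0,1,0,..." (768 fields)

void generate(const std::string& fen_file, const std::string& out_file) {
    std::ifstream in(fen_file);
    std::ofstream out(out_file);
    std::string fen;
    while (std::getline(in, fen)) {
        if (fen.empty()) continue;
        out << encode_768(fen) << ',' << static_evaluate(fen) << '\n';
    }
}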
To-do:
- Modify UCI such that the trainer can be started with a simple command.
- Modify the UCI such that the data generator can be started with a simple command.
- Improve the data generation. The following additions will be tried:
- Self-play data generation.
- Scoring of the data by using the search function instead of the evaluation (I don't want to just copy Loki's HCE into my NN).
- Data generation from play against other engines. This will probably be the last thing I implement though.
- Find a way to automatically determine the learning rate for the network being trained.
- Instead of running for a fixed number of epochs, make it possible to set up convergence criteria that stop the algorithm automatically.
- Implement validation split and test-set usage such that the model can be evaluated properly during training.
- Try to add quantization. Quantization means converting all weights and biases to integers in the hope of speeding the network up very significantly with some loss of accuracy (or at least, so I think... please tell me if I'm wrong); a rough illustration follows after this list.
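On that last point, my understanding of the idea is roughly the following made-up illustration (nothing is implemented yet): scale each floating-point weight by a constant, round to a small integer type, and do inference with integer arithmetic, dividing the scale back out of the activations at the end.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Made-up illustration of post-training quantization: floats are scaled and
// rounded to 16-bit integers so inference can use (SIMD-friendly) integer math.
std::vector<int16_t> quantize(const std::vector<double>& weights, double scale = 64.0) {
    std::vector<int16_t> q(weights.size());
    for (std::size_t i = 0; i < weights.size(); ++i) {
        const double w = std::clamp(weights[i] * scale, -32768.0, 32767.0); // avoid int16 overflow
        q[i] = static_cast<int16_t>(std::lround(w));
    }
    return q;
}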
I am very happy with my implementation so far.