Update on my earlier post:
Alright, I have spent the majority of my afternoon and evening finishing the training algorithm, and I am finally done.
The files can all be found here:
https://github.com/BimmerBass/Loki/tree/neural/Loki/lnn
The nice thing about it, IMO, is that it is completely independent of Loki. This means that the whole subdirectory can simply be copied and inserted into another engine. The only changes needed are the incremental updates in make/unmake move, the initialization/UCI setup, and the insertion into the existing evaluation function. All of this is already done in Loki, so it should be really easy for others to implement.
With all that said, I do not think this network will be state-of-the-art in any way, and the primary reason for making it engine-independent was the challenge.
Now, on to the final implementation.
The implementation is rather simple compared to other deep learning models.
The network itself:
The network is a normal multilayer perceptron with the architecture 768x256x32x32x1. This was chosen fairly arbitrarily as a kind of mix of Halogen's and SF NNUE's architectures; I think it is a good fit since it keeps a reasonable amount of complexity without having thousands of inputs like SF NNUE. The activation function of the hidden layers is a simple ReLU (R(x) = max(0, x)), and the output has no activation function.
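To make the structure concrete, here is a minimal, hypothetical C++ sketch of the forward pass. This is not the actual Loki code; the names and types are made up for illustration, and the real implementation keeps the first layer's output in an accumulator and updates it incrementally, but the math is the same:

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch of one dense layer and the 768x256x32x32x1 forward pass.
struct Layer {
    std::vector<std::vector<double>> weights; // [outputs][inputs]
    std::vector<double> biases;               // [outputs]
};

std::vector<double> propagate(const Layer& l, const std::vector<double>& in, bool relu) {
    std::vector<double> out(l.biases);
    for (std::size_t o = 0; o < out.size(); ++o) {
        for (std::size_t i = 0; i < in.size(); ++i)
            out[o] += l.weights[o][i] * in[i];
        if (relu)
            out[o] = std::max(0.0, out[o]); // R(x) = max(0, x)
    }
    return out;
}

// 768 binary inputs (presumably one per piece/square combination) -> 256 -> 32 -> 32 -> 1
double evaluate(const Layer net[4], const std::vector<double>& input768) {
    std::vector<double> x = propagate(net[0], input768, true);
    x = propagate(net[1], x, true);
    x = propagate(net[2], x, true);
    x = propagate(net[3], x, false); // no activation on the output neuron
    return x[0];
}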
The networks can be read from either a CSV file or a binary file with a ".lnn" extension. Both file formats are written by the training framework I have developed.
Regarding the use of the network in an engine, it is meant to be used in the following way:
- At startup, the network will be initialized with all parameters set to zero. This is done in the constructor, so no extra methods have to be called. One thing to note is that for multithreaded engines, the network needs to live inside the engine's thread class, since a shared instance would risk being partly overwritten while forward propagating. Additionally, use of the network should be disabled at engine startup.
- A UCI function like "Use_LNN" should be implemented, taking a file path as its parameter. At the moment, the network implementation aborts the program's execution if the file can't be opened or doesn't exist, but this will probably be changed in the near future to returning a boolean value indicating whether the file read was successful or not.
- When using the network, the initial position given by the GUI should be loaded into the network before searching. Then, when making and unmaking moves, the do_incremental and undo_incremental methods should be called from those functions. Note: the incremental updates are heavily inspired by Halogen's implementation.
- The network's evaluation can be used in the evaluation function however the user likes. I think a proper way to do it is to just take the network's score, add a tempo bonus, and scale by the fifty-move rule; a small sketch of this integration follows below.
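To show where these hooks would go, here is a small hypothetical skeleton. Only do_incremental and undo_incremental are names from the actual network class; the surrounding types, the tempo value and the scaling formula are just illustrative:

struct Move { int from = 0, to = 0; };

// Made-up skeleton showing where the network calls would live in an engine.
struct Network {
    void do_incremental(const Move&) {}   // add/remove the changed input features
    void undo_incremental(const Move&) {} // roll the accumulator back
    int  forward() const { return 0; }    // full propagation through the hidden layers
};

struct Board {
    int fifty = 0;                 // halfmove clock for the fifty-move rule
    void do_move(const Move&) {}
    void undo_move(const Move&) {}
};

struct SearchThread {
    Board   board;
    Network net;            // one network per thread to avoid data races
    bool    use_lnn = false;

    void make_move(const Move& m)   { board.do_move(m);   if (use_lnn) net.do_incremental(m); }
    void unmake_move(const Move& m) { board.undo_move(m); if (use_lnn) net.undo_incremental(m); }

    int evaluate() const {
        const int tempo = 10;                     // arbitrary tempo bonus in centipawns
        const int score = net.forward() + tempo;
        return score * (100 - board.fifty) / 100; // scale towards zero as a draw approaches
    }
};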
The training framework:
This is also quite simple at the moment. It works in the following way:
- I initialize a Trainer class which inherits from the Network class. In the constructor, a CSV data file is loaded which, for each row, holds 768 boolean values as the position representation and a score. All hyperparameters are also set up here. These include:
- As said, a path to a file with the training data.
- The number of epochs to run the algorithm for.
- The batch size to use.
- The loss function to use. There are currently two options: mean squared error, 1/n * sum((a_i - y_i)^2), and average absolute error, 1/n * sum(|a_i - y_i|).
- The number of threads to use (default: 1).
- The initial learning rate (default: 0.01).
- The learning rate decay factor (default: 0.0001).
- The minimum and maximum values that weights and biases can be initialized to if the network is randomly initialized (default: +/- 2).
- The output file format (default: Binary).
- The output file path and name (default: the same directory as the executable, with the name "LokiNet-{date and time}.csv/.lnn").
- After the trainer has been initialized, the method Trainer::run() can be executed; it can take an existing network file path as input and train that network further. If none is given, the network will be randomly initialized. The dataset will then be divided into batches, and a vector of gradient containers will be set up, one per thread.
- During each epoch, all the batches are run. When a batch is loaded, it is distributed to the different threads, which each calculate the average gradients of their own sub-batch. The average of all these gradients is then calculated, and the weights are updated. For the weight update, I have implemented a standard Adam optimization (beta_one = 0.9, beta_two = 0.999 and epsilon = 1e-10); a sketch of the update is shown below.
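For reference, here is a hypothetical sketch of the per-parameter Adam step with the hyperparameters mentioned above, plus the mean squared error loss from the list; the actual trainer applies this update to every weight and bias after averaging the gradients over all threads:

#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-parameter Adam update (beta_one = 0.9, beta_two = 0.999, epsilon = 1e-10).
struct AdamState { double m = 0.0, v = 0.0; };

double adam_step(double param, double grad, AdamState& s, long t, double lr) {
    const double beta1 = 0.9, beta2 = 0.999, eps = 1e-10;
    s.m = beta1 * s.m + (1.0 - beta1) * grad;              // first moment (mean of gradients)
    s.v = beta2 * s.v + (1.0 - beta2) * grad * grad;       // second moment (uncentered variance)
    const double m_hat = s.m / (1.0 - std::pow(beta1, t)); // bias correction, t starts at 1
    const double v_hat = s.v / (1.0 - std::pow(beta2, t));
    return param - lr * m_hat / (std::sqrt(v_hat) + eps);
}

// Mean squared error over a batch: 1/n * sum((a_i - y_i)^2)
double mse(const std::vector<double>& a, const std::vector<double>& y) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += (a[i] - y[i]) * (a[i] - y[i]);
    return s / static_cast<double>(a.size());
}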
Additionally, I have tested that the trainer will overfit on small datasets, but only when Adam is not used. With Adam, the model goes straight past the minimum on small datasets, but on bigger ones, convergence is much faster.
Data generation:
I have also created a rather simple training data generator. At the moment, it takes in a file of FENs and evaluates them statically; a rough outline is sketched below. This will be improved. Right now, my main goal is to get the LNN to Loki's current strength. I think I will use data from zurichess, Ethereal and lichess to do this, which should hopefully give me around 200M positions.
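Roughly, the current generator amounts to something like the following; static_evaluate and encode_768 are placeholders for engine-specific code, not real function names:

#include <fstream>
#include <string>

// Hypothetical outline: read FENs, score them with the static evaluation,
// and emit one "768 features + score" CSV row per position.
extern int static_evaluate(const std::string& fen);    // placeholder for the engine's eval
extern std::string encode_768(const std::string& fen); // placeholder: "0,1,0,..." (768 fields)

void generate(const std::string& fen_file, const std::string& out_file) {
    std::ifstream in(fen_file);
    std::ofstream out(out_file);
    std::string fen;
    while (std::getline(in, fen)) {
        if (fen.empty()) continue;
        out << encode_768(fen) << ',' << static_evaluate(fen) << '\n';
    }
}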
To-do:
- Modify UCI such that the trainer can be started with a simple command.
- Modify the UCI such that the data generator can be started with a simple command.
- Improve the data generation. The following additions will be tried:
- Self-play data generation.
- Scoring of the data by using the search function instead of the evaluation (I don't want to just copy Loki's HCE into my NN).
- Data generation from play against other engines. This will probably be the last thing I implement though.
- Find a way to automatically determine the learning rate for the network being trained.
- Instead of running for a fixed number of epochs, make it possible to set up convergence criteria that stop the algorithm automatically.
- Implement validation split and test-set usage such that the model can be evaluated properly during training.
- Try to add quantization. Quantization means converting all weights and biases to integers in the hope of speeding the network up very significantly with some loss of accuracy (or at least, so I think... please tell me if I'm wrong); a rough illustration follows after this list.
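On that last point, my understanding of the idea is roughly the following made-up illustration (nothing is implemented yet): scale each floating-point weight by a constant, round to a small integer type, and do inference with integer arithmetic, dividing the scale back out of the activations at the end.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Made-up illustration of post-training quantization: floats are scaled and
// rounded to 16-bit integers so inference can use (SIMD-friendly) integer math.
std::vector<int16_t> quantize(const std::vector<double>& weights, double scale = 64.0) {
    std::vector<int16_t> q(weights.size());
    for (std::size_t i = 0; i < weights.size(); ++i) {
        const double w = std::clamp(weights[i] * scale, -32768.0, 32767.0); // avoid int16 overflow
        q[i] = static_cast<int16_t>(std::lround(w));
    }
    return q;
}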
I am very happy with my implementation so far.