I have been going through the code of Senpai 2.0 recently. I noticed that its evaluation function is completely parameterized: each evaluation component's score is a feature value times a weight, and the component scores are summed to get the final evaluation. Fabien mentioned that the evaluation weights were trained with logistic regression. I am not sure exactly how he did this, so I tried a fairly brute-force approach: run a linear regression directly on some training data and see how it performs. The training data is generated automatically by the NNUE data-generation algorithm. I use two training sets, one with 100 million examples and the other with 1 billion.
Featurization
To compute a feature vector from a given position in Senpai, I had to modify the original evaluation code.
First, to extract the feature values, I have to postpone the dot product between the feature vector and the weight vector. The evaluation interpolates between middlegame and endgame scores according to the material phase, like this:
Code: Select all
eval = \alpha * (\phi \dot w_mg) + (1 - \alpha) * (\phi \dot w_eg)
Since the dot product is linear, this can be rearranged as:
Code: Select all
eval = ((\alpha * \phi) \dot w_mg) + (((1 - \alpha) * \phi) \dot w_eg)
= ((\alpha * \phi) \concat ((1 - \alpha) * \phi)) \dot (w_mg \concat w_eg)
If we set x = (\alpha * \phi) \concat ((1 - \alpha) * \phi) and w = w_mg \concat w_eg, then we obtain eval = x \dot w, which is a standard linear model. The original weights in Senpai form a 759 * 2 (N_dimension * N_game_phase) matrix; I simply flatten it into a one-dimensional vector of length 1518.
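To make the rearrangement concrete, here is a small numpy sketch. The dimension 759 is from the post; the random \phi, weights, and \alpha value are placeholders, not Senpai's actual values:

```python
import numpy as np

N_FEATURES = 759  # per-phase feature dimension mentioned in the post

def build_input(phi: np.ndarray, alpha: float) -> np.ndarray:
    """x = (alpha * phi) ++ ((1 - alpha) * phi), length 2 * 759 = 1518."""
    return np.concatenate([alpha * phi, (1.0 - alpha) * phi])

def flatten_weights(w_mg: np.ndarray, w_eg: np.ndarray) -> np.ndarray:
    """Flatten the 759 x 2 weight table into one vector w = w_mg ++ w_eg."""
    return np.concatenate([w_mg, w_eg])

# The phase-interpolated eval collapses into a single dot product:
# alpha * (phi . w_mg) + (1 - alpha) * (phi . w_eg) == x . w
rng = np.random.default_rng(0)
phi = rng.random(N_FEATURES)
w_mg, w_eg = rng.random(N_FEATURES), rng.random(N_FEATURES)
alpha = 0.7
x, w = build_input(phi, alpha), flatten_weights(w_mg, w_eg)
assert np.isclose(x @ w, alpha * (phi @ w_mg) + (1 - alpha) * (phi @ w_eg))
```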
Second, Senpai applies a reducing factor at the end of the evaluation (after the dot product between features and weights) to handle drawish positions. Since this is a purely post-processing step after the dot product, I ignored it in my feature vector. I think it could be integrated into the feature computation, but because it involves some division and may introduce additional floating-point rounding errors, I have left it out for now.
Training
The learning algorithm is simple: minimize the MSE with mini-batch gradient descent. I set the mini-batch size to 40,000, the learning rate to 0.1, and the number of epochs to 200. I also shuffle the data before the start of each epoch. No regularization is applied yet. So far, one epoch of training on the 100m set takes around 3 minutes, but one epoch on the 1b set takes around 30 minutes; the full 200 epochs on the 1b set took almost 4 days.
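The loop described above can be sketched as follows (a minimal numpy version with the post's hyperparameters as defaults; the function name and data layout are my own, and this makes no claim to match the exact update rule used):

```python
import numpy as np

def train(X, y, batch_size=40_000, lr=0.1, epochs=200, shuffle=True, seed=0):
    """Mini-batch gradient descent on the mean squared error.
    X: (n_samples, 1518) feature matrix, y: target scores."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # reshuffle at the start of each epoch (the post's setup)
        idx = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            err = X[b] @ w - y[b]                # predictions minus targets
            grad = 2.0 * X[b].T @ err / len(b)   # gradient of the MSE
            w -= lr * grad
    return w
```

Whether a fixed learning rate of 0.1 is stable depends on the scale of the features, so in practice the inputs may need normalization.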
The following training issues bothered me most:
- Training speed. 30 minutes for a single epoch is too slow in my opinion. Most of the time is spent on inference, which involves 3 steps: (1) decode the NNUE sfen structure into a Senpai position; (2) compute the feature vector; (3) take the dot product with the weights to get the predicted score. Obviously, recomputing the feature vector for every sfen in every epoch is wasteful, but dumping the features to a file takes a huge amount of disk space. I created a custom compressed file format to store the feature file, but it still takes more than 70GB to store the dumped features for the 100m training set. You can imagine that dumping the 1b set would take nearly 1TB.
- Data shuffling. Shuffling a huge training set is also painful. The original 1b training set is 40GB. To shuffle this file without consuming too much memory, I designed a somewhat odd shuffling algorithm: partition the original file into N chunks, shuffle each chunk first, then merge the chunks by repeatedly taking the front record of a randomly chosen chunk.
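The chunked shuffle can be sketched like this (in-memory lists stand in for the on-disk chunk files; note the merge is not a perfectly uniform shuffle, since two records from the same chunk keep their post-shuffle order relative to each other):

```python
import random

def external_shuffle(records, n_chunks=8, seed=0):
    """Chunked shuffle: split into N chunks, shuffle each chunk,
    then merge by popping the front of a randomly chosen chunk."""
    rng = random.Random(seed)
    chunks = [records[i::n_chunks] for i in range(n_chunks)]
    for c in chunks:
        rng.shuffle(c)
    # reverse each chunk so list.pop() removes its front in O(1)
    live = [c[::-1] for c in chunks if c]
    out = []
    while live:
        i = rng.randrange(len(live))  # pick a random non-empty chunk
        out.append(live[i].pop())
        if not live[i]:
            del live[i]
    return out
```

On disk the same idea works by streaming each chunk file sequentially, so only one record per chunk needs to be buffered in memory.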
The weights I trained are still not comparable to Senpai's original ones. I tested at a 40 moves/5 min time control, with 800 games each. There are three setups: the two training sets, with and without data shuffling between epochs (I did not shuffle the 1b set because that is too slow).
Code: Select all
Name                       Elo    +    -   games
senpai20                     0   10   10     800
senpai21_100m_unshuffled  -100   10   10     800
senpai21_100m_shuffled     -70   10   10     800
senpai21_1b_unshuffled     -90   10   10     800
Things I plan to try next:
- Add L1/L2 regularizers
- Tune the other hyperparameters
- Try other optimizers instead of plain mini-batch gradient descent
- Add data shuffling to the 1b training run
- Run a qsearch during inference (the NNUE trainer does this, though I am not sure why)