bullet trainer problems

Discussion of chess software programming and technical issues.

Moderators: hgm, chrisw, Rebel

jdart
Posts: 4379
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

bullet trainer problems

Post by jdart »

I have noticed that a number of strong engines now implement a two-layer NNUE: the first layer is a horizontally mirrored feature transformer with king buckets, followed by an activation step (usually SqrCReLU, though CReLU or other activations are possible), and the second layer is an affine transform with a single output, usually with piece-count-based output buckets.
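As a purely illustrative sketch of the second layer, here is how the bucketed output could be evaluated; the sizes, quantization constants, and bucket scheme below are placeholders, not any particular engine's values, and the final scaling is simplified:

```cpp
#include <algorithm>
#include <cstdint>

// Placeholder sizes and quantization constants, for illustration only.
constexpr int HIDDEN = 1024;      // feature transformer outputs per perspective
constexpr int OUTPUT_BUCKETS = 8; // piece-count based output buckets
constexpr int QA = 255;           // activation clipping/quantization range

// One common scheme: divide the 2..32 piece-count range evenly among buckets.
inline int outputBucket(int pieceCount) {
    return (pieceCount - 2) / ((32 + OUTPUT_BUCKETS - 1) / OUTPUT_BUCKETS);
}

// Second layer: SqrCReLU the two perspectives' accumulators, then take the
// dot product with the selected bucket's weight vector and add its bias.
// (Real engines fold the final scaling into engine-specific constants;
// here the sum is only divided back by QA.)
int32_t evaluate(const int16_t* stm,  // side-to-move accumulator
                 const int16_t* nstm, // other side's accumulator
                 const int16_t outW[OUTPUT_BUCKETS][2 * HIDDEN],
                 const int16_t outB[OUTPUT_BUCKETS],
                 int pieceCount) {
    const int b = outputBucket(pieceCount);
    int64_t sum = 0;
    for (int i = 0; i < HIDDEN; ++i) {
        int64_t a = std::clamp<int32_t>(stm[i], 0, QA);
        sum += a * a * outW[b][i]; // squared clipped activation times weight
    }
    for (int i = 0; i < HIDDEN; ++i) {
        int64_t a = std::clamp<int32_t>(nstm[i], 0, QA);
        sum += a * a * outW[b][HIDDEN + i];
    }
    return static_cast<int32_t>(sum / QA + outB[b]);
}
```

Note that this assumes each bucket's weights are stored contiguously (`outW[b]`), which is the form you want for fast inference.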

My attempts to get this architecture to work with Arasan have hit some problems. I could not get nnue-pytorch to work. I am sure using it is possible, but that code base is really set up for Stockfish networks and use for other architectures is not directly supported or documented. So lately I have been working with the bullet trainer (https://github.com/jw1912/bullet), which is set up for the type of architecture mentioned.

I have done a couple of experiments, one using the architecture here: https://github.com/jdart1/arasan-chess/tree/V3 (see nnparms.h for the parameters - these are just placeholders, subject to change) and another tailored to use the architecture in PlentyChess (https://github.com/Yoshie2000/PlentyChess) - see https://github.com/jdart1/arasan-chess/tree/test.

I am pretty sure the runtime code is correct. It passes the unit tests for the NNUE code, and if I load the PlentyChess network.bin file into an Arasan executable built with PlentyChess's network parameters, that executable passes the engine unit tests, produces a decent "bench" result, and scores well on the Arasan test suite.

But training my own network has not gone so well.

I have modified my selfplay position generator to output bullet-format data files. That generator was used to produce datasets for the current neural network architecture in Arasan. It excludes positions where a capture is the best move and positions with the side to move in check. I am pretty sure it is generating correctly formatted data for the trainer: the output passes bullet's validation tool, and when I modified the generator to also emit bullet's "text" format and ran the bullet converter on the text file, the converter's binary output was bit-for-bit identical to what my generator produces directly. I have generated about 800 million training positions. That is a relatively small number, I know, but I wanted to use that dataset for some testing before generating more.

Rust files for the two architectures are here: https://github.com/jdart1/bullet/blob/m ... rasanv3.rs and https://github.com/jdart1/bullet/blob/m ... /plenty.rs. These have some hacks because the Arasan build and test process is not quite the same as other engines'. Both files work with the trainer: I see the training error decrease, and it produces a network file that I calculate to be the right size for the architecture.

However, the resultant nets are not good. They do not even pass the engine unit tests, which include a few easy test positions that should be solved in about 3 ply (the test code gives the engine 10 ply). The scores do not look reasonable. For example, in the position 8/6pk/5pb1/7p/Q6P/2r1N3/5PP1/6K1 w - - White is up a piece, but the NNUE score from the PlentyChess-compatible network is actually negative.

There are a lot of places things can go wrong: getting bad or improperly formatted data into the training set, something wrong with the Rust code, network not being read correctly, network not being evaluated correctly. But as you can tell, I have tried to do testing along the way to rule out these issues. Maybe the 800 million position data set is just not large enough? Any suggestions for further testing would be appreciated. It would help if I had a known good training dataset to run through the trainer - that would help rule out problems with my own dataset.
JacquesRW
Posts: 113
Joined: Sat Jul 30, 2022 12:12 pm
Full name: Jamie Whiting

Re: bullet trainer problems

Post by JacquesRW »

jdart wrote: Thu Oct 17, 2024 11:30 pm I have noticed that a number of strong engines now implement a 2 layer NNUE [...]
I see you have output buckets, my bet would be that you aren't transposing the output weights and are producing garbage evals as a result. PlentyChess applies this transpose as a preprocessing step on the network itself (i.e. the network you have downloaded is not the original output from bullet).

BTW, there is a dedicated channel for bullet in the engine programming Discord server, which is a much more appropriate place for this than here.
jdart
Posts: 4379
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: bullet trainer problems

Post by jdart »

That sounds sensible. What is the "native" output format from bullet for an output layer with buckets?
Currently I assume:
(for each bucket)
weight vector for bucket
(for each bucket)
bias for bucket

The output size from the layer is 1, so the weights are basically vectors.
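In struct form, the layout assumed above would look like this (the sizes are placeholders, not Arasan's actual parameters):

```cpp
#include <cstdint>

// The layout assumed above, with placeholder sizes: each bucket's weight
// vector stored contiguously, followed by one bias per bucket.
constexpr int BUCKETS = 8; // output buckets
constexpr int INPUTS = 16; // weights per bucket (a real net uses 2 * hidden size)

struct AssumedOutputLayer {
    int16_t weights[BUCKETS][INPUTS]; // weight vector for each bucket
    int16_t biases[BUCKETS];          // bias for each bucket
};
```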
JacquesRW
Posts: 113
Joined: Sat Jul 30, 2022 12:12 pm
Full name: Jamie Whiting

Re: bullet trainer problems

Post by JacquesRW »

jdart wrote: Fri Oct 18, 2024 5:01 am What is the "native" output format from bullet for an output layer with buckets? [...]
So if you denote the `j`th weight of the `i`th bucket's weight vector by

Code: Select all

W[i][j]
then, for N buckets with M weights each, the weights are stored with the bucket index varying fastest:

Code: Select all

W[1][1], W[2][1], W[3][1], ..., W[N][1], W[1][2], W[2][2], ..., W[N][M]
These are followed by the biases

Code: Select all

B[1], B[2], ..., B[N]
one for each bucket.

As you want all the weights of a single bucket stored contiguously for fast CPU inference, you'll need to transpose the weight matrix `W[i][j]`. Generally I encourage people to do this at load time rather than pre-processing the network files.
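A load-time transpose along these lines might look like the following sketch; the dimensions are placeholders, and a real loader would use the layer's actual sizes:

```cpp
#include <cstdint>

// Placeholder dimensions, for illustration only.
constexpr int BUCKETS = 8; // number of output buckets (N above)
constexpr int INPUTS = 16; // weights per bucket (a real net uses 2 * hidden size)

// bullet stores the output weights with the bucket index varying fastest:
// raw[j * BUCKETS + i] holds W[i][j], the j-th weight of bucket i.
// Transpose at load time so each bucket's weights become contiguous:
// out[i * INPUTS + j] = W[i][j].
void transposeOutputWeights(const int16_t* raw, int16_t* out) {
    for (int j = 0; j < INPUTS; ++j)
        for (int i = 0; i < BUCKETS; ++i)
            out[i * INPUTS + j] = raw[j * BUCKETS + i];
}
```

After this step, bucket `b`'s weight vector is the contiguous slice starting at `out[b * INPUTS]`, which is what a vectorized dot product wants.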