The problem I am having is that I used the Stockfish evals from Lichess (https://database.lichess.org/#evals), which feels a little unoriginal.
The next best option I can see is playing out many games between the old HCE (hand-crafted evaluation) versions of my engine to generate training data, but my rough calculations show that half an hour of parsing Lichess data is equivalent to about 300 hours (!) of running test matches at 50 ms/move. I think that makes generating a good dataset this way near-impossible.
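
For reference, here is a rough sketch of the arithmetic behind that comparison. It assumes one game running at a time and that every move played yields exactly one usable training position; both are simplifying assumptions, and the function name and `parallel_games` parameter are made up for illustration.

```python
# Back-of-the-envelope estimate of self-play data volume.
# Assumptions (not from any engine's real tooling): one position is
# recorded per move played, and games run sequentially unless
# parallel_games is raised.
SECONDS_PER_HOUR = 3600


def positions_from_selfplay(hours: float, seconds_per_move: float,
                            parallel_games: int = 1) -> int:
    """Number of positions produced by self-play in the given wall time."""
    moves_per_game_slot = round(hours * SECONDS_PER_HOUR / seconds_per_move)
    return moves_per_game_slot * parallel_games


# 300 hours of matches at 50 ms per move.
selfplay_positions = positions_from_selfplay(300, 0.050)
print(selfplay_positions)  # 21600000

# Parsing throughput implied if half an hour of parsing yields the same count.
parse_rate = selfplay_positions / (0.5 * SECONDS_PER_HOUR)
print(parse_rate)  # 12000.0 positions per second
```

Under these assumptions the gap is a factor of 600 in wall time, so running games concurrently (for example `parallel_games=8`) narrows it but does not close it.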
Are there any strategies for collecting training data more quickly? For example:
- using positions reached during the search
- using human games
Thanks in advance!