For the first set of data, I downloaded several thousand games played by Magnus Carlsen and Hikaru Nakamura from http://www.pgnmentor.com/files.html#players, roughly 8,000 games in total. For the second set, I collected self-play games using Blunder at a hyper-bullet time control, as was originally suggested.
For the self-play games I used the following cutechess command:
Code: Select all
cutechess-cli -pgnout games.pgn -srand $RANDOM -engine cmd=blunder name="blunder1" -engine cmd=blunder name="blunder2" -openings file=$HOME/2moves_v2a.pgn format=pgn order=random -each tc=40/2+0.05 proto=uci option.Hash=64 timemargin=60000 -games 2 -rounds 4000 -repeat 2 -concurrency 8 -recover -ratinginterval 50 -tournament gauntlet
From both sets of games I then extracted training positions using the following filters (a rough sketch of the filtering logic follows the list):
- No duplicate positions.
- No check or checkmate positions.
- No positions from the first 16 half-moves of a game.
- No positions within six moves of the end of the game.
- No positions where the static evaluation was greater than 60 cp.
- No positions where the static evaluation and the quiescence search score differed by more than 30 cp.
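In pseudo-Go, the filters amount to something like the snippet below. The Position type and helper names are simplified placeholders I'm using for illustration, not Blunder's actual code, and the interpretation of the thresholds (absolute values for the eval filters, "six moves" treated as twelve plies) is my assumption:
Code: Select all
package tuner

// Sketch of the position filters listed above. Position is a simplified
// placeholder type, not Blunder's actual representation.
type Position struct {
	FEN         string // position key used for duplicate detection
	Ply         int    // half-moves played to reach this position
	InCheck     bool
	IsCheckmate bool
	StaticEval  int // static evaluation in centipawns
	QScore      int // quiescence search score in centipawns
}

func keep(pos Position, gamePlies int, seen map[string]bool) bool {
	switch {
	case seen[pos.FEN]: // no duplicate positions
		return false
	case pos.InCheck || pos.IsCheckmate: // no check or checkmate positions
		return false
	case pos.Ply < 16: // no positions from the first 16 half-moves
		return false
	case pos.Ply > gamePlies-12: // no positions within six moves of the end
		return false
	case abs(pos.StaticEval) > 60: // static eval too far from equality
		return false
	case abs(pos.StaticEval-pos.QScore) > 30: // position isn't quiet enough
		return false
	}
	seen[pos.FEN] = true
	return true
}

func abs(x int) int {
	if x < 0 {
		return -x
	}
	return x
}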
With both data sets, the values the tuner converges on seem inferior to the current ones. In particular, the mobility parameters (knight mobility, bishop mobility, rook middlegame and endgame mobility, and queen middlegame and endgame mobility) are driven from their current values down to one or zero. This suggests the tuner is trying to make mobility irrelevant in the evaluation. That does seem to minimize the mean squared error, but I highly doubt making the engine blind to mobility will make it play better (from my current testing, with the features Blunder now has, mobility is worth roughly 50-60 Elo).
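For anyone following along, the error being minimized is the usual Texel-style mean squared error: each position's score is pushed through a sigmoid and compared to the game result (1.0 = White win, 0.5 = draw, 0.0 = Black win). A minimal sketch, with made-up names rather than Blunder's actual code:
Code: Select all
package tuner

import "math"

// TrainingPos pairs a position's score (in centipawns, from White's point
// of view) with the game result: 1.0 = White won, 0.5 = draw, 0.0 = Black won.
type TrainingPos struct {
	ScoreCP float64
	Result  float64
}

// sigmoid converts a centipawn score into an expected game result using the
// scaling constant K from the Texel tuning method.
func sigmoid(scoreCP, k float64) float64 {
	return 1.0 / (1.0 + math.Pow(10, -k*scoreCP/400.0))
}

// meanSquaredError is the quantity the tuner minimizes over the evaluation
// parameters (with K held fixed).
func meanSquaredError(positions []TrainingPos, k float64) float64 {
	var sum float64
	for _, p := range positions {
		diff := p.Result - sigmoid(p.ScoreCP, k)
		sum += diff * diff
	}
	return sum / float64(len(positions))
}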
I'm going to keep experimenting to see if I can generate anything useful. Computing the K value for both data sets also seems a bit off: K is driven to an extremely small value, which to me suggests some sort of overfitting. So perhaps I'll experiment with using more positions (1M-2M), though I'd probably need better hardware to do that reasonably efficiently. I'll also try to get a better variety of grandmaster games from tournaments and the like.
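For reference, the way I understand K is usually computed (before tuning any evaluation parameters) is a simple coarse-to-fine scan for the value that minimizes the error above. Something along these lines, reusing TrainingPos and meanSquaredError from the earlier snippet; again an illustration, not Blunder's exact code:
Code: Select all
// findK scans for the K that minimizes the mean squared error with the
// evaluation parameters held fixed, refining the step size each pass.
func findK(positions []TrainingPos) float64 {
	bestK, step := 1.0, 1.0
	for i := 0; i < 8; i++ {
		center, bestErr := bestK, meanSquaredError(positions, bestK)
		for k := center - 10*step; k <= center+10*step; k += step {
			if k <= 0 {
				continue // keep K positive
			}
			if err := meanSquaredError(positions, k); err < bestErr {
				bestErr, bestK = err, k
			}
		}
		step /= 10.0
	}
	return bestK
}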
I'm curious to know what others' experiences have been with generating training data from self-play or human (grandmaster) games. I've also read the paper from Zurichess here (https://bitbucket.org/zurichess/zuriche ... g%20Method) and the experiments described there.