The full set of parameters I set up for tuning includes the (opening/middle game) values for all pieces, which were initially set at 100, 325, 325, 550 and 975, the bishop pair bonus (50), bad trade penalties (rook vs minor and two minors vs rook, each set at 50) and the razoring and futility margins. The reason for including the latter is that they're supposedly related to the piece values, and it seemed wrong to try to tune the piece values while leaving the razoring and futility margins fixed.
The pool of opponents is currently quite small, consisting of the untuned version of Jazz as well as OliThink. I should expand that, but this should do for now.
Despite what I said above, I found the number of variables to tune a bit large to begin with, so I started the run with just the piece values. After some 15000 games, these seemed to converge to about 100, 365, 365, 566 and 1122. I then interrupted the run and added the other variables to the mix, setting them to their fixed values in the log file.
I also switched from using cutechess-cli to using my own referee program at about this point, but that should make no difference.
The run is currently at 40000 games, and by now the bishop and knight values have dropped back to ~325 and the rook is at ~525. The queen remains at ~1100, but there is still a large scatter in the plots and it seems obvious that the run has not yet converged. More interestingly, the rook-vs-minor bad trade penalty seems to be bimodal, showing an increase in the number of points near 0 and near 100 (which are the limits of the domain).
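To put a number on that clustering rather than judging it from the plot, something like the following could be used. This is only a minimal sketch: the function name, the 10% band width and the sample values are all made up for illustration, and the samples would have to be extracted from the run's log by hand.

```python
def edge_fractions(samples, lo, hi, width=0.1):
    """Fraction of samples lying within `width` of the range from each bound.

    `width` is a fraction of the domain size (hypothetical default: 10%).
    """
    if not samples:
        raise ValueError("no samples")
    band = (hi - lo) * width
    near_lo = sum(1 for s in samples if s <= lo + band)
    near_hi = sum(1 for s in samples if s >= hi - band)
    n = len(samples)
    return near_lo / n, near_hi / n


# Made-up sampled values for the rook-vs-minor penalty (domain 0..100).
samples = [2, 5, 8, 50, 47, 95, 98, 99, 3, 96]
low, high = edge_fractions(samples, 0, 100)
print(f"near lower bound: {low:.0%}, near upper bound: {high:.0%}")
```

If both fractions grow over the course of the run while the middle empties out, that would support the impression of two modes pinned against the limits.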
So, questions I have now:
- Was it a bad idea to start the run with a small number of variables and then later add the other ones by hand by faking their input? It seemed reasonable, but the convergence that seemed to be emerging from the figures disappeared when I added them and only reappeared much later, this time converging on different values.
- Any thoughts on why the queen might be getting a large adjustment in value while the other pieces seem to settle to where they already are? Missing evaluation terms? Poor handling of the queen later in the game?
- Does anyone have experience with variables that show bimodal behaviour? Do I need to widen the range if points cluster near the edge of the domain?
- Should I expect to see some measure of convergence after 40000 games?
- What is a good estimate for the error bar on the values derived by CLOP? Should I just look at the width of the point cloud?
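
To make the last question concrete, this is roughly what I mean by "the width of the point cloud": the mean and standard deviation of the most recently sampled values of a parameter. It is a minimal sketch with made-up numbers and a hypothetical helper name, and I am not claiming that this spread is the same thing as the uncertainty on the optimum that CLOP estimates.

```python
import statistics


def cloud_summary(samples, last_n=None):
    """Mean and sample standard deviation of the most recent sampled values."""
    recent = samples[-last_n:] if last_n else samples
    if len(recent) < 2:
        raise ValueError("need at least two samples")
    return statistics.mean(recent), statistics.stdev(recent)


# Made-up queen values sampled late in a run (not real data).
queen_samples = [1080, 1120, 1095, 1110, 1130, 1065, 1100, 1100]
mean, spread = cloud_summary(queen_samples, last_n=6)
print(f"queen ~ {mean:.0f} +/- {spread:.0f}")
```

Restricting the summary to the last few thousand samples would at least show whether the cloud is still shrinking as the run continues.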