I thought about engine1 vs. engine2. They are the same except that their piece values, PST, mobility, etc. are taken randomly as starting values; or engine1 takes the random values and engine2 takes your current best estimates. Let them play a match from random start positions, and save the positions from this match for tuning later. If engine1 wins, use its values in the tuning. After the tuning, pit the tuned values against engine1 in a new match, again saving the positions for tuning. If the tuned values win, use them in the next tuning session. If engine1 wins instead, then either the tuner failed to improve the values, or the optimal values have already been reached for this engine and the given set of tunable parameters. In that case, create more random start positions, play another match, save its positions, and repeat the tuning. If engine1 still wins after that, the tuning ends: the values in engine1 are the best. (A sketch of this loop appears at the end of this post.)

hgm wrote: ↑Sun Mar 07, 2021 10:49 pm
I finally seem to have the tuning process working correctly. There were lots of stupid errors, and because the whole process happened 'under the hood', I had a hard time identifying them. It also revealed a bad bug in the engine: the evaluation forgot to add the piece value of the Pawns. This could happen because Pawns need special treatment: since Pawns do not promote in Janggi, they essentially become worthless when they reach the last two ranks (where the King can sneak behind them), and extra Pawns on the 7th rank could also be useless.
I made a small program to generate synthetic training positions, which calculates the win probability of each based on some standard piece values, and then determines a win/loss outcome with that probability. On a file of 10,000 such positions the Texel tuner was able to extract piece values close to those used in generating the results (starting from all-zero piece values). Since the tuner currently works by random trial and error, keeping any improvement, it is excruciatingly slow. But at least I get a sensible result now. (No more negative Pawn values!)
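A minimal sketch of what such a generator could look like, assuming a logistic win-probability model. The piece values (traditional Janggi point values ×100) and the scale constant K are my illustrative choices, not necessarily what hgm's program uses:

```python
import random

# Illustrative piece values (traditional Janggi points x100); an assumption,
# not the actual values used in the generator described above.
PIECE_VALUES = {'chariot': 1300, 'cannon': 700, 'horse': 500,
                'elephant': 300, 'guard': 300, 'pawn': 200}

K = 400.0  # logistic scale in centipawns (assumed)

def win_probability(balance):
    """Logistic map from material balance (centipawns) to win probability."""
    return 1.0 / (1.0 + 10.0 ** (-balance / K))

def synthetic_position():
    """Random material imbalance, labeled win/loss with matching probability."""
    balance = sum(random.randint(-2, 2) * v for v in PIECE_VALUES.values())
    result = 1 if random.random() < win_probability(balance) else 0
    return balance, result

# A training file of 10,000 labeled positions, as in the experiment above.
positions = [synthetic_position() for _ in range(10_000)]
```

Because the labels are drawn with exactly the probability implied by the generating values, a tuner that minimizes the prediction error should recover those values, which is what makes this a good sanity check.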
I now have the engine, with the Pawn value fixed, self-playing 3000 games at a nominal thinking 'time' of 100K nodes/move. This should produce about 250K quiet positions in a few hours. The plan is to first use this training set to optimize the piece values and their dependence on game phase, keeping the other eval parameters fixed at the values they had during generation of the games. If the resulting piece values look reasonable, I will use those as a starting point for tuning the complete set of parameters with a smaller step. The current evaluation has 40 parameters (including the opening and end-game piece values).

I suppose I will be able to use the same set of training positions for tuning the weights of eval terms I still have to add, such as mobility. Many of the eval terms had been set to 0 for the self-play games anyway, to make sure the various values of those terms would be sampled equally. For terms known to strongly affect the ability to win I did not do that, though; otherwise too many games might have ended in a draw even in the presence of a winning advantage. (The comparable issue in orthodox Chess would be to not encourage the pushing of Pawns, so that at the fast TC used to generate the games the engine would never realize it could promote them.)
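For reference, the standard Texel-tuning objective minimizes the mean squared error between the game result and a sigmoid of the evaluation. Below is a minimal sketch, including a simple linear interpolation for the phase-dependent piece values; the function names, the 256-step phase scale, and passing the evaluation in as a callable are my assumptions, not necessarily how this engine does it:

```python
def sigmoid(score, k=1.0):
    """Map an evaluation (centipawns) to an expected result in [0, 1]."""
    return 1.0 / (1.0 + 10.0 ** (-k * score / 400.0))

def tapered(opening, endgame, phase):
    """Interpolate a piece value between its opening and end-game settings.
    phase runs from 256 (full material) down to 0 (bare end-game)."""
    return (opening * phase + endgame * (256 - phase)) // 256

def texel_error(params, positions, evaluate):
    """Mean squared error over (position, result) pairs.
    'result' is in {0, 0.5, 1}; 'evaluate' stands in for the engine's
    evaluation function with the candidate parameters plugged in."""
    total = sum((r - sigmoid(evaluate(p, params))) ** 2 for p, r in positions)
    return total / len(positions)
```

A random trial-and-error tuner, as described above, would perturb one entry of params, recompute texel_error, and keep the change whenever the error drops.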
I hope to have some results tomorrow.
Tuning can be done again whenever new features are added, or when existing features are exposed for tuning.
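Here is a rough sketch of the match-and-tune loop proposed at the top of this post. Everything in it is an assumption for illustration: the callables 'tune', 'play_match', and 'more_positions' are hypothetical placeholders for an actual Texel-style tuner, a match runner playing from random start positions, and a generator of additional start positions.

```python
def iterative_tuning(engine1, start_values, positions,
                     tune, play_match, more_positions):
    """Match-and-tune loop: the match winner seeds the next tuning session.

    engine1 holds the fixed reference values; 'positions' is the growing
    pool of positions saved from the matches, used as the tuning set.
    """
    best, retried = start_values, False
    while True:
        tuned = tune(best, positions)
        winner, new_pos = play_match(engine1, tuned)
        positions += new_pos               # keep every match's positions
        if winner == 'tuned':
            best, retried = tuned, False   # tuned values won: next session
        elif not retried:
            positions += more_positions()  # engine1 won: retry with more data
            retried = True
        else:
            return engine1                 # engine1 still wins: its values stay
```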