Henk wrote:If tuning is done by a tuner (algorithm) you can also try to find a better tuning algorithm. Or try to constrain the parameters to be tuned to a smaller domain. I guess not all combinations are allowed.

Don wrote:There has been progress, but nobody has come up with anything better than just playing games for testing a single change. One optimization algorithm that has some merit is CLOP, but guess what? It's based on playing thousands of games.

Consider this. To measure a small Elo change, you must play thousands of games. This is using the most direct measure possible: playing actual games.

Now, is it reasonable to expect that you could measure this just as accurately using some indirect method that requires far less effort?

Of course it isn't. If you want to explore this further, try to figure out how to get more out of your testing procedure. CLOP is one such way, and HG proposed another method called orthogonal multi-tuning. Both of these are based on playing games but try to squeeze more information out of those games.

You may need many games for most changes, but it depends on the change, and there are changes that are better tested without playing games.
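To put a rough number on "thousands of games": under the Elo model, a d-Elo advantage shifts the expected score by only a small amount, and the standard error of the measured score shrinks like 1/sqrt(n). A minimal sketch of that arithmetic (the function name and the simplification of ignoring draws are my own, not from the posts above):

```python
import math

def games_to_detect(elo_diff, z=1.96):
    """Rough number of games needed to distinguish an engine that is
    elo_diff stronger from an equal one, at about 95% confidence
    (two-sided z = 1.96), ignoring draws for simplicity."""
    # Expected score from the Elo model.
    p = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))
    delta = p - 0.5              # shift in expected score to detect
    sigma = 0.5                  # per-game std-dev upper bound (win/loss)
    # Choose n so that z * sigma / sqrt(n) <= delta.
    return math.ceil((z * sigma / delta) ** 2)

for d in (5, 10, 20):
    print(d, games_to_detect(d))
```

The point of the sketch is only the scaling: halving the Elo difference you want to resolve roughly quadruples the number of games, which is why small changes need thousands of them.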
For example, if you make a change that gives a 1% speed improvement, I think that playing many games to prove it may be a waste of time: you can simply check that the program reaches the same number of nodes slightly faster when searching to a fixed depth in many positions.
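That check can be sketched in a few lines: run both versions to the same fixed depth over a set of positions, confirm the per-position node counts are identical (so the search itself is unchanged), and then compare the total times. The numbers below are made up for illustration:

```python
# Hypothetical per-position results: (nodes, seconds) at a fixed search
# depth, collected from the old and new binaries; the values are made up.
old = [(1_200_345, 1.42), (980_112, 1.10), (2_310_887, 2.75)]
new = [(1_200_345, 1.40), (980_112, 1.09), (2_310_887, 2.72)]

# The search itself must be unchanged: identical node counts per position.
assert all(o[0] == n[0] for o, n in zip(old, new)), "search behavior changed"

# With equal node counts, the time ratio measures the pure speedup.
speedup = sum(o[1] for o in old) / sum(n[1] for n in new)
print(f"speedup: {(speedup - 1) * 100:.2f}%")
```

If the node counts differ, the change was not a pure speed optimization and this shortcut no longer applies.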
Another example is fixing a bug that you know is relevant only in some rare endgames. The speed after fixing the bug is the same; you have simply changed an evaluation that was wrong for some tablebase positions to the correct evaluation in some rare cases.

Again, my opinion is that testing it in many games is a waste of time.
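A fix like that can be verified directly against the affected positions instead of through games: check that the engine's evaluation now agrees with the tablebase verdict on each one. A minimal sketch, where `evaluate`, the FEN, and the score are placeholders rather than any real engine's API:

```python
# Hypothetical regression check for an endgame-evaluation fix: compare the
# engine's verdict on the affected positions against tablebase truth.

def evaluate(fen):
    # Stand-in for the engine's static evaluation (centipawns, white's view).
    fixed_scores = {"7k/8/8/8/8/8/8/KQ6 w - - 0 1": 900}  # KQ vs K: winning
    return fixed_scores[fen]

# (position, tablebase result for white): -1 loss, 0 draw, +1 win.
tablebase_cases = [("7k/8/8/8/8/8/8/KQ6 w - - 0 1", 1)]

for fen, wdl in tablebase_cases:
    score = evaluate(fen)
    # The evaluation should at least agree with the tablebase on the sign.
    sign = (score > 50) - (score < -50)
    assert sign == wdl, f"still wrong on {fen}"
print("all tablebase cases agree")
```

Because the bug only fires in rare positions, a game-based test would almost never exercise it, while a targeted suite like this exercises it every run.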
