jesper_nielsen wrote:

    Speaking as a very resource-limited (un-)happy amateur, the question of testing for me becomes "how do I get the most bang for my buck?"
    I have recently started using the "cutechess-cli" tool for running very fast games. Great tool, by the way!
    I am currently running the 40 Noomen positions against 8 different opponents at a 5+0.4 time control, giving 40*2*8 = 640 games. This is clearly not enough games to support any kind of conclusion except for very big Elo jumps.
    There are (at least!) three ways to increase the number of games played:
    1. Add more repetitions of the test runs.
    2. Add more opponents.
    3. Add more starting positions.
    Which one of these options is "better"? Or are they equal, meaning that the real value lies only in the higher number of games? That, to me, is an interesting question.
    Kind regards,
    Jesper
    P.S. I would hate to see anyone withdraw their contribution to this forum. The diverse inputs are, I believe, one of the strengths of this place, even if the going gets a bit rough from time to time.

bob wrote:

    Do _NOT_ do this. It is fraught with problems. Use more positions or more opponents, but do not play more games with the same positions.
    More positions is easier. Finding more opponents can be problematic, in that some programs do _very_ poorly at fast time controls, which even I use on the cluster to get results faster. So fast games limit the pool of candidate testing opponents. But clearly more is better, and I am continually working on this issue. One important point is that you really want to test mostly against stronger opponents, not weaker ones. That way you can recognize gains more quickly than if you are way ahead of your opponents. That is another limiting factor in choosing opponents.

Ok! Thanks!
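The claim that 640 games only resolves very big Elo jumps is easy to check with a back-of-envelope error-bar calculation. A minimal sketch, assuming independent game results, a roughly even score, and an illustrative 30% draw rate (that draw rate is an assumption, not a measured figure):

```python
import math

def elo_error_bar(games, draw_rate=0.3, score=0.5, z=1.96):
    """Approximate 95% error bar, in Elo, for a match of `games` games.

    Assumes independent games; the draw rate only shrinks the per-game
    variance, since a draw is less informative than a decisive game.
    """
    # Per-game variance of the result (1, 0.5, 0) around the mean score.
    win = loss = (1.0 - draw_rate) / 2.0
    var = (win * (1.0 - score) ** 2
           + draw_rate * (0.5 - score) ** 2
           + loss * (0.0 - score) ** 2)
    se = math.sqrt(var / games)            # standard error of the match score
    # Slope of the Elo curve at `score`: Elo = -400 * log10(1/s - 1).
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * se * slope

print(f"+/- {elo_error_bar(640):.1f} Elo")   # -> +/- 22.5 Elo
```

So with 640 games the 95% interval is roughly +/- 22 Elo, which matches the observation above: only changes worth tens of Elo stand out.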
The reason option 1 looks tasty to me is that it gives the option of iteratively adding more precision.
So you can run the test, look at the results, and decide whether you think the change is good, bad, or uncertain. Then, if uncertain, run the test again.
In this way there is an option to "break off" early if a clearly good or bad change is spotted, thereby saving some time.
But maybe, with a large number of start positions, you could break them up into chunks of a manageable size, and then run those chunks as needed?!
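The "break off early" idea can be sketched as a simple stopping rule: run games in chunks and stop as soon as the running confidence interval excludes an even score. This is a naive sketch with a simulated engine match standing in for real games (all numbers are illustrative), and note that repeatedly peeking at the results inflates the error rate; a proper sequential test such as an SPRT corrects for that:

```python
import math
import random

def simulate_game(true_score=0.55, draw_rate=0.3):
    """Stand-in for one real test game; returns 1, 0.5, or 0."""
    r = random.random()
    win = true_score - draw_rate / 2.0   # chosen so the mean score is true_score
    if r < win:
        return 1.0
    if r < win + draw_rate:
        return 0.5
    return 0.0

def test_change(chunk=160, max_games=3200, z=1.96):
    """Run chunks of games; break off once the 95% CI excludes a 50% score."""
    scores = []
    while len(scores) < max_games:
        scores.extend(simulate_game() for _ in range(chunk))
        n = len(scores)
        mean = sum(scores) / n
        var = sum((s - mean) ** 2 for s in scores) / n
        half = z * math.sqrt(var / n)    # half-width of the confidence interval
        if mean - half > 0.5:
            return "good", n
        if mean + half < 0.5:
            return "bad", n
    return "uncertain", len(scores)

print(test_change())
```

With a real engine, the chunk runs would be cutechess-cli invocations over subsets of the position file rather than a simulation.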
How to pick the positions to use in the tests?
One idea could be to take the positions where your program left book in the tournaments it has played in.
Another idea could be to randomly generate them by using your own book. So basically let your program pick a move, like it would in a real game, and then follow the line to the end of the book.
The pro is that the positions are biased towards positions your program is likely to reach in a tournament game.
The con is that the testing then inherits the blind spots of the opening book.
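The random book-walk idea can be sketched with a toy in-memory book. The structure below is hypothetical: a real implementation would read the engine's actual book format (e.g. a polyglot file keyed by Zobrist hash) instead of a hand-written dict, but the walk itself is the same — follow weighted book moves until the line leaves the book, and record the resulting line as a test start position:

```python
import random

# Hypothetical toy book: position key -> list of (move, weight).
# A real book would be keyed by Zobrist hash or FEN and read from disk.
BOOK = {
    "start":       [("e4", 60), ("d4", 40)],
    "start e4":    [("c5", 50), ("e5", 50)],
    "start e4 c5": [("Nf3", 100)],
    "start d4":    [("Nf6", 70), ("d5", 30)],
}

def sample_book_line(rng):
    """Follow weighted book moves until the line leaves the book;
    the returned move sequence is one candidate test start position."""
    key, line = "start", []
    while key in BOOK:
        moves, weights = zip(*BOOK[key])
        move = rng.choices(moves, weights=weights)[0]
        line.append(move)
        key = key + " " + move
    return line

rng = random.Random(0)
positions = {tuple(sample_book_line(rng)) for _ in range(20)}
print(sorted(positions))
```

Weighting by the book's own move weights is what makes the sample mirror the positions the program would actually reach in tournament play, which is exactly the pro (and the con) described above.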
Thanks for the ideas!
Kind regards,
Jesper