I don't think this by itself is going to work well, especially when the players are close together in strength. One could add rounds to the tournament, or increases the number of games per individual match and perhaps this should take error margins into account.Don wrote:That would be an excellent paper if done well.mcostalba wrote:I think that one of the most valuable things that a kwnoledge contributor can give to the comunity is a paper with the title "The most resource efficent way of testing a chess engine"
In the past I have experimented with learning algorithms such as PBIL, which is related to genetic algorithms. The key requirement of PBIL is that you produce a set of players and pick the most "fit" from among them. In some domains fitness is directly measurable, in others it has to be estimated and chess skill of course is one of those.
So given a set of random players, what is the most efficient way to determine which program is strongest? I relaxed the requirement slightly and recognized that the problem is really how to find a GOOD player with the least amount of resources.
I'm sure there are probabilistic ways to do this. But the solution I came to is a divide and conquer approach. Given a nearly infinite pool of players (which I can generate at will) a knockout competition will very quickly find a reasonable opponent. The round robin approach is probably orders of magnitude less efficient. In a knockout, there is a very good chance that the winner will be one of the top players and the effort required to find this player is pretty small. Early rounds quickly separate the men from the boys. It also makes sense to play longer matches in later rounds, since a late round is a tiny fraction of the total effort.
So I wonder if some variation of this method might be used for testing in general - although it's doubtful since I am usually testing only 1 change at a time.
Someone better than myself at math might be able to come up with the proper tournament schedule (or other method) for "finding" the strongest player from a set of random players with the least amount of effort.
Whatever is done should be based on trying to recognize quickly where effort is being wasted evaluating a clearly inferior player.