So the plan is roughly this: each machine has a command-line tournament director (like WinBoard or Cutechess or whatever), a variety of good, non-commercial "reference" chess engines, and whatever other engines are to be tested. It also has a new program that I'll write, which the user can run whenever the machine is idle and has cycles to spare. This program will make an HTTP connection to a server, which will tell it what match to play: which engines, time controls, settings, starting position, etc. It will run the match, send the results back, and request the next match.
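To make the request/play/report cycle concrete, here's a rough sketch of the worker side in Python. Everything about the server is an assumption for illustration: the address, the `/next` and `/result` endpoints, the JSON field names and the `MatchJob` type are all hypothetical. The loop takes its fetch/run/submit steps as plain functions, so it can be tried out with fakes before any real server exists.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical server address; the endpoints below are placeholders too.
SERVER = "http://example.com/chess-test"

@dataclass
class MatchJob:
    """One unit of work handed out by the (hypothetical) central server."""
    job_id: int
    engines: list              # engine names as installed on this machine
    time_control: str          # e.g. "40/60"
    games: int
    opening_fen: Optional[str] = None

def parse_job(text: str) -> Optional[MatchJob]:
    """Turn the server's JSON reply into a MatchJob; None means 'no work queued'."""
    data = json.loads(text)
    if not data:
        return None
    return MatchJob(
        job_id=data["job_id"],
        engines=data["engines"],
        time_control=data["time_control"],
        games=data["games"],
        opening_fen=data.get("opening_fen"),
    )

def http_fetch_job() -> Optional[MatchJob]:
    # GET the next match to play (hypothetical /next endpoint).
    with urllib.request.urlopen(SERVER + "/next", timeout=30) as resp:
        return parse_job(resp.read().decode("utf-8"))

def http_submit_result(job: MatchJob, result: dict) -> None:
    # POST the finished match back (hypothetical /result endpoint).
    body = json.dumps({"job_id": job.job_id, **result}).encode("utf-8")
    req = urllib.request.Request(SERVER + "/result", data=body,
                                 headers={"Content-Type": "application/json"},
                                 method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        resp.read()

def worker_loop(fetch_job: Callable[[], Optional[MatchJob]],
                run_match: Callable[[MatchJob], dict],
                submit_result: Callable[[MatchJob, dict], None],
                max_jobs: int) -> int:
    """Request a match, play it, report it, repeat; returns jobs completed."""
    done = 0
    while done < max_jobs:
        job = fetch_job()
        if job is None:        # server has nothing queued right now
            break
        result = run_match(job)
        submit_result(job, result)
        done += 1
    return done
```

In real use it would be called as `worker_loop(http_fetch_job, some_match_runner, http_submit_result, max_jobs=...)`, where the match runner wraps whichever tournament director ends up being used.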
In this way, I can decide what tests I want to run by lining up matches in the central database, and they'll get played out as resources become available. I can come back, look at the results, and schedule new matches accordingly. This will help to tweak parameters that require a large number of games to shrink the error bar enough to determine optimum values.
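To put a number on "a large number of games": here's a small Python sketch of how the error bar on an Elo difference shrinks as games accumulate. It's my own rough approximation (logistic Elo, normal approximation to the per-game score), not what any particular tool reports. The standard error of the mean score scales with 1/sqrt(N), so halving the error bar takes roughly four times as many games.

```python
import math

def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an expected score strictly in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_error(wins: int, draws: int, losses: int, z: float = 1.96):
    """Elo difference and approximate 95% error bar (half-width) from W/D/L.

    Uses a normal approximation to the score distribution, so it is only
    reasonable once a few hundred games have been played.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (each game scores 1, 0.5 or 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    lo = elo_from_score(score - z * stderr)
    hi = elo_from_score(score + z * stderr)
    return elo_from_score(score), (hi - lo) / 2.0
```

For example, with the same 35% / 35% / 30% win/draw/loss split, `elo_with_error(350, 350, 300)` and `elo_with_error(1400, 1400, 1200)` give the same Elo estimate, but the second error bar is about half the first.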
Given that lofty goal, I've got some questions before I embark on a specific path ...
1. What should I use to actually run the matches? Winboard? Cutechess? Something else? First, I should point out that I know most of you are Linux people, and you'd think that, being an open-source freak myself, I would be too. But sadly, I'm in Windows land at home, and even if I wanted to change, the machines at work that I'd be using run Windows, my friends' computers run Windows, etc. So it has to work on Windows. Cutechess-cli claims to be cross-platform, but I haven't located builds for any platform, and before I put time into trying to compile it myself, I'd like to hear whether anyone has done it and how easy it was. Winboard obviously offers Windows executables, but its list of command-line options is so long that I looked at it for over an hour without figuring out exactly what I need and what I don't. That will require a lot of testing, so I'd be interested to hear if anyone knows exactly what works there (and many good engines are UCI). Any other ideas here?
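For comparison, if a working cutechess-cli build does turn up, driving it from the worker program looks fairly simple. The Python sketch below only builds the argument list and runs it. The executable and engine paths are placeholders, and while `-engine cmd=... proto=uci`, `-each tc=... option.Name=value`, `-games` and `-pgnout` are options I believe cutechess-cli accepts, they should be checked against the help output of whatever build is actually used.

```python
import subprocess
from typing import List

def cutechess_command(exe: str, engine_a: str, engine_b: str,
                      tc: str, games: int, pgn_path: str,
                      hash_mb: int = 64) -> List[str]:
    """Build a cutechess-cli argument list for a two-engine UCI match.

    All paths are placeholders; verify the option names against the
    help output of the cutechess-cli build in use.
    """
    return [
        exe,
        "-engine", f"cmd={engine_a}", "proto=uci",
        "-engine", f"cmd={engine_b}", "proto=uci",
        # Settings shared by both engines: time control and hash size (MB).
        "-each", f"tc={tc}", f"option.Hash={hash_mb}",
        "-games", str(games),
        "-pgnout", pgn_path,
    ]

def run_match(cmd: List[str]) -> str:
    """Run the match and return the tool's console output (raises on failure)."""
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout
```

Passing a list rather than a single command string avoids quoting problems with Windows paths that contain spaces.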
2. I'm concerned that test results might be inaccurate if I run thousands of games on different machines, even with the same time controls, only one thread per engine, and tables small enough to fit in RAM on every machine, simply because the CPUs will run at different speeds (and so will reach different search depths on different machines). On the other hand, maybe it'll balance out, and I can make sure all the machines are reasonably powerful. Maybe I could run a chess benchmark on each machine and send its result along with the match results. Any thoughts about this?
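One way the benchmark idea could be used, sketched in Python under my own assumptions: each machine runs the same fixed benchmark (say, nodes per second on a fixed set of positions), and the server scales the time control so that a slower machine gets proportionally more time and searches roughly as many nodes per move as a reference machine. The function name and the benchmark figures are hypothetical.

```python
def scale_time_control(base_s: float, inc_s: float,
                       machine_nps: float, reference_nps: float):
    """Scale base time and increment by reference_nps / machine_nps.

    Both speeds are assumed to come from the same fixed benchmark run on
    each machine, so a machine half as fast as the reference gets twice
    the time and searches roughly the same number of nodes per move.
    """
    if machine_nps <= 0 or reference_nps <= 0:
        raise ValueError("benchmark speeds must be positive")
    factor = reference_nps / machine_nps
    return base_s * factor, inc_s * factor
```

For example, `scale_time_control(60, 0.6, 500_000, 1_000_000)` doubles both the base time and the increment. This only equalizes raw speed; differences in cache and memory behaviour between machines would still show up in the results.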
3. Anything I'm overlooking?
Thanks in advance for any feedback. You can be sure that whatever I develop, I'll make freely available to all, as I have with ChessV.

