Tue Aug 12 00:49:44 CDT 2008
time control = 1+1
crafty-22.2R4
Rank Name                   Elo    +    - games score oppo. draws
   1 Glaurung 2-epsilon/5   108    7    7  7782   67%   -21   20%
   2 Fruit 2.1               62    7    6  7782   61%   -21   23%
   3 opponent-21.7           25    6    6  7780   57%   -21   33%
   4 Glaurung 1.1 SMP        10    6    6  7782   54%   -21   20%
   5 Crafty-22.2            -21    4    4 38908   46%     4   23%
   6 Arasan 10.0           -185    7    7  7782   29%   -21   19%
Tue Aug 12 11:36:10 CDT 2008
time control = 1+1
crafty-22.2R4
Rank Name                   Elo    +    - games score oppo. draws
   1 Glaurung 2-epsilon/5   110    6    7  7782   67%   -19   21%
   2 Fruit 2.1               63    6    7  7782   61%   -19   23%
   3 opponent-21.7           26    6    6  7782   57%   -19   33%
   4 Glaurung 1.1 SMP         7    6    7  7782   54%   -19   20%
   5 Crafty-22.2            -19    4    3 38910   47%     4   23%
   6 Arasan 10.0           -187    6    7  7782   28%   -19   19%
Those are the first two runs, and they seem to be more consistent than the last two big runs with just 40 positions. Run 3 is in progress and will finish tonight; by noon tomorrow the entire 160,000 games should be done. If, as Karl has suggested, these runs stay within the expected limits of variability, then we can start a discussion on reducing this computational load to something more palatable.
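For anyone who wants to check the run-to-run consistency mechanically, here is a minimal Python sketch. It assumes the "+" and "-" columns are the upper and lower error margins on each Elo estimate; the helper names (intervals_overlap, compare_runs) are made up for illustration. The numbers are copied from the two tables above.

```python
# Elo, "+" margin, "-" margin for each engine, copied from the two runs above.
run1 = {
    "Glaurung 2-epsilon/5": (108, 7, 7),
    "Fruit 2.1": (62, 7, 6),
    "opponent-21.7": (25, 6, 6),
    "Glaurung 1.1 SMP": (10, 6, 6),
    "Crafty-22.2": (-21, 4, 4),
    "Arasan 10.0": (-185, 7, 7),
}
run2 = {
    "Glaurung 2-epsilon/5": (110, 6, 7),
    "Fruit 2.1": (63, 6, 7),
    "opponent-21.7": (26, 6, 6),
    "Glaurung 1.1 SMP": (7, 6, 7),
    "Crafty-22.2": (-19, 4, 3),
    "Arasan 10.0": (-187, 6, 7),
}


def intervals_overlap(a, b):
    """True if [elo - minus, elo + plus] for a and b share at least one point."""
    elo_a, plus_a, minus_a = a
    elo_b, plus_b, minus_b = b
    return elo_a - minus_a <= elo_b + plus_b and elo_b - minus_b <= elo_a + plus_a


def compare_runs(first, second):
    """Map each engine to (Elo change from first to second run, intervals overlap?)."""
    return {
        name: (second[name][0] - first[name][0],
               intervals_overlap(first[name], second[name]))
        for name in first
    }


if __name__ == "__main__":
    for name, (delta, ok) in compare_runs(run1, run2).items():
        print(f"{name:22s} change {delta:+d}  overlap {ok}")
```

Overlapping intervals are only a rough consistency check, not a formal test, but it is the kind of thing one would want to see hold across all four runs.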
I'm just hoping that we see stable Elo numbers. But then the worry may well be that most sensible changes do not affect a program's Elo enough for this test to measure, which would be a completely different problem to deal with.
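To put a rough number on what "enough for this test to measure" means, here is a back-of-the-envelope Python sketch. It is not how the rating tool computes its margins. It assumes every game is an independent trial (the very assumption under discussion here), a score near 50%, the usual logistic Elo curve, and a normal approximation with 1.96 standard deviations for a 95% margin. The function names are hypothetical.

```python
import math

# Slope of the logistic Elo curve, Elo(s) = 400 * log10(s / (1 - s)), at s = 0.5.
ELO_PER_SCORE_AT_50 = 1600 / math.log(10)


def elo_margin_95(games, draw_ratio):
    """Approximate 95% margin in Elo for one engine's score over `games` games.

    Assumes independent games and a score near 50%, so the per-game score
    variance is (1 - draw_ratio) / 4.
    """
    score_sd = math.sqrt((1.0 - draw_ratio) / (4.0 * games))
    return 1.96 * ELO_PER_SCORE_AT_50 * score_sd


def games_needed(elo_margin, draw_ratio):
    """Games required to shrink the approximate 95% margin to `elo_margin` Elo."""
    score_sd = elo_margin / (1.96 * ELO_PER_SCORE_AT_50)
    return math.ceil((1.0 - draw_ratio) / (4.0 * score_sd ** 2))


if __name__ == "__main__":
    # Crafty's game count and draw ratio from the first run above.
    print(round(elo_margin_95(38908, 0.23), 1))
    # Games needed to resolve roughly 1 Elo under the same assumptions.
    print(games_needed(1.0, 0.23))
```

Under these assumptions the margin shrinks only with the square root of the number of games, so halving it costs four times the games. Comparing two versions is harder still, since the difference between two independent estimates carries both margins.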
Note that this is not a full round-robin, although I could run one after the current test finishes if anyone wants to see how that would collapse the overall rating differences into a smaller range.
Remember, my goal is to compare A to A'. I don't care about absolute Elo values, or exactly how much better or worse A' is than A; I only want to see whether A' (a slightly modified version of Crafty, AKA program A) is better or worse.


