Your test version, Embla3949-fp, showed no improvement. Usually the results with the most games played give the best answer. Going forward, first run a round robin (RR) between all engines and make that your base. Then run a gauntlet against your base, but include all games from your base in your evaluation in addition to your gauntlet games. Naturally, your gauntlet will use the same openings as your base.

flok wrote:
Hi,
I'm trying to figure out if a change gave an improvement or not.
To do that, I ran the version under test (Embla3949-fp) a few times against other versions and other programs, in gauntlet mode with Embla3949-fp as the main engine.
This gave:

```
Rank Name           Elo   +   -  games  score  oppo.  draws
   1 dorpsgek       119  10  10   2900    69%    -37    12%
   2 Embla3949       33   8   8   2900    61%    -37    40%
   3 Embla3949-fp   -37   6   6   8700    44%     12    30%
   4 0.9.7         -115   9   9   2900    38%    -37    39%
```

To verify, I also ran other combinations:

```
Rank Name           Elo   +   -  games  score  oppo.  draws
   1 dorpsgek        87   6   6   8700    65%    -29    13%
   2 Embla3949       64   9   9   2900    47%     87    13%
   3 0.9.7          -75   9  10   2900    29%     87    17%
   4 Embla3949-fp   -76   9  10   2900    30%     87    10%

Rank Name           Elo   +   -  games  score  oppo.  draws
   1 dorpsgek        66  10  10   2900    71%    -93    17%
   2 Embla3949       47   9   8   2900    71%    -93    36%
   3 Embla3949-fp   -20   9   9   2900    61%    -93    39%
   4 0.9.7          -93   6   6   8700    32%     31    31%

non-gauntlet
Rank Name           Elo   +   -  games  score  oppo.  draws
   1 dorpsgek        85   6   6   8700    64%    -28    13%
   2 Embla3949       46   6   6   8700    60%    -15    30%
   3 Embla3949-fp   -39   6   5   8700    43%     13    30%
   4 0.9.7          -92   6   5   8700    33%     31    30%
```

To my horror, these all gave different results!
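The + and - columns in these tables are error margins, and the rows with 8700 games consistently show smaller margins than the rows with 2900 games. As a rough sketch of why, the Python below estimates a 95% margin with a plain normal approximation from score, draw rate and game count. This is an illustrative assumption, not necessarily the method used by the rating program that produced the tables, so its numbers will not match them exactly; the function name is hypothetical.

```python
import math

def elo_margin_95(score: float, draw_rate: float, games: int) -> float:
    """Approximate 95% error margin (in Elo) of a score-based Elo estimate.

    Illustrative normal approximation only; rating tools may differ.
    """
    # Per-game variance for results in {0, 0.5, 1}: with win rate w and
    # draw rate d, score = w + d/2 and E[x^2] = w + d/4, so the variance
    # simplifies to score*(1-score) - d/4.
    variance = score * (1.0 - score) - draw_rate / 4.0
    std_error_score = math.sqrt(variance / games)
    # Slope of the logistic Elo curve elo = -400*log10(1/score - 1).
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return 1.96 * slope * std_error_score
```

With the same score and draw rate, going from 2900 to 8700 games shrinks the margin by a factor of sqrt(3), about 1.73. That is in line with the tables, where the 2900-game rows show roughly 8 to 10 points and the 8700-game rows about 5 to 6. Calling `elo_margin_95(0.44, 0.30, 8700)` for Embla3949-fp in the first table gives roughly 6 Elo.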
So I wonder: is there a correct result to find?
(this may have been discussed already but I can't find that thread anymore)
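The advice at the top of the thread amounts to pooling every game, both from the base round robin and from the gauntlet, before rating the engines. A minimal sketch of that pooling step follows, assuming games are available as (white, black, result) records. The `Game` type and the function names are hypothetical, and `elo_from_score` gives only a crude score-versus-field figure that ignores opponent strength, unlike a proper rating program.

```python
import math
from collections import defaultdict
from typing import Iterable, NamedTuple

class Game(NamedTuple):
    white: str
    black: str
    result: float  # score for White: 1.0 win, 0.5 draw, 0.0 loss

def pooled_stats(*game_sets: Iterable[Game]) -> dict[str, tuple[int, float, float]]:
    """Pool several game lists (e.g. a base round robin plus a gauntlet)
    and return {engine: (games, score_fraction, draw_fraction)}."""
    games: dict[str, int] = defaultdict(int)
    points: dict[str, float] = defaultdict(float)
    draws: dict[str, int] = defaultdict(int)
    for game_set in game_sets:
        for g in game_set:
            # Credit both sides: White gets result, Black gets 1 - result.
            for name, pts in ((g.white, g.result), (g.black, 1.0 - g.result)):
                games[name] += 1
                points[name] += pts
                if g.result == 0.5:
                    draws[name] += 1
    return {n: (games[n], points[n] / games[n], draws[n] / games[n]) for n in games}

def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by a score fraction (0 < score < 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)
```

For example, `pooled_stats(base_games, gauntlet_games)` counts each base game once and each gauntlet game once, so the tested version's numbers rest on all available games rather than on the gauntlet alone.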
MikeB