testing

MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: testing

Post by MikeB »

flok wrote:Hi,

I'm trying to figure out whether a change gave an improvement or not.
For that I ran one version a few times against other versions and other programs, in gauntlet mode, with the program under test (Embla3949-fp) as the main program.
This gave:

Rank Name           Elo    +    - games score oppo. draws
   1 dorpsgek       119   10   10  2900   69%   -37   12%
   2 Embla3949       33    8    8  2900   61%   -37   40%
   3 Embla3949-fp   -37    6    6  8700   44%    12   30%
   4 0.9.7         -115    9    9  2900   38%   -37   39%
To verify I also ran other combinations:

Rank Name           Elo    +    - games score oppo. draws
   1 dorpsgek        87    6    6  8700   65%   -29   13%
   2 Embla3949       64    9    9  2900   47%    87   13%
   3 0.9.7          -75    9   10  2900   29%    87   17%
   4 Embla3949-fp   -76    9   10  2900   30%    87   10%

Rank Name           Elo    +    - games score oppo. draws
   1 dorpsgek        66   10   10  2900   71%   -93   17%
   2 Embla3949       47    9    8  2900   71%   -93   36%
   3 Embla3949-fp   -20    9    9  2900   61%   -93   39%
   4 0.9.7          -93    6    6  8700   32%    31   31%

non-gauntlet
Rank Name           Elo    +    - games score oppo. draws
   1 dorpsgek        85    6    6  8700   64%   -28   13%
   2 Embla3949       46    6    6  8700   60%   -15   30%
   3 Embla3949-fp   -39    6    5  8700   43%    13   30%
   4 0.9.7          -92    6    5  8700   33%    31   30%
To my horror this all gave different results!
So I wonder: is there a correct result to find?

(this may have been discussed already but I can't find that thread anymore)
Your test version, Embla3949-fp, showed no improvement: it rated below Embla3949 in every one of your runs. Usually the result with the most games played gives the best answer. Going forward, run a round robin (RR) between all engines first and make that your base. Then run a gauntlet against your base, but include all games from your base in your evaluation in addition to the gauntlet games. Naturally, your gauntlet should use the same openings as your base.
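
To illustrate why the run with the most games is the one to trust, here is a rough sketch that estimates the error margin of a result from its score, draw ratio and game count. It assumes independent games and the plain logistic Elo formula, which is not necessarily the model used by the rating tool that produced the tables above, and the function names elo_from_score and elo_margin are made up for this example.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction (0 < score < 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin(score, draw_ratio, games, z=1.96):
    """Approximate 95% error margins (minus, plus) in Elo for a match result.

    Assumption: games are independent, which is only roughly true in practice.
    """
    wins = score - draw_ratio / 2.0
    losses = 1.0 - wins - draw_ratio
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (wins * (1.0 - score) ** 2
                + draw_ratio * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2)
    half_width = z * math.sqrt(variance / games)
    centre = elo_from_score(score)
    return (centre - elo_from_score(score - half_width),
            elo_from_score(score + half_width) - centre)


if __name__ == "__main__":
    # Score and draw ratio taken from the Embla3949-fp row of the non-gauntlet table.
    for games in (2900, 8700):
        minus, plus = elo_margin(score=0.43, draw_ratio=0.30, games=games)
        print(f"{games} games: -{minus:.1f} / +{plus:.1f} Elo")
```

With the same score and draw ratio, the margin comes out at roughly 6 Elo either way for 8700 games and roughly 10 to 11 for 2900 games, which is in line with the + and - columns in the tables. Tripling the game count only shrinks the margin by a factor of about the square root of three, so adding the base RR games to the gauntlet games is a cheap way to tighten the estimate.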