Ultra-fast Tournament with toppest engines

beachknight · Post by **beachknight** » Fri Jun 04, 2010 8:09 am

   Engine               Score                                         Fi                                      Ko                                      Ry                                      Iv                                      Ho                                      Ro                                      Na                                      Ig                                      Cr    S-B
1: FireBird 1.3.1 1cpu  202.5/310 ······································· 10011=11110=10=11=1==001000=0=010010=00 1=11111010=1100010111110111011==00101=0 101010101=10101010101010101110101011101 10101111010=11=110=101111111=1111=11010 11=1010=1=111111=1=111=1=0=1=01101111=1 1010101010101010101010=01010=010101010  1011111111111111101110101=1=11=0=010101 =1110111110=1=111111=111=11111111=11=1   29353,
2: Komodo 1.2           180.5/310 01100=00001=01=00=0==110111=1=101101=11 ······································· =10000010=11=0=1=1==1=0100111101111010  0000001010101=11001=01111111=111100=111 11111101011=11011011111=10011111001111  00000000110000=1==111===0111011111111=1 000===0=100=0100000101100=101=1=1=11111 00010=11=1=110111=0==11=1110==1110111=1 0001=10=0101=1=00=010000111=111111=110=  27176,
3: Rybka 4 1cpu         169.0/309 0=00000101=0011101000001000100==11010=1 =01111101=00=1=0=0==0=1011000010000101  ······································· 1==101100=1=111111110101=1==1111110=11  0001111011010000=100=1000001000=001=01  ===1011000=0010=10=1==1=1=101=01==11111 01=101011111=1111111111101000101010=010 11110110111=1=11=010=01110100=1=00010=0 1111=11110110=11=000110=000111010011111  25215,
4: IvanHoe 9.63.2s 1cpu 168.0/309 010101010=01010101010101010001010100010 1111110101010=00110=10000000=000011=000 0==010011=0=000000001010=0==0000001=00  ······································· 1010101=0=0101=111=11101=1=101010111011 111101=1=10=0101=101=1=1=1==1010100010  100=101=1=1011100010101=101=1=101=1=101 10101=111=1111=0=0=111=1==1110101010111 111=1111=1110=1=1111=0111=11100=1=1==0   24558,
5: Houdini 1.01 2cpu    155.0/309 01010000101=00=001=010000000=0000=00101 00000010100=00100100000=01100000110000  1110000100101111=011=0111110111=110=10  0101010=1=1010=000=00010=0=010101000100 ······································· 10101011010101010100010101110=0=010101  10111=1001010101=1110100010111111111010 100010110111010=0101010=01=1010=010101= 11111111=01=1111111111=11111011=1110111  22447,
6: RobboLito 0.086h6    153.0/309 00=0101=0=000000=0=000=0=1=0=10010000=0 11111111001111=0==000===1000100000000=0 ===0100111=1101=01=0==0=0=010=10==00000 000010=0=01=1010=010=0=0=0==0101011101  01010100101010101011101010001=1=101010  ······································· 001010001010001110101010101010101010101 =11==1111111=1=1=101110=01=1111=1=1111  10111000111=10=1==11=0111=1111==100111=  22273,
7: Naum 4.2 1cpu        151.5/310 0101010101010101010101=10101=101010101  111===1=011=1011111010011=010=0=0=00000 10=010100000=0000000000010111010101=101 011=010=0=0100011101010=010=0=010=0=010 01000=0110101010=0001011101000000000101 110101110101110001010101010101010101010 ······································· 1110111110=0101110=0111010101110111010  =10011=01011011111100010110=1001001=10=  23065,
8: Igorrit 0.086v9 1cpu 112.5/309 0100000000000000010001010=0=00=1=101010 11101=00=0=001000=1==00=0001==0001000=0 00001001000=0=00=101=10001011=0=11101=1 01010=000=0000=1=1=000=0==0001010101000 011101001000101=1010101=10=0101=101010= =00==0000000=0=0=010001=10=0000=0=0000  0001000001=1010001=1000101010001000101  ······································· 101011=11110=0001011=00=110011=10=010=   17442,
9: Critter 0.70 1cpu    100.0/309 =0001000001=0=000000=000=00000000=00=0  1110=01=1010=0=11=101111000=000000=001= 0000=00001001=00=111001=111000101100000 000=0000=0001=0=0000=1000=00011=0=0==1  00000000=10=0000000000=00000100=0001000 01000111000=01=0==00=1000=0000==011000= =01100=10100100000011101001=0110110=01= 010100=00001=1110100=11=001100=01=101=  ·······································  15716,

1392 of 1764 games played
Name of the tournament: CET2708c
Site/ Country: CET_Antalya_TUR, Türkiye
Level: Blitz 0:01/0,1
Hardware: Intel(R) Pentium(R) 4 CPU 3.00GHz  with 1,536 MB Memory
Operating system: Microsoft Windows Vista Professional Service Pack 1 (Build 6001)

beachknight · Post by **beachknight** » Fri Jun 04, 2010 8:19 pm

   Engine               Score    Fi        Ko        Ry      Iv      Ro    Ho       Na      Ig      Cr    S-B 
1: FireBird 1.3.1 1cpu  263.0/392 ·················································   47925, 
2: Komodo 1.2           237.0/392 01100=00001=01=00=0==110111=1=101101=111111==11=1   45038, 
3: Rybka 4 1cpu         213.5/392 0=00000101=0011101000001000100==11010=10000=00001   40305, 
4: IvanHoe 9.63.2s 1cpu 207.0/392 010101010=01010101010101010001010100010=010100010   38133, 
5: RobboLito 0.086h6    196.5/392 00=0101=0=000000=0=000=0=1=0=10010000=000000=0==0   35880, 
6: Houdini 1.01 2cpu    193.5/392 01010000101=00=001=010000000=0000=001010000=00000   35461, 
7: Naum 4.2 1cpu        185.0/392 0101010101010101010101=10101=10101010101010001010   35891, 
8: Igorrit 0.086v9 1cpu 138.0/392 0100000000000000010001010=0=00=1=1010100000000010   27052, 
9: Critter 0.70 1cpu    130.5/392 =0001000001=0=000000=000=00000000=00=000000100000   25676, 

1764 games played / Tournament finished
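Each crosstable cell above is a string of per-game results from the row engine's point of view. A minimal sketch of turning such a string into a score, assuming the usual notation (`1` win, `0` loss, `=` draw); the function names are illustrative and not part of Arena:

```python
def tally(results):
    """Count wins, draws and losses in a crosstable result string."""
    wins = results.count("1")
    draws = results.count("=")
    losses = results.count("0")
    if wins + draws + losses != len(results):
        raise ValueError("unexpected character in result string")
    return wins, draws, losses


def score(results):
    """Points scored: 1 per win, 0.5 per draw."""
    wins, draws, _ = tally(results)
    return wins + 0.5 * draws


# Komodo's results against FireBird, copied from the final crosstable.
komodo_vs_firebird = "01100=00001=01=00=0==110111=1=101101=111111==11=1"
print(len(komodo_vs_firebird), tally(komodo_vs_firebird), score(komodo_vs_firebird))
```

Summing `score` over all of an engine's cells should reproduce its Score column, provided the full rows are available.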

beachknight · Post by **beachknight** » Fri Jun 04, 2010 8:20 pm

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 FireBird 1.3.1 1cpu            : 3234   32  32   455    68.9 %   3095   13.4 %
  2 Komodo 1.2                     : 3180   29  29   455    61.2 %   3101   20.0 %
  3 Rybka 4 1cpu                   : 3159   30  30   455    57.9 %   3103   16.0 %
  4 IvanHoe 9.63.2s 1cpu           : 3146   30  30   455    55.9 %   3105   15.2 %
  5 Houdini 1cpu                   : 3138   41  40   250    56.0 %   3096   13.6 %
  6 RobboLito 0.086h6              : 3137   29  29   455    54.7 %   3104   17.6 %
  7 Houdini 1.01 2cpu              : 3130   32  32   399    50.1 %   3129   10.5 %
  8 Naum 4.2 1cpu                  : 3109   30  30   456    50.2 %   3107   10.1 %
  9 Igorrit 0.086v9 1cpu           : 3054   30  30   455    41.4 %   3115   14.7 %
 10 Critter 0.70 1cpu              : 3018   30  31   455    35.9 %   3118   15.6 %
 11 Stockfish 1.7.1 1cpu           : 2730   62  65   250     8.6 %   3141   10.0 %
 12 Protector 1.3.5 1cpu           : 2538  195   0    66     2.3 %   3138    1.5 %
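For readers who want to see how the Score and Av.Op. columns relate to the Elo column: a rough sketch using the standard logistic Elo curve. The tool that produced the list is not named in the post, so this is only an approximation of what it does, and the function names are made up for illustration.

```python
import math


def elo_diff(score_fraction):
    """Rating difference implied by a score fraction on the logistic Elo curve."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


def performance(avg_opponent, score_fraction):
    """Average opponent rating plus the implied rating difference."""
    return avg_opponent + elo_diff(score_fraction)


# FireBird row: 68.9 % against an average opponent of 3095.
print(round(performance(3095, 0.689)))
# Komodo row: 61.2 % against an average opponent of 3101.
print(round(performance(3101, 0.612)))
```

For these two rows the estimate lands within a couple of points of the listed Elo values. The formula breaks down at 0 % or 100 %, and is unreliable for very lopsided rows such as Protector's.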

beachknight · Post by **beachknight** » Sun Jun 06, 2010 10:53 am

    Engine               Score         S-B

01: FireBird 1.3.1 1cpu  345.0/490    76346,
02: Rybka 4 1cpu         310.0/490    68249,
03: Komodo 1.2           307.5/490    70486,
04: IvanHoe 9.63.2s 1cpu 289.0/490    61409,
05: RobboLito 0.086h6    285.0/490    61015,
06: Houdini 1cpu         269.5/490    58616,
07: Naum 4.2 1cpu        259.0/490    58338,
08: Igorrit 0.086v9 1cpu 220.0/490    46521,
09: Critter 0.70 1cpu    215.5/490    45282,
10: Spark 0.4 1cpu       141.0/490    28938,
11: Stockfish 1.7.1 1cpu  53.5/490    13625,
 
 
2695 games played / Tournament finished
Name of the tournament: CET2708c
Site/ Country: CET_Antalya_TUR, Türkiye
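The game totals reported in this thread are consistent with a round robin of 49 games per pairing. That figure is inferred from the totals (1764 games over 36 pairings), not stated by the poster. A small sketch of the arithmetic:

```python
from math import comb


def total_games(engines, games_per_pairing):
    """Total games in a round robin where every pair plays the same number of games."""
    return comb(engines, 2) * games_per_pairing


def games_per_engine(engines, games_per_pairing):
    """Games each engine plays: one mini-match against every other engine."""
    return (engines - 1) * games_per_pairing


GAMES_PER_PAIRING = 49  # inferred, see above

for engines in (9, 11):
    print(engines,
          total_games(engines, GAMES_PER_PAIRING),
          games_per_engine(engines, GAMES_PER_PAIRING))
```

With 9 engines this gives 1764 games in total and 392 per engine; with 11 engines, 2695 in total and 490 per engine, matching the tables above.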

beachknight · Post by **beachknight** » Sun Jun 06, 2010 10:53 am

06.06.2010 11:49:24 :

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 FireBird 1.3.1 1cpu            : 3251   30  30   546    71.4 %   3092   12.1 %
  2 Komodo 1.2                     : 3199   27  27   546    64.3 %   3097   20.9 %
  3 Rybka 4 1cpu                   : 3180   28  27   546    61.4 %   3099   15.6 %
  4 IvanHoe 9.63.2s 1cpu           : 3170   28  28   546    60.0 %   3099   13.7 %
  5 RobboLito 0.086h6              : 3156   27  27   547    58.0 %   3100   16.1 %
  6 Houdini 1.01 2cpu              : 3151   32  32   399    50.1 %   3150   10.5 %
  7 Houdini 1.0 1cpu               : 3139   29  29   489    54.9 %   3104   13.7 %
  8 Naum 4.2 1cpu                  : 3117   28  28   547    52.0 %   3103   10.4 %
  9 Igorrit 0.086v9 1cpu           : 3080   27  27   546    46.1 %   3108   13.4 %
 10 Critter 0.70 1cpu              : 3054   27  27   547    42.1 %   3109   15.5 %
 11 Spark 0.4 1cpu                 : 2965   32  32   489    28.8 %   3121    9.8 %
 12 Stockfish 1.7.1 1cpu           : 2776   42  44   490    10.9 %   3140    9.6 %
 13 Protector 1.3.5 1cpu           : 2558  195   0    66     2.3 %   3158    1.5 %

Stefan · Post by **Stefan** » Fri Jun 11, 2010 1:00 pm

brianr wrote:I have been trying to run some games with "fast" time controls under Arena (it looks like Arena was used in your games also).

After running the games, I load all of the separate tournaments (I run on two quads) with SCID and delete the losses on time (search header for "on time" and negate the filter before exporting). Then, I use PGNEXTRACT to remove any duplicate games before finally using BAYESELO. Even using random games from Bob Hyatt's 4,000 position suite, there are typically some duplicate games.

I have found that some engines handle "fast" times much better than others. Moreover, engine startup time can result in time losses (say for initially handling EGTB files, although they are not often used with search times of 0.1 seconds or less).

One thing that I have found helps a lot with Arena is to only run one pair of engines per copy of Arena (I run 8 copies on the two quads--no pondering, of course). It is a bit more tedious to set things up this way, but the tournament duplicate command is helpful. This works better than more engines since Arena will not shut down and restart each engine if there are only two. I have tried not using engine restart, but Arena always does seem to do so anyway with more than two engines per tournament. This method saves a lot of time, since in some cases the engine start up time is a significant portion of the entire game time. For much longer time controls it would not be worth the bother to do separate individual pairings.

My hope is that removing the time losses and duplicate games will minimize any resulting ratings impact. Incidentally, the fast games seem to work quite well when testing evaluation changes, but I use longer times for search-related testing (as others have mentioned). These suggestions may not be as important for larger engine ranking tournaments, but I am primarily interested in testing/measuring small improvements in Tinker, which usually takes several thousand games.

In Arena I use a 1/4+1/4 time control (15 sec + 0.25 sec increment) for fast games. Anything faster is not useful in Arena because of the GUI's large time overhead. I have never had a loss on time at 1/4+1/4, but at faster controls I have. For still faster games I would prefer cutechess. I run only 2 engines per tournament to avoid engine startup time. I don't use EGTBs, and I use a small hash table to avoid trouble in ultra-fast games.
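The cleanup brianr describes (drop time losses, then drop duplicate games, before rating) can be sketched in plain Python. This is a stand-in for the SCID and PGNEXTRACT steps, not a reproduction of them. It assumes the text "on time" appears somewhere in a time-loss game's headers or comments, and it treats two games as duplicates when their move text is identical once comments are stripped.

```python
import re


def split_games(pgn_text):
    """Split PGN text into games; a new game starts at each [Event tag."""
    games, current = [], []
    for line in pgn_text.splitlines():
        if line.startswith("[Event ") and current:
            games.append("\n".join(current).strip())
            current = []
        current.append(line)
    last = "\n".join(current).strip()
    if last:
        games.append(last)
    return games


def movetext(game):
    """Move section only: tag lines dropped, {comments} removed, whitespace normalized."""
    body = " ".join(ln for ln in game.splitlines() if not ln.startswith("["))
    body = re.sub(r"\{[^}]*\}", " ", body)
    return " ".join(body.split())


def clean(pgn_text):
    """Keep games that are neither time losses nor repeats of an earlier game."""
    seen, kept = set(), []
    for game in split_games(pgn_text):
        if "on time" in game.lower():  # assumed time-loss marker
            continue
        key = movetext(game)
        if key in seen:
            continue
        seen.add(key)
        kept.append(game)
    return kept
```

The kept games can be joined with blank lines and written to a new PGN file for the rating tool.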

Stefan · Post by **Stefan** » Fri Jun 11, 2010 1:55 pm

beachknight wrote:

06.06.2010 11:49:24 :

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 FireBird 1.3.1 1cpu            : 3251   30  30   546    71.4 %   3092   12.1 %
  2 Komodo 1.2                     : 3199   27  27   546    64.3 %   3097   20.9 %
  3 Rybka 4 1cpu                   : 3180   28  27   546    61.4 %   3099   15.6 %
  4 IvanHoe 9.63.2s 1cpu           : 3170   28  28   546    60.0 %   3099   13.7 %
  5 RobboLito 0.086h6              : 3156   27  27   547    58.0 %   3100   16.1 %
  6 Houdini 1.01 2cpu              : 3151   32  32   399    50.1 %   3150   10.5 %
  7 Houdini 1.0 1cpu               : 3139   29  29   489    54.9 %   3104   13.7 %
  8 Naum 4.2 1cpu                  : 3117   28  28   547    52.0 %   3103   10.4 %
  9 Igorrit 0.086v9 1cpu           : 3080   27  27   546    46.1 %   3108   13.4 %
 10 Critter 0.70 1cpu              : 3054   27  27   547    42.1 %   3109   15.5 %
 11 Spark 0.4 1cpu                 : 2965   32  32   489    28.8 %   3121    9.8 %
 12 Stockfish 1.7.1 1cpu           : 2776   42  44   490    10.9 %   3140    9.6 %
 13 Protector 1.3.5 1cpu           : 2558  195   0    66     2.3 %   3158    1.5 %

After 500 games your list should not differ too much from the official rating lists. Something must be causing a distortion.
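As a rough check on how much statistical noise about 500 games leave, the sketch below estimates a 95 % Elo margin from a row's score, draw rate and game count. It uses a normal approximation on the score and the logistic Elo curve; this is an assumption for illustration and not necessarily how the rating tool computes its + and - columns.

```python
import math


def elo_from_score(p):
    """Rating difference implied by score fraction p on the logistic Elo curve."""
    return 400.0 * math.log10(p / (1.0 - p))


def elo_margin(score, draw_rate, games, z=1.96):
    """Approximate (+, -) Elo margins for a score fraction over `games` games.

    Only valid while score +/- the half-width stays strictly inside (0, 1).
    """
    wins = score - draw_rate / 2.0
    losses = 1.0 - wins - draw_rate
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draw_rate * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2)
    half_width = z * math.sqrt(variance / games)
    centre = elo_from_score(score)
    plus = elo_from_score(score + half_width) - centre
    minus = centre - elo_from_score(score - half_width)
    return plus, minus


# FireBird row: 71.4 % score, 12.1 % draws, 546 games.
print(elo_margin(0.714, 0.121, 546))
```

For the FireBird row this gives margins of roughly 30 Elo either way, in line with the table. Sampling noise of that size cannot account for a gap of several hundred Elo, so a large deviation from other lists would need a systematic cause such as time losses.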

beachknight · Post by **beachknight** » Fri Jun 11, 2010 11:26 pm

Stefan wrote:
brianr wrote:I have been trying to run some games with "fast" time controls under Arena (it looks like Arena was used in your games also).

After running the games, I load all of the separate tournaments (I run on two quads) with SCID and delete the losses on time (search header for "on time" and negate the filter before exporting). Then, I use PGNEXTRACT to remove any duplicate games before finally using BAYESELO. Even using random games from Bob Hyatt's 4,000 position suite, there are typically some duplicate games.

I have found that some engines handle "fast" times much better than others. Moreover, engine startup time can result in time losses (say for initially handling EGTB files, although they are not often used with search times of 0.1 seconds or less).

One thing that I have found helps a lot with Arena is to only run one pair of engines per copy of Arena (I run 8 copies on the two quads--no pondering, of course). It is a bit more tedious to set things up this way, but the tournament duplicate command is helpful. This works better than more engines since Arena will not shut down and restart each engine if there are only two. I have tried not using engine restart, but Arena always does seem to do so anyway with more than two engines per tournament. This method saves a lot of time, since in some cases the engine start up time is a significant portion of the entire game time. For much longer time controls it would not be worth the bother to do separate individual pairings.

My hope is that removing the time losses and duplicate games will minimize any resulting ratings impact. Incidentally, the fast games seem to work quite well when testing evaluation changes, but I use longer times for search-related testing (as others have mentioned). These suggestions may not be as important for larger engine ranking tournaments, but I am primarily interested in testing/measuring small improvements in Tinker, which usually takes several thousand games.

In Arena I use a 1/4+1/4 time control (15 sec + 0.25 sec increment) for fast games. Anything faster is not useful in Arena because of the GUI's large time overhead. I have never had a loss on time at 1/4+1/4, but at faster controls I have. For still faster games I would prefer cutechess. I run only 2 engines per tournament to avoid engine startup time. I don't use EGTBs, and I use a small hash table to avoid trouble in ultra-fast games.

Thank you for your observations and thoughtful advice, Stefan.
I have to re-think and probably revise my ultra-fast testing methodology.

Best,

beachknight · Post by **beachknight** » Fri Jun 11, 2010 11:30 pm

Stefan wrote:

After 500 games your list should not differ too much from the official rating lists. Something must be causing a distortion.

I'll replicate this test later with two different TCs: the 1/4+1/4 you just suggested, and another in between. Let's see whether the picture changes.
