NEBB-Rankinglists: RobboLito 0.10

pohl4711 · Post by **pohl4711** » Sat Jan 21, 2012 10:08 am

The NEBB-Rankinglists (Naked Engine Bullet and Blitz) now with RobboLito 0.10:

Intel Q9550 2.83GHz Quad (no SSE support, Vista 64bit), LittleBlitzerGUI, 256 MB Hash, 1 Core per Engine, no ponder, no bases, no resign. 50 super-short test-positions (1.a3 a6, 1.a3 b6, 1.a3 c6 ... 1.h3 g6, 1.h3 h6) = Naked Engines: no openings (neither a book nor long test-positions such as Noomen), no endgame-databases, only engine-thinking from move 2 until mate or draw. Elos calculated with bayeselo (fixpoint: Stockfish 2.1.1 JA = 3000 Elo).
Two lists with exactly the same conditions except for the thinking time. That makes it possible to see which engine scores better or worse with more or less thinking time...

Blitzlist (4’+2’’)

Rank Name                       Elo    +    - games score oppo. draws
   1 Houdini 2.0c x64          3110   17   16  1000   61%  3038   39%
   2 Houdini 1.5a x64          3100   17   17  1000   60%  3038   40% (best freeware)
   3 Critter 1.4 64-bit        3082   17   17   900   56%  3047   46%
   4 Komodo 4 x64              3070   16   16  1000   53%  3049   43% (singlecore)
   5 Critter 1.2 64-bit        3055   18   18   800   51%  3049   45%
   6 Ivanhoe B46fa x64         3041   15   15  1100   49%  3050   53%
   7 Stockfish 2.2.1 JA 64bit  3036   18   18   800   47%  3059   42%
   8 Komodo 3 x64              3031   19   19   700   47%  3050   46% (singlecore)
   9 Rybka 4.1 x64             3026   15   15  1100   46%  3051   47%
  10 RobboLito 0.10 x64        3024   19   19   700   43%  3066   49%
  11 RobboLito 0.09 x64        3016   16   16  1000   44%  3055   50% (singlecore)
  12 Stockfish 2.1.1 JA 64bit  3000   17   17   900   41%  3059   42%

Bulletlist (1’+500 ms)

Rank Name                       Elo    +    - games score oppo. draws
   1 Houdini 2.0c x64          3122   17   17  1000   63%  3036   36%
   2 Houdini 1.5a x64          3103   17   17  1000   60%  3036   37% (best freeware)
   3 Critter 1.4 64-bit        3092   17   17   900   57%  3045   43%
   4 Critter 1.2 64-bit        3067   18   18   800   53%  3047   43%
   5 Komodo 4 x64              3053   17   17  1000   51%  3051   38% (singlecore)
   6 Ivanhoe B46fa x64         3042   15   15  1100   49%  3050   48%
   7 RobboLito 0.10 x64        3034   19   19   700   45%  3065   47%
   8 Komodo 3 x64              3024   20   20   700   46%  3052   37% (singlecore)
   9 Rybka 4.1 x64             3021   16   16  1100   45%  3052   40%
  10 Stockfish 2.2.1 JA 64bit  3020   19   19   800   44%  3060   37%
  11 RobboLito 0.09 x64        3009   16   16  1000   43%  3054   45% (singlecore)
  12 Stockfish 2.1.1 JA 64bit  3000   18   17   900   41%  3059   38%

Greetings – Stefan

Houdini · Post by **Houdini** » Sun Jan 22, 2012 2:30 pm

pohl4711 wrote: Two lists with exactly the same conditions except for the thinking time. That makes it possible to see which engine scores better or worse with more or less thinking time...

Stefan, you don't play enough games for this. Within the error bars the two lists show exactly the same result.

The two columns "+" and "-" in your table show the 95% confidence interval of the individual ratings; the error bars in your list are more than 15 points.
Comparing "which engine scores better or worse with more or less thinking time" involves 4 individual ratings, so the error bar on this comparison is more than 30 points.

Robert
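
As a rough check of Robert's point against the two lists, the sketch below takes one pair of engines (Komodo 4 and Critter 1.2, chosen only because they swap places between the lists) and asks how much their Elo gap shifts from bullet to blitz. It assumes the four rating errors are independent and combines them as the square root of the sum of squares; the function name `time_control_shift` is made up for this illustration.

```python
import math

# (Elo, 95% error bar) copied from the two lists above.
# Where "+" and "-" differ, the larger value would be used.
blitz = {"Komodo 4 x64": (3070, 16), "Critter 1.2 64-bit": (3055, 18)}
bullet = {"Komodo 4 x64": (3053, 17), "Critter 1.2 64-bit": (3067, 18)}

def time_control_shift(a, b):
    """Return how much engine a gains on engine b from bullet to blitz,
    together with the combined error bar on that number."""
    gap_blitz = blitz[a][0] - blitz[b][0]
    gap_bullet = bullet[a][0] - bullet[b][0]
    shift = gap_blitz - gap_bullet
    # Four ratings enter the comparison, so four error bars are combined
    # (assumption: the four errors are independent).
    bars = [blitz[a][1], blitz[b][1], bullet[a][1], bullet[b][1]]
    error = math.sqrt(sum(e * e for e in bars))
    return shift, error

shift, error = time_control_shift("Komodo 4 x64", "Critter 1.2 64-bit")
print(f"shift = {shift:+d} Elo, combined error bar = +/-{error:.1f} Elo")
print("significant" if abs(shift) > error else "within the error bar")
```

For this pair the shift is 29 Elo against a combined error bar of about 34.5 Elo, so even the most visible swap between the two lists stays within the error bar.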

lucasart · Post by **lucasart** » Sun Jan 22, 2012 4:30 pm

Houdini wrote:
pohl4711 wrote: Two lists with exactly the same conditions except for the thinking time. That makes it possible to see which engine scores better or worse with more or less thinking time...
Stefan, you don't play enough games for this. Within the error bars the two lists show exactly the same result.

The two columns "+" and "-" in your table show the 95% confidence interval of the individual ratings; the error bars in your list are more than 15 points.
Comparing "which engine scores better or worse with more or less thinking time" involves 4 individual ratings, so the error bar on this comparison is more than 30 points.

Robert

Indeed.

But adding up the error bars is somewhat more conservative than a 95% interval. The right way is to pull the LOS (likelihood of superiority) matrix from Bayeselo.

Houdini · Post by **Houdini** » Sun Jan 22, 2012 4:53 pm

lucasart wrote: But adding up the error bars is somewhat more conservative than a 95% interval. The right way is to pull the LOS (likelihood of superiority) matrix from Bayeselo.

There is no LOS matrix for the case Stefan mentioned, to "see which engine scores better or worse with more or less thinking time".

Note that I didn't simply "add" the error bars; I combined them by SRSS (square root of the sum of squares). If you need 4 ratings to produce a result, the error bar on the result is SQRT(4) = 2 times larger than the individual error bar.

Robert
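
Written out, the SRSS rule Robert describes looks like this. The symbols $e_1, \dots, e_4$ (the four individual error bars) and $e$ (their common value) are notation introduced here, and the rule assumes the four errors are independent:

```latex
e_{\text{result}}
  = \sqrt{e_1^2 + e_2^2 + e_3^2 + e_4^2}
  = \sqrt{4\,e^2}
  = 2e
  \qquad \text{when } e_1 = e_2 = e_3 = e_4 = e .
```

With individual error bars of at least 15 points, this gives at least $2 \times 15 = 30$ points on the comparison, against $4 \times 15 = 60$ points if the bars were simply added.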

lucasart · Post by **lucasart** » Sun Jan 22, 2012 5:05 pm

Houdini wrote: Note that I didn't simply "add" the error bars; I combined them by SRSS (square root of the sum of squares).

Ah sorry, I didn't check the calculation closely enough to notice. But yes, if you don't have the LOS matrix, the SRSS is probably the best proxy.
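
For two engines within one list, an SRSS-based stand-in for a LOS value could look like the sketch below. It is only an approximation under stated assumptions: the "+"/"-" columns are read as 95% intervals (about 1.96 standard deviations), the two rating errors are treated as independent and normally distributed, and any correlation between the ratings is ignored. The helper name `approx_los` is invented for this example and is not a Bayeselo command.

```python
import math

def approx_los(elo_a, bar_a, elo_b, bar_b):
    """Approximate probability that engine A is stronger than engine B.

    Assumptions: bar_a and bar_b are 95% error bars (1.96 standard
    deviations), and the two rating errors are independent and normal.
    """
    # SRSS of the two bars, converted from a 95% bar to one standard deviation.
    sigma_diff = math.sqrt(bar_a ** 2 + bar_b ** 2) / 1.96
    z = (elo_a - elo_b) / sigma_diff
    # Standard normal cumulative distribution function, via erf.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Blitz list: Houdini 2.0c x64 (3110, bar 17) vs Houdini 1.5a x64 (3100, bar 17).
los = approx_los(3110, 17, 3100, 17)
print(f"approximate LOS: {los:.0%}")
```

A value well below 95% here would say the same thing as the error bars: a 10 Elo gap between the top two engines is not resolved by this number of games.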
