LS-rankinglist: restart with double thinking-time

pohl4711 · Post by **pohl4711** » Tue Oct 09, 2012 3:59 pm

The LS-rankinglist (LightSpeed-rankinglist) has been restarted with more than double the thinking time (45''+500ms instead of 20''+250ms).

Intel i7-2630QM (SSE42 support, Windows 7 64bit, 2 GHz Quadcore, FritzMark=20.2), 64 MB Hash, 1 core per engine (Hyperthreading off), no ponder, no endgame tablebases, no resign. 500 selected opening positions (all 8 moves deep, from Frank Q.'s database).
Elo ratings calculated with bayeselo (mm 0 1), anchored at Robbolito 0.085g3 = 3000 Elo. LittleBlitzerGUI (gauntlet mode only, because in round-robin mode this GUI picks the opening positions at random from the PGN file...)
Time: 45''+500ms Fischer increment (= 85-90 seconds per game/engine).
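
As a rough cross-check of the "85-90 seconds per game/engine" figure, here is a minimal sketch of the clock arithmetic. It assumes an engine spends its whole clock, and the game lengths of 80 and 90 moves are back-calculated for illustration, not measured values from the test. The function names are made up for this example.

```python
def seconds_per_game(base_s: float, increment_s: float, moves: int) -> float:
    """Total clock time one engine can spend over `moves` moves (upper bound)."""
    return base_s + increment_s * moves


def average_seconds_per_move(base_s: float, increment_s: float, moves: int) -> float:
    """Average time per move if the whole clock is used."""
    return seconds_per_game(base_s, increment_s, moves) / moves


# Assumed game lengths (moves per engine), chosen to match the 85-90 s range.
for moves in (80, 90):
    total = seconds_per_game(45, 0.5, moves)
    avg = average_seconds_per_move(45, 0.5, moves)
    print(f"{moves} moves: {total:.0f} s per engine, {avg:.2f} s per move")
```

Under these assumptions the budget comes out at about one second per move on average.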

LS-rankinglist with best engine-versions only (no betas, no settings, no development-versions):

Rank Name                    Elo    +    - games score oppo. draws 
   1 Houdini 2.0c x64       3104    5    5 10000   62%  3016   43% 
   2 Critter 1.6a x64       3071    5    5 10000   57%  3019   52% 
   3 Strelka 5.5 x64        3070    5    5 10000   57%  3020   52% (singlecore)
   4 Komodo 5 x64           3058    5    5 10000   55%  3021   44% (singlecore)
   5 Ivanhoe 46h x64        3020    5    5 10000   49%  3024   54% (best open source)
   6 Robbolito 0.10 x64s    3018    5    5 10000   49%  3025   56% 
   7 Rybka 4.1 x64s         3012    5    5 10000   48%  3025   47% 
   8 Robbolito 0.085g3 x64  3000    5    5 10000   46%  3026   53% (singlecore)(Ippolit 2009)
   9 Stockfish 2.2.2 x64s   2994    5    5 10000   45%  3027   45% 
  10 Saros 3.0 x64          2988    5    5 10000   44%  3028   47% 
  11 Bouquet 1.4 x64s       2930    5    5 10000   35%  3034   44%
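
To relate the Elo, oppo. and score columns, the standard logistic Elo formula can be applied to a few rows. This is only a sketch: bayeselo fits its own model with a draw parameter, and "oppo." is an average over opponents, so the agreement is approximate. The function name `expected_score` is chosen for this example.

```python
def expected_score(elo: float, opponent_elo: float) -> float:
    """Expected score (0..1) under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent_elo - elo) / 400.0))


# (name, Elo, average opponent Elo, listed score in %) copied from the table above.
rows = [
    ("Houdini 2.0c x64", 3104, 3016, 62),
    ("Critter 1.6a x64", 3071, 3019, 57),
    ("Bouquet 1.4 x64s", 2930, 3034, 35),
]

for name, elo, oppo, listed in rows:
    predicted = 100 * expected_score(elo, oppo)
    print(f"{name}: predicted {predicted:.1f}%, listed {listed}%")
```

For these three rows the predicted score rounds to the listed percentage.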

The complete LS-rankinglist:

Rank Name                    Elo    +    - games score oppo. draws 
   1 Houdini 2.0c x64       3105    5    5 10000   62%  3017   43% 
   2 Houdini 1.5a x64       3084    5    5 10000   59%  3017   44% (best freeware (multicore))
   3 Critter 1.6a x64       3073    5    5 11000   57%  3026   52% 
   4 Strelka 5.5 x64        3073    5    5 11000   57%  3026   52% (singlecore)
   5 Komodo 5 x64           3060    5    5 11000   55%  3028   43% (singlecore)
   6 Ivanhoe 46h x64        3021    5    5 11000   49%  3031   54% (best open source)
   7 Robbolito 0.10 x64s    3019    5    5 11000   48%  3031   55% 
   8 Rybka 4.1 x64s         3013    5    5 11000   47%  3032   46% 
   9 Robbolito 0.085g3 x64  3000    5    5 11000   45%  3033   53% (singlecore)(Ippolit 2009)
  10 Stockfish 2.2.2 x64s   2996    5    5 11000   45%  3033   44% 
  11 Saros 3.0 x64          2988    5    5 11000   43%  3034   47% 
  12 Bouquet 1.4 x64s       2931    5    5 11000   35%  3039   43%

(x64=64bit version, x64s=64bit SSE42-version)
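
The +/- columns can be sanity-checked with a simple normal approximation. The sketch below assumes the margins are roughly 95% intervals and treats the opponents' ratings as exact, which bayeselo does not do, so it is an illustration rather than a reproduction of bayeselo's calculation. The function name `elo_margin_95` is made up for this example.

```python
import math


def elo_margin_95(score: float, draw_ratio: float, games: int) -> float:
    """Approximate 95% Elo margin from score, draw ratio and number of games."""
    wins = score - draw_ratio / 2          # share of games won
    losses = 1.0 - wins - draw_ratio       # share of games lost
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draw_ratio * (0.5 - score) ** 2
                + losses * score ** 2)
    std_error = math.sqrt(variance / games)
    # Slope of the logistic Elo curve: Elo change per unit of score.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return 1.96 * std_error * slope


# Houdini 2.0c x64 row: 62% score, 43% draws, 10000 games.
print(f"approx. +/- {elo_margin_95(0.62, 0.43, 10000):.1f} Elo")
```

For the Houdini 2.0c row this lands close to the listed margin of 5 Elo, so differences of only a few Elo between neighbouring engines are within the noise.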

deleted betas, development-versions, settings: none
aborted test-gauntlets (because of very poor results): none

If you want the games from the LS-list, send me an email address by PM. I will send the games (PGN file) as soon as possible...

Greetings – Stefan

ThatsIt · Post by **ThatsIt** » Tue Oct 09, 2012 7:25 pm

Wow, an average of (nearly) one second per move!
Not too bad ... ermm, too short.

pohl4711 · Post by **pohl4711** » Wed Oct 10, 2012 7:29 am

1 second is a very long time on a modern computer. And in the middlegame the time per move is around 2-3 seconds, against 0.5 seconds in the endgame. 1 second per move is the average time per move displayed by the LittleBlitzerGUI (over all moves in all games played by one engine).

Best - Stefan

lkaufman · Post by **lkaufman** » Wed Oct 10, 2012 5:10 pm

pohl4711 wrote: 1 second is a very long time on a modern computer. And in the middlegame the time per move is around 2-3 seconds, against 0.5 seconds in the endgame. 1 second per move is the average time per move displayed by the LittleBlitzerGUI (over all moves in all games played by one engine).

Best - Stefan

I'm glad you doubled the time limit; it is now at least about what people call "bullet chess". I note that your computer is only 2 GHz, so it's still faster than bullet chess on any recent model computer. Your list will be pretty accurate for comparing similar engines, for example all Ippo-related engines, or different versions of the same engine, but will not be a good predictor of slower results when comparing dissimilar engines (i.e. Komodo or Stockfish vs. Ippos). Our latest (unreleased) Komodo on Windows is clearly stronger than H 1.5 (and probably equal to 2.0) at 5' level, but when I test at a level similar to the one you are using we are still 9 Elo behind H 1.5 (after over 5000 games), though a PGO compile should close this gap. We still have no explanation for the extreme bullet strength of these Ippo-cousins, but it fades rapidly with longer time limits.

pohl4711 · Post by **pohl4711** » Sat Oct 13, 2012 8:48 am

lkaufman wrote: Your list will not be a good predictor of slower results

Compared with my old LS-rankinglist (20''+250ms), only Stockfish scores really better with the doubled time in the new LS-list (around +20 Elo). And only Houdini 2.0c scores weaker (-14 Elo). Houdini 1.5a's score and Elo did not change with doubled time. Komodo 5 is only around 8-10 Elo stronger with doubled time.
So I believe that only Houdini 2.0c gets relatively stronger with less thinking time, but not Houdini 1.5a (perhaps because Houdini 2 plays more aggressively than Houdini 1.5a?). And only Stockfish gets really stronger with more thinking time, and Komodo a little bit.

Best - Stefan

Houdini · Post by **Houdini** » Sat Oct 13, 2012 3:56 pm

pohl4711 wrote:
lkaufman wrote: Your list will not be a good predictor of slower results
Compared with my old LS-rankinglist (20''+250ms), only Stockfish scores really better with the doubled time in the new LS-list (around +20 Elo). And only Houdini 2.0c scores weaker (-14 Elo). Houdini 1.5a's score and Elo did not change with doubled time. Komodo 5 is only around 8-10 Elo stronger with doubled time.
So I believe that only Houdini 2.0c gets relatively stronger with less thinking time, but not Houdini 1.5a (perhaps because Houdini 2 plays more aggressively than Houdini 1.5a?). And only Stockfish gets really stronger with more thinking time, and Komodo a little bit.

Best - Stefan

Your 20"+250ms result for Houdini 2.0c was clearly a statistical outlier.
I play even faster games (e.g. 8"+80ms) on weaker hardware, and my results are very much in line with your current 45"+500ms results: Houdini 2.0c is about 20 to 25 Elo stronger than Houdini 1.5a.

It shows how dangerous it is to make extrapolations based on a limited number of data points; most of the simple conclusions tend to be incorrect.

Robert

lkaufman · Post by **lkaufman** » Sat Oct 13, 2012 4:20 pm

pohl4711 wrote:
lkaufman wrote: Your list will not be a good predictor of slower results
Compared with my old LS-rankinglist (20''+250ms), only Stockfish scores really better with the doubled time in the new LS-list (around +20 Elo). And only Houdini 2.0c scores weaker (-14 Elo). Houdini 1.5a's score and Elo did not change with doubled time. Komodo 5 is only around 8-10 Elo stronger with doubled time.
So I believe that only Houdini 2.0c gets relatively stronger with less thinking time, but not Houdini 1.5a (perhaps because Houdini 2 plays more aggressively than Houdini 1.5a?). And only Stockfish gets really stronger with more thinking time, and Komodo a little bit.

Best - Stefan

That agrees pretty well with my findings. But even a swing of ten Elo for a doubling is not insignificant; it would imply that Komodo would gain 20 Elo relative to H 1.5, and a lot more relative to H2, with two more doublings, which would be in the range of normal blitz.
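
The extrapolation can be written out as a small sketch. It assumes the swing stays constant at ten Elo per doubling, which is only the implication stated above and not an established result, and the function name `extrapolate` is made up for this example.

```python
def extrapolate(base_s: float, increment_s: float,
                elo_per_doubling: float, doublings: int):
    """Return (base, increment, cumulative Elo swing) for each doubling step."""
    rows = []
    for k in range(doublings + 1):
        factor = 2 ** k
        rows.append((base_s * factor, increment_s * factor, elo_per_doubling * k))
    return rows


# Start from 45''+500ms and assume a constant +10 Elo per doubling.
for base, inc, swing in extrapolate(45, 0.5, 10, 2):
    print(f"{base:.0f}''+{inc:g}'' -> +{swing} Elo relative to the starting point")
```

Two doublings take 45''+500ms to 180''+2'', i.e. 3 minutes plus 2 seconds per move.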
