Playing strength development - increasing time control


fastgm
Posts: 818
Joined: Mon Aug 19, 2013 6:57 pm

Playing strength development - increasing time control

Post by fastgm »

After nearly a year and a total of 540,000 games, the experiment on playing strength development with increasing time control (exact quadrupling at each step) is complete.

Here are the results (time controls: 3.75+0.0375, 15.00+0.15, 60.00+0.60 and 240.00+2.40, in seconds) and the corresponding diagram:

[Image: results]

[Image: corresponding diagram]

The 10 tested engines can be divided into three groups:

The balanced:
Komodo, Gull and Hannibal play at a fairly even level across all time controls.

The losers:
Houdini, Rybka, Bouquet and Critter score especially well against the competition at the shorter time controls. At the longer time controls they lose between 41 and 70 Elo in this experiment.

The winners:
Protector, Naum and Stockfish increase their playing strength at longer time controls:
Stockfish by 59 Elo, Naum by 68 Elo, and Protector by as much as 92 Elo!

Code: Select all

ELOStat Start Elo: 3000

Program                  | ELO  |  ELO |  ELO |  ELO | 
                         | 3.75 |  15  |  60  |  240 | Diff.
-------------------------+------+------+------+------+------
Houdini 3 x64            | 3215 | 3180 | 3159 | 3145 | -70
Critter 1.6a 64-bit      | 3125 | 3086 | 3070 | 3059 | -66
Komodo CCT 64-bit        | 3099 | 3088 | 3087 | 3089 | -10
Bouquet 1.6 pop64        | 3074 | 3049 | 3031 | 3021 | -53
Deep Rybka 4.1 SSE42 x64 | 3056 | 3033 | 3022 | 3015 | -41
Gull R375 x64            | 2980 | 3000 | 3001 | 2996 | +16
Stockfish 3 Ja 64bit     | 2970 | 3012 | 3018 | 3029 | +59
Hannibal 1.3 x64         | 2878 | 2877 | 2876 | 2882 | + 4
Naum 4.2 x64             | 2823 | 2869 | 2877 | 2891 | +68
Protector 1.5.0 x64      | 2781 | 2806 | 2859 | 2873 | +92
-------------------------+------+------+------+------+------
                   Diff. |  434 |  374 |  300 |  272 |  
In addition, the difference between 15 sec and 240 sec:

Code: Select all

Program                  | Diff.|
-------------------------+------+
Houdini 3 x64            | -35  |
Critter 1.6a 64-bit      | -27  |
Komodo CCT 64-bit        | + 1  |
Bouquet 1.6 pop64        | -28  |
Deep Rybka 4.1 SSE42 x64 | -18  |
Gull R375 x64            | - 4  |
Stockfish 3 Ja 64bit     | +17  |
Hannibal 1.3 x64         | + 5  |
Naum 4.2 x64             | +22  |
Protector 1.5.0 x64      | +67  |
-------------------------+------+
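
Not part of the original test setup, just a quick way to re-check the arithmetic: a minimal Python sketch that recomputes both Diff. columns; the dictionary simply restates the Elo values from the table above.

Code: Select all

# Elo values copied from the table above; column order: 3.75s, 15s, 60s, 240s.
ratings = {
    "Houdini 3 x64":            [3215, 3180, 3159, 3145],
    "Critter 1.6a 64-bit":      [3125, 3086, 3070, 3059],
    "Komodo CCT 64-bit":        [3099, 3088, 3087, 3089],
    "Bouquet 1.6 pop64":        [3074, 3049, 3031, 3021],
    "Deep Rybka 4.1 SSE42 x64": [3056, 3033, 3022, 3015],
    "Gull R375 x64":            [2980, 3000, 3001, 2996],
    "Stockfish 3 Ja 64bit":     [2970, 3012, 3018, 3029],
    "Hannibal 1.3 x64":         [2878, 2877, 2876, 2882],
    "Naum 4.2 x64":             [2823, 2869, 2877, 2891],
    "Protector 1.5.0 x64":      [2781, 2806, 2859, 2873],
}

print(f"{'Program':<25}  240 vs 3.75  240 vs 15")
for name, (e3_75, e15, e60, e240) in ratings.items():
    # First Diff. column: Elo at 240s minus Elo at 3.75s.
    # Second Diff. column: Elo at 240s minus Elo at 15s.
    print(f"{name:<25}  {e240 - e3_75:+11d}  {e240 - e15:+9d}")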

The final results for all time controls, with ratings calculated by Elo, Bayeselo and Ordo, as well as the individual results, are available on my website (Experimental Rating Lists).

http://www.fastgm.de

Regards,
Andreas
velmarin
Posts: 1600
Joined: Mon Feb 21, 2011 9:48 am

Re: Playing strength development - increasing time control

Post by velmarin »

Regarding Bouquet in particular: one of the improvements in the latest version, 1.8, was its time management, and I think that worked out well.

The fact is that "all these leading engines", when they play, have their own particular way of extending their thinking time on certain moves, whereas the immediately lower-rated engines tend to manage their time more automatically, probably the result of a simpler search function.

Interesting statistics.

Thanks!
Uri Blass
Posts: 10281
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Playing strength development - increasing time control

Post by Uri Blass »

I think that the comparison is not fair, because the weaker engines have an advantage in earning rating points at longer time controls due to diminishing returns.

If you want to test which engine gains more from additional time, you should start with unequal time controls at which all the engines score close to 50%, and then multiply the time controls.
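
A rough Python sketch of what such a handicapped setup could look like. The Elo-per-doubling constant, the reference time control and the three engines below are pure assumptions for illustration (the real Elo gain per time doubling differs per engine, which is exactly what the experiment tries to measure), so in practice the starting times would have to be tuned with test games until everyone scores close to 50%.

Code: Select all

# Sketch only: give each engine a different starting time so the field is
# roughly balanced, then multiply every engine's time by the same factor
# (e.g. quadruple it, as in this experiment) for each following stage.
ELO_PER_DOUBLING = 50.0   # assumed average Elo gain per doubling of time
BASE_TC = 15.0            # assumed reference time control in seconds

ratings_at_base = {       # ratings at the 15s control, from the first post
    "Houdini 3 x64": 3180,
    "Stockfish 3 Ja 64bit": 3012,
    "Protector 1.5.0 x64": 2806,
}

mean_rating = sum(ratings_at_base.values()) / len(ratings_at_base)

for name, elo in ratings_at_base.items():
    # Time factor that would cancel the Elo gap to the field average
    # under the assumed Elo-per-doubling model.
    doublings = (mean_rating - elo) / ELO_PER_DOUBLING
    start_tc = BASE_TC * 2.0 ** doublings
    print(f"{name:<22} start at roughly {start_tc:6.1f}s, then quadruple per stage")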

Uri
aturri
Posts: 85
Joined: Wed Dec 30, 2009 11:35 pm

Re: Playing strength development - increasing time control

Post by aturri »

Nice work, and very interesting results!
aturri
Posts: 85
Joined: Wed Dec 30, 2009 11:35 pm

Re: Playing strength development - increasing time control

Post by aturri »

Uri Blass wrote:I think that the comparison is not fair, because the weaker engines have an advantage in earning rating points at longer time controls due to diminishing returns.

If you want to test which engine gains more from additional time, you should start with unequal time controls at which all the engines score close to 50%, and then multiply the time controls.

Uri
I think the experiment is very good. In the end, if you use an engine to analyse a game, it is important to know how your program scales both with time and with the number of CPUs: you use 100% of your time and (nearly) 100% of your computer. This matters especially now that tests and many rating lists focus on lightning chess.

If you had to decide which engine to use to analyse a game and you only had Komodo and Critter, it seems obvious that Komodo would be preferable over Critter, as it scales much better with time; so if you were to analyse the same position for several hours, one could expect Komodo to perform better than Critter.

If the time is long enough, perhaps there is not so much difference between Komodo and Houdini, even though Houdini is +120 Elo ahead at fast time controls.

Anyway, your proposal would also be really interesting, but these kinds of experiments take a looooooong time to run :(
petero2
Posts: 687
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Playing strength development - increasing time control

Post by petero2 »

Uri Blass wrote:I think that the comparison is not fair, because the weaker engines have an advantage in earning rating points at longer time controls due to diminishing returns.

If you want to test which engine gains more from additional time, you should start with unequal time controls at which all the engines score close to 50%, and then multiply the time controls.
I think you can compensate for most of this effect with some post-processing of the data. Assuming that the rating scale compresses at longer time controls, it seems relevant to convert the absolute ratings to relative values by subtracting the mean and dividing by the standard deviation. This gives the following table:

Code: Select all

Program                     3.75      15        60        240
Houdini 3 x64               1.53789   1.55175   1.59083   1.57912
Critter 1.6a 64-bit         0.89382   0.74139   0.70037   0.64254
Komodo CCT 64-bit           0.70776   0.75863   0.87045   0.96925
Bouquet 1.6 pop64           0.52885   0.42242   0.31016   0.22870
Deep Rybka 4.1 SSE42 x64    0.40004   0.28449   0.22011   0.16336
Gull R375 x64              -0.14384   0.00000   0.01001  -0.04356
Stockfish 3 Ja 64bit       -0.21541   0.10345   0.18009   0.31582
Hannibal 1.3 x64           -0.87379  -1.06036  -1.24065  -1.28507
Naum 4.2 x64               -1.26738  -1.12933  -1.23064  -1.18706
Protector 1.5.0 x64        -1.56795  -1.67244  -1.41074  -1.38309
For each engine you can then compute how well it scales by fitting a straight line to its four relative rating values and computing the slope of that line. This gives the following table after sorting:

Code: Select all

 Slope     Program
 0.167033  Stockfish 3 Ja 64bit
 0.089629  Komodo CCT 64-bit
 0.081629  Protector 1.5.0 x64
 0.031085  Gull R375 x64
 0.016275  Houdini 3 x64
 0.013966  Naum 4.2 x64
-0.077442  Deep Rybka 4.1 SSE42 x64
-0.079489  Critter 1.6a 64-bit
-0.101272  Bouquet 1.6 pop64
-0.141415  Hannibal 1.3 x64
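
To make the procedure easy to reproduce, here is a small Python sketch of the same post-processing (my own, not necessarily how the numbers above were produced). The ratings dictionary restates the Elo table from the first post; mapping the four time controls to x = 0, 1, 2, 3 (one unit per quadrupling) is an assumption on my part, chosen because it reproduces the slopes listed above.

Code: Select all

import statistics

ratings = {
    "Houdini 3 x64":            [3215, 3180, 3159, 3145],
    "Critter 1.6a 64-bit":      [3125, 3086, 3070, 3059],
    "Komodo CCT 64-bit":        [3099, 3088, 3087, 3089],
    "Bouquet 1.6 pop64":        [3074, 3049, 3031, 3021],
    "Deep Rybka 4.1 SSE42 x64": [3056, 3033, 3022, 3015],
    "Gull R375 x64":            [2980, 3000, 3001, 2996],
    "Stockfish 3 Ja 64bit":     [2970, 3012, 3018, 3029],
    "Hannibal 1.3 x64":         [2878, 2877, 2876, 2882],
    "Naum 4.2 x64":             [2823, 2869, 2877, 2891],
    "Protector 1.5.0 x64":      [2781, 2806, 2859, 2873],
}

# Per time control: mean and sample standard deviation over the ten engines.
cols = list(zip(*ratings.values()))
means = [statistics.mean(c) for c in cols]
sdevs = [statistics.stdev(c) for c in cols]

def slope(ys, xs=(0, 1, 2, 3)):
    # Least-squares slope of ys against xs (one x unit per quadrupling).
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Normalize each engine's four ratings column-wise, then fit the line.
scaling = {}
for name, elos in ratings.items():
    z = [(e - m) / s for e, m, s in zip(elos, means, sdevs)]
    scaling[name] = slope(z)

for name, sl in sorted(scaling.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{sl:+.6f}  {name}")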