can we estimate strength of an engine given time handicap?


ozziejoe
Posts: 811
Joined: Wed Mar 08, 2006 10:07 pm


Post by ozziejoe »

Hi folks,

I have heard that each time you double the processing speed of a computer, the engine gains approximately 50 Elo points. Can this idea be used to estimate the playing level of an engine handicapped on time?

E.g., let's say an engine plays at a 3000 level when given 3 hours on a computer with hardware X. Do you think the following are reasonable estimates at the shorter times?


Elo   Time (minutes)
3000  180
2950  90
2900  45
2850  22.5
2800  11.25
2750  about 5.5

Is an engine playing this strong at five minutes? Or is the starting value (3000) or the transformation (-50 Elo per doubling of time) off?

If this is anywhere close to accurate, it would suggest Rybka could be competitive with top GMs when given only 5 1/2 minutes for the whole game.
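A minimal Python sketch of the rule of thumb in the question, assuming a constant Elo change per doubling of thinking time and an opponent whose time stays fixed. The function name elo_at_time is illustrative, and its defaults (3000 Elo at 180 minutes, 50 Elo per doubling) are simply the numbers from the table above:

```python
import math

def elo_at_time(minutes, base_elo=3000.0, base_minutes=180.0, elo_per_doubling=50.0):
    """Estimated rating when the engine gets `minutes` instead of `base_minutes`.

    Assumes a constant Elo change per doubling of thinking time and that the
    opponent's thinking time is not changed.
    """
    doublings = math.log2(minutes / base_minutes)  # negative when time is cut
    return base_elo + elo_per_doubling * doublings

if __name__ == "__main__":
    t = 180.0
    for _ in range(6):  # 180, 90, 45, 22.5, 11.25, 5.625 minutes
        print(f"{t:8.3f} min -> {elo_at_time(t):.0f}")
        t /= 2
```

Halving the time five times from 180 minutes gives 5.625 minutes, which is the "about 5.5" row.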




J
tiger
Posts: 819
Joined: Sat Mar 11, 2006 3:15 am
Location: Guadeloupe (french caribbean island)


Post by tiger »

ozziejoe wrote:I have heard that each time you double the processing speed of a computer, the engine gains approximately 50 Elo points. Can this idea be used to estimate the playing level of an engine handicapped on time?
[...]
Is an engine playing this strong at five minutes? Or is the starting value (3000) or the transformation (-50 Elo per doubling of time) off?


It is approximately correct, assuming:

- 60 to 70 Elo points is a better estimate, I think. The SSDF list can be used to compute this number, as they have tested a number of engines on different hardware platforms (for example K6-450, then Athlon 1200). Last time I did the math it was more like 70 Elo points for Chess Tiger (it may vary for other programs; I think the 60 to 80 Elo range is a reasonable guess).

- Naturally, you assume that the opponent's time is NOT reduced. If you also reduce the opponent's time it does not work: the Elo is almost unchanged when you reduce the thinking time of both opponents, because Elo is a measure of relative strength under equal conditions, not an absolute measure. When you reduce the time for both opponents, their relative strength generally stays the same until you reach the point where one of them goes mad trying to play the whole game in 10 seconds (and it happens to computers as well :) ).

- In order to get a real reduction in thinking time, you must force the engine to play without "permanent brain" (thinking on the opponent's time). So the initial rating of the engine must be corrected downward (maybe by 20 to 40 Elo points).

You can find more about this concept of reducing thinking time at my FAQ page at http://www.chesstiger.com with some formulas.


// Christophe
ozziejoe
Posts: 811
Joined: Wed Mar 08, 2006 10:07 pm


Post by ozziejoe »

Thanks Christophe,

I took the liberty of applying your formula to various speed changes:

new_elo = old_elo + 70 * log ( new_speed / old_speed ) / log ( 2 )

I used log base 10, if that is OK?
Assumptions:

- Start Elo of 3000 at 180 minutes, against the same opponent also taking 180 minutes
- 70-point change per doubling, estimated from Chess Tiger
- Opponent's time does not change


Time (minutes)  Elo
180             3000
90              2930
45              2860
20              2778.10525
10              2708.10525
5               2638.10525
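The table can be reproduced with a short Python sketch of the formula. Assumptions: thinking time is used as a stand-in for speed (halving the time is treated like halving the speed), and new_elo is a hypothetical helper name, not something taken from the FAQ page:

```python
import math

def new_elo(old_elo, new_speed, old_speed, elo_per_doubling=70.0):
    """new_elo = old_elo + 70 * log(new_speed / old_speed) / log(2)."""
    return old_elo + elo_per_doubling * math.log(new_speed / old_speed) / math.log(2)

# Thinking time in minutes plays the role of "speed"; 180 minutes is the reference.
for minutes in (180, 90, 45, 20, 10, 5):
    print(minutes, round(new_elo(3000, minutes, 180), 5))
```

The 20, 10 and 5 minute rows are not exact halvings of 180, which is why they come out with fractional Elo values.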



So, at a 20-minute time control, a 3000-rated program will be playing at approximately the same level as a super-GM at a 180-minute time control.
gerold
Posts: 10121
Joined: Thu Mar 09, 2006 12:57 am
Location: van buren,missouri


Post by gerold »

ozziejoe wrote:[...] So, at a 20-minute time control, a 3000-rated program will be playing at approximately the same level as a super-GM at a 180-minute time control.

Joseph, I don't know if you can compare program Elo with human Elo.

Good luck,

Gerold.
tiger
Posts: 819
Joined: Sat Mar 11, 2006 3:15 am
Location: Guadeloupe (french caribbean island)


Post by tiger »

ozziejoe wrote:[...]
new_elo = old_elo + 70 * log ( new_speed / old_speed ) / log ( 2 )

I used log base 10, if that is OK?
[...]
So, at a 20-minute time control, a 3000-rated program will be playing at approximately the same level as a super-GM at a 180-minute time control.

Using any base for the logarithm is OK, since the formula divides one log by another. So using log10 is fine, and you would get the same result using the natural (Napierian) logarithm (base e).

I have not checked your numbers, but the principle is OK.

Your conclusion about the comparison with a super-GM is probably right. The difficulty is that we do not know for sure the "human" Elo of the strongest chess programs. By this I mean that we do not know what rating the strongest programs would get if they played in all the human chess tournaments and were rated by the same rules as human players. That is the biggest unknown.
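A quick Python check of the base-independence point, assuming the 70-points-per-doubling formula and using the 20-minute row as an example. Since log_b(x) / log_b(2) equals log2(x) for any base b, all three expressions should agree up to floating-point rounding:

```python
import math

ratio = 20 / 180      # new time (or speed) divided by old
per_doubling = 70.0   # Elo per doubling, as in the formula

# The same quotient whatever the base of the logarithm.
via_log10 = per_doubling * math.log10(ratio) / math.log10(2)
via_ln = per_doubling * math.log(ratio) / math.log(2)
via_log2 = per_doubling * math.log2(ratio)

print(via_log10, via_ln, via_log2)
```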


// Christophe
Uri Blass
Posts: 10281
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel


Post by Uri Blass »

ozziejoe wrote:[...]
new_elo = old_elo + 70 * log ( new_speed / old_speed ) / log ( 2 )
[...]
So, at a 20-minute time control, a 3000-rated program will be playing at approximately the same level as a super-GM at a 180-minute time control.

I think that you will find diminishing returns if the programs use their own opening books, because at longer time controls more games are decided by the opening.

I am less sure about diminishing returns when a generic short book is used and no games are decided by the book (assuming the sides have enough time).

I looked at the CCRL 40/4 list:

http://computerchess.org.uk/ccrl/404.li ... _length=30

Here are some results (I remember that 64-bit relative to 32-bit is a 60-70% speed improvement).

The results seem consistent with the estimate of a 60-70 Elo improvement from doubling the speed.

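Columns: engine, rating, error margins (+/−), score, average opponent (relative to the engine), draws, games

Rybka 2.2 64-bit 3015 +24 −24 73.6% −168.1 30.1% 670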
Rybka 2.2 32-bit 2992 +14 −13 73.8% −174.3 28.8% 2202

difference 23 elo

Rybka 2.1o 64-bit 3020 +26 −26 74.9% −179.6 28.6% 574
Rybka 2.1o 32-bit 2959 +14 −14 68.3% −131.6 31.4% 1947

difference 61 elo

Rybka 2.1c 64-bit 2CPU 3019 +23 −23 74.9% −181.9 27.5% 767
Rybka 2.1c 32-bit 2CPU 2974 +31 −30 75.3% −186.3 27.1% 432

difference 45 elo

Rybka 2.1c 64-bit 3006 +28 −27 74.7% −177.6 27.7% 513
Rybka 2.1c 32-bit 2951 +14 −14 67.2% −123.9 30.9% 1810

difference 55 elo

Rybka 1.1 64-bit 2982 +41 −39 79.5% −225.3 22.7% 273
Rybka 1.1 32-bit 2941 +21 −20 66.7% −115.1 35.2% 819

difference 41 elo

Rybka 1.0 Beta 64-bit 2938 +23 −22 77.1% −198.0 27.9% 800
Rybka 1.0 Beta 32-bit 2894 +16 −16 67.6% −125.6 29.7% 1523

difference 44 elo
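A Python sketch of the arithmetic behind this reading of the list, assuming the 64-bit builds are a factor 1.6 to 1.7 faster (the 60-70% recollection above) and that Elo grows linearly with log(speed). The ratings are copied from the rows above; pairs, diffs and mean_diff are illustrative names:

```python
import math

# (64-bit rating, 32-bit rating) pairs from the CCRL 40/4 rows above.
pairs = {
    "Rybka 2.2": (3015, 2992),
    "Rybka 2.1o": (3020, 2959),
    "Rybka 2.1c 2CPU": (3019, 2974),
    "Rybka 2.1c": (3006, 2951),
    "Rybka 1.1": (2982, 2941),
    "Rybka 1.0 Beta": (2938, 2894),
}

diffs = {name: r64 - r32 for name, (r64, r32) in pairs.items()}
mean_diff = sum(diffs.values()) / len(diffs)

# Assumed 64-bit speedup of 60-70%, i.e. a speed factor of 1.6 to 1.7.
for speedup in (1.6, 1.7):
    per_doubling = mean_diff / math.log2(speedup)
    print(f"speedup x{speedup}: about {per_doubling:.0f} Elo per doubling")
```

Under those assumptions the average difference of about 45 Elo works out to roughly 59 to 66 Elo per doubling. Each individual rating carries an error margin of about ±13 to ±41, so the average is more informative than any single pair.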


Uri
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller


Post by hgm »

I would expect the ELO deterioration with search time to depend on the branching factor of the engine: the lower the branching factor, the more plies you lose for a given time reduction. And the ELO difference between two versions of the same engine differing in search depth by one ply seems to be constant.

I once measured this for micro-Max 2.3 (which did not have null-move pruning), and it seems to gain 80 ELO points (a 66% score) for every factor-3 increase in search time. Recalculating this for a factor 2 (log(3)/log(2) = 0.48/0.30 = 1.6) would indeed give 50 ELO per factor 2.
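The rescaling from a factor 3 to a factor 2 can be written as a one-line Python helper, assuming Elo is linear in log(search time); elo_per_doubling is an illustrative name:

```python
import math

def elo_per_doubling(elo_gain, time_factor):
    """Rescale an Elo gain measured for `time_factor` more time to a factor of 2.

    Assumes Elo grows linearly with log(search time).
    """
    return elo_gain * math.log(2) / math.log(time_factor)

print(f"{elo_per_doubling(80, 3):.1f}")  # micro-Max figure: 80 Elo per factor 3
```

With the micro-Max numbers this gives about 50.5, matching the 80 / 1.6 estimate.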

But with null-move pruning the branching factor will be lower, so the engine will become more sensitive. This explains the higher number Christophe finds.
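A Python sketch of the branching-factor argument, assuming each extra ply multiplies the search time by the effective branching factor (EBF), so cutting the time by a factor f costs log(f)/log(EBF) plies. The EBF values in the loop are hypothetical, chosen only to show the trend:

```python
import math

def plies_lost(time_factor, branching_factor):
    """Plies of depth lost when search time shrinks by `time_factor`.

    Assumes each additional ply multiplies search time by the effective
    branching factor.
    """
    return math.log(time_factor) / math.log(branching_factor)

for ebf in (6.0, 3.0, 2.0):  # hypothetical effective branching factors
    print(f"EBF {ebf}: halving the time costs {plies_lost(2, ebf):.2f} ply")
```

If each ply is worth a roughly constant number of Elo, as suggested above, multiplying plies_lost(2, ebf) by that number gives the Elo lost per halving of time, which grows as the EBF shrinks.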

In fact, this observation could be used to define the 'effective ply depth' for any scheme of prunings/reductions/extensions: measure in time-odds matches how the strength and the reported depth depend on the search time, calculate how many ELO per reported ply that is, and then translate the ELO difference into 'effective plies' based on a similar measurement with a fixed-depth search. Then you will know how the reported plies correspond to real plies.
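The procedure above can be sketched in Python as follows. All three inputs are measurements you would have to make yourself; the function name and the numbers in the usage line are hypothetical:

```python
def effective_plies_per_reported_ply(elo_gain, reported_depth_gain, elo_per_real_ply):
    """Ratio of 'real' (fixed-depth-equivalent) plies to reported plies.

    elo_gain: Elo gained in a time-odds match (e.g. 2x time against 1x time).
    reported_depth_gain: average increase in reported depth from that extra time.
    elo_per_real_ply: Elo per extra ply, measured in fixed-depth matches.
    """
    elo_per_reported_ply = elo_gain / reported_depth_gain
    return elo_per_reported_ply / elo_per_real_ply

# Hypothetical numbers, for illustration only.
print(effective_plies_per_reported_ply(70.0, 1.0, 100.0))
```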
Uri Blass
Posts: 10281
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel


Post by Uri Blass »

hgm wrote:[...] In fact, this observation could be used to define the 'effective ply depth' for any scheme of prunings/reductions/extensions: measure in time-odds matches how the strength and the reported depth depend on the search time, calculate how many ELO per reported ply that is, and then translate the ELO difference into 'effective plies' based on a similar measurement with a fixed-depth search. Then you will know how the reported plies correspond to real plies.

I think that the rating difference may also depend on the evaluation, so you cannot define the effective ply depth for any scheme of prunings/reductions/extensions.

Uri
tiger
Posts: 819
Joined: Sat Mar 11, 2006 3:15 am
Location: Guadeloupe (french caribbean island)


Post by tiger »

hgm wrote:[...] I once measured this for micro-Max 2.3 (which did not have null-move pruning), and it seems to gain 80 ELO points (a 66% score) for every factor-3 increase in search time. [...]

A 66% winning percentage translates to a 112 Elo-point difference, not 80.
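For reference, a Python sketch of the logistic expected-score formula E = 1 / (1 + 10^(-D/400)) solved for the rating difference D. This is one common choice of curve; other curves give somewhat different numbers, and elo_diff_logistic is an illustrative name:

```python
import math

def elo_diff_logistic(score):
    """Rating difference D implied by an expected score under the logistic curve."""
    return -400.0 * math.log10(1.0 / score - 1.0)

print(round(elo_diff_logistic(0.66)))
```

For a 66% score this gives roughly 115 Elo.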


// Christophe
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller


Post by hgm »

Hmm, I was using this table:

http://gobase.org/studying/articles/elo/

Should I multiply by a factor sqrt(2), because 200 is the standard deviation per player, and what you measure is the difference in strength of two players?
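A Python sketch of the two readings of a normal-distribution table. It assumes the table applies σ = 200 directly to the rating difference, whereas Elo's original normal model (as usually described) gives each player's performance σ = 200, hence σ = 200·√2 ≈ 283 for the difference of two players. It uses statistics.NormalDist from the Python standard library (3.8+); elo_diff_normal is an illustrative name:

```python
import math
from statistics import NormalDist

def elo_diff_normal(score, sigma):
    """Rating difference D such that the normal CDF of D / sigma equals `score`."""
    return sigma * NormalDist().inv_cdf(score)

per_player = 200.0
print(f"{elo_diff_normal(0.66, per_player):.1f}")                 # sigma = 200
print(f"{elo_diff_normal(0.66, per_player * math.sqrt(2)):.1f}")  # sigma = 200*sqrt(2)
```

For a 66% score this gives about 82 with σ = 200 and about 117 with σ = 200·√2, the latter in the same range as the 112 quoted above.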