Elo Increase per Doubling

Don · Post by **Don** » Wed May 09, 2012 12:52 am

Laskos wrote:
Adam Hair wrote:
For the short time control, the base time was 6 seconds + 0.1 seconds per move.
I mean, what interface are you using to assign asymmetrical time controls? LittleBlitzer can't do this. Or, how does Don assign fixed but different nodes and depths? I managed to get fixed, different depths only in Fritz and Aquarium.

Kai

I'm using my own non-graphical java based automated tester. The time control is part of the identity of the player, so I can even mix types of time controls in the same test.

Adam Hair · Post by **Adam Hair** » Wed May 09, 2012 1:14 am

Laskos wrote:
Adam Hair wrote:
For the short time control, the base time was 6 seconds + 0.1 seconds per move.
I mean, what interface are you using to assign asymmetrical time controls? LittleBlitzer can't do this. Or, how does Don assign fixed but different nodes and depths? I managed to get fixed, different depths only in Fritz and Aquarium.

Kai

For asymmetric time controls, I use Winboard.

For fixed nodes and depths, I suppose I could use Winboard. However, I have been using Arena and InBetween.

Adam Hair · Post by **Adam Hair** » Wed May 09, 2012 1:26 am

Don wrote:
Adam Hair wrote:
Don wrote:
petero2 wrote:
Don wrote:
JuLieN wrote:@Don

What is the value of the asymptote to your red curve? (This is of course the maximal Elo in your system).
Is that a joke? I wish I knew where it all ended

I think if we gathered this data for several more levels and got enough data for each point we could try to estimate the value using some type of curve fitting.

If someone wants to take a crack at estimating this with curve fitting, here is the final result:
Rank Name    Elo      +      -    games   score   oppo.   draws 
   1 10    3127.9   25.4   25.4    2739   94.2%  2328.9   10.4% 
   2 09    2997.0   21.6   21.6    2739   86.8%  2341.9   15.2% 
   3 08    2891.6   20.4   20.4    2740   79.9%  2352.8   15.5% 
   4 07    2752.0   19.9   19.9    2740   70.2%  2366.7   15.8% 
   5 06    2615.3   20.2   20.2    2740   60.8%  2380.4   14.5% 
   6 05    2467.2   20.9   20.9    2740   51.2%  2395.2   13.0% 
   7 04    2307.4   22.1   22.1    2740   41.4%  2411.2   10.5% 
   8 03    2118.3   23.5   23.5    2740   30.6%  2430.1    8.6% 
   9 02    1932.7   25.3   25.3    2740   20.9%  2448.6    7.8% 
  10 01    1709.7   28.6   28.6    2740   10.9%  2470.9    5.3% 
  11 00    1500.0   36.9   36.9    2740    3.2%  2491.9    3.1% 
After some trial and error: Elo=1500+2350*ln(1+Name/10)
It seems to be a really good fit, but I extrapolated it to level 1000 and it shows an ELO of over 12000!

If I go to 100000 it comes out to almost 18,000 ELO!

I don't believe perfect play goes that high. If I'm not mistaken you can achieve an arbitrarily high ELO with this formula - there is no asymptote so I guess it's just a good approximation at these low levels.

Do you have any other models that might impose some sort of limit?
Elo = 6028.21*e^(-12.48/(Name+8.95))

This equation fits your data well and is asymptotic to Elo = 6028.21.

Of course, 6028.21 Elo is only relevant to the Elo values assigned in your data. But, your engine, limited to only 2^18 nodes per move, seems to be ~2900 Elo away from perfection. If you let it search 2^1,000,000 nodes per move, Komodo will play virtually perfect chess (according to this model, its Elo would be 6028.13). 2^10,000 nodes/move would be within 8 Elo of perfection.
I find 4500 to be a much more believable figure than 6028 - but what do I know? It could be anything!

BTW I adjusted my data so that version 00 is calibrated to 1500 ELO - I think that is within about 100 or 200 ELO of the truth. Komodo is doing about 3.18 ply on average. At level 02, Komodo would be about 433 ELO stronger or 1933 doing on average 5.37 ply.

What would be interesting is to run the same experiment with a different program using the same basic formula (with adjusted constants) and see if they agree!

I think that it is probably closer to 4500 than 6000 also. But you are right; it could be any number.

I started a fixed node test this morning with Gaviota. I could start over using 512 nodes as the base.
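
To see how differently the two formulas quoted above extrapolate, here is a minimal Python sketch. The function names `log_model` and `saturating_model` are made up for illustration; the constants are the ones posted in this thread, and `level` plays the role of `Name` (level 0 being the 512-node base).

```python
import math


def log_model(level):
    """Logarithmic fit from the thread; it grows without bound."""
    return 1500 + 2350 * math.log(1 + level / 10)


def saturating_model(level):
    """Adam Hair's fit; it approaches 6028.21 as level goes to infinity."""
    return 6028.21 * math.exp(-12.48 / (level + 8.95))


if __name__ == "__main__":
    # Levels 0..10 are inside the measured range; the rest is extrapolation.
    for level in (0, 5, 10, 100, 1000, 10000):
        print(f"level {level:>6}: log {log_model(level):8.1f}   "
              f"saturating {saturating_model(level):8.1f}")
```

Both formulas stay within a few tens of Elo of the measured ratings over levels 0 to 10 and only part ways far outside that range, which is why the data alone cannot pick between them.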

Laskos · Post by **Laskos** » Wed May 09, 2012 2:20 am

Adam Hair wrote:
Laskos wrote:
Adam Hair wrote:
For the short time control, the base time was 6 seconds + 0.1 seconds per move.
I mean, what interface are you using to assign asymmetrical time controls? LittleBlitzer can't do this. Or, how does Don assign fixed but different nodes and depths? I managed to get fixed, different depths only in Fritz and Aquarium.

Kai
For asymmetric time controls, I use Winboard.

For fixed nodes and depths, I suppose I could use Winboard. However, I have been using Arena and InBetween.

If I understood correctly, you are using Arena with InBetween, I don't see another way with Arena. Could you post the InBetween.ini files for Arena with the cases: double number of nodes, 1 ply more and double the time? Also, maybe you will show what command-line calls one has to make (for example, with Houdini under Arena).

Thanks,

Kai

Adam Hair · Post by **Adam Hair** » Wed May 09, 2012 4:44 am

Laskos wrote:
Adam Hair wrote:
Laskos wrote:
Adam Hair wrote:
For the short time control, the base time was 6 seconds + 0.1 seconds per move.
I mean, what interface are you using to assign asymmetrical time controls? LittleBlitzer can't do this. Or, how does Don assign fixed but different nodes and depths? I managed to get fixed, different depths only in Fritz and Aquarium.

Kai
For asymmetric time controls, I use Winboard.

For fixed nodes and depths, I suppose I could use Winboard. However, I have been using Arena and InBetween.
If I understood correctly, you are using Arena with InBetween, I don't see another way with Arena. Could you post the InBetween.ini files for Arena with the cases: double number of nodes, 1 ply more and double the time? Also, maybe you will show what command-line calls one has to make (for example, with Houdini under Arena).

Thanks,

Kai

I will assume you know how to set up fixed depth and fixed nodes tournaments in Arena. For ply testing, set up a folder for each number of plies you plan to test. Copy the engine you are using into each folder along with InBetween.exe and InBetween.ini. Change the name of InBetween.exe to represent the engine and number of plies and then install it in Arena as if it were an engine. Modify the following line in InBetween.ini:

; clientword := serverword

to

sd x := sd y (Winboard engine)
or
go depth x := go depth y (UCI engine)

where x equals the fixed depth Arena is set to send and y equals the depth you want that copy of the engine to play. It is similar for nodes testing.

For time odds, I use Winboard. If you use the latest Winboard with the built-in tournament manager, then set up the tournament with your base time control. Then, you have to modify winboard.ini (found at C:\Documents and Settings\Administrator\Application Data in Windows XP). For each engine in your tournament, add -firstTimeOdds=x, so that 1/x times the base time control equals the time control that particular engine will play with. If you use PSWBTM as the tournament manager, then you have to add /%sTimeOdds=x (where x is the same as above) for each engine's parameter in the PSWBTM Engine Manager.
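
As a quick check on the time-odds arithmetic, here is a small Python sketch. It assumes the odds factor x divides both the base time and the increment, which is an assumption worth verifying against your WinBoard version; `time_odds_control` is a hypothetical helper, not part of WinBoard or PSWBTM.

```python
def time_odds_control(base_seconds, increment_seconds, odds):
    """Return the (base, increment) an engine plays with under time odds.

    ASSUMPTION: the odds factor scales both the base time and the increment.
    """
    if odds <= 0:
        raise ValueError("odds factor must be positive")
    return base_seconds / odds, increment_seconds / odds


if __name__ == "__main__":
    # The short time control mentioned in this thread: 6 s + 0.1 s per move.
    for odds in (1, 2, 4):
        base, inc = time_odds_control(6.0, 0.1, odds)
        print(f"odds {odds}: {base} s + {inc} s per move")
```

With x = 2, the engine gets half the time of an opponent playing at the base control, which corresponds to one halving of thinking time.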

Don · Post by **Don** » Wed May 09, 2012 4:50 am

Adam Hair wrote:
Don wrote:
Adam Hair wrote:
Don wrote:
petero2 wrote:
Don wrote:
JuLieN wrote:@Don

What is the value of the asymptote to your red curve? (This is of course the maximal Elo in your system).
Is that a joke? I wish I knew where it all ended

I think if we gathered this data for several more levels and got enough data for each point we could try to estimate the value using some type of curve fitting.

If someone wants to take a crack at estimating this with curve fitting, here is the final result:
Rank Name    Elo      +      -    games   score   oppo.   draws 
   1 10    3127.9   25.4   25.4    2739   94.2%  2328.9   10.4% 
   2 09    2997.0   21.6   21.6    2739   86.8%  2341.9   15.2% 
   3 08    2891.6   20.4   20.4    2740   79.9%  2352.8   15.5% 
   4 07    2752.0   19.9   19.9    2740   70.2%  2366.7   15.8% 
   5 06    2615.3   20.2   20.2    2740   60.8%  2380.4   14.5% 
   6 05    2467.2   20.9   20.9    2740   51.2%  2395.2   13.0% 
   7 04    2307.4   22.1   22.1    2740   41.4%  2411.2   10.5% 
   8 03    2118.3   23.5   23.5    2740   30.6%  2430.1    8.6% 
   9 02    1932.7   25.3   25.3    2740   20.9%  2448.6    7.8% 
  10 01    1709.7   28.6   28.6    2740   10.9%  2470.9    5.3% 
  11 00    1500.0   36.9   36.9    2740    3.2%  2491.9    3.1% 
After some trial and error: Elo=1500+2350*ln(1+Name/10)
It seems to be a really good fit, but I extrapolated it to level 1000 and it shows an ELO of over 12000!

If I go to 100000 it comes out to almost 18,000 ELO!

I don't believe perfect play goes that high. If I'm not mistaken you can achieve an arbitrarily high ELO with this formula - there is no asymptote so I guess it's just a good approximation at these low levels.

Do you have any other models that might impose some sort of limit?
Elo = 6028.21*e^(-12.48/(Name+8.95))

This equation fits your data well and is asymptotic to Elo = 6028.21.

Of course, 6028.21 Elo is only relevant to the Elo values assigned in your data. But, your engine, limited to only 2^18 nodes per move, seems to be ~2900 Elo away from perfection. If you let it search 2^1,000,000 nodes per move, Komodo will play virtually perfect chess (according to this model, its Elo would be 6028.13). 2^10,000 nodes/move would be within 8 Elo of perfection.
I find 4500 to be a much more believable figure than 6028 - but what do I know? It could be anything!

BTW I adjusted my data so that version 00 is calibrated to 1500 ELO - I think that is within about 100 or 200 ELO of the truth. Komodo is doing about 3.18 ply on average. At level 02, Komodo would be about 433 ELO stronger or 1933 doing on average 5.37 ply.

What would be interesting is to run the same experiment with a different program using the same basic formula (with adjusted constants) and see if they agree!
I think that it is probably closer to 4500 than 6000 also. But you are right; it could be any number.

I started a fixed node test this morning with Gaviota. I could start over using 512 nodes as the base.

Yes, why don't we both do the same basic test and see what kind of numbers we get using the basic formula of Peter Österlund (but we can tune the constants to fit the curve.) I'm going to extend at least 2 more levels beyond what I started with.

Don

hgm · Post by **hgm** » Wed May 09, 2012 10:05 am

Adam Hair wrote:For time odds, I use Winboard. If you use the latest Winboard with the built-in tournament manager, then set up the tournament with your base time control. Then, you have to modify winboard.ini (found at C:\Documents and Settings\Administrator\Application Data in Windows XP). For each engine in your tournament, add -firstTimeOdds=x, so that 1/x times the base time control equals the time control that particular engine will play with. If you use PSWBTM as the tournament manager, then you have to add /%sTimeOdds=x (where x is the same as above) for each engine's parameter in the PSWBTM Engine Manager.

Note that it is also easy to do 'depth odds' in WinBoard. To limit an engine to play at a given depth D, without affecting the depth of its opponent, you can add to its install line:

-firstInitString="new\nrandom\nsd D\n"

i.e. append an sd D line to the usual "new\nrandom\n". This will then be automatically sent to the engine at the beginning of every game. The same method can be used to set a parameter in only a single engine, where WinBoard does not provide separate controls for each engine. E.g. 'core odds', where you want to fix the number of CPUs on a per-engine basis, rather than having the engine use the GUI settings for this, can be achieved by

-firstInitString="cores 4\nnew\nrandom\n"

In this case the cores 4 has to precede the new, as the cores command is typically sent before new (to spare engines the agony of having to change number of threads during a game or search). You could do the same for 'hash-odds games', where the engines use a different hash setting; just prefix a memory M command with the desired memory size M to the init string.
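
If you generate install lines for many engines, the ordering rule above can be captured in a few lines of Python. This is only a string-building sketch: `first_init_string` is a hypothetical helper, and the only WinBoard facts it relies on are the ones stated in this post (the default "new\nrandom\n" string, with commands such as cores or memory going before new and sd going after).

```python
def first_init_string(pre_new=(), post_new=()):
    """Build a -firstInitString option in the form shown in the post.

    pre_new:  commands that must precede "new" (e.g. "cores 4", "memory 64")
    post_new: commands appended after "new" and "random" (e.g. "sd 6")
    """
    commands = list(pre_new) + ["new", "random"] + list(post_new)
    # Each command is terminated by a literal backslash-n, as WinBoard expects
    # the escape sequence inside the quoted option value.
    body = "".join(cmd + "\\n" for cmd in commands)
    return f'-firstInitString="{body}"'


if __name__ == "__main__":
    print(first_init_string(post_new=["sd 6"]))      # depth odds
    print(first_init_string(pre_new=["cores 4"]))    # core odds
```

Keeping the pre-new and post-new commands in separate arguments makes it harder to put cores or memory in the wrong place by accident.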

Don · Post by **Don** » Wed May 09, 2012 6:26 pm

petero2 wrote:Yes, the logarithmic formula goes to infinity, so not a good approximation for large x.

Here is an exponential fit instead: Elo=1495+2996*(1-exp(-Name/12.8))

I've added 2 more levels to the same test and I'm going to get a larger sample of games. This is interesting, so even though I don't know if it's a very accurate way to estimate the highest rating it's a lot of fun to try.

I wrote a quick program to find the 3 constants in this formula that minimize the "least squares" error. With the new data points I get this:

Elo = 1488.20 + 2692.70 * (1 - exp( -level / 11.09 ))

And the maximum achievable ELO is ... (drum roll please) 4180! I don't have large samples at these high levels and I will continue to run the test for at least a couple more days or longer - and recheck the estimate.

It may be interesting to run the same test on a different program and see if the estimate is in the same ballpark. Preferably a program that has a solid fixed nodes testing level. This would make the "guess" more believable if they agreed.

Don
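
A least-squares fit of this three-constant formula can be sketched in Python as below. This is not Don's program (which was not posted); it is one possible approach under stated assumptions. It uses only the 11 data points from the table earlier in the thread, not the two extra levels, so it should land nearer petero2's constants than Don's. For each trial time constant c it solves the offset a and scale b in closed form, then runs a golden-section search over c. The names `LEVELS`, `ELO`, `solve_linear`, and `fit_exponential` are illustrative.

```python
import math

# Levels 00..10 and their ratings, copied from the table in this thread.
LEVELS = list(range(11))
ELO = [1500.0, 1709.7, 1932.7, 2118.3, 2307.4, 2467.2,
       2615.3, 2752.0, 2891.6, 2997.0, 3127.9]


def solve_linear(c):
    """For fixed c, fit Elo = a + b*(1 - exp(-level/c)) by ordinary least squares."""
    f = [1.0 - math.exp(-x / c) for x in LEVELS]
    mean_f = sum(f) / len(f)
    mean_y = sum(ELO) / len(ELO)
    b = (sum((fi - mean_f) * (yi - mean_y) for fi, yi in zip(f, ELO))
         / sum((fi - mean_f) ** 2 for fi in f))
    a = mean_y - b * mean_f
    sse = sum((a + b * fi - yi) ** 2 for fi, yi in zip(f, ELO))
    return a, b, sse


def fit_exponential(lo=4.0, hi=40.0, iterations=60):
    """Golden-section search over c; assumes the error has one minimum in [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(iterations):
        m1 = hi - inv_phi * (hi - lo)
        m2 = lo + inv_phi * (hi - lo)
        if solve_linear(m1)[2] < solve_linear(m2)[2]:
            hi = m2
        else:
            lo = m1
    c = (lo + hi) / 2
    a, b, _ = solve_linear(c)
    return a, b, c


if __name__ == "__main__":
    a, b, c = fit_exponential()
    print(f"Elo = {a:.2f} + {b:.2f} * (1 - exp(-level / {c:.2f}))")
    print(f"asymptote = {a + b:.0f}")
```

The asymptote is simply a + b, the value the formula approaches as the level grows, so small shifts in c move the estimated maximum by a lot.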

petero2 · Post by **petero2** » Wed May 09, 2012 6:52 pm

Don wrote:
petero2 wrote:Yes, the logarithmic formula goes to infinity, so not a good approximation for large x.

Here is an exponential fit instead: Elo=1495+2996*(1-exp(-Name/12.8))

I've added 2 more levels to the same test and I'm going to get a larger sample of games. This is interesting, so even though I don't know if it's a very accurate way to estimate the highest rating it's a lot of fun to try.

I wrote a quick program to find the 3 constants in this formula that minimize the "least squares" error. With the new data points I get this:

Elo = 1488.20 + 2692.70 * (1 - exp( -level / 11.09 ))

And the maximum achievable ELO is ... (drum roll please) 4180! I don't have large samples at these high levels and I will continue to run the test for at least a couple more days or longer - and recheck the estimate.

It may be interesting to run the same test on a different program and see if the estimate is in the same ballpark. Preferably a program that has a solid fixed nodes testing level. This would make the "guess" more believable if they agreed.

Don

My parameters were also computed by minimizing the least squares error. I happened to have an old octave implementation of the Gauss-Newton method that I used. To see how much the estimate is affected by measurement errors, I added normally distributed noise with standard deviation 10 to the rating values and computed the corresponding maximum ELO. I repeated this 100000 times and made a histogram:

Average value: 4497
Standard deviation: 116

Although, I believe an even bigger error source is the fact that the true rating curve is most likely not an exponential function, so the large extrapolation is probably unsound.
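
The noise experiment described above can be approximated with a short Python sketch. It is not the Octave Gauss-Newton code mentioned in the post: as a stand-in it fits a + b*(1 - exp(-level/c)) by solving a and b in closed form and searching over c, it uses the 11-point table from earlier in the thread, and it runs far fewer repetitions than 100000 by default. The names `fit_asymptote` and `noise_study` are hypothetical.

```python
import math
import random
import statistics

LEVELS = list(range(11))
ELO = [1500.0, 1709.7, 1932.7, 2118.3, 2307.4, 2467.2,
       2615.3, 2752.0, 2891.6, 2997.0, 3127.9]


def fit_asymptote(ratings):
    """Least-squares fit of a + b*(1 - exp(-level/c)); returns the asymptote a + b."""
    def solve(c):
        f = [1.0 - math.exp(-x / c) for x in LEVELS]
        mean_f = sum(f) / len(f)
        mean_y = sum(ratings) / len(ratings)
        b = (sum((fi - mean_f) * (yi - mean_y) for fi, yi in zip(f, ratings))
             / sum((fi - mean_f) ** 2 for fi in f))
        a = mean_y - b * mean_f
        sse = sum((a + b * fi - yi) ** 2 for fi, yi in zip(f, ratings))
        return sse, a + b

    # Golden-section search over c, assuming a single minimum in [4, 40].
    lo, hi = 4.0, 40.0
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(40):
        m1 = hi - inv_phi * (hi - lo)
        m2 = lo + inv_phi * (hi - lo)
        if solve(m1)[0] < solve(m2)[0]:
            hi = m2
        else:
            lo = m1
    return solve((lo + hi) / 2)[1]


def noise_study(trials=1000, noise_sd=10.0, seed=0):
    """Refit after adding Gaussian noise to every rating; return (mean, stdev)."""
    rng = random.Random(seed)
    results = [fit_asymptote([y + rng.gauss(0.0, noise_sd) for y in ELO])
               for _ in range(trials)]
    return statistics.mean(results), statistics.stdev(results)


if __name__ == "__main__":
    mean, sd = noise_study()
    print(f"average asymptote: {mean:.0f}, standard deviation: {sd:.0f}")
```

This only measures sensitivity to rating noise under the exponential model; it says nothing about the larger concern raised in the post, that the true curve may not be exponential at all.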

Don · Post by **Don** » Wed May 09, 2012 7:00 pm

Don wrote:
petero2 wrote:Yes, the logarithmic formula goes to infinity, so not a good approximation for large x.

Here is an exponential fit instead: Elo=1495+2996*(1-exp(-Name/12.8))

I've added 2 more levels to the same test and I'm going to get a larger sample of games. This is interesting, so even though I don't know if it's a very accurate way to estimate the highest rating it's a lot of fun to try.

I wrote a quick program to find the 3 constants in this formula that minimize the "least squares" error. With the new data points I get this:

Elo = 1488.20 + 2692.70 * (1 - exp( -level / 11.09 ))

And the maximum achievable ELO is ... (drum roll please) 4180! I don't have large samples at these high levels and I will continue to run the test for at least a couple more days or longer - and recheck the estimate.

It may be interesting to run the same test on a different program and see if the estimate is in the same ballpark. Preferably a program that has a solid fixed nodes testing level. This would make the "guess" more believable if they agreed.

Don

P.S. I also did a run where I used the same numbers but reduced the ELO levels of the highest 3 entries by 10, 20 and 30 ELO respectively to give a pessimistic answer. Even with this reduction we get the asymptote of 4052.

There are all sorts of problems with this type of estimate - ELO is just an approximate model and it's not even transitive - self play exaggerating the ELO is another potential issue. I would like to believe that playing hundreds of ELO down cancels the self-play effect and I think it does but I cannot quantify any of this. Style of play can affect your ELO too but I think that falls under the category of trying to measure ELO as if it were transitive when it's not.

I cannot quite reconcile this data because:

1. Projected ELO for Komodo at level 12 (2 million nodes) is 3250.
2. I don't believe Komodo at 40/2 hours against a human is 3250 ELO looking at 2 million nodes.

Of course point 2 is just my opinion. At 2 million nodes Komodo is looking about 16.4 depth which for Komodo is a lot (Komodo is slow at a given depth but stronger compared to other top programs.)

Shredder is assumed to be 2800 on most tests - presumably that is an estimate of Shredder's level of play against humans at 40/2 hours. My estimate of level 00 (512 nodes) is based on the same reasoning, that Komodo looking at 512 nodes and playing against a human at 40/2 hour time control would play about 1500 ELO FIDE strength. I don't think this is off by more than 100 ELO. So the understanding is that playing against a human at 40/2 hours a perfect player would play at the 4000+ level - and obviously the human would draw occasionally or else the rating would be infinite.
