Adam Hair wrote:
For the short time control, the base time was 6 seconds + 0.1 seconds per move.
Laskos wrote:
I mean, what interface are you using to assign asymmetrical time controls? LittleBlitzer can't do this. Or, how does Don assign fixed, but different nodes and depths? I managed to get fixed, different depths only in Fritz and Aquarium.
Kai
I'm using my own non-graphical Java-based automated tester. The time control is part of the identity of the player, so I can even mix types of time controls in the same test.
Elo Increase per Doubling
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Elo Increase per Doubling
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
-
- Posts: 3226
- Joined: Wed May 06, 2009 10:31 pm
- Location: Fuquay-Varina, North Carolina
Re: Elo Increase per Doubling
Laskos wrote:
I mean, what interface are you using to assign asymmetrical time controls? LittleBlitzer can't do this. Or, how does Don assign fixed, but different nodes and depths? I managed to get fixed, different depths only in Fritz and Aquarium.
Kai
For asymmetric time controls, I use Winboard.
For fixed nodes and depths, I suppose I could use Winboard. However, I have been using Arena and InBetween.
-
- Posts: 3226
- Joined: Wed May 06, 2009 10:31 pm
- Location: Fuquay-Varina, North Carolina
Re: Elo Increase per Doubling
I think that it is probably closer to 4500 than 6000 also. But you are right; it could be any number.
Don wrote: I find 4500 to be a much more believable figure than 6028 - but what do I know? It could be anything!
Adam Hair wrote:
Don wrote: It seems to be a really good fit, but I extrapolated it to level 1000 and it shows an ELO of over 12000!
petero2 wrote: After some trial and error: Elo=1500+2350*ln(1+Name/10)
Don wrote: Is that a joke? I wish I knew where it all ended
JuLieN wrote: @Don
What is the value of the asymptote to your red curve? (This is of course the maximal Elo in your system).
I think if we gathered this data for several more levels and got enough data for each point we could try to estimate the value using some type of curve fitting.
If someone wants to take a crack at estimating this with curve fitting, here is the final result:
Rank  Name     Elo     +     -  games  score   oppo.  draws
   1    10  3127.9  25.4  25.4   2739  94.2%  2328.9  10.4%
   2    09  2997.0  21.6  21.6   2739  86.8%  2341.9  15.2%
   3    08  2891.6  20.4  20.4   2740  79.9%  2352.8  15.5%
   4    07  2752.0  19.9  19.9   2740  70.2%  2366.7  15.8%
   5    06  2615.3  20.2  20.2   2740  60.8%  2380.4  14.5%
   6    05  2467.2  20.9  20.9   2740  51.2%  2395.2  13.0%
   7    04  2307.4  22.1  22.1   2740  41.4%  2411.2  10.5%
   8    03  2118.3  23.5  23.5   2740  30.6%  2430.1   8.6%
   9    02  1932.7  25.3  25.3   2740  20.9%  2448.6   7.8%
  10    01  1709.7  28.6  28.6   2740  10.9%  2470.9   5.3%
  11    00  1500.0  36.9  36.9   2740   3.2%  2491.9   3.1%
If I go to 100000 it comes out to almost 18,000 ELO!
I don't believe perfect play goes that high. If I'm not mistaken you can achieve an arbitrarily high ELO with this formula - there is no asymptote so I guess it's just a good approximation at these low levels.
Do you have any other models that might impose some sort of limit?
Elo = 6028.21*e^(-12.48/(Name+8.95))
This equation fits your data well and is asymptotic to Elo = 6028.21.
Of course, 6028.21 Elo is only relevant to the Elo values assigned in your data. But, your engine, limited to only 2^18 nodes per move, seems to be ~2900 Elo away from perfection. If you let it search 2^1,000,000 nodes per move, Komodo will play virtually perfect chess (according to this model, its Elo would be 6028.13). 2^10,000 nodes/move would be within 8 Elo of perfection.
BTW I adjusted my data so that version 00 is calibrated to 1500 ELO - I think that is within about 100 or 200 ELO of the truth. Komodo is doing about 3.18 ply on average. At level 02, Komodo would be about 433 ELO stronger or 1933 doing on average 5.37 ply.
What would be interesting is to run the same experiment with a different program using the same basic formula (with adjusted constants) and see if they agree!
I started a fixed node test this morning with Gaviota. I could start over using 512 nodes as the base.
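To make the difference between the two quoted curves concrete, here is a small sketch that evaluates both at a few levels. The constants are the ones posted above; the function names are made up for illustration, and treating "Name" as the plain level number is an assumption.

```python
import math

def log_model(level):
    """petero2's logarithmic fit: keeps growing, no asymptote."""
    return 1500 + 2350 * math.log(1 + level / 10)

def bounded_model(level):
    """Adam Hair's fit: approaches 6028.21 as the level grows."""
    return 6028.21 * math.exp(-12.48 / (level + 8.95))

# A few measured rows from the rating table, for comparison.
MEASURED = {0: 1500.0, 5: 2467.2, 10: 3127.9}

for level in (0, 5, 10, 1000, 10_000, 1_000_000):
    note = f"  (measured {MEASURED[level]:.1f})" if level in MEASURED else ""
    print(f"level {level:>9}: log {log_model(level):9.1f}  "
          f"bounded {bounded_model(level):8.2f}{note}")
```

Both curves stay close to the measured values inside the tested range and only part ways in the extrapolation, which is why the data alone cannot decide between them.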
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Elo Increase per Doubling
Adam Hair wrote:
For asymmetric time controls, I use Winboard.
For fixed nodes and depths, I suppose I could use Winboard. However, I have been using Arena and InBetween.
If I understood correctly, you are using Arena with InBetween; I don't see another way with Arena. Could you post the InBetween.ini files for Arena for these cases: double the number of nodes, 1 ply more, and double the time? Also, maybe you could show what command-line calls one has to make (for example, with Houdini under Arena).
Thanks,
Kai
-
- Posts: 3226
- Joined: Wed May 06, 2009 10:31 pm
- Location: Fuquay-Varina, North Carolina
Re: Elo Increase per Doubling
Laskos wrote:
If I understood correctly, you are using Arena with InBetween; I don't see another way with Arena. Could you post the InBetween.ini files for Arena for these cases: double the number of nodes, 1 ply more, and double the time? Also, maybe you could show what command-line calls one has to make (for example, with Houdini under Arena).
Thanks,
Kai
I will assume you know how to set up fixed depth and fixed nodes tournaments in Arena. For ply testing, set up a folder for each number of plies you plan to test. Copy the engine you are using into each folder along with InBetween.exe and InBetween.ini. Change the name of InBetween.exe to represent the engine and number of plies and then install it in Arena as if it were an engine. Modify the following line in InBetween.ini:
; clientword := serverword
to
sd x := sd y (Winboard engine)
or
go depth x := go depth y (UCI engine)
where x equals the fixed depth Arena is set to send and y equals the depth you want that copy of the engine to play. It is similar for nodes testing.
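As a rough illustration of the per-folder edit described above, the following sketch prints the rewrite line each copy of InBetween.ini would get. Only the two line formats come from the post; the function name, the folder names, and the example depths are hypothetical.

```python
def inbetween_rule(protocol, gui_depth, engine_depth):
    """Return the 'clientword := serverword' line for one engine copy."""
    if protocol == "uci":
        return f"go depth {gui_depth} := go depth {engine_depth}"
    if protocol == "winboard":
        return f"sd {gui_depth} := sd {engine_depth}"
    raise ValueError(f"unknown protocol: {protocol!r}")

# Assume Arena sends depth 10 to every engine; each copy is remapped
# to the depth it should really play at.
for depth in range(4, 9):
    print(f"folder depth_{depth:02d}: {inbetween_rule('uci', 10, depth)}")
```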
For time odds, I use Winboard. If you use the latest Winboard with the built-in tournament manager, then set up the tournament with your base time control. Then, you have to modify winboard.ini (found at C:\Documents and Settings\Administrator\Application Data in Windows XP). For each engine in your tournament, add -firstTimeOdds=x, so that 1/x times the base time control equals the time control that particular engine will play with. If you use PSWBTM as the tournament manager, then you have to add /%sTimeOdds=x (where x is the same as above) for each engine's parameter in the PSWBTM Engine Manager.
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Elo Increase per Doubling
Adam Hair wrote:
I think that it is probably closer to 4500 than 6000 also. But you are right; it could be any number.
I started a fixed node test this morning with Gaviota. I could start over using 512 nodes as the base.
Yes, why don't we both do the same basic test and see what kind of numbers we get using the basic formula of Peter Österlund (but we can tune the constants to fit the curve). I'm going to extend at least 2 more levels beyond what I started with.
Don
-
- Posts: 27837
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Elo Increase per Doubling
Adam Hair wrote:
For time odds, I use Winboard. If you use the latest Winboard with the built-in tournament manager, then set up the tournament with your base time control. Then, you have to modify winboard.ini (found at C:\Documents and Settings\Administrator\Application Data in Windows XP). For each engine in your tournament, add -firstTimeOdds=x, so that 1/x times the base time control equals the time control that particular engine will play with. If you use PSWBTM as the tournament manager, then you have to add /%sTimeOdds=x (where x is the same as above) for each engine's parameter in the PSWBTM Engine Manager.
Note that it is also easy to do 'depth odds' in WinBoard. To limit an engine to play at a given depth D, without affecting the depth of its opponent, you can add to its install line:
-firstInitString="new\nrandom\nsd D\n"
i.e. append an sd D line to the usual "new\nrandom\n". This will then be automatically sent to the engine at the beginning of every game. The same method can be used to set a parameter in only a single engine, where WinBoard does not provide separate controls for each engine. E.g. 'core odds', where you want to fix the number of CPUs on a per-engine basis, rather than having the engine use the GUI settings for this, can be achieved by
-firstInitString="cores 4\nnew\nrandom\n"
In this case the cores 4 has to precede the new, as the cores command is typically sent before new (to spare engines the agony of having to change number of threads during a game or search). You could do the same for 'hash-odds games', where the engines use a different hash setting; just prefix a memory M command with the desired memory size M to the init string.
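The ordering rule above (cores and memory before new, sd after random) can be sketched as a small helper that assembles the option value. The helper name and its parameters are hypothetical, and the relative order of cores and memory is an assumption; only the command strings themselves come from the post.

```python
def init_string(depth=None, cores=None, memory=None):
    """Build the text that goes inside -firstInitString="..."."""
    before_new = []
    if cores is not None:
        before_new.append(f"cores {cores}")    # must come before "new"
    if memory is not None:
        before_new.append(f"memory {memory}")  # prefixed, like cores
    after_random = [f"sd {depth}"] if depth is not None else []
    commands = before_new + ["new", "random"] + after_random
    # The ini file wants a literal backslash followed by n, not a newline.
    return "".join(cmd + r"\n" for cmd in commands)

print('-firstInitString="' + init_string(depth=8) + '"')
print('-firstInitString="' + init_string(cores=4) + '"')
```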
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Elo Increase per Doubling
petero2 wrote:Yes, the logarithmic formula goes to infinity, so not a good approximation for large x.
Here is an exponential fit instead: Elo=1495+2996*(1-exp(-Name/12.8))
I've added 2 more levels to the same test and I'm going to get a larger sample of games. This is interesting, so even though I don't know if it's a very accurate way to estimate the highest rating it's a lot of fun to try.
I wrote a quick program to find the 3 constants in this formula that minimize the "least squares" error. With the new data points I get this:
Elo = 1488.20 + 2692.70 * (1 - exp( -level / 11.09 ))
And the maximum achievable ELO is ... (drum roll please) 4180! I don't have large samples at these high levels and I will continue to run the test for at least a couple more days or longer - and recheck the estimate.
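For anyone who wants to try this kind of fit, here is a minimal sketch. It is not the program mentioned above. It assumes the model Elo = a + b * (1 - exp(-level / c)), scans c on a grid, and solves for a and b in closed form at each step, since the model is linear in a and b once c is fixed. It uses only the 11 levels from the table posted earlier in the thread, so its constants will differ from the 13-level result quoted here. As the level grows the exponential term vanishes, so the asymptote is simply a + b (1488.20 + 2692.70 = 4180.90 for the constants above).

```python
import math

# Levels 00-10 and their Elo values from the rating table in this thread.
LEVELS = list(range(11))
ELO = [1500.0, 1709.7, 1932.7, 2118.3, 2307.4, 2467.2,
       2615.3, 2752.0, 2891.6, 2997.0, 3127.9]

def fit_for_scale(c, xs, ys):
    """For a fixed c, solve the linear least-squares problem for a and b."""
    us = [1 - math.exp(-x / c) for x in xs]
    n = len(xs)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    b = suy / suu
    a = mean_y - b * mean_u
    sse = sum((a + b * u - y) ** 2 for u, y in zip(us, ys))
    return sse, a, b

def fit_saturating(xs, ys):
    """Scan c on a grid and keep the (a, b, c) with the smallest squared error."""
    best = None
    for i in range(200, 6001):      # c from 2.00 to 60.00 in steps of 0.01
        c = i / 100
        sse, a, b = fit_for_scale(c, xs, ys)
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]

a, b, c = fit_saturating(LEVELS, ELO)
print(f"Elo = {a:.2f} + {b:.2f} * (1 - exp(-level / {c:.2f}))")
print(f"asymptote a + b = {a + b:.0f}")
```

A grid scan is slower than Gauss-Newton but cannot diverge, which is convenient when the extrapolated limit is as poorly constrained as it is here.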
It may be interesting to run the same test on a different program and see if the estimate is in the same ballpark. Preferably a program that has a solid fixed nodes testing level. This would make the "guess" more believable if they agreed.
Don
-
- Posts: 694
- Joined: Mon Apr 19, 2010 7:07 pm
- Location: Sweden
- Full name: Peter Osterlund
Re: Elo Increase per Doubling
Don wrote:
I've added 2 more levels to the same test and I'm going to get a larger sample of games. This is interesting, so even though I don't know if it's a very accurate way to estimate the highest rating it's a lot of fun to try.
I wrote a quick program to find the 3 constants in this formula that minimize the "least squares" error. With the new data points I get this:
Elo = 1488.20 + 2692.70 * (1 - exp( -level / 11.09 ))
And the maximum achievable ELO is ... (drum roll please) 4180! I don't have large samples at these high levels and I will continue to run the test for at least a couple more days or longer - and recheck the estimate.
It may be interesting to run the same test on a different program and see if the estimate is in the same ballpark. Preferably a program that has a solid fixed nodes testing level. This would make the "guess" more believable if they agreed.
Don
My parameters were also computed by minimizing the least squares error. I happened to have an old Octave implementation of the Gauss-Newton method that I used. To see how much the estimate is affected by measurement errors, I added normally distributed noise with standard deviation 10 to the rating values and computed the corresponding maximum ELO. I repeated this 100000 times and made a histogram:
Average value: 4497
Standard deviation: 116
Although, I believe an even bigger error source is the fact that the true rating curve is most likely not an exponential function, so the large extrapolation is probably unsound.
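The noise experiment described in this post can be approximated with a short sketch. This is not the Octave code: it assumes the 11-level table from earlier in the thread as input, replaces Gauss-Newton with a grid scan over the scale constant, and runs only 200 repetitions instead of 100000 so that it finishes quickly. The seed and all names are arbitrary choices for illustration.

```python
import math
import random
import statistics

LEVELS = list(range(11))
ELO = [1500.0, 1709.7, 1932.7, 2118.3, 2307.4, 2467.2,
       2615.3, 2752.0, 2891.6, 2997.0, 3127.9]

# Precompute the basis u_i = 1 - exp(-x_i / c) once for every candidate c.
GRID = []
for i in range(100, 801):           # c from 5.00 to 40.00 in steps of 0.05
    c = i / 20
    us = [1 - math.exp(-x / c) for x in LEVELS]
    mean_u = sum(us) / len(us)
    du = [u - mean_u for u in us]
    GRID.append((us, mean_u, du, sum(d * d for d in du)))

def asymptote(ys):
    """Least-squares fit of a + b*(1 - exp(-x/c)); return the limit a + b."""
    mean_y = sum(ys) / len(ys)
    best_sse, best_limit = math.inf, None
    for us, mean_u, du, suu in GRID:
        b = sum(d * y for d, y in zip(du, ys)) / suu
        a = mean_y - b * mean_u
        sse = sum((a + b * u - y) ** 2 for u, y in zip(us, ys))
        if sse < best_sse:
            best_sse, best_limit = sse, a + b
    return best_limit

rng = random.Random(1)              # fixed seed so runs are repeatable
limits = []
for _ in range(200):
    noisy = [y + rng.gauss(0, 10) for y in ELO]   # sd 10, as in the post
    limits.append(asymptote(noisy))

print(f"mean {statistics.mean(limits):.0f}, sd {statistics.stdev(limits):.0f}")
```

The spread only reflects rating noise under the assumption that the exponential form is right; it says nothing about the model error raised in the last paragraph of the post.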
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Elo Increase per Doubling
Don wrote:
I've added 2 more levels to the same test and I'm going to get a larger sample of games. This is interesting, so even though I don't know if it's a very accurate way to estimate the highest rating it's a lot of fun to try.
I wrote a quick program to find the 3 constants in this formula that minimize the "least squares" error. With the new data points I get this:
Elo = 1488.20 + 2692.70 * (1 - exp( -level / 11.09 ))
And the maximum achievable ELO is ... (drum roll please) 4180! I don't have large samples at these high levels and I will continue to run the test for at least a couple more days or longer - and recheck the estimate.
It may be interesting to run the same test on a different program and see if the estimate is in the same ballpark. Preferably a program that has a solid fixed nodes testing level. This would make the "guess" more believable if they agreed.
Don
P.S. I also did a run where I used the same numbers but reduced the ELO levels of the highest 3 entries by 10, 20 and 30 ELO respectively to give a pessimistic answer. Even with this reduction we get the asymptote of 4052.
There are all sorts of problems with this type of estimate - ELO is just an approximate model and it's not even transitive - self play exaggerating the ELO is another potential issue. I would like to believe that playing hundreds of ELO down cancels the self-play effect and I think it does but I cannot quantify any of this. Style of play can affect your ELO too but I think that falls under the category of trying to measure ELO as if it were transitive when it's not.
I cannot quite reconcile this data because:
1. Projected ELO for Komodo at level 12 (2 million nodes) is 3250.
2. I don't believe Komodo at 40/2 hours against a human is 3250 ELO looking at 2 million nodes.
Of course point 2 is just my opinion. At 2 million nodes Komodo is looking about 16.4 depth which for Komodo is a lot (Komodo is slow at a given depth but stronger compared to other top programs.)
Shredder is assumed to be 2800 on most tests - presumably that is an estimate of Shredder's level of play against humans at 40/2 hours. My estimate of level 00 (512 nodes) is based on the same reasoning, that Komodo looking at 512 nodes and playing against a human at 40/2 hour time control would play about 1500 ELO FIDE strength. I don't think this is off by more than 100 ELO. So the understanding is that playing against a human at 40/2 hours a perfect player would play at the 4000+ level - and obviously the human would draw occasionally or else the rating would be infinite.