Elo Increase per Doubling

JuLieN · Post by **JuLieN** » Thu May 10, 2012 7:09 pm

petero2 wrote:
Don wrote: The current asymptote is 4209.7 after more data collected.

Basically what you are saying is what we have always known and there are multiple issues here which is why I say this is just for fun. Here are just some of the issues:

1. True chess skill is not transitive.
2. Similar point - true chess skill is multi-dimensional.
3. ELO is not a perfect way to model chess skill.
4. No program can play perfect chess even with infinite resources due to GHI issues.
5. The exponential formula may not even be very appropriate.
6. The book could have some positions that are game theoretic wins.
Another issue is that there are multiple ways to implement a perfect player. Assuming chess is a draw and that it was possible to create a 32-man endgame tablebase, an engine that used that EGTB but just randomly picked an optimal move would probably draw a lot of games even against quite weak players. I imagine a game where the engine played black could start something like this:

1. e4 a6
2. d4 a5

Only when the engine risks losing it would play a "reasonable" move. If white later makes a positionally weak move, the engine will likely give up the advantage immediately. Very similar things happen if you play the weak side in a drawn KRBKR endgame against an engine with tablebases but with no "swindle" mode.

On the other hand, a perfect player that actively tries to steer the game towards positions where the opponent must play very exactly to maintain the draw would probably win almost all games even against the best human players.

This could be easily solved. For instance, by picking among the drawing moves the move that has the more winning opportunities.

For instance, let's say that you have three choices that all tend to a draw with perfect play:

- Nxe5 winning children: 10000, drawing children: 10000000
- Nc3 winning children: 7000, drawing children: 12000000
- dxe5 winning children: 150000, drawing children: 11000000

Then picking dxe5 would be the move to play.
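A minimal sketch of that tie-break, using the three made-up counts above. The names `DrawingMove` and `pick_swindle_move` are hypothetical, and ranking by the share of winning children rather than the raw count is an assumption on my part; with these numbers both rules pick the same move.

```python
from typing import NamedTuple


class DrawingMove(NamedTuple):
    """A move that keeps the draw, with counts of its subtree outcomes."""
    san: str
    winning_children: int
    drawing_children: int


def pick_swindle_move(moves):
    """Return the drawing move whose subtree has the largest share of wins."""
    def win_share(move):
        total = move.winning_children + move.drawing_children
        return move.winning_children / total if total else 0.0

    return max(moves, key=win_share)


# Illustrative counts copied from the post, not real tablebase data.
candidates = [
    DrawingMove("Nxe5", 10_000, 10_000_000),
    DrawingMove("Nc3", 7_000, 12_000_000),
    DrawingMove("dxe5", 150_000, 11_000_000),
]

best = pick_swindle_move(candidates)
print(best.san)
```

A real engine would probably also want to weight how hard the winning lines are for the opponent to avoid, which plain counts do not capture.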

Don · Post by **Don** » Thu May 10, 2012 7:41 pm

petero2 wrote:
Don wrote: The current asymptote is 4209.7 after more data collected.

Basically what you are saying is what we have always known and there are multiple issues here which is why I say this is just for fun. Here are just some of the issues:

1. True chess skill is not transitive.
2. Similar point - true chess skill is multi-dimensional.
3. ELO is not a perfect way to model chess skill.
4. No program can play perfect chess even with infinite resources due to GHI issues.
5. The exponential formula may not even be very appropriate.
6. The book could have some positions that are game theoretic wins.
Another issue is that there are multiple ways to implement a perfect player. Assuming chess is a draw and that it was possible to create a 32-man endgame tablebase, an engine that used that EGTB but just randomly picked an optimal move would probably draw a lot of games even against quite weak players. I imagine a game where the engine played black could start something like this:

1. e4 a6
2. d4 a5

Only when the engine risks losing it would play a "reasonable" move. If white later makes a positionally weak move, the engine will likely give up the advantage immediately. Very similar things happen if you play the weak side in a drawn KRBKR endgame against an engine with tablebases but with no "swindle" mode.

On the other hand, a perfect player that actively tries to steer the game towards positions where the opponent must play very exactly to maintain the draw would probably win almost all games even against the best human players.

Do you really think this matters that much? I think it's an issue, but a much more minor one than we probably think. For example, you might get away with 1... a5 in response to 1. d4, but you won't be able to keep doing that. In chess your opportunities for gamesmanship are not that high because the priority must be on playing the best move, not on trying to trick your opponent. I once heard that the average number of good moves in a chess position is less than 2. I don't know if that is true, but it implies that a perfect player may not have that many choices about how to proceed.

But I do think there is an issue with repetition. If a 32-man database knows the position is a draw, it should always avoid repetition unless that is the only path to a draw and it cannot win. The other issues I believe are relatively minor. One is trading down too quickly, but I think there is all kinds of complexity in chess even after most of the pieces are traded off. Then there are locked-up, drawish positions. A program specialized in steering towards those might gain an advantage against stronger players. A final issue is drawish endings such as bishops of opposite color. Of course we are assuming the position is a draw anyway, but avoiding bishops of opposite color might make it easier to swindle your opponent out of a draw.

I think a 32-man database that avoids repetition is not going to appear boring or passive. Just my opinion, I don't know for sure!

I guess the issue is just the nature of ELO, which is one-dimensional and transitive, and what we are pondering is whether you can take a pool of players and pump some ELO points into the rating pool (expanding it) by taking advantage of the multi-dimensional nature of chess skill. I think the answer is probably yes. More complex games such as Go have much greater skill ranges. (If you set the average Go player to 1500 ELO, the top players would probably be well over 4000.) So adding gamesmanship to chess is like turning it into a more complex game with more "moves."

Don · Post by **Don** » Thu May 10, 2012 7:46 pm

Don wrote:
petero2 wrote:
Don wrote: The current asymptote is 4209.7 after more data collected.

Basically what you are saying is what we have always known and there are multiple issues here which is why I say this is just for fun. Here are just some of the issues:

1. True chess skill is not transitive.
2. Similar point - true chess skill is multi-dimensional.
3. ELO is not a perfect way to model chess skill.
4. No program can play perfect chess even with infinite resources due to GHI issues.
5. The exponential formula may not even be very appropriate.
6. The book could have some positions that are game theoretic wins.
Another issue is that there are multiple ways to implement a perfect player. Assuming chess is a draw and that it was possible to create a 32-man endgame tablebase, an engine that used that EGTB but just randomly picked an optimal move would probably draw a lot of games even against quite weak players. I imagine a game where the engine played black could start something like this:

1. e4 a6
2. d4 a5

Only when the engine risks losing it would play a "reasonable" move. If white later makes a positionally weak move, the engine will likely give up the advantage immediately. Very similar things happen if you play the weak side in a drawn KRBKR endgame against an engine with tablebases but with no "swindle" mode.

On the other hand, a perfect player that actively tries to steer the game towards positions where the opponent must play very exactly to maintain the draw would probably win almost all games even against the best human players.
P.S. Some evidence contrary to my last response is that Jonathan Schaeffer reports that this is a big factor in checkers - it is difficult to win in checkers because you start already with a "drawish" position. In his book "One Jump Ahead" he says something about the importance of an opening book that takes you away from easy games.

I guess if gamesmanship helps it pushes your rating up a little and the weaker players down a little and thus the entire rating pool expands.

Adam Hair · Post by **Adam Hair** » Thu May 10, 2012 11:07 pm

Don wrote: OK, I'm doing the study with Critter now and I have some data. For calibration purposes I had to place one of the Komodo versions in the test as an ELO reference point - but it's not used in the calculations of course. I picked the 05 Komodo and fixed its rating at 2461.6, which is what it came out as previously.

For reference the Komodo test indicated an asymptote of 4209.7 and the Critter test (currently) is showing 4174.3. I think it's remarkable how closely they agree.

At the moment, the asymptote calculated from my data is 4508.1 (Gaviota 0.85.1 and up to level 10).

I am handicapped due to using Arena as my GUI. At this rate, it will take over a week to finish the RR. Arena takes a relatively long time to load engines. If I can find the time this weekend, I will learn how to use cutechess-cli.

Don · Post by **Don** » Fri May 11, 2012 12:55 am

Adam Hair wrote:
Don wrote: OK, I'm doing the study with Critter now and I have some data. For calibration purposes I had to place one of the Komodo versions in the test as an ELO reference point - but it's not used in the calculations of course. I picked the 05 Komodo and fixed its rating at 2461.6, which is what it came out as previously.

For reference the Komodo test indicated an asymptote of 4209.7 and the Critter test (currently) is showing 4174.3. I think it's remarkable how closely they agree.
At the moment, the asymptote calculated from my data is 4508.1 (Gaviota 0.85.1 and up to level 10).

I am handicapped due to using Arena as my GUI. At this rate, it will take over a week to finish the RR. Arena takes a relatively long time to load engines. If I can find the time this weekend, I will learn how to use cutechess-cli.

I'm now wondering how legit the test really is. I did a test with 3 data points (levels 0, 1, 2), then 4, then 5, etc., and I find that with 3 data points the asymptote is huge, and it goes down a little with each level - but it's still ridiculously high after several data points. With 7 data points it's starting to get reasonable but it's still almost 5000. If I add a data point it will drop 2 or 3 hundred ELO for sure; that is the trend.

This probably means the exponential curve is not quite the correct model as someone here has already suggested. Can you see what happens if you drop a couple of the highest levels?
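To see why a fit on only a few levels can give a huge asymptote, here is a rough sketch. It assumes a saturating exponential of the form elo(x) = a - b * r**x, which may not be exactly the formula used in this thread, and solves it exactly through three equally spaced levels. The function name and the ratings are made up for illustration.

```python
def three_point_asymptote(y0, y1, y2):
    """Asymptote a of y(x) = a - b * r**x through three equally spaced points.

    Returns float('inf') when the gains are not shrinking (r >= 1),
    because the assumed model then has no finite ceiling.
    """
    d1 = y1 - y0  # gain from the first level to the second
    d2 = y2 - y1  # gain from the second level to the third
    if d1 <= 0 or d2 <= 0:
        raise ValueError("ratings must be strictly increasing")
    r = d2 / d1  # ratio of successive gains
    if r >= 1.0:
        return float("inf")
    b = d1 / (1.0 - r)  # total gain still available above y0
    return y0 + b


# Hypothetical ratings, not measured data.
print(three_point_asymptote(1500, 1700, 1880))  # gains 200, 180: about 3500
print(three_point_asymptote(1500, 1700, 1860))  # gains 200, 160: about 2500
print(three_point_asymptote(1500, 1700, 1900))  # equal gains: inf
```

In this toy case a 20 Elo change in a single measurement moves the extrapolated ceiling by about 1000 Elo, so an estimate from the first few levels is dominated by noise.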

Adam Hair · Post by **Adam Hair** » Fri May 11, 2012 2:46 am

Don wrote:
Adam Hair wrote:
Don wrote: OK, I'm doing the study with Critter now and I have some data. For calibration purposes I had to place one of the Komodo versions in the test as an ELO reference point - but it's not used in the calculations of course. I picked the 05 Komodo and fixed its rating at 2461.6, which is what it came out as previously.

For reference the Komodo test indicated an asymptote of 4209.7 and the Critter test (currently) is showing 4174.3. I think it's remarkable how closely they agree.
At the moment, the asymptote calculated from my data is 4508.1 (Gaviota 0.85.1 and up to level 10).

I am handicapped due to using Arena as my GUI. At this rate, it will take over a week to finish the RR. Arena takes a relatively long time to load engines. If I can find the time this weekend, I will learn how to use cutechess-cli.
I'm now wondering how legit the test really is. I did a test with 3 data points (levels 0, 1, 2), then 4, then 5, etc., and I find that with 3 data points the asymptote is huge, and it goes down a little with each level - but it's still ridiculously high after several data points. With 7 data points it's starting to get reasonable but it's still almost 5000. If I add a data point it will drop 2 or 3 hundred ELO for sure; that is the trend.

This probably means the exponential curve is not quite the correct model as someone here has already suggested. Can you see what happens if you drop a couple of the highest levels?

The left-hand column is the highest level included, and the right-hand column is the computed asymptote (using Peter's formula and the appropriate data points to estimate the coefficients):

Highest level	Asymptote
10	4508.1
9	4578.3
8	4149.4
7	3952.8
6	3701.3
5	4143.6
4	effectively infinite

My trend differs from yours.

Adam Hair · Post by **Adam Hair** » Fri May 11, 2012 3:06 am

By the way, I am using 1500 Elo for Gaviota level 0 also. I tested Komodo level 0 against Gaviota levels 0, 1, and 2. The two engines are roughly the same strength at level 0 (512 nodes/position).

Daniel Shawul · Post by **Daniel Shawul** » Fri May 11, 2012 7:48 pm

What is the conclusion of this discussion? Were the previous results of +160 Elo or so per doubling down to the use of fast time controls, which are not appropriate? In light of the advances in hardware since the 80s, is 6 seconds per move enough to match the long time controls used when the +70 Elo per doubling was reported? Maybe there were too few opponents with different styles of play, which made the results similar to what a self-play test would give. I am just speculating, but there must be a reason that can be pinned down.
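For a sense of scale, the standard Elo expected-score formula shows what those two per-doubling figures would mean in a match between an engine and the same engine with doubled resources. This is only a sketch; `expected_score` is a hypothetical helper name.

```python
def expected_score(elo_diff):
    """Expected score (win = 1, draw = 0.5) for a player rated elo_diff higher."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# The two per-doubling figures mentioned above.
for gain in (70, 160):
    print(gain, round(expected_score(gain), 3))
```

The doubled side would be expected to score roughly 60% at +70 Elo and roughly 72% at +160 Elo, so the two claims are far enough apart to tell from match results.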

Don · Post by **Don** » Fri May 11, 2012 8:30 pm

Daniel Shawul wrote: What is the conclusion of this discussion? Were the previous results of +160 Elo or so per doubling down to the use of fast time controls, which are not appropriate? In light of the advances in hardware since the 80s, is 6 seconds per move enough to match the long time controls used when the +70 Elo per doubling was reported? Maybe there were too few opponents with different styles of play, which made the results similar to what a self-play test would give. I am just speculating, but there must be a reason that can be pinned down.

I'm unsure of the conclusion. The tentative conclusion is that the maximum ELO is in the low to mid 4k range but ELO is a slippery dude and it will depend a lot on what assumptions you make.

If we can figure it out then it would be fun to run the test to a significantly deeper level.

JuLieN · Post by **JuLieN** » Fri May 11, 2012 8:52 pm

Don wrote:
Daniel Shawul wrote: What is the conclusion of this discussion? Were the previous results of +160 Elo or so per doubling down to the use of fast time controls, which are not appropriate? In light of the advances in hardware since the 80s, is 6 seconds per move enough to match the long time controls used when the +70 Elo per doubling was reported? Maybe there were too few opponents with different styles of play, which made the results similar to what a self-play test would give. I am just speculating, but there must be a reason that can be pinned down.
I'm unsure of the conclusion. The tentative conclusion is that the maximum ELO is in the low to mid 4k range but ELO is a slippery dude and it will depend a lot on what assumptions you make.

If we can figure it out then it would be fun to run the test to a significantly deeper level.

Another, more practical question: this is the maximum Elo at infinite depth (+∞ plies), so what depth would be necessary to be within, say, 50 Elo points of this theoretical maximum?

Second question: if computer speed keeps doubling every 18 months, when will we reach this point of nearly perfect play?
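Both questions can be roughed out if you are willing to assume a model. The sketch below assumes elo(n) = a - b * r**n, where n is the number of doublings above level 0. It takes the 4209.7 asymptote and the 1500 Elo level-0 rating from this thread, but the gain ratio `RATIO` and the current position `DOUBLINGS_NOW` are pure placeholders, so the printed numbers mean nothing until fitted values are substituted.

```python
import math


def doublings_to_gap(asymptote, start_elo, ratio, gap):
    """Doublings n at which asymptote - elo(n) equals gap.

    Assumes elo(n) = asymptote - b * ratio**n with b = asymptote - start_elo.
    """
    b = asymptote - start_elo
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be between 0 and 1")
    if not 0.0 < gap <= b:
        raise ValueError("gap must be positive and no larger than the starting gap")
    return math.log(gap / b) / math.log(ratio)


def years_needed(doublings_needed, doublings_now, months_per_doubling=18.0):
    """Years until the target if speed doubles every months_per_doubling months."""
    remaining = max(0.0, doublings_needed - doublings_now)
    return remaining * months_per_doubling / 12.0


ASYMPTOTE = 4209.7    # Komodo asymptote quoted in the thread
START_ELO = 1500.0    # rating assigned to level 0 in the thread
RATIO = 0.9           # hypothetical gain ratio per doubling, NOT a fitted value
DOUBLINGS_NOW = 15.0  # hypothetical current position, NOT from the thread

n = doublings_to_gap(ASYMPTOTE, START_ELO, RATIO, 50.0)
print(f"doublings needed: {n:.1f}")
print(f"years at 18 months per doubling: {years_needed(n, DOUBLINGS_NOW):.1f}")
```

The answer is very sensitive to the gain ratio, which is exactly the parameter the fits in this thread have trouble pinning down.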
