Something Hikaru Said

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

duncan
Posts: 12038
Joined: Mon Jul 07, 2008 10:50 pm

Re: Something Hikaru Said

Post by duncan »

bob wrote:
700-1500 Elo is certainly meaningless. Where does that come from? 20 years ago the assumption was that 2800 was the upper bound on Elo. That seems to have bitten the big banana. The only thing that bounds Elo is that the best player will be hard-pressed to get more than 800 above the second-best player. But then the second best can get to 800 above the 3rd.
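[The 800-point figure follows from the standard Elo expected-score formula, E = 1/(1 + 10^(-diff/400)); a quick check in Python:

Code: Select all

# Expected score of the stronger side under the Elo model.
# At +800 the favorite already scores ~99%, so rating gaps much
# larger than that are practically unresolvable from game results.
for diff in (400, 800, 1200):
    e = 1.0 / (1.0 + 10.0 ** (-diff / 400.0))
    print(f"+{diff}: expected score {e:.3f}")
]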




What is the fallacy in Kai's argument that the maximum Elo is 4877?

http://www.talkchess.com/forum/viewtopi ... ew=threads


I fitted the gain from doubling the nodes with a/(b*x^c + 1), where x is the number of doublings, getting a correlation of 0.99.
The plot is here: [image not shown]


The 40/4', 40/40' and 40/120' CCRL and CEGT levels are shown, and the resulting gain from doubling in this extrapolation is ~70 points at 40/4', ~55 points at 40/40' and ~45 points at 40/120'. The limiting value I get by summing over all doublings up to infinity (infinite time control); it is 1707 points above the Houdini 3 40/40' CCRL level. So I get 4877 Elo points as the CCRL rating of the perfect engine, similar to what I remember Don got some time ago.

The draw ratio I fitted with a shifted logistic, getting a correlation of 0.999. In self-play we can expect a very high percentage of draws at very long time controls.
The plot is here: [image not shown]


The hardest thing for me to quantify was the win/loss ratio, which I had assumed to be constant at longer TC. That seems not to be the case: the win/loss ratio appears to decrease with time control (or nodes). I fitted it with 1 + 1/(a*x + b), getting a correlation of 0.96.
The plot is here: [image not shown]
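[For the last two fits, here is a minimal sketch of how they could be set up with scipy. The exact "shifted logistic" parametrization is my assumption, and the data arrays are illustrative placeholders, not Kai's measurements; the gain-per-doubling fit itself is sketched after Kai's later post below:

Code: Select all

import numpy as np
from scipy.optimize import curve_fit

def draw_ratio(x, x0, k):
    # One plausible "shifted logistic" in the number of doublings x,
    # shifted by x0, rising toward 100% draws at long time controls.
    return 1.0 / (1.0 + np.exp(-(x - x0) / k))

def win_loss(x, a, b):
    # Win/loss ratio model from the post: 1 + 1/(a*x + b), decaying
    # toward 1; it would be fitted with the same curve_fit call.
    return 1.0 + 1.0 / (a * x + b)

# Placeholder (doublings, draw fraction) points; substitute real
# self-play results here.
x = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
draws = np.array([0.30, 0.42, 0.55, 0.67, 0.76])

(x0, k), _ = curve_fit(draw_ratio, x, draws, p0=(4.0, 3.0))
print(f"shifted logistic: x0={x0:.2f}, k={k:.2f}")
print(f"draw rate after 20 doublings: {draw_ratio(20.0, x0, k):.1%}")
]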
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Something Hikaru Said

Post by Laskos »

bob wrote:
700-1500 Elo is certainly meaningless. Where does that come from? 20 years ago the assumption was that 2800 was the upper bound on Elo. That seems to have bitten the big banana. The only thing that bounds Elo is that the best player will be hard-pressed to get more than 800 above the second-best player. But then the second best can get to 800 above the 3rd.
Well, I will show how that is "meaningless". Some 2-3 years ago I computed an estimate of the rating of the perfect engine as the limiting Elo of Houdini 3 at an infinite number of nodes (infinite time control). Duncan refers to that post of mine. Today I repeated this estimate with thousands of games, using Komodo 9.3 extrapolated to infinite time control (doubling in time). I gather you consider the Elo gain per doubling to be "close to 70 Elo points", which is totally misleading, and you regard the diminishing returns as "hard to measure".

Today's tests:
First doubling: 10s+0.1s vs 5s+0.05s -- 148 Elo points
Second doubling: 20s+0.2s vs 10s+0.1s -- 128 Elo points
Third doubling: 40s+0.4s vs 20s+0.2s -- 110 Elo points

Other data:
Close to the eighth doubling -- CEGT 40/20' -- 65 Elo points
Close to the ninth doubling -- CCRL 40/40' -- 55 Elo points

Diminishing returns are clearly visible here. I fitted these results with the relevant curve; here is the plot of Elo gain per doubling in time:
[plot: Elo gain per doubling vs. number of doublings]

The red dots are not data; they are predictions. The first is for Larry's 45'+15'' level on 24 cores against humans in odds matches, at which Larry estimated Komodo as performing at the 3250 FIDE Elo level; the second shows the TCEC level of 150'+30'' on 24 faster cores. One sees that at those high levels the gain per doubling is below 40 Elo points.

The fit: the fitting curve chosen is relevant, as I show. It is a/(1 + b*(number of doublings)^c), where {a, b, c} are parameters to fit and c is the relevant exponent. If c is between 0 and 1, we still have diminishing returns, but the total Elo is unbounded as the number of doublings grows: for large n the terms behave like (a/b)*n^(-c), and the p-series sum of n^(-c) diverges for c <= 1. In that case the rating of the perfect engine is undetermined and high. If c > 1, the series converges, and the perfect engine has a definite, limited rating which can be derived.

It turns out that the best fit is c = 1.56, which is significantly larger than 1, so the Elo of the perfect engine can be computed and is not very high. I actually derive this; it is not assumed.

Do you have a more relevant fitting expression? I assume only that the gains diminish to 0 Elo per doubling as the number of doublings goes to infinity, which is related to the draw ratio going to 100% at infinite time control (doublings). Do you have a more plausible model in which the draw ratio has a limit lower than 100%?

Having the fit, I can compute the Elo of the perfect engine by summing all the gains from further doublings, starting from the 3250 FIDE Elo established by Larry (close to the 13th doubling in time). The sum is ~1300 Elo points above Larry's 24-core Komodo, so Komodo 9.3 puts the FIDE Elo of the perfect engine at about 3250 + 1300 = 4550 points. The CCRL rating would be 100-200 Elo points higher, as theirs is a computer rating, and very close to the 4800 CCRL I got 2-3 years ago with Houdini 3.
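[A minimal sketch of this computation with scipy, using the five gains quoted above at doublings 1, 2, 3, 8 and 9. Since the post says only "close to" the eighth and ninth doublings, the exact numbers depend on that placement; the post reports c = 1.56 and ~1300 Elo of headroom:

Code: Select all

import numpy as np
from scipy.optimize import curve_fit

def gain(n, a, b, c):
    # Elo gain at the n-th doubling, the form used in the post.
    return a / (1.0 + b * np.power(n, c))

# Measurements quoted in the post: Komodo 9.3 self-play for doublings
# 1-3, plus the CEGT 40/20' and CCRL 40/40' levels taken as the 8th
# and 9th doublings.
n = np.array([1.0, 2.0, 3.0, 8.0, 9.0])
elo = np.array([148.0, 128.0, 110.0, 65.0, 55.0])

(a, b, c), _ = curve_fit(gain, n, elo, p0=(170.0, 0.1, 1.5))
print(f"a={a:.1f}, b={b:.3f}, c={c:.2f}")

# Headroom above the ~13th doubling (Larry's 3250 FIDE level): sum the
# predicted gains over all further doublings. The series converges
# because c > 1; terms beyond n = 10^6 contribute under ~2 Elo.
ns = np.arange(14.0, 1_000_000.0)
headroom = gain(ns, a, b, c).sum()
print(f"Elo above the 13th doubling: ~{headroom:.0f}")
print(f"implied perfect-engine FIDE Elo: ~{3250.0 + headroom:.0f}")
]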

As you see, the 700-1500 Elo point estimate for improvement over Komodo that I gave previously is not meaningless at all; my model here is simple, robust, and consistent with earlier results (including one or two by Don Dailey). If you want to dismiss it, it surely won't be with the "we don't know" mantra, because it seems it is mostly you who "doesn't know".
User avatar
mhull
Posts: 13447
Joined: Wed Mar 08, 2006 9:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Re: Something Hikaru Said

Post by mhull »

Laskos wrote:
[snip - quoted in full above]
I wonder if there is a variable unaccounted for in this analysis (correct me if I'm wrong), which I would call "hardware limit optimization" or the "limit optimization effect".

For instance, in the 10s versus 5s test, consider that the depths reached at this limit on current hardware were, at some point in the past, reached in 2 minutes (however long ago that was). The software of that day (search and evaluation) was tuned and optimized for that depth, which was then the outer reach of what was possible.

If current software were constrained to hardware that could only reach those depths (5s on current hardware) in 2 minutes, its search and evaluation would likely be tuned and adjusted differently, to maximize performance under that constraint (not at 10 times that practical limit).

So your curve function is fitting software results at limits for which the software has not been optimized, except where it intersects the narrow range of currently optimized championship time controls.

So how can we accurately predict how future doublings in computing capacity will affect Elo if limit-optimization effects are unknown? And how is this variable factored in to measurements of historical progress? If it is not factored in, what could be the effect on our analysis?
Matthew Hull
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Something Hikaru Said

Post by bob »

mhull wrote:
[snip - quoted in full above]
I've tried to make this point multiple times. But when you have a perfect player (32-piece EGTBs), things will be even more different than today. Every 10 years someone has extrapolated what computer Elo will be in ten years, and every 10 years they have been wrong. Nobody was projecting 3300 Elo 20 years ago, given past data. It only happens with today's programs, which are far different from those of 20 years ago and beyond.

Curve fitting as done here is an attempt at predicting future performance based on past performance. Look at predictions prior to LMR vs post-LMR, prior to null-move vs post-null-move, prior to parallel search vs post-parallel-search. There will always be steady improvement, and there will be discontinuous jumps along the way when something new and different is discovered or tried. Trying to project the Elo of the "ultimate computer" based on today's numbers is senseless. Today's numbers suggest there will NEVER be an ultimate computer, which I happen to agree with.

But this thread was about a hypothetical case, and extended argument about something that will almost certainly never happen is a waste of time...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Something Hikaru Said

Post by bob »

Laskos wrote:
[snip - quoted in full above]
So you presume that the ONLY thing that will advance computer chess is hardware speed (doublings)? You ought to try doing a curve fit prior to null-move, and then see what happens when null-move became the norm (a discontinuous jump). Ditto for other revolutionary ideas (LMR, forward pruning, iterative deepening, etc.). I would not try to predict 20 years into the future using past data, because it is impossible to predict what will happen next.

At one point we were all doing exhaustive searching a la Chess 4.x, and predictions were being made. Then along came null-move, and those predictions were crushed. Then everyone made steady progress until LMR came along, and the predictions were again crushed. Then came forward pruning (futility pruning, LMP, etc.), and once again the predictions fell apart. Who knows what kind of software advances will occur along the way to the ultimate chess program? About the only valid prediction is that all current predictions will fail badly.

We are NOT "near perfection" today. We've not even scratched the surface of anything close to perfection. And that shows just how imperfect humans are, since what is being done today is still good enough to beat humans, even though almost anyone can point out flaws in a chess engine's search or evaluation. Except for a human playing OTB.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Something Hikaru Said

Post by bob »

Uri Blass wrote:
bob wrote:
Uri Blass wrote:
bob wrote:

None of this seems reasonable to me. The percentage of GM errors, for example. A GM appears to make far FEWER errors against a weak program than against a strong program, not because he actually makes fewer errors, but because the weak opponent doesn't notice them and doesn't punish them.
I disagree here. If the opponent does not play well, it is easier not to make mistakes.

I clearly have games against humans in which I made no significant mistakes according to computer analysis (no move reduced the evaluation by more than 0.2 pawns).

It is not because I am so strong, but because it is easier not to make mistakes when the opponent does not play well. If the opponent plays well, I expect myself to make more mistakes.

Uri
"no significant mistakes based on computer analysis" is meaningless when we are talking about PERFECT computer play. ANY mistake will be significant there. This extrapolation about what happens today is meaningless when we talk about a perfect chess opponent.
It is not meaningless, because the computer is clearly stronger than me and finds many mistakes.

If the computer finds that I make more mistakes in games against stronger opponents, it means that it is easier to make mistakes against stronger opponents.

I do not have the 32-piece tablebases, but my speculation is that there are many games in which the winner made no mistake that changed the theoretical result, and also draws with no mistakes. Of course, the side who made no mistake might have made mistakes against a stronger opponent.
It is not easier to make mistakes against stronger opponents. It is simply more likely that they will understand your mistake and how to exploit it. Your propensity for making mistakes is independent of the opponent; your brain cells don't suddenly change when you play a 2500 player vs a 1500 player.

As for the "many games with no discernible mistakes": there are only "many games" because millions and millions of games have been played. If your opponent spots the mistake and beats you only 1% of the time, that does NOT mean you played 99% of your games with no mistakes.
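[A toy calculation makes this last point concrete. The numbers are hypothetical, chosen only to show the gap between "looks mistake-free" and "is mistake-free":

Code: Select all

# A mistake only shows up in the results when the opponent exploits it.
p_mistake = 0.30   # true fraction of your games containing a mistake
p_exploit = 0.05   # chance this opponent spots and punishes it

p_lost = p_mistake * p_exploit
print(f"games lost to the mistake:    {p_lost:.1%}")         # 1.5%
print(f"games that LOOK mistake-free: {1 - p_lost:.1%}")     # 98.5%
print(f"games actually mistake-free:  {1 - p_mistake:.1%}")  # 70.0%
]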
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Something Hikaru Said

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote: [snip - quoted above]
And my calculation takes that fully into account.
How, when you have no idea how many mistakes a GM makes if the opponent is perfect and sees every one?
I have explained that. Kai understood it, and you might too if you take some time.
No, you have waved your hands and claimed how it works. That is NOT a "proof" of any kind at all. A 1500 player will claim a 2200 player plays perfectly. There's a reason for that, and it has NOTHING to do with "perfection".
Uri Blass
Posts: 10895
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Something Hikaru Said

Post by Uri Blass »

bob wrote:
[snip - exchange quoted in full above]
It is easier to make mistakes when the opponent helps you make them. If the opponent plays weak moves, I simply have good chances of not reaching positions in which I make mistakes.

It is not that my brain changes when I play against a stronger player; I simply have to solve harder problems, problems that I do not face against the weak player.

I guess that I know how to play the opening moves perfectly, because there are many drawing moves, and after the opponent has made a mistake it is easy not to make one.

As an extreme example, suppose that I play against a player who makes random moves.

I guess that some of the games are going to be 1.f3 e5 2.g4 Qh4 mate, in which I played perfectly because the opponent did not help me make mistakes.

You can claim that maybe 1...e5 is a mistake (and I am not 100% sure, because maybe 1...e5 draws while 1...d5 wins), but my guess is that it is not a mistake.

Against a non-random player who is a very weak chess player, something like the following can happen, where again I cannot prove a mistake by White with today's software:

1.e4 e5 2.Nf3 Nf6 3.Nxe5 Nxe4 4.Qe2 Nf6 5.Nc6+, winning the queen. After 5.Nc6+ I believe it is easy to play perfectly (maybe I do not mate in the fastest way, but that is not important; I always play winning moves).
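[Both lines are easy to verify mechanically. A minimal sketch using the python-chess library (the library choice is mine; any move-validating board would do):

Code: Select all

import chess  # pip install python-chess

# Line 1: the game against the random mover, ending in fool's mate.
board = chess.Board()
for san in ["f3", "e5", "g4", "Qh4"]:
    board.push_san(san)
print(board.is_checkmate())  # True: 2...Qh4 is mate

# Line 2: the Petrov trap. 5.Nc6+ is a discovered check from the queen
# on e2, and the knight on c6 simultaneously attacks the queen on d8,
# so White wins the queen next move.
board = chess.Board()
for san in ["e4", "e5", "Nf3", "Nf6", "Nxe5", "Nxe4", "Qe2", "Nf6", "Nc6+"]:
    board.push_san(san)
print(board.is_check())                     # True: Black is in check
print(chess.D8 in board.attacks(chess.C6))  # True: knight hits the queen
]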
Uri Blass
Posts: 10895
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Something Hikaru Said

Post by Uri Blass »

bob wrote:
[snip - quoted in full above]
I believe that you could get similar numbers with some weak engine that does not use null move.

It would be interesting to see results from some engine that does not use null-move pruning. I do not remember people trying to calculate the max Elo that way at the time.

Uri
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Something Hikaru Said

Post by Laskos »

bob wrote:
mhull wrote:
[snip - both posts quoted in full above]
First, to Matthew: in fact both Komodo and Stockfish are optimized at ultra-fast controls like those of my three tests, and almost never tested during development at the longer CEGT or CCRL time controls, never mind TCEC. Their developers can only hope that engines optimized this way will scale well to the rating lists and TCEC. The shape of the curve is hardly skewed intentionally by the developers: they see the gain at ultra-fast controls and hope that it scales reasonably. I did not include factors unknown to me, which nobody can measure.

To Bob: there is the Mephisto Gideon 1993 engine (about 2200 FIDE Elo), adapted by Ed to work as a UCI engine. I let it play 40/1' on a core 150 times faster than a 486 of 1993, which translates to the tournament time control of 1993. IIRC I even tried to measure the gain from doubling with this Mephisto; it seemed like 70-90 Elo points at the tournament-like time control of 1993. Maybe the myth about "70 Elo points" comes from the 1990s. The first difference: in 1993 an engine on the top hardware of the time gained at least 70 Elo points per doubling at tournament time control; nowadays at TCEC it is only 30 Elo points.

The draw rate in self-play is about 10-15% in these conditions (40/1' on an i7 core, which translates to 40/150' on a 486 of 1993). At TCEC the draw rate is above 85%. Isn't it clear that we are approaching a different state of computer chess? One is hard-pressed to show that the distance to the perfect player is very large.
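[A quick sketch of the arithmetic behind that translation, taking the post's own 150x hardware factor as given:

Code: Select all

import math

speedup = 150  # the post's estimate: a modern i7 core vs a 1993 486

# 40 moves in 1 minute on the fast core corresponds to 40 moves in
# 150 minutes (2.5 hours) on the 486, a 1993 tournament time control.
print(f"40/1' x {speedup} = 40/{speedup}' on the 486")

# The hardware factor alone is about log2(150) = 7.2 doublings.
print(f"hardware doublings since 1993: ~{math.log2(speedup):.1f}")
]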