Diminishing return

Discussion of chess software programming and technical issues.

Moderator: Ras

Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Diminishing return.

Post by Daniel Shawul »

Ajedrecista wrote:I am glad to see that the logarithmic fit works fine! Well, regarding my model, this is a drawback... but more than 20000 Elo between depth 1 and depth d (in this case: Y(d) - Y(1) ~ 1037*ln(d/1) = 1037*ln(d)) would require d > 2.37e+8, and I doubt that such depths can be reached. Please take a look at the first point of this web page:
The longest Chess game theoretically possible is 5,949 moves.
Well, in that case it is even more. I first used a cubic trend line from Excel, which showed >20000 Elo even at shallow depths, and I assumed the maximum depth to be 80 half-plies, the average number of plies in a game of chess. But with the logarithmic fit, you are right, it is much less. I don't know if the cubic is overfitting the data, as its R² = 1. The logarithmic fit has R² = 0.9999, and yet the results are very different. That is why you need to test at bigger depths to say anything definitive about significant diminishing returns. Try a cubic on the data and you will get >20000 at depth = 70, while fitting the data better:

Code:

=0.1715*x^3-10.579*x^2+266.61*x-1788
Using a logarithmic fit, we are _forcing_ it to decrease. The slope in the previous model is dy/dx = 1037/x, so you will get 1 Elo per ply at depth 1037. So in a way, we use the logarithmic model because we want to see diminishing returns! If you do neutral model selection among linear, polynomial, logarithmic and other models, you are very far from proving diminishing returns with the data you are given. It is just too little data, and in fact towards the last ply it didn't show any diminishing at all. So it is more about our assumption than a logical conclusion from what we observed.
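For illustration (a sketch, not from the thread), evaluating the two fits side by side shows how strongly they diverge when extrapolated: the cubic is the Excel trend line quoted above, and the logarithmic curve uses Y(d) - Y(1) = 1037*ln(d) from the earlier fit.

```python
import math

def cubic(x):
    # The Excel cubic trend line quoted above.
    return 0.1715 * x**3 - 10.579 * x**2 + 266.61 * x - 1788

def log_gain(d):
    # Elo gained between depth 1 and depth d under Y(d) - Y(1) = 1037*ln(d).
    return 1037 * math.log(d)

# Both fits agree reasonably at shallow depths, then diverge wildly:
# the cubic exceeds 20000 Elo near depth 70, the log curve stays far below.
for d in (10, 40, 70, 80):
    print(d, round(cubic(d)), round(log_gain(d)))
```

Both curves fit the measured range about equally well; the disagreement only appears at depths where there is no data, which is exactly the model-selection problem being discussed.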
User avatar
Ajedrecista
Posts: 2103
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Diminishing return.

Post by Ajedrecista »

Hello again:
Daniel Shawul wrote:Well, in that case it is even more. I first used a cubic trend line from Excel, which showed >20000 Elo even at shallow depths, and I assumed the maximum depth to be 80 half-plies, the average number of plies in a game of chess. But with the logarithmic fit, you are right, it is much less. I don't know if the cubic is overfitting the data, as its R² = 1. The logarithmic fit has R² = 0.9999, and yet the results are very different. That is why you need to test at bigger depths to say anything definitive about significant diminishing returns. Try a cubic on the data and you will get >20000 at depth = 70, while fitting the data better.
I am not very gifted at maths, but I think you hit the spot: the overfitting issue.
Daniel Shawul wrote:Using a logarithmic fit, we are _forcing_ it to decrease. The slope in the previous model is dy/dx = 1037/x, so you will get 1 Elo per ply at depth 1037. So in a way, we use the logarithmic model because we want to see diminishing returns! If you do neutral model selection among linear, polynomial, logarithmic and other models, you are very far from proving diminishing returns with the data you are given. It is just too little data, and in fact towards the last ply it didn't show any diminishing at all. So it is more about our assumption than a logical conclusion from what we observed.
Yes, I am forcing it to decrease because that is what I expect to see. Obviously, deeper depths and an insane number of games are required. Quoting myself:
Ajedrecista wrote:I choose a line for some reasons: when depth d tends to infinity, ln(d) ~ 1 + 1/2 + 1/3 + ... + 1/d and ln(d - 1) ~ 1 + 1/2 + 1/3 + ... + 1/(d - 1); delta_x = ln(d) - ln(d - 1) = ln[d/(d - 1)] ~ 1/d. If Y(x) = mx + n, then dY/dx = m; estimated Elo gain = delta_Y = m*delta_x ~ m/d ---> 0 as d ---> infinity (diminishing return exists with this model).

A quadratic function fails the same analysis: Y(x) = ax² + bx + c; dY/dx = 2ax + b; delta_x ~ 1/d (the same as before); estimated Elo gain = delta_Y = (dY/dx)*delta_x = (2ax + b)/d ~ {2a*[d + (d - 1)]/2 + b}/d ~ 2a = constant: diminishing return does not exist with this model (the same goes for polynomials of higher degree). In dY/dx, I choose the mean x ~ [d + (d - 1)]/2 because it makes sense to me. I am not taking error bars into account.
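The contrast above can be checked numerically. The sketch below is a simplified version that takes x as the depth itself (so the per-ply step is 1); m = 1037 comes from the fit discussed in the thread, while the quadratic coefficients a and b are arbitrary values of my choosing. The log model's per-ply gain m*ln(d/(d-1)) ~ m/d vanishes, while the quadratic's per-ply gain a*(2d - 1) + b never shrinks, so a polynomial in depth shows no diminishing return either way:

```python
import math

m = 1037          # slope of the logarithmic model (value from the fit above)
a, b = 0.5, 2.0   # hypothetical quadratic coefficients, for illustration only

def gain_log(d):
    # Per-ply Elo gain of Y = m*ln(depth): m*[ln(d) - ln(d-1)] ~ m/d.
    return m * math.log(d / (d - 1))

def gain_quad(d):
    # Per-ply Elo gain of Y = a*d^2 + b*d + c: Y(d) - Y(d-1) = a*(2d-1) + b.
    return a * (2 * d - 1) + b

# The log gain shrinks toward zero; the quadratic gain keeps growing.
for d in (10, 100, 1000):
    print(d, round(gain_log(d), 2), round(gain_quad(d), 2))
```

As a sanity check against the thread: gain_log(1037) is almost exactly 1 Elo per ply, matching the "1 Elo per ply at depth 1037" remark above.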
I chose the logarithmic approach with the aim of seeing a decrease in Elo gain. I agree that it can be a bit of a cheat, but I also think that we have to be a little flexible with R² once it is close to one (I mean, not interpreting a fit with R² ~ 1 as infinitely better than a fit with R² ~ 0.9999 or R² ~ 0.999). But I know that fitting a few data points to extrapolate to far points is a very hard and risky task, always full of traps. I humbly present my model as reasonably good but not definitive or universally correct. Of course, everyone can apply his/her own fitting model and extract his/her own conclusions. Polynomial and logarithmic fits are just too different. I have not checked it, but I am sure that if you try a polynomial of degree 6 in Excel you will see weird results for far points while R² ~ 1: IMHO, that is overfitting.
Daniel Shawul wrote:So in a way, we use logarithmic because we want to see diminishing returns!
...
So it is more about our assumption, than logical conclusion from what we observed.
You could not be more right! I fully agree with you. I cannot say more than I have already said: each person must do his/her fitting as he/she thinks best. Thanks for sharing your results.

Regards from Spain.

Ajedrecista.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Diminishing return.

Post by Daniel Shawul »

Jesus,
I do not think it is overfitting either. I think the error margins are still too high compared to the Elo decrease from ply to ply. I expected the Elo delta to show a bigger decrement, but it is only around ~10 Elo per ply, while the error margins are of the same magnitude! So I need to bring the error margins down to 1 Elo or less. If you look at the second derivative, you will see that right now it is pretty much random except at the beginning. It is d2y/dx^2 that should decrease for diminishing returns.

Code:

 D      y   e+  e-  dy/dx  d2y/dx^2
15    409    9   9     72        -4
14    337    7   7     68        17
13    269    6   6     85         7
12    184    6   6     92         9
11     92    5   5    101        13
10     -9    6   5    114         8
 9   -123    6   5    122        13
 8   -245    6   5    135        19
 7   -380    6   6    154         *
 6   -534    7   7      *         *
You can see that only at the beginning do I get diminishing returns. Towards the top there is still a lot of work to do.

I did some curve fitting based on the current data; almost all models I tried have R² ~ 0.99, yet they give very different extrapolation results.
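As a sketch of this kind of comparison (using the depth/Elo table from this post; the model choices and the depth-40 extrapolation point are mine, for illustration):

```python
import numpy as np

# Depth vs. Elo data from the table above.
depth = np.array([6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=float)
elo = np.array([-534, -380, -245, -123, -9, 92, 184, 269, 337, 409],
               dtype=float)

def r_squared(y, y_pred):
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Linear in depth: y = m*d + n.
m, n = np.polyfit(depth, elo, 1)
lin_r2 = r_squared(elo, m * depth + n)

# Logarithmic: y = a*ln(d) + b (a linear fit in ln(d)).
a, b = np.polyfit(np.log(depth), elo, 1)
log_r2 = r_squared(elo, a * np.log(depth) + b)

# Both fits look excellent on the measured range, yet their depth-40
# extrapolations differ by well over a thousand Elo.
print(f"linear: R^2 = {lin_r2:.4f}, Elo at depth 40 = {m * 40 + n:.0f}")
print(f"log:    R^2 = {log_r2:.4f}, Elo at depth 40 = {a * np.log(40) + b:.0f}")
```

Incidentally, the fitted log slope comes out around 1037, which matches the slope quoted earlier in the thread.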
[Image: plots of the different curve fits]
Daniel
User avatar
Ajedrecista
Posts: 2103
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Diminishing return.

Post by Ajedrecista »

Hello Daniel:
Daniel Shawul wrote:Jesus,
I do not think it is overfitting either. I think the error margins are still too high compared to the Elo decrease from ply to ply. I expected the Elo delta to show a bigger decrement, but it is only around ~10 Elo per ply, while the error margins are of the same magnitude! So I need to bring the error margins down to 1 Elo or less. If you look at the second derivative, you will see that right now it is pretty much random except at the beginning. It is d2y/dx^2 that should decrease for diminishing returns.

Code:

 D      y   e+  e-  dy/dx  d2y/dx^2
15    409    9   9     72        -4
14    337    7   7     68        17
13    269    6   6     85         7
12    184    6   6     92         9
11     92    5   5    101        13
10     -9    6   5    114         8
 9   -123    6   5    122        13
 8   -245    6   5    135        19
 7   -380    6   6    154         *
 6   -534    7   7      *         *
You can see that only at the beginning do I get diminishing returns. Towards the top there is still a lot of work to do.

I did some curve fitting based on the current data; almost all models I tried have R² ~ 0.99, yet they give very different extrapolation results.
[Image: plots of the different curve fits]
Daniel
In fact, I do not consider error bars because I do not know how to fit data with them. I already said that I do not use error bars for the fit; this is why I (and everybody else) want a huge amount of games to reduce them. ±1 Elo with 95% confidence is almost unaffordable (surely more than 150,000 games). Good luck.
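A rough back-of-the-envelope sketch of why ±1 Elo is so expensive. The per-game score standard deviation of 0.4 is an assumed typical value for engine matches with many draws, not a number from the thread:

```python
import math

sigma = 0.4   # assumed per-game score standard deviation (draws lower it)
z = 1.96      # z-score for a 95% confidence interval

# Sensitivity of Elo to score near an even match: with
# s = 1/(1 + 10^(-Elo/400)), dElo/ds = 400 / (ln(10)*s*(1-s)), at s = 0.5:
delo_ds = 400.0 / (math.log(10) * 0.5 * 0.5)

def games_needed(elo_margin):
    # From margin = z * delo_ds * sigma / sqrt(N), solve for N.
    return (z * delo_ds * sigma / elo_margin) ** 2

print(round(games_needed(1.0)))  # on the order of hundreds of thousands
```

Under these assumptions the ±1 Elo target needs roughly 300,000 games, consistent with the "surely more than 150,000 games" estimate above; halving the margin quadruples the game count.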

I did not know that d²y/dx² should decrease for diminishing returns; thanks for pointing that out. Just for clarification:

Code:

dy(D)/dx = y(D) - y(D - 1)
d²y(D)/dx² = -y(D) + 2y(D-1) - y(D - 2)
I did these calculations and all your data match my numbers.
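For reference, a sketch reproducing these finite differences from the table above (note that the second formula is the negated second difference, so positive values mean the per-ply gain is shrinking):

```python
# Elo by depth, from the table in the previous post.
elo = {6: -534, 7: -380, 8: -245, 9: -123, 10: -9,
       11: 92, 12: 184, 13: 269, 14: 337, 15: 409}

def dy(d):
    # dy(D)/dx = y(D) - y(D - 1)
    return elo[d] - elo[d - 1]

def d2y(d):
    # d2y(D)/dx2 = -y(D) + 2y(D - 1) - y(D - 2), the negated second difference
    return -elo[d] + 2 * elo[d - 1] - elo[d - 2]

for d in range(15, 7, -1):
    print(d, dy(d), d2y(d))
```

Running this reproduces the dy/dx and d²y/dx² columns of the table exactly.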

At first glance (nothing serious) at your plots: the linear model is too simple to be true, the cubic looks weird and the quadratic seems suspicious. The others do not hurt my eyes, and they differ in the 'speed' of the diminishing returns. A good model choice is critical... which one is better? I would say that each person must choose what he/she finds best, regardless of others' choices.

Regards from Spain.

Ajedrecista.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Diminishing return.

Post by Daniel Shawul »

Jesus,
The linear model used to be the norm, so when diminishing returns was first claimed, it was a big deal. I want to know how that claim was made. You would expect the Elo delta to decrease, but by how much and at what rate? Polynomial regression has problems, but nothing says one equation must be used for all data points, so a spline may fit better, for instance. Plus the range is fixed, up to 80 plies. To prove that there is diminishing returns, it seems to me I need at least a single data point at depth 39 vs. 40 or something like that. I did the different regressions in search of a diminishing-returns model that can give higher Elo while still showing some decrement. The linear-log model that we used gives too much decrease, while the data I have so far say not that much. Maybe something that varies with log(sqrt(x)), or something like that which diminishes at a lower rate, will work too...
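One caveat on the log(sqrt(x)) idea (a quick check of my own, not from the thread): since ln(sqrt(x)) = ln(x)/2, the model a*ln(sqrt(x)) + b is the same family as a plain logarithmic fit with a rescaled slope, so it diminishes at exactly the same rate; a genuinely slower-diminishing alternative would need a different functional form, such as sqrt(x).

```python
import math

# ln(sqrt(x)) is identically ln(x)/2, so fitting a*ln(sqrt(x)) + b just
# doubles the fitted slope relative to a*ln(x) + b; the curve shape and
# its rate of diminishing are unchanged.
for x in (2.0, 10.0, 80.0):
    assert abs(math.log(math.sqrt(x)) - 0.5 * math.log(x)) < 1e-12

print("ln(sqrt(x)) == ln(x)/2: same model family, rescaled slope")
```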
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Diminishing return.

Post by Daniel Shawul »

Ajedrecista wrote:I did not know that d²y/dx² should decrease for diminishing returns; thanks for pointing that out.
I am not sure about that. dy/dx should definitely decrease, and it happens that for y = a*ln(x) + b the slope dy/dx = a/x decreases and the second derivative d2y/dx2 = -a/x^2 is negative (which is what makes dy/dx fall), with its magnitude shrinking as well. But I am not getting a decrease in the second derivative in my data, which goes against using a logarithmic fit.
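A small numeric check of that statement, with a = 1037 taken from the earlier fit:

```python
# For y = a*ln(x) + b: the slope a/x decreases with depth, and the second
# derivative -a/x^2 is negative (so dy/dx falls) but shrinks in magnitude.
a = 1037.0

def slope(x):
    return a / x

def curvature(x):
    return -a / x**2

for x in (6.0, 15.0, 40.0):
    print(x, round(slope(x), 2), round(curvature(x), 2))
```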