Future of top engines: how much more Elo?

Laskos · Post by **Laskos** » Tue Jul 30, 2019 9:31 am

Ovyron wrote: ↑Sun Jul 28, 2019 11:51 am
Dann Corbit wrote: ↑Sun Jul 28, 2019 2:58 am However, some engines have opponents that really clobber them (harder than they should), and other engines that they clobber (harder than they should), so the ranking you get would contain data from just the one cross tie.
Still way better than what we have now.

Dann Corbit wrote: ↑Sun Jul 28, 2019 2:58 am We also already know relative strengths, and if the goal is better data, I do not think there are any real shortcuts. So what did we really gain?
We think we know relative strengths, but it hasn't been tested, so how do we know?

Dann Corbit wrote: ↑Sun Jul 28, 2019 2:58 am I actually think it would add to the confusion also,
I was genuinely confused by all this at first: with 40/4 showing higher ratings, I wrongly assumed 40/4 meant "40 minutes for 4 moves." Any change could only make things better.

Oh, why so much mystification? Calibrate both lists to the same engine's fixed rating, using an engine that is well connected in the pool of engines and has many games. I am not sure, maybe CCRL already does that. Then add some 200 Elo points to the longer-TC 40/40 list relative to the 40/4 list. Then the ratings can be roughly compared in absolute CCRL Elo points valid for both the 40/4 and 40/40 lists. A rough method, but pretty solid.
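
A minimal Python sketch of that calibration, assuming each list is available as a plain mapping from engine name to rating. The engine names, the ratings, and the helper names `calibrate` and `to_common_scale` are hypothetical; the 200-point offset is the rough estimate from this post, not a measured value.

```python
def calibrate(ratings: dict[str, float], anchor: str, anchor_rating: float) -> dict[str, float]:
    """Shift a whole list so that the anchor engine sits exactly at anchor_rating."""
    shift = anchor_rating - ratings[anchor]
    return {name: r + shift for name, r in ratings.items()}


def to_common_scale(list_40_4, list_40_40, anchor, anchor_rating, tc_offset=200):
    """Pin both lists to the same anchor engine, then add the assumed
    time-control offset (roughly 200 Elo for 10x time) to the 40/40 list."""
    short = calibrate(list_40_4, anchor, anchor_rating)
    long_ = calibrate(list_40_40, anchor, anchor_rating)
    long_ = {name: r + tc_offset for name, r in long_.items()}
    return short, long_


if __name__ == "__main__":
    # Hypothetical ratings; "Anchor" stands for a well-connected engine on both lists.
    ccrl_40_4 = {"Anchor": 3000, "EngineX": 3400}
    ccrl_40_40 = {"Anchor": 2950, "EngineX": 3300}
    short, long_ = to_common_scale(ccrl_40_4, ccrl_40_40, "Anchor", 3000)
    print(short)  # {'Anchor': 3000, 'EngineX': 3400}
    print(long_)  # {'Anchor': 3200, 'EngineX': 3550}
```

Only the offset carries the time-control assumption; the anchor step just removes the arbitrary zero point each list happens to have.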

Ovyron · Post by **Ovyron** » Tue Jul 30, 2019 12:41 pm

Laskos wrote: ↑Tue Jul 30, 2019 9:31 am Then add some 200 Elo points to the longer-TC 40/40 list relative to the 40/4 list.

This is the key point: we need to figure out what this number is. After doing so, we could even merge the lists and add [40/40] or [40/4] to the engines' monikers, so they can be compared directly in one list, just like 1CPU and 4CPU versions can be compared in one list.

(Caveat: 40/40 ratings seem really overrated, especially as it's equivalent to blitz on some machines, so I'd rather have 200 subtracted from the 40/4 one.)

But, as the laziest solution, just adding the rating points to the 40/40 list (without any further testing) would be much better than what we have now. So that's an easy fix.

jp · Post by **jp** » Wed Jul 31, 2019 12:59 am

Ovyron wrote: ↑Tue Jul 30, 2019 12:41 pm
Laskos wrote: ↑Tue Jul 30, 2019 9:31 am Then add some 200 Elo points to the longer-TC 40/40 list relative to the 40/4 list.
This is the key point: we need to figure out what this number is

Yes, if numbers to translate between different lists aren't figured out, people just assume numbers, which might be totally wrong.

Laskos · Post by **Laskos** » Wed Jul 31, 2019 1:08 pm

jp wrote: ↑Wed Jul 31, 2019 12:59 am
Ovyron wrote: ↑Tue Jul 30, 2019 12:41 pm
Laskos wrote: ↑Tue Jul 30, 2019 9:31 am Then add some 200 Elo points to the longer-TC 40/40 list relative to the 40/4 list.
This is the key point: we need to figure out what this number is
Yes, if numbers to translate between different lists aren't figured out, people just assume numbers, which might be totally wrong.

I am very sorry you have such difficulty translating a 10x time factor into an average Elo shift across the engines from 40/4 to 40/40 under CCRL conditions. I seem to have less difficulty, and I estimate it at 200 +/- 30 Elo points. The sole serious problem is that the scale of differences of the two lists might not be quite a factor of 1.0, but, say, 0.9. Then the translation from one list to the other might look like Elo2 - 2800 = 0.9 x (Elo1 - 2800) + 200, for example, where Elo1 is an engine's 40/4 rating and Elo2 its 40/40 rating. Sure, it is still a rough result, but a very easy translation. Playing hundreds of thousands of games to mix the time controls is basically building a new rating list, a huge endeavor and almost an idiotic one.
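
The translation above fits in a pair of small Python functions, shown only as an illustration. The 0.9 scale, the +200 offset, and the 2800 pivot are the example values from this post, not fitted constants; the function names are hypothetical.

```python
PIVOT = 2800.0   # rating around which the scale factor is applied (example value)
SCALE = 0.9      # assumed ratio of rating spreads between the two lists
OFFSET = 200.0   # assumed Elo gain from a 10x time factor (40/4 -> 40/40)


def elo_40_4_to_40_40(elo1: float) -> float:
    """Elo2 - PIVOT = SCALE * (Elo1 - PIVOT) + OFFSET, solved for Elo2."""
    return PIVOT + SCALE * (elo1 - PIVOT) + OFFSET


def elo_40_40_to_40_4(elo2: float) -> float:
    """Inverse of the map above: recover the 40/4 rating from a 40/40 one."""
    return PIVOT + (elo2 - PIVOT - OFFSET) / SCALE


if __name__ == "__main__":
    for elo1 in (2800.0, 3300.0):
        print(elo1, "->", round(elo_40_4_to_40_40(elo1), 1))
```

With these example constants, an engine at 2800 on the 40/4 list maps to 3000, while one at 3300 maps to 3450 rather than 3500; that compression is exactly what a 0.9 scale implies.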

Ovyron · Post by **Ovyron** » Wed Jul 31, 2019 4:52 pm

Laskos wrote: ↑Wed Jul 31, 2019 1:08 pm The sole serious problem is that the scale of differences of the two lists might not be quite a factor of 1.0, but, say, 0.9.

We shouldn't care about scale factors but about move quality. If there's some 40/40 engine that plays at the same strength as one on the 40/4 list, then that one could appear on the 40/40 list without disrupting anything (no more disruption than including the 40/40 engine that plays at that strength anyway).

What I'm saying is that 40/40 allows 1CPU engines to play 4CPU engines with no problems: we're not restricting engines to play only others with the same CPU count and then keeping separate 1CPU and 4CPU lists that can't be compared (where 1CPU engines appear with higher ratings than 4CPU ones...). It's the same thing with time control, so it would make sense to have a single list that shows ratings for 40/40 and 40/4 where they can be compared.

Even if it's only done with one 40/20 engine and we assume a scale factor of 1 (which could be wrong), and it ends up with a 200 Elo difference (so Laskos can say "you just wasted testing time! told you so!"), using it to compare the rating lists would be much better than what we have now.

But don't stay static just because the best solution would be idiotic to implement; the simplest solution that improves the situation (calibrating to -200) is worth implementing.

Laskos · Post by **Laskos** » Wed Jul 31, 2019 6:17 pm

Ovyron wrote: ↑Wed Jul 31, 2019 4:52 pm
Laskos wrote: ↑Wed Jul 31, 2019 1:08 pm The sole serious problem is that the scale of differences of the two lists might not be quite a factor of 1.0, but, say, 0.9.
We shouldn't care about scale factors but about move quality. If there's some 40/40 engine that plays at the same strength as one on the 40/4 list, then that one could appear on the 40/40 list without disrupting anything (no more disruption than including the 40/40 engine that plays at that strength anyway).

What I'm saying is that 40/40 allows 1CPU engines to play 4CPU engines with no problems: we're not restricting engines to play only others with the same CPU count and then keeping separate 1CPU and 4CPU lists that can't be compared (where 1CPU engines appear with higher ratings than 4CPU ones...). It's the same thing with time control, so it would make sense to have a single list that shows ratings for 40/40 and 40/4 where they can be compared.

Even if it's only done with one 40/20 engine and we assume a scale factor of 1 (which could be wrong), and it ends up with a 200 Elo difference (so Laskos can say "you just wasted testing time! told you so!"), using it to compare the rating lists would be much better than what we have now.

But don't stay static just because the best solution would be idiotic to implement; the simplest solution that improves the situation (calibrating to -200) is worth implementing.

I don't understand what you are saying. I am on the phone now, and will be for some time. On a computer, one can do a linear regression in 10 minutes by, say, picking 20 engines that behave regularly, assuming that each regular engine is about 200 Elo points stronger at 40/40 than at 40/4.
One will get an Elo2 = a*Elo1 + b relationship between the two lists, and with that linear map one can easily build a common list. One would probably also get some 20 Elo points of additional methodological error, but keeping in mind that we usually have 10-20 Elo point margins on both lists anyway, that's not such a grave issue. I prefer doing 10 minutes of work to years of pretty redundant tests.
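
A sketch of that regression in plain Python (ordinary least squares, no libraries), assuming you already have paired ratings for the same engines from both lists. The function name and the sample numbers are hypothetical; the sample is generated exactly from Elo2 = 0.9*Elo1 + 480, which is Elo2 - 2800 = 0.9 x (Elo1 - 2800) + 200 rearranged, so the fit should recover those constants.

```python
import math


def fit_linear_map(elo1: list[float], elo2: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit of elo2 ≈ a * elo1 + b.

    Returns (a, b, rms), where rms is the root-mean-square residual: a rough
    indication of how well a single linear map explains the paired ratings.
    """
    n = len(elo1)
    if n != len(elo2) or n < 2:
        raise ValueError("need at least two paired ratings")
    mean1 = sum(elo1) / n
    mean2 = sum(elo2) / n
    sxx = sum((x - mean1) ** 2 for x in elo1)
    if sxx == 0:
        raise ValueError("all 40/4 ratings are identical; slope undefined")
    sxy = sum((x - mean1) * (y - mean2) for x, y in zip(elo1, elo2))
    a = sxy / sxx
    b = mean2 - a * mean1
    rms = math.sqrt(sum((y - (a * x + b)) ** 2 for x, y in zip(elo1, elo2)) / n)
    return a, b, rms


if __name__ == "__main__":
    # Hypothetical paired ratings (40/4, 40/40) for engines on both lists.
    e1 = [2900, 3000, 3100, 3200, 3300]
    e2 = [3090, 3180, 3270, 3360, 3450]
    a, b, rms = fit_linear_map(e1, e2)
    print(f"a = {a:.3f}, b = {b:.1f}, rms = {rms:.2f}")  # a = 0.900, b = 480.0, rms = 0.00
```

Real list ratings carry their own error bars, so the residual will not be zero; with 20 regular engines it gives a quick feel for how much extra error the single linear map introduces.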

Uri Blass · Post by **Uri Blass** » Wed Jul 31, 2019 7:03 pm

The point is that the assumed 200 +/- 20 Elo difference between 40/40 and 40/4 is not something we know to be proved by games at the relevant time control.

I do not know if it is 200 Elo or 150 Elo or 250 Elo.
Maybe people have already played enough games to be 95% certain that the difference is 180-220 Elo, but I do not know about them.
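
One way to see what "95% certain it is 180-220 Elo" would demand is the usual normal-approximation error bar on a single match result, for example the same engine at 40/40 against itself at 40/4. This sketch assumes independent games, draws scored as half a point, and the logistic Elo curve; the W/D/L counts are hypothetical, and rating-list tools that fit all games jointly will give somewhat different margins.

```python
import math


def elo_from_score(p: float) -> float:
    """Logistic Elo difference implied by an expected score p in (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)


def elo_with_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Elo estimate and approximate 95% interval from one match's W/D/L."""
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the score (each game scores 1, 0.5 or 0).
    var = (wins * (1.0 - p) ** 2 + draws * (0.5 - p) ** 2 + losses * p ** 2) / n
    se = math.sqrt(var / n)
    # Clamp so the interval ends stay inside (0, 1) and map to finite Elo.
    lo = max(p - z * se, 1e-9)
    hi = min(p + z * se, 1.0 - 1e-9)
    return elo_from_score(p), elo_from_score(lo), elo_from_score(hi)


if __name__ == "__main__":
    # Hypothetical: same engine at 40/40 vs. itself at 40/4, 1000 games.
    est, lo, hi = elo_with_interval(wins=600, draws=350, losses=50)
    print(f"{est:.0f} Elo, 95% interval roughly {lo:.0f} to {hi:.0f}")
```

The interval width shrinks only with the square root of the number of games, so narrowing it by half costs roughly four times as many games at the slow time control.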

Laskos · Post by **Laskos** » Wed Jul 31, 2019 8:10 pm

Uri Blass wrote: ↑Wed Jul 31, 2019 7:03 pm The point is that the assumed 200 +/- 20 Elo difference between 40/40 and 40/4 is not something we know to be proved by games at the relevant time control.

I do not know if it is 200 Elo or 150 Elo or 250 Elo.
Maybe people have already played enough games to be 95% certain that the difference is 180-220 Elo, but I do not know about them.

Probably 200 +/- 30 or so. No great mystery: there is a plethora of studies on that, even on this forum, including tests by Andreas and others discussed here. If people have difficulty remembering anything related to numbers, I don't think they need rating lists.

70 first doubling
60 second doubling
55 third doubling
15 for 1.25 final factor
==============
about 200 Elo points for a factor of 10 from CCRL 40/4 to 40/40. Give or take 20, at most 30 Elo points.
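
The arithmetic behind this tally can be checked in a few lines of Python: three doublings and a final 1.25 factor multiply to 10, and the per-step gains add up to 200. The step values are the post's estimates; the script only verifies the bookkeeping and converts the last step into an Elo-per-doubling rate for comparison with the earlier steps.

```python
import math

# (time factor, estimated Elo gain) per step, as listed in the post.
steps = [(2.0, 70), (2.0, 60), (2.0, 55), (1.25, 15)]

total_factor = math.prod(factor for factor, _ in steps)  # 2 * 2 * 2 * 1.25
total_elo = sum(elo for _, elo in steps)

# The last step is only log2(1.25) ≈ 0.32 of a doubling, so its 15 Elo
# corresponds to roughly 47 Elo per full doubling at that level.
last_factor, last_elo = steps[-1]
per_doubling = last_elo / math.log2(last_factor)

print(total_factor, total_elo, round(per_doubling, 1))  # 10.0 200 46.6
```

The implied rate for the last step sits below the 55 Elo of the third doubling, which fits the diminishing returns the tally assumes.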

Ovyron · Post by **Ovyron** » Wed Jul 31, 2019 10:57 pm

I guess all these discussions are useless: the rating lists are built from volunteer work and from what those volunteers want to test (that's why Stockfish 9 tops the 40/4 list...).

What we'd need to do is convince one of those testers to run the 40/40 vs. 40/4 engine test; hopefully one of them is reading...

Zenmastur · Post by **Zenmastur** » Wed Jul 31, 2019 11:42 pm

Ovyron wrote: ↑Wed Jul 31, 2019 10:57 pm I guess all these discussions are useless: the rating lists are built from volunteer work and from what those volunteers want to test (that's why Stockfish 9 tops the 40/4 list...).

What we'd need to do is convince one of those testers to run the 40/40 vs. 40/4 engine test; hopefully one of them is reading...

Like Kai said, it's not worth their effort. Just do the math and be content that it's that easy. If you want more precision, you'll have to run the tests yourself!

Regards,

Zenmastur