future of top engines: how much more Elo?

jp · Post by jp » Wed Jul 24, 2019 11:08 pm

carldaman wrote: ↑Wed Jul 24, 2019 10:41 pm They are separate rating lists, meaning you can't compare those two ratings.

But people still do, even though they know they shouldn't. Not the two rating lists Ovyron mentioned, but, for example, human and computer rating lists.

Dann Corbit · Post by **Dann Corbit** » Thu Jul 25, 2019 12:07 am

As long as you don't use the absolute numbers as a reference, comparing lists isn't silly.
There may be some minor differences due to scaling implementations, but for the most part, a carefully prepared ranking list will look a lot like another ranking list at a different time control or thread count.

Naturally, due to excellent or awful implementations of threading there can be some differences. But I guess that they are pretty rare.

IOW, if engine X is stronger than engine Y on list A, then chances are pretty good that it is also stronger on list B. I assume, of course, that error bars are small enough that the rankings are not randomized to some degree and there is substantial LOS.
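
For reference, LOS (likelihood of superiority) is commonly approximated from the decisive games alone, ignoring draws. A minimal Python sketch of that approximation follows; the function name is made up for illustration, and rating-list tools may compute LOS differently.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that the first engine is the stronger one.

    Uses the common normal approximation based on decisive games only;
    draws carry no information about which side is stronger here.
    """
    decisive = wins + losses
    if decisive == 0:
        # No decisive games: no evidence either way.
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))
```

With equal wins and losses this gives exactly 0.5, and it moves toward 1 as the win surplus grows relative to the number of decisive games.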

Graham Banks · Post by **Graham Banks** » Thu Jul 25, 2019 1:04 am

carldaman wrote: ↑Wed Jul 24, 2019 10:41 pm They are separate rating lists, meaning you can't compare those two ratings.

Unless you were joking, of course.

Correct.
The 40/40 and 40/4 lists are constructed from separate databases.

However, within each list, the 1CPU and 4CPU ratings can be compared.

Uri Blass · Post by **Uri Blass** » Thu Jul 25, 2019 3:14 am

carldaman wrote: ↑Wed Jul 24, 2019 10:41 pm They are separate rating lists, meaning you can't compare those two ratings.

Unless you were joking, of course.

People want a rating to measure playing strength, which means a rating at a short time control should be lower than a rating at a long time control, because the level of play is lower.

Clearly that is not the case when you compare CCRL 40/40 and 40/4, and it could be changed by playing games at unequal time controls.

Ovyron · Post by **Ovyron** » Fri Jul 26, 2019 5:47 am

carldaman wrote: ↑Wed Jul 24, 2019 10:41 pm They are separate rating lists, meaning you can't compare those two ratings.

The whole point is that we should be able to compare them. For example, what is the answer to this question:

What is the weakest engine that needs a time control of 40/40 to reach the strength of 40/4 Stockfish 9?

That's an interesting question, and nobody knows. If the 40/4 list were correctly calibrated, we could answer this, and any other such question, at a glance.

So it looks like a flaw in their calibration, and it's easy to fix.

Dann Corbit · Post by **Dann Corbit** » Sat Jul 27, 2019 10:33 am

Ovyron wrote: ↑Fri Jul 26, 2019 5:47 am
So it looks like a flaw in their calibration, and it's easy to fix.

Aside from playing a few hundred thousand games with some engines at the slow time control versus engines running the fast time control, I would be curious to know what your easy fix is.

Ovyron · Post by **Ovyron** » Sat Jul 27, 2019 11:14 am

One should be enough, namely, the top one of the 40/4 list.

At the very least, playing 1000 games between that top engine and the ones from 40/40, getting a rating, and calibrating the entire 40/40 list to that rating would be better than what we have now. Much, much better.
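
As a rough sketch of what such a cross-list link could look like, the usual logistic Elo model converts a match score into a rating difference, and the linking engine's performance against 40/40-rated opponents gives one offset between the two scales. Everything below is hypothetical: the function names are invented for illustration, and the assumption that a single offset is enough is exactly the point under dispute in this thread. The average-opponent performance formula is itself only an approximation.

```python
import math
from typing import Dict


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def list_offset(link_rating_fast: float, avg_opp_rating_slow: float, score: float) -> float:
    """Offset to add to fast-list ratings to put them on the slow-list scale.

    link_rating_fast: the linking engine's rating on its own (fast) list.
    avg_opp_rating_slow: average slow-list rating of its opponents in the link match.
    score: fraction of points the linking engine scored in that match.
    """
    # Approximate performance rating: average opponent rating plus implied difference.
    performance_on_slow_scale = avg_opp_rating_slow + elo_diff_from_score(score)
    return performance_on_slow_scale - link_rating_fast


def shift_list(ratings: Dict[str, float], offset: float) -> Dict[str, float]:
    """Apply one constant offset to every rating on a list."""
    return {name: rating + offset for name, rating in ratings.items()}
```

For instance, a linking engine that scores 50% against opponents averaging some rating X on the slow list performs at X on that scale, so the offset is X minus its fast-list rating.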

Dann Corbit · Post by **Dann Corbit** » Sun Jul 28, 2019 2:58 am

Ovyron wrote: ↑Sat Jul 27, 2019 11:14 am One should be enough, namely, the top one of the 40/4 list.

At the very least, playing 1000 games between that top engine and the ones from 40/40, getting a rating, and calibrating the entire 40/40 list to that rating would be better than what we have now. Much, much better.

I don't think that approach works. If you play 1000 games (for instance) with SF at 40/20 and using 40/4 time control for the second engine, it would link the lists.

However, some engines have opponents that really clobber them (better than they should) and other engines that they clobber (better than they should), so the ranking you get would contain data from just the one cross tie.

We also already know relative strengths, and if the goal is better data, I do not think that there are any real shortcuts. So what did we really gain?

I actually think it would add to the confusion also.
I guess that people who saw SF with 4 threads rated at 3400 Elo would be surprised to see the same engine rated 175 Elo lower due to the other time control setting.

Ovyron · Post by **Ovyron** » Sun Jul 28, 2019 11:51 am

Dann Corbit wrote: ↑Sun Jul 28, 2019 2:58 am However, some engines have opponents that really clobber them (better than they should) and other engines that they clobber (better than they should), so the ranking you get would contain data from just the one cross tie.

Still way better than what we have now.

Dann Corbit wrote: ↑Sun Jul 28, 2019 2:58 am We also already know relative strengths, and if the goal is better data, I do not think that there are any real shortcuts. So what did we really gain?

We think we know relative strengths, but it hasn't been tested, so how do we know?

Dann Corbit wrote: ↑Sun Jul 28, 2019 2:58 am I actually think it would add to the confusion also.

I was genuinely confused by all this at first: with 40/4 showing the higher rating, I wrongly assumed 40/4 meant "40 minutes for 4 moves." Any change could only make things better.

jp · Post by jp » Tue Jul 30, 2019 8:52 am

Ovyron wrote: ↑Sun Jul 28, 2019 11:51 am
Dann Corbit wrote: ↑Sun Jul 28, 2019 2:58 am We also already know relative strengths, and if the goal is better data, I do not think that there are any real shortcuts. So what did we really gain?
We think we know relative strengths, but it hasn't been tested, so how do we know?

Ideally, we'd have data to let us match (by time or nodes handicap) the playing strength of any two engines we want to play against each other, without just guessing. That seems way too big a job, though.
