CCRL 40/4 single cpu list - IPONized

lkaufman · Post by **lkaufman** » Fri Nov 25, 2016 4:09 am

carldaman wrote:
lkaufman wrote:
Laskos wrote:
lkaufman wrote:
ThatsIt wrote:
IWB wrote:I like 'IPONized'!
[...snip...]
Isn't that a good start to standardize elo calculation and base rating within the lists. I am open for suggestions! (And I am more willing to compromise with the rating (as this is arbitrary anyhow) than with the statistical method ... but if all agree to follow a democratic vote I am in!)
I would like to see Critter 1.6x 1CPU @ ELO 2800.

Best wishes,
G.S.
It seems to me that if you are going to have engine ratings that are intended to suggest what they would rate vs. top humans, the base rating should be an engine that actually played many GMs, with a very conservative estimated rating since engine vs engine ratings do seem to overstate improvement somewhat. Due to the many games between Rybka (versions from about 2.3.2a to Rybka 3) and GMs with varying handicaps and conditions, I can say with reasonable confidence that Rybka 3 on 1 cpu would have a FIDE rating somewhere in the 2900s, and so fixing it at 2900 would be a suitably conservative value. This just happens to be almost exactly its current IPON rating. These Rybka versions from 2.3.2a to 3, running on 4 or 8 cores almost a decade ago, were probably on average about like Rybka 3 running single core on a typical laptop I7 today. Rybka beat GM Joel Benjamin (then about 2600 FIDE) by 7 to 1 in normal chess despite giving him White in every game, and beat GM Jan Ehlvest (then well over 2600) by 4.5 to 1.5 giving him White every game, double time, only 3 move opening book, no TBs, and small Hash size. Matches with material handicaps confirmed that Rybka was well beyond the strength of any human player. So I favor using Rybka 3 as the anchor at 2900, since it is the strongest engine with a rating that can be clearly justified based on GM matches.
Also, the newer versions of Komodo on 24 cores are about 3300 FIDE on 1 hour TC. We established that with reasonable confidence. 1 core would be about 3150 FIDE at that TC. In Bullet and Blitz games the FIDE Elo level of Komodo is even higher. I often see naive statements, like "engines now are 3200 level". This quantity depends on time control, hardware, FIDE or engine ratings, contempt, opening book, etc. Also, FIDE and CCRL ratings differ not only in offset, but also in rating compression of FIDE ratings compared to engine ratings, and this seems to be even non-linear on large Elo spans.
This compression problem makes it a matter of opinion where to anchor the list. Although we pretty much agree on the ratings for Komodo based on these handicap matches, the rating for Rybka 3 is much less dependent on assumptions and calculations, as it actually played 14 games of normal chess with strong GMs at fairly normal time limits. True, there were various other advantages for the GMs, especially White in every game, but the Elo value of White is well known, and the other advantages only applied to six of the games and were not too large. So I think Rybka 3 makes a better anchor as its rating can be much more easily explained. The actual performance for the 11.5 out of 14 score was close to 2900, without even adjusting for the White pieces, so 2900 is clearly a very conservative rating for Rybka 3. Actually those 14 games were played with versions closer to Rybka 2.3.2a than to Rybka 3, so even though they were played on 4 cpus I think Rybka 3 on one modern cpu would be at least as strong.
2900 seems too conservative a rating for Rybka 3. It was way ahead of its peers back in its heyday. Even the old Chess Tiger 14 achieved performance levels of about 2800 FIDE ELO in human tournaments way back in 2001.

CL

I agree, it probably deserves at least 2950. I pick 2900 because it can be shown to be pretty much a lower-bound estimate, and because compression makes the top engines come out too high unless we hold the anchor down a bit.
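
For readers who want to check the "11.5 out of 14 ... close to 2900" figure quoted above, here is a minimal sketch using the common logistic performance-rating approximation. The function name and the average opponent ratings (2600 and 2630) are assumptions for illustration only; the thread gives the opposition just as "about 2600" and "well over 2600", and it does not say which formula was actually used.

```python
import math

def performance_rating(avg_opponent_rating, score, games):
    """Logistic approximation: opponent average plus the Elo gap implied by the score."""
    p = score / games
    if not 0 < p < 1:
        raise ValueError("score fraction must be strictly between 0 and 1")
    # Invert the expected-score formula p = 1 / (1 + 10 ** (-d / 400)) for d.
    return avg_opponent_rating + 400 * math.log10(p / (1 - p))

# Assumed average opposition; these two values are illustrative, not from the thread.
for assumed_avg in (2600, 2630):
    print(assumed_avg, round(performance_rating(assumed_avg, 11.5, 14)))
```

Under this approximation an 11.5/14 score is worth roughly 265 points above the average opponent, before any correction for the GMs having White in every game.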

Laskos · Post by **Laskos** » Fri Nov 25, 2016 6:10 am

lkaufman wrote:
Laskos wrote:
lkaufman wrote:
ThatsIt wrote:
IWB wrote:I like 'IPONized'!
[...snip...]
Isn't that a good start to standardize elo calculation and base rating within the lists. I am open for suggestions! (And I am more willing to compromise with the rating (as this is arbitrary anyhow) than with the statistical method ... but if all agree to follow a democratic vote I am in!)
I would like to see Critter 1.6x 1CPU @ ELO 2800.

Best wishes,
G.S.
It seems to me that if you are going to have engine ratings that are intended to suggest what they would rate vs. top humans, the base rating should be an engine that actually played many GMs, with a very conservative estimated rating since engine vs engine ratings do seem to overstate improvement somewhat. Due to the many games between Rybka (versions from about 2.3.2a to Rybka 3) and GMs with varying handicaps and conditions, I can say with reasonable confidence that Rybka 3 on 1 cpu would have a FIDE rating somewhere in the 2900s, and so fixing it at 2900 would be a suitably conservative value. This just happens to be almost exactly its current IPON rating. These Rybka versions from 2.3.2a to 3, running on 4 or 8 cores almost a decade ago, were probably on average about like Rybka 3 running single core on a typical laptop I7 today. Rybka beat GM Joel Benjamin (then about 2600 FIDE) by 7 to 1 in normal chess despite giving him White in every game, and beat GM Jan Ehlvest (then well over 2600) by 4.5 to 1.5 giving him White every game, double time, only 3 move opening book, no TBs, and small Hash size. Matches with material handicaps confirmed that Rybka was well beyond the strength of any human player. So I favor using Rybka 3 as the anchor at 2900, since it is the strongest engine with a rating that can be clearly justified based on GM matches.
Also, the newer versions of Komodo on 24 cores are about 3300 FIDE on 1 hour TC. We established that with reasonable confidence. 1 core would be about 3150 FIDE at that TC. In Bullet and Blitz games the FIDE Elo level of Komodo is even higher. I often see naive statements, like "engines now are 3200 level". This quantity depends on time control, hardware, FIDE or engine ratings, contempt, opening book, etc. Also, FIDE and CCRL ratings differ not only in offset, but also in rating compression of FIDE ratings compared to engine ratings, and this seems to be even non-linear on large Elo spans.
This compression problem makes it a matter of opinion where to anchor the list. Although we pretty much agree on the ratings for Komodo based on these handicap matches, the rating for Rybka 3 is much less dependent on assumptions and calculations, as it actually played 14 games of normal chess with strong GMs at fairly normal time limits. True, there were various other advantages for the GMs, especially White in every game, but the Elo value of White is well known, and the other advantages only applied to six of the games and were not too large. So I think Rybka 3 makes a better anchor as its rating can be much more easily explained. The actual performance for the 11.5 out of 14 score was close to 2900, without even adjusting for the White pieces, so 2900 is clearly a very conservative rating for Rybka 3. Actually those 14 games were played with versions closer to Rybka 2.3.2a than to Rybka 3, so even though they were played on 4 cpus I think Rybka 3 on one modern cpu would be at least as strong.

Rybka 3 at 2950 FIDE on one modern core, Komodo 10.2 at 3150 FIDE on one modern core in a 1-hour game, or even older engines like Fritz 8 or Junior 8 at 2750 on one modern core are pretty much equivalent anchors. It's not that we are in the dark about the CCRL-FIDE relationship, but one has to be careful. For example, doubling the number of CPUs might be worth 60 Elo points on CCRL but only 30 Elo points on FIDE. Humans seem to scale better with time control.
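
To put the 60 versus 30 Elo example in terms of game results, the standard Elo expected-score formula can be evaluated for both gaps. This is only a sketch of the arithmetic; the function name is hypothetical, and the 60 and 30 point figures are the rough examples from the post, not measured values.

```python
def expected_score(elo_diff):
    """Expected score for the stronger side under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))

# The two example gains from doubling the CPU count (illustrative figures).
for gain in (60, 30):
    print(gain, round(expected_score(gain), 3))
```

A 60-point gap corresponds to an expected score of roughly 58.5%, while a 30-point gap corresponds to roughly 54.3%, which gives a sense of how much smaller the same hardware gain looks on the compressed scale.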
