ccrl hardware adjustment question


Modern Times
Posts: 3546
Joined: Thu Jun 07, 2012 11:02 pm

Re: ccrl hardware adjustment question

Post by Modern Times »

carldaman wrote: Tue May 12, 2020 8:45 pm
And, weren't the CCRL slow ratings 'adjusted' by 100 Elo points downward a few years ago?
Indeed they were lowered by 100. There was a lot of noise at the time that the ratings were too high.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: ccrl hardware adjustment question

Post by lkaufman »

Modern Times wrote: Tue May 12, 2020 11:35 pm
carldaman wrote: Tue May 12, 2020 8:45 pm
And, weren't the CCRL slow ratings 'adjusted' by 100 Elo points downward a few years ago?
Indeed they were lowered by 100. There was a lot of noise at the time that the ratings were too high.
Probably some people felt that the TOP ratings were too high, but the ratings of engines that actually played on a par with the World Champions were always on the low side, even with the 100 points added back to the current numbers. The ratings are further apart than they would be against humans, but to me it only makes sense to peg the ones that actually have measurable ratings vs. humans and let the chips fall where they may.
Komodo rules!
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: ccrl hardware adjustment question

Post by carldaman »

Modern Times wrote: Tue May 12, 2020 11:35 pm
carldaman wrote: Tue May 12, 2020 8:45 pm
And, weren't the CCRL slow ratings 'adjusted' by 100 Elo points downward a few years ago?
Indeed they were lowered by 100. There was a lot of noise at the time that the ratings were too high.
The CCRL ratings may have seemed high relative to other rating lists, but that didn't mean that the other lists were closer to being right... :| I still hold to that view. :)
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: ccrl hardware adjustment question

Post by Laskos »

Modern Times wrote: Tue May 12, 2020 11:35 pm
carldaman wrote: Tue May 12, 2020 8:45 pm
And, weren't the CCRL slow ratings 'adjusted' by 100 Elo points downward a few years ago?
Indeed they were lowered by 100. There was a lot of noise at the time that the ratings were too high.
CCRL uses BayesElo, right? To get an "uninflated" rating comparable to FIDE ratings, could you keep Fritz 8 and Junior 9 as "standard candles" at about 2800 Elo on one core at 40/15' (rapid) and 3000 Elo on one core at blitz, and compress all other ratings toward these central "standard candles" by an adjustment factor of about 0.70 for BayesElo? This way, one could directly compare CCRL and FIDE ratings.
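The proposal amounts to a linear map pinned at a "standard candle", with a separate candle per time control. A minimal sketch, assuming one anchor per time control: the FIDE-scale candle values (2800 rapid, 3000 blitz) and the 0.70 factor come from the post, while the candles' CCRL ratings in `CANDLES` are hypothetical placeholders, not numbers from the list.

```python
# Sketch of the "standard candle" idea: each time control gets its own anchor.
# Candle FIDE-scale values (2800 rapid, 3000 blitz) come from the post; the
# candles' CCRL ratings below are HYPOTHETICAL placeholders.

CANDLES = {
    # time control: (candle CCRL rating [hypothetical], candle FIDE-scale value)
    "rapid_40/15": (2700.0, 2800.0),
    "blitz": (2750.0, 3000.0),
}


def ccrl_to_fide(ccrl: float, time_control: str, factor: float = 0.70) -> float:
    """Pin the candle at its FIDE-scale value and shrink other distances by factor."""
    candle_ccrl, candle_fide = CANDLES[time_control]
    return candle_fide + factor * (ccrl - candle_ccrl)


print(ccrl_to_fide(3200.0, "rapid_40/15"))  # 2800 + 0.70 * 500
print(ccrl_to_fide(2750.0, "blitz"))        # the candle itself maps to its anchor
```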
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: ccrl hardware adjustment question

Post by lkaufman »

Laskos wrote: Thu May 14, 2020 1:38 am
CCRL uses BayesElo, right? To get an "uninflated" rating comparable to FIDE ratings, could you keep Fritz 8 and Junior 9 as "standard candles" at about 2800 Elo on one core at 40/15' (rapid) and 3000 Elo on one core at blitz, and compress all other ratings toward these central "standard candles" by an adjustment factor of about 0.70 for BayesElo? This way, one could directly compare CCRL and FIDE ratings.
Note that 2800 is a reasonable (though conservative) estimate of how those "standard candle" engines would perform on today's hardware, on one core, at the standard time limit used in the 2002/2003 matches (40 moves in 2 hours, I think). But if they played rapid (40/15', or more realistically 15' + 10" increment) against humans, they would get something like 2950, I think, and at blitz something closer to 3100, I imagine; the difference between human performance in blitz and in rapid is quite large.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: ccrl hardware adjustment question

Post by lkaufman »

One reason I posted this question is that I wanted to be able to give a fairly solid estimate, in human terms, of the strength of Komodo 14 at its highest crippled Skill level, 24. I ran 300 games of it vs. Arasan 14 at 15' + 10", with Komodo losing narrowly, by 15 Elo points. With a bit of interpolation, since that exact version isn't on the CCRL list, the estimated rating on that list for Skill 24 would be about 2620. If my conclusion in this thread is correct that you need to add 280 Elo to the CCRL 40/15 ratings to estimate a human FIDE rapid rating (at 15' + 10"), that gives 2900, 20 above Magnus Carlsen. As mentioned in the readme, our target was indeed to have this level be about the same as Magnus Carlsen at this time limit, but until now I didn't know that we were right on target, as near as I can tell.
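A sketch of the arithmetic in this post, using the usual logistic Elo formula. The tool that produced the 15 Elo figure may use a slightly different model (BayesElo, for example, handles draws separately), so treat the score fraction as approximate. `CARLSEN_RAPID = 2880` is inferred from "2900, 20 above Magnus Carlsen", not looked up on the FIDE list.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Logistic Elo model: expected score for a player rated elo_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Inverse of expected_score: Elo difference implied by a score fraction in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


GAMES = 300
ELO_DIFF = -15.0          # Skill 24 vs Arasan 14, as reported in the post
CCRL_ESTIMATE = 2620.0    # interpolated CCRL 40/15 rating for Skill 24
HUMAN_OFFSET = 280.0      # proposed CCRL 40/15 -> FIDE rapid offset
CARLSEN_RAPID = 2880.0    # inferred from "2900, 20 above Magnus Carlsen"

score = expected_score(ELO_DIFF)
print(f"score fraction ~ {score:.3f}, about {score * GAMES:.0f} points of {GAMES}")

fide_rapid = CCRL_ESTIMATE + HUMAN_OFFSET
print(f"estimated FIDE rapid: {fide_rapid:.0f} ({fide_rapid - CARLSEN_RAPID:+.0f} vs Carlsen)")
```

A 15 Elo deficit corresponds to scoring a little under 48%, i.e. a genuinely narrow loss over 300 games.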
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: ccrl hardware adjustment question

Post by Laskos »

lkaufman wrote: Thu May 14, 2020 2:35 am
Note that 2800 is a reasonable (though conservative) estimate of how those "standard candle" engines would perform on today's hardware, on one core, at the standard time limit used in the 2002/2003 matches (40 moves in 2 hours, I think). But if they played rapid (40/15', or more realistically 15' + 10" increment) against humans, they would get something like 2950, I think, and at blitz something closer to 3100, I imagine; the difference between human performance in blitz and in rapid is quite large.
Yes, I agree, although the hardware used back in 2003 was a bit stronger than 1 modern core (probably two times stronger or so). But yes, Fritz 8 in rapid on one core would probably be about 2950 FIDE Elo, and about 3100 in blitz.

Adjusting CCRL ratings by simple addition, so that top engines match their FIDE-equivalent ratings, completely ruins the comparison to FIDE ratings for weak engines. One must use a linear compression relation, with the compression factor probably of order 0.70 with BayesElo and of order 0.60 with Ordo.
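To see why a pure additive shift breaks down for weak engines, here is a sketch comparing it with linear compression around the same anchor. The anchor ratings (`ANCHOR_CCRL`, `ANCHOR_FIDE`) are hypothetical; only the 0.70 BayesElo factor comes from the post. Both mappings agree at the anchor, and the gap between them grows as (1 - FACTOR) times the distance below it.

```python
# Compare two ways of mapping CCRL ratings to a FIDE-like scale.
# Anchor values are HYPOTHETICAL, chosen only to illustrate the divergence.

ANCHOR_CCRL = 3000.0   # hypothetical CCRL rating of the engine we pin
ANCHOR_FIDE = 3300.0   # hypothetical FIDE-scale value we want it to map to
FACTOR = 0.70          # compression factor suggested for BayesElo


def additive(ccrl: float) -> float:
    """Shift every rating by the same constant."""
    return ccrl + (ANCHOR_FIDE - ANCHOR_CCRL)


def compressed(ccrl: float) -> float:
    """Pin the anchor, then shrink distances from it by FACTOR."""
    return ANCHOR_FIDE + FACTOR * (ccrl - ANCHOR_CCRL)


for ccrl in (3000.0, 2500.0, 2000.0):
    print(f"CCRL {ccrl:.0f}: additive {additive(ccrl):.0f}, compressed {compressed(ccrl):.0f}")
```

With these assumed numbers, an engine 1000 CCRL points below the anchor lands 300 points higher under compression (2600) than under the additive shift (2300).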
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: ccrl hardware adjustment question

Post by lkaufman »

Laskos wrote: Thu May 14, 2020 10:50 am
Yes, I agree, although the hardware used back in 2003 was a bit stronger than 1 modern core (probably two times stronger or so). But yes, Fritz 8 in rapid on one core would probably be about 2950 FIDE Elo, and about 3100 in blitz.

Adjusting CCRL ratings by simple addition, so that top engines match their FIDE-equivalent ratings, completely ruins the comparison to FIDE ratings for weak engines. One must use a linear compression relation, with the compression factor probably of order 0.70 with BayesElo and of order 0.60 with Ordo.
According to the comments in this thread, one thread on the reference i7 is about four times faster than one thread on the 2003 hardware. Since 4 cores give at best a 3-to-1 real speedup, one thread on the i7 should be about 4/3 as fast as the whole four-core 2003 machine. I think you are thinking of the 2006 hardware. Regarding compression, 0.6 (for Ordo) sounds a bit extreme to me; weren't you getting about 0.65 for that ratio?
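The speed estimate here is a two-step ratio. A tiny sketch with exact fractions, taking the post's 4x per-thread figure and the 3x best-case speedup of the four-core 2003 machine as given estimates, not benchmarks:

```python
from fractions import Fraction

# Inputs are the post's rough estimates, not measurements.
I7_THREAD_VS_2003_THREAD = Fraction(4)  # one i7 thread ~ 4x one 2003 thread
QUAD_EFFECTIVE_SPEEDUP = Fraction(3)    # 4 cores give at best ~3x real speedup

# One i7 thread relative to the whole four-core 2003 machine:
ratio = I7_THREAD_VS_2003_THREAD / QUAD_EFFECTIVE_SPEEDUP
print(ratio, float(ratio))
```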
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: ccrl hardware adjustment question

Post by Laskos »

lkaufman wrote: Thu May 14, 2020 6:51 pm
According to the comments in this thread, one thread on the reference i7 is about four times faster than one thread on the 2003 hardware. Since 4 cores give at best a 3-to-1 real speedup, one thread on the i7 should be about 4/3 as fast as the whole four-core 2003 machine. I think you are thinking of the 2006 hardware. Regarding compression, 0.6 (for Ordo) sounds a bit extreme to me; weren't you getting about 0.65 for that ratio?
Ah, for some reason I was remembering that either Fritz or Junior played on 8 cores back in 2003. I think a very strong overclocked i7 or i9 core is 4 times faster than a 3000 MHz Pentium of 2003; more typically it's 3 times or so.
The compression factor seems to be non-linear with strength, and that's visible from ratings of Lc0 in a pool of regular AB engines. Here is an example of such a behavior:
http://talkchess.com/forum3/viewtopic.php?f=2&t=69672
I think today's Elo ratings of top engines, compared to top humans, are even more inflated than, say, 10 years ago: something like 0.70 was valid 10 years ago and 0.60 today (Ordo). This is a bit speculative; there is not much evidence, just Leela's weird rating behavior.

EDIT: Several weeks ago I ran the following interesting test: Lc0 with a good net on an RTX 2070 was about 600 Elo points stronger than Fritz 6. SF11 on 4 cores was about 1200 Elo points stronger than Fritz 6. As Lc0 in these conditions is some 100 Elo points stronger than SF11, the pool ratings would predict about 1300 Elo over Fritz 6, so the compression for such a large Elo difference is more than 2 to 1 (a factor below 0.50). Similar things can happen with humans, and it's important to keep in mind that the compression formula is NOT linear (i.e., not a simple 0.70 across the range).
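The arithmetic behind the EDIT, as a sketch: if Elo differences were transitive, chaining SF11's margin over Fritz 6 with Lc0's margin over SF11 would predict the direct Lc0 vs Fritz 6 result. Treating the ratio of the measured direct margin to the chained one as a local "compression factor" is an interpretation of the post's reasoning, not a standard statistic; the three inputs are the post's figures.

```python
# Elo margins as reported in the post (approximate).
direct_lc0_vs_fritz6 = 600.0   # measured head-to-head
sf11_vs_fritz6 = 1200.0        # measured head-to-head
lc0_vs_sf11 = 100.0            # approximate, in these conditions

# Chaining the pool ratings predicts Lc0's margin over Fritz 6:
chained_lc0_vs_fritz6 = sf11_vs_fritz6 + lc0_vs_sf11

# Ratio of the direct margin to the chained one acts as a local compression factor.
factor = direct_lc0_vs_fritz6 / chained_lc0_vs_fritz6
print(f"chained: {chained_lc0_vs_fritz6:.0f}, direct: {direct_lc0_vs_fritz6:.0f}, "
      f"factor: {factor:.2f} (about {1 / factor:.2f} to 1)")
```

A factor of roughly 0.46 here, against about 0.70 for smaller gaps, is what makes a single linear factor look inadequate across the whole range.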
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: ccrl hardware adjustment question

Post by lkaufman »

Laskos wrote: Thu May 14, 2020 7:35 pm
Ah, for some reason I was remembering that either Fritz or Junior played on 8 cores back in 2003. I think a very strong overclocked i7 or i9 core is 4 times faster than a 3000 MHz Pentium of 2003; more typically it's 3 times or so.
The compression factor seems to be non-linear with strength, and that's visible from ratings of Lc0 in a pool of regular AB engines. Here is an example of such a behavior:
http://talkchess.com/forum3/viewtopic.php?f=2&t=69672
I think today's Elo ratings of top engines, compared to top humans, are even more inflated than, say, 10 years ago: something like 0.70 was valid 10 years ago and 0.60 today (Ordo). This is a bit speculative; there is not much evidence, just Leela's weird rating behavior.

EDIT: Several weeks ago I ran the following interesting test: Lc0 with a good net on an RTX 2070 was about 600 Elo points stronger than Fritz 6. SF11 on 4 cores was about 1200 Elo points stronger than Fritz 6. As Lc0 in these conditions is some 100 Elo points stronger than SF11, the pool ratings would predict about 1300 Elo over Fritz 6, so the compression for such a large Elo difference is more than 2 to 1 (a factor below 0.50). Similar things can happen with humans, and it's important to keep in mind that the compression formula is NOT linear (i.e., not a simple 0.70 across the range).
According to the first reply to my post, in 2003 Junior played Kasparov on a quad Intel Xeon 1.9 GHz; presumably the 3000 MHz Pentium wasn't yet available. Using Lc0 to predict the human compression is the best you can do, but I don't know how reliable it is for that purpose. I had the idea of using Lc0 11248 running on CPU as a proxy to simulate Magnus Carlsen playing rapid, and then seeing what handicaps Komodo can give it in fast games. But it seems that if the increment is less than three seconds, it is prone to forfeit on time, and with that much increment it's probably stronger than MC. Are there any settings that prevent time forfeits? They seem to happen in the endgame when Lc0 is close to winning.