CCRL 40/4 single cpu list - IPONized

Discussion of computer chess matches and engine tournaments.

Moderators: hgm, Rebel, chrisw

fastgm
Posts: 818
Joined: Mon Aug 19, 2013 6:57 pm

Re: IPON-RRRL - FGRL

Post by fastgm »

1) Statistical method
I prefer Ordo, but if the majority votes for Bayeselo, it's also fine for me.

2) Base engine
What’s wrong with Stockfish 8 as the “base engine”, for example at 3300 Elo? Everyone knows Stockfish, and it is in continuous development.
The problem with Critter 1.6a or Gull 3 is that they are no longer being developed.
I also prefer Elo values. They are more readable and more common than 0 as a base.
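For readers weighing Ordo against BayesElo: both rest on the same core idea. They find the ratings that make the observed results most likely under a logistic Elo model, and then shift the whole list so one engine sits at the agreed base. The sketch below illustrates that idea with plain gradient ascent. It is not the code of either tool; both add refinements that this sketch leaves out, such as BayesElo's prior and draw model. The function names, engine names and results are hypothetical.

```python
def expected(diff):
    """Expected score for a rating advantage `diff` under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def fit_ratings(games, anchor, anchor_value, iters=2000, step=100.0):
    """Fit ratings by gradient ascent on the log-likelihood of the results.

    games: list of (engine_a, engine_b, score_of_a), score in {0, 0.5, 1}.
    The fit only determines rating *differences*; the final shift pins
    `anchor` at `anchor_value` (e.g. 0 or 3300) without changing them.
    Note: an engine with a 100% or 0% score has no finite maximum.
    """
    names = sorted({n for a, b, _ in games for n in (a, b)})
    counts = {n: 0 for n in names}
    for a, b, _ in games:
        counts[a] += 1
        counts[b] += 1
    r = {n: 0.0 for n in names}
    for _ in range(iters):
        # d(log-likelihood)/d(rating) is proportional to (actual - expected).
        grad = {n: 0.0 for n in names}
        for a, b, s in games:
            e = expected(r[a] - r[b])
            grad[a] += s - e
            grad[b] -= s - e
        for n in names:
            # Dividing by games played keeps the step size stable.
            r[n] += step * grad[n] / counts[n]
    shift = anchor_value - r[anchor]
    return {n: v + shift for n, v in r.items()}


# Hypothetical mini-list: "A" scores 3/4 against "B".
games = [("A", "B", 1.0)] * 3 + [("A", "B", 0.0)]
print(fit_ratings(games, "B", 0.0))     # A about +191 over B
print(fit_ratings(games, "A", 3300.0))  # same gap, different base
```

The choice of base (0, 3300, or anything else) only enters in the last shift. That is why the statistical method and the base rating can be decided separately.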
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: IPON-RRRL - FGRL

Post by IWB »

fastgm wrote: ...
What’s wrong with Stockfish 8 as the “base engine”, for example at 3300 Elo? Everyone knows Stockfish, and it is in continuous development.
The problem with Critter 1.6a or Gull 3 is that they are no longer being developed.
I also prefer Elo values. They are more readable and more common than 0 as a base.
Having an engine that stays stable for a long time is an advantage, and using an engine that is still in development is a disadvantage!
If you use SF8, you will either have to keep testing SF8 against the other new engines even after SF9/10/11 ... is out, OR you stop testing it and, over time, lose contact with your other engines because they are no longer tested against your "anchor".

Some of the bigger lists have top-30 or top-50 engines that have not played a single game against the base. That is OK in the Elo system; nonetheless I have a bit of the 'jimjams' about it. When S8 moved out of my TOP16 I could have stayed with it, but for a better feeling I chose a new base within my TOP16 list ...

That is the advantage of always fixing the best engine to a value (e.g. 0). You never have to change your base again AND you are always connected to your list.

'More readable' or 'more common' is just something one has to get used to. Argumentatively, '0' seems more "logical" (at least to me ;-) )

Besides that, I have the feeling that the MOLs (multi-owner lists) have trouble agreeing on anything, especially on short notice. I might switch my list to 0 anyhow for a while, starting with the next update, just for a change, as I can switch back to whatever at any time. :-)

Ingo
Modern Times
Posts: 3550
Joined: Thu Jun 07, 2012 11:02 pm

Re: IPON-RRRL - FGRL

Post by Modern Times »

IWB wrote:
Besides that, I have the feeling that the MOLs (multi-owner lists) have trouble agreeing on anything, especially on short notice.

Ingo
I don't think we put any timetable on this, but yes, it could take a while. If we agree to change to Ordo, for example, which could be decided in principle quite quickly, it may be several months before we can actually do it. And when we try to implement it, we may find it is way too much work and have to reverse the decision. Our website is probably the most complex one, and this is a major change. And as it may also involve Miguel, we don't know what his availability is either.

Agreeing on a number for a 1CPU reference engine could be done quickly, though. But I don't know what we would do with the pure lists. I guess ideally it would be a 1CPU engine that will never have an SMP version.
mar
Posts: 2559
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: IPON-RRRL - FGRL

Post by mar »

Speaking of base offset - since mutual agreement seems unlikely,
how about letting the viewers (users) choose the offset themselves?

Let's say I open CCRL list and click something that lets me change base offset.
Or maybe choose from a combo box or whatever.

I've no idea how difficult this would be as I'm not a web guy (assuming the lists are generated and not hard-wired plaintext).

So everything would remain as it is (same backend, same way to collect data), only the web interface (frontend) would need some overhaul to apply simple offset.
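mar's suggestion amounts to a constant shift applied at display time. Here is a minimal sketch, assuming the frontend receives the list as a mapping from engine name to rating. The function name `rebase` and the example figures are hypothetical. Passing no anchor pins the top engine instead, which gives the "best engine = 0" style discussed above.

```python
def rebase(ratings, anchor=None, value=0.0):
    """Shift every rating by the same constant so `anchor` shows `value`.

    ratings: {engine_name: rating}. With anchor=None the top-rated engine
    is pinned instead, which gives the "best engine = 0" style of list.
    Only the displayed offset changes; all rating gaps stay the same.
    """
    if anchor is None:
        anchor = max(ratings, key=ratings.get)
    shift = value - ratings[anchor]
    return {name: r + shift for name, r in ratings.items()}


# Hypothetical list as stored by the backend.
ratings = {"Engine A": 3250, "Engine B": 3180, "Engine C": 2900}
print(rebase(ratings))                    # top engine pinned at 0
print(rebase(ratings, "Engine C", 2800))  # viewer picks C = 2800
```

The backend data and the statistics are untouched, so error bars would not need recomputing. Only the displayed rating column changes.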
clumma
Posts: 186
Joined: Fri Oct 10, 2014 10:05 pm
Location: Berkeley, CA

Re: IPON-RRRL - CCRL'd

Post by clumma »

If the reference engine is sufficiently weaker than the top engines -- Rybka 4.1 on a single core, say -- a top engine can estimate its IPR (Intrinsic Performance Rating) using Regan's method. This IPR can be harmonized to human FIDE ratings quite easily, through analysis of human games with the same top engine.

Then all rating lists can include the reference engine, and ratings will be harmonized to FIDE and to one another.

-Carl
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CCRL 40/4 single cpu list - IPONized

Post by lkaufman »

ThatsIt wrote:
IWB wrote: I like 'IPONized'! 😀
[...snip...]
Isn't that a good start toward standardizing the Elo calculation and base rating across the lists? I am open to suggestions! (And I am more willing to compromise on the rating (as this is arbitrary anyhow) than on the statistical method ... but if all agree to follow a democratic vote, I am in!)
I would like to see Critter 1.6x 1CPU @ ELO 2800.

Best wishes,
G.S.
It seems to me that if you are going to have engine ratings that are intended to suggest how engines would rate against top humans, the base rating should be an engine that actually played many GMs, with a very conservative estimated rating, since engine-vs-engine ratings do seem to overstate improvement somewhat. Due to the many games between Rybka (versions from about 2.3.2a to Rybka 3) and GMs with varying handicaps and conditions, I can say with reasonable confidence that Rybka 3 on 1 CPU would have a FIDE rating somewhere in the 2900s, so fixing it at 2900 would be a suitably conservative value. This just happens to be almost exactly its current IPON rating. These Rybka versions from 2.3.2a to 3, running on 4 or 8 cores almost a decade ago, were probably on average about as strong as Rybka 3 running single-core on a typical laptop i7 today. Rybka beat GM Joel Benjamin (then about 2600 FIDE) 7 to 1 in normal chess despite giving him White in every game. It also beat GM Jan Ehlvest (then well over 2600) 4.5 to 1.5 while giving him White in every game and double time, with only a 3-move opening book, no TBs, and a small hash size. Matches with material handicaps confirmed that Rybka was well beyond the strength of any human player. So I favor using Rybka 3 as the anchor at 2900, since it is the strongest engine with a rating that can be clearly justified based on GM matches.
Komodo rules!
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: CCRL 40/4 single cpu list - IPONized

Post by JJJ »

Based on the IPON rating list, Stockfish on 1 CPU would be +442 Elo over Carlsen. It could be a lot more on a good computer.
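To put a +442 gap in more familiar terms, the logistic Elo formula converts a rating difference into an expected score. This assumes the IPON and FIDE scales are directly comparable, which is exactly what the rest of the thread debates, so the figure is only indicative. The function name is hypothetical.

```python
def expected_score(diff):
    """Expected score of the stronger side for an Elo difference `diff`."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


print(round(expected_score(442), 3))  # 0.927, i.e. roughly 93 points per 100 games
```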
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CCRL 40/4 single cpu list - IPONized

Post by Laskos »

lkaufman wrote: ...
So I favor using Rybka 3 as the anchor at 2900, since it is the strongest engine with a rating that can be clearly justified based on GM matches.
Also, the newer versions of Komodo on 24 cores are about 3300 FIDE at a 1-hour TC; we established that with reasonable confidence. On 1 core it would be about 3150 FIDE at that TC. In bullet and blitz games Komodo's FIDE Elo level is even higher. I often see naive statements like "engines now are at 3200 level", but this quantity depends on time control, hardware, whether FIDE or engine ratings are meant, contempt, opening book, etc. Also, FIDE and CCRL ratings differ not only in offset but also in scale: FIDE ratings are compressed compared to engine ratings, and this compression seems to be non-linear over large Elo spans.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CCRL 40/4 single cpu list - IPONized

Post by lkaufman »

Laskos wrote: ...
Also, FIDE and CCRL ratings differ not only in offset, but also in rating compression of FIDE ratings compared to engine ratings, and this seems to be even non-linear on large ELO spans.
This compression problem makes it a matter of opinion where to anchor the list. Although we pretty much agree on the ratings for Komodo based on these handicap matches, the rating for Rybka 3 is much less dependent on assumptions and calculations, as it actually played 14 games of normal chess against strong GMs at fairly normal time limits. True, the GMs had various other advantages, especially White in every game, but the Elo value of White is well known, and the other advantages applied only to six of the games and were not too large. So I think Rybka 3 makes a better anchor, as its rating can be explained much more easily. The actual performance for the 11.5/14 score was close to 2900 without even adjusting for the White pieces, so 2900 is clearly a very conservative rating for Rybka 3. Actually, those 14 games were played with versions closer to Rybka 2.3.2a than to Rybka 3, so even though they were played on 4 CPUs, I think Rybka 3 on one modern CPU would be at least as strong.
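The performance figure can be checked with the usual logistic performance-rating formula, R_perf = R_opp + 400 * log10(p / (1 - p)), where p is the score fraction. This is a minimal sketch under stated assumptions. The post gives the opponents only as "about 2600" and "well over 2600", so 2600 is used below as a lower bound. The optional White bonus is a placeholder parameter because the thread does not state a value. FIDE itself uses a published lookup table rather than this exact formula, so its result would differ slightly.

```python
import math


def performance(score, games, avg_opp, opp_white_bonus=0.0):
    """Performance rating from a match score under the logistic Elo model.

    opp_white_bonus: optional Elo credit for the opponents having White in
    every game; it raises the effective opposition strength. 0 = no adjustment.
    """
    p = score / games
    if not 0.0 < p < 1.0:
        raise ValueError("performance is unbounded for a 0% or 100% score")
    return avg_opp + opp_white_bonus + 400.0 * math.log10(p / (1.0 - p))


# 11.5/14 against an assumed 2600 average, no White adjustment:
print(round(performance(11.5, 14, 2600)))  # 2865
```

A higher average for Ehlvest, or any credit for White, moves the result up toward 2900.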
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: CCRL 40/4 single cpu list - IPONized

Post by carldaman »

lkaufman wrote: ...
Actual performance for 11.5 out of 14 score was close to 2900, without even adjusting for White pieces, so 2900 is clearly a very conservative rating for Rybka 3.
2900 seems too conservative a rating for Rybka 3. It was way ahead of its peers back in its heyday. Even the old Chess Tiger 14 achieved performance levels of about 2800 FIDE ELO in human tournaments way back in 2001.

CL