Question to the members of the ranking lists..

lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Question to the members of the ranking lists..

Post by lkaufman »

IWB wrote: Hello Larry

"Of course the engines would have a rating if we had data to compare them. The keyword for me is "reasonable margin of error".
As we do not have any actual data, every calculation is pure speculation for me, as we do not know anything about the possible error margin.
You mentioned the old games; I think they are too far off from today's hardware and software progress. You mentioned the handicap games, but there we do not know the strength of the handicapped Rybka.
I simply believe that humans are no longer a real match for a decent computer/software setup, as all top players gave up officially playing against top computers years ago. All you can do is entice some GMs with money, but even then it seems they are perfectly happy to collect the loser's fee and go home.

Regarding the IPON 2800 for Shredder: I believe that this single comp rating is too low when you compare it with the top GMs. By how much, I have no clue.

Bye
Ingo"

I agree with most of your comments. However, I think the data from the many PC vs. Kasparov or Kramnik matches is sufficient to set the average of the engines involved at about 2800 with reasonable accuracy, which of course would put Deep Shredder 12 at a much higher figure. As I've already explained, whatever rating it comes out with should be reduced by 25% of the excess over 2800 to make a realistic estimate of its human rating; the same goes for any 2800+ program. I also don't agree with your comment about the motivation of the GMs in these matches. Having run many such matches myself, I can say that in most cases the GMs seemed fully motivated to do the best they could, and in fact Milov was able to win his handicap match by a single-game margin. The main conclusion from these matches is that Rybka played at a level far above any human, as it is obvious that even Kasparov or Carlsen could not give such handicaps to the players in question with any reasonable success.
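
The adjustment described above can be written as a short formula. What follows is only a sketch of the rule as stated in this post: the function name `human_estimate` is made up for illustration, leaving ratings at or below 2800 unchanged is an assumption (the post only applies the rule to 2800+ programs), and the 3000 input is a hypothetical number, not a measured rating.

```python
def human_estimate(engine_rating, pivot=2800.0, reduction=0.25):
    """Reduce a rating by 25% of its excess over 2800 (rule as stated in the post).

    Assumption: ratings at or below the pivot are returned unchanged,
    since the post only applies the rule to 2800+ programs.
    """
    excess = engine_rating - pivot
    if excess <= 0:
        return float(engine_rating)
    return engine_rating - reduction * excess


# Hypothetical input: a computed rating of 3000 becomes 3000 - 0.25 * 200.
print(human_estimate(3000))  # 2950.0
```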

Regards,
Larry

Post by lkaufman »

Steve B wrote: "I am sure you are well aware of the now legendary BB report
some computer chess enthusiasts do not take this report seriously because it is anonymous (not signed)
Zach Wegner, who is now preparing for the WCCC, has met BB in person, and he also mentioned that you have spoken to BB in person
forgetting for the moment the actual content of the report..
do you think the BB report is tainted in any way by virtue of its being unsigned?"

as a second digression
I know you have been interested in odds games for some time
you published several articles on this topic in the USCF magazine
I own some old dedicated computers and I ran some quick, non-exhaustive tests to see if I could overcome a 1000-point Elo difference between a 1700 dedicated computer and a 2700 PC engine at 40/2

1 game at R odds had the dedicated computer drawing with the engine
however, very heavy time odds (with even material) could not produce a draw, nor could minor piece odds (N or B)

I used NO BOOK and Pondering OFF

at a minimum, how many games do you think would need to be played at R odds to establish with reasonable certainty that R = 1000 Elo? That is to say, that R odds can overcome a 1000-point Elo gap?
what parameters are best regarding the use of books for the dedicated computer and pondering ON/OFF?
TC will be 40/2

you must realize I will be using dedicated computers v a 2700 PC engine, so I don't have the luxury of running thousands and thousands of games as I would in a PC v PC engine match

Best Regards
Steve
Some answers:
1. Of course you can't use opening books as there were no handicap opening books prior to Rybka 3.
2. Ponder off is a better test, as I think that ponder on would favor the stronger/faster computer, which would more often "guess" the opponent's reply.
3. The value of the handicap increases as the level goes up. Thus if it turns out that 2700 vs 1700 at rook odds is fair, if you increase the weaker machine to 2000 you might have to increase the stronger machine to 3300 rather than just to 3000.
4. As for number of games, if you want to say that 1000 difference is fair with a small margin of error (like 10 Elo) you need a lot of games. But if you just want to say plus or minus a hundred Elo, you can get by with just a few games. I think for your purposes this is good enough.
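
Point 4 can be made roughly quantitative. The sketch below uses the standard Elo expected-score curve and a normal approximation; it is not a calculation from this thread. It assumes the handicap match is close to even (about a 50% score), a worst-case per-game score standard deviation of 0.5 (no draws), and a 95% interval. The names `expected_score`, `elo_margin` and `games_needed` are made up for the example.

```python
import math


def expected_score(diff):
    """Standard Elo expected score for a player rated `diff` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_margin(games, sigma=0.5, z=1.96):
    """Approximate +/- Elo margin after `games` games of a roughly even match.

    Assumptions: score near 50%, per-game standard deviation `sigma`
    (0.5 is the no-draw worst case), normal approximation with z = 1.96.
    """
    # Slope of the Elo curve at a 50% score: d(Elo)/d(score) = 1600 / ln(10).
    slope = 1600.0 / math.log(10.0)
    return z * sigma / math.sqrt(games) * slope


def games_needed(margin, sigma=0.5, z=1.96):
    """Smallest number of games whose approximate margin is within `margin` Elo."""
    slope = 1600.0 / math.log(10.0)
    return math.ceil((z * sigma * slope / margin) ** 2)


print(round(expected_score(-1000), 4))       # score expected with no handicap at all
print(round(elo_margin(20)))                 # margin for, say, a 20-game match
print(games_needed(100), games_needed(10))   # games for +/-100 and +/-10 Elo
```

Draws reduce the per-game spread, so a match with many draws may need fewer games than this worst-case bound suggests; the estimate also becomes less reliable when the score is far from 50%.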
Steve B
Posts: 3697
Joined: Tue Jul 31, 2007 4:26 pm

Post by Steve B »

Thanks for your suggestions
I think 1700 v 2700 has some "real world" significance
1700 is probably close to the average club player's Elo, and 2700 is a top GM's Elo

so perhaps we can conclude that the difference between an average club player and a top GM in terms of material odds is a R
actually I just received a Novag Constellation 3.6 MHz, which tips the rating scales at 1650-ish
a bit lower than 1700, but it's worth a 20-game match at 40/2 to at least test the computer
Pondering OFF makes playing a 20-game match at 40/2 much easier for me, as I do not have to sit transfixed at the board for fear of "pondering" skewing the results

Thanks Again
Steve

Post by Steve B »

Steve B wrote:
so perhaps we can conclude that the difference between an average club player and a top GM in terms of material odds is a R
Steve
as an afterthought, Larry..
do you think your rating compression formula of 75% x (diff in Elo) + a constant should figure in here in some way?

asking another way..
assuming the use of a 2700 Elo PC engine as a constant (I have to use this engine as I own no others), what Elo in your opinion should I use for the dedicated computer to mimic as closely as possible the difference between an average club player and a top GM?

Best Regards
Steve

Post by lkaufman »

First, it depends on where you are getting the ratings. If they are from an engine-engine rating list, then indeed you need to allow for compression. For example, you quote the Novag Constellation 3.6 at 1650, but I'm sure that its USCF rating would be higher than that, because the SuperConstellation got an official USCF rating of 2018 (so long ago!) and the two models are probably only a hundred Elo or so apart. So I suppose your 1650 is from an engine list, and the test you propose is probably of a stronger unit than the average club player. On the other hand, humans are better than computers at using a material advantage, because they put much more emphasis on trading when ahead. So I think your test is a pretty good one for your purpose.

Post by Steve B »

the ratings come from Eric Hallsworth's "Selective Search" dedicated computer rating lists and are in Elo, not USCF
he still publishes the list every two months
you might remember him from when you rated the oldies for ICD back in the early 1990s?
Hallsworth quoted you quite often in those days

he currently (issue 148) shows the following ratings for the Novag Constellation series:
Connie 2 MHz: 1591
Connie 3.6 MHz: 1646
Super Connie: 1728
Connie Expert (AKA Chess Monster, full-sized wooden auto-sensory): 1790

Best Regards
Steve

Post by lkaufman »

Sure, I remember Eric. His ratings, like SSDF, CCRL, CEGT etc., are based on engine vs. engine games, and therefore underrate the old/weak engines in order not to overrate the current strong ones. Part of the difference is FIDE vs. USCF, but that's actually only about 50 Elo or so on average. His ratings confirm that the Constellation 3.6 is less than a hundred below SuperCon, and therefore would be in the mid 1900s USCF, assuming ratings are about in line with the mid '80s, which I think is roughly true (they inflated until 1995, then deflated some).
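
The arithmetic behind the "mid 1900s" estimate can be spelled out using numbers already quoted in this thread: the Super Constellation's official USCF rating of 2018 and the Selective Search figures of 1728 (Super Connie) and 1646 (Connie 3.6 MHz). The sketch assumes, as the post does, that the gap between the two models is the same on both scales; the variable names are illustrative only.

```python
# Figures quoted in the thread.
supercon_uscf = 2018   # official USCF rating of the Super Constellation
supercon_list = 1728   # Selective Search (issue 148), Super Connie
connie36_list = 1646   # Selective Search (issue 148), Connie 3.6 MHz

# Assumption: the gap between the two models carries over unchanged to the USCF scale.
gap = supercon_list - connie36_list
connie36_uscf_estimate = supercon_uscf - gap

print(gap, connie36_uscf_estimate)  # 82 1936
```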

Post by Steve B »

so you agree Connie 3.6 v 2700 is OK?
or should I use any of the other Connies?
I own each one of them

Steve
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Post by Don »

The Super Connie achieved 2018 in serious tournament play against humans. At that particular point in time the USCF rating pool was somewhat inflated relative to FIDE ratings, but it wasn't ridiculous: maybe 100 Elo at most, and probably much less than that.

The number 1728 seems quite a bit too low. However, it's possible that the Connie got way overrated, as people were not yet very skilled at playing computers at the time and the sample of games the USCF required to get an "official" computer rating was still subject to fairly large error.

You mention the ratings are in Elo and not USCF, but the USCF uses the Elo rating system, so what you said makes no sense. What you probably meant was FIDE vs. USCF.

But it's difficult to see how he got the rating of 1728; it really seems way too low no matter how much you reasonably allow for sample size and rating inflation. So I would say his scale is off by at least 100 Elo and probably more like 200.

Post by Steve B »

hi Don
way back when the Super Connie got its OFFICIAL USCF rating,
USCF ratings were 125-150 higher than Elo ratings
in addition, some of the ratings produced by the USCF chess computer rating agency (now defunct) had several accuracy issues
Larry wrote extensively about this back then and he can speak to this far better than I could

in addition to Selective Search, there is a German dedicated chess computer site which also rates many of the dedicated computers independently of Selective Search
they show very similar ratings for the Conny 3.6 at 1648 Elo
and 1732 for the Super Conny

http://www.schach-computer.info/wiki/in ... stellation

http://www.schach-computer.info/wiki/in ... lation_3.6


whatever the two of you feel is best to achieve the goals of my experiment is OK with me
Regards
Steve