Vintage .... Rating List Winboard from June 1999 (16:42)

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Vintage .... Rating List Winboard from June 1999 (16:42)

Post by Laskos »

lkaufman wrote: Tue Aug 04, 2020 4:46 pm
Laskos wrote: Tue Aug 04, 2020 8:39 am
lkaufman wrote: Tue Aug 04, 2020 6:08 am
Frank Quisinsky wrote: Tue Aug 04, 2020 1:02 am Hi Larry,

and same situation we have in Winboard times.
All the winboard engines are not very strong in endgames on AMD K6-2 or AMD K6-3.

The first program that was really strong in endgames was Shredder!

Stefan Meyer-Kahlen developed Shredder 3.0 for Winboard too (a secret mission).
I tested WB Shredder 3 vs. the older Crafty, Nimzo or Zarkov version.
None of the strong WB engines had a chance in endgames.

You are the clearly stronger player Larry but in my humble opinion Elo for chess computers or for the first PC programs are not possible from human view.

If you have programs that come with ...
1.900 Elo after opening book moves
2.450-2.550 in earlier middlegames
2.350-2.450 in late middlegames.
2.100 Elo for transposition into endgame
1.700 Elo for endgames

This is a very big problem.

The same problem we have today with strongest chess software.

Stockfish:
2.700 Elo after opening book moves
2.900 Elo in earlier middlegames (the only chance for the strongest players to draw)
3.300 Elo in late middlegames
3.700 Elo in transposition into endgame
3.200 Elo in endgame

So what rating should we give such an engine?

Best
Frank
Given this definition, are the CCRL 40/15 ratings of engines within the human range too low, too high, or just right on the specified hardware with a time limit of say 45' + 15" increment for engine and human? I don't know the answer, but I hope I have clarified the question!
I would say that in 2500-2700 range CCRL 40/15 engine ratings are very comparable to human ratings for 45' + 15'' or even better 90' + 30'' tc. For much lower ratings, I went to extreme of Micro-Max of HGM of 1900 CCRL rating. I actually played this minimalistic engine. I usually beat it, often because it's deterministic, and I just repeat the game up to a point. So, its human rating should be no more than 1500. I think many engines in this rating range are either mildly buggy or "exploitable" by humans, so their CCRL ratings compared to humans are somewhat inflated. CEGT list is probably better in 1900-2300 Elo range. But CEGT is deflated by some 200 Elo points in the 2600-2700 FIDE Elo range.
OK, so you are saying that CCRL 2600 (roughly CEGT 2400) = FIDE 2600, and that CCRL 2300 (roughly CEGT 2100) = FIDE 2100. So this means that the engine rating lists substantially UNDERESTIMATE rating differences in human terms!! Both you and I have said the exact opposite many times, I think you estimate something like 300 elo gap on engine list = 200 gap on FIDE; here we have 300 elo gap on engine list = 500 gap on FIDE !! So, am I missing something here? Is this correct, and if so how can we explain such an incredible gap between theory (200 gap) and reality (500 gap)?
I have the example of Micro-Max of 1900 CCRL which is obviously less than FIDE 1900. I am not sure how generalizable is that. Yes, 2600 CCRL 40/15 is indeed FIDE 2600 40/15, or better to say both translated to 45' + 30''. So, FIDE ratings strangely seem dilated compared to CCRL ratings in the 1900-2600 CCRL interval. I am not sure what to make of this. The dilation factor of 1.5 of computer ratings (CEGT with Ordo) to human ratings I think is true for 2600-3500 Elo interval. I never estimated well low comparative ratings of less than 2000 FIDE and CCRL. But some of the CCRL ratings of even buggy engines seem rather high, their faults are exploitable even by weak humans.
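The 1.5 dilation rule mentioned above can be written as a small sketch. It assumes, as stated in the post, that CCRL 2600 and FIDE 2600 coincide and that the factor only applies above that point. The function name, the default anchor, and the choice to leave ratings below the anchor unchanged are illustrative assumptions, not an established conversion.

```python
def engine_to_fide_estimate(engine_rating, anchor=2600.0, dilation=1.5):
    """Rough FIDE-equivalent of an engine-list rating (hypothetical helper).

    Assumption: engine and FIDE ratings agree at `anchor`, and every
    `dilation` engine Elo above it is worth 1 FIDE Elo.
    """
    if engine_rating <= anchor:
        # Assumption: the rule is not applied below the anchor,
        # since the thread doubts that it holds there.
        return float(engine_rating)
    return anchor + (engine_rating - anchor) / dilation


# A 300 Elo gap on the engine list becomes a 200 Elo gap in FIDE terms.
print(engine_to_fide_estimate(2900))
```

Under these assumptions a 3400 engine-list rating would map to roughly 3130, well short of the list value, which is the sense in which the lists are said to be dilated.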
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Vintage .... Rating List Winboard from June 1999 (16:42)

Post by lkaufman »

Laskos wrote: Tue Aug 04, 2020 5:14 pm
I have the example of Micro-Max of 1900 CCRL which is obviously less than FIDE 1900. I am not sure how generalizable is that. Yes, 2600 CCRL 40/15 is indeed FIDE 2600 40/15, or better to say both translated to 45' + 30''. So, FIDE ratings strangely seem dilated compared to CCRL ratings in the 1900-2600 CCRL interval. I am not sure what to make of this. The dilation factor of 1.5 of computer ratings (CEGT with Ordo) to human ratings I think is true for 2600-3500 Elo interval. I never estimated well low comparative ratings of less than 2000 FIDE and CCRL. But some of the CCRL ratings of even buggy engines seem rather high, their faults are exploitable even by weak humans.
I don't think it is meaningful to talk about comparison with human ratings above 2900, because there are none. Worse, as far as I know there are no games on record between engines and humans rated much over 2800 (Kasp and Kramnik were around 2800 when they played the engines), so it seems that you are really just saying that for humans in the very narrow range 2600 to 2800 FIDE the engine ratings overstate the differences. That seems to be almost impossible to verify, except based on games played decades ago. I'm not too concerned about very weak, buggy engines, but engines that play without obvious bugs and have ratings over 2000 should be easily tested for accuracy of their ratings vs. humans. My suspicion is that the engine ratings needed to be compressed to match human ratings twenty years ago, but that human FIDE ratings have now spread out to the point where that is no longer true. 2200 was once the minimum FIDE rating, now it is 1000. They were artificially compressed in the past, no longer the case. Is there any current indication that FIDE=CEGT + constant isn't a reasonably accurate formula now for players in the 2000 to 2900 human range? If so, what is the best estimate of the constant?
Komodo rules!
silentshark
Posts: 332
Joined: Sat Mar 27, 2010 7:15 pm

Re: Vintage .... Rating List Winboard from June 1999 (16:42)

Post by silentshark »

Frank Quisinsky wrote: Mon Aug 03, 2020 1:43 pm [stuff snipped]

*****************************************************************************
WB KSQ                  --> 3041 Games <--                     16:42 09.06.99
*****************************************************************************
   Kai Skibbe (Hamburg), Christian Koch (Hamburg), Frank Quisinsky (Trier)
40 moves / 40 minutes, ponder = off, AMD K6-2 333/400 MHz, Celeron 450 MHz !!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
for Fritz 5-32 ELO calculation,                  58.750 : 25 = 2350 (  0 ELO)
*****************************************************************************
01. Zarkov              4.5e-4.5g            2516 ELO   284 Games   USA  2525
02. Crafty              15.18-16.10          2480 ELO   422 Games   USA  2575
03. Comet               A95-B03              2441 ELO   416 Games   GER  2450
03. Phalanx             17-21                2441 ELO   392 Games   TCH  2450
05. Voyager             2.29-5.03            2430 ELO   326 Games   SUI  2425
06. Nimzo               2000                 2425 ELO   190 Games   AUT  2425
07. Bionic Impakt       4.01                 2424 ELO   190 Games   BEL  2425
08. Patzer              2.99zp-3.0           2409 ELO   424 Games   GER  2400
09. Gromit              2.11x-2.16           2400 ELO   293 Games   GER  2400
10. AnMon               4.09-4.22            2392 ELO   232 Games   FRA  2400
11. ZChess              1.2                  2374 ELO   180 Games   FRA  2375
12. Francesca           0.63-0.68c           2373 ELO   242 Games   ENG  2375
13. The Crazy Bishop    37-43                2358 ELO   358 Games   FRA  2350
14. Little Goliath      1.05-1.41a           2350 ELO   316 Games   GER  2350
15. Bringer             1.2-1.4      !PLAY!  2340 ELO    81 Games   GER  2325
16. Arasan              5.1-5.1a     !NEW!   2324 ELO   170 Games   USA  2325
17. Ant                 3.42-3.61            2281 ELO   186 Games   NDL  2275
18. LambChop            6.9-7.1              2270 ELO   180 Games   ZEA  2275
19. Stobor              B.32-B.56            2256 ELO   218 Games   USA  2250
20. Dragon              3.11         !NEW!   2199 ELO   172 Games   FRA  2200
21. ExChess             2.46-2.51            2187 ELO   250 Games   USA  2175
22. La Dame Blanche     2.0-2.0c     !NEW!   2122 ELO   120 Games   FRA  2125
*****************************************************************************
--. Bionic Impakt       4.11                 2361 ELO   110 Games   BEL  2375
--. Gromit              2.0-2.1              2288 ELO   106 Games   GER  2300
--. ZChess              0.92-1.0             2288 ELO   224 Games   FRA  2300
*****************************************************************************
Nice.. how many of these engines are still being developed, I wonder. This was back in the day when my engine was probably stronger than Arasan! Times have changed. Pity some of these interesting engines are no more
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Vintage .... Rating List Winboard from June 1999 (16:42)

Post by Laskos »

lkaufman wrote: Tue Aug 04, 2020 5:37 pm
I don't think it is meaningful to talk about comparison with human ratings above 2900, because there are none. Worse, as far as I know there are no games on record between engines and humans rated much over 2800 (Kasp and Kramnik were around 2800 when they played the engines), so it seems that you are really just saying that for humans in the very narrow range 2600 to 2800 FIDE the engine ratings overstate the differences.
No, engines also overstate engines' Elo advantage of 3300-3500 CCRL engines compared to 2700 FIDE (and by equivalence in this range, CCRL) humans. That was the main range of application of 1.5 factor rule. I thought that it is also applicable to lower than 2500 CCRL ratings, but it seems to not be the case.
That seems to be almost impossible to verify, except based on games played decades ago. I'm not too concerned about very weak, buggy engines, but engines that play without obvious bugs and have ratings over 2000 should be easily tested for accuracy of their ratings vs. humans. My suspicion is that the engine ratings needed to be compressed to match human ratings twenty years ago, but that human FIDE ratings have now spread out to the point where that is no longer true. 2200 was once the minimum FIDE rating, now it is 1000. They were artificially compressed in the past, no longer the case. Is there any current indication that FIDE=CEGT + constant isn't a reasonably accurate formula now for players in the 2000 to 2900 human range? If so, what is the best estimate of the constant?
Yes, that were SSDF ratings in early 2000s and earlier ratings of early 1990s, they always needed engine ratings compression to fit into the FIDE ratings when mainly engine-engine games were used. The ratings of engines were usually in low-mid 2000s (FIDE) Elo.

About the CCRL and CEGT comparisons, the things seem fairly clear: at the top (3400 ratings), the lists are the same. In the 2000 CEGT region, the CCRL ratings are about 220 Elo points higher than CEGT ratings. And the rest of the ratings are linear extrapolation using these 2 datapoints. But FIDE is not CEGT + constant, constant is not a constant. For ratings in low 2000s CEGT might be fairly similar to FIDE, but at CEGT 2500, FIDE is about 2650 or so (close to CCRL) and then to very high ratings above 2800 CEGT, the compression of CEGT ratings begins. Comparing engine self-ratings to FIDE human self-ratings seems always nasty and nonlinear.
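The two datapoints quoted above (the lists agree at 3400, and CCRL is about 220 points higher at CEGT 2000) define a straight line. A minimal sketch of that interpolation follows; the function name is hypothetical and the two anchor pairs are only the approximate figures from the post.

```python
def cegt_to_ccrl(cegt_rating):
    """Linear map from CEGT to CCRL through two anchor points (a sketch).

    Assumed anchors, taken from the post: (CEGT 2000, CCRL 2220)
    and (CEGT 3400, CCRL 3400).
    """
    low_cegt, low_ccrl = 2000.0, 2220.0
    high_cegt, high_ccrl = 3400.0, 3400.0
    slope = (high_ccrl - low_ccrl) / (high_cegt - low_cegt)
    return low_ccrl + slope * (cegt_rating - low_cegt)


for cegt in (2000, 2400, 2800, 3400):
    print(cegt, round(cegt_to_ccrl(cegt)))
```

The slope is below 1, so the offset between the lists shrinks steadily from about 220 points at the bottom to zero at the top.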
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Vintage .... Rating List Winboard from June 1999 (16:42)

Post by lkaufman »

Laskos wrote: Tue Aug 04, 2020 6:39 pm
Yes, that were SSDF ratings in early 2000s and earlier ratings of early 1990s, they always needed engine ratings compression to fit into the FIDE ratings when mainly engine-engine games were used. The ratings of engines were usually in low-mid 2000s (FIDE) Elo.

About the CCRL and CEGT comparisons, the things seem fairly clear: at the top (3400 ratings), the lists are the same. In the 2000 CEGT region, the CCRL ratings are about 220 Elo points higher than CEGT ratings. And the rest of the ratings are linear extrapolation using these 2 datapoints. But FIDE is not CEGT + constant, constant is not a constant. For ratings in low 2000s CEGT might be fairly similar to FIDE, but at CEGT 2500, FIDE is about 2650 or so (close to CCRL) and then to very high ratings above 2800 CEGT, the compression of CEGT ratings begins. Comparing engine self-ratings to FIDE human self-ratings seems always nasty and nonlinear.
What I don't understand is how you can make any meaningful statements about equivalence or compression when comparing CEGT to FIDE above 2800, when there are only two players in the world over 2800 FIDE, and neither has played any games on record vs. engines? I suppose you could be saying that a 3200 CEGT engine would not be able to score the required 91% vs. 2800 humans, and that may be true, but how would we know this? In short, how can you project anything beyond 2800 relating to human ratings? I did so myself in the past by making the assumption that the observed compression needed to fit engine ratings into the FIDE system twenty or thirty years ago was linear and would extrapolate beyond 2800, but it seems that the compression was an artifact of the old FIDE floor and that there is no evidence for compression at lower levels now, so nothing to extrapolate to beyond 2800.
Komodo rules!
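The "required 91%" for a 3200 engine against 2800 opposition follows from the usual logistic Elo expectation. A minimal sketch, assuming the standard logistic formula with a 400-point scale (FIDE's own conversion tables differ slightly at large gaps):

```python
def expected_score(rating_diff):
    """Expected score for the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


# 3200 vs 2800 is a 400-point gap.
print(round(expected_score(3200 - 2800), 3))
```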
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Vintage .... Rating List Winboard from June 1999 (16:42)

Post by Laskos »

lkaufman wrote: Tue Aug 04, 2020 7:28 pm
What I don't understand is how you can make any meaningful statements about equivalence or compression when comparing CEGT to FIDE above 2800, when there are only two players in the world over 2800 FIDE, and neither has played any games on record vs. engines? I suppose you could be saying that a 3200 CEGT engine would not be able to score the required 91% vs. 2800 humans, and that may be true, but how would we know this? In short, how can you project anything beyond 2800 relating to human ratings? I did so myself in the past by making the assumption that the observed compression needed to fit engine ratings into the FIDE system twenty or thirty years ago was linear and would extrapolate beyond 2800, but it seems that the compression was an artifact of the old FIDE floor and that there is no evidence for compression at lower levels now, so nothing to extrapolate to beyond 2800.
We were still doing these extrapolations when calculating the odds needed for 2 pawns or for a Knight. At least I was, assuming that Komodo's FIDE rating even on 64 cores is lower than its CCRL or CEGT 4-core 3400 rating. On 64 cores we should have assumed some 3600 FIDE rating, right? We never assumed such a rating. We have indirect evidence of some human interference with top engines, especially in mild handicap games.

I don't believe that compression is gone now across the range. It was there in 2500-2700 FIDE Elo range on constantly adjusted to FIDE SSDF rating lists, it was there when these new rating lists appeared. Do you think that now, reaching some 3500 ratings, CEGT and CCRL stopped dilating? Yes, I can be wrong and have no proof, but I will stick by my reasonable hunch.
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Vintage .... Rating List Winboard from June 1999 (16:42)

Post by lkaufman »

Laskos wrote: Tue Aug 04, 2020 7:53 pm
We were still doing these extrapolations when calculating the odds needed for 2 pawns or for a Knight. At least I was, assuming that Komodo's FIDE rating even on 64 cores is lower than its CCRL or CEGT 4-core 3400 rating. On 64 cores we should have assumed some 3600 FIDE rating, right? We never assumed such a rating. We have indirect evidence of some human interference with top engines, especially in mild handicap games.

I don't believe that compression is gone now across the range. It was there in the 2500-2700 FIDE Elo range on the SSDF rating lists, which were constantly adjusted to FIDE, and it was there when these new rating lists appeared. Do you think that now, reaching some 3500 ratings, CEGT and CCRL stopped dilating? Yes, I can be wrong and have no proof, but I will stick by my reasonable hunch.
OK, I'm not saying you are wrong, I just wondered if you had any evidence from engine vs human games in the past decade that there is still compression. Regarding estimating ratings beyond 2800, I had one insight recently. Beyond a certain level, maybe 3000 or so, when the rating difference approaches 200 points, the points scored by the weak player are almost solely draws. So at the top end, draw odds approaches 191 elo points, the elo gap for half draws and half wins. But we know from our tests that certain handicaps (especially NBSC) are equivalent to draw odds, so such handicaps could be used for any opponents, engine or human, and given that elo value (or nearly that value). We have found that this doesn't depend on time control or engine ratings to a measurable degree. Similarly somewhat larger handicaps, such as f7 pawn, should have a fairly constant elo value once we are in this range where draw odds has a constant value. By measuring this handicap value, using CEGT or fastgm ratings (because they use ordo and not bayeselo) and running the games at the proper time control between 3000+ engines, we should have a reliable elo value for f7 handicap, and any engine that can score 50% giving f7 to another engine should deserve whatever rating that would imply at the time control of the games played. It probably breaks down if you go to knight odds, which appears to be something like 1100 or 1200 elo by this method, which may or may not be realistic. But for moderate handicaps, it should be pretty accurate.
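The 191 figure can be checked with the standard logistic Elo formula: a side that wins half its games and draws the other half scores 75%. This is a sketch under that assumption; the helper names are made up for illustration.

```python
import math


def elo_gap(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))


def score_from_results(wins: float, draws: float, losses: float) -> float:
    """Score fraction from win/draw/loss shares (a draw counts half a point)."""
    total = wins + draws + losses
    return (wins + 0.5 * draws) / total


# Half wins, half draws, no losses for the stronger side
draw_odds_score = score_from_results(wins=0.5, draws=0.5, losses=0.0)
print(round(elo_gap(draw_odds_score)))  # 191
```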
Komodo rules!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Vintage .... Rating List Winboard from June 1999 (16:42)

Post by Laskos »

lkaufman wrote: Tue Aug 04, 2020 8:29 pm
The current SSDF rating list looks interesting, and I think they use something similar to ELOStat.
https://ssdf.bosjo.net/list.htm

They are veterans of 2000-2700 Elo ratings of top engines compared to humans, and the lower part of the table should be fairly accurate with respect to human ratings.

Here are some 165 engines listed using their database and ELOStat.

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Stockfish 11 x64 1800X         : 3447   27  25   330    76.4 %   3243   47.3 %
  2 Stockfish 10 x64 1800X         : 3415   18  17   640    72.8 %   3243   52.8 %
  3 Stockfish 8 MP 1800X           : 3372   20  20   780    81.7 %   3113   35.9 %
  4 Stockfish 9 x64 1800X          : 3367   16  15   802    70.6 %   3215   55.0 %
  5 Komodo 13.1 x64 1800X          : 3358   24  23   280    67.1 %   3234   62.1 %
  6 Komodo 11.01 MP 1800X          : 3333   18  18   814    77.4 %   3119   42.8 %
  7 Stockfish 9 x64 Q6600          : 3330   19  18   440    57.3 %   3279   66.4 %
  8 Stockfish 8 x64 1800X          : 3323   18  18   500    60.8 %   3247   64.0 %
  9 Komodo 12.3 x64 Q6600          : 3323   24  23   320    62.3 %   3235   60.9 %
 10 Komodo 12.3 x64 1800X          : 3322   16  16   680    67.1 %   3198   58.7 %
 11 Deep Shredder 13 1800X         : 3311   18  18   680    72.0 %   3148   49.0 %
 12 Komodo 9.1 MP Q6600            : 3297   16  16  1018    74.1 %   3114   43.9 %
 13 Komodo 11.01 x64 1800X         : 3294   19  19   500    55.8 %   3253   60.4 %
 14 Stockfish 8 x64 Q6600          : 3291   24  24   440    70.2 %   3142   45.9 %
 15 Komodo 13.02 MCTS x64 1800X    : 3285   22  21   360    61.5 %   3203   62.5 %
 16 Stockfish 6 MP Q6600           : 3266   16  16  1016    72.3 %   3099   44.7 %
 17 Komodo 11.01 MP Q6600          : 3263   21  21   442    63.1 %   3170   56.6 %
 18 Booot 6.3.1 x64 1800X          : 3236   13  13   920    49.8 %   3237   64.2 %
 19 Deep Shredder 13 Q6600         : 3230   17  17   804    65.2 %   3121   50.5 %
 20 Komodo 7 MP Q6600              : 3223   17  17   692    59.3 %   3158   55.6 %
 21 Vajolet2 2.8 x64 1800X         : 3185   19  19   650    38.1 %   3270   50.6 %
 22 Komodo 5.1 MP Q6600            : 3178   16  15   894    60.2 %   3106   53.1 %
 23 Booot 6.3.1 x64 Q6600          : 3175   29  29   320    58.0 %   3119   42.2 %
 24 Arasan 21.2 x64 1800X          : 3158   19  19   600    38.2 %   3242   51.8 %
 25 Vajolet2 2.8 x64 Q6600         : 3157   30  29   320    73.4 %   2981   41.2 %
 26 Deep Hiarcs 14 1800X           : 3147   17  17   720    42.3 %   3201   52.9 %
 27 Stockfish 3 MP Q6600           : 3139   14  14  1147    56.6 %   3093   48.5 %
 28 Deep Rybka 4 Q6600             : 3134   15  15   974    58.7 %   3073   52.7 %
 29 Deep Hiarcs 14 Q6600           : 3122   13  13  1434    58.6 %   3061   49.9 %
 30 Chiron 3.01 MP Q6600           : 3116   18  19   616    47.1 %   3136   54.5 %
 31 Deep Rybka 3 Q6600             : 3110   16  16  1056    71.3 %   2952   42.6 %
 32 Wasp 3.5 x64 1800X             : 3105   20  20   600    32.2 %   3234   49.2 %
 33 Wasp 2.01 MP 1800X             : 3100   20  20   646    36.5 %   3196   46.6 %
 34 Wasp 3 x64 1800X               : 3096   17  17   762    40.9 %   3160   54.1 %
 35 Naum 4.2 MP Q6600              : 3071   14  14  1071    58.5 %   3011   53.0 %
 36 Wasp 3.5 x64 Q6600             : 3068   26  26   320    43.6 %   3113   53.4 %
 37 Deep Junior Yokohama Q6600     : 3056   18  18   848    37.3 %   3146   42.8 %
 38 Deep Junior 13.3 Q6600         : 3048   13  13  1317    48.6 %   3058   50.6 %
 39 Spike 1.4 MP Q6600             : 3037   13  13  1565    50.9 %   3031   46.8 %
 40 Naum 4 MP Q6600                : 3035   14  14  1267    58.8 %   2973   45.1 %
 41 Deep Shredder 12 Q6600         : 3032   16  16   832    51.1 %   3024   52.6 %
 42 Deep Fritz 13 Q6600            : 3032   18  18   645    47.1 %   3052   56.4 %
 43 Hiarcs 13.1 MP Q6600           : 3024   16  16   770    50.7 %   3019   55.7 %
 44 Deep Hiarcs 13.2 Q6600         : 3024   18  18   776    46.3 %   3050   45.9 %
 45 Wasp 2.01 x64 1800X            : 3023   25  26   400    25.9 %   3206   44.8 %
 46 Hiarcs 14 A1200                : 3021   23  23   520    57.9 %   2966   41.5 %
 47 Deep Fritz 12 Q6600            : 3014   14  14  1078    48.3 %   3026   55.4 %
 48 Wasp 2.01 MP Q6600             : 3010   26  26   482    22.1 %   3229   35.9 %
 49 Deep Junior 12 Q6600           : 2995   16  16   940    52.1 %   2980   50.4 %
 50 Zappa Mexico II Q6600          : 2987   17  17   938    52.2 %   2972   44.3 %
 51 Naum 3.1 MP Q6600              : 2981   18  18   912    38.5 %   3062   39.5 %
 52 Deep Fritz 11 Q6600            : 2972   13  13  1418    60.8 %   2896   45.5 %
 53 Rybka 3 A1200                  : 2969   29  29   282    46.8 %   2991   49.6 %
 54 The Baron 3.43 x64 1800X       : 2962   22  23   680    25.7 %   3146   32.9 %
 55 Crafty 25.0 MP Q6600           : 2960   19  20   804    35.1 %   3066   36.8 %
 56 The Baron 3.43 x64 Q6600       : 2952   29  29   400    49.0 %   2959   27.0 %
 57 Deep Hiarcs 12 Q6600           : 2942   17  17   922    45.3 %   2975   44.1 %
 58 Deep Shredder 11 Q6600         : 2930   16  16  1004    47.8 %   2946   42.5 %
 59 Naum 4 A1200                   : 2922   25  25   440    42.0 %   2978   42.3 %
 60 Arasan 17.2 MP Q6600           : 2918   20  20   685    45.2 %   2952   42.5 %
 61 Hiarcs 11.2 MP Q6600           : 2917   16  16   980    50.9 %   2910   48.3 %
 62 Arasan 16 MP Q6600             : 2914   21  21   604    38.9 %   2993   43.7 %
 63 Shredder 12 A1200              : 2910   24  24   520    36.5 %   3006   38.1 %
 64 Fritz 13 A1200                 : 2897   32  32   280    65.2 %   2788   39.6 %
 65 Glaurung 2.2 MP Q6600          : 2897   17  17  1002    51.3 %   2888   34.5 %
 66 Deep Junior 10.1 Q6600         : 2885   19  19   846    46.8 %   2908   37.4 %
 67 Wasp 2.01 A1200                : 2872   23  23   560    53.8 %   2845   38.8 %
 68 Fritz 12 A1200                 : 2854   18  18   860    61.7 %   2771   41.7 %
 69 Rybka 2.3.1 A1200              : 2840   21  21   612    49.7 %   2842   39.5 %
 70 Jonny 4 MP Q6600               : 2821   20  20   860    29.7 %   2970   31.3 %
 71 Rybka 1.2 A1200                : 2818   24  24   535    63.7 %   2720   37.4 %
 72 Deep Fritz 8 Q6600             : 2809   19  19   786    40.1 %   2879   37.9 %
 73 Shredder 8 MP Q6600            : 2807   19  19   824    38.9 %   2886   36.8 %
 74 Hiarcs 11.1 UCI A1200          : 2782   25  25   362    56.5 %   2737   50.6 %
 75 CM King 3.5 MP Q6600           : 2779   19  19   932    29.6 %   2929   32.8 %
 76 Deep Junior 8 Q6600            : 2775   25  25   526    33.1 %   2897   32.3 %
 77 Hiarcs 11.1 A1200              : 2771   30  31   325    25.7 %   2956   39.1 %
 78 Junior 10.1 A1200              : 2750   22  22   679    50.6 %   2745   27.8 %
 79 Junior 10 A1200                : 2736   26  26   497    49.4 %   2740   29.6 %
 80 Zap!Chess Zanzibar A1200       : 2729   17  17  1038    46.9 %   2751   32.5 %
 81 Hiarcs 10 HypMod A1200         : 2728   18  18  1016    65.9 %   2614   34.6 %
 82 Shredder 10 UCI A1200          : 2722   19  19   867    55.8 %   2682   34.0 %
 83 Pro Deo 2.1 YAT A1200          : 2721   27  26   400    59.4 %   2655   40.2 %
 84 Shredder 8 A1200               : 2711   21  21   743    59.6 %   2643   31.2 %
 85 Fritz 9 A1200                  : 2710   19  19   835    51.3 %   2701   35.8 %
 86 Fruit 2.2.1 A1200              : 2709   18  18   940    59.8 %   2640   35.1 %
 87 Spike 1.2 A1200                : 2705   29  29   352    45.7 %   2735   38.1 %
 88 Shredder 9 UCI A1200           : 2704   16  16  1238    65.3 %   2595   32.1 %
 89 Pro Deo 2.0 A1200              : 2698   23  23   520    46.0 %   2726   39.6 %
 90 Shredder 7.04 UCI A1200        : 2692   22  22   635    64.6 %   2588   35.1 %
 91 Deep Fritz 8 A1200             : 2686   24  24   532    44.5 %   2724   32.3 %
 92 Junior 8 A1200                 : 2683   27  27   438    54.7 %   2650   30.8 %
 93 Junior 9 A1200                 : 2678   23  23   565    57.5 %   2625   34.3 %
 94 Chess Tiger 2007 A1200         : 2672   25  26   450    36.8 %   2766   38.9 %
 95 Deep Junior 8 A1200            : 2663   30  30   385    61.0 %   2585   29.6 %
 96 Shredder 7 A1200               : 2663   29  29   407    65.5 %   2551   29.2 %
 97 Pro Deo 1.86 A1200             : 2660   23  23   600    36.6 %   2755   32.8 %
 98 Spike 1.1 A1200                : 2659   27  27   400    50.9 %   2653   35.8 %
 99 Deep Fritz 7 A1200             : 2652   24  24   542    63.8 %   2554   36.5 %
100 Pro Deo 1.82 A1200             : 2650   25  25   440    44.3 %   2690   40.5 %
101 Fritz 8 A1200                  : 2642   20  20   825    51.8 %   2630   32.2 %
102 Fritz 7 A1200                  : 2631   26  26   400    52.5 %   2613   40.5 %
103 Gambit Tiger 2 A1200           : 2630   21  21   644    48.7 %   2639   38.7 %
104 Hiarcs 9 A1200                 : 2619   30  30   430    29.4 %   2771   26.7 %
105 Gandalf 6 A1200                : 2613   22  22   627    46.7 %   2636   36.5 %
106 Shredder 6 Pad UCI A1200       : 2610   23  23   569    56.4 %   2565   34.8 %
107 Shredder 6 A1200               : 2606   32  32   280    52.1 %   2591   37.1 %
108 Chess Tiger 15 A1200           : 2605   17  17   870    49.8 %   2606   45.1 %
109 Chess Tiger 2004 A1200         : 2603   19  19   712    55.5 %   2565   43.0 %
110 Pro Deo 1.1 A1200              : 2603   21  21   716    54.5 %   2572   34.4 %
111 Junior 7 A1200                 : 2602   21  21   701    49.5 %   2606   35.1 %
112 Deep Fritz A1200               : 2599   21  21   684    47.2 %   2619   36.8 %
113 Chess Tiger 14 CB A1200        : 2599   22  22   579    54.1 %   2570   40.4 %
114 Rebel 12 A1200                 : 2591   30  30   335    42.7 %   2643   35.2 %
115 Ruffian 1.0.1 A1200            : 2577   20  20   729    44.9 %   2612   35.0 %
116 Rebel Century 4 A1200          : 2567   26  26   448    58.1 %   2510   35.9 %
117 Hiarcs 8 A1200                 : 2563   24  24   529    46.6 %   2587   33.8 %
118 Deep Sjeng 1.5a A1200          : 2562   32  32   301    44.9 %   2598   35.2 %
119 Pocket Shredder Ipaq 114       : 2559   31  31   280    54.3 %   2529   41.4 %
120 Deep Fritz K6-2 450            : 2553   24  24   570    58.7 %   2492   29.3 %
121 Deep Fritz 7 K6-2 450          : 2553   32  32   282    41.1 %   2615   39.7 %
122 Shredder 5.32 A1200            : 2547   19  19   828    44.0 %   2589   36.5 %
123 Gandalf 4.32h A1200            : 2545   29  29   358    45.3 %   2578   36.9 %
124 Gandalf 5 A1200                : 2532   30  30   284    40.3 %   2600   44.7 %
125 Gambit Tiger 2 K6-2 450        : 2526   33  33   280    38.4 %   2608   35.4 %
126 Fritz 6 K6-2 450               : 2524   22  22   673    63.2 %   2430   34.6 %
127 Crafty 18.12 CB A1200          : 2504   25  25   468    42.4 %   2557   36.1 %
128 Gandalf 5.1 A1200              : 2497   28  28   376    45.7 %   2527   37.8 %
129 Shredder 5.32 K6-2 450         : 2488   30  30   366    38.7 %   2569   32.0 %
130 Junior 6 K6-2 450              : 2487   20  20   821    59.4 %   2421   32.8 %
131 Nimzo 7.32 K6-2 450            : 2461   25  25   482    56.3 %   2417   34.6 %
132 Fritz 5.32 K6-2 450            : 2453   31  31   296    51.9 %   2440   37.5 %
133 Junior 5 K6-2 450              : 2442   25  25   519    52.2 %   2427   32.8 %
134 Crafty 19.17 A1200             : 2435   33  33   282    29.3 %   2589   37.2 %
135 Hiarcs 7.32 K6-2 450           : 2430   27  27   426    49.4 %   2434   32.2 %
136 Nimzo 8 K6-2 450               : 2414   30  30   438    27.5 %   2582   26.3 %
137 Gandalf 4.32f K6-2 450         : 2404   29  29   358    46.2 %   2430   33.8 %
138 SOS K6-2 450                   : 2403   16  16  1538    26.4 %   2580   24.1 %
139 Goliath Light K6-2 450         : 2402   17  17  1403    26.3 %   2580   25.7 %
140 Fritz 5.32 P200 MMX            : 2395   26  26   504    34.9 %   2503   29.8 %
141 Crafty 17.07 CB K6-2 450       : 2385   24  24   558    38.1 %   2469   31.7 %
142 MChess Pro 8 K6-2 450          : 2369   27  27   459    34.3 %   2482   32.0 %
143 Fritz 5 P200 MMX               : 2362   25  25   535    69.4 %   2219   34.2 %
144 Hiarcs 7 P200 MMX              : 2350   33  33   291    56.4 %   2305   31.6 %
145 Crafty 18.12 CB K6-2 450       : 2346   33  34   514    14.1 %   2660   19.6 %
146 Junior 5 P200 MMX              : 2343   28  28   386    55.1 %   2308   33.9 %
147 Nimzo 99 P200 MMX              : 2331   26  26   500    45.4 %   2363   30.0 %
148 Nimzo 98 P200 MMX              : 2309   28  28   399    46.7 %   2332   31.3 %
149 Shredder 2 P200 MMX            : 2309   24  24   565    43.3 %   2356   29.2 %
150 Rebel 9 P200 MMX               : 2290   32  32   281    58.4 %   2232   37.7 %
151 Hiarcs 9.5a/9.6 Palm Tung E    : 2290   30  30   380    46.8 %   2312   28.4 %
152 CEBoard Crafty 2004 HP RX4240  : 2239   36  36   260    43.7 %   2284   29.6 %
153 Rebel 9 P90                    : 2224   23  23   596    45.0 %   2259   35.2 %
154 Rebel 8 P90                    : 2216   19  19   901    53.4 %   2193   29.5 %
155 MChess Pro 6 P90               : 2213   19  19   905    55.1 %   2177   31.9 %
156 Hiarcs 6 P90                   : 2211   21  21   709    49.0 %   2218   33.4 %
157 Genius 5 P90                   : 2211   19  19   871    53.7 %   2185   33.3 %
158 Nimzo 3 P90                    : 2177   31  31   330    57.4 %   2125   32.4 %
159 Nimzo 3.5 P90                  : 2172   22  22   636    47.8 %   2188   34.0 %
160 Fritz 3 P90                    : 2143   27  28   452    40.5 %   2210   28.3 %
161 Junior 3.3-3.5 P90             : 2132   31  31   363    47.0 %   2153   25.1 %
162 Palm Tiger 2009 Tung C         : 2106   37  37   260    41.5 %   2165   26.2 %
163 Mephisto London 68030 33 MHz   : 2095   31  31   359    42.5 %   2148   27.6 %
164 Rebel 7 486/66 MHz             : 2072   35  36   270    34.8 %   2181   31.1 %
165 Comet 32 P90                   : 1968   30  31   538    19.0 %   2220   20.8 %
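As a rough consistency check on the table, the Elo column can be approximated from the Score and Av.Op. columns with the standard logistic formula. This is only a sketch: it assumes the plain 400-point logistic model, whereas ELOStat's actual procedure may differ in detail, and the function name is illustrative. The three rows are copied from the list above.

```python
import math


def rating_from_score(avg_opp: float, score_pct: float) -> float:
    """Average opponent rating plus the Elo gap implied by the score."""
    p = score_pct / 100.0
    return avg_opp + 400.0 * math.log10(p / (1.0 - p))


# (program, listed Elo, Score %, Av.Op.) taken from the table
rows = [
    ("Stockfish 11 x64 1800X", 3447, 76.4, 3243),
    ("Booot 6.3.1 x64 1800X", 3236, 49.8, 3237),
    ("Comet 32 P90", 1968, 19.0, 2220),
]

for name, listed, score_pct, avg_opp in rows:
    estimate = rating_from_score(avg_opp, score_pct)
    print(f"{name:26s} listed {listed}  estimated {estimate:.0f}")
```

For these three rows the estimate lands within a couple of points of the listed value; small differences are expected because the percentages in the table are rounded.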
Frank Quisinsky
Posts: 7229
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Vintage .... Rating List Winboard from June 1999 (16:42)

Post by Frank Quisinsky »

Hi Kai,

the current SSDF is very interesting.

I think it is 125-200 Elo too high for chess engines and 25 Elo too high for the chess computers (if I compare with my own results).
The spectrum isn't easy to explain because it has many reasons, I think.

SSDF = the higher the ranking, the more realistic it looks to me.
So SSDF is much more realistic than the chess computer WIKI for the older chess computers.

If I added my SWCR rating list results to the FCP rating list results (SWCR ended 2010, FCP ended 2016), I produced the same problem. The results often cannot be explained by logic.

All in all ...
We can be happy to have SSDF and all the other works.
But this is the reason I gave up doing such things.
For more than 20 years I have been working on private or official rating list systems.
Today I think it makes more sense to play a bigger tourney once a year than to create a rating list and add many different versions of the same engines to the list.

Best
Frank
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Vintage .... Rating List Winboard from June 1999 (16:42)

Post by Laskos »

Frank Quisinsky wrote: Wed Aug 05, 2020 11:16 am Hi Kai,

the current SSDF is very interesting.

I think it is 125-200 Elo too high for chess engines and 25 Elo too high for the chess computers (if I compare with my own results).
The spectrum isn't easy to explain because it has many reasons, I think.

SSDF = the higher the ranking, the more realistic it looks to me.
So SSDF is much more realistic than the chess computer WIKI for the older chess computers.

If I added my SWCR rating list results to the FCP rating list results (SWCR ended 2010, FCP ended 2016), I produced the same problem. The results often cannot be explained by logic.

All in all ...
We can be happy to have SSDF and all the other works.
But this is the reason I gave up doing such things.
For more than 20 years I have been working on private or official rating list systems.
Today I think it makes more sense to play a bigger tourney once a year than to create a rating list and add many different versions of the same engines to the list.

Best
Frank
Hi Frank,

I also find SSDF pretty illuminating with their variable hardware and tournament time control, especially the lower ratings for chess computers like Mephisto and weak old engines. I am used to CCRL and CEGT, which were hardly corroborated at lower ratings and have no chess computers at all. I have trouble with the SSDF database: there are too many unconnected engines for Ordo to give a meaningful rating.
By the way, maybe you will restart your work on the FCP rating list?
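On the "unconnected engines" point: ratings are only comparable between players linked, directly or indirectly, by games. A minimal sketch of how one might check this before feeding a database to a rating tool is below. It is a generic union-find over pairings, not Ordo's own code, and the engine names and pairings are hypothetical placeholders.

```python
from collections import defaultdict


def connected_groups(pairings):
    """Group players into sets linked by at least one chain of games."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for white, black in pairings:
        root_w, root_b = find(white), find(black)
        if root_w != root_b:
            parent[root_w] = root_b

    groups = defaultdict(set)
    for name in list(parent):
        groups[find(name)].add(name)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical pairings: EngineD and EngineE never meet the other three
pairings = [
    ("EngineA", "EngineB"),
    ("EngineB", "EngineC"),
    ("EngineD", "EngineE"),
]
groups = connected_groups(pairings)
print(len(groups), "separate groups")
```

With more than one group, ratings in one group say nothing about ratings in another, since no games tie the two scales together.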