Vintage .... Rating List Winboard from June 1999 (16:42)
Moderator: Ras
-
Laskos
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Vintage .... Rating List Winboard from June 1999 (16:42)
Frank Quisinsky wrote: ↑Tue Aug 04, 2020 1:02 am
Hi Larry,
and we had the same situation in the Winboard days.
None of the Winboard engines were very strong in endgames on the AMD K6-2 or AMD K6-3.
The first program that was really strong in endgames was Shredder!
Stefan Meyer-Kahlen developed Shredder 3.0 for Winboard too (a secret mission).
I tested WB Shredder 3 vs. the older Crafty, Nimzo and Zarkov versions.
None of the strong WB engines had a chance in endgames.
You are clearly the stronger player, Larry, but in my humble opinion an Elo rating from the human point of view is not possible for chess computers or for the first PC programs.
If you have a program that plays at ...
1900 Elo after the opening book moves
2450-2550 Elo in the early middlegame
2350-2450 Elo in the late middlegame
2100 Elo in the transition to the endgame
1700 Elo in the endgame
This is a very big problem.
We have the same problem today with the strongest chess software.
Stockfish:
2700 Elo after the opening book moves
2900 Elo in the early middlegame (the only chance for the strongest players to draw)
3300 Elo in the late middlegame
3700 Elo in the transition to the endgame
3200 Elo in the endgame
So what rating should we give such an engine?
Best
Frank

lkaufman wrote: ↑Tue Aug 04, 2020 6:08 am
Given this definition, are the CCRL 40/15 ratings of engines within the human range too low, too high, or just right on the specified hardware with a time limit of, say, 45' + 15" increment for engine and human? I don't know the answer, but I hope I have clarified the question!

Laskos wrote: ↑Tue Aug 04, 2020 8:39 am
I would say that in the 2500-2700 range, CCRL 40/15 engine ratings are very comparable to human ratings for a 45' + 15'' or, even better, 90' + 30'' tc. For much lower ratings, I went to the extreme of HGM's Micro-Max, rated 1900 CCRL. I actually played this minimalistic engine. I usually beat it, often because it's deterministic, and I just repeat the game up to a point. So its human rating should be no more than 1500. I think many engines in this rating range are either mildly buggy or "exploitable" by humans, so their CCRL ratings compared to humans are somewhat inflated. The CEGT list is probably better in the 1900-2300 Elo range. But CEGT is deflated by some 200 Elo points in the 2600-2700 FIDE Elo range.

lkaufman wrote: ↑Tue Aug 04, 2020 4:46 pm
OK, so you are saying that CCRL 2600 (roughly CEGT 2400) = FIDE 2600, and that CCRL 2300 (roughly CEGT 2100) = FIDE 2100. So this means that the engine rating lists substantially UNDERESTIMATE rating differences in human terms!! Both you and I have said the exact opposite many times; I think you estimate something like a 300 Elo gap on the engine list = 200 gap on FIDE; here we have a 300 Elo gap on the engine list = 500 gap on FIDE!! So, am I missing something here? Is this correct, and if so how can we explain such an incredible gap between theory (200 gap) and reality (500 gap)?

I have the example of Micro-Max, rated 1900 CCRL, which is obviously weaker than FIDE 1900. I am not sure how generalizable that is. Yes, 2600 CCRL 40/15 is indeed FIDE 2600 40/15, or better to say, both translated to 45' + 30''. So FIDE ratings strangely seem dilated compared to CCRL ratings in the 1900-2600 CCRL interval. I am not sure what to make of this. The dilation factor of 1.5 of computer ratings (CEGT with Ordo) relative to human ratings is, I think, true for the 2600-3500 Elo interval. I have never estimated comparative ratings below 2000 FIDE and CCRL well. But some of the CCRL ratings of even buggy engines seem rather high; their faults are exploitable even by weak humans.
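As a rough illustration of the 1.5 dilation factor discussed above, here is a minimal Python sketch. It assumes, purely for illustration, that engine-list and FIDE ratings coincide at 2600 and that every engine-list point above that anchor is worth 1/1.5 FIDE points, up to 3500. The function name `engine_to_fide` and all default parameters are hypothetical and do not come from any rating tool.

```python
def engine_to_fide(engine_elo, anchor=2600.0, dilation=1.5, upper=3500.0):
    """Compress an engine-list rating toward an assumed FIDE-equivalent value.

    Assumption (a rule of thumb, not an established formula): the two scales
    agree at `anchor`, and engine-list gaps above it are `dilation` times
    larger than the corresponding FIDE gaps.
    """
    if not anchor <= engine_elo <= upper:
        # The thread itself doubts the rule outside this interval.
        raise ValueError("rule of thumb is only assumed between anchor and upper")
    return anchor + (engine_elo - anchor) / dilation


# A 300-point gap on the engine list shrinks to a 200-point gap in FIDE terms.
print(engine_to_fide(2900))
print(engine_to_fide(3500))
```

Under these assumptions a 2900 engine-list rating maps to 2800 and a 3500 rating maps to 3200. Whether the rule holds at all is exactly what the thread is debating.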
-
lkaufman
- Posts: 6284
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Laskos wrote: ↑Tue Aug 04, 2020 5:14 pm
I have the example of Micro-Max, rated 1900 CCRL, which is obviously weaker than FIDE 1900. I am not sure how generalizable that is. Yes, 2600 CCRL 40/15 is indeed FIDE 2600 40/15, or better to say, both translated to 45' + 30''. So FIDE ratings strangely seem dilated compared to CCRL ratings in the 1900-2600 CCRL interval. I am not sure what to make of this. The dilation factor of 1.5 of computer ratings (CEGT with Ordo) relative to human ratings is, I think, true for the 2600-3500 Elo interval. I have never estimated comparative ratings below 2000 FIDE and CCRL well. But some of the CCRL ratings of even buggy engines seem rather high; their faults are exploitable even by weak humans.

I don't think it is meaningful to talk about comparison with human ratings above 2900, because there are none. Worse, as far as I know there are no games on record between engines and humans rated much over 2800 (Kasparov and Kramnik were around 2800 when they played the engines), so it seems that you are really just saying that for humans in the very narrow range 2600 to 2800 FIDE the engine ratings overstate the differences. That seems to be almost impossible to verify, except based on games played decades ago. I'm not too concerned about very weak, buggy engines, but engines that play without obvious bugs and have ratings over 2000 should be easily tested for accuracy of their ratings vs. humans. My suspicion is that the engine ratings needed to be compressed to match human ratings twenty years ago, but that human FIDE ratings have now spread out to the point where that is no longer true. 2200 was once the minimum FIDE rating; now it is 1000. They were artificially compressed in the past, which is no longer the case. Is there any current indication that FIDE = CEGT + constant isn't a reasonably accurate formula now for players in the 2000 to 2900 human range? If so, what is the best estimate of the constant?
Komodo rules!
-
silentshark
- Posts: 332
- Joined: Sat Mar 27, 2010 7:15 pm
Frank Quisinsky wrote: ↑Mon Aug 03, 2020 1:43 pm
[stuff snipped]
Code:
*****************************************************************************
WB KSQ --> 3041 Games <-- 16:42 09.06.99
*****************************************************************************
Kai Skibbe (Hamburg), Christian Koch (Hamburg), Frank Quisinsky (Trier)
40 moves / 40 minutes, ponder = off, AMD K6-2 333/400 MHz, Celeron 450 MHz !!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
for Fritz 5-32 ELO calculation, 58.750 : 25 = 2350 ( 0 ELO)
*****************************************************************************
01. Zarkov           4.5e-4.5g          2516 ELO  284 Games  USA  2525
02. Crafty           15.18-16.10        2480 ELO  422 Games  USA  2575
03. Comet            A95-B03            2441 ELO  416 Games  GER  2450
03. Phalanx          17-21              2441 ELO  392 Games  TCH  2450
05. Voyager          2.29-5.03          2430 ELO  326 Games  SUI  2425
06. Nimzo            2000               2425 ELO  190 Games  AUT  2425
07. Bionic Impakt    4.01               2424 ELO  190 Games  BEL  2425
08. Patzer           2.99zp-3.0         2409 ELO  424 Games  GER  2400
09. Gromit           2.11x-2.16         2400 ELO  293 Games  GER  2400
10. AnMon            4.09-4.22          2392 ELO  232 Games  FRA  2400
11. ZChess           1.2                2374 ELO  180 Games  FRA  2375
12. Francesca        0.63-0.68c         2373 ELO  242 Games  ENG  2375
13. The Crazy Bishop 37-43              2358 ELO  358 Games  FRA  2350
14. Little Goliath   1.05-1.41a         2350 ELO  316 Games  GER  2350
15. Bringer          1.2-1.4     !PLAY! 2340 ELO   81 Games  GER  2325
16. Arasan           5.1-5.1a    !NEW!  2324 ELO  170 Games  USA  2325
17. Ant              3.42-3.61          2281 ELO  186 Games  NDL  2275
18. LambChop         6.9-7.1            2270 ELO  180 Games  ZEA  2275
19. Stobor           B.32-B.56          2256 ELO  218 Games  USA  2250
20. Dragon           3.11        !NEW!  2199 ELO  172 Games  FRA  2200
21. ExChess          2.46-2.51          2187 ELO  250 Games  USA  2175
22. La Dame Blanche  2.0-2.0c    !NEW!  2122 ELO  120 Games  FRA  2125
*****************************************************************************
--. Bionic Impakt    4.11               2361 ELO  110 Games  BEL  2375
--. Gromit           2.0-2.1            2288 ELO  106 Games  GER  2300
--. ZChess           0.92-1.0           2288 ELO  224 Games  FRA  2300
*****************************************************************************

Nice... how many of these engines are still being developed, I wonder. This was back in the day when my engine was probably stronger than Arasan! Times have changed. Pity some of these interesting engines are no more.
-
Laskos
lkaufman wrote: ↑Tue Aug 04, 2020 5:37 pm
I don't think it is meaningful to talk about comparison with human ratings above 2900, because there are none. Worse, as far as I know there are no games on record between engines and humans rated much over 2800 (Kasparov and Kramnik were around 2800 when they played the engines), so it seems that you are really just saying that for humans in the very narrow range 2600 to 2800 FIDE the engine ratings overstate the differences.

No, engine lists also overstate the Elo advantage of 3300-3500 CCRL engines over 2700 FIDE (and, by equivalence in this range, CCRL) humans. That was the main range of application of the 1.5 factor rule. I thought that it was also applicable to CCRL ratings below 2500, but that seems not to be the case.

lkaufman wrote:
That seems to be almost impossible to verify, except based on games played decades ago. I'm not too concerned about very weak, buggy engines, but engines that play without obvious bugs and have ratings over 2000 should be easily tested for accuracy of their ratings vs. humans. My suspicion is that the engine ratings needed to be compressed to match human ratings twenty years ago, but that human FIDE ratings have now spread out to the point where that is no longer true. 2200 was once the minimum FIDE rating; now it is 1000. They were artificially compressed in the past, which is no longer the case. Is there any current indication that FIDE = CEGT + constant isn't a reasonably accurate formula now for players in the 2000 to 2900 human range? If so, what is the best estimate of the constant?

Yes, those were the SSDF ratings of the early 2000s and the earlier ratings of the early 1990s; they always needed compression of the engine ratings to fit into the FIDE ratings when mainly engine-engine games were used. The ratings of engines were usually in the low-to-mid 2000s (FIDE) Elo.

About the CCRL and CEGT comparisons, things seem fairly clear: at the top (3400 ratings), the lists are the same. In the 2000 CEGT region, the CCRL ratings are about 220 Elo points higher than CEGT ratings. And the rest of the ratings are a linear extrapolation using these 2 datapoints. But FIDE is not CEGT + constant; the constant is not a constant. For ratings in the low 2000s CEGT might be fairly similar to FIDE, but at CEGT 2500, FIDE is about 2650 or so (close to CCRL), and then at very high ratings, above 2800 CEGT, the compression of CEGT ratings begins. Comparing engine self-ratings to FIDE human self-ratings always seems nasty and nonlinear.
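The CCRL vs. CEGT relation described here amounts to a straight line through two datapoints. A minimal Python sketch, assuming only those two points (CCRL is 220 above CEGT at CEGT 2000, and the lists agree at 3400); the function names are hypothetical, and real list offsets will scatter around any such line:

```python
def ccrl_minus_cegt(cegt_elo, low=(2000.0, 220.0), high=(3400.0, 0.0)):
    """Estimated CCRL - CEGT offset from a line through two assumed datapoints.

    Each datapoint is (CEGT rating, CCRL offset at that rating).
    """
    (x0, y0), (x1, y1) = low, high
    return y0 + (y1 - y0) * (cegt_elo - x0) / (x1 - x0)


def cegt_to_ccrl(cegt_elo):
    """Convert a CEGT rating to an approximate CCRL rating."""
    return cegt_elo + ccrl_minus_cegt(cegt_elo)


for cegt in (2000, 2700, 3400):
    print(cegt, cegt_to_ccrl(cegt))
```

Halfway between the two datapoints, at CEGT 2700, the sketch gives an offset of 110 points. It says nothing about FIDE, which the post argues is not related to CEGT by any constant offset.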
-
lkaufman
Laskos wrote: ↑Tue Aug 04, 2020 6:39 pm
No, engine lists also overstate the Elo advantage of 3300-3500 CCRL engines over 2700 FIDE (and, by equivalence in this range, CCRL) humans. That was the main range of application of the 1.5 factor rule. I thought that it was also applicable to CCRL ratings below 2500, but that seems not to be the case.
Yes, those were the SSDF ratings of the early 2000s and the earlier ratings of the early 1990s; they always needed compression of the engine ratings to fit into the FIDE ratings when mainly engine-engine games were used. The ratings of engines were usually in the low-to-mid 2000s (FIDE) Elo.
About the CCRL and CEGT comparisons, things seem fairly clear: at the top (3400 ratings), the lists are the same. In the 2000 CEGT region, the CCRL ratings are about 220 Elo points higher than CEGT ratings. And the rest of the ratings are a linear extrapolation using these 2 datapoints. But FIDE is not CEGT + constant; the constant is not a constant. For ratings in the low 2000s CEGT might be fairly similar to FIDE, but at CEGT 2500, FIDE is about 2650 or so (close to CCRL), and then at very high ratings, above 2800 CEGT, the compression of CEGT ratings begins. Comparing engine self-ratings to FIDE human self-ratings always seems nasty and nonlinear.

What I don't understand is how you can make any meaningful statements about equivalence or compression when comparing CEGT to FIDE above 2800, when there are only two players in the world over 2800 FIDE, and neither has played any games on record vs. engines. I suppose you could be saying that a 3200 CEGT engine would not be able to score the required 91% vs. 2800 humans, and that may be true, but how would we know this? In short, how can you project anything beyond 2800 relating to human ratings? I did so myself in the past by making the assumption that the observed compression needed to fit engine ratings into the FIDE system twenty or thirty years ago was linear and would extrapolate beyond 2800, but it seems that the compression was an artifact of the old FIDE floor and that there is no evidence for compression at lower levels now, so there is nothing to extrapolate beyond 2800.
Komodo rules!
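For reference, the "required 91%" for a 3200 engine against 2800 opposition follows from the standard logistic Elo expectation. A minimal Python sketch, assuming the usual 400-point logistic scale (the function name is illustrative, and individual rating lists may use slightly different curves):

```python
def expected_score(rating_diff):
    """Expected score of the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


# A 400-point gap, e.g. 3200 vs. 2800.
print(round(expected_score(3200 - 2800), 3))  # 0.909
```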
-
Laskos
lkaufman wrote: ↑Tue Aug 04, 2020 7:28 pm
What I don't understand is how you can make any meaningful statements about equivalence or compression when comparing CEGT to FIDE above 2800, when there are only two players in the world over 2800 FIDE, and neither has played any games on record vs. engines. I suppose you could be saying that a 3200 CEGT engine would not be able to score the required 91% vs. 2800 humans, and that may be true, but how would we know this? In short, how can you project anything beyond 2800 relating to human ratings? I did so myself in the past by making the assumption that the observed compression needed to fit engine ratings into the FIDE system twenty or thirty years ago was linear and would extrapolate beyond 2800, but it seems that the compression was an artifact of the old FIDE floor and that there is no evidence for compression at lower levels now, so there is nothing to extrapolate beyond 2800.

We were still doing these extrapolations when calculating the odds needed for 2 pawns or for a Knight. At least I was, assuming that Komodo's FIDE rating, even on 64 cores, is lower than its CCRL or CEGT 4-core rating of 3400. On 64 cores we should have assumed some 3600 FIDE rating, right? We never assumed such a rating. We have indirect evidence of some human interference with top engines, especially in mild handicap games.

I don't believe that compression is gone now across the range. It was there in the 2500-2700 FIDE Elo range on the SSDF rating lists, which were constantly adjusted to FIDE; it was there when these new rating lists appeared. Do you think that now, reaching some 3500 ratings, CEGT and CCRL have stopped dilating? Yes, I can be wrong and have no proof, but I will stick with my reasonable hunch.
-
lkaufman
OK, I'm not saying you are wrong, I just wondered if you had any evidence from engine vs human games in the past decade that there is still compression. Regarding estimating ratings beyond 2800, I had one insight recently. Beyond a certain level, maybe 3000 or so, when the rating difference approaches 200 points, the points scored by the weak player are almost solely draws. So at the top end, draw odds approaches 191 elo points, the elo gap for half draws and half wins. But we know from our tests that certain handicaps (especially NBSC) are equivalent to draw odds, so such handicaps could be used for any opponents, engine or human, and given that elo value (or nearly that value). We have found that this doesn't depend on time control or engine ratings to a measurable degree. Similarly somewhat larger handicaps, such as f7 pawn, should have a fairly constant elo value once we are in this range where draw odds has a constant value. By measuring this handicap value, using CEGT or fastgm ratings (because they use ordo and not bayeselo) and running the games at the proper time control between 3000+ engines, we should have a reliable elo value for f7 handicap, and any engine that can score 50% giving f7 to another engine should deserve whatever rating that would imply at the time control of the games played. It probably breaks down if you go to knight odds, which appears to be something like 1100 or 1200 elo by this method, which may or may not be realistic. But for moderate handicaps, it should be pretty accurate.Laskos wrote: ↑Tue Aug 04, 2020 7:53 pmWe were still doing these extrapolations when calculating the odds needed for 2 pawns or for a Knight. At least I was doing, assuming that Komodo FIDE rating even on 64 cores in lower than CCRL or CEGT 4 core 3400 rating. On 64 cores we should have assumed some 3600 FIDE rating, right? We never assumed such a rating. 
We have indirect evidence of some human interference with top engines, especially in mild handicap games.lkaufman wrote: ↑Tue Aug 04, 2020 7:28 pmWhat I don't understand is how you can make any meaningful statements about equivalence or compression when comparing CEGT to FIDE above 2800, when there are only two players in the world over 2800 FIDE, and neither has played any games on record vs. engines? I suppose you could be saying that a 3200 CEGT engine would not be able to score the required 91% vs. 2800 humans, and that may be true, but how would we know this? In short, how can you project anything beyond 2800 relating to human ratings? I did so myself in the past by making the assumption that the observed compression needed to fit engine ratings into the FIDE system twenty or thirty years ago was linear and would extrapolate beyond 2800, but it seems that the compression was an artifact of the old FIDE floor and that there is no evidence for compression at lower levels now, so nothing to extrapolate to beyond 2800.Laskos wrote: ↑Tue Aug 04, 2020 6:39 pmNo, engines also overstate engines' Elo advantage of 3300-3500 CCRL engines compared to 2700 FIDE (and by equivalence in this range, CCRL) humans. That was the main range of application of 1.5 factor rule. I thought that it is also applicable to lower than 2500 CCRL ratings, but it seems to not be the case.lkaufman wrote: ↑Tue Aug 04, 2020 5:37 pmI don't think it is meaningful to talk about comparison with human ratings above 2900, because there are none. Worse, as far as I know there are no games on record between engines and humans rated much over 2800 (Kasp and Kramnik were around 2800 when they played the engines), so it seems that you are really just saying that for humans in the very narrow range 2600 to 2800 FIDE the engine ratings overstate the differences.Laskos wrote: ↑Tue Aug 04, 2020 5:14 pmI have the example of Micro-Max of 1900 CCRL which is obviously less than FIDE 1900. 
I am not sure how generalizable is that. Yes, 2600 CCRL 40/15 is indeed FIDE 2600 40/15, or better to say both translated to 45' + 30''. So, FIDE ratings strangely seem dilated compared to CCRL ratings in the 1900-2600 CCRL interval. I am not sure what to make of this. The dilation factor of 1.5 of computer ratings (CEGT with Ordo) to human ratings I think is true for 2600-3500 Elo interval. I never estimated well low comparative ratings of less than 2000 FIDE and CCRL. But some of the CCRL ratings of even buggy engines seem rather high, their faults are exploitable even by weak humans.lkaufman wrote: ↑Tue Aug 04, 2020 4:46 pm
OK, so you are saying that CCRL 2600 (roughly CEGT 2400) = FIDE 2600, and that CCRL 2300 (roughly CEGT 2100) = FIDE 2100. So this means that the engine rating lists substantially UNDERESTIMATE rating differences in human terms!! Both you and I have said the exact opposite many times, I think you estimate something like 300 elo gap on engine list = 200 gap on FIDE; here we have 300 elo gap on engine list = 500 gap on FIDE !! So, am I missing something here? Is this correct, and if so how can we explain such an incredible gap between theory (200 gap) and reality (500 gap)?
Yes, that were SSDF ratings in early 2000s and earlier ratings of early 1990s, they always needed engine ratings compression to fit into the FIDE ratings when mainly engine-engine games were used. The ratings of engines were usually in low-mid 2000s (FIDE) Elo.That seems to be almost impossible to verify, except based on games played decades ago. I'm not too concerned about very weak, buggy engines, but engines that play without obvious bugs and have ratings over 2000 should be easily tested for accuracy of their ratings vs. humans. My suspicion is that the engine ratings needed to be compressed to match human ratings twenty years ago, but that human FIDE ratings have now spread out to the point where that is no longer true. 2200 was once the minimum FIDE rating, now it is 1000. They were artificially compressed in the past, no longer the case. Is there any current indication that FIDE=CEGT + constant isn't a reasonably accurate formula now for players in the 2000 to 2900 human range? If so, what is the best estimate of the constant?
About the CCRL and CEGT comparisons, things seem fairly clear: at the top (3400 ratings), the lists are the same. In the 2000 CEGT region, the CCRL ratings are about 220 Elo points higher than the CEGT ratings. The rest of the ratings follow from a linear fit through these 2 data points. But FIDE is not CEGT + constant; the constant is not a constant. For ratings in the low 2000s CEGT might be fairly similar to FIDE, but at CEGT 2500, FIDE is about 2650 or so (close to CCRL), and then at very high ratings, above 2800 CEGT, the compression of CEGT ratings begins. Comparing engine self-ratings to FIDE human self-ratings always seems nasty and nonlinear.
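The two data points in that comparison can be turned into a small worked sketch. It assumes only what the post states: the lists agree at 3400, CCRL is about 220 points higher at CEGT 2000, and everything in between lies on a straight line. The function names are made up for illustration.

```python
# Two anchor points taken from the post (CEGT rating -> CCRL rating).
CEGT_LOW, CCRL_LOW = 2000.0, 2220.0    # CCRL about 220 points higher here
CEGT_HIGH, CCRL_HIGH = 3400.0, 3400.0  # the lists agree at the top

# Slope of the straight line through the two anchors (assumed linearity).
SLOPE = (CCRL_HIGH - CCRL_LOW) / (CEGT_HIGH - CEGT_LOW)


def cegt_to_ccrl(cegt):
    """Map a CEGT rating to the CCRL scale along the assumed line."""
    return CCRL_LOW + SLOPE * (cegt - CEGT_LOW)


def ccrl_to_cegt(ccrl):
    """Inverse mapping, from the CCRL scale back to CEGT."""
    return CEGT_LOW + (ccrl - CCRL_LOW) / SLOPE


if __name__ == "__main__":
    for cegt in (2000, 2400, 2800, 3400):
        print(f"CEGT {cegt} -> CCRL {cegt_to_ccrl(cegt):.0f}")
```

With these anchors, CEGT 2400 maps to roughly CCRL 2560 and CCRL 2300 maps back to roughly CEGT 2095, in line with the rough equivalences quoted earlier in the thread. The sketch says nothing about FIDE, where no single constant or line is claimed to work.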
I don't believe that compression is gone now across the range. It was there in the 2500-2700 FIDE Elo range on the SSDF rating lists, which were constantly adjusted to FIDE, and it was there when these new rating lists appeared. Do you think that now, with ratings reaching some 3500, CEGT and CCRL have stopped dilating? Yes, I can be wrong and have no proof, but I will stick by my reasonable hunch.
Komodo rules!
-
Laskos
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Vintage .... Rating List Winboard from June 1999 (16:42)
The current SSDF rating list looks interesting, and I think they use something similar to ELOStat.
https://ssdf.bosjo.net/list.htm
They are veterans at rating top engines against humans in the 2000-2700 Elo range, so the lower part of the table should be fairly accurate with respect to human ratings.
Here are some 165 engines, listed using their database and ELOStat.
Rank Program Hardware : Elo + - Games Score Av.Op. Draws
1 Stockfish 11 x64 1800X : 3447 27 25 330 76.4 % 3243 47.3 %
2 Stockfish 10 x64 1800X : 3415 18 17 640 72.8 % 3243 52.8 %
3 Stockfish 8 MP 1800X : 3372 20 20 780 81.7 % 3113 35.9 %
4 Stockfish 9 x64 1800X : 3367 16 15 802 70.6 % 3215 55.0 %
5 Komodo 13.1 x64 1800X : 3358 24 23 280 67.1 % 3234 62.1 %
6 Komodo 11.01 MP 1800X : 3333 18 18 814 77.4 % 3119 42.8 %
7 Stockfish 9 x64 Q6600 : 3330 19 18 440 57.3 % 3279 66.4 %
8 Stockfish 8 x64 1800X : 3323 18 18 500 60.8 % 3247 64.0 %
9 Komodo 12.3 x64 Q6600 : 3323 24 23 320 62.3 % 3235 60.9 %
10 Komodo 12.3 x64 1800X : 3322 16 16 680 67.1 % 3198 58.7 %
11 Deep Shredder 13 1800X : 3311 18 18 680 72.0 % 3148 49.0 %
12 Komodo 9.1 MP Q6600 : 3297 16 16 1018 74.1 % 3114 43.9 %
13 Komodo 11.01 x64 1800X : 3294 19 19 500 55.8 % 3253 60.4 %
14 Stockfish 8 x64 Q6600 : 3291 24 24 440 70.2 % 3142 45.9 %
15 Komodo 13.02 MCTS x64 1800X : 3285 22 21 360 61.5 % 3203 62.5 %
16 Stockfish 6 MP Q6600 : 3266 16 16 1016 72.3 % 3099 44.7 %
17 Komodo 11.01 MP Q6600 : 3263 21 21 442 63.1 % 3170 56.6 %
18 Booot 6.3.1 x64 1800X : 3236 13 13 920 49.8 % 3237 64.2 %
19 Deep Shredder 13 Q6600 : 3230 17 17 804 65.2 % 3121 50.5 %
20 Komodo 7 MP Q6600 : 3223 17 17 692 59.3 % 3158 55.6 %
21 Vajolet2 2.8 x64 1800X : 3185 19 19 650 38.1 % 3270 50.6 %
22 Komodo 5.1 MP Q6600 : 3178 16 15 894 60.2 % 3106 53.1 %
23 Booot 6.3.1 x64 Q6600 : 3175 29 29 320 58.0 % 3119 42.2 %
24 Arasan 21.2 x64 1800X : 3158 19 19 600 38.2 % 3242 51.8 %
25 Vajolet2 2.8 x64 Q6600 : 3157 30 29 320 73.4 % 2981 41.2 %
26 Deep Hiarcs 14 1800X : 3147 17 17 720 42.3 % 3201 52.9 %
27 Stockfish 3 MP Q6600 : 3139 14 14 1147 56.6 % 3093 48.5 %
28 Deep Rybka 4 Q6600 : 3134 15 15 974 58.7 % 3073 52.7 %
29 Deep Hiarcs 14 Q6600 : 3122 13 13 1434 58.6 % 3061 49.9 %
30 Chiron 3.01 MP Q6600 : 3116 18 19 616 47.1 % 3136 54.5 %
31 Deep Rybka 3 Q6600 : 3110 16 16 1056 71.3 % 2952 42.6 %
32 Wasp 3.5 x64 1800X : 3105 20 20 600 32.2 % 3234 49.2 %
33 Wasp 2.01 MP 1800X : 3100 20 20 646 36.5 % 3196 46.6 %
34 Wasp 3 x64 1800X : 3096 17 17 762 40.9 % 3160 54.1 %
35 Naum 4.2 MP Q6600 : 3071 14 14 1071 58.5 % 3011 53.0 %
36 Wasp 3.5 x64 Q6600 : 3068 26 26 320 43.6 % 3113 53.4 %
37 Deep Junior Yokohama Q6600 : 3056 18 18 848 37.3 % 3146 42.8 %
38 Deep Junior 13.3 Q6600 : 3048 13 13 1317 48.6 % 3058 50.6 %
39 Spike 1.4 MP Q6600 : 3037 13 13 1565 50.9 % 3031 46.8 %
40 Naum 4 MP Q6600 : 3035 14 14 1267 58.8 % 2973 45.1 %
41 Deep Shredder 12 Q6600 : 3032 16 16 832 51.1 % 3024 52.6 %
42 Deep Fritz 13 Q6600 : 3032 18 18 645 47.1 % 3052 56.4 %
43 Hiarcs 13.1 MP Q6600 : 3024 16 16 770 50.7 % 3019 55.7 %
44 Deep Hiarcs 13.2 Q6600 : 3024 18 18 776 46.3 % 3050 45.9 %
45 Wasp 2.01 x64 1800X : 3023 25 26 400 25.9 % 3206 44.8 %
46 Hiarcs 14 A1200 : 3021 23 23 520 57.9 % 2966 41.5 %
47 Deep Fritz 12 Q6600 : 3014 14 14 1078 48.3 % 3026 55.4 %
48 Wasp 2.01 MP Q6600 : 3010 26 26 482 22.1 % 3229 35.9 %
49 Deep Junior 12 Q6600 : 2995 16 16 940 52.1 % 2980 50.4 %
50 Zappa Mexico II Q6600 : 2987 17 17 938 52.2 % 2972 44.3 %
51 Naum 3.1 MP Q6600 : 2981 18 18 912 38.5 % 3062 39.5 %
52 Deep Fritz 11 Q6600 : 2972 13 13 1418 60.8 % 2896 45.5 %
53 Rybka 3 A1200 : 2969 29 29 282 46.8 % 2991 49.6 %
54 The Baron 3.43 x64 1800X : 2962 22 23 680 25.7 % 3146 32.9 %
55 Crafty 25.0 MP Q6600 : 2960 19 20 804 35.1 % 3066 36.8 %
56 The Baron 3.43 x64 Q6600 : 2952 29 29 400 49.0 % 2959 27.0 %
57 Deep Hiarcs 12 Q6600 : 2942 17 17 922 45.3 % 2975 44.1 %
58 Deep Shredder 11 Q6600 : 2930 16 16 1004 47.8 % 2946 42.5 %
59 Naum 4 A1200 : 2922 25 25 440 42.0 % 2978 42.3 %
60 Arasan 17.2 MP Q6600 : 2918 20 20 685 45.2 % 2952 42.5 %
61 Hiarcs 11.2 MP Q6600 : 2917 16 16 980 50.9 % 2910 48.3 %
62 Arasan 16 MP Q6600 : 2914 21 21 604 38.9 % 2993 43.7 %
63 Shredder 12 A1200 : 2910 24 24 520 36.5 % 3006 38.1 %
64 Fritz 13 A1200 : 2897 32 32 280 65.2 % 2788 39.6 %
65 Glaurung 2.2 MP Q6600 : 2897 17 17 1002 51.3 % 2888 34.5 %
66 Deep Junior 10.1 Q6600 : 2885 19 19 846 46.8 % 2908 37.4 %
67 Wasp 2.01 A1200 : 2872 23 23 560 53.8 % 2845 38.8 %
68 Fritz 12 A1200 : 2854 18 18 860 61.7 % 2771 41.7 %
69 Rybka 2.3.1 A1200 : 2840 21 21 612 49.7 % 2842 39.5 %
70 Jonny 4 MP Q6600 : 2821 20 20 860 29.7 % 2970 31.3 %
71 Rybka 1.2 A1200 : 2818 24 24 535 63.7 % 2720 37.4 %
72 Deep Fritz 8 Q6600 : 2809 19 19 786 40.1 % 2879 37.9 %
73 Shredder 8 MP Q6600 : 2807 19 19 824 38.9 % 2886 36.8 %
74 Hiarcs 11.1 UCI A1200 : 2782 25 25 362 56.5 % 2737 50.6 %
75 CM King 3.5 MP Q6600 : 2779 19 19 932 29.6 % 2929 32.8 %
76 Deep Junior 8 Q6600 : 2775 25 25 526 33.1 % 2897 32.3 %
77 Hiarcs 11.1 A1200 : 2771 30 31 325 25.7 % 2956 39.1 %
78 Junior 10.1 A1200 : 2750 22 22 679 50.6 % 2745 27.8 %
79 Junior 10 A1200 : 2736 26 26 497 49.4 % 2740 29.6 %
80 Zap!Chess Zanzibar A1200 : 2729 17 17 1038 46.9 % 2751 32.5 %
81 Hiarcs 10 HypMod A1200 : 2728 18 18 1016 65.9 % 2614 34.6 %
82 Shredder 10 UCI A1200 : 2722 19 19 867 55.8 % 2682 34.0 %
83 Pro Deo 2.1 YAT A1200 : 2721 27 26 400 59.4 % 2655 40.2 %
84 Shredder 8 A1200 : 2711 21 21 743 59.6 % 2643 31.2 %
85 Fritz 9 A1200 : 2710 19 19 835 51.3 % 2701 35.8 %
86 Fruit 2.2.1 A1200 : 2709 18 18 940 59.8 % 2640 35.1 %
87 Spike 1.2 A1200 : 2705 29 29 352 45.7 % 2735 38.1 %
88 Shredder 9 UCI A1200 : 2704 16 16 1238 65.3 % 2595 32.1 %
89 Pro Deo 2.0 A1200 : 2698 23 23 520 46.0 % 2726 39.6 %
90 Shredder 7.04 UCI A1200 : 2692 22 22 635 64.6 % 2588 35.1 %
91 Deep Fritz 8 A1200 : 2686 24 24 532 44.5 % 2724 32.3 %
92 Junior 8 A1200 : 2683 27 27 438 54.7 % 2650 30.8 %
93 Junior 9 A1200 : 2678 23 23 565 57.5 % 2625 34.3 %
94 Chess Tiger 2007 A1200 : 2672 25 26 450 36.8 % 2766 38.9 %
95 Deep Junior 8 A1200 : 2663 30 30 385 61.0 % 2585 29.6 %
96 Shredder 7 A1200 : 2663 29 29 407 65.5 % 2551 29.2 %
97 Pro Deo 1.86 A1200 : 2660 23 23 600 36.6 % 2755 32.8 %
98 Spike 1.1 A1200 : 2659 27 27 400 50.9 % 2653 35.8 %
99 Deep Fritz 7 A1200 : 2652 24 24 542 63.8 % 2554 36.5 %
100 Pro Deo 1.82 A1200 : 2650 25 25 440 44.3 % 2690 40.5 %
101 Fritz 8 A1200 : 2642 20 20 825 51.8 % 2630 32.2 %
102 Fritz 7 A1200 : 2631 26 26 400 52.5 % 2613 40.5 %
103 Gambit Tiger 2 A1200 : 2630 21 21 644 48.7 % 2639 38.7 %
104 Hiarcs 9 A1200 : 2619 30 30 430 29.4 % 2771 26.7 %
105 Gandalf 6 A1200 : 2613 22 22 627 46.7 % 2636 36.5 %
106 Shredder 6 Pad UCI A1200 : 2610 23 23 569 56.4 % 2565 34.8 %
107 Shredder 6 A1200 : 2606 32 32 280 52.1 % 2591 37.1 %
108 Chess Tiger 15 A1200 : 2605 17 17 870 49.8 % 2606 45.1 %
109 Chess Tiger 2004 A1200 : 2603 19 19 712 55.5 % 2565 43.0 %
110 Pro Deo 1.1 A1200 : 2603 21 21 716 54.5 % 2572 34.4 %
111 Junior 7 A1200 : 2602 21 21 701 49.5 % 2606 35.1 %
112 Deep Fritz A1200 : 2599 21 21 684 47.2 % 2619 36.8 %
113 Chess Tiger 14 CB A1200 : 2599 22 22 579 54.1 % 2570 40.4 %
114 Rebel 12 A1200 : 2591 30 30 335 42.7 % 2643 35.2 %
115 Ruffian 1.0.1 A1200 : 2577 20 20 729 44.9 % 2612 35.0 %
116 Rebel Century 4 A1200 : 2567 26 26 448 58.1 % 2510 35.9 %
117 Hiarcs 8 A1200 : 2563 24 24 529 46.6 % 2587 33.8 %
118 Deep Sjeng 1.5a A1200 : 2562 32 32 301 44.9 % 2598 35.2 %
119 Pocket Shredder Ipaq 114 : 2559 31 31 280 54.3 % 2529 41.4 %
120 Deep Fritz K6-2 450 : 2553 24 24 570 58.7 % 2492 29.3 %
121 Deep Fritz 7 K6-2 450 : 2553 32 32 282 41.1 % 2615 39.7 %
122 Shredder 5.32 A1200 : 2547 19 19 828 44.0 % 2589 36.5 %
123 Gandalf 4.32h A1200 : 2545 29 29 358 45.3 % 2578 36.9 %
124 Gandalf 5 A1200 : 2532 30 30 284 40.3 % 2600 44.7 %
125 Gambit Tiger 2 K6-2 450 : 2526 33 33 280 38.4 % 2608 35.4 %
126 Fritz 6 K6-2 450 : 2524 22 22 673 63.2 % 2430 34.6 %
127 Crafty 18.12 CB A1200 : 2504 25 25 468 42.4 % 2557 36.1 %
128 Gandalf 5.1 A1200 : 2497 28 28 376 45.7 % 2527 37.8 %
129 Shredder 5.32 K6-2 450 : 2488 30 30 366 38.7 % 2569 32.0 %
130 Junior 6 K6-2 450 : 2487 20 20 821 59.4 % 2421 32.8 %
131 Nimzo 7.32 K6-2 450 : 2461 25 25 482 56.3 % 2417 34.6 %
132 Fritz 5.32 K6-2 450 : 2453 31 31 296 51.9 % 2440 37.5 %
133 Junior 5 K6-2 450 : 2442 25 25 519 52.2 % 2427 32.8 %
134 Crafty 19.17 A1200 : 2435 33 33 282 29.3 % 2589 37.2 %
135 Hiarcs 7.32 K6-2 450 : 2430 27 27 426 49.4 % 2434 32.2 %
136 Nimzo 8 K6-2 450 : 2414 30 30 438 27.5 % 2582 26.3 %
137 Gandalf 4.32f K6-2 450 : 2404 29 29 358 46.2 % 2430 33.8 %
138 SOS K6-2 450 : 2403 16 16 1538 26.4 % 2580 24.1 %
139 Goliath Light K6-2 450 : 2402 17 17 1403 26.3 % 2580 25.7 %
140 Fritz 5.32 P200 MMX : 2395 26 26 504 34.9 % 2503 29.8 %
141 Crafty 17.07 CB K6-2 450 : 2385 24 24 558 38.1 % 2469 31.7 %
142 MChess Pro 8 K6-2 450 : 2369 27 27 459 34.3 % 2482 32.0 %
143 Fritz 5 P200 MMX : 2362 25 25 535 69.4 % 2219 34.2 %
144 Hiarcs 7 P200 MMX : 2350 33 33 291 56.4 % 2305 31.6 %
145 Crafty 18.12 CB K6-2 450 : 2346 33 34 514 14.1 % 2660 19.6 %
146 Junior 5 P200 MMX : 2343 28 28 386 55.1 % 2308 33.9 %
147 Nimzo 99 P200 MMX : 2331 26 26 500 45.4 % 2363 30.0 %
148 Nimzo 98 P200 MMX : 2309 28 28 399 46.7 % 2332 31.3 %
149 Shredder 2 P200 MMX : 2309 24 24 565 43.3 % 2356 29.2 %
150 Rebel 9 P200 MMX : 2290 32 32 281 58.4 % 2232 37.7 %
151 Hiarcs 9.5a/9.6 Palm Tung E : 2290 30 30 380 46.8 % 2312 28.4 %
152 CEBoard Crafty 2004 HP RX4240 : 2239 36 36 260 43.7 % 2284 29.6 %
153 Rebel 9 P90 : 2224 23 23 596 45.0 % 2259 35.2 %
154 Rebel 8 P90 : 2216 19 19 901 53.4 % 2193 29.5 %
155 MChess Pro 6 P90 : 2213 19 19 905 55.1 % 2177 31.9 %
156 Hiarcs 6 P90 : 2211 21 21 709 49.0 % 2218 33.4 %
157 Genius 5 P90 : 2211 19 19 871 53.7 % 2185 33.3 %
158 Nimzo 3 P90 : 2177 31 31 330 57.4 % 2125 32.4 %
159 Nimzo 3.5 P90 : 2172 22 22 636 47.8 % 2188 34.0 %
160 Fritz 3 P90 : 2143 27 28 452 40.5 % 2210 28.3 %
161 Junior 3.3-3.5 P90 : 2132 31 31 363 47.0 % 2153 25.1 %
162 Palm Tiger 2009 Tung C : 2106 37 37 260 41.5 % 2165 26.2 %
163 Mephisto London 68030 33 MHz : 2095 31 31 359 42.5 % 2148 27.6 %
164 Rebel 7 486/66 MHz : 2072 35 36 270 34.8 % 2181 31.1 %
165 Comet 32 P90 : 1968 30 31 538 19.0 % 2220 20.8 %
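As a rough cross-check of how the columns in this list relate, the sketch below applies the standard logistic Elo formula to a row's score and average opponent rating. Reading the Elo column as "Av.Op. plus the rating difference implied by the score" is an assumption about the output, not a description of ELOStat's actual algorithm, and the function name is illustrative.

```python
import math


def performance_rating(avg_opponent, score_fraction):
    """Average opponent rating plus the Elo difference implied by the score.

    Uses the standard logistic Elo curve; score_fraction must be strictly
    between 0 and 1 (a 0 % or 100 % score has no finite rating difference).
    """
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score_fraction must be strictly between 0 and 1")
    diff = 400.0 * math.log10(score_fraction / (1.0 - score_fraction))
    return avg_opponent + diff


if __name__ == "__main__":
    # (program, Av.Op., Score as a fraction, listed Elo) copied from the list
    rows = [
        ("Stockfish 11 x64 1800X", 3243, 0.764, 3447),
        ("Booot 6.3.1 x64 1800X", 3237, 0.498, 3236),
        ("Comet 32 P90", 2220, 0.190, 1968),
    ]
    for name, avg_op, score, listed in rows:
        computed = performance_rating(avg_op, score)
        print(f"{name}: computed {computed:.0f}, listed {listed}")
```

For these three rows the computed value lands within about one point of the listed Elo, so the Score and Av.Op. columns are at least consistent with this reading.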
-
Frank Quisinsky
- Posts: 7232
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
Re: Vintage .... Rating List Winboard from June 1999 (16:42)
Hi Kai,
the current SSDF is very interesting.
I think it is 125-200 Elo too high for chess engines and 25 Elo too high for the chess computers (if I compare with my own results).
The spectrum isn't easy to explain because it has many reasons, I think.
SSDF: the higher the ranking, the more realistic it looks to me.
So SSDF is much more realistic than the chess computer wiki for the older chess computers.
When I added my SWCR rating list results to the FCP rating list results (SWCR ended in 2010, FCP in 2016), I produced the same problem: results that often cannot be explained logically.
All in all ...
We can be happy to have SSDF and all the other works.
But this is the reason I gave up doing such things.
For more than 20 years I have been working on private or official rating list systems.
Today I think it makes more sense to play one bigger tourney a year than to create a rating list and add many different versions of the same engines to it.
Best
Frank
-
Laskos
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Vintage .... Rating List Winboard from June 1999 (16:42)
Frank Quisinsky wrote: ↑Wed Aug 05, 2020 11:16 am
Hi Kai,
the current SSDF is very interesting.
I think it is 125-200 Elo too high for chess engines and 25 Elo too high for the chess computers (if I compare with my own results).
The spectrum isn't easy to explain because it has many reasons, I think.
SSDF: the higher the ranking, the more realistic it looks to me.
So SSDF is much more realistic than the chess computer wiki for the older chess computers.
When I added my SWCR rating list results to the FCP rating list results (SWCR ended in 2010, FCP in 2016), I produced the same problem: results that often cannot be explained logically.
All in all ...
We can be happy to have SSDF and all the other works.
But this is the reason I gave up doing such things.
For more than 20 years I have been working on private or official rating list systems.
Today I think it makes more sense to play one bigger tourney a year than to create a rating list and add many different versions of the same engines to it.
Best
Frank
Hi Frank,
I also find SSDF pretty illuminating with their variable hardware and tournament time control, especially the lower ratings for chess computers like Mephisto and for weak old engines. I am used to CCRL and CEGT, which were hardly corroborated at lower ratings and do not have any chess computers. I have trouble with the SSDF database: there are too many unconnected engines for Ordo to give meaningful ratings.
By the way, maybe you will restart your work on the FCP rating list?
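The "unconnected engines" problem mentioned above can be illustrated with a minimal sketch. Ratings are only comparable between engines that are linked, directly or through common opponents, by games. The code below groups engines into connected components from a list of pairings. The pairings are invented for illustration and are not taken from the SSDF database, and this is not a description of how Ordo itself works.

```python
def connected_groups(pairings):
    """Group players into sets linked by at least one chain of games."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair of players that met.
    for a, b in pairings:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect players under their group representative.
    groups = {}
    for player in list(parent):
        groups.setdefault(find(player), set()).add(player)
    return sorted(groups.values(), key=lambda g: sorted(g))


if __name__ == "__main__":
    # Hypothetical pairings: A, B, C are linked; D and E only met each other.
    games = [("A", "B"), ("B", "C"), ("D", "E")]
    for group in connected_groups(games):
        print(sorted(group))
```

In this made-up example the result is two separate groups, so a rating for D or E would have no common scale with A, B and C until at least one game links the two groups.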