CCRL Chess Engine Match Standards. How obsolete are they?

mkchan · Post by **mkchan** » Tue Jul 02, 2019 7:45 am

mwyoung wrote: ↑Tue Jul 02, 2019 7:35 am
mkchan wrote: ↑Tue Jul 02, 2019 7:28 am
mwyoung wrote: ↑Tue Jul 02, 2019 6:10 am
mkchan wrote: ↑Tue Jul 02, 2019 6:06 am
mwyoung wrote: ↑Tue Jul 02, 2019 4:51 am
mkchan wrote: ↑Tue Jul 02, 2019 4:33 am
mwyoung wrote: ↑Mon Jul 01, 2019 10:03 pm
mkchan wrote: ↑Mon Jul 01, 2019 9:46 pm To me this seems more like an attack directed at the CCRL team than an attempt to get to the point. From what I read, the issue is the advertised 40/4 | 40/40, which are actually scaled to some CPU that was decided on when the website was started. They clearly state, right at the start, approximations for modern CPUs and the benchmarking methodology. I see no bait and switch here at all. The rating list is still indicative of the relative strengths of engines to a pretty good accuracy.

If you have such valuable criticism to make, why not start a list of your own instead of making the entire established community conform to your personal interpretation of their list? In fact, make it pay-to-view as well, because it's going to be the one true list with exact rating values measured for each new CPU that comes out, right? Don't forget to get a few GMs into the pool to better reflect FIDE rating numbers so that we're not fooling anyone about SF's 3400 rating. I'm sure everyone would flock to that.
I was told by a subject matter expert that CCRL's standards are very poor.

And it is clear from reading the posts in the discussion. CCRL's practice of labeling its rating tests as 40/4 and 40/40 needs to be addressed. CCRL can fix this issue today.

So valuable criticism has been made.
No, you have not made valid criticism at all. Why don't you address the point I made in their defense: CCRL shows accurate relative difference between engine strength. You do realize changing time control will not change that by much? Rating is relative to the pool of opponents anyways
Are you kidding? CCRL cannot even rank the best engine. The last time I checked, on modern hardware, Lc0 was the strongest engine.
But CCRL is blind to this because they test on a benchmark of a CPU from the year 2005. This is the year 2019....

| Rank | Name | Elo | + | − | Score | Average Opponent | Draws | Games | LOS |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Stockfish 10 64-bit 4CPU | 3546 | +13 | −12 | 69.6% | −124.9 | 54.9% | 2015 | 100.0% |
| 2 | Houdini 6 64-bit 4CPU | 3519 | +9 | −9 | 65.5% | −108.4 | 53.9% | 3912 | 95.8% |
| 3 | Komodo 11.2 64-bit 4CPU | 3503 | +16 | −16 | 58.2% | −66.6 | 55.3% | 1158 | 90.4% |
| 4 | Lc0 0.21.1 JH.T6.532 GPU | 3487 | +17 | −17 | 59.2% | −58.5 | 52.4% | 1100 | |

What? Lc0 might be best on an RTX 2080, sure, but not everyone is going to be able to afford that. In any case, they clearly state they tested on a GTX 1050 with the ratio mentioned. I don't know what kind of fools you take the people of this forum for, but that is clearly the rating given the hardware. Sure, if you scale up the hardware the rating changes (oh, I wonder if CCRL has mentioned that scaling, hmm..... Oh look! It's on the front page and the lc0 page).
Like I said, you can go ahead and make your rating list with the latest "modern" hardware to satisfy yourself. CCRL, according to EVERYONE else, is fine.
It is not fine. That is why more and more people look to TCEC for which engine is best. Why? Because we live in the year 2019, while CCRL is still trying to test with 2005 hardware. Facts can hurt....
No one who knows anything about testing looks at TCEC for what engine is best. It is a tournament, not a dedicated effort into testing. You are simply appealing to authority and not contradicting the actual point I'm making with facts:

>CCRL shows accurate relative difference between engine strength. You do realize changing time control will not change that by much? Rating is relative to the pool of opponents anyways. Regardless of what you or the masses think, mathematically CCRL > TCEC for rating estimates
The world of chess engines is bigger than us here. Most people who use chess engines know nothing of CCC, nor are they in the weeds about computer chess testing.

I live in both worlds.

Sure, but accuracy has nothing to do with the world or anything. It has to do with math. More games = more accuracy. It is irrelevant that people don't understand why CCRL is a better measure than TCEC. TCEC is more fun and very popular sure, as are all famous tournaments. Not as many people look at the FIDE rating db so is that enough reason to fudge it? Maybe to get the real ratings in FIDE one should introduce all the chess engines to that pool and the humans will probably get capped at around 2400 in blitz rating.

Offer mathematical evidence of incorrect CCRL given the details on the website, and I have no doubt they'll change the result.

To sum up:
1. Rating pools measured in small time deltas do not change significantly.
2. If you can provide the RTX 2080 to put lc0 on the top, by all means contribute to CCRL and adhere to their standards, you'll get your result.
3. Understand that this is a best effort by volunteers and it is of high quality nevertheless.
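
The "more games = more accuracy" point above can be made concrete with a rough calculation. The following is a minimal sketch, assuming the usual logistic Elo model and a normal approximation for the score; it is not necessarily how CCRL computes its published error bars, and the win/draw/loss counts are made up for illustration. The function names are hypothetical.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    # Clamp so the conversion stays defined at 0% and 100% scores.
    score = min(max(score, 1e-9), 1.0 - 1e-9)
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high): point estimate and approximate 95% bounds.

    Assumes at least one game has been played.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Sample variance of the per-game result (win = 1, draw = 0.5, loss = 0).
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    std_error = math.sqrt(variance / games)
    return (elo_from_score(score),
            elo_from_score(score - z * std_error),
            elo_from_score(score + z * std_error))

# Same win/draw/loss proportions, ten times as many games (made-up counts).
small = elo_interval(300, 400, 300)
large = elo_interval(3000, 4000, 3000)
print(f"1,000 games:  {small[1]:+.1f} to {small[2]:+.1f} Elo")
print(f"10,000 games: {large[1]:+.1f} to {large[2]:+.1f} Elo")
```

Under these assumptions the interval shrinks roughly with the square root of the number of games, which is why a list built on thousands of games per engine gives tighter estimates than a short tournament.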

Rebel · Post by **Rebel** » Tue Jul 02, 2019 8:43 am

mwyoung wrote: ↑Tue Jul 02, 2019 3:14 am But that is no excuse to mislead people, whatever excuse you want to give for CCRL.

There is no misleading.

j.korhonen · Post by **j.korhonen** » Tue Jul 02, 2019 9:36 am

IMHO, the CPU-engine rating list should be separated from the GPU-engine rating list, just as the normal chess rating list is separated from the FRC rating list.

mwyoung · Post by **mwyoung** » Tue Jul 02, 2019 10:39 am

Rebel wrote: ↑Tue Jul 02, 2019 8:43 am
mwyoung wrote: ↑Tue Jul 02, 2019 3:14 am But that is no excuse to mislead people, whatever excuse you want to give for CCRL.
There is no misleading.

Yep, none at all. The 40 in 40 list is really 15 in 40. The 4 in 40 list is really 1.5 in 40.
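
The two figures in this post imply a single scaling factor: 15/40 and 1.5/4 both equal 0.375. A minimal sketch of that arithmetic follows, assuming the listed time is simply multiplied by the ratio of reference-machine speed to test-machine speed. The ratio is inferred from the numbers in the post, not taken from CCRL's documentation, and `adapt_minutes` is a hypothetical helper.

```python
def adapt_minutes(listed_minutes: float, speed_ratio: float) -> float:
    """Scale the listed base time by a hardware speed ratio.

    speed_ratio is reference-machine speed divided by test-machine speed, so a
    faster test machine (ratio below 1) gets proportionally less clock time.
    """
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    return listed_minutes * speed_ratio

# Ratio implied by the figures in the post: 15 / 40 = 0.375 (an inference).
RATIO = 15 / 40

for listed in (40, 4):
    played = adapt_minutes(listed, RATIO)
    print(f"40 moves in {listed} min listed -> 40 moves in {played} min played")
```

Both lists come out with the same factor, which is consistent with the time control being adapted to a benchmark rather than chosen separately for each list.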

mwyoung · Post by **mwyoung** » Tue Jul 02, 2019 11:00 am

mkchan wrote: ↑Tue Jul 02, 2019 7:45 am
mwyoung wrote: ↑Tue Jul 02, 2019 7:35 am
mkchan wrote: ↑Tue Jul 02, 2019 7:28 am
mwyoung wrote: ↑Tue Jul 02, 2019 6:10 am
mkchan wrote: ↑Tue Jul 02, 2019 6:06 am
mwyoung wrote: ↑Tue Jul 02, 2019 4:51 am
mkchan wrote: ↑Tue Jul 02, 2019 4:33 am
mwyoung wrote: ↑Mon Jul 01, 2019 10:03 pm
mkchan wrote: ↑Mon Jul 01, 2019 9:46 pm To me this seems more like an attack directed at the CCRL team than an attempt to get to the point. From what I read, the issue is the advertised 40/4 | 40/40, which are actually scaled to some CPU that was decided on when the website was started. They clearly state, right at the start, approximations for modern CPUs and the benchmarking methodology. I see no bait and switch here at all. The rating list is still indicative of the relative strengths of engines to a pretty good accuracy.

If you have such valuable criticism to make, why not start a list of your own instead of making the entire established community conform to your personal interpretation of their list? In fact, make it pay-to-view as well, because it's going to be the one true list with exact rating values measured for each new CPU that comes out, right? Don't forget to get a few GMs into the pool to better reflect FIDE rating numbers so that we're not fooling anyone about SF's 3400 rating. I'm sure everyone would flock to that.
I was told by a subject matter expert that CCRL's standards are very poor.

And it is clear from reading the posts in the discussion. CCRL's practice of labeling its rating tests as 40/4 and 40/40 needs to be addressed. CCRL can fix this issue today.

So valuable criticism has been made.
No, you have not made valid criticism at all. Why don't you address the point I made in their defense: CCRL shows accurate relative difference between engine strength. You do realize changing time control will not change that by much? Rating is relative to the pool of opponents anyways
Are you kidding? CCRL cannot even rank the best engine. The last time I checked, on modern hardware, Lc0 was the strongest engine.
But CCRL is blind to this because they test on a benchmark of a CPU from the year 2005. This is the year 2019....

| Rank | Name | Elo | + | − | Score | Average Opponent | Draws | Games | LOS |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Stockfish 10 64-bit 4CPU | 3546 | +13 | −12 | 69.6% | −124.9 | 54.9% | 2015 | 100.0% |
| 2 | Houdini 6 64-bit 4CPU | 3519 | +9 | −9 | 65.5% | −108.4 | 53.9% | 3912 | 95.8% |
| 3 | Komodo 11.2 64-bit 4CPU | 3503 | +16 | −16 | 58.2% | −66.6 | 55.3% | 1158 | 90.4% |
| 4 | Lc0 0.21.1 JH.T6.532 GPU | 3487 | +17 | −17 | 59.2% | −58.5 | 52.4% | 1100 | |

What? Lc0 might be best on an RTX 2080, sure, but not everyone is going to be able to afford that. In any case, they clearly state they tested on a GTX 1050 with the ratio mentioned. I don't know what kind of fools you take the people of this forum for, but that is clearly the rating given the hardware. Sure, if you scale up the hardware the rating changes (oh, I wonder if CCRL has mentioned that scaling, hmm..... Oh look! It's on the front page and the lc0 page).
Like I said, you can go ahead and make your rating list with the latest "modern" hardware to satisfy yourself. CCRL, according to EVERYONE else, is fine.
It is not fine. That is why more and more people look to TCEC for which engine is best. Why? Because we live in the year 2019, while CCRL is still trying to test with 2005 hardware. Facts can hurt....
No one who knows anything about testing looks at TCEC for what engine is best. It is a tournament, not a dedicated effort into testing. You are simply appealing to authority and not contradicting the actual point I'm making with facts:

>CCRL shows accurate relative difference between engine strength. You do realize changing time control will not change that by much? Rating is relative to the pool of opponents anyways. Regardless of what you or the masses think, mathematically CCRL > TCEC for rating estimates
The world of chess engines is bigger than us here. Most people who use chess engines know nothing of CCC, nor are they in the weeds about computer chess testing.

I live in both worlds.
Sure, but accuracy has nothing to do with the world or anything. It has to do with math. More games = more accuracy. It is irrelevant that people don't understand why CCRL is a better measure than TCEC. TCEC is more fun and very popular sure, as are all famous tournaments. Not as many people look at the FIDE rating db so is that enough reason to fudge it? Maybe to get the real ratings in FIDE one should introduce all the chess engines to that pool and the humans will probably get capped at around 2400 in blitz rating.

Offer mathematical evidence of incorrect CCRL given the details on the website, and I have no doubt they'll change the result.

To sum up:
1. Rating pools measured in small time deltas do not change significantly.
2. If you can provide the RTX 2080 to put lc0 on the top, by all means contribute to CCRL and adhere to their standards, you'll get your result.
3. Understand that this is a best effort by volunteers and it is of high quality nevertheless.

Are you telling me that the premier chess testing group cannot test NN engines correctly?

It will be a sorry excuse for a ratings list as more NN engines surpass the AB engines.

Maybe CCRL does not want to change. It is much easier to pump out low-quality games when using a CPU standard from 2005 in 2019.

flok · Post by **flok** » Tue Jul 02, 2019 11:25 am

mwyoung wrote: ↑Tue Jul 02, 2019 11:00 am Maybe CCRL does not want to change. It is much easier to pump out low-quality games when using a CPU standard from 2005 in 2019.

https://www.youtube.com/watch?v=v1PBptSDIh8

(translation: you're not serious, you're a troll looking for attention)

Guenther · Post by **Guenther** » Tue Jul 02, 2019 11:42 am

mwyoung wrote: ↑Tue Jul 02, 2019 11:00 am
mkchan wrote: ↑Tue Jul 02, 2019 7:45 am
Sure, but accuracy has nothing to do with the world or anything. It has to do with math. More games = more accuracy. It is irrelevant that people don't understand why CCRL is a better measure than TCEC. TCEC is more fun and very popular sure, as are all famous tournaments. Not as many people look at the FIDE rating db so is that enough reason to fudge it? Maybe to get the real ratings in FIDE one should introduce all the chess engines to that pool and the humans will probably get capped at around 2400 in blitz rating.

Offer mathematical evidence of incorrect CCRL given the details on the website, and I have no doubt they'll change the result.

To sum up:
1. Rating pools measured in small time deltas do not change significantly.
2. If you can provide the RTX 2080 to put lc0 on the top, by all means contribute to CCRL and adhere to their standards, you'll get your result.
3. Understand that this is a best effort by volunteers and it is of high quality nevertheless.
Are you telling me that the premier chess testing group cannot test NN engines correctly?

It will be a sorry excuse for a ratings list as more NN engines surpass the AB engines.

Maybe CCRL does not want to change. It is much easier to pump out low-quality games when using a CPU standard from 2005 in 2019.

You still don't understand that the 'quality' of rating tests does NOT depend on the quality of the hardware (at least for CPU programs, whose rating correlation still remains quite similar).
It depends much more on the quality and accuracy of the conditions of the games (and on having a lot of them). This is exactly what is missing in your games.
Why can't you have fun with your hardware and games w/o those stupid posts with stupid youtube links?
If you think you have a nice game, just post it in readable standard PGN (with eval/depth ofc) and we will see.

As for LC0, as long as the hardware is stated, everyone can extrapolate the ratings to a better GPU, and it does not help an average user to know how it would do on the most expensive GPU, which he does not have or does not intend to buy.

OTOH, I have often stated that CCRL could add the real TC to the PGN header, and at least the GUI used (the listed TC is already there in the event header anyway), but this has not much to do with your bashing of CCRL, while everyone knows the TC is adapted...
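
The suggestion above, recording the real time control next to the listed one, could look something like the following. This is only a sketch: it assumes the standard PGN `TimeControl` tag in its "moves/seconds" form carries the adapted value, and the `Event` text and helper names are hypothetical placeholders, not CCRL's actual header format.

```python
def pgn_time_control(moves: int, minutes: float) -> str:
    """Format a time control as the PGN 'moves/seconds' TimeControl value."""
    return f"{moves}/{round(minutes * 60)}"

def tag_pairs(tags: dict) -> str:
    """Render a mapping as PGN tag-pair lines, one [Name "value"] per line."""
    return "\n".join(f'[{name} "{value}"]' for name, value in tags.items())

header = tag_pairs({
    # Hypothetical event text carrying the listed (nominal) time control.
    "Event": "Example rating list 40/40",
    # Adapted value actually played, using the 15-minute figure from the thread.
    "TimeControl": pgn_time_control(40, 15),
})
print(header)
```

Keeping the nominal value in `Event` and the played value in `TimeControl` would let anyone reading the game file see both without consulting the website.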

Rebel · Post by **Rebel** » Tue Jul 02, 2019 11:44 am

mwyoung wrote: ↑Tue Jul 02, 2019 10:39 am
Rebel wrote: ↑Tue Jul 02, 2019 8:43 am
mwyoung wrote: ↑Tue Jul 02, 2019 3:14 am But that is no excuse to mislead people, whatever excuse you want to give for CCRL.
There is no misleading.
Yep, none at all. The 40 in 40 list is really 15 in 40. The 4 in 40 list is really 1.5 in 40.

You can criticize the method with all the juicy qualifications you like but you can't call the method misleading when the method is explained.

mwyoung · Post by **mwyoung** » Tue Jul 02, 2019 1:18 pm

Guenther wrote: ↑Tue Jul 02, 2019 11:42 am
mwyoung wrote: ↑Tue Jul 02, 2019 11:00 am
mkchan wrote: ↑Tue Jul 02, 2019 7:45 am
Sure, but accuracy has nothing to do with the world or anything. It has to do with math. More games = more accuracy. It is irrelevant that people don't understand why CCRL is a better measure than TCEC. TCEC is more fun and very popular sure, as are all famous tournaments. Not as many people look at the FIDE rating db so is that enough reason to fudge it? Maybe to get the real ratings in FIDE one should introduce all the chess engines to that pool and the humans will probably get capped at around 2400 in blitz rating.

Offer mathematical evidence of incorrect CCRL given the details on the website, and I have no doubt they'll change the result.

To sum up:
1. Rating pools measured in small time deltas do not change significantly.
2. If you can provide the RTX 2080 to put lc0 on the top, by all means contribute to CCRL and adhere to their standards, you'll get your result.
3. Understand that this is a best effort by volunteers and it is of high quality nevertheless.
Are you telling me that the premier chess testing group cannot test NN engines correctly?

It will be a sorry excuse for a ratings list as more NN engines surpass the AB engines.

Maybe CCRL does not want to change. It is much easier to pump out low-quality games when using a CPU standard from 2005 in 2019.
You still don't understand that the 'quality' of rating tests does NOT depend on the quality of the hardware (at least for CPU programs, whose rating correlation still remains quite similar).
It depends much more on the quality and accuracy of the conditions of the games (and on having a lot of them). This is exactly what is missing in your games.
Why can't you have fun with your hardware and games w/o those stupid posts with stupid youtube links?
If you think you have a nice game, just post it in readable standard PGN (with eval/depth ofc) and we will see.

As for LC0, as long as the hardware is stated, everyone can extrapolate the ratings to a better GPU, and it does not help an average user to know how it would do on the most expensive GPU, which he does not have or does not intend to buy.

OTOH, I have often stated that CCRL could add the real TC to the PGN header, and at least the GUI used (the listed TC is already there in the event header anyway), but this has not much to do with your bashing of CCRL, while everyone knows the TC is adapted...

Nice of you to join the discussion.
And without the lies of your last post.

Guenther, if everyone knows it was adapted, then why all the panic over my graphic?

mkchan · Post by **mkchan** » Tue Jul 02, 2019 2:17 pm

This thread is full of eristic intent and sophistry from the author. I vote we shut the thread down as non-productive.
