CCRL Chess Engine Match Standards. How obsolete are they?

Hugo · Post by **Hugo** » Mon Jul 01, 2019 5:01 pm

That's really weird, Mark.

You run ponder ON matches with clear thread overload.
You "test" engines... 23 games, 43 games, 37 games, 102 games... and so on. There is NO system at all in your tests.
3 minutes, 5 minutes, 90 minutes.....
But of course, in YOUR world all others are too blind to see "the system" and the "easy" overload of threads - except you.

And now, you start moaning about other tester groups.
This is really mind sick.
For you, every kind of attention seems to be worthwhile, even if it is just the bad kind. I am sure this guides you all the way in your life.

You could grow so much in here, it's a great source of knowledge. It's just that your ego will not accept anything here.

C.K.

mwyoung · Post by **mwyoung** » Mon Jul 01, 2019 5:43 pm

Hugo wrote: ↑Mon Jul 01, 2019 5:01 pm That's really weird, Mark.

You run ponder ON matches with clear thread overload.
You "test" engines... 23 games, 43 games, 37 games, 102 games... and so on. There is NO system at all in your tests.
3 minutes, 5 minutes, 90 minutes.....
But of course, in YOUR world all others are too blind to see "the system" and the "easy" overload of threads - except you.

And now, you start moaning about other tester groups.
This is really mind sick.
For you, every kind of attention seems to be worthwhile, even if it is just the bad kind. I am sure this guides you all the way in your life.

You could grow so much in here, it's a great source of knowledge. It's just that your ego will not accept anything here.

C.K.

Even my overloaded CPU is testing better than the smartphone you carry in your pocket.

40,000,000 nps game average.

How do you feel as a tester about CCRL using a testing standard that is weaker than a smartphone?

And then misleading the public about the real time controls used in the CCRL ratings list.

We all should expect better from CCRL, the premier testing group. Don't you think so?

JVMerlino · Post by **JVMerlino** » Mon Jul 01, 2019 5:50 pm

mwyoung wrote: ↑Mon Jul 01, 2019 5:43 pm We all should expect better from CCRL.

Clearly using the royal "we" here....

sovaz1997 · Post by **sovaz1997** » Mon Jul 01, 2019 5:53 pm

mwyoung wrote: ↑Mon Jul 01, 2019 5:43 pm
Even my overloaded CPU is testing better than the smartphone you carry in your pocket.

40,000,000 nps game average.

How do you feel as a tester about CCRL using a testing standard that is weaker than a smartphone?

And then misleading the public about the real time controls used in the CCRL ratings list.

We all should expect better from CCRL, the premier testing group. Don't you think so?

I find their testing very good. At the same time I consider your testing disgusting, sorry. This is IMHO.

mwyoung · Post by **mwyoung** » Mon Jul 01, 2019 5:58 pm

JVMerlino wrote: ↑Mon Jul 01, 2019 5:50 pm
mwyoung wrote: ↑Mon Jul 01, 2019 5:43 pm We all should expect better from CCRL.
Clearly using the royal "we" here....

Why would we want such a low testing standard from CCRL? Subject matter experts tell me that when you use such weak standards, your draw rate decreases, making the rating list more subject to rating errors.

What do you think?

mwyoung · Post by **mwyoung** » Mon Jul 01, 2019 6:40 pm

sovaz1997 wrote: ↑Mon Jul 01, 2019 5:53 pm
I find their testing very good. At the same time I consider your testing disgusting, sorry. This is IMHO.

No, that is alright.

If it were not for subject matter experts explaining to me that a game average of 40 million nps for the AB engine and 80 Knps for Lc0 is not good, I would not have realized just how badly CCRL's testing standard of dumbing down their hardware speed to less than a smartphone was hurting their ratings.

I want to thank the subject matter experts for their help.

lkaufman · Post by **lkaufman** » Mon Jul 01, 2019 6:52 pm

Each of the big three rating lists (CCRL, CEGT, FastGM; I omit SSDF only because by the time they have a reliable rating for an engine, it is already obsolete!) has to work with the people and hardware that it has. FastGM has the most scientific lists, for single CPU only, by virtue of large samples, round-robin tests, and increment use. The others have to mix results from different testers and hardware, so there is a bit more "noise" in the data, but they do a very good job despite this.

Sure, the 40/40 CCRL list should be renamed 40/15, but that's a minor quibble; we all know the truth here. As long as you compare 1 GPU with 1 CPU, the GPU they use seems reasonable. The use of Bayeselo (by CCRL only) artificially contracts the rating differences, but they might argue that this makes them more realistic in terms of how they would rate vs humans.

The use of repeating time controls is a serious waste of resources, but is done to preserve historical continuity. Personally I would advocate just switching to increment play with the base set to the 40-move time and the increment (in seconds) set to half that number of minutes, and just combining the data from now on, as the time controls will be equivalent for a typical 60-move game. So 40/15 would be 15' + 7.5" inc. This would allow perhaps 50% or so more games per hour at the same quality, significantly diminishing error margins. No need for ponder; that's mostly a waste of resources.

But despite these issues, the three lists do a great job overall considering that they are volunteers, and most of us involved with computer chess very much appreciate what they do.
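For readers who want to check the 60-move equivalence, here is a minimal Python sketch of the arithmetic. It is not anything CCRL or the poster actually runs. The function names are made up for illustration, and the repeating control is modelled under the assumption that an engine paces itself evenly, spending 1/40 of each period per move.

```python
def repeating_budget_s(moves: int, minutes_per_period: float,
                       moves_per_period: int = 40) -> float:
    """Seconds one side spends over `moves` moves under a repeating control
    such as 40/15, ASSUMING even pacing (1/40 of the period per move)."""
    return moves * (minutes_per_period * 60.0 / moves_per_period)


def increment_budget_s(moves: int, base_minutes: float,
                       increment_s: float) -> float:
    """Seconds one side has available over `moves` moves under base + increment."""
    return base_minutes * 60.0 + moves * increment_s


def suggested_increment_s(minutes_per_period: float) -> float:
    """Increment in seconds equal to half the base time in minutes,
    as proposed in the post (15 minutes -> 7.5 seconds)."""
    return minutes_per_period / 2.0


if __name__ == "__main__":
    base = 15.0  # the 40/15 control mentioned in the post
    inc = suggested_increment_s(base)
    for moves in (40, 60, 80, 100):
        rep = repeating_budget_s(moves, base)
        incr = increment_budget_s(moves, base, inc)
        print(f"{moves:3d} moves: repeating {rep / 60:5.1f} min, "
              f"increment {incr / 60:5.1f} min")
```

Under that pacing assumption the two budgets meet at exactly 60 moves (22.5 minutes per side for a 15-minute base). Shorter games get more time per side under the increment control, and longer games get less.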

ThatsIt · Post by **ThatsIt** » Mon Jul 01, 2019 7:40 pm

Hi Larry,

"we" (CEGT)

use 3'+1" pb=on and 5'+3", also pb=on, in addition
to our "repeating time controls lists", to compare
whether there are differences between them or whether
there are engines which are better/worse in this regard.
That's no waste of resources in my view.

Best wishes,
G.S.
(CEGT team)

mkchan · Post by **mkchan** » Mon Jul 01, 2019 9:46 pm

To me this seems more like an attack directed at the CCRL team than an attempt to get to the point. From what I read, the issue is the advertised 40/4 and 40/40 time controls, which are actually scaled to a CPU that was chosen when the website was started. They clearly state, right at the start, the approximate equivalents for modern CPUs and the benchmarking methodology. I see no bait and switch here at all. The rating list is still indicative of the relative strengths of engines to pretty good accuracy.

If you have such valuable criticism to make, why not start a list of your own instead of making the entire established community conform to your personal interpretation of their list? In fact, make it pay-to-view as well, because it's going to be the one true list with exact rating values measured for each new CPU that comes out, right? Don't forget to get a few GMs into the pool to better reflect FIDE rating numbers so that we're not fooling anyone about SF's 3400 rating. I'm sure everyone would flock to that.

mwyoung · Post by **mwyoung** » Mon Jul 01, 2019 10:03 pm

mkchan wrote: ↑Mon Jul 01, 2019 9:46 pm To me this seems more like an attack directed at the CCRL team than an attempt to get to the point. From what I read, the issue is the advertised 40/4 and 40/40 time controls, which are actually scaled to a CPU that was chosen when the website was started. They clearly state, right at the start, the approximate equivalents for modern CPUs and the benchmarking methodology. I see no bait and switch here at all. The rating list is still indicative of the relative strengths of engines to pretty good accuracy.

If you have such valuable criticism to make, why not start a list of your own instead of making the entire established community conform to your personal interpretation of their list? In fact, make it pay-to-view as well, because it's going to be the one true list with exact rating values measured for each new CPU that comes out, right? Don't forget to get a few GMs into the pool to better reflect FIDE rating numbers so that we're not fooling anyone about SF's 3400 rating. I'm sure everyone would flock to that.

I was told by subject matter experts that CCRL's standards are very poor.

And it is clear from reading the posts in this discussion that the practice of CCRL labeling its rating tests as 40/4 and 40/40 needs to be addressed. CCRL can fix this issue today.

So valuable criticism has been made.
