CCRL Chess Engine Match Standards. How obsolete are they?

Discussion of computer chess matches and engine tournaments.

Moderators: bob, hgm, Harvey Williamson

MikeB
Posts: 3552
Joined: Thu Mar 09, 2006 5:34 am
Location: Pen Argyl, Pennsylvania

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by MikeB » Mon Jul 01, 2019 8:04 pm

Guenther wrote:
Mon Jul 01, 2019 9:01 am
AndrewGrant wrote:
Sun Jun 30, 2019 10:21 pm
elcabesa wrote:
Sun Jun 30, 2019 10:11 pm
so the real problem for you is that LC0 and NN behave so poorly?
That tends to be the real motivation for those with an obsession with LC0.
I have just ignored him since he couldn't get over people stating it is nonsense to use more threads
in a ponder on match, as even hyperthreads are available (plus youtube streaming!).
Now he tries to put someone else down instead...

(Ofc he has no clue of stats and boldly states the wonderful average nps or depths and he gets off with it, because
he rarely posts real pgn files at all and the few I have seen were so obfuscated through CB software that I didn't care
to write another script making them usable and do some data mining - he will never acknowledge sudden erratic
time to depth abnormalities, which surely are there, hidden in his average good numbers...
Moreover the habit of only posting youtube links here for mere ccc games should be already forbidden, no one wants
to be spammed by watching a ccc game)
+1 Bullseye!

sovaz1997
Posts: 235
Joined: Sun Nov 13, 2016 9:37 am

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by sovaz1997 » Mon Jul 01, 2019 8:30 pm

mwyoung wrote:
Sun Jun 30, 2019 10:41 pm
AndrewGrant wrote:
Sun Jun 30, 2019 10:21 pm
elcabesa wrote:
Sun Jun 30, 2019 10:11 pm
so the real problem for you is that LC0 and NN behave so poorly?
That tends to be the real motivation for those with an obsession with LC0.
I have an obsession with testing chess computers since 1982.
And some of you A/B guys really hate Lc0.

And my favorite engine is always the best-tested engine.

Some may remember when Houdini was best, and I tested Stockfish as best.
The same kind of fan boys said I had an obsession with Stockfish.
I called it as Stockfish was tested. The Best....Now Lc0 is best by test.
Here I agree. Indeed, many are annoyed at Lc0 and try to belittle it. But it works the other way too: some Leela fans hate Stockfish. And one of the reasons is described here: https://en.wikipedia.org/wiki/Anchoring :D
Zevra 2 - my chess engine.
Binary, source and description here: https://github.com/sovaz1997/Zevra2

konsolas
Posts: 182
Joined: Sun Jun 12, 2016 3:44 pm
Location: London
Full name: Vincent
Contact:

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by konsolas » Mon Jul 01, 2019 9:16 pm

I admire CCRL's commitment to their scientific approach of benchmarking and standardising time controls to ensure consistent conditions across all their games. Whether or not the list is called 40/40 or 40/20 is just nitpicking given its history and the fact that the "40" is immediately explained at the top of the page.

tmokonen
Posts: 1038
Joined: Sun Mar 12, 2006 5:46 pm
Location: Vancouver

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by tmokonen » Mon Jul 01, 2019 10:12 pm

Chess engine fanboyism is stupid.

mwyoung
Posts: 1642
Joined: Wed May 12, 2010 8:00 pm

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by mwyoung » Mon Jul 01, 2019 11:20 pm

konsolas wrote:
Mon Jul 01, 2019 9:16 pm
I admire CCRL's commitment to their scientific approach of benchmarking and standardising time controls to ensure consistent conditions across all their games. Whether or not the list is called 40/40 or 40/20 is just nitpicking given its history and the fact that the "40" is immediately explained at the top of the page.
So how long is it credible to call, for example, the 4/40 test a 4/40 test? We are currently at 1.5/40. In the next few years it will be 1/40, then 45 seconds in 40, and so on...

This is archaic right now. And if I did not have a valid point, this graphic would not have caused the reaction seen here, since the argument goes that this is already known by everyone.
Attachment: CCRL Chess Engine Testing Standards...jpg
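The arithmetic behind the shrinking time control can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: that the wall-clock time scales inversely with a single hardware speedup factor relative to the reference machine. The function name and the speedup values are hypothetical, and the thread does not state how CCRL actually calibrates its machines.

```python
def scaled_minutes(nominal_minutes: float, speedup: float) -> float:
    """Wall-clock minutes on faster hardware that correspond to
    `nominal_minutes` on the reference machine.

    Assumption: time scales inversely with one speedup factor.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return nominal_minutes / speedup


if __name__ == "__main__":
    nominal = 4.0  # the nominal "4 minutes for 40 moves"
    # Hypothetical speedup factors; 4.0 / 1.5 reproduces the 1.5/40 figure above.
    for speedup in (1.0, 4.0 / 1.5, 4.0, 16.0 / 3.0):
        minutes = scaled_minutes(nominal, speedup)
        print(f"speedup {speedup:5.2f}x -> {minutes * 60:6.1f} seconds for 40 moves")
```

Under this assumption, a machine about 2.67 times faster than the reference turns the nominal 4 minutes into 1.5 minutes, and a machine about 5.33 times faster turns it into 45 seconds.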
Professing themselves to be wise, they became fools,
Take on me. foes 0

mwyoung
Posts: 1642
Joined: Wed May 12, 2010 8:00 pm

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by mwyoung » Tue Jul 02, 2019 12:32 am

MikeB wrote:
Mon Jul 01, 2019 8:04 pm
Guenther wrote:
Mon Jul 01, 2019 9:01 am
AndrewGrant wrote:
Sun Jun 30, 2019 10:21 pm
elcabesa wrote:
Sun Jun 30, 2019 10:11 pm
so the real problem for you is that LC0 and NN behave so poorly?
That tends to be the real motivation for those with an obsession with LC0.
I have just ignored him since he couldn't get over people stating it is nonsense to use more threads
in a ponder on match, as even hyperthreads are available (plus youtube streaming!).
Now he tries to put someone else down instead...

(Ofc he has no clue of stats and boldly states the wonderful average nps or depths and he gets off with it, because
he rarely posts real pgn files at all and the few I have seen were so obfuscated through CB software that I didn't care
to write another script making them usable and do some data mining - he will never acknowledge sudden erratic
time to depth abnormalities, which surely are there, hidden in his average good numbers...
Moreover the habit of only posting youtube links here for mere ccc games should be already forbidden, no one wants
to be spammed by watching a ccc game)
+1 Bullseye!
Mike is this better than a CCRL Smart phone? My computer works just fine. What are you talking about....

Here is the live link for anyone to see. All the games are recorded, and can be viewed.
live link: https://www.youtube.com/watch?v=QKUPqFE-3yw
Attachment: Better than a smart phone.jpg
Professing themselves to be wise, they became fools,
Take on me. foes 0

Dann Corbit
Posts: 10202
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA
Contact:

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by Dann Corbit » Tue Jul 02, 2019 12:57 am

mwyoung wrote:
Mon Jul 01, 2019 11:20 pm
konsolas wrote:
Mon Jul 01, 2019 9:16 pm
I admire CCRL's commitment to their scientific approach of benchmarking and standardising time controls to ensure consistent conditions across all their games. Whether or not the list is called 40/40 or 40/20 is just nitpicking given its history and the fact that the "40" is immediately explained at the top of the page.
So how long is it credible to call, for example, the 4/40 test a 4/40 test? We are currently at 1.5/40. In the next few years it will be 1/40, then 45 seconds in 40, and so on...

This is archaic right now. And if I did not have a valid point, this graphic would not have caused the reaction seen here, since the argument goes that this is already known by everyone.
CCRL Chess Engine Testing Standards...jpg
The amount of time taken per move is relatively irrelevant.
If it was not, then the ratings for the 40/4 lists would be very different from the 40/40 lists.
But as you can see, they match up pretty well.

It is also like the SSDF list. People often complain that the hardware is not strong enough.
But changing to stronger hardware would be a terrible mistake.
You would be exchanging rulers with millimeter rulings for ropes with knots tied in them every forearm length.
That is because when you have thousands of games played by hundreds of engines you can measure the real strength very, very closely.
If you change things, like going to ultra powerful CPUs or going to longer time control, then your ruler is going to be very crude.
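The ruler analogy can be made concrete with a rough Python sketch of how the uncertainty of a measured Elo difference shrinks as games accumulate. It assumes the standard logistic Elo model and a normal approximation to the match score; the function names are hypothetical, and this is not necessarily how CCRL or SSDF compute their published error bars.

```python
import math


def elo_from_score(p: float) -> float:
    """Elo difference implied by an expected score p (logistic model)."""
    return 400.0 * math.log10(p / (1.0 - p))


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo_difference, approximate 95% half-width in Elo).

    Assumption: normal approximation to the mean score per game.
    """
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the score, which takes values 1, 0.5 or 0.
    var = (wins * (1.0 - p) ** 2
           + draws * (0.5 - p) ** 2
           + losses * (0.0 - p) ** 2) / n
    se = math.sqrt(var / n)
    eps = 1e-9  # keep the bounds inside (0, 1) so the logarithm is defined
    lo = min(max(p - z * se, eps), 1.0 - eps)
    hi = min(max(p + z * se, eps), 1.0 - eps)
    return elo_from_score(p), (elo_from_score(hi) - elo_from_score(lo)) / 2.0


if __name__ == "__main__":
    # Same 25% / 50% / 25% result split at different sample sizes.
    for scale in (1, 10, 100):
        w, d, l = 25 * scale, 50 * scale, 25 * scale
        elo, margin = elo_with_margin(w, d, l)
        print(f"{w + d + l:6d} games: {elo:+.1f} Elo, +/- {margin:.1f}")
```

The half-width falls roughly with the square root of the number of games, which is why a setup that yields thousands of games resolves small strength differences that a few hundred slow games cannot.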

Now, I would also like to see longer time control and more powerful CPUs. But the problems are time and money.
Ideally, we would have 32 core 7nm AMD machines going full tilt at correspondence time controls of a move per day. If we have a thousand machines, we can still get prodigious output.
But nobody has patience to wait a whole day for a move except correspondence players and a few crazy people like me.
And nobody is going to donate a giant pile of 32 core 7nm AMD machines to go full tilt with.

But I suggest that if you can find a way to overcome the money and time problems, we can see stronger systems at longer time control.
But don't forget, they won't tell us much about how strong the programs are for quite a while (until we have enough games for an accurate calibration).
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

mwyoung
Posts: 1642
Joined: Wed May 12, 2010 8:00 pm

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by mwyoung » Tue Jul 02, 2019 1:14 am

Dann Corbit wrote:
Tue Jul 02, 2019 12:57 am
mwyoung wrote:
Mon Jul 01, 2019 11:20 pm
konsolas wrote:
Mon Jul 01, 2019 9:16 pm
I admire CCRL's commitment to their scientific approach of benchmarking and standardising time controls to ensure consistent conditions across all their games. Whether or not the list is called 40/40 or 40/20 is just nitpicking given its history and the fact that the "40" is immediately explained at the top of the page.
So how long is it credible to call, for example, the 4/40 test a 4/40 test? We are currently at 1.5/40. In the next few years it will be 1/40, then 45 seconds in 40, and so on...

This is archaic right now. And if I did not have a valid point, this graphic would not have caused the reaction seen here, since the argument goes that this is already known by everyone.
CCRL Chess Engine Testing Standards...jpg
The amount of time taken per move is relatively irrelevant.
If it was not, then the ratings for the 40/4 lists would be very different from the 40/40 lists.
But as you can see, they match up pretty well.

It is also like the SSDF list. People often complain that the hardware is not strong enough.
But changing to stronger hardware would be a terrible mistake.
You would be exchanging rulers with millimeter rulings for ropes with knots tied in them every forearm length.
That is because when you have thousands of games played by hundreds of engines you can measure the real strength very, very closely.
If you change things, like going to ultra powerful CPUs or going to longer time control, then your ruler is going to be very crude.

Now, I would also like to see longer time control and more powerful CPUs. But the problems are time and money.
Ideally, we would have 32 core 7nm AMD machines going full tilt at correspondence time controls of a move per day. If we have a thousand machines, we can still get prodigious output.
But nobody has patience to wait a whole day for a move except correspondence players and a few crazy people like me.
And nobody is going to donate a giant pile of 32 core 7nm AMD machines to go full tilt with.

But I suggest that if you can find a way to overcome the money and time problems, we can see stronger systems at longer time control.
But don't forget, they won't tell us much about how strong the programs are for quite a while (until we have enough games for an accurate calibration).
I guess chess engine testing is hard, and time consuming. I know, I have spent many thousands of dollars, and thousands of hours since 1982.

But that is no excuse to mislead people, whatever excuse you want to give for CCRL.

What is irrelevant is using a processor from 2005 as your bench to pump out as many low quality games as possible. You know as well as I do what low quality games do to the real ratings. The first thing you sacrifice is the drawing rate of the engines.

I hate to say it, but more people will use TCEC as the real benchmark for the chess engines' true strength if this practice does not stop....
Professing themselves to be wise, they became fools,
Take on me. foes 0

AndrewGrant
Posts: 494
Joined: Tue Apr 19, 2016 4:08 am
Location: U.S.A
Full name: Andrew Grant
Contact:

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by AndrewGrant » Tue Jul 02, 2019 2:09 am

mwyoung wrote:
Tue Jul 02, 2019 12:32 am
Mike is this better than a CCRL Smart phone? My computer works just fine. What are you talking about....
viewtopic.php?p=773654#p773654

Do yourself a favour and read through that. It's the story of how some people failed to listen about what pondering and hyperthreading can do. Ironically, you've actually managed to take it one step further by assigning more threads than there are hyperthreads, but that's another story.

In case you are too stubborn or lazy to actually read the above link and humour the possibility that you are wrong, I'll give you a summary.

1) You are allocating more threads than there are on the CPU
2) This IS causing engines to be subtly screwed out of CPU time. Ironically, it's likely that Leela is hurt more than the AB engines
3) Even without over-allocation, there is no way to trust the OS to not lock an engine out of a thread for a period of time.
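Point 1 can be checked mechanically. The Python sketch below is only an illustration: the function name, the `reserved` parameter and the thread counts in the example are hypothetical, and the real match settings are not given in this thread. It assumes that with ponder on both engines search at the same time, so their thread counts add up, while with ponder off only the side to move is searching.

```python
import os
from typing import Optional, Sequence


def thread_deficit(engine_threads: Sequence[int],
                   ponder: bool,
                   reserved: int = 0,
                   logical_cpus: Optional[int] = None) -> int:
    """Threads demanded beyond what the machine offers.

    A positive result means the match is oversubscribed.
    `reserved` covers other work on the box, e.g. video streaming.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    # Ponder on: both engines search concurrently, so demands add up.
    # Ponder off: only the engine on move is searching.
    demand = sum(engine_threads) if ponder else max(engine_threads)
    return demand + reserved - logical_cpus


if __name__ == "__main__":
    # Hypothetical settings on a machine with 32 logical CPUs.
    print(thread_deficit([32, 32], ponder=True, logical_cpus=32))   # oversubscribed
    print(thread_deficit([32, 32], ponder=False, logical_cpus=32))  # exactly full
    print(thread_deficit([14, 14], ponder=True, reserved=4, logical_cpus=32))
```

A result of zero or below only rules out the over-allocation in point 1. As point 3 says, the OS scheduler can still take a core away from an engine for a while, and this check cannot detect that.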

mwyoung
Posts: 1642
Joined: Wed May 12, 2010 8:00 pm

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by mwyoung » Tue Jul 02, 2019 2:24 am

AndrewGrant wrote:
Tue Jul 02, 2019 2:09 am
mwyoung wrote:
Tue Jul 02, 2019 12:32 am
Mike is this better than a CCRL Smart phone? My computer works just fine. What are you talking about....
viewtopic.php?p=773654#p773654

Do yourself a favour and read through that. It's the story of how some people failed to listen about what pondering and hyperthreading can do. Ironically, you've actually managed to take it one step further by assigning more threads than there are hyperthreads, but that's another story.

In case you are too stubborn or lazy to actually read the above link and humour the possibility that you are wrong, I'll give you a summary.

1) You are allocating more threads than there are on the CPU
2) This IS causing engines to be subtly screwed out of CPU time. Ironically, it's likely that Leela is hurt more than the AB engines
3) Even without over-allocation, there is no way to trust the OS to not lock an engine out of a thread for a period of time.
You are very ignorant on how modern AMD SMT works on a 2950x. Or maybe you are confusing Lc0 for an AB engine. Maybe I am using magic beans.

My computer performance speaks for itself and this is streaming live video with this performance.

Please explain after observing your ignorance....

All games are recorded and can be viewed, both ponder matches and non-ponder matches. Modern technology is amazing. You need to keep up to date. Never mind, I forgot you are stuck in 2005 with CCRL.

Live Link: https://www.youtube.com/watch?v=QKUPqFE-3yw
Last edited by mwyoung on Tue Jul 02, 2019 2:40 am, edited 1 time in total.
Professing themselves to be wise, they became fools,
Take on me. foes 0
