CCRL Chess Engine Match Standards. How obsolete are they?

Discussion of computer chess matches and engine tournaments.

Graham Banks
Posts: 33252
Joined: Sun Feb 26, 2006 9:52 am
Location: Auckland, NZ

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by Graham Banks » Sun Jun 30, 2019 10:53 pm

mwyoung wrote:
Sun Jun 30, 2019 10:25 pm
elcabesa wrote:
Sun Jun 30, 2019 10:11 pm
so the real problem for you is that LC0 and NN behave so poorly?
Is it really correct to advertise a rating list as 4m/40 moves when CCRL is really testing at 1.5m/40 moves?
Is it really correct to advertise a rating list as 40m/40 moves when CCRL is really testing at 15m/40 moves?

And it is not 2005 anymore. You cannot test the modern NN engines without proper hardware.
And you cannot have a fair match to test the NN engines if the A/B engines are running on a system less powerful than a modern smartphone.
We all have different hardware, ranging from a Q8200 (the oldest) through to the most recent i7s and octals.
In order to maintain consistency in our results, we benchmark our machines.

We are just a group of people who enjoy testing engine v engine, and thought that we could combine our efforts to produce rating lists that some might find useful.

You have a nice computer, one that I'm sure most of us would love to be able to afford, but a bit of humility on your part wouldn't go amiss.
My email addresses:
gbanksnz at gmail.com
gbanksnz at yahoo.co.nz

mwyoung
Posts: 1642
Joined: Wed May 12, 2010 8:00 pm

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by mwyoung » Sun Jun 30, 2019 11:06 pm

Graham Banks wrote:
Sun Jun 30, 2019 10:53 pm
mwyoung wrote:
Sun Jun 30, 2019 10:25 pm
elcabesa wrote:
Sun Jun 30, 2019 10:11 pm
so the real problem for you is that LC0 and NN behave so poorly?
Is it really correct to advertise a rating list as 4m/40 moves when CCRL is really testing at 1.5m/40 moves?
Is it really correct to advertise a rating list as 40m/40 moves when CCRL is really testing at 15m/40 moves?

And it is not 2005 anymore. You cannot test the modern NN engines without proper hardware.
And you cannot have a fair match to test the NN engines if the A/B engines are running on a system less powerful than a modern smartphone.
We all have different hardware, ranging from a Q8200 (the oldest) through to the most recent i7s and octals.
In order to maintain consistency in our results, we benchmark our machines.

We are just a group of people who enjoy testing engine v engine, and thought that we could combine our efforts to produce rating lists that some might find useful.

You have a nice computer, one that I'm sure most of us would love to be able to afford, but a bit of humility on your part wouldn't go amiss.
My computer has nothing to do with proper engine testing, and it did not cause CCRL to mislead people about the time control and hardware used.

The SSDF rating list does not do this.

2019-02-28
148673 games played by 377 computers
The Swedish Ratinglist may be quoted in other magazines, but we insist that this will be done in a correct way! We expect that not only the rating figures, but also the number of games and the margin of error will be quoted.

Please read the comment by the chairman, Lars Sandin. You may also download the list in DOS text format. Please note that this is a longer list, with almost all tested computers since SSDF began its work more than 20 years ago!

All games have been played on the tournament level, 40 moves/2 hours followed by 20 moves/each following hour. In matches between PC-programs, two separate PCs have been used, connected with an auto232-cable.

If you have any questions about the list you are welcome to contact us.



    Program                            Rating    +    -  Games  Won  Av.opp
 1  Stockfish 9 x64 1800X 3.6 GHz        3494   32  -30    642  74%    3308
 2  Komodo 12.3 x64 1800X 3.6 GHz        3456   30  -28    640  68%    3321
 3  Stockfish 9 x64 Q6600 2.4 GHz        3446   50  -48    200  57%    3396
 4  Stockfish 8 x64 1800X 3.6 GHz        3432   26  -24   1059  77%    3217
 5  Stockfish 8 x64 Q6600 2.4 GHz        3418   38  -35    440  72%    3251
 6  Komodo 11.01 x64 1800X 3.6 GHz       3397   23  -22   1134  72%    3229
 7  Deep Shredder 13 x64 1800X 3.6 GHz   3360   25  -24    830  66%    3246
 8  Booot 6.3.1 x64 1800X 3.6 GHz        3352   29  -29    560  54%    3319
 9  Komodo 9.1 x64 Q6600 2.4 GHz         3340   21  -20   1435  72%    3175
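For readers wondering how the columns in the SSDF excerpt relate, here is a rough check using the standard logistic Elo approximation, rating ≈ Av.opp + 400·log10(p / (1 − p)). It assumes the "Won" column is the score percentage with draws counted as half a point. SSDF's actual calculation is not described in this thread, so this is only a sanity check, and `performance_rating` is a hypothetical helper.

```python
import math


def performance_rating(avg_opponent: float, score: float) -> float:
    """Logistic Elo estimate: the rating implied by a score fraction
    (draws counted as half a point) against opponents of the given
    average rating. Illustrative only, not SSDF's published method."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # (program, listed rating, "Won" as a score fraction, Av.opp) from the SSDF list
    rows = [
        ("Stockfish 9 x64 1800X 3.6 GHz", 3494, 0.74, 3308),
        ("Stockfish 9 x64 Q6600 2.4 GHz", 3446, 0.57, 3396),
        ("Komodo 9.1 x64 Q6600 2.4 GHz", 3340, 0.72, 3175),
    ]
    for name, listed, score, opp in rows:
        estimate = performance_rating(opp, score)
        print(f"{name}: listed {listed}, estimated {estimate:.0f}")
```

On these rows the estimates land within a few points of the listed ratings; the remaining gaps are presumably because the published ratings are fitted over the whole pool of games rather than row by row.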
Professing themselves to be wise, they became fools,
Take on me. foes 0

Graham Banks
Posts: 33252
Joined: Sun Feb 26, 2006 9:52 am
Location: Auckland, NZ

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by Graham Banks » Sun Jun 30, 2019 11:12 pm

mwyoung wrote:
Sun Jun 30, 2019 11:06 pm
.....CCRL to mislead people on the time control, and hardware used.
Well, we do state the following:

Time control: Equivalent to 40 moves in 40 minutes on Athlon 64 X2 4600+ (2.4 GHz), about 15 minutes on a modern Intel CPU.

And:

CCRL 40/40 Testing Conditions
Time Control: Equivalent to 40 moves in 40 minutes on AMD X2 4600+ at 2.4GHz. We use Crafty 19.17 BH as a benchmark to determine the equivalent time control for a particular machine.
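A minimal sketch of the kind of scaling described above, assuming (an assumption, not CCRL's published formula) that the thinking time needed for the same search effort is inversely proportional to a machine's benchmark speed. The speed figures are hypothetical, chosen only so the ratio matches the stated "40 minutes on the X2 4600+, about 15 minutes on a modern Intel CPU"; `equivalent_minutes` is an illustrative helper.

```python
def equivalent_minutes(reference_minutes: float,
                       reference_speed: float,
                       machine_speed: float) -> float:
    """Scale a time control defined on a reference machine to another machine.

    Assumption (illustrative only): time needed for the same search effort is
    inversely proportional to benchmark speed, e.g. nodes per second from a
    fixed benchmark run.
    """
    if reference_speed <= 0 or machine_speed <= 0:
        raise ValueError("benchmark speeds must be positive")
    return reference_minutes * reference_speed / machine_speed


if __name__ == "__main__":
    # Hypothetical benchmark scores: the modern CPU is 8/3 times faster
    # than the reference AMD X2 4600+.
    X2_4600_SPEED = 1500.0
    MODERN_INTEL_SPEED = 4000.0
    for ref in (40.0, 4.0):
        scaled = equivalent_minutes(ref, X2_4600_SPEED, MODERN_INTEL_SPEED)
        print(f"40 moves in {ref:g} min on the reference -> 40 moves in {scaled:g} min")
```

With the same hypothetical ratio, the 4-minute reference control comes out at 1.5 minutes, the 1.5m/40 figure quoted in this thread.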

mwyoung
Posts: 1642
Joined: Wed May 12, 2010 8:00 pm

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by mwyoung » Sun Jun 30, 2019 11:29 pm

Graham Banks wrote:
Sun Jun 30, 2019 11:12 pm
mwyoung wrote:
Sun Jun 30, 2019 11:06 pm
.....CCRL to mislead people on the time control, and hardware used.
Well, we do state the following:

Time control: Equivalent to 40 moves in 40 minutes on Athlon 64 X2 4600+ (2.4 GHz), about 15 minutes on a modern Intel CPU.

And:

CCRL 40/40 Testing Conditions
Time Control: Equivalent to 40 moves in 40 minutes on AMD X2 4600+ at 2.4GHz. We use Crafty 19.17 BH as a benchmark to determine the equivalent time control for a particular machine.
I pointed that out also. It is called a bait and switch in advertising. People see 40m/40 and you hope they don't read the fine print on what they are really receiving. I also pointed out that the setting is less powerful than a modern smartphone.

You need to advertise the real testing standard in the header: 1.5m/40 and 15m/40, with the testing speed referenced to an AMD X2 4600+ at 2.4 GHz.

SSDF has no problem stating the correct time control and the hardware used to test the engines.

Why is CCRL not willing to do the same?
What are you going to do in the next few years as your hardware improves? Test at 30 seconds for 40 moves, so CCRL can pump out more low-quality games at an even faster rate?

2019-02-28
148673 games played by 377 computers
The Swedish Ratinglist may be quoted in other magazines, but we insist that this will be done in a correct way! We expect that not only the rating figures, but also the number of games and the margin of error will be quoted.

Please read the comment by the chairman, Lars Sandin. You may also download the list in DOS text format. Please note that this is a longer list, with almost all tested computers since SSDF began its work more than 20 years ago!

All games have been played on the tournament level, 40 moves/2 hours followed by 20 moves/each following hour. In matches between PC-programs, two separate PCs have been used, connected with an auto232-cable.

If you have any questions about the list you are welcome to contact us.



    Program                            Rating    +    -  Games  Won  Av.opp
 1  Stockfish 9 x64 1800X 3.6 GHz        3494   32  -30    642  74%    3308
 2  Komodo 12.3 x64 1800X 3.6 GHz        3456   30  -28    640  68%    3321
 3  Stockfish 9 x64 Q6600 2.4 GHz        3446   50  -48    200  57%    3396
 4  Stockfish 8 x64 1800X 3.6 GHz        3432   26  -24   1059  77%    3217
 5  Stockfish 8 x64 Q6600 2.4 GHz        3418   38  -35    440  72%    3251
 6  Komodo 11.01 x64 1800X 3.6 GHz       3397   23  -22   1134  72%    3229
 7  Deep Shredder 13 x64 1800X 3.6 GHz   3360   25  -24    830  66%    3246
 8  Booot 6.3.1 x64 1800X 3.6 GHz        3352   29  -29    560  54%    3319
 9  Komodo 9.1 x64 Q6600 2.4 GHz         3340   21  -20   1435  72%    3175

Modern Times
Posts: 2421
Joined: Thu Jun 07, 2012 9:02 pm

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by Modern Times » Mon Jul 01, 2019 4:38 am

AndrewGrant wrote:
Sun Jun 30, 2019 10:21 pm
elcabesa wrote:
Sun Jun 30, 2019 10:11 pm
so the real problem for you is that LC0 and NN behave so poorly?
That tends to be the real motivation for those with an obsession with LC0.
Indeed.

***** Whether Lc0 is "best" or not is totally dependent on how much money you're willing to spend on a GPU. *****

On this list, Lc0 fans need to compare Lc0's performance to single-CPU engines, as the core of the list is single CPU. In my mind, that was the basis for choosing the GTX 1050. There, Lc0 is just 8 Elo behind Stockfish 10, and Lc0 fans should be happy with that.

http://ccrl.chessdom.com/ccrl/404/cgi/c ... librate=no


In the same way that an engine's performance is boosted by running on 4CPU vs 1CPU, Lc0's performance will be boosted when we start to use stronger GPUs. The GTX 1050 was just to get a foot in the door at the single-CPU level. So the GTX 1050 will effectively be the "1CPU" entry on the list, and a stronger GPU will then provide competition for the "4CPU" tier. That is my thinking anyway.

Opinions expressed here are my own, and not necessarily those of the CCRL Group.

xr_a_y
Posts: 789
Joined: Sat Nov 25, 2017 1:28 pm
Location: France

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by xr_a_y » Mon Jul 01, 2019 4:45 am

mwyoung wrote:
Sun Jun 30, 2019 9:53 pm
xr_a_y wrote:
Sun Jun 30, 2019 4:07 pm
So what? 40/40 means something like 40/10, and 40/4 means something like 40/1. I don't think this is an issue; CCRL testers are using heterogeneous hardware anyway, so "scaling" is always needed.

Do you really want engines to be tested on 8 threads @ 3.8 GHz with 32 GB of hash at a 40-minute TC? For what purpose?
Times have changed. You cannot test the A/B engines on a system setting that performs worse than a smartphone. Without proper testing methods, the NN engines cannot be tested correctly. This is why CCRL's rating and ranking of Lc0 is so wrong. NN engines need modern hardware. This is not 2005 anymore.


    Engine                    Rating    +    −  Score  Av.Opp.  Draws  Games     LOS
 1  Stockfish 10 64-bit 4CPU    3546  +13  −12  69.6%   −124.9  54.9%   2015  100.0%
 2  Houdini 6 64-bit 4CPU       3519   +9   −9  65.5%   −108.4  53.9%   3912   95.8%
 3  Komodo 11.2 64-bit 4CPU     3503  +16  −16  58.2%    −66.6  55.3%   1158   90.4%
 4  Lc0 0.21.1 JH.T6.532 GPU    3487  +17  −17  59.2%    −58.5  52.4%   1100  100.0%

But why focus on the system? I'm saying that the TC is not the same. In any case, comparing CPU and GPU will never be easy, so Lc0's performance against engines that use only the CPU will always be biased. There is a good discussion on the subject of CPU/GPU comparison: should we compare performance (how?), power consumption, price, ...?

But please note that modern hardware is very often used; it is only the TC that changes.

AndrewGrant
Posts: 494
Joined: Tue Apr 19, 2016 4:08 am
Location: U.S.A
Full name: Andrew Grant

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by AndrewGrant » Mon Jul 01, 2019 6:38 am

mwyoung wrote:
Sun Jun 30, 2019 11:29 pm
I pointed that out also. It is called a bait and switch in advertising. People see 40m/40 and you hope they don't read the fine print on what they are really receiving. I also pointed out that the setting is less powerful than a modern smartphone.
It seems like you are the ONLY person who is confused by what CCRL is. This sounds like a personal problem for you to work through, and not something worth posting on these forums about. Your vendetta with CCRL is very bizarre. I imagine you have better things to do. Perhaps start your own rating list, using whatever you feel is appropriate hardware for the current year. Although, if you were not convinced by the recent thread that pondering with every CPU in use is a stupid idea, then you may find another vindictive user posting here complaining about YOUR rating list being confusing.

Guenther
Posts: 3111
Joined: Wed Oct 01, 2008 4:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by Guenther » Mon Jul 01, 2019 9:01 am

AndrewGrant wrote:
Sun Jun 30, 2019 10:21 pm
elcabesa wrote:
Sun Jun 30, 2019 10:11 pm
so the real problem for you is that LC0 and NN behave so poorly?
That tends to be the real motivation for those with an obsession with LC0.
I have just ignored him since he couldn't get over people stating that it is nonsense to use more threads in a ponder-on match than there are even hyperthreads available (plus YouTube streaming!).
Now he tries to put someone else down instead...

(Of course he has no clue about statistics and boldly states the wonderful average NPS or depths, and he gets away with it because he rarely posts real PGN files at all, and the few I have seen were so obfuscated by CB software that I didn't care to write another script to make them usable and do some data mining. He will never acknowledge sudden, erratic time-to-depth abnormalities, which are surely there, hidden in his good average numbers.
Moreover, the habit of posting only YouTube links here for mere CCC games should already be forbidden; no one wants to be spammed into watching a CCC game.)
Current foe list count : [101]
http://rwbc-chess.de/chronology.htm

mwyoung
Posts: 1642
Joined: Wed May 12, 2010 8:00 pm

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by mwyoung » Mon Jul 01, 2019 11:09 am

Guenther wrote:
Mon Jul 01, 2019 9:01 am
AndrewGrant wrote:
Sun Jun 30, 2019 10:21 pm
elcabesa wrote:
Sun Jun 30, 2019 10:11 pm
so the real problem for you is that LC0 and NN behave so poorly?
That tends to be the real motivation for those with an obsession with LC0.
I have just ignored him since he couldn't get over people stating that it is nonsense to use more threads in a ponder-on match than there are even hyperthreads available (plus YouTube streaming!).
Now he tries to put someone else down instead...

(Of course he has no clue about statistics and boldly states the wonderful average NPS or depths, and he gets away with it because he rarely posts real PGN files at all, and the few I have seen were so obfuscated by CB software that I didn't care to write another script to make them usable and do some data mining. He will never acknowledge sudden, erratic time-to-depth abnormalities, which are surely there, hidden in his good average numbers.
Moreover, the habit of posting only YouTube links here for mere CCC games should already be forbidden; no one wants to be spammed into watching a CCC game.)
Guenther, I thought we were all about standards here. Now CCRL does not want to talk about them. I was sure everyone here wanted a discussion about standards, and I know there is much to discuss about CCRL.

mwyoung
Posts: 1642
Joined: Wed May 12, 2010 8:00 pm

Re: CCRL Chess Engine Match Standards. How obsolete are they?

Post by mwyoung » Mon Jul 01, 2019 12:39 pm

AndrewGrant wrote:
Mon Jul 01, 2019 6:38 am
mwyoung wrote:
Sun Jun 30, 2019 11:29 pm
I pointed that out also. It is called a bait and switch in advertising. People see 40m/40 and you hope they don't read the fine print on what they are really receiving. I also pointed out that the setting is less powerful than a modern smartphone.
It seems like you are the ONLY person who is confused by what CCRL is. This sounds like a personal problem for you to work through, and not something worth posting on these forums about. Your vendetta with CCRL is very bizarre. I imagine you have better things to do. Perhaps start your own rating list, using whatever you feel is appropriate hardware for the current year. Although, if you were not convinced by the recent thread that pondering with every CPU in use is a stupid idea, then you may find another vindictive user posting here complaining about YOUR rating list being confusing.
Someone needs to say it, so I said it. The testing standards at CCRL are very poor, and on top of that CCRL misleads people about the time controls used in its rating list. Very poor standards.
