Leela Chess Zero 42656 vs Stockfish 210619

Discussion of computer chess matches and engine tournaments.

Moderators: hgm, Rebel, chrisw

Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: Leela Chess Zero 42656 vs Stockfish 210619

Post by Raphexon »

mwyoung wrote: Sun Jun 30, 2019 12:18 pm
Raphexon wrote: Sun Jun 30, 2019 10:23 am
mwyoung wrote: Sat Jun 29, 2019 2:00 pm
Nordlandia wrote: Sat Jun 29, 2019 9:10 am mwyoung: the way I interpret your replies is that you're afraid of sacrificing speed (a modest speed penalty), something that is most necessary when running with ponder enabled.

Perhaps you should purchase another identical machine; then two-computer matches would solve all of the issues mentioned in this thread.
You bring up a good point about sacrificing speed.
Did you know CCRL is testing chess engines by using the speed of an Athlon 64 X2 4600+ as a gauge to slow down their computers?
The Athlon 64 X2 4600+ was released in 2005. Almost 15 years ago.
You can buy an Athlon 64 X2 4600+ right now on eBay for around $5.

Did you know the Athlon 64 X2 4600+ has a Cinebench R15 score of 90? My 2950X has a score of 3700. That is like 40 times faster.

On my computer I would have to use a time control of 5 seconds in 40 moves if I used the Athlon 64 X2 4600+ as a gauge to slow down my computer.

What kind of clown car show is CCRL running......
Cinebench isn't NPS performance.
Besides your 2950x has a lot more threads and CCRL only tests with 1 and 4 cores.
Your single core performance is at most 3 times as good as the old Athlon. (probably more like 2 times as good)
You test with more threads because we live in 2019. You test at the full potential of the CPU, RAM speed and size. Not dumbing it down to a CPU that came out in 2005.

The maximum hash size CCRL is allowed to use is 1 GB. This is the year 2019, not 2005.

CCRL is only testing with 5-man TB. This is the year 2019....

CCRL testing standards are obsolete, and irrelevant in 2019.
Tell me how Stockfish is tested, and then tell me again why CCRL testing is obsolete.
CCRL is a very accurate predictor of engine strength even on stronger hardware/longer TC.
Besides, testing at shorter TC makes it much easier to test many engines plenty of times.

Or would you rather have the numbers changed so it's 40/20 or 40/2 based on an i5-4460?
Or 40/15 on the latest Ryzen?
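
The scaling being argued over can be put into numbers with a quick sketch. It assumes the reference time control is 40 moves in 4 minutes on the Athlon (the "40/2" figure above reads like a halving of that, but this is an assumption, not something stated in the thread) and that the time allowance shrinks linearly with the speed ratio. The name `scaled_time_control` and the constant `BASE_40_4` are made up for illustration.

```python
def scaled_time_control(base_seconds: float, speed_ratio: float) -> float:
    """Seconds allowed for the same number of moves on hardware that is
    `speed_ratio` times faster than the reference machine."""
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    return base_seconds / speed_ratio


# Assumption: 40 moves in 4 minutes on the reference Athlon 64 X2 4600+.
BASE_40_4 = 4 * 60

# Whole-chip Cinebench R15 ratio quoted in the thread (3700 vs 90).
cinebench_ratio = 3700 / 90
print(round(cinebench_ratio, 1))                                   # 41.1
print(round(scaled_time_control(BASE_40_4, cinebench_ratio), 1))   # 5.8

# Single-core estimates from the reply (2x to 3x the Athlon).
for ratio in (2, 3):
    print(ratio, scaled_time_control(BASE_40_4, ratio))            # 2 120.0 / 3 80.0
```

Under these assumptions the whole-chip ratio gives roughly the "5 seconds in 40 moves" figure, while the single-core estimates give 80 to 120 seconds for 40 moves, which is where the two positions diverge.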
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Leela Chess Zero 42656 vs Stockfish 210619

Post by mwyoung »

Well, I guess it may be a problem when the CCRL testing standard is less powerful than a smartphone.

I guess everyone can post test results here from less than a smartphone and call it good. As long as we are consistent, and obsolete at the same time.

And there are a lot of people who are very accurate predictors of engine strength. Why does CCRL have to have the worst and most misleading testing standards?
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Leela Chess Zero 42656 vs Stockfish 210619

Post by mwyoung »


And where is CCRL's very accurate prediction with Lc0? Try running Lc0 on something other than an eBay sale graphics card. But to do this you have to run the A/B engines on more than a smartphone.

Rank  Engine                      Elo    +    −   Score  Av.Op.  Draws  Games  LOS
1     Stockfish 10 64-bit 4CPU    3546  +13  −12  69.6%  −124.9  54.9%   2015  100.0%
2     Houdini 6 64-bit 4CPU       3519   +9   −9  65.5%  −108.4  53.9%   3912  95.8%
3     Komodo 11.2 64-bit 4CPU     3503  +16  −16  58.2%   −66.6  55.3%   1158  90.4%
4     Lc0 0.21.1 JH.T6.532 GPU    3487  +17  −17  59.2%   −58.5  52.4%   1100  100.0
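
To put the rating gaps in the list above into match terms, here is a minimal sketch using the standard logistic Elo expectation. This is an assumption for illustration: the rating tool behind the list may use a somewhat different model, so the numbers are only indicative. The function name `expected_score` is made up for this example; the ratings are copied from the list.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for the higher-rated side under the
    standard logistic Elo model, given a rating difference in Elo."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Ratings as posted in the list above.
ratings = {
    "Stockfish 10 64-bit 4CPU": 3546,
    "Houdini 6 64-bit 4CPU": 3519,
    "Komodo 11.2 64-bit 4CPU": 3503,
    "Lc0 0.21.1 JH.T6.532 GPU": 3487,
}

top = ratings["Stockfish 10 64-bit 4CPU"]
for name, elo in ratings.items():
    diff = top - elo
    print(f"{name}: {diff:+d} Elo behind, Stockfish expects {expected_score(diff):.3f}")
```

Under this model the 59 Elo gap between Stockfish 10 and Lc0 0.21.1 corresponds to an expected score of roughly 0.58 for Stockfish on the list's hardware.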
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.