Stockfish 4


Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Stockfish 4

Post by Don »

Modern Times wrote:It isn't even 8 Elo - 7-3 =4 or to be generous, 8-2 = 6.

But I still don't buy it. I have not seen any proof that Komodo benefits more than any other engines from popcount. It would need a huge amount of games to measure that with say 95% certainty.

I ran off a few quick games on my tester to expose how ignorant you are of this issue:

Rank    ELO     +/-    Games    Score  Player
---- ------- ------ -------- --------  ----------------------------
   1  3020.1   13.9     1278   52.895  pop   
   2  3000.0   13.9     1278   47.105  noPop 

w/l/d: 368 276 634    49.61 percent draws


      TIME       RATIO    log(r)     NODES    log(r)  ave DEPTH    GAMES   PLAYER
 ---------  ----------  --------  --------  --------  ---------  -------   -----
    0.0776       0.996    -0.004     0.091     0.080    11.0568     1278   pop
    0.0780       1.000     0.000     0.084     0.000    10.9263     1278   noPop
These are 3-second Fischer games with 0.03 increment, so that I could get a large sample in a few minutes. Even though the error margins are still very high, the difference in average depth is a non-trivial 13 percent of a ply.

I'm actually very surprised that you cannot even detect a difference. As Graham Banks puts it, "SSE is a fallacy", and 8 Elo meets his definition of minimal.

Please stop testing Komodo - I hereby make a public and formal request for you to withdraw Komodo from your lists and your testing. Even though it has returned good results I just don't trust your "seat of the pants" and "ad-hoc" methodology.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Modern Times
Posts: 3771
Joined: Thu Jun 07, 2012 11:02 pm

Re: Stockfish 4

Post by Modern Times »

Don wrote: Please stop testing Komodo - I hereby make a public and formal request for you to withdraw Komodo from your lists and your testing. Even though it has returned good results I just don't trust your "seat of the pants" and "ad-hoc" methodology.
I have not tested Komodo 5.1 at all. After the last post of yours with a bad attitude a month or so back, I deleted all versions of Doch and Komodo from my hard drives. So you need not worry: I'm a step ahead of you, having already decided a month or so ago to boycott your engine and never touch anything from you ever again. Other CCRL testers will confirm the timing of that decision.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Stockfish 4

Post by Adam Hair »

Don wrote:
Modern Times wrote:It isn't even 8 Elo - 7-3 =4 or to be generous, 8-2 = 6.

But I still don't buy it. I have not seen any proof that Komodo benefits more than any other engines from popcount. It would need a huge amount of games to measure that with say 95% certainty.

I ran off a few quick games on my tester to expose how ignorant you are of this issue:

Rank    ELO     +/-    Games    Score  Player
---- ------- ------ -------- --------  ----------------------------
   1  3020.1   13.9     1278   52.895  pop   
   2  3000.0   13.9     1278   47.105  noPop 

w/l/d: 368 276 634    49.61 percent draws


      TIME       RATIO    log(r)     NODES    log(r)  ave DEPTH    GAMES   PLAYER
 ---------  ----------  --------  --------  --------  ---------  -------   -----
    0.0776       0.996    -0.004     0.091     0.080    11.0568     1278   pop
    0.0780       1.000     0.000     0.084     0.000    10.9263     1278   noPop
These are 3-second Fischer games with 0.03 increment, so that I could get a large sample in a few minutes. Even though the error margins are still very high, the difference in average depth is a non-trivial 13 percent of a ply.

I'm actually very surprised that you cannot even detect a difference. As Graham Banks puts it, "SSE is a fallacy", and 8 Elo meets his definition of minimal.
In a certain sense, it is minimal. It depends on the conditions.

The gain from using popcount is definite. If I had a processor that understood SSE 4.2 instructions (specifically popcount), I would measure the difference. I am sure I would find that the popcount compile would be stronger. But, it can be hidden in the CCRL results due to different computers being used.
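To illustrate what the instruction replaces, here is a minimal Python sketch of a software bit count on a 64-bit bitboard, using the classic SWAR (parallel bit-summing) technique. It is not taken from Komodo or Stockfish; the names `popcount_swar` and `popcount_reference` are hypothetical and chosen only for this example. A hardware popcount does the same job in a single instruction.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # keep values inside 64 bits, like a bitboard


def popcount_swar(x: int) -> int:
    """Count set bits in a 64-bit word without a hardware instruction."""
    x &= MASK64
    # Sum adjacent bits into 2-bit fields.
    x = x - ((x >> 1) & 0x5555555555555555)
    # Sum adjacent 2-bit fields into 4-bit fields.
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    # Sum adjacent 4-bit fields into bytes.
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    # Add all bytes together; the total ends up in the top byte.
    return ((x * 0x0101010101010101) & MASK64) >> 56


def popcount_reference(x: int) -> int:
    """Slow but obviously correct bit count, used only as a cross-check."""
    return bin(x & MASK64).count("1")


if __name__ == "__main__":
    for board in (0, 1, 0xFF00, 0x8000000000000001, MASK64):
        print(hex(board), popcount_swar(board), popcount_reference(board))
```

The software version needs a dozen or so arithmetic operations per call, which is why an engine that counts bits very often can gain more from the hardware instruction than one that rarely does.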

I used to have two computers that were identical in every way except for the processors (E8400 and QX6700). I ran two 26,000 game gauntlets for Gaviota, one gauntlet per computer. Gaviota, under the same conditions, was 10 Elo weaker on the E8400 than on the QX6700. A difference of 8 Elo may not show up given these conditions.

The CCRL and CEGT lack precision due to the use of different computers, which means the difference between popcount and non-popcount can be blurred. IPON is more precise, though it probably is a less accurate indicator of an engine's relative strength on a random computer. As Ingo indicated yesterday, there are tradeoffs involved.
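To give a feel for how many games "a huge amount" is, here is a rough Python sketch that estimates the number of games needed for a given 95% error margin. It rests on assumptions that are not from this thread: two roughly equal engines, independent games, a fixed draw ratio, and a normal approximation. The function name `games_needed` and its parameters are hypothetical.

```python
import math


def games_needed(elo_margin: float, draw_ratio: float = 0.5, z: float = 1.96) -> int:
    """Games needed so the 95% error margin shrinks to about `elo_margin` Elo.

    Assumes near-equal engines (expected score close to 0.5), so the
    per-game score variance is approximately (1 - draw_ratio) / 4.
    """
    sd_per_game = math.sqrt((1.0 - draw_ratio) / 4.0)
    # Slope of the logistic Elo curve at a score of 0.5, in Elo per score unit.
    elo_per_score = 400.0 / (math.log(10) * 0.25)
    return math.ceil((z * sd_per_game * elo_per_score / elo_margin) ** 2)


if __name__ == "__main__":
    for margin in (13.5, 8.0, 4.0, 2.0):
        print(f"+/- {margin:>4} Elo: about {games_needed(margin)} games")
```

Under these assumptions the margin halves only when the number of games quadruples, and an 8 Elo difference is only resolved once the margin is clearly below 8, so the real requirement is larger than the figure for a margin of exactly 8.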
Don wrote: Please stop testing Komodo - I hereby make a public and formal request for you to withdraw Komodo from your lists and your testing. Even though it has returned good results I just don't trust your "seat of the pants" and "ad-hoc" methodology.
I hope you change your mind.
BubbaTough
Posts: 1154
Joined: Fri Jun 23, 2006 5:18 am

Re: Stockfish 4

Post by BubbaTough »

Adam Hair wrote: The CCRL and CEGT lack precision due to the use of different computers, which means the difference between popcount and non-popcount can be blurred. IPON is more precise, though it probably is a less accurate indicator of an engine's relative strength on a random computer. As Ingo indicated yesterday, there are tradeoffs involved.
This seems reasonable. I hereby officially and publicly give any and all people permission to test any official release of Hannibal under any conditions desired, even the non-popcount versions.

-Sam
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Stockfish 4

Post by Don »

Modern Times wrote:
Don wrote: Please stop testing Komodo - I hereby make a public and formal request for you to withdraw Komodo from your lists and your testing. Even though it has returned good results I just don't trust your "seat of the pants" and "ad-hoc" methodology.
I have not tested Komodo 5.1 at all. After the last post of yours with a bad attitude a month or so back, I deleted all versions of Doch and Komodo from my hard drives. So you need not worry: I'm a step ahead of you, having already decided a month or so ago to boycott your engine and never touch anything from you ever again. Other CCRL testers will confirm the timing of that decision.
This is exactly the kind of bias that throws your objectivity into question and why I don't want Komodo tested on CCRL. I would feel that way even if you LIKED Komodo and you wanted it to do well - it still has no place in a testing group that people refer to seriously. And the fact that you can pick and choose what to test as you see fit - that does not bode well at all for objective testing.

Don't get me wrong - Komodo has done well and continues to do so, so I cannot say that anything improper has been done, but this is just not right, not professional and not objective.

So it's really a good thing that you are NOT continuing to test Komodo. How could anyone who has an axe to grind against a programmer be trusted to test their program? Do you actually believe that I am heartbroken that you are not going to test my program? Get real!! You have no business being a tester for anyone with this attitude and I just don't trust you.
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Stockfish 4

Post by zullil »

bnemias wrote:
Graham Banks wrote:
Don wrote:......We know for sure that SSE4.2 helps some programs very little and others a lot......
Do we?
I think that this is a fallacy. It makes minimal difference from what I've seen.

Reminds me of the age long argument over the value of tablebases in adding ELO to ratings.
I agree wrt. stockfish. The devs have chosen very good alternative implementations when needed. All of this is automatically chosen during compile, although the proper settings generally have to be sent to the makefile.
Just built the latest development version twice: once with -DUSE_POPCNT and once without. I then ran Stockfish's bench five times for each binary and took the average.

Using hardware popcount, nodes/second averaged 1286334. Without hardware popcount, nodes/second averaged 1264206. The gain in nodes/second from popcount is about 1.75%. What this amounts to in ELO I can't say, but my guess is "very little".
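The percentage follows directly from the two averages quoted above. A minimal Python sketch of the arithmetic (the function name `speedup_percent` is only illustrative and not part of Stockfish or its bench output):

```python
def speedup_percent(nps_fast: float, nps_slow: float) -> float:
    """Relative gain of nps_fast over nps_slow, in percent."""
    return (nps_fast / nps_slow - 1.0) * 100.0


if __name__ == "__main__":
    with_popcount = 1286334     # average nodes/second, hardware popcount build
    without_popcount = 1264206  # average nodes/second, software fallback build
    gain = speedup_percent(with_popcount, without_popcount)
    print(f"popcount gain: {gain:.2f}%")  # prints "popcount gain: 1.75%"
```

This measures raw speed only; converting a speed gain into Elo needs actual games, since the relationship depends on the engine and the time control.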
kranium
Posts: 2130
Joined: Thu May 29, 2008 10:43 am

Re: Stockfish 4

Post by kranium »

Don wrote:
Modern Times wrote:
Don wrote: Please stop testing Komodo - I hereby make a public and formal request for you to withdraw Komodo from your lists and your testing. Even though it has returned good results I just don't trust your "seat of the pants" and "ad-hoc" methodology.
I have not tested Komodo 5.1 at all. After the last post of yours with a bad attitude a month or so back, I deleted all versions of Doch and Komodo from my hard drives. So you need not worry: I'm a step ahead of you, having already decided a month or so ago to boycott your engine and never touch anything from you ever again. Other CCRL testers will confirm the timing of that decision.
This is exactly the kind of bias that throws your objectivity into question and why I don't want Komodo tested on CCRL. I would feel that way even if you LIKED Komodo and you wanted it to do well - it still has no place in a testing group that people refer to seriously. And the fact that you can pick and choose what to test as you see fit - that does not bode well at all for objective testing.

Don't get me wrong - Komodo has done well and continues to do so, so I cannot say that anything improper has been done, but this is just not right, not professional and not objective.

So it's really a good thing that you are NOT continuing to test Komodo. How could anyone who has an axe to grind against a programmer be trusted to test their program? Do you actually believe that I am heartbroken that you are not going to test my program? Get real!! You have no business being a tester for anyone with this attitude and I just don't trust you.
with all due respect-
would you please stop trolling this thread?
in case you haven't noticed, this topic is supposed to be about Stockfish
(this may be difficult for you to accept, but the world doesn't revolve around Komodo)

kindly refrain from making this thread yet another episode of the Dailey show

thx-
Norm
Ajedrecista
Posts: 2159
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Stockfish 4 (off-topic).

Post by Ajedrecista »

Hello:

This post does not discuss SF 4, but I want to note something:
Don wrote:

Rank    ELO     +/-    Games    Score  Player
---- ------- ------ -------- --------  ----------------------------
   1  3020.1   13.9     1278   52.895  pop
   2  3000.0   13.9     1278   47.105  noPop

w/l/d: 368 276 634    49.61 percent draws


      TIME       RATIO    log(r)     NODES    log(r)  ave DEPTH    GAMES   PLAYER
 ---------  ----------  --------  --------  --------  ---------  -------   -----
    0.0776       0.996    -0.004     0.091     0.080    11.0568     1278   pop
    0.0780       1.000     0.000     0.084     0.000    10.9263     1278   noPop
There is something wrong: (score) = (368 + 634/2)/(368 + 276 + 634) = 685/1278 ~ 0.536 > 0.52895, so the popcount rating should be around 3025.1 ± 13.5 with 95% confidence. With 1278 games and 634 draws (~ 49.61% draws, as Don reported), the score of 52.895% is obtained with +359 -285 =634, which gives a popcount rating of more or less 3020.1 ± 13.5 (similar to the rating in Don's post; please do not pay much attention to the slightly different error bars). I hope there are no typos in my calculations.
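The arithmetic above can be checked with a short Python sketch. The Elo conversion is the usual logistic formula; the error margin uses a normal approximation on the per-game results, which is an assumption made here and not necessarily the exact model used by Don's tester or in the calculation above. All function names are hypothetical.

```python
import math


def score(wins: int, losses: int, draws: int) -> float:
    """Fraction of points scored: a win counts 1, a draw counts 1/2."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def elo_diff(s: float) -> float:
    """Elo difference implied by a score fraction s (logistic model)."""
    return 400.0 * math.log10(s / (1.0 - s))


def elo_margin_95(wins: int, losses: int, draws: int) -> float:
    """Approximate 95% error margin in Elo (normal approximation, assumed)."""
    n = wins + losses + draws
    s = score(wins, losses, draws)
    # Variance of a single game result around the mean score s.
    var = (wins * (1.0 - s) ** 2 + losses * s ** 2 + draws * (0.5 - s) ** 2) / n
    se = math.sqrt(var / n)
    # Half the width of the interval after mapping both ends to Elo.
    return (elo_diff(s + 1.96 * se) - elo_diff(s - 1.96 * se)) / 2.0


if __name__ == "__main__":
    for w, l, d in ((368, 276, 634), (359, 285, 634)):
        s = score(w, l, d)
        print(f"+{w} -{l} ={d}: score {100 * s:.3f}%, "
              f"{elo_diff(s):+.1f} Elo, +/- {elo_margin_95(w, l, d):.1f}")
```

With noPop fixed at 3000.0, the first line corresponds to the posted w/l/d counts and the second to the posted 52.895% score, which is the mismatch pointed out above.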

------------------------

Going on-topic: good luck with the future development of SF!

Regards from Spain.

Ajedrecista.
Modern Times
Posts: 3771
Joined: Thu Jun 07, 2012 11:02 pm

Re: Stockfish 4

Post by Modern Times »

Every CCRL tester is free to test what they want to test, and free NOT to test anything they don't want to test. That has always been the case and always will be. The world does not revolve around Komodo and Don. That is not bias, it is personal choice and freedom of choice. Something we value.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Stockfish 4

Post by Don »

zullil wrote:
bnemias wrote:
Graham Banks wrote:
Don wrote:......We know for sure that SSE4.2 helps some programs very little and others a lot......
Do we?
I think that this is a fallacy. It makes minimal difference from what I've seen.

Reminds me of the age long argument over the value of tablebases in adding ELO to ratings.
I agree wrt. stockfish. The devs have chosen very good alternative implementations when needed. All of this is automatically chosen during compile, although the proper settings generally have to be sent to the makefile.
Just built the latest development version twice: once with -DUSE_POPCNT and once without. I then ran Stockfish's bench five times for each binary and took the average.

Using hardware popcount, nodes/second averaged 1286334. Without hardware popcount, nodes/second averaged 1264206. The gain in nodes/second from popcount is about 1.75%. What this amounts to in ELO I can't say, but my guess is "very little".
Thanks for doing that test. It's something we already knew and stated but it's nice to see it backed up with real data. Of course the CCRL people "don't see any evidence" even though it's so clear and obvious.