Stockfish 4

beram · Post by **beram** » Sat Aug 24, 2013 7:53 pm

zullil wrote:
bnemias wrote:
Graham Banks wrote:
Don wrote: ...We know for sure that SSE4.2 helps some programs very little and others a lot...
Do we?
I think that this is a fallacy. It makes minimal difference from what I've seen.

Reminds me of the age-old argument over how much Elo tablebases add to ratings.
I agree wrt. Stockfish. The devs have chosen very good alternative implementations where needed. All of this is chosen automatically at compile time, although the proper settings generally have to be passed to the makefile.
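For readers wondering what an "alternative implementation" of popcount looks like: a common portable fallback when no hardware POPCNT instruction is available is the SWAR (SIMD-within-a-register) bit-counting trick. The Python sketch below is a generic illustration on a 64-bit bitboard, not Stockfish's actual code, and the name `popcount_swar` is hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def popcount_swar(bb: int) -> int:
    """Count set bits in a 64-bit bitboard without a hardware popcount.

    Hypothetical helper illustrating the classic SWAR fallback.
    """
    bb &= MASK64
    # Count bits in each 2-bit field (each field now holds 0..2).
    bb = bb - ((bb >> 1) & 0x5555555555555555)
    # Add neighbouring 2-bit counts into 4-bit fields (0..4).
    bb = (bb & 0x3333333333333333) + ((bb >> 2) & 0x3333333333333333)
    # Add neighbouring nibbles into bytes (0..8).
    bb = (bb + (bb >> 4)) & 0x0F0F0F0F0F0F0F0F
    # Multiplying sums all 8 bytes into the top byte; keep 64 bits, shift down.
    return ((bb * 0x0101010101010101) & MASK64) >> 56


# Cross-check against Python's own bit counting.
for value in (0, 1, 0xFF, 0x8000000000000001, MASK64):
    assert popcount_swar(value) == bin(value).count("1")
```

On a CPU that provides a POPCNT instruction, the whole routine can be replaced by that single instruction; without it, the fallback costs a handful of shifts, masks, adds and one multiply.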
Just built the latest development version twice, once with -DUSE_POPCNT and once without. I then ran Stockfish's bench five times for each binary and averaged the results.

Using hardware popcount, nodes/second averaged 1286334; without it, nodes/second averaged 1264206. The gain in nodes/second from popcount is about 1.75%. What this amounts to in Elo I can't say, but my guess is "very little".
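The 1.75% figure follows directly from the two reported averages:

```python
# Average bench speeds (nodes/second) reported in the post.
with_popcnt = 1286334
without_popcnt = 1264206

# Relative speed gain of the POPCNT build over the non-POPCNT build.
gain = with_popcnt / without_popcnt - 1
print(f"speed gain: {gain:.2%}")  # prints "speed gain: 1.75%"
```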

In general, doubling the speed is worth about +60 Elo on average,
so if we take that for SF4 we get 60*0,0175 = 1,05 Elo points. WOW

chessmann · Post by **chessmann** » Sat Aug 24, 2013 8:06 pm

I would not mock even a single Elo point: the stronger engines already are, the harder each additional Elo becomes, and every single one comes at a premium, paid for with hard work...

Graham Banks · Post by **Graham Banks** » Sat Aug 24, 2013 9:18 pm

Don wrote:
Modern Times wrote: It isn't even 8 Elo: 7-3 = 4 or, to be generous, 8-2 = 6.

But I still don't buy it. I have not seen any proof that Komodo benefits more from popcount than other engines do. It would take a huge number of games to measure that with, say, 95% certainty.

I ran off a few quick games on my tester to expose how ignorant you are of this issue:
Rank    ELO     +/-    Games    Score  Player
---- ------- ------ -------- --------  ----------------------------
   1  3020.1   13.9     1278   52.895  pop   
   2  3000.0   13.9     1278   47.105  noPop 

w/l/d: 368 276 634    49.61 percent draws


      TIME       RATIO    log(r)     NODES    log(r)  ave DEPTH    GAMES   PLAYER
 ---------  ----------  --------  --------  --------  ---------  -------   -----
    0.0776       0.996    -0.004     0.091     0.080    11.0568     1278   pop
    0.0780       1.000     0.000     0.084     0.000    10.9263     1278   noPop
These are 3-second Fischer games with a 0.03 increment, so that I could get a large sample in a few minutes. Even though the error margins are still very high, the average depth gain is a non-trivial 13 percent of a ply.
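Both tables can be sanity-checked from their own numbers. The sketch below assumes the standard logistic Elo model (the tester's own rating tool may fit ratings differently) and assumes the NODES log(r) column is a natural-log ratio against noPop; both are inferences from the figures, not statements from the post.

```python
import math


def elo_diff(score: float) -> float:
    """Rating difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# Match table: pop scored 52.895% over 1278 games, with 634 draws.
games, draws, pop_score = 1278, 634, 0.52895
pop_elo_edge = elo_diff(pop_score)  # about +20.1, matching 3020.1 - 3000.0
draw_rate = draws / games           # about 0.4961, i.e. 49.61 percent

# Search table: nodes and average depth for pop vs noPop.
nodes_log_ratio = math.log(0.091 / 0.084)  # about 0.080, the NODES log(r) entry
depth_gain = 11.0568 - 10.9263             # about 0.13 ply

print(f"Elo edge {pop_elo_edge:+.1f}, draws {draw_rate:.2%}, "
      f"nodes log(r) {nodes_log_ratio:.3f}, depth gain {depth_gain:.4f} ply")
```

The 0.13 ply depth gain is the "13 percent of a ply" mentioned in the post.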

I'm actually very surprised that you cannot even detect a difference, that, as Graham Banks puts it, "SSE is a fallacy", and that 8 Elo meets his definition of minimal.

Please stop testing Komodo. I hereby make a public and formal request that you withdraw Komodo from your lists and your testing. Even though it has returned good results, I just don't trust your "seat of the pants" and "ad hoc" methodology.

Don, just because my opinion on the value of popcount/SSE differs slightly from yours does not mean that I don't have great respect for you and your work with Komodo.
I'm sorry if you felt otherwise.

I think that things have got a little out of hand here, and that was certainly not my intention.
Engine authors always have my respect and support, because they're the backbone of our hobby.

Graham.

ernest · Post by **ernest** » Sun Aug 25, 2013 1:33 am

beram wrote:we can measure 60*0,0175= 1,05 ELO point,

Wrong formula (you must use logarithms...) but the order of magnitude is the same!
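To make the logarithm point concrete: if every doubling of speed is worth about 60 Elo (the rule of thumb quoted above, not a measured constant), the gain scales with the base-2 logarithm of the speed ratio rather than linearly with the percentage. The function name `elo_from_speedup` is hypothetical.

```python
import math

ELO_PER_DOUBLING = 60.0  # rule of thumb from the thread, not a measured constant


def elo_from_speedup(ratio: float) -> float:
    """Elo gain for a speed ratio, assuming a fixed Elo gain per doubling."""
    return ELO_PER_DOUBLING * math.log2(ratio)


linear_estimate = 60 * 0.0175            # the linear formula: 1.05
log_estimate = elo_from_speedup(1.0175)  # about 1.50

print(f"linear: {linear_estimate:.2f} Elo, logarithmic: {log_estimate:.2f} Elo")
```

At a full doubling (ratio 2.0) both formulas give 60 Elo; for small speedups they diverge because log2(1 + x) is roughly x / ln 2, about 1.44x, so the logarithmic estimate is about 1.5 Elo instead of 1.05. Same order of magnitude, as noted.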

Masta · Post by **Masta** » Sun Aug 25, 2013 3:24 am

Come on... Stop this fight. We all love engines, testers and developers. Stop this nonsense. We love you all! And we love you, Don (hugs)!

Go go go Stockfish 4!!!

Mike S. · Post by **Mike S.** » Sun Aug 25, 2013 6:08 am

Stop this fight.

+1

Also, I never would have expected that popcorn or non-popcorn could be so important that it causes such a dispute. By far the most important thing is that the movie is good and that the person sitting in the row in front does not wear a top hat.

Raptor · Post by **Raptor** » Sun Aug 25, 2013 9:24 am

My 2 cents,

I see where you are coming from when you say that 8 Elo makes a world of difference. I know that, and I know how hard it must be to gain at the level Komodo is at.

But CCRL, IMHO, tries to give average users an overall perspective. With the variety of machines CCRL uses, the advantage of POPCNT and the disadvantage of lacking it eventually even out, and this happens to all engines, not just Komodo or Stockfish. So in the end CCRL does manage to give a good idea of the strength of all engines by averaging their performance over various hardware. In fact, IMO this gives it more value for the average user: when a customer isn't using the best hardware (POPCNT in this case), this estimate comes closer to the performance he actually gets.

So for non-technical users and non-programmers, the CCRL estimates are reasonable, as long as they don't run Komodo alone on non-SSE4.2 hardware while running everything else on SSE4.2 hardware (which, as far as I know, they do not).
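The averaging argument can be illustrated with a toy model (an assumption for illustration, not CCRL's actual method): if an engine gains g Elo on machines with POPCNT and a fraction p of its games are played on such machines, the pooled rating shift is roughly p·g, since for small Elo differences expected score is close to linear in Elo. The function name and the example fractions are hypothetical; only the 8 Elo figure comes from the thread.

```python
def pooled_popcnt_gain(gain_elo: float, popcnt_share: float) -> float:
    """Approximate Elo shift seen in a pooled rating list (hypothetical toy model).

    gain_elo:     Elo gained on machines that have POPCNT
    popcnt_share: fraction of the engine's games played on such machines
    """
    if not 0.0 <= popcnt_share <= 1.0:
        raise ValueError("popcnt_share must be between 0 and 1")
    return gain_elo * popcnt_share


# Half of the games on POPCNT hardware: about half of the gain shows up.
mixed = pooled_popcnt_gain(8.0, 0.5)           # 4.0
# The case ruled out above: one engine tested only without POPCNT.
no_popcnt_only = pooled_popcnt_gain(8.0, 0.0)  # 0.0
```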

Please feel free to correct me on this, Graham or the rest of the CCRL team, but that was my understanding of it.

Don, please realize that whatever their personal grudges may be, they are not rigging their tests against you or in favor of anyone else. Their methodology may not be as ideal as you want it to be, but they apply the same methodology to all the engines they test.

Sorry to team Stockfish for making yet another post that is irrelevant to the thread topic.

Best Regards,

Steve

gerold · Post by **gerold** » Sun Aug 25, 2013 2:28 pm

BubbaTough wrote:
Adam Hair wrote: The CCRL and CEGT lack precision due to the use of different computers, which means the difference between popcount and non-popcount can be blurred. IPON is more precise, though it probably is a less accurate indicator of an engine's relative strength on a random computer. As Ingo indicated yesterday, there are tradeoffs involved.
This seems reasonable. I hereby officially and publicly give any and all people permission to test any official release of Hannibal under any conditions desired, even the non-popcount versions.

-Sam

Thanks Sam. Your engine has just replaced one that is not quite up to par.
