Don wrote: "...We know for sure that SSE4.2 helps some programs very little and others a lot..."
Do we?
I think that this is a fallacy. It makes minimal difference from what I've seen.
It reminds me of the age-old argument over how much Elo tablebases actually add.
I agree with respect to Stockfish. The developers have chosen very good alternative implementations where needed. All of this is selected automatically at compile time, although the proper settings generally have to be passed to the makefile.
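To make "alternative implementations" concrete: without a POPCNT instruction, an engine has to count the set bits of a 64-bit bitboard in software. Stockfish's actual fallback is not quoted here; the sketch below shows one widely used software method, the SWAR ("SIMD within a register") bit-twiddling popcount, next to a naive reference loop. The function names are hypothetical, and Python is used only to illustrate the logic, so its timings say nothing about the speed difference measured below.

```python
# Constants for a 64-bit SWAR (SIMD-within-a-register) population count.
M1 = 0x5555555555555555   # 0101... : selects every other bit
M2 = 0x3333333333333333   # 0011... : selects 2-bit fields
M4 = 0x0F0F0F0F0F0F0F0F   # 00001111... : selects 4-bit fields
H01 = 0x0101010101010101  # multiplier that sums all bytes into the top byte
MASK64 = 0xFFFFFFFFFFFFFFFF


def popcount_swar(x: int) -> int:
    """Count set bits in a 64-bit bitboard without a POPCNT instruction."""
    x &= MASK64                          # treat x as an unsigned 64-bit word
    x -= (x >> 1) & M1                   # each 2-bit field now holds its bit count (0..2)
    x = (x & M2) + ((x >> 2) & M2)       # each 4-bit field holds its count (0..4)
    x = (x + (x >> 4)) & M4              # each byte holds its count (0..8)
    return ((x * H01) & MASK64) >> 56    # top byte holds the sum of all bytes


def popcount_naive(x: int) -> int:
    """Reference version: clear the lowest set bit until none remain."""
    x &= MASK64
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count


# Hypothetical bitboard: bits 0-15 set, i.e. ranks 1 and 2 if bit 0 is a1.
white_pieces = 0x000000000000FFFF
print(popcount_swar(white_pieces), popcount_naive(white_pieces))
```

The SWAR version does a fixed handful of shifts, masks and one multiply regardless of how many bits are set, which is why it is a common choice when the hardware instruction is unavailable.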
I just built the latest development version twice, once with -DUSE_POPCNT and once without. I then ran Stockfish's bench five times with each binary and averaged the results.
Using hardware popcount, nodes/second averaged 1286334. Without hardware popcount, nodes/second averaged 1264206. The gain in nodes/second from popcount is about 1.75%. What this amounts to in ELO I can't say, but my guess is "very little".
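For anyone who wants to repeat the comparison with their own bench numbers, the percentage is just the relative difference of the two averages. A minimal sketch (the function name is hypothetical), using the averages quoted above:

```python
def percent_gain(nps_with: float, nps_without: float) -> float:
    """Relative speed-up, in percent, of the POPCNT build over the plain build."""
    return 100.0 * (nps_with - nps_without) / nps_without


avg_with_popcnt = 1286334     # averaged bench nodes/second, -DUSE_POPCNT build
avg_without_popcnt = 1264206  # averaged bench nodes/second, build without it

print(f"{percent_gain(avg_with_popcnt, avg_without_popcnt):.2f}%")
```

Dividing by the non-POPCNT figure expresses the gain relative to the baseline build, which is how the 1.75% above is stated.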
As a rule of thumb, a doubling of speed is worth about +60 Elo on average,
so if we apply that to SF4 we get 60 * 0.0175 = 1.05 Elo points. WOW.
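The 60 * 0.0175 arithmetic treats the speed-up linearly, as 0.0175 of the +100% that a doubling adds. If each doubling is worth a fixed amount, the gain should instead scale with log2 of the speed ratio. A quick check, assuming the 60-Elo-per-doubling rule of thumb still holds for such a small speed-up:

```python
import math

ELO_PER_DOUBLING = 60.0  # rule of thumb quoted in the post

speedup = 1286334 / 1264206  # POPCNT build vs. plain build (bench averages)

# Linear reading: 1.75% faster counted as 0.0175 of a doubling.
linear_estimate = ELO_PER_DOUBLING * (speedup - 1.0)

# Logarithmic reading: Elo scales with the number of doublings, log2(speedup).
log_estimate = ELO_PER_DOUBLING * math.log2(speedup)

print(f"linear: {linear_estimate:.2f} Elo, log2-based: {log_estimate:.2f} Elo")
```

The log-based figure comes out at roughly 1.5 Elo instead of 1.05, which does not change the point being made: the nodes/second gain alone is worth very little under this rule of thumb.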
I would not mock even a single Elo point. The stronger engines already are, the harder each additional point becomes; every single Elo is at a premium and is paid for with hard work...
Modern Times wrote: It isn't even 8 Elo: 7 - 3 = 4, or, to be generous, 8 - 2 = 6.
But I still don't buy it. I have not seen any proof that Komodo benefits more than other engines from popcount. It would take a huge number of games to measure that with, say, 95% confidence.
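How many games is "a huge number"? A rough sketch, assuming a direct match between the two builds, an expected score near 50%, independent games, a draw rate of about 50% and the usual logistic Elo curve; it ignores opening-pair correlations and the extra noise that mixed hardware would add. The function name is hypothetical.

```python
import math


def games_needed(elo_diff: float, draw_rate: float, z: float = 1.96) -> int:
    """Approximate games in a head-to-head match before the z-sigma error
    margin on the Elo difference shrinks to elo_diff (normal approximation)."""
    # Slope of the logistic Elo curve at a 50% score:
    # d(Elo)/d(score) = 400 / (ln(10) * 0.5 * 0.5).
    elo_per_score = 400.0 / (math.log(10) * 0.25)
    # Per-game standard deviation of the score (1, 1/2 or 0) at a 50% expected
    # score with the given draw rate: sqrt((1 - draw_rate) / 4).
    sigma_game = math.sqrt((1.0 - draw_rate) / 4.0)
    # Margin after n games is z * elo_per_score * sigma_game / sqrt(n);
    # solve margin <= elo_diff for n.
    n = (z * elo_per_score * sigma_game / elo_diff) ** 2
    return math.ceil(n)


for diff in (20, 8, 4):
    print(diff, "Elo ->", games_needed(diff, draw_rate=0.5), "games")
```

Because the required number of games grows with 1 / elo_diff squared, halving the difference to be resolved roughly quadruples the games, which is why single-digit Elo gaps are so expensive to confirm.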
I ran off a few quick games on my tester to expose how ignorant you are of this issue:
Rank  ELO     +/-   Games  Score(%)  Player
----  ------  ----  -----  --------  ------
   1  3020.1  13.9   1278    52.895  pop
   2  3000.0  13.9   1278    47.105  noPop
w/l/d: 368 276 634, 49.61 percent draws
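The ELO column follows from the score percentage. The rating tool's exact model isn't stated, but under the usual logistic Elo formula a 52.895% score corresponds to about +20 Elo, matching the 3020.1 vs. 3000.0 gap; the draw percentage is just draws over games. A minimal check (function name hypothetical):

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score under the logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))


draws, games = 634, 1278
print(f"pop vs noPop: {elo_from_score(0.52895):+.1f} Elo")
print(f"draw rate: {100.0 * draws / games:.2f}%")
```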
TIME    RATIO  log(r)  NODES  log(r)  ave DEPTH  GAMES  PLAYER
------  -----  ------  -----  ------  ---------  -----  ------
0.0776  0.996  -0.004  0.091   0.080    11.0568   1278  pop
0.0780  1.000   0.000  0.084   0.000    10.9263   1278  noPop
These are 3-second Fischer games with a 0.03 increment, so that I could get a large sample in a few minutes. Even though the error margins are still very high, the gain in average depth is a non-trivial 13 percent of a ply.
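The 13 percent figure is simply the difference between the two average depths. The log(r) columns appear to be natural logs of each player's value relative to noPop; that reading is an assumption, but it reproduces the 0.080 in the NODES column:

```python
import math

# Per-player values copied from the second table.
pop = {"nodes": 0.091, "depth": 11.0568}
no_pop = {"nodes": 0.084, "depth": 10.9263}

depth_gain = pop["depth"] - no_pop["depth"]                 # in plies
nodes_log_ratio = math.log(pop["nodes"] / no_pop["nodes"])  # natural log (assumed)

print(f"average depth gain: {depth_gain:.4f} ply ({100 * depth_gain:.0f}% of a ply)")
print(f"log ratio of NODES column: {nodes_log_ratio:.3f}")
```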
I'm actually very surprised that you cannot even detect a difference, and that, as Graham Banks puts it, "SSE is a fallacy" and 8 Elo meets his definition of minimal.
Please stop testing Komodo. I am hereby making a public and formal request for you to withdraw Komodo from your lists and your testing. Even though it has returned good results, I just don't trust your "seat of the pants", ad-hoc methodology.
Don, just because my opinion on the value of popcount/SSE differs slightly from yours does not mean that I don't have great respect for you and your work on Komodo.
I'm sorry if you felt otherwise.
I think things have got a little out of hand here, and that was certainly never my intention.
Engine authors always have my respect and support, because they're the backbone of our hobby.
Also, I never would have expected that popcorn or non-popcorn is so important that it can cause such a dispute. Most important by far is that the movie is good, and the person sitting in the next row does not wear a top-hat.
I see where you're coming from when you say that 8 Elo makes a world of difference. I know that, and I know how hard those points must be to earn at Komodo's level.
But CCRL, IMHO, tries to give average users an overall perspective. With the variety of machines CCRL uses, the advantage of having POPCNT and the disadvantage of lacking it eventually even out, and this happens to all engines, not just Komodo or Stockfish. So in the end CCRL does give a good idea of the strength of all engines by averaging their performance over various hardware. In fact, IMO this makes it more valuable for the average user: when your customer isn't using the best hardware (POPCNT-capable, in this case), this estimate comes closer to the performance he actually gets.
So, effectively, for non-technical users and non-programmers the CCRL estimates are reasonable, as long as they don't run Komodo alone on non-SSE4.2 hardware while running everything else on SSE4.2 hardware (which, as far as I know, they do not).
Graham, or anyone else on the CCRL team, please feel free to correct me, but that is my understanding of it.
Don, please realize that whatever their personal grudges may be, they are not rigging their tests against you or in favor of anyone else. Their methodology may not be as ideal as you would like, but they apply it equally to all the engines they test.
Sorry to team Stockfish for making yet another post that is not relevant to the topic.
Adam Hair wrote:
The CCRL and CEGT lack precision due to the use of different computers, which means the difference between popcount and non-popcount can be blurred. IPON is more precise, though it probably is a less accurate indicator of an engine's relative strength on a random computer. As Ingo indicated yesterday, there are tradeoffs involved.
This seems reasonable. I hereby officially and publicly give any and all people permission to test any official release of Hannibal under any conditions desired, even the non-popcount versions.
-Sam
Thanks Sam. Your engine has just replaced one that is not quite up to par.