Uri Blass wrote: I do not understand your confidence that 10 elo difference is impossible.
geots wrote: Adam Hair wrote: lkaufman wrote: Adam Hair wrote: Here are all of the relevant details that I can think of:
OS: Windows XP 64-bit
CPU: Intel QX6700 at 3.05 GHz
Time Control: 40/3'
GUI: cutechess-cli
Hash: 128 MB
EGTB: None
Starting Positions: PGN of ~17,900 positions 4 moves deep
Resign: off
Draws: game adjudicated as a draw if both engines' score is within 50 centipawns after 250 moves. I do not remember if cutechess uses the 50 moves rule (I think it does).
lkaufman wrote: Two comments:
1. I believe your cpu is pre-sse4. Since Komodo really suffers on non-sse4 machines (compared to other engines), that probably accounts for the bulk of the 20 elo. Do your other testers have sse4 machines or not?
Yes, two 40/4 testers have SSE4 CPUs. And our results for Komodo 4 showed no measurable difference between non-SSE4 and SSE4. Though, if we played 20,000 games, it is possible that a statistically significant difference would be found.
lkaufman wrote: 2. We learned that it is very important for testers to use the 50 move rule. If they do not, engines may make ridiculous moves when they think the 50 move rule is about to apply. You should verify that it does use the 50 move rule and switch if it does not.
I have confirmed that cutechess does use the 50 move limit. I was 99% certain before; now I am 100% certain since at least 1 game was adjudicated as a draw because of the 50 move limit.
Thanks for your answers and your testing!
With Ilari's post, I am 110% certain
Anyone who thinks SSE would add 20 elo had better take a long look in the mirror. It would be next to impossible for it to ever add a double-digit elo gain. I'm thinking 3 or 4 elo tops, maybe an extreme case where it had a 6 elo gain, but 10 to 20? Either pure bullshit, or someone is chasing rainbows. You pick.
george
PS: One other thing everyone should keep in mind. If you are beta testing an engine for a future release, at least 50% of your testing should be at time controls that the most prominent testing groups use. Beta testing with no "repeating" time controls, then seeing it rated with nothing BUT repeating controls will make more of an elo difference than SSE could ever think of making. (This whole set of threads is a long journey to nowhere!)
CCRL did not play enough games with Komodo to get a statistical error lower than 10 elo, so the fact that they see no difference between SSE and non-SSE proves nothing.
It is simple Uri. You don't test engines and see the results "with this time control and with that time control" and "with sse and without" 365 days a year like I do. I'm speaking from experience and you are guessing.
george
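
For anyone who wants rough numbers behind the error-margin argument above, here is a minimal Python sketch. It is not how CCRL or any rating tool computes its error bars. It assumes two equally strong engines, a 30% draw ratio (a made-up figure, change it to match your own games), independent games, and a normal approximation at 95% confidence. The function names are mine and purely illustrative.

```python
import math


def score_from_elo(elo_diff):
    """Expected score for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score):
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin(games, draw_ratio=0.3, z=1.96):
    """Approximate half-width of the confidence interval, in Elo.

    Assumes equal engines, so wins and losses are equally likely and the
    per-game score variance is 0.25 * (1 - draw_ratio). The draw ratio is
    an assumption, not a measured value.
    """
    variance = 0.25 * (1.0 - draw_ratio)
    score_margin = z * math.sqrt(variance / games)
    if score_margin >= 0.5:
        raise ValueError("too few games for this approximation")
    return elo_from_score(0.5 + score_margin)


def games_needed(target_elo, draw_ratio=0.3, z=1.96):
    """Smallest game count whose margin is at or below target_elo."""
    variance = 0.25 * (1.0 - draw_ratio)
    score_margin = score_from_elo(target_elo) - 0.5
    return math.ceil(z * z * variance / (score_margin * score_margin))


if __name__ == "__main__":
    for n in (1000, 5000, 20000):
        print(n, "games: about +/-", round(elo_margin(n), 1), "elo")
    print("games for a +/-10 elo margin:", games_needed(10))
```

Under these assumptions a margin of 10 elo takes a few thousand games, and 20,000 games brings it down to roughly 4 elo. Keep in mind this is the margin for a single head-to-head result; comparing two separately measured ratings (SSE4 build versus non-SSE4 build) combines both errors, so the uncertainty on the difference is larger.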


