Hi all,
How many more knodes/s, how many Elo points, or how much more speed can SSE4 instructions give an engine?
I don't know anything about programming, and I wonder whether there's a substantial difference in performance.
Thank you
Best Regards
SSE4 instructions
-
Adam Hair
- Posts: 3226
- Joined: Wed May 06, 2009 10:31 pm
- Location: Fuquay-Varina, North Carolina
Re: SSE4 instructions
MM wrote:Hi all,
how many more knodes and how many elo points or how much more speed can sse4 instructions give an engine?
I don't know anything about programming and i wonder if there's a substantial difference in performance.
Thank you
Best Regards
I have seen references (mainly in connection to Komodo) that SSE 4.2 compiles are 10% to 15% faster. Also, engines seem to gain 50 to 120 Elo per doubling in speed, depending on the engine and the base time control (probably even less gain as the base time control increases). So a rough guess at the increase in Elo would be as little as 5 to 7 Elo to as much as 25 Elo.
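One way to reproduce this rough guess is sketched below. It assumes the Elo gain scales with the base-2 logarithm of the speedup, which is what "Elo per doubling" suggests but is not stated outright above. The symbols $E_d$ (Elo per doubling) and $s$ (fractional speedup) are introduced here only for illustration.

```latex
\Delta\mathrm{Elo} \approx E_d \cdot \log_2(1 + s)

% Low end: 50 Elo per doubling, 10% speedup
\Delta\mathrm{Elo}_{\min} \approx 50 \cdot \log_2(1.10) \approx 50 \cdot 0.1375 \approx 6.9

% High end: 120 Elo per doubling, 15% speedup
\Delta\mathrm{Elo}_{\max} \approx 120 \cdot \log_2(1.15) \approx 120 \cdot 0.2016 \approx 24.2
```

Under that assumption the range comes out at roughly 7 to 24 Elo, in line with the figures quoted.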
-
MM
- Posts: 766
- Joined: Sun Oct 16, 2011 11:25 am
Re: SSE4 instructions
Adam Hair wrote:MM wrote:Hi all,
how many more knodes and how many elo points or how much more speed can sse4 instructions give an engine?
I don't know anything about programming and i wonder if there's a substantial difference in performance.
Thank you
Best Regards
I have seen references (mainly in connection to Komodo) that SSE 4.2 compiles are 10% to 15% faster. Also, engines seem to gain 50 to 120 Elo per doubling in speed, depending on the engine and the base time control (probably even less gain as the base time control increases). So a rough guess at the increase in Elo would be as little as 5 to 7 Elo to as much as 25 Elo.
Thank you
Regards
MM
-
zamar
- Posts: 613
- Joined: Sun Jan 18, 2009 7:03 am
Re: SSE4 instructions
It depends on how heavy the program's evaluation function is...
For a program with a light evaluation it's probably only 2 Elo points.
For a program with a very heavy evaluation it might be as much as 10 Elo points.
Stockfish gains around 4-5 Elo points.
Joona Kiiski
-
Engin
- Posts: 918
- Joined: Mon Jan 05, 2009 7:40 pm
- Location: Germany
- Full name: Engin Üstün
Re: SSE4 instructions
It's hard to say how much, but of course it gains a little. It depends on how often you use the popcount or prefetch functions. My Tornado gains a lot from popcount but next to nothing from prefetch.
-
Engin
- Posts: 918
- Joined: Mon Jan 05, 2009 7:40 pm
- Location: Germany
- Full name: Engin Üstün
Re: SSE4 instructions
...and I forgot to say that you can only gain if you are using 64-bit; it doesn't make any sense to use it in a 32-bit version.
-
Engin
- Posts: 918
- Joined: Mon Jan 05, 2009 7:40 pm
- Location: Germany
- Full name: Engin Üstün
Re: SSE4 instructions
zamar wrote:It depends how heavy program's evaluation function is...
For program with light evaluation it's probably only 2 elo points.
For program with very heavy evaluation it might be even 10 elo points.
Stockfish gains around 4-5 elo points.
How heavy? Light evaluation, +2 Elo?
I think it depends on how often you use popcount in the evaluation.
-
rbarreira
- Posts: 900
- Joined: Tue Apr 27, 2010 3:48 pm
Re: SSE4 instructions
Engin wrote:..and i forgot to say that you can only gain if you using 64 bit, its not make any sense if you are using it for 32 bit version.
Why? I'm pretty sure that doing "hw_popcount (low_32_bits) + hw_popcount (high_32_bits)" is faster than other methods of doing 64-bit popcnt on a 32-bit architecture.
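A minimal sketch of that split in C, for illustration only. The function names are hypothetical and not taken from any engine. `__builtin_popcount` is a GCC/Clang builtin, and it compiles to a single POPCNT instruction only when the target allows it (for example with `-mpopcnt` or `-msse4.2`); otherwise the compiler falls back to a software routine. A portable software version is included for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 64-bit popcount built from two 32-bit counts,
   i.e. hw_popcount(low_32_bits) + hw_popcount(high_32_bits). */
int popcount64_split(uint64_t bb)
{
    uint32_t low  = (uint32_t)bb;          /* lower 32 bits of the bitboard */
    uint32_t high = (uint32_t)(bb >> 32);  /* upper 32 bits of the bitboard */
    return __builtin_popcount(low) + __builtin_popcount(high);
}

/* Portable software fallback (parallel bit summing), no POPCNT needed. */
int popcount64_swar(uint64_t bb)
{
    bb = bb - ((bb >> 1) & 0x5555555555555555ULL);            /* 2-bit sums */
    bb = (bb & 0x3333333333333333ULL)
       + ((bb >> 2) & 0x3333333333333333ULL);                 /* 4-bit sums */
    bb = (bb + (bb >> 4)) & 0x0F0F0F0F0F0F0F0FULL;            /* 8-bit sums */
    return (int)((bb * 0x0101010101010101ULL) >> 56);         /* add bytes  */
}

/* Sanity check: both methods must give the same count. */
void check_popcounts_agree(uint64_t bb)
{
    assert(popcount64_split(bb) == popcount64_swar(bb));
}
```

Whether the split version actually beats the software version on a given 32-bit build is something to measure rather than assume.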
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: SSE4 instructions
rbarreira wrote:Engin wrote:..and i forgot to say that you can only gain if you using 64 bit, its not make any sense if you are using it for 32 bit version.
Why? I'm pretty sure that doing "hw_popcount (low_32_bits) + hw_popcount (high_32_bits)" is faster than other methods of doing 64-bit popcnt on a 32-bit architecture.
There is a long-standing myth that bitboards only work on 64-bit machines. I've been using bitboards since the original Pentium PC with good results. David Slate did bitboards on a 60-bit machine that required two instructions to manipulate them for the very same reason, yet Chess 4.x seemed to work quite well.
One of the "bitboard tricks of the trade" is to not rely too heavily on popcnt() type operations, and there are some ways to do that for at least mobility which is a heavy popcnt() user...
I am not sure why one would do a pair of 32 bit operations on a machine that is obviously 64 bits (those are the only processors with hardware popcnt)...