SSE4 instructions

MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

SSE4 instructions

Post by MM »

Hi all,

How many more knodes per second, how many Elo points, or how much more speed can SSE4 instructions give an engine?

I don't know anything about programming and I wonder if there's a substantial difference in performance.

Thank you

Best Regards
MM
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: SSE4 instructions

Post by Adam Hair »

MM wrote:Hi all,

How many more knodes per second, how many Elo points, or how much more speed can SSE4 instructions give an engine?

I don't know anything about programming and I wonder if there's a substantial difference in performance.

Thank you

Best Regards

I have seen references (mainly in connection to Komodo) that SSE 4.2 compiles are 10% to 15% faster. Also, engines seem to gain 50 to 120 Elo per doubling in speed, depending on the engine and the base time control (probably even less gain as the base time control increases). So a rough guess at the increase in Elo would be as little as 5 to 7 Elo to as much as 25 Elo.
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: SSE4 instructions

Post by MM »

Adam Hair wrote:
MM wrote:Hi all,

How many more knodes per second, how many Elo points, or how much more speed can SSE4 instructions give an engine?

I don't know anything about programming and I wonder if there's a substantial difference in performance.

Thank you

Best Regards

I have seen references (mainly in connection to Komodo) that SSE 4.2 compiles are 10% to 15% faster. Also, engines seem to gain 50 to 120 Elo per doubling in speed, depending on the engine and the base time control (probably even less gain as the base time control increases). So a rough guess at the increase in Elo would be as little as 5 to 7 Elo to as much as 25 Elo.
Thank you :-)

Regards
MM
zamar
Posts: 613
Joined: Sun Jan 18, 2009 7:03 am

Re: SSE4 instructions

Post by zamar »

It depends on how heavy the program's evaluation function is...

For a program with a light evaluation it's probably only 2 Elo points.
For a program with a very heavy evaluation it might be as much as 10 Elo points.

Stockfish gains around 4-5 Elo points.
Joona Kiiski
Engin
Posts: 918
Joined: Mon Jan 05, 2009 7:40 pm
Location: Germany
Full name: Engin Üstün

Re: SSE4 instructions

Post by Engin »

It's hard to say how much, but of course it gains a little. It depends on how often you use the popcount or prefetch functions. My Tornado gains a lot from popcount but next to nothing from prefetch.
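
For readers who have not seen these two operations, here is a minimal sketch of what they look like in C. It assumes GCC or Clang, whose `__builtin_popcountll` compiles to a single POPCNT instruction when the target supports it (for example with `-mpopcnt` or `-msse4.2`), next to a portable bit-twiddling fallback. The `tt_entry` type and the `tt_prefetch` helper are hypothetical and are not taken from any engine in this thread.

```c
#include <stdint.h>

/* Portable software popcount (SWAR bit-twiddling), no special CPU support. */
int popcount_sw(uint64_t b)
{
    b = b - ((b >> 1) & 0x5555555555555555ULL);                           /* 2-bit sums */
    b = (b & 0x3333333333333333ULL) + ((b >> 2) & 0x3333333333333333ULL); /* 4-bit sums */
    b = (b + (b >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           /* 8-bit sums */
    return (int)((b * 0x0101010101010101ULL) >> 56);                      /* add the bytes */
}

/* Compiler builtin; falls back to the software version elsewhere. */
int popcount_hw(uint64_t b)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(b);
#else
    return popcount_sw(b);
#endif
}

/* Hypothetical hash-table entry, only here to give the prefetch a target. */
typedef struct { uint64_t key; int16_t score; } tt_entry;

/* Ask the CPU to start loading the entry for `key` before it is probed. */
void tt_prefetch(const tt_entry *table, uint64_t mask, uint64_t key)
{
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(&table[key & mask]);
#else
    (void)table; (void)mask; (void)key;
#endif
}
```

How much either one helps depends on how often the engine calls it, which fits the observation above that popcount and prefetch can give very different gains in the same program.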
Engin
Posts: 918
Joined: Mon Jan 05, 2009 7:40 pm
Location: Germany
Full name: Engin Üstün

Re: SSE4 instructions

Post by Engin »

...and I forgot to say that you can only gain if you are using 64-bit; it doesn't make any sense to use it for a 32-bit version.
Engin
Posts: 918
Joined: Mon Jan 05, 2009 7:40 pm
Location: Germany
Full name: Engin Üstün

Re: SSE4 instructions

Post by Engin »

zamar wrote:It depends on how heavy the program's evaluation function is...

For a program with a light evaluation it's probably only 2 Elo points.
For a program with a very heavy evaluation it might be as much as 10 Elo points.

Stockfish gains around 4-5 Elo points.
How heavy? A light evaluation gives +2 Elo?

I think it depends on how often you use popcount in the evaluation.
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: SSE4 instructions

Post by MM »

Thank you all for the contribution :-)

Best Regards
MM
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: SSE4 instructions

Post by rbarreira »

Engin wrote:...and I forgot to say that you can only gain if you are using 64-bit; it doesn't make any sense to use it for a 32-bit version.
Why? I'm pretty sure that doing "hw_popcount (low_32_bits) + hw_popcount (high_32_bits)" is faster than other methods of doing 64-bit popcnt on a 32-bit architecture.
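
The split described here can be sketched like this. The sketch assumes GCC or Clang and a platform where `unsigned int` is 32 bits wide; `__builtin_popcount` stands in for the "hw_popcount" placeholder, and the function name `popcount64_via_32` is made up for illustration.

```c
#include <stdint.h>

/* 64-bit population count built from two 32-bit counts, as one would
 * do in a 32-bit build where only 32-bit registers are available. */
int popcount64_via_32(uint64_t b)
{
    uint32_t lo = (uint32_t)b;          /* low 32 bits  */
    uint32_t hi = (uint32_t)(b >> 32);  /* high 32 bits */
    return __builtin_popcount(lo) + __builtin_popcount(hi);
}
```

Whether this beats a table-driven or bit-twiddling count in a real 32-bit build would have to be measured.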
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: SSE4 instructions

Post by bob »

rbarreira wrote:
Engin wrote:...and I forgot to say that you can only gain if you are using 64-bit; it doesn't make any sense to use it for a 32-bit version.
Why? I'm pretty sure that doing "hw_popcount (low_32_bits) + hw_popcount (high_32_bits)" is faster than other methods of doing 64-bit popcnt on a 32-bit architecture.
There is a long-standing myth that bitboards only work on 64-bit machines. I've been using bitboards since the original Pentium PC with good results. David Slate did bitboards on a 60-bit machine that required two instructions to manipulate them for the very same reason, yet Chess 4.x seemed to work quite well.

One of the "bitboard tricks of the trade" is to not rely too heavily on popcnt() type operations, and there are some ways to do that for at least mobility which is a heavy popcnt() user...
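
One possible way to get a mobility count without any popcnt() is a precomputed lookup table. The sketch below is only an illustration of that general idea and not necessarily the trick meant here. It assumes square 0 is a1 and square 63 is h8, handles only a rook's moves along its rank, and counts a blocking piece's square as reachable; all names are hypothetical.

```c
#include <stdint.h>

/* rank_mobility[file][occ]: squares a rook on `file` can reach along a
 * rank whose 8-bit occupancy is `occ` (the first blocker is counted). */
static uint8_t rank_mobility[8][256];

/* Fill the table once at startup. */
void init_rank_mobility(void)
{
    for (int file = 0; file < 8; file++) {
        for (int occ = 0; occ < 256; occ++) {
            int count = 0;
            for (int f = file + 1; f < 8; f++) {   /* slide toward file h */
                count++;
                if (occ & (1 << f)) break;         /* stop at the first blocker */
            }
            for (int f = file - 1; f >= 0; f--) {  /* slide toward file a */
                count++;
                if (occ & (1 << f)) break;
            }
            rank_mobility[file][occ] = (uint8_t)count;
        }
    }
}

/* Mobility along the rank: a shift, a mask and a table read, no popcnt. */
int rook_rank_mobility(uint64_t occupied, int square)
{
    int file = square & 7;
    int rank = square >> 3;
    unsigned occ = (unsigned)((occupied >> (8 * rank)) & 0xFF);
    return rank_mobility[file][occ];
}
```

Call `init_rank_mobility()` once before using `rook_rank_mobility()`. Files and diagonals need a different way to extract the occupancy, which is left out of this sketch.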

I am not sure why one would do a pair of 32 bit operations on a machine that is obviously 64 bits (those are the only processors with hardware popcnt)...