Stockfish - enabling AVX / AVX2 like popcnt

Discussion of chess software programming and technical issues.


Krgp
Posts: 20
Joined: Mon Nov 04, 2013 6:18 am

Stockfish - enabling AVX / AVX2 like popcnt

Post by Krgp »

I have a sincere query ... regarding 'AVX' enabled (?) SF. There have been a lot of AVX SF compiles floating around for quite some time ...

My question is ... is this SF really AVX-enabled? The .cpp/.h files are the same as the popcnt files ... It's pertinent to note here that popcnt is clearly defined and provided for everywhere, i.e. in the sources (e.g. bitcount.h), in the compilers themselves and in the compile commands ... whereas AVX is not. So what effect could AVX have on the binaries? As I understand it, the binaries would be the same for the AVX and popcnt SFs.
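The popcnt distinction above can be made concrete: a popcnt build changes which source path is compiled, because the code branches on a preprocessor macro. The sketch below is modeled on that idea and is not copied from Stockfish's bitcount.h. `USE_POPCNT` is believed to be the define Stockfish's Makefile passes for popcnt builds, but treat it, the `Bitboard` alias and the function name as assumptions (GCC/Clang, x86-64). Nothing analogous exists for AVX, so an `-mavx` build compiles exactly this same source.

```cpp
#include <cstdint>

#if defined(USE_POPCNT) && defined(__POPCNT__) && defined(__x86_64__)
#include <nmmintrin.h>  // _mm_popcnt_u64
#endif

using Bitboard = std::uint64_t;  // assumed alias: one bit per square

// Count the set bits (e.g. pieces) in a bitboard.
inline int popcount(Bitboard b) {
#if defined(USE_POPCNT) && defined(__POPCNT__) && defined(__x86_64__)
    // Explicit hardware path: a single POPCNT instruction.
    return static_cast<int>(_mm_popcnt_u64(b));
#else
    // Portable fallback (Kernighan's loop): clear the lowest set bit each pass.
    int count = 0;
    while (b) {
        b &= b - 1;
        ++count;
    }
    return count;
#endif
}
```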

The same goes for AVX2 ... however, one part of AVX2, i.e. BMI (PEXT), has been defined and provided for ... so that build is rightly labelled BMI2 and not AVX2 (meaning only one part of AVX2, PEXT, is enabled) ...

So, as I understand it, this Haswell+ (BMI2) version really does produce a different binary ... but AVX is 'only an illusion' ... nothing else ... its only effect is that the build won't run on non-AVX systems?! ... Is my understanding correct?

What would it really take to enable AVX/AVX2 properly?

(Please bear with my 'basic' difficulties; I am only an amateur who is trying to learn.)
KP
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Stockfish - enabling AVX / AVX2 like popcnt

Post by bob »

Krgp wrote:What would it really take to enable AVX/AVX2 properly ?
I would assume they are relying on the compiler to use the additional instructions where it knows how, as opposed to explicitly coding AVX routines in their source. It's just like the older SSE SIMD instructions, which compilers will use automatically if told to compile for hardware that has 'em (the Intel compiler calls this vectorization).
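As an illustration of that automatic vectorization, here is a plain loop with no intrinsics at all. It is a hypothetical example, not Stockfish code. Whether GCC or Clang actually vectorizes it depends on optimization level and target flags; the point is that the source is identical whether it ends up as scalar, SSE2 or AVX2 machine code.

```cpp
#include <cstddef>

// Element-wise add with no SIMD in the source. At -O3 an x86-64 compiler may
// emit 128-bit SSE2 code for this loop, or 256-bit AVX2 code if built with
// -mavx2; without optimization it stays scalar. The C++ does not change.
void add_arrays(const int* a, const int* b, int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

With GCC, compiling with `-O3 -fopt-info-vec` (with and without `-mavx2`) reports whether the loop was vectorized; compiling with `-S` and looking for `ymm` registers in the assembly is another way to check.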
Krgp

Re: Stockfish - enabling AVX / AVX2 like popcnt

Post by Krgp »

Many thanks! The compiler may use these instructions through vectorization, but will the algorithm itself be able to use them 'fruitfully' without explicitly coded routines, like popcnt or pext? My ultimate intention is to make provisions in the code itself and assign some tasks to them, such as the 'pext' implemented (for threat generation) by RDM in syzygy ... or for 'hash' and a better search, as these instructions offer better/faster memory management ... It seems to me that this is a huge task ... Of course I understand that truly enabling such instruction sets is one thing and assigning work to them is another ... What I am not able to grasp is which 'areas' and 'aspects' would require amendments. Or is just leaving it to the compiler to enable them and assign some task sufficient? The more I think about it, the less sufficient it seems ... Anyway, thanks again for the guidance ...
KP
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Stockfish - enabling AVX / AVX2 like popcnt

Post by matthewlai »

Krgp wrote:... but AVX is 'only an illusion' ... nothing else ... it serves only one purpose not to run on non-avx systems ?! ... is my understanding correct ?
I am working on a derivative of Stockfish and have been using AVX2 for quite a while now (since my application benefits from it).

It doesn't make any difference to standard Stockfish functions.

I am assuming that's why AVX/AVX2 are not in the standard Stockfish Makefile.

Technically, like Bob said, just because there is no special code path doesn't mean the compiler won't actually use AVX. If you do nothing but enable the AVX instruction sets, the compiler can use those instructions for optimization. It just doesn't make a difference to Stockfish.
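One way to see what "enabling the AVX instruction sets" changes: GCC and Clang predefine macros such as `__AVX__`, `__AVX2__`, `__BMI2__` and `__POPCNT__` when the matching `-m...` or `-march=` flags are given. The sketch below (the function name is hypothetical, and MSVC does not set all of these macros) reports which extensions the compiler was permitted to target. It cannot tell you whether any hot loop actually used them.

```cpp
#include <string>

// Lists the ISA extensions the compiler was allowed to target for this
// translation unit, based on GCC/Clang predefined macros. This reflects the
// build flags only, not which instructions ended up in the generated code.
std::string enabled_isa_extensions() {
    std::string s;
#ifdef __SSE2__
    s += "SSE2 ";
#endif
#ifdef __POPCNT__
    s += "POPCNT ";
#endif
#ifdef __AVX__
    s += "AVX ";
#endif
#ifdef __AVX2__
    s += "AVX2 ";
#endif
#ifdef __BMI2__
    s += "BMI2 ";
#endif
    return s.empty() ? "none" : s;
}
```

Building once with `-mpopcnt` and once with `-mavx2` would return different lists, even if the rest of the generated code turns out nearly identical.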

There is no official AVX build, probably for that reason.

AVX (like most of SSE/SSE2/etc) is a single-instruction-multiple-data (SIMD) instruction set that essentially allows the CPU to efficiently do the same operation on an array of numbers at the same time. There just isn't really a place in a standard chess engine where that helps. It only helps for some applications.
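To show what "the same operation on an array of numbers at the same time" looks like when written explicitly, here is a minimal SSE2 sketch (a hypothetical helper, not engine code) that adds four 32-bit integers per instruction with `_mm_add_epi32`, with a scalar tail and a scalar fallback for non-SSE2 targets. AVX2's `_mm256_add_epi32` would do eight per instruction.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 integer intrinsics
#endif

// out[i] = a[i] + b[i], processing four 32-bit lanes per SSE2 add.
void add_arrays_sse2(const int* a, const int* b, int* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_add_epi32(va, vb));
    }
#endif
    // Leftover elements (or everything, on targets without SSE2).
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```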

The only reason SSE is in the standard Makefile is that it also includes a prefetch instruction, which I believe is used.
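For readers unfamiliar with the prefetch use case: an engine can ask the CPU to start loading a hash-table entry before it is probed, so the memory latency overlaps with other work. Stockfish has its own prefetch helper; the version below is an independent sketch using the real SSE intrinsic `_mm_prefetch` (a no-op where SSE is unavailable), paired with a hypothetical, heavily simplified transposition table. `TTEntry` and `TranspositionTable` are illustrative names, not Stockfish's.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE__)
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0
#endif

// Hint the CPU to start pulling the cache line at addr into L1.
inline void prefetch(const void* addr) {
#if defined(__SSE__)
    _mm_prefetch(static_cast<const char*>(addr), _MM_HINT_T0);
#else
    (void)addr;  // no-op fallback
#endif
}

struct TTEntry {  // hypothetical minimal entry
    std::uint64_t key;
    int score;
};

class TranspositionTable {
public:
    // entries must be a power of two so the index can be a simple mask.
    explicit TranspositionTable(std::size_t entries) : table_(entries) {}

    // Call early (e.g. right after making a move) so the load is in flight
    // by the time probe() runs.
    void prefetch_entry(std::uint64_t key) const { prefetch(&table_[index(key)]); }

    void store(std::uint64_t key, int score) { table_[index(key)] = {key, score}; }

    bool probe(std::uint64_t key, int& score) const {
        const TTEntry& e = table_[index(key)];
        if (e.key != key)
            return false;
        score = e.score;
        return true;
    }

private:
    std::size_t index(std::uint64_t key) const {
        return static_cast<std::size_t>(key & (table_.size() - 1));
    }
    std::vector<TTEntry> table_;
};
```

The prefetch is purely a performance hint: removing it changes no results, only (possibly) timing.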

PEXT is not actually part of AVX2. It's part of BMI2, which is separate from AVX2. They just happen to have been introduced in the same Intel generation (Haswell).
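Since PEXT came up, here is what it computes. In BMI2 builds, Stockfish is understood to use PEXT to turn the relevant occupancy bits of a slider's rays directly into an attack-table index, in place of the magic-multiply lookup. A minimal sketch, assuming GCC/Clang: `_pext_u64` is the real BMI2 intrinsic from `<immintrin.h>`, while `pext_soft` is a hypothetical portable fallback that gathers the bits of `src` selected by `mask` into the low bits of the result.

```cpp
#include <cstdint>
#if defined(__BMI2__)
#include <immintrin.h>  // _pext_u64
#endif

// Software parallel bit extract: walk the set bits of mask from low to high
// and pack the corresponding bits of src into consecutive low result bits.
inline std::uint64_t pext_soft(std::uint64_t src, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t bit = 1; mask; bit <<= 1) {
        std::uint64_t lowest = mask & (~mask + 1);  // lowest set bit of mask
        if (src & lowest)
            result |= bit;
        mask &= mask - 1;  // drop that bit from the mask
    }
    return result;
}

inline std::uint64_t pext(std::uint64_t src, std::uint64_t mask) {
#if defined(__BMI2__)
    return _pext_u64(src, mask);  // one instruction on BMI2 CPUs
#else
    return pext_soft(src, mask);
#endif
}
```

For a slider, `src` would be the occupied squares and `mask` the squares on its rays that can block it, so the result is a compact index into that square's attack table.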
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
bob

Re: Stockfish - enabling AVX / AVX2 like popcnt

Post by bob »

matthewlai wrote:
AVX (like most of SSE/SSE2/etc) is a single-instruction-multiple-data (SIMD) instruction set that essentially allows the CPU to efficiently do the same operation with an array of numbers at the same time. There is just not really a place in a standard chess engine where that really helps. It only helps for some applications.
One place you MIGHT see a very small gain is with the MG/EG scores that are kept separately. Stockfish combines them into one value by specifically biasing the values to avoid overflow/underflow issues. But a compiler might well recognize that a pair of score_mg+=x; score_eg+=y could be done via AVX perhaps. I played with this a long while back and found no benefit, leaving me with the keep it simple approach.
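The "combines them into one value by biasing" trick can be sketched as follows. This is modeled on the packed score used in Stockfish of that era (middlegame value in the low 16 bits, endgame value in the high 16 bits) but written independently, so names and details are assumptions. One integer add then updates both phases at once; the `+ 0x8000` bias on extraction undoes the borrow that a negative middlegame value leaves in the upper half.

```cpp
#include <cstdint>

using Score = std::int32_t;  // packed: mg in low 16 bits, eg in high 16 bits

// Shift as unsigned to avoid left-shifting a negative signed value.
inline Score make_score(int mg, int eg) {
    return static_cast<Score>(static_cast<std::uint32_t>(eg) << 16) + mg;
}

// Adding 0x8000 before the shift compensates for a negative mg having
// borrowed 1 from the upper 16 bits.
inline int eg_value(Score s) {
    return static_cast<std::int16_t>(
        static_cast<std::uint16_t>(static_cast<std::uint32_t>(s + 0x8000) >> 16));
}

// The low 16 bits, reinterpreted as signed.
inline int mg_value(Score s) {
    return static_cast<std::int16_t>(
        static_cast<std::uint16_t>(static_cast<std::uint32_t>(s)));
}
```

As long as each half stays within the 16-bit range, `make_score(a, b) + make_score(c, d)` equals `make_score(a + c, b + d)`, which is what lets evaluation terms be summed with a single add.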
matthewlai

Re: Stockfish - enabling AVX / AVX2 like popcnt

Post by matthewlai »

bob wrote:One place you MIGHT see a very small gain is with the MG/EG scores that are kept separately. Stockfish combines them into one value by specifically biasing the values to avoid overflow/underflow issues. But a compiler might well recognize that a pair of score_mg+=x; score_eg+=y could be done via AVX perhaps. I played with this a long while back and found no benefit, leaving me with the keep it simple approach.
For only a width of 2, though, it's probably not worth the trouble of loading them into SIMD registers and reading them back out afterwards. It's probably faster on a modern CPU to just do regular additions, with multiple issue and all that. The add instructions don't depend on each other and can probably be scheduled in a way that's effectively free.
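To make the width-2 comparison concrete, here are both versions side by side, using a hypothetical unpacked `MgEg` pair. The scalar version is two independent adds; the SSE2 version computes the same result but has to pack the values into an XMM register and extract them again. This only illustrates the extra steps; whether it is actually slower on a given CPU would need measuring.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

struct MgEg {  // hypothetical unpacked middlegame/endgame pair
    int mg;
    int eg;
};

// Two adds with no dependency between them; a superscalar core can issue both together.
inline void add_scalar(MgEg& acc, MgEg d) {
    acc.mg += d.mg;
    acc.eg += d.eg;
}

#if defined(__SSE2__)
// Same result through SIMD: pack, one add, then unpack lane 0 and lane 1.
inline void add_sse2(MgEg& acc, MgEg d) {
    __m128i a = _mm_set_epi32(0, 0, acc.eg, acc.mg);  // lane 0 = mg, lane 1 = eg
    __m128i b = _mm_set_epi32(0, 0, d.eg, d.mg);
    __m128i sum = _mm_add_epi32(a, b);
    acc.mg = _mm_cvtsi128_si32(sum);                     // read lane 0
    acc.eg = _mm_cvtsi128_si32(_mm_srli_si128(sum, 4));  // shift lane 1 down, read it
}
#endif
```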

Also, I believe AVX mainly extends the width of SSE floating-point instructions from 128-bit to 256-bit (256-bit integer operations only arrived with AVX2). With just 2 values, older instruction sets should be enough.
bob

Re: Stockfish - enabling AVX / AVX2 like popcnt

Post by bob »

matthewlai wrote:
For only a width of 2, though, it's probably not worth the trouble of loading them into SIMD registers and reading them back out afterwards. It's probably faster on a modern CPU to just do regular additions, with multiple issue and all that. The add instructions don't depend on each other and can probably be scheduled in a way that's effectively free.
I agree. The speed gain from treating the two scores as one is so small that it is very difficult to measure (thanks to multiple pipes that can do the two adds in parallel anyway). And I didn't really think doing this in hardware with SSE or AVX would help speed either. There are lots of times when extra instructions have zero cost because they execute while other instructions are waiting on data dependencies and such.