AVX

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Feb 07, 2011 9:32 pm

So Sandy Bridge is out, and it has sixteen 256-bit YMM registers with AVX; Bulldozer will follow.

So far, they can only be treated as vectors of four 64-bit doubles or eight 32-bit floats, likely to be extended to integer vectors later.
This is similar to SSE with its 128-bit registers, which was soon followed by SSE2 for various integer types. The following bitwise operations are available for doubles as well as for floats.

Code:

VANDPD  ymm1, ymm2, ymm3/mem256    ymm1 :=  ymm2 & ymm3
VANDNPD ymm1, ymm2, ymm3/mem256    ymm1 := ~ymm2 & ymm3
VORPD   ymm1, ymm2, ymm3/mem256    ymm1 :=  ymm2 | ymm3
VXORPD  ymm1, ymm2, ymm3/mem256    ymm1 :=  ymm2 ^ ymm3

For intrinsics, see
http://software.intel.com/sites/product ... /index.htm

My question: what are bitwise operations on floats or doubles used for, if the values are not interpreted as integers or sets? Is there an enriched double arithmetic in which modifying bits other than the sign, that is, the mantissa or exponent, makes any sense?

Thanks,
Gerd

Zach Wegner · Post by **Zach Wegner** » Mon Feb 07, 2011 9:40 pm

I would guess that's for masking with -1/0 values, and OR/XOR would be for manipulating multiple masks.
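A minimal sketch of that masking idea, in plain C rather than AVX assembly. It emulates a single 64-bit lane by copying the raw bits of a double into an integer; the helper names (`bits_of`, `double_of`, `keep_if`, `clear_sign`, `flip_sign`) are made up for illustration and are not part of any library. It assumes 64-bit IEEE 754 doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the raw bits of a double into a 64-bit integer. */
static uint64_t bits_of(double d) {
    uint64_t u;
    assert(sizeof u == sizeof d);   /* assumption: 64-bit IEEE 754 doubles */
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Copy 64 raw bits back into a double. */
static double double_of(uint64_t u) {
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* One lane of VANDPD with a 0 / all-ones mask: keeps x, or yields +0.0. */
static double keep_if(double x, uint64_t mask) {
    return double_of(bits_of(x) & mask);
}

/* One lane of VANDNPD with the sign bit as first operand: |x|. */
static double clear_sign(double x) {
    const uint64_t sign = UINT64_C(0x8000000000000000);
    return double_of(~sign & bits_of(x));
}

/* One lane of VXORPD with the sign bit: -x. */
static double flip_sign(double x) {
    const uint64_t sign = UINT64_C(0x8000000000000000);
    return double_of(sign ^ bits_of(x));
}
```

With an all-ones mask `keep_if` returns its argument unchanged, and with a zero mask it returns +0.0, which is what makes compare results usable as selectors.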

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Feb 07, 2011 10:07 pm

Zach Wegner wrote: I would guess that's for masking with -1/0 values, and OR/XOR would be for manipulating multiple masks.

Sure, of course! How embarrassing

All the branchless stuff with

Code:

VCMPPD ymm1, ymm2, ymm3/mem256, imm8

Wow! Treating bitboards as doubles

Some problems with NaN values, though.
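To make the NaN problem concrete, here is a small sketch in plain C. It assumes 64-bit IEEE 754 doubles and, for the example value, the common a1 = bit 0 ... h8 = bit 63 square mapping (the thread does not specify one). Under that mapping the pattern 0xFFFF000000000000 is ranks 7 and 8 fully occupied, and read as a double it is a NaN. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret 64 raw bits as a double. */
static double double_of(uint64_t u) {
    double d;
    assert(sizeof d == sizeof u);   /* assumption: 64-bit IEEE 754 doubles */
    memcpy(&d, &u, sizeof d);
    return d;
}

/* True if the pattern is a NaN when read as a double:
   exponent field all ones and a non-zero mantissa. */
static int is_nan_pattern(uint64_t bb) {
    const uint64_t exp_mask = UINT64_C(0x7FF0000000000000);
    const uint64_t man_mask = UINT64_C(0x000FFFFFFFFFFFFF);
    return (bb & exp_mask) == exp_mask && (bb & man_mask) != 0;
}

/* Equality as a floating-point compare sees it, not as a bit compare. */
static int fp_equal(uint64_t a, uint64_t b) {
    return double_of(a) == double_of(b);
}
```

Two things go wrong with a floating-point "equal" compare on such data: a NaN pattern is not equal to itself, and the empty board (+0.0) compares equal to the board with only bit 63 set (-0.0).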

bob · Post by **bob** » Mon Feb 07, 2011 10:59 pm

I've been programming in assembly language for 40+ years now, and I have never seen bitwise instructions for FP. Whether those are designed to be used on 64/128/whatever values, I do not know. But using ints is dangerous in true FP mode, since there are invalid configurations of bits that can make a program blow up. I know this from experience: I did it on a Sun SPARC when I first tried to get a version of Cray Blitz up on a local SPARC in 1985. I used a double (double precision in FORTRAN, actually) for the hash signature and got a lot of various types of floating-point exceptions.

This would therefore suggest that the FP regs can be used as integer values. Whether that gets extended to the "pseudo-vector" stuff or not I have no idea at the moment.

Gerd Isenberg · Post by **Gerd Isenberg** » Tue Feb 08, 2011 12:07 am

Yes, as Zach mentioned, with vectors of floats or doubles you do things branchlessly. For instance, max or min and the like, where a mask selects between two sources:

Code:

VXORPD  ymm1, ymm2, ymm3  ; a ^ b
VCMPPD  ymm0, ymm2, ymm3, less ; mask is 0xff..ff if a < b, zero otherwise
VANDPD  ymm0, ymm0, ymm1  ; (a ^ b) & mask
VXORPD  ymm0, ymm0, ymm2  ; result = a ^ ((a ^ b) & mask), i.e. b where a < b, else a (max)
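The same select can be sketched in plain C, one 64-bit lane at a time. This is only an emulation of the four instructions above under the assumption of 64-bit IEEE 754 doubles; the names `max_lane` and `max4` are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the raw bits of a double into a 64-bit integer. */
static uint64_t bits_of(double d) {
    uint64_t u;
    assert(sizeof u == sizeof d);   /* assumption: 64-bit IEEE 754 doubles */
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Copy 64 raw bits back into a double. */
static double double_of(uint64_t u) {
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* One lane of the XOR / compare / AND / XOR sequence. */
static double max_lane(double a, double b) {
    uint64_t ua = bits_of(a);
    uint64_t ub = bits_of(b);
    /* All ones if a < b, zero otherwise (what the compare produces). */
    uint64_t mask = (uint64_t)0 - (uint64_t)(a < b);
    /* a ^ ((a ^ b) & mask): b where the mask is set, a elsewhere. */
    return double_of(ua ^ ((ua ^ ub) & mask));
}

/* Four lanes, standing in for one 256-bit register of doubles. */
static void max4(const double a[4], const double b[4], double out[4]) {
    for (int i = 0; i < 4; ++i) {
        out[i] = max_lane(a[i], b[i]);
    }
}
```

If either input is a NaN, the less-than comparison is false, the mask is zero, and the lane simply returns a.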

smatovic · Post by **smatovic** » Tue Feb 08, 2011 8:48 pm

I am trying to get a 4*32-bit optimized board representation running on a GPU.

So I thought about vectorizing your QuadBoards into 8*32-bit pieces; maybe this idea could also fit AVX...

--
srdja

Gerd Isenberg · Post by **Gerd Isenberg** » Tue Feb 08, 2011 11:28 pm

Yes, if AVX 2 with 256-bit integer vectors becomes available. I need shifts, which are not yet possible with 256-bit float/double vectors.
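For readers wondering why shifts matter here, a rough sketch in plain C of the kind of operation meant. It assumes the a1 = bit 0 ... h8 = bit 63 mapping, and the `Quad` struct is a hypothetical stand-in for four 64-bit lanes of one 256-bit register, not the actual quad-bitboard layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one 256-bit register: four 64-bit lanes. */
typedef struct {
    uint64_t lane[4];
} Quad;

/* Move every set bit one rank up; bits on rank 8 fall off the board. */
static uint64_t shift_north(uint64_t bb) {
    return bb << 8;
}

/* Move every set bit one file to the right; the mask drops bits
   that would otherwise wrap from the h-file onto the a-file. */
static uint64_t shift_east(uint64_t bb) {
    return (bb << 1) & UINT64_C(0xFEFEFEFEFEFEFEFE);
}

/* The same shift applied to all four lanes, which is what a
   256-bit integer shift would do in a single instruction. */
static void quad_shift_east(Quad *q) {
    assert(q != NULL);
    for (int i = 0; i < 4; ++i) {
        q->lane[i] = shift_east(q->lane[i]);
    }
}
```

The AND with the file mask is available on 256-bit float/double vectors; the shift itself is the missing piece.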

smatovic · Post by **smatovic** » Tue Feb 08, 2011 11:55 pm

Gerd Isenberg wrote: Yes, if AVX 2 with 256-bit integer vectors becomes available. I need shifts, which are not yet possible with 256-bit float/double vectors.

AFAIK, besides AVX, AMD also plans an extended instruction set with Bulldozer, "XOP".

http://en.wikipedia.org/wiki/XOP_instruction_set

This AVX/SSE5, Intel vs AMD story is confusing.

--
srdja
