VPPERM

Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

VPPERM

Post by Gerd Isenberg »

Are there any chess-related applications of AMD's future (Vapo?) PPERM instruction (SSE5), now revised as the (XOP) VPPERM instruction, besides bitboard reversal or flipping 8*8 boards?

Guess this thing will take some cycles. Anyway, a freaky instruction for crypto guys and advanced bitboarders. This is how VPPERM will hopefully work (Bulldozer or K11?, planned 2011):

Code: Select all

 VPPERM dest, src1, src2, selector

For each of the 16 destination bytes, the corresponding selector byte picks one of 32 input bytes (from src1, src2) in its lower five bits and, in its upper three bits, a logical operation to apply to that byte, including bit-reversal:

Code: Select all

unsigned char src[32];    // src2:src1
unsigned char select[16];
unsigned char dest[16];
for (int i = 0; i < 16; i++) {
   unsigned char opera = select[i] >> 5;   // upper 3 bits: operation
   unsigned char idx32 = select[i] & 31;   // lower 5 bits: source byte index

   switch ( opera ) {
      case 0: dest[i] =  src[idx32]; break;
      case 1: dest[i] = ~src[idx32]; break;
      case 2: dest[i] =  bitreverse( src[idx32]); break;  // bitreverse: reverse the 8 bits of a byte
      case 3: dest[i] = ~bitreverse( src[idx32]); break;
      case 4: dest[i] = 0x00; break;
      case 5: dest[i] = 0xFF; break;
      case 6: dest[i] = (src[idx32] & 0x80) ? 0xFF : 0x00; break; // replicate sign bit
      case 7: dest[i] = (src[idx32] & 0x80) ? 0x00 : 0xFF; break; // replicate inverted sign bit
   }
}
I hope Intel will follow and extend AVX in a compatible manner.
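For anyone who wants to experiment before the hardware exists, the per-byte behaviour listed above can be modelled in plain C. This is only a minimal sketch: the names vpperm_model, bitreverse8 and vpperm_selfcheck are made up for illustration, and the semantics are just those of the pseudocode in this post, not a verified model of the final instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order inside one byte (bit 0 <-> bit 7, and so on). */
static uint8_t bitreverse8(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
    return b;
}

/* Hypothetical scalar model: src is the 32-byte concatenation src2:src1,
   sel and dest are 16 bytes each. */
void vpperm_model(uint8_t dest[16], const uint8_t src[32], const uint8_t sel[16])
{
    for (int i = 0; i < 16; i++) {
        uint8_t op = (uint8_t)(sel[i] >> 5);   /* upper 3 bits: operation   */
        uint8_t b  = src[sel[i] & 31];         /* lower 5 bits: source byte */
        switch (op) {
        case 0:  dest[i] = b;                            break;
        case 1:  dest[i] = (uint8_t)~b;                  break;
        case 2:  dest[i] = bitreverse8(b);               break;
        case 3:  dest[i] = (uint8_t)~bitreverse8(b);     break;
        case 4:  dest[i] = 0x00;                         break;
        case 5:  dest[i] = 0xFF;                         break;
        case 6:  dest[i] = (b & 0x80u) ? 0xFF : 0x00;    break; /* sign bit replicated */
        default: dest[i] = (b & 0x80u) ? 0x00 : 0xFF;    break; /* inverted sign bit   */
        }
    }
}

/* Usage example: an in-register table lookup plus a few special operations. */
void vpperm_selfcheck(void)
{
    uint8_t src[32], sel[16], dest[16];
    for (int i = 0; i < 32; i++) src[i] = (uint8_t)(i * i);  /* small lookup table */
    for (int i = 0; i < 16; i++) sel[i] = (uint8_t)(2 * i);  /* fetch even entries */
    sel[13] = (uint8_t)((2u << 5) | 1u);   /* bit-reversed src[1]: 0x01 -> 0x80 */
    sel[14] = (uint8_t)((6u << 5) | 15u);  /* sign of src[15] = 0xE1 -> 0xFF    */
    sel[15] = (uint8_t)(4u << 5);          /* constant 0x00                     */
    vpperm_model(dest, src, sel);
    assert(dest[3]  == 36);                /* src[6] = 6 * 6 */
    assert(dest[13] == 0x80);
    assert(dest[14] == 0xFF);
    assert(dest[15] == 0x00);
}
```

Calling vpperm_selfcheck() from a main() exercises the lookup and three of the special operations. Unsigned bytes are used throughout so that the shift by 5 and the sign-bit tests do not depend on whether plain char is signed.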
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: VPPERM

Post by bob »

This looks like an old instruction from very old computers. The VAX had a TTBS instruction that was a bit more complicated than this one and, as I read it, could be used to do the same thing.

Compiler writers liked it for certain parsing tasks; I never found a use for it in chess when I worked on that box.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: VPPERM

Post by wgarvin »

Looks like a small improvement over the Cell SPU's SHUFB instruction.

Altivec (or whatever it was called) had vperm, which could permute bytes from two registers in a data-dependent order controlled by the third register.

The SPU had SHUFB which could do that, plus override certain bytes with 0x00, 0xFF or 0x7F (and maybe one or two others?)

This instruction can do that, plus a few crazy things like bit-reversal and/or negation, and sign-bit saturation. Being able to control it on a byte-for-byte basis is actually pretty useful-looking. I'm sure it will find some uses in real-world SIMD algorithms.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: VPPERM

Post by Gerd Isenberg »

bob wrote: This looks like an old instruction from very old computers. The VAX had a TTBS instruction that was a bit more complicated than this one and, as I read it, could be used to do the same thing.
Wow, really? Including bit reversal and sign-bit extension?
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: VPPERM

Post by Gerd Isenberg »

wgarvin wrote: Looks like a small improvement over the Cell SPU's SHUFB instruction.

Altivec (or whatever it was called) had vperm, which could permute bytes from two registers in a data-dependent order controlled by the third register.

The SPU had SHUFB which could do that, plus override certain bytes with 0x00, 0xFF or 0x7F (and maybe one or two others?)

This instruction can do that, plus a few crazy things like bit-reversal and/or negation, and sign-bit saturation. Being able to control it on a byte-for-byte basis is actually pretty useful-looking. I'm sure it will find some uses in real-world SIMD algorithms.
Yes, it can do some things with bitboards: flip vertical (like bswap), mirror horizontal (bit-reversal within each byte) and, combining the two, rotate by 180 degrees, all controlled by the selector, which is an xmm register. One may shuffle bytes, or sign- or zero-extend bytes to words, dwords, etc. One may also do one (or 16) in-register lookups from a byte array of 32 bytes. This is really one of the most flexible instructions for a lot of purposes and a real bit-twiddling dream.
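To make the three bitboard operations concrete, here is a small C sketch of the selectors that would request them. It assumes a bitboard layout where byte 0 of the 64-bit word is rank 1 and bit 0 of each byte is the a-file; other mappings need different selectors. The helper permute_bitboard is a reduced, hypothetical model that only looks at eight source bytes and only understands operation 0 (copy) and operation 2 (bit-reverse), and all names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order inside one byte. */
static uint8_t bitreverse8(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
    return b;
}

/* Reduced model: destination byte i takes source byte (sel[i] & 7),
   bit-reversed when the upper three selector bits equal 2. */
uint64_t permute_bitboard(uint64_t bb, const uint8_t sel[8])
{
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t b = (uint8_t)(bb >> (8 * (sel[i] & 7)));  /* pick source rank */
        if ((sel[i] >> 5) == 2)
            b = bitreverse8(b);                            /* mirror the files */
        out |= (uint64_t)b << (8 * i);
    }
    return out;
}

/* Selector bytes: low bits = source byte, 0x40 = operation 2 (bit-reverse). */
static const uint8_t SEL_FLIP_VERTICAL[8] =
    {7, 6, 5, 4, 3, 2, 1, 0};
static const uint8_t SEL_MIRROR_HORIZONTAL[8] =
    {0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47};
static const uint8_t SEL_ROTATE_180[8] =
    {0x47, 0x46, 0x45, 0x44, 0x43, 0x42, 0x41, 0x40};

uint64_t flip_vertical(uint64_t bb)     { return permute_bitboard(bb, SEL_FLIP_VERTICAL); }
uint64_t mirror_horizontal(uint64_t bb) { return permute_bitboard(bb, SEL_MIRROR_HORIZONTAL); }
uint64_t rotate_180(uint64_t bb)        { return permute_bitboard(bb, SEL_ROTATE_180); }

/* Usage example on the a1 square (bit 0 under the assumed layout). */
void bitboard_selfcheck(void)
{
    const uint64_t a1 = 1;
    assert(flip_vertical(a1)     == 0x0100000000000000ull);  /* a8 */
    assert(mirror_horizontal(a1) == 0x0000000000000080ull);  /* h1 */
    assert(rotate_180(a1)        == 0x8000000000000000ull);  /* h8 */
}
```

The point of the sketch is that all three transforms differ only in the selector constant, so on real hardware each would be a single permute with a different preloaded register.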

Some other interesting XOP 128/256-bit integer instructions, see
Volume 6: 128-Bit and 256-Bit XOP, FMA4 and CVT16 Instructions (pdf):

Code: Select all

VPCMOV Vector Conditional Moves
VPCMOV dest, src1, src2, selector
dest[0:255] := (src1 & selector) | (src2 & ~selector);

VPROTB/W/D/Q Packed Rotate Bytes/Words/Dwords/Qwords
VPSHLB/W/D/Q Packed Shift Logical Bytes/Words/Dwords/Qwords
VPSHAB/W/D/Q Packed Shift Arithmetic Bytes/Words/Dwords/Qwords
There is also a bunch of packed multiply-add-and-accumulate instructions and horizontal adds, e.g.:

Code: Select all

VPMADCSWD Packed Multiply Add and Accumulate Signed Word to Signed Doubleword
VPHADDBW Packed Horizontal Add Signed Byte to Signed Word
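As a rough illustration of two of these, here is a scalar C sketch of the VPCMOV formula given above, applied to a single 64-bit lane, and of a horizontal add of adjacent signed bytes into signed words in the spirit of VPHADDBW. The function names are invented, and the pairing of adjacent bytes is my reading of the mnemonic's description rather than something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise select for one 64-bit lane: where a selector bit is 1 take the
   bit from src1, otherwise take it from src2. */
uint64_t cmov_bits(uint64_t src1, uint64_t src2, uint64_t selector)
{
    return (src1 & selector) | (src2 & ~selector);
}

/* Assumed VPHADDBW-like behaviour: each pair of adjacent signed bytes is
   summed into one signed 16-bit word, so 16 bytes give 8 words. */
void hadd_bytes_to_words(int16_t dest[8], const int8_t src[16])
{
    for (int i = 0; i < 8; i++)
        dest[i] = (int16_t)(src[2 * i] + src[2 * i + 1]);
}

/* Usage example. */
void xop_extras_selfcheck(void)
{
    /* Upper half from the first operand, lower half from the second. */
    assert(cmov_bits(0xAAAAAAAAAAAAAAAAull, 0x5555555555555555ull,
                     0xFFFFFFFF00000000ull) == 0xAAAAAAAA55555555ull);

    const int8_t bytes[16] = {127, 127, -128, -128, 1, -1, 5, 6,
                              0, 0, 0, 0, 0, 0, 0, 0};
    int16_t words[8];
    hadd_bytes_to_words(words, bytes);
    assert(words[0] == 254);    /* does not overflow in 16 bits */
    assert(words[1] == -256);
    assert(words[2] == 0);
    assert(words[3] == 11);
}
```

For bitboards, the bitwise select is the interesting one: with a mask as selector it merges two boards in one step, for example taking some files from one board and the rest from another.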