Gerd Isenberg wrote:
Are there any chess-related applications of AMD's future (Vapo?) PPERM instruction (SSE5), now revised as the (XOP) VPPERM instruction, besides bitboard reversal or flipping 8*8 boards?
I guess this thing will take some cycles. Anyway, it is a freaky instruction for crypto guys and advanced bitboarders. This is how VPPERM will hopefully work (Bulldozer or K11?, planned for 2011):
For each of the 16 destination bytes, the corresponding selector byte addresses one of 32 input bytes (from src1, src2) and selects a logical operation, including bit reversal:
uint8_t bitreverse(uint8_t b); // reverses the bit order within one byte (not shown)
uint8_t src[32];    // src2:src1
uint8_t select[16];
uint8_t dest[16];
for (int i = 0; i < 16; i++) {
    uint8_t opera = select[i] >> 5;  // upper 3 bits: operation
    uint8_t idx32 = select[i] & 31;  // lower 5 bits: index of the source byte
    switch ( opera ) {
    case 0: dest[i] = src[idx32]; break;
    case 1: dest[i] = ~src[idx32]; break;
    case 2: dest[i] = bitreverse( src[idx32]); break;
    case 3: dest[i] = ~bitreverse( src[idx32]); break;
    case 4: dest[i] = 0x00; break;
    case 5: dest[i] = 0xFF; break;
    case 6: dest[i] = (src[idx32] & 0x80) ? 0xFF : 0x00; break; // sign bit replicated
    case 7: dest[i] = (src[idx32] & 0x80) ? 0x00 : 0xFF; break; // inverted sign bit replicated
    }
}
I hope Intel will follow and extend AVX in a compatible manner.
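For anyone who wants to play with the semantics before hardware exists, here is a minimal scalar sketch of the loop above as a compilable C function. The names `bitreverse8`, `vpperm_emu` and `vpperm_demo` are made up for illustration, and the behaviour simply follows the description in the post, not any final AMD specification.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order within one byte (bit 0 <-> bit 7, and so on). */
static uint8_t bitreverse8(uint8_t b) {
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4)); /* swap nibbles */
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2)); /* swap bit pairs */
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1)); /* swap neighbours */
    return b;
}

/* Scalar model of the byte permute as described in the post.
   src holds 32 bytes (src2:src1); select and dest hold 16 bytes each. */
static void vpperm_emu(uint8_t dest[16], const uint8_t src[32],
                       const uint8_t select[16]) {
    for (int i = 0; i < 16; i++) {
        uint8_t op   = select[i] >> 5;               /* upper 3 bits: operation */
        uint8_t b    = src[select[i] & 31];          /* lower 5 bits: source byte */
        uint8_t sign = (b & 0x80u) ? 0xFFu : 0x00u;  /* sign bit replicated */
        switch (op) {
        case 0:  dest[i] = b; break;
        case 1:  dest[i] = (uint8_t)~b; break;
        case 2:  dest[i] = bitreverse8(b); break;
        case 3:  dest[i] = (uint8_t)~bitreverse8(b); break;
        case 4:  dest[i] = 0x00; break;
        case 5:  dest[i] = 0xFF; break;
        case 6:  dest[i] = sign; break;
        default: dest[i] = (uint8_t)~sign; break;    /* op 7 */
        }
    }
}

/* Small self-check: byte 0 copies src[1], byte 1 bit-reverses src[0]. */
static void vpperm_demo(void) {
    uint8_t src[32] = {0x01, 0x80};
    uint8_t sel[16] = {0x01, 0x40}; /* 0x40 = operation 2, index 0 */
    uint8_t dst[16];
    vpperm_emu(dst, src, sel);
    assert(dst[0] == 0x80);
    assert(dst[1] == 0x80);
}
```

Operations 6 and 7 are written with an explicit sign test instead of a signed shift, so the result does not depend on whether plain `char` is signed on the target compiler.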
This looks like an old instruction from very old computers. The VAX had a TTBS instruction that was a bit more complicated than this one, and which, as I read it, could be used to do the same thing.
Compiler writers liked it for certain parsing tasks, but I never found a use for it in chess while I worked on that box.
Looks like a small improvement over the Cell SPU's SHUFB instruction.
Altivec (or whatever it was called) had vperm, which could permute bytes from two registers in a data-dependent order controlled by the third register.
The SPU had SHUFB which could do that, plus override certain bytes with 0x00, 0xFF or 0x80 (and maybe one or two others?)
This instruction can do that, plus a few crazy things like bit-reversal and/or negation, and sign-bit saturation. Being able to control it on a byte-for-byte basis is actually pretty useful-looking. I'm sure it will find some uses in real-world SIMD algorithms.
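For comparison, here is a rough scalar sketch of SHUFB-style control bytes, written from memory of the SPU ISA, so treat the encoding as an assumption: top bits `10` give 0x00, `110` give 0xFF, `111` give 0x80, and anything else picks one of the 32 concatenated source bytes by its low 5 bits. The names `shufb_emu` and `shufb_demo` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* a_b holds the 32 bytes of the two source registers, concatenated.
   Assumed control-byte encoding (recalled, not checked against the manual):
     10xxxxxx -> 0x00, 110xxxxx -> 0xFF, 111xxxxx -> 0x80,
     otherwise the low 5 bits index into a_b. */
static void shufb_emu(uint8_t dest[16], const uint8_t a_b[32],
                      const uint8_t ctrl[16]) {
    for (int i = 0; i < 16; i++) {
        uint8_t c = ctrl[i];
        if ((c & 0xC0u) == 0x80u)      dest[i] = 0x00;
        else if ((c & 0xE0u) == 0xC0u) dest[i] = 0xFF;
        else if ((c & 0xE0u) == 0xE0u) dest[i] = 0x80;
        else                           dest[i] = a_b[c & 31];
    }
}

/* Small self-check of the four cases. */
static void shufb_demo(void) {
    uint8_t a_b[32];
    for (int i = 0; i < 32; i++) a_b[i] = (uint8_t)(0x10 + i);
    uint8_t ctrl[16] = {0x03, 0x80, 0xC0, 0xE0, 0x5F};
    uint8_t dst[16];
    shufb_emu(dst, a_b, ctrl);
    assert(dst[0] == 0x13); /* plain permute, index 3 */
    assert(dst[1] == 0x00);
    assert(dst[2] == 0xFF);
    assert(dst[3] == 0x80);
    assert(dst[4] == 0x2F); /* index 31 */
}
```

The difference to VPPERM is visible in the control byte: here the constants are the only special cases, while VPPERM spends three bits per byte on a full operation code.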
bob wrote:
This looks like an old instruction from very old computers. The VAX had a TTBS instruction that was a bit more complicated than this one, and which, as I read it, could be used to do the same thing.
Wow, really? Including bit reversal and sign-bit extension?
wgarvin wrote:
Looks like a small improvement over the Cell SPU's SHUFB instruction.
Altivec (or whatever it was called) had vperm, which could permute bytes from two registers in a data-dependent order controlled by the third register.
The SPU had SHUFB which could do that, plus override certain bytes with 0x00, 0xFF or 0x80 (and maybe one or two others?)
This instruction can do that, plus a few crazy things like bit-reversal and/or negation, and sign-bit saturation. Being able to control it on a byte-for-byte basis is actually pretty useful-looking. I'm sure it will find some uses in real-world SIMD algorithms.
Yes, it can do some things with bitboards: flip vertically (like bswap), mirror horizontally (bit reversal) and, in combination, rotate by 180 degrees, all controlled by the selector, which is an xmm register. One may shuffle bytes, or sign- or zero-extend bytes to words, dwords, etc. One may also do one (or 16) in-register lookups from a byte array of 32 bytes. This is really one of the most flexible instructions for a lot of purposes and a real bit-twiddling dream.
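To make the selector patterns concrete, here is a sketch on top of a scalar model of the instruction. It assumes a little-endian rank-file mapping (a1 = bit 0, so byte i of the bitboard is rank i+1); all function names are made up for illustration, and the model follows the description in this thread, not a final specification.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t bitreverse8(uint8_t b) {
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
    return b;
}

/* Scalar model: selector bits 7..5 = operation, bits 4..0 = source index. */
static void vpperm_emu(uint8_t dest[16], const uint8_t src[32],
                       const uint8_t select[16]) {
    for (int i = 0; i < 16; i++) {
        uint8_t b    = src[select[i] & 31];
        uint8_t sign = (b & 0x80u) ? 0xFFu : 0x00u;
        switch (select[i] >> 5) {
        case 0:  dest[i] = b; break;
        case 1:  dest[i] = (uint8_t)~b; break;
        case 2:  dest[i] = bitreverse8(b); break;
        case 3:  dest[i] = (uint8_t)~bitreverse8(b); break;
        case 4:  dest[i] = 0x00; break;
        case 5:  dest[i] = 0xFF; break;
        case 6:  dest[i] = sign; break;
        default: dest[i] = (uint8_t)~sign; break;
        }
    }
}

/* Result byte i is source byte (i ^ idx_xor), passed through operation op. */
static uint64_t permute_bitboard(uint64_t bb, uint8_t op, uint8_t idx_xor) {
    uint8_t src[32] = {0}, sel[16], dst[16];
    for (int i = 0; i < 8; i++) src[i] = (uint8_t)(bb >> (8 * i));
    for (int i = 0; i < 16; i++)
        sel[i] = (uint8_t)((op << 5) | ((i ^ idx_xor) & 31));
    vpperm_emu(dst, src, sel);
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) r |= (uint64_t)dst[i] << (8 * i);
    return r;
}

/* Reverse the rank order (like bswap). */
static uint64_t flip_vertical(uint64_t bb)     { return permute_bitboard(bb, 0, 7); }
/* Reverse the bits inside each rank. */
static uint64_t mirror_horizontal(uint64_t bb) { return permute_bitboard(bb, 2, 0); }
/* Both at once, with a single selector. */
static uint64_t rotate_180(uint64_t bb)        { return permute_bitboard(bb, 2, 7); }

/* Sign-extend the low 8 source bytes to eight little-endian 16-bit words. */
static void sign_extend_bytes_to_words(uint8_t dst[16], const uint8_t src[32]) {
    uint8_t sel[16];
    for (int k = 0; k < 8; k++) {
        sel[2 * k]     = (uint8_t)k;              /* op 0: copy byte k */
        sel[2 * k + 1] = (uint8_t)((6 << 5) | k); /* op 6: fill with its sign bit */
    }
    vpperm_emu(dst, src, sel);
}

/* Small self-check: a1 moves to a8, h1 and h8 respectively. */
static void bitboard_demo(void) {
    assert(flip_vertical(1ULL) == (1ULL << 56));
    assert(mirror_horizontal(1ULL) == 0x80ULL);
    assert(rotate_180(1ULL) == 0x8000000000000000ULL);
}
```

Zero extension would use operation 4 for the high byte of each word instead of operation 6, and the 32-byte lookup is just operation 0 with the table in `src` and the indices in the selector.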