Guess this thing will take some cycles. Anyway, a freaky instruction for crypto guys and advanced bitboarders. This is how VPPERM will hopefully work (Bulldozer or K11?, planned 2011):
VPPERM dest, src1, src2, selector
For each of the 16 destination bytes, the corresponding selector byte picks one of 32 input bytes (from src1 and src2) with its low five bits, and one of eight logical operations, including bit reversal, with its high three bits:
unsigned char src[32];    // src2:src1
unsigned char select[16];
unsigned char dest[16];
// bitreverse(b) reverses the bit order within one byte
for (int i = 0; i < 16; i++) {
    unsigned char opera = select[i] >> 5;  // high 3 bits: operation; unsigned, so always 0..7
    unsigned char idx32 = select[i] & 31;  // low 5 bits: source byte index
    switch ( opera ) {
    case 0: dest[i] = src[idx32]; break;
    case 1: dest[i] = ~src[idx32]; break;
    case 2: dest[i] = bitreverse( src[idx32]); break;
    case 3: dest[i] = ~bitreverse( src[idx32]); break;
    case 4: dest[i] = 0x00; break;
    case 5: dest[i] = 0xFF; break;
    case 6: dest[i] = (src[idx32] & 0x80) ? 0xFF : 0x00; break; // replicate sign bit
    case 7: dest[i] = (src[idx32] & 0x80) ? 0x00 : 0xFF; break; // replicate inverted sign bit
    }
}
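To play with the semantics before any hardware exists, here is a minimal scalar sketch in plain C. It only models the per-byte behaviour described above and is not meant to reflect how the real instruction is implemented. The names vpperm_model and make_selector are made up for illustration, and bitreverse is just one possible way to reverse a byte.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order within one byte (bit 0 <-> bit 7, and so on). */
static uint8_t bitreverse(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4)); /* swap nibbles   */
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2)); /* swap bit pairs */
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1)); /* swap neighbours */
    return b;
}

/* Scalar model of the selection loop from the post.
   src holds 32 bytes laid out as src2:src1 (assumption taken from the post). */
static void vpperm_model(uint8_t dest[16], const uint8_t src[32],
                         const uint8_t select[16])
{
    for (int i = 0; i < 16; i++) {
        uint8_t opera = (uint8_t)(select[i] >> 5);      /* high 3 bits */
        uint8_t b     = src[select[i] & 31];            /* low 5 bits  */
        uint8_t sign  = (b & 0x80u) ? 0xFFu : 0x00u;    /* sign bit replicated */
        switch (opera) {
        case 0: dest[i] = b;                        break;
        case 1: dest[i] = (uint8_t)~b;              break;
        case 2: dest[i] = bitreverse(b);            break;
        case 3: dest[i] = (uint8_t)~bitreverse(b);  break;
        case 4: dest[i] = 0x00;                     break;
        case 5: dest[i] = 0xFF;                     break;
        case 6: dest[i] = sign;                     break;
        case 7: dest[i] = (uint8_t)~sign;           break;
        }
    }
}

/* Hypothetical helper: build a selector byte from operation (0..7)
   and source byte index (0..31). */
static uint8_t make_selector(unsigned op, unsigned idx)
{
    assert(op < 8 && idx < 32);
    return (uint8_t)((op << 5) | idx);
}
```

For bitboards, filling the first eight selector bytes with make_selector(2, 7 - i) reverses both the byte order and the bit order within each byte of the low quadword, which amounts to reversing all 64 bits in one go.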