
If the from- and to-square numbers were packed in bytes of the same integer to begin with, as part of a move from the move list, you could do the |7 and the addition SIMD-wise, on both bytes at once. No byte can carry into the next, since (63|7) + 63 = 126. You would end up with 256*t88 + f88 (where x88 = (x|7) + x is the 0x88 number of square x plus 7; the offset cancels in the difference). If you multiplied that by -255 << 16 = (-256 + 1) << 16, you would get (-256*256*t88 + 256*(t88-f88) + f88) << 16, and the first term would be shifted out of the 32-bit word. Since f88 is positive and smaller than 256, it would not contaminate the higher byte with a sign extension, and you would get rid of it entirely by an arithmetic shift of 24 bits to the right:
return table[(((move | 0x707) + move)*0xFF010000 >> 24) + 120];
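A minimal sketch in C that checks the trick exhaustively over all 64x64 square pairs. It assumes squares numbered 0..63 (8*rank + file) and a hypothetical move encoding with the from-square in bits 0..7 and the to-square in bits 8..15; the helpers sq88, pack_move, diff_packed and count_mismatches are illustrative names, not taken from any existing engine. To keep the right shift arithmetic, it multiplies in uint32_t and converts the product to int32_t before shifting.

```c
#include <assert.h>
#include <stdint.h>

/* Square 0..63 (8*rank + file) -> 0x88 number plus 7:
   (sq | 7) + sq = 16*rank + file + 7. */
static int sq88(int sq) { return (sq | 7) + sq; }

/* Hypothetical move encoding: from-square in bits 0..7, to-square in bits 8..15. */
static uint32_t pack_move(int from, int to) {
    assert(from >= 0 && from < 64 && to >= 0 && to < 64);
    return (uint32_t)to << 8 | (uint32_t)from;
}

/* Returns t88 - f88 with one OR, one add, one multiply and one shift. */
static int diff_packed(uint32_t move) {
    /* |7 and add on both bytes at once; no carry between bytes since
       (63 | 7) + 63 = 126 < 256. Result: 256*t88 + f88. */
    uint32_t both = (move | 0x707u) + move;
    /* 0xFF010000 is (-256 + 1) << 16 modulo 2^32. The unsigned product
       wraps to (t88 - f88) * 2^24 + f88 * 2^16; the t88 * 2^32 term falls off. */
    uint32_t prod = both * 0xFF010000u;
    /* Convert to signed so >> 24 sign-extends (implementation-defined in C,
       arithmetic on mainstream two's-complement compilers). f88 < 256,
       so its bits 16..23 are shifted out. */
    return (int32_t)prod >> 24;
}

/* Compares the packed trick with the per-square computation for all pairs. */
static int count_mismatches(void) {
    int mismatches = 0;
    for (int from = 0; from < 64; from++)
        for (int to = 0; to < 64; to++)
            if (diff_packed(pack_move(from, to)) != sq88(to) - sq88(from))
                mismatches++;
    return mismatches;
}
```

For example, for the move f3-g1 (from = 21, to = 6) the packed sum is 256*13 + 44, the product is 0xE12C0000, and the arithmetic shift sign-extends 0xE1 to -31 = 13 - 44. count_mismatches() should return 0.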
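In C there is a catch, though. 0xFF010000 does not fit in a 32-bit int, so it is an unsigned constant; the whole expression is then evaluated unsigned and the >> is a logical shift, which makes negative differences come out 256 too high. Doing it all in signed int instead would make the overflowing multiply undefined behavior. So you would have to do some casting, so that the overflowing multiply is unsigned but the right shift is applied to a signed int. (Converting the product back to signed, and right-shifting a negative value, are implementation-defined rather than undefined, and sign-extend on the usual two's-complement compilers.) Or perhaps you should rely on the whole thing being unsigned, and use a 256-entry table in which negative differences are mapped 256 entries higher. You would then not need the +120. (Which was free anyway: you could also have eliminated that add from your code by adding 120 to the array address at compile time.)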
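A sketch of both table layouts side by side, assuming squares numbered 0..63 and a hypothetical move encoding with the from-square in bits 0..7 and the to-square in bits 8..15. The table contents (king distance per 0x88 difference) and all names (dist240, dist256, dist_centered, init_tables, dist_unsigned, dist_signed, pack_move, sq88) are illustrative only. One lookup stays fully unsigned and indexes a 256-entry table; the other uses the signed shift and folds the +120 into a pointer that is fixed at compile time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical example table: king (Chebyshev) distance per 0x88 difference,
   stored in two layouts. dist240 is indexed by difference + 120 (differences
   run from -119 to 119); dist256 by difference modulo 256, so negative
   differences sit 256 entries higher. */
static uint8_t dist240[240];
static uint8_t dist256[256];

/* The +120 folded into the base address. This is an address constant,
   so no add is needed per lookup. */
static const uint8_t *const dist_centered = dist240 + 120;

static int sq88(int sq) { return (sq | 7) + sq; }

static uint32_t pack_move(int from, int to) {
    assert(from >= 0 && from < 64 && to >= 0 && to < 64);
    return (uint32_t)to << 8 | (uint32_t)from;
}

/* Fill both layouts. 0x88 differences are unique per (rank, file) offset,
   so every pair with the same difference writes the same value. */
static void init_tables(void) {
    for (int from = 0; from < 64; from++)
        for (int to = 0; to < 64; to++) {
            int dr = abs((to >> 3) - (from >> 3));
            int df = abs((to & 7) - (from & 7));
            int diff = sq88(to) - sq88(from);
            uint8_t dist = (uint8_t)(dr > df ? dr : df);
            dist240[diff + 120] = dist;
            dist256[(uint32_t)diff & 255u] = dist;
        }
}

/* All unsigned: the logical shift yields (t88 - f88) mod 256, no offset needed. */
static int dist_unsigned(uint32_t move) {
    uint32_t both = (move | 0x707u) + move;       /* 256*t88 + f88 */
    return dist256[both * 0xFF010000u >> 24];
}

/* Signed shift, with the offset already in the base pointer. */
static int dist_signed(uint32_t move) {
    uint32_t both = (move | 0x707u) + move;       /* 256*t88 + f88 */
    return dist_centered[(int32_t)(both * 0xFF010000u) >> 24];
}
```

Call init_tables() once at startup; after that both lookups should return the same value for every move. The 256-entry layout costs 16 extra bytes of table but needs neither the offset nor the signed conversion.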