If the from- and to-square numbers were packed in bytes in the same integer to begin with, as part of a move from the move list, you could do the |7 and addition SIMD-wise, to end up with 256*t88 + f88. If you then multiply that by -255 << 16 = (-256 + 1) << 16 you get (-256*256*t88 + 256*(t88-f88) + f88) << 16, and the first term is shifted out of the word. Since f88 is positive, it does not contaminate the higher bytes with its sign extension, and you get rid of it entirely by shifting 24 bits to the right.
I suppose in modern C standards this might be undefined behavior, so you would have to do some casting so that the overflowing multiply is done on an unsigned, but the right-shift on a signed int. Or perhaps you should rely on the whole thing being unsigned, mapping negative differences 256 entries higher. You would then not need the +120. (Which was free anyway. You could also have eliminated that add from your code by adding it to the array address at compile time.)
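A minimal sketch of the packed computation described above, using the all-unsigned variant. The move layout (from-square in the low byte, to-square in the next byte, both 0..63) and the name delta88_index are assumptions for illustration only, not taken from any engine:

```c
#include <stdint.h>

/* Hypothetical move layout: bits 0-7 = from-square, bits 8-15 = to-square,
 * each in 0..63 numbering (8*rank + file). */
static uint8_t delta88_index(uint32_t move)
{
    uint32_t sq = move & 0x3F3Fu;          /* keep only the two square bytes */

    /* Per byte: s + (s | 7) = 16*rank + file + 7. Each byte is at most
     * 63 + 63 = 126, so nothing carries from the low byte into the high one.
     * Result: 256*t88 + f88. */
    uint32_t s88 = sq + (sq | 0x0707u);

    /* Multiply by (-255 << 16) modulo 2^32, i.e. 0xFF010000. The term
     * -65536*t88 << 16 falls off the top of the word, leaving
     * ((t88 - f88) << 24) + (f88 << 16). */
    uint32_t prod = s88 * (0xFF01u << 16);

    /* Top byte = (t88 - f88) mod 256: negative differences land
     * 256 entries higher, so no +120 offset is needed. */
    return (uint8_t)(prod >> 24);
}
```

With this layout a lookup table would need 256 entries indexed by the returned byte; for example a1 to h8 gives 119 and h8 to a1 gives 137 (that is, -119 + 256).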
direction(int, int):
mov edx, esi
mov eax, edi
and esi, 7
and edi, 7
sar edx, 3
sar eax, 3
xor r8d, r8d
sub esi, edi
mov edi, edx
sete r8b
xor ecx, ecx
sub edi, eax
sete cl
cmp edi, esi
sete dil
sub eax, edx
lea ecx, [rcx+r8*2]
cmp eax, esi
movzx edi, dil
sete al
lea ecx, [rcx+rdi*4]
movzx eax, al
lea eax, [rcx+rax*8]
sal eax, 3
ret
direction(int, int):
mov eax, esi
mov edx, edi
and esi, 7
and edi, 7
sar edx, 3
sar eax, 3
xor ecx, ecx
sub esi, edi
sete cl
sub eax, edx
sete dl
movzx edx, dl
lea edx, [rdx+rcx*2]
xor ecx, ecx
cmp eax, esi
sete cl
add eax, esi
sete al
lea edx, [rdx+rcx*4]
movzx eax, al
lea eax, [rdx+rax*8]
sal eax, 3
ret
lucasart wrote: ↑Wed Aug 19, 2020 12:39 pm
Code like this lights all the red flags upon code review. It is way too clever to be trusted. If it's not obvious, it can't be trusted, and must be tested (leaving a unit test to allow later modifications of the codebase).
Do you trust the code a little more, when I show you one usage of this function?
This is one of my favorite positions. They say this move (Qd3) is invisible to chess engines... he is not the only one. https://www.youtube.com/watch?v=yGnpewUKP88
OliverBr wrote: ↑Thu Aug 20, 2020 1:49 am
This is very nice, but unfortunately it doesn't bring any win:
That is fine, it seems the branch prediction works quite well. Your code is bit-twiddling at its best, and imho requires some comments (despite square mapping dependency) to immediately become understandable and possibly some asserts for the precondition. I guess if it gains a few Elo, even Stockfish would take it. Testing divisibility by 7 fails due to 7 x 9 = 63.
// precondition: f and t are on a common rank, file, diagonal or antidiagonal
int getDir(int f, int t) {
    if (((f ^ t) & 56) == 0) return 8;       // rank delta zero -> common rank
    if (((f ^ t) &  7) == 0) return 16;      // file delta zero -> common file
    return (((f - t) % 9) == 0) ? 32 : 64;   // delta divisible by 9 -> common diagonal, otherwise antidiagonal
}
getDir(int, int):
mov edx, edi
mov eax, 8
xor edx, esi
test dl, 56
je .L1
and edx, 7
mov eax, 16
je .L1
sub edi, esi
imul edi, edi, 954437177
add edi, 238609294
cmp edi, 477218589
sbb eax, eax
and eax, -32
add eax, 64
.L1:
ret
getDir(int, int):
mov edx, edi
mov eax, 8
xor edx, esi
test dl, 56
je .L5
and edx, 7
mov eax, 16
je .L5
sub esi, edi
imul esi, esi, 954437177
add esi, 238609294
cmp esi, 477218589
sbb eax, eax
and eax, -32
add eax, 64
.L5:
ret
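Since the divisibility argument is easy to get wrong, an exhaustive check over all 64 x 64 square pairs is cheap. The following is only a sketch: refDir, getDirMod7 and countMismatches are hypothetical helper names, and getDirMod7 is a made-up variant that tests divisibility by 7 instead of 9, to illustrate the 7 x 9 = 63 problem:

```c
/* getDir as posted: 8 = rank, 16 = file, 32 = diagonal, 64 = antidiagonal. */
static int getDir(int f, int t) {
    if (((f ^ t) & 56) == 0) return 8;
    if (((f ^ t) &  7) == 0) return 16;
    return (((f - t) % 9) == 0) ? 32 : 64;
}

/* Hypothetical variant: tests the antidiagonal (multiples of 7) instead. */
static int getDirMod7(int f, int t) {
    if (((f ^ t) & 56) == 0) return 8;
    if (((f ^ t) &  7) == 0) return 16;
    return (((f - t) % 7) == 0) ? 64 : 32;
}

/* Slow reference on rank/file coordinates; returns 0 if not aligned. */
static int refDir(int f, int t) {
    int dr = (t >> 3) - (f >> 3);   /* rank delta */
    int df = (t & 7) - (f & 7);     /* file delta */
    if (dr == 0)   return 8;
    if (df == 0)   return 16;
    if (dr == df)  return 32;       /* a1-h8 direction, delta = 9*dr */
    if (dr == -df) return 64;       /* h1-a8 direction, delta = 7*dr */
    return 0;
}

/* Counts aligned pairs (f != t) on which dir disagrees with refDir. */
static int countMismatches(int (*dir)(int, int)) {
    int bad = 0;
    for (int f = 0; f < 64; f++) {
        for (int t = 0; t < 64; t++) {
            if (f == t) continue;
            int expected = refDir(f, t);
            if (expected != 0 && dir(f, t) != expected) bad++;
        }
    }
    return bad;
}
```

Under these definitions countMismatches(getDir) is 0, while countMismatches(getDirMod7) is 2: the only failing pairs are a1/h8 in either order, where the delta is 63.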
Gerd Isenberg wrote: ↑Thu Aug 20, 2020 10:43 am
That is fine, it seems the branch prediction works quite well. Your code is bit-twiddling at its best, and imho requires some comments (despite square mapping dependency) to immediately become understandable and possibly some asserts for the precondition. I guess if it gains a few Elo, even Stockfish would take it. Testing divisibility by 7 fails due to 7 x 9 = 63.
You are right, of course. It is quite sparsely commented. It should have more.
My idea was to look at the movement from square f (0...63) when adding some numbers:
f+1 : same rank, next file.
f+7 : next rank, previous file.
f+8 : next rank, same file.
f+9 : next rank, next file.
This bug, where I overlooked the fact that 63 is divisible by both 7 and 9, went unnoticed because the perft numbers were already perfect for every position I tried. That's because only very strange positions with B/Q/K on the black corners fail, like this one: