Jacob wrote:
I might just keep my old move generator that has disjoint loops,... At the very least, I won't have to rewrite my engine to use the different data structures :-/. It was fun playing with the idea, though.
C++ templates can help to avoid source code duplication in cases:
1) white/black pieces moves;
I'm kind of new to C/C++, and my engine is written in C++, Do you have an example of how you might use a template for this?
With integer-templates someting like this is possible for "side" as compile time parameter.
Code: Select all
template <int side>
void genPawnPushs(u64 &moves)
{
int pc = PAWN|side;
while (moves)
{
u32 sqr[2];
sqr[side] = PopLSB(moves);
sqr[side^1] = sqr[side] - 8;
AddMove(sqr[0], sqr[1], pc);
}
}
Since "side" is compile time invariant - either 0 or 1 for each incarnation - the optimizer can make it disappear.
The drawback is with a side-variable, you can only use it with a conditional branch ...
Code: Select all
if ( side == white )
genPawnPushs<white>(...);
else
genPawnPushs<black>(...);
... Which might be nice, if good predictable (likely) - because the templates pay off in smaller code of the loop-bodies.
There are other, probably better solutions for your purpose - to hoist the side-variant outside the loop body without introducing too much register pressure and memory accesses.
Your approach is to traverse different set-semantics, to get rid of the side-condition. You traverse either a set of from-squares or to-squares - and then you do a conditional swap of the from/to-squares via memory inside the loop body - which seems quite expensive according to your posted assembly. Ok, expensive is relative here - the memory is L1 and processor (at least AMD K8, K10) will apply store/load forwarding. Anyway, there are more and longer instructions.
You may try the conditional swap via registers instead of memory ...
Code: Select all
void genPawnPushs(u64 moves, int side /* {0,1} */)
{
int pc = PAWN|side;
side = -sise; // {0,-1}
while (moves)
{
from = PopLSB(&moves);
to = from - 8;
swpm = (from ^ to) & side; // 0 if side == 0, otherwise from ^ to
from = from ^ swpm; // becomes "to" if side == 1
to = to ^ swpm; // becomes "from" if side == 1
AddMove(from, to, pc);
}
}
... but that also looks not that cheap either - four register ops inside the loop body. I suggest the "usual" and imho more consistant semantic of traversing target-sets only - and to calculate from-to delta (eg. +-8) by side as a loop invariant.
Code: Select all
void genPawnPushs(u64 moves, int side /* {0,1} */)
{
int delta = side ? 8 : -8; // hopefully a branchless cmove
int pc = PAWN|side;
while (moves)
{
from = PopLSB(&moves);
to = from + delta;
AddMove(from, to, pc);
}
}
or if the compiler doesn't like cmove
Code: Select all
void genPawnPushs(u64 moves, int side /* {0,1} */)
{
int delta = side * 16 - 8; // (side << 4) - 8 or even delta = pawnPushDelta[side]
int pc = PAWN|side;
while (moves)
{
from = PopLSB(&moves);
to = from + delta;
AddMove(from, to, pc);
}
}
Some further remarks to your assembly - Gnu ATA-Syntax is a bit hard to read for me
You should inline PopLSB - and I also suggest to use a pure bitscanforward and to reset the found bit explicitly inside the loop by
moves &= moves - 1. This usually produces better and faster code, more parallel computation, and don't takes redundant tests for the continue-condition at the end of the loop. It also helps some compilers to keep the traversed set inside a register - but it is a matter of taste, and also the optimization-ability of the compiler. Of course it only works with bitscanforward but not with bitscanreverse that way.
Code: Select all
#ifdef ...
inline int bitscanforward(u64 moves)
{
long idx;
_BitScanForward64(&idx, moves);
return (int) idx;
}
#else
...
#endif
void genPawnPushs(u64 moves, int side /* {0,1} */)
{
int delta = side * 16 - 8; // (side << 4) - 8
int pc = PAWN|side;
while (moves)
{
from = bitscanforward(moves);
AddMove(from, from + delta, pc);
moves &= moves - 1;
}
}
Cheers,
Gerd