Yet another bitboard attack generator

michiguel · Post by **michiguel** » Wed Oct 28, 2009 6:50 am

Gaviota uses the following functions to find the squares attacked by a given slider in a given board with blocking pieces (occ). I dropped my own implementation of rotated bitboards because the code was akward and carrying extra bitboards all over the place was a nuisance. This implementation is not faster or better, but at least it is not slower in my engine. It is certainly more compact, readable and clearer. It only needs a little supporting table. Most importantly (at least for me), It allows me to do something different than everybody else. It is more fun.

I checked CPW to see whether there is anything similar. Apparently, this would be a modification of the Dumb7fill method. The difference is that I do attacks in all directions in parallel, including the queen (with a little trick).

The method assume that we have a table
uint64_t reach [MAXPIECES] [MAXSQUARES];
where each bitboard there represents the attacks of a given slider from a given square in an empty board. I generate the table at start up with a 0x88 procedure

, but it is small enough to hardwire it into the program.
You can replace this with
uint64_t reach_bishop [64];
uint64_t reach_queen [64];
uint64_t reach_rook [64];
That is about ~1.5 k

The code below is optimized for (my) readability, so I am sure that it could be improved. At least in my engine, unrolling the loops slow things down.

Miguel
PS: I hope that the code is self-explanatory.

Code: Select all


/* not_A is a bitboard with all bits turned on except on column A */
/* not_H is a bitboard with all bits turned on except on column H */

/* grow in all directions */
static uint64_t
grow &#40;uint64_t x&#41;
&#123;
	x |= &#40;x << 1&#41; & not_A; 
	x |= &#40;x >> 1&#41; & not_H;	
	x |=  x >> 8;
	x |=  x << 8;
	return x;
&#125;

/* grow laterally like a rook */
static uint64_t
growlat &#40;uint64_t x&#41;
&#123;
	uint64_t y = x;
	x |= &#40;x << 1&#41; & not_A;
	x |= &#40;x >> 1&#41; & not_H;	
	y |=  y >> 8;
	y |=  y << 8;
	return x | y;	
&#125;


extern uint64_t
B_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;BISHOP&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
R_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;ROOK&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
Q_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini, h;

	r = reach&#91;QUEEN&#93;&#91;from&#93;;
	ini = U64&#40;1&#41; << from;
	
	g = grow &#40;ini&#41; & ~ini;
	h = growlat &#40;g & occ&#41;;
	h &= ~g;
	
	corral = r & ~occ;
	corral &= ~h;

	g &= ~occ;

	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	return &#40;grow&#40;g&#41; & r&#41; & ~ini & ~h ;
&#125;

Dann Corbit · Post by **Dann Corbit** » Wed Oct 28, 2009 7:03 am

Is Q_Attacks faster than R_Attacks + B_Attacks?

michiguel · Post by **michiguel** » Wed Oct 28, 2009 7:10 am

Dann Corbit wrote:Is Q_Attacks faster than R_Attacks + B_Attacks?

My guess is that is probably the same. Q_attacks is supposed to be more efficient but on average it will stop growing later than R_attacks + B_attacks. Any direction that continue growing makes the whole thing going. If we do R + B, maybe one or the other will stop sooner.

This is so low in my profiling that I did not bother checking but it is a good question.

Miguel

Gerd Isenberg · Post by **Gerd Isenberg** » Wed Oct 28, 2009 8:20 am

michiguel wrote:Gaviota uses the following functions to find the squares attacked by a given slider in a given board with blocking pieces (occ). I dropped my own implementation of rotated bitboards because the code was akward and carrying extra bitboards all over the place was a nuisance. This implementation is not faster or better, but at least it is not slower in my engine. It is certainly more compact, readable and clearer. It only needs a little supporting table. Most importantly (at least for me), It allows me to do something different than everybody else. It is more fun.

I checked CPW to see whether there is anything similar. Apparently, this would be a modification of the Dumb7fill method. The difference is that I do attacks in all directions in parallel, including the queen (with a little trick).

Fine! Your approach has also some similarities with Harald Lüßen's Exploding Bitboards. However, with loop and branches I expect it slower than the loopless Obstruction Difference or Hyperbola Quintessence, it takes some btb entries if heavily inlined. Of course dump7fill with one piece will often terminate early in the opening. One grow cycle takes 10 almost dependent operations. You may need up to seven fill cycles in worst case. Bishop HQ takes 17 ops (including 4 or 5 bswaps) with up to 3 ipc.

I agree it is more fun to rely on own ideas and code, and if your code and memory balance is sufficient, there are more important issues to improve engine strength

Gerd

bob · Post by **bob** » Wed Oct 28, 2009 6:50 pm

michiguel wrote:Gaviota uses the following functions to find the squares attacked by a given slider in a given board with blocking pieces (occ). I dropped my own implementation of rotated bitboards because the code was akward and carrying extra bitboards all over the place was a nuisance. This implementation is not faster or better, but at least it is not slower in my engine. It is certainly more compact, readable and clearer. It only needs a little supporting table. Most importantly (at least for me), It allows me to do something different than everybody else. It is more fun.

I checked CPW to see whether there is anything similar. Apparently, this would be a modification of the Dumb7fill method. The difference is that I do attacks in all directions in parallel, including the queen (with a little trick).

The method assume that we have a table
uint64_t reach [MAXPIECES] [MAXSQUARES];
where each bitboard there represents the attacks of a given slider from a given square in an empty board. I generate the table at start up with a 0x88 procedure , but it is small enough to hardwire it into the program.
You can replace this with
uint64_t reach_bishop [64];
uint64_t reach_queen [64];
uint64_t reach_rook [64];
That is about ~1.5 k

The code below is optimized for (my) readability, so I am sure that it could be improved. At least in my engine, unrolling the loops slow things down.

Miguel
PS: I hope that the code is self-explanatory.
Code: Select all
/* not_A is a bitboard with all bits turned on except on column A */
/* not_H is a bitboard with all bits turned on except on column H */

/* grow in all directions */
static uint64_t
grow &#40;uint64_t x&#41;
&#123;
	x |= &#40;x << 1&#41; & not_A; 
	x |= &#40;x >> 1&#41; & not_H;	
	x |=  x >> 8;
	x |=  x << 8;
	return x;
&#125;

/* grow laterally like a rook */
static uint64_t
growlat &#40;uint64_t x&#41;
&#123;
	uint64_t y = x;
	x |= &#40;x << 1&#41; & not_A;
	x |= &#40;x >> 1&#41; & not_H;	
	y |=  y >> 8;
	y |=  y << 8;
	return x | y;	
&#125;


extern uint64_t
B_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;BISHOP&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
R_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;ROOK&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
Q_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini, h;

	r = reach&#91;QUEEN&#93;&#91;from&#93;;
	ini = U64&#40;1&#41; << from;
	
	g = grow &#40;ini&#41; & ~ini;
	h = growlat &#40;g & occ&#41;;
	h &= ~g;
	
	corral = r & ~occ;
	corral &= ~h;

	g &= ~occ;

	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	return &#40;grow&#40;g&#41; & r&#41; & ~ini & ~h ;
&#125;

Would have to think about the speed, but there is yet another way to do with with fewer loops.

I'll describe this for bishops only. The rooks will be just as obvious and queen combine both.

Code: Select all


1.  I assume some sort of array b_attacks&#91;64&#93; that has a bitboard representation of all the possible squares a bishop on square s can reach on an empty board.

2.  b_moves = b_attacks&#91;square&#93;;

3.  Now we need to determine where each diagonal is blocked.  so we do blockers = b_moves & occupied.

4.  we now use a 4x loop &#40;for the four directions&#41;.  we need a new set of bitboards like plus7dir&#91;64&#93;, plus9dir&#91;64&#93;, minus7dir&#91;64&#93; and minus9dir&#91;64&#93; where these are a subset of the oritinal b_attacks&#91;sq&#93; for each diagonal originating at sq.

     4a.  we take blockers & plus7dir&#91;sq&#93; to find out where that diagonal is blocked.  Since this is the + direction, I assume that matches the bit numbers which means the bit numbers for each square in this direction is increasing.

     4b.  We now compute b =LSB&#40;x&#41; where x is the bitboard computed in step 4a.  This gives us the _first_ blocker down that diagonal.  We then compute b_moves &= ~plus7dir&#91;blocker&#93;.

explanation: We first found all the bits on the diagonals that a bishop can move to on an open board. We then found the square(s) where that diagonal is blocked. We locate the first, which is the only blocker that counts. Now we take the diagonal bits in that same direction, from the blocking square, and complement them. That makes the diagonal bits beyond the blocker zero (0) and all other bits on the board one (1). When we and that with the bishop moves, we 'chop off' the diagonal beyond that blocker.

4c. We repeat that for the remaining 3 diagonals, noting that for the minus7 and minus9 dir we have to use MSB() instead since we are looking in the backward directions with respect to bit numbers.

This works and is fast. It is nearly as fast as rotated bitboards or magic bitboards. And certainly more cache-friendly. But "nearly" isn't always "good enough".

I'm using magic generation which is as fast as anything I have seen, and the code is quite simple. The tables are a bit big, but doesn't seem to hurt performance as the magic operations are faster, but the table lookups hurt a bit. Overall this is a gain.

Michael Sherwin · Post by **Michael Sherwin** » Thu Oct 29, 2009 12:37 am

bob wrote:
michiguel wrote:Gaviota uses the following functions to find the squares attacked by a given slider in a given board with blocking pieces (occ). I dropped my own implementation of rotated bitboards because the code was akward and carrying extra bitboards all over the place was a nuisance. This implementation is not faster or better, but at least it is not slower in my engine. It is certainly more compact, readable and clearer. It only needs a little supporting table. Most importantly (at least for me), It allows me to do something different than everybody else. It is more fun.

I checked CPW to see whether there is anything similar. Apparently, this would be a modification of the Dumb7fill method. The difference is that I do attacks in all directions in parallel, including the queen (with a little trick).

The method assume that we have a table
uint64_t reach [MAXPIECES] [MAXSQUARES];
where each bitboard there represents the attacks of a given slider from a given square in an empty board. I generate the table at start up with a 0x88 procedure , but it is small enough to hardwire it into the program.
You can replace this with
uint64_t reach_bishop [64];
uint64_t reach_queen [64];
uint64_t reach_rook [64];
That is about ~1.5 k

The code below is optimized for (my) readability, so I am sure that it could be improved. At least in my engine, unrolling the loops slow things down.

Miguel
PS: I hope that the code is self-explanatory.
Code: Select all
/* not_A is a bitboard with all bits turned on except on column A */
/* not_H is a bitboard with all bits turned on except on column H */

/* grow in all directions */
static uint64_t
grow &#40;uint64_t x&#41;
&#123;
	x |= &#40;x << 1&#41; & not_A; 
	x |= &#40;x >> 1&#41; & not_H;	
	x |=  x >> 8;
	x |=  x << 8;
	return x;
&#125;

/* grow laterally like a rook */
static uint64_t
growlat &#40;uint64_t x&#41;
&#123;
	uint64_t y = x;
	x |= &#40;x << 1&#41; & not_A;
	x |= &#40;x >> 1&#41; & not_H;	
	y |=  y >> 8;
	y |=  y << 8;
	return x | y;	
&#125;


extern uint64_t
B_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;BISHOP&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
R_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;ROOK&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
Q_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini, h;

	r = reach&#91;QUEEN&#93;&#91;from&#93;;
	ini = U64&#40;1&#41; << from;
	
	g = grow &#40;ini&#41; & ~ini;
	h = growlat &#40;g & occ&#41;;
	h &= ~g;
	
	corral = r & ~occ;
	corral &= ~h;

	g &= ~occ;

	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	return &#40;grow&#40;g&#41; & r&#41; & ~ini & ~h ;
&#125;
Would have to think about the speed, but there is yet another way to do with with fewer loops.

I'll describe this for bishops only. The rooks will be just as obvious and queen combine both.
Code: Select all
1.  I assume some sort of array b_attacks&#91;64&#93; that has a bitboard representation of all the possible squares a bishop on square s can reach on an empty board.

2.  b_moves = b_attacks&#91;square&#93;;

3.  Now we need to determine where each diagonal is blocked.  so we do blockers = b_moves & occupied.

4.  we now use a 4x loop &#40;for the four directions&#41;.  we need a new set of bitboards like plus7dir&#91;64&#93;, plus9dir&#91;64&#93;, minus7dir&#91;64&#93; and minus9dir&#91;64&#93; where these are a subset of the oritinal b_attacks&#91;sq&#93; for each diagonal originating at sq.

     4a.  we take blockers & plus7dir&#91;sq&#93; to find out where that diagonal is blocked.  Since this is the + direction, I assume that matches the bit numbers which means the bit numbers for each square in this direction is increasing.

     4b.  We now compute b =LSB&#40;x&#41; where x is the bitboard computed in step 4a.  This gives us the _first_ blocker down that diagonal.  We then compute b_moves &= ~plus7dir&#91;blocker&#93;.
explanation: We first found all the bits on the diagonals that a bishop can move to on an open board. We then found the square(s) where that diagonal is blocked. We locate the first, which is the only blocker that counts. Now we take the diagonal bits in that same direction, from the blocking square, and complement them. That makes the diagonal bits beyond the blocker zero (0) and all other bits on the board one (1). When we and that with the bishop moves, we 'chop off' the diagonal beyond that blocker.

4c. We repeat that for the remaining 3 diagonals, noting that for the minus7 and minus9 dir we have to use MSB() instead since we are looking in the backward directions with respect to bit numbers.

This works and is fast. It is nearly as fast as rotated bitboards or magic bitboards. And certainly more cache-friendly. But "nearly" isn't always "good enough". I'm using magic generation which is as fast as anything I have seen, and the code is quite simple. The tables are a bit big, but doesn't seem to hurt performance as the magic operations are faster, but the table lookups hurt a bit. Overall this is a gain.

Is what you are describing faster than this:

Code: Select all

        h->moves&#91;id&#93; = (&#40;ray7p&#91;fs&#93; & belowInc&#91;FirstBit&#40;ray7p&#91;fs&#93; & aPieces&#41;&#93;)
                     |  &#40;ray9p&#91;fs&#93; & belowInc&#91;FirstBit&#40;ray9p&#91;fs&#93; & aPieces&#41;&#93;)
                     |  &#40;ray7m&#91;fs&#93; & aboveInc&#91;LastBit&#40;ray7m&#91;fs&#93; & aPieces&#41;&#93;)
                     |  &#40;ray9m&#91;fs&#93; & aboveInc&#91;LastBit&#40;ray9m&#91;fs&#93; & aPieces&#41;&#93;));

:and why?

Thanks!

bob · Post by **bob** » Thu Oct 29, 2009 7:56 am

Michael Sherwin wrote:
bob wrote:
michiguel wrote:Gaviota uses the following functions to find the squares attacked by a given slider in a given board with blocking pieces (occ). I dropped my own implementation of rotated bitboards because the code was akward and carrying extra bitboards all over the place was a nuisance. This implementation is not faster or better, but at least it is not slower in my engine. It is certainly more compact, readable and clearer. It only needs a little supporting table. Most importantly (at least for me), It allows me to do something different than everybody else. It is more fun.

I checked CPW to see whether there is anything similar. Apparently, this would be a modification of the Dumb7fill method. The difference is that I do attacks in all directions in parallel, including the queen (with a little trick).

The method assume that we have a table
uint64_t reach [MAXPIECES] [MAXSQUARES];
where each bitboard there represents the attacks of a given slider from a given square in an empty board. I generate the table at start up with a 0x88 procedure , but it is small enough to hardwire it into the program.
You can replace this with
uint64_t reach_bishop [64];
uint64_t reach_queen [64];
uint64_t reach_rook [64];
That is about ~1.5 k

The code below is optimized for (my) readability, so I am sure that it could be improved. At least in my engine, unrolling the loops slow things down.

Miguel
PS: I hope that the code is self-explanatory.
Code: Select all
/* not_A is a bitboard with all bits turned on except on column A */
/* not_H is a bitboard with all bits turned on except on column H */

/* grow in all directions */
static uint64_t
grow &#40;uint64_t x&#41;
&#123;
	x |= &#40;x << 1&#41; & not_A; 
	x |= &#40;x >> 1&#41; & not_H;	
	x |=  x >> 8;
	x |=  x << 8;
	return x;
&#125;

/* grow laterally like a rook */
static uint64_t
growlat &#40;uint64_t x&#41;
&#123;
	uint64_t y = x;
	x |= &#40;x << 1&#41; & not_A;
	x |= &#40;x >> 1&#41; & not_H;	
	y |=  y >> 8;
	y |=  y << 8;
	return x | y;	
&#125;


extern uint64_t
B_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;BISHOP&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
R_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;ROOK&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
Q_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini, h;

	r = reach&#91;QUEEN&#93;&#91;from&#93;;
	ini = U64&#40;1&#41; << from;
	
	g = grow &#40;ini&#41; & ~ini;
	h = growlat &#40;g & occ&#41;;
	h &= ~g;
	
	corral = r & ~occ;
	corral &= ~h;

	g &= ~occ;

	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	return &#40;grow&#40;g&#41; & r&#41; & ~ini & ~h ;
&#125;
Would have to think about the speed, but there is yet another way to do with with fewer loops.

I'll describe this for bishops only. The rooks will be just as obvious and queen combine both.
Code: Select all
1.  I assume some sort of array b_attacks&#91;64&#93; that has a bitboard representation of all the possible squares a bishop on square s can reach on an empty board.

2.  b_moves = b_attacks&#91;square&#93;;

3.  Now we need to determine where each diagonal is blocked.  so we do blockers = b_moves & occupied.

4.  we now use a 4x loop &#40;for the four directions&#41;.  we need a new set of bitboards like plus7dir&#91;64&#93;, plus9dir&#91;64&#93;, minus7dir&#91;64&#93; and minus9dir&#91;64&#93; where these are a subset of the oritinal b_attacks&#91;sq&#93; for each diagonal originating at sq.

     4a.  we take blockers & plus7dir&#91;sq&#93; to find out where that diagonal is blocked.  Since this is the + direction, I assume that matches the bit numbers which means the bit numbers for each square in this direction is increasing.

     4b.  We now compute b =LSB&#40;x&#41; where x is the bitboard computed in step 4a.  This gives us the _first_ blocker down that diagonal.  We then compute b_moves &= ~plus7dir&#91;blocker&#93;.
explanation: We first found all the bits on the diagonals that a bishop can move to on an open board. We then found the square(s) where that diagonal is blocked. We locate the first, which is the only blocker that counts. Now we take the diagonal bits in that same direction, from the blocking square, and complement them. That makes the diagonal bits beyond the blocker zero (0) and all other bits on the board one (1). When we and that with the bishop moves, we 'chop off' the diagonal beyond that blocker.

4c. We repeat that for the remaining 3 diagonals, noting that for the minus7 and minus9 dir we have to use MSB() instead since we are looking in the backward directions with respect to bit numbers.

This works and is fast. It is nearly as fast as rotated bitboards or magic bitboards. And certainly more cache-friendly. But "nearly" isn't always "good enough". I'm using magic generation which is as fast as anything I have seen, and the code is quite simple. The tables are a bit big, but doesn't seem to hurt performance as the magic operations are faster, but the table lookups hurt a bit. Overall this is a gain.
Is what you are describing faster than this:
Code: Select all
        h->moves&#91;id&#93; = (&#40;ray7p&#91;fs&#93; & belowInc&#91;FirstBit&#40;ray7p&#91;fs&#93; & aPieces&#41;&#93;)
                     |  &#40;ray9p&#91;fs&#93; & belowInc&#91;FirstBit&#40;ray9p&#91;fs&#93; & aPieces&#41;&#93;)
                     |  &#40;ray7m&#91;fs&#93; & aboveInc&#91;LastBit&#40;ray7m&#91;fs&#93; & aPieces&#41;&#93;)
                     |  &#40;ray9m&#91;fs&#93; & aboveInc&#91;LastBit&#40;ray9m&#91;fs&#93; & aPieces&#41;&#93;));
:and why?

Thanks!

Hard to say. You are doing each ray separately. I did effectively the same thing, except I set all rays at once, and then clipped each one off at the first blocking piece. Whether you use a loop 4x or unroll the loop into 4 pieces as above is probably unimportant as a compiler would likely unroll anyway. Main thing is to avoid branches.

michiguel · Post by **michiguel** » Thu Oct 29, 2009 10:12 am

bob wrote:
michiguel wrote:Gaviota uses the following functions to find the squares attacked by a given slider in a given board with blocking pieces (occ). I dropped my own implementation of rotated bitboards because the code was akward and carrying extra bitboards all over the place was a nuisance. This implementation is not faster or better, but at least it is not slower in my engine. It is certainly more compact, readable and clearer. It only needs a little supporting table. Most importantly (at least for me), It allows me to do something different than everybody else. It is more fun.

I checked CPW to see whether there is anything similar. Apparently, this would be a modification of the Dumb7fill method. The difference is that I do attacks in all directions in parallel, including the queen (with a little trick).

The method assume that we have a table
uint64_t reach [MAXPIECES] [MAXSQUARES];
where each bitboard there represents the attacks of a given slider from a given square in an empty board. I generate the table at start up with a 0x88 procedure , but it is small enough to hardwire it into the program.
You can replace this with
uint64_t reach_bishop [64];
uint64_t reach_queen [64];
uint64_t reach_rook [64];
That is about ~1.5 k

The code below is optimized for (my) readability, so I am sure that it could be improved. At least in my engine, unrolling the loops slow things down.

Miguel
PS: I hope that the code is self-explanatory.
Code: Select all
/* not_A is a bitboard with all bits turned on except on column A */
/* not_H is a bitboard with all bits turned on except on column H */

/* grow in all directions */
static uint64_t
grow &#40;uint64_t x&#41;
&#123;
	x |= &#40;x << 1&#41; & not_A; 
	x |= &#40;x >> 1&#41; & not_H;	
	x |=  x >> 8;
	x |=  x << 8;
	return x;
&#125;

/* grow laterally like a rook */
static uint64_t
growlat &#40;uint64_t x&#41;
&#123;
	uint64_t y = x;
	x |= &#40;x << 1&#41; & not_A;
	x |= &#40;x >> 1&#41; & not_H;	
	y |=  y >> 8;
	y |=  y << 8;
	return x | y;	
&#125;


extern uint64_t
B_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;BISHOP&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
R_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;ROOK&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
Q_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini, h;

	r = reach&#91;QUEEN&#93;&#91;from&#93;;
	ini = U64&#40;1&#41; << from;
	
	g = grow &#40;ini&#41; & ~ini;
	h = growlat &#40;g & occ&#41;;
	h &= ~g;
	
	corral = r & ~occ;
	corral &= ~h;

	g &= ~occ;

	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	return &#40;grow&#40;g&#41; & r&#41; & ~ini & ~h ;
&#125;
Would have to think about the speed, but there is yet another way to do with with fewer loops.

I'll describe this for bishops only. The rooks will be just as obvious and queen combine both.
Code: Select all
1.  I assume some sort of array b_attacks&#91;64&#93; that has a bitboard representation of all the possible squares a bishop on square s can reach on an empty board.

2.  b_moves = b_attacks&#91;square&#93;;

3.  Now we need to determine where each diagonal is blocked.  so we do blockers = b_moves & occupied.

4.  we now use a 4x loop &#40;for the four directions&#41;.  we need a new set of bitboards like plus7dir&#91;64&#93;, plus9dir&#91;64&#93;, minus7dir&#91;64&#93; and minus9dir&#91;64&#93; where these are a subset of the oritinal b_attacks&#91;sq&#93; for each diagonal originating at sq.

     4a.  we take blockers & plus7dir&#91;sq&#93; to find out where that diagonal is blocked.  Since this is the + direction, I assume that matches the bit numbers which means the bit numbers for each square in this direction is increasing.

     4b.  We now compute b =LSB&#40;x&#41; where x is the bitboard computed in step 4a.  This gives us the _first_ blocker down that diagonal.  We then compute b_moves &= ~plus7dir&#91;blocker&#93;.
explanation: We first found all the bits on the diagonals that a bishop can move to on an open board. We then found the square(s) where that diagonal is blocked. We locate the first, which is the only blocker that counts. Now we take the diagonal bits in that same direction, from the blocking square, and complement them. That makes the diagonal bits beyond the blocker zero (0) and all other bits on the board one (1). When we and that with the bishop moves, we 'chop off' the diagonal beyond that blocker.

4c. We repeat that for the remaining 3 diagonals, noting that for the minus7 and minus9 dir we have to use MSB() instead since we are looking in the backward directions with respect to bit numbers.

This works and is fast. It is nearly as fast as rotated bitboards or magic bitboards. And certainly more cache-friendly. But "nearly" isn't always "good enough". I'm using magic generation which is as fast as anything I have seen, and the code is quite simple. The tables are a bit big, but doesn't seem to hurt performance as the magic operations are faster, but the table lookups hurt a bit. Overall this is a gain.

How do you use MSB? with assembler? I try to avoid that. Still, even if I use an algorithm in C it may be fast. I will try it later.
I use a similar concept to find if two pieces attack each other in some other places of the program, but I do not need to use MSB there, because I always start from the lowest bit.

Miguel

bob · Post by **bob** » Thu Oct 29, 2009 10:36 pm

I simply use inline asm. MSB is bsr, LSB is bsf.

michiguel · Post by **michiguel** » Thu Oct 29, 2009 11:15 pm

bob wrote:I simply use inline asm. MSB is bsr, LSB is bsf.

That's what I figured. I generally try to avoid inline asm because my engine is not fast at all, and the only thing I do is to compromise portability. But you made me think about the possibility to have an MSB that is iterative. It may be fine because this bitboards will not be very populated.

Code: Select all

uint64_t
msb &#40;uint64_t x&#41;
&#123; 
    uint64_t y;
    do &#123;
        y = x;
        x &= x - 1;
    &#125; while &#40;x&#41;;
    return y;
&#125;

Miguel

Yet another bitboard attack generator

Yet another bitboard attack generator

Re: Yet another bitboard attack generator

Re: Yet another bitboard attack generator

Re: Yet another bitboard attack generator

Re: Yet another bitboard attack generator

Re: Yet another bitboard attack generator

Re: Yet another bitboard attack generator

Re: Yet another bitboard attack generator

Re: Yet another bitboard attack generator

Re: Yet another bitboard attack generator