Yet another bitboard attack generator

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Yet another bitboard attack generator

Post by michiguel »

Gaviota uses the following functions to find the squares attacked by a given slider in a given board with blocking pieces (occ). I dropped my own implementation of rotated bitboards because the code was akward and carrying extra bitboards all over the place was a nuisance. This implementation is not faster or better, but at least it is not slower in my engine. It is certainly more compact, readable and clearer. It only needs a little supporting table. Most importantly (at least for me), It allows me to do something different than everybody else. It is more fun.

I checked CPW to see whether there is anything similar. Apparently, this would be a modification of the Dumb7fill method. The difference is that I do attacks in all directions in parallel, including the queen (with a little trick).

The method assume that we have a table
uint64_t reach [MAXPIECES] [MAXSQUARES];
where each bitboard there represents the attacks of a given slider from a given square in an empty board. I generate the table at start up with a 0x88 procedure :-), but it is small enough to hardwire it into the program.
You can replace this with
uint64_t reach_bishop [64];
uint64_t reach_queen [64];
uint64_t reach_rook [64];

That is about ~1.5 k

The code below is optimized for (my) readability, so I am sure that it could be improved. At least in my engine, unrolling the loops slow things down.

Miguel
PS: I hope that the code is self-explanatory.

Code: Select all


/* not_A is a bitboard with all bits turned on except on column A */
/* not_H is a bitboard with all bits turned on except on column H */

/* grow in all directions */
static uint64_t
grow (uint64_t x)
{
	x |= &#40;x << 1&#41; & not_A; 
	x |= &#40;x >> 1&#41; & not_H;	
	x |=  x >> 8;
	x |=  x << 8;
	return x;
&#125;

/* grow laterally like a rook */
static uint64_t
growlat &#40;uint64_t x&#41;
&#123;
	uint64_t y = x;
	x |= &#40;x << 1&#41; & not_A;
	x |= &#40;x >> 1&#41; & not_H;	
	y |=  y >> 8;
	y |=  y << 8;
	return x | y;	
&#125;


extern uint64_t
B_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;BISHOP&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
R_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;ROOK&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
Q_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini, h;

	r = reach&#91;QUEEN&#93;&#91;from&#93;;
	ini = U64&#40;1&#41; << from;
	
	g = grow &#40;ini&#41; & ~ini;
	h = growlat &#40;g & occ&#41;;
	h &= ~g;
	
	corral = r & ~occ;
	corral &= ~h;

	g &= ~occ;

	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	return &#40;grow&#40;g&#41; & r&#41; & ~ini & ~h ;
&#125;
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Yet another bitboard attack generator

Post by Dann Corbit »

Is Q_Attacks faster than R_Attacks + B_Attacks?
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Yet another bitboard attack generator

Post by michiguel »

Dann Corbit wrote:Is Q_Attacks faster than R_Attacks + B_Attacks?
My guess is that is probably the same. Q_attacks is supposed to be more efficient but on average it will stop growing later than R_attacks + B_attacks. Any direction that continue growing makes the whole thing going. If we do R + B, maybe one or the other will stop sooner.

This is so low in my profiling that I did not bother checking but it is a good question.

Miguel
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Yet another bitboard attack generator

Post by Gerd Isenberg »

michiguel wrote:Gaviota uses the following functions to find the squares attacked by a given slider in a given board with blocking pieces (occ). I dropped my own implementation of rotated bitboards because the code was akward and carrying extra bitboards all over the place was a nuisance. This implementation is not faster or better, but at least it is not slower in my engine. It is certainly more compact, readable and clearer. It only needs a little supporting table. Most importantly (at least for me), It allows me to do something different than everybody else. It is more fun.

I checked CPW to see whether there is anything similar. Apparently, this would be a modification of the Dumb7fill method. The difference is that I do attacks in all directions in parallel, including the queen (with a little trick).
Fine! Your approach has also some similarities with Harald Lüßen's Exploding Bitboards. However, with loop and branches I expect it slower than the loopless Obstruction Difference or Hyperbola Quintessence, it takes some btb entries if heavily inlined. Of course dump7fill with one piece will often terminate early in the opening. One grow cycle takes 10 almost dependent operations. You may need up to seven fill cycles in worst case. Bishop HQ takes 17 ops (including 4 or 5 bswaps) with up to 3 ipc.

I agree it is more fun to rely on own ideas and code, and if your code and memory balance is sufficient, there are more important issues to improve engine strength ;-)

Gerd
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Yet another bitboard attack generator

Post by bob »

michiguel wrote:Gaviota uses the following functions to find the squares attacked by a given slider in a given board with blocking pieces (occ). I dropped my own implementation of rotated bitboards because the code was akward and carrying extra bitboards all over the place was a nuisance. This implementation is not faster or better, but at least it is not slower in my engine. It is certainly more compact, readable and clearer. It only needs a little supporting table. Most importantly (at least for me), It allows me to do something different than everybody else. It is more fun.

I checked CPW to see whether there is anything similar. Apparently, this would be a modification of the Dumb7fill method. The difference is that I do attacks in all directions in parallel, including the queen (with a little trick).

The method assume that we have a table
uint64_t reach [MAXPIECES] [MAXSQUARES];
where each bitboard there represents the attacks of a given slider from a given square in an empty board. I generate the table at start up with a 0x88 procedure :-), but it is small enough to hardwire it into the program.
You can replace this with
uint64_t reach_bishop [64];
uint64_t reach_queen [64];
uint64_t reach_rook [64];

That is about ~1.5 k

The code below is optimized for (my) readability, so I am sure that it could be improved. At least in my engine, unrolling the loops slow things down.

Miguel
PS: I hope that the code is self-explanatory.

Code: Select all


/* not_A is a bitboard with all bits turned on except on column A */
/* not_H is a bitboard with all bits turned on except on column H */

/* grow in all directions */
static uint64_t
grow &#40;uint64_t x&#41;
&#123;
	x |= &#40;x << 1&#41; & not_A; 
	x |= &#40;x >> 1&#41; & not_H;	
	x |=  x >> 8;
	x |=  x << 8;
	return x;
&#125;

/* grow laterally like a rook */
static uint64_t
growlat &#40;uint64_t x&#41;
&#123;
	uint64_t y = x;
	x |= &#40;x << 1&#41; & not_A;
	x |= &#40;x >> 1&#41; & not_H;	
	y |=  y >> 8;
	y |=  y << 8;
	return x | y;	
&#125;


extern uint64_t
B_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;BISHOP&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
R_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;ROOK&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
Q_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini, h;

	r = reach&#91;QUEEN&#93;&#91;from&#93;;
	ini = U64&#40;1&#41; << from;
	
	g = grow &#40;ini&#41; & ~ini;
	h = growlat &#40;g & occ&#41;;
	h &= ~g;
	
	corral = r & ~occ;
	corral &= ~h;

	g &= ~occ;

	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	return &#40;grow&#40;g&#41; & r&#41; & ~ini & ~h ;
&#125;
Would have to think about the speed, but there is yet another way to do with with fewer loops.

I'll describe this for bishops only. The rooks will be just as obvious and queen combine both.

Code: Select all


1.  I assume some sort of array b_attacks&#91;64&#93; that has a bitboard representation of all the possible squares a bishop on square s can reach on an empty board.

2.  b_moves = b_attacks&#91;square&#93;;

3.  Now we need to determine where each diagonal is blocked.  so we do blockers = b_moves & occupied.

4.  we now use a 4x loop &#40;for the four directions&#41;.  we need a new set of bitboards like plus7dir&#91;64&#93;, plus9dir&#91;64&#93;, minus7dir&#91;64&#93; and minus9dir&#91;64&#93; where these are a subset of the oritinal b_attacks&#91;sq&#93; for each diagonal originating at sq.

     4a.  we take blockers & plus7dir&#91;sq&#93; to find out where that diagonal is blocked.  Since this is the + direction, I assume that matches the bit numbers which means the bit numbers for each square in this direction is increasing.

     4b.  We now compute b =LSB&#40;x&#41; where x is the bitboard computed in step 4a.  This gives us the _first_ blocker down that diagonal.  We then compute b_moves &= ~plus7dir&#91;blocker&#93;.
explanation: We first found all the bits on the diagonals that a bishop can move to on an open board. We then found the square(s) where that diagonal is blocked. We locate the first, which is the only blocker that counts. Now we take the diagonal bits in that same direction, from the blocking square, and complement them. That makes the diagonal bits beyond the blocker zero (0) and all other bits on the board one (1). When we and that with the bishop moves, we 'chop off' the diagonal beyond that blocker.

4c. We repeat that for the remaining 3 diagonals, noting that for the minus7 and minus9 dir we have to use MSB() instead since we are looking in the backward directions with respect to bit numbers.

This works and is fast. It is nearly as fast as rotated bitboards or magic bitboards. And certainly more cache-friendly. But "nearly" isn't always "good enough". :) I'm using magic generation which is as fast as anything I have seen, and the code is quite simple. The tables are a bit big, but doesn't seem to hurt performance as the magic operations are faster, but the table lookups hurt a bit. Overall this is a gain.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Yet another bitboard attack generator

Post by Michael Sherwin »

bob wrote:
michiguel wrote:Gaviota uses the following functions to find the squares attacked by a given slider in a given board with blocking pieces (occ). I dropped my own implementation of rotated bitboards because the code was akward and carrying extra bitboards all over the place was a nuisance. This implementation is not faster or better, but at least it is not slower in my engine. It is certainly more compact, readable and clearer. It only needs a little supporting table. Most importantly (at least for me), It allows me to do something different than everybody else. It is more fun.

I checked CPW to see whether there is anything similar. Apparently, this would be a modification of the Dumb7fill method. The difference is that I do attacks in all directions in parallel, including the queen (with a little trick).

The method assume that we have a table
uint64_t reach [MAXPIECES] [MAXSQUARES];
where each bitboard there represents the attacks of a given slider from a given square in an empty board. I generate the table at start up with a 0x88 procedure :-), but it is small enough to hardwire it into the program.
You can replace this with
uint64_t reach_bishop [64];
uint64_t reach_queen [64];
uint64_t reach_rook [64];

That is about ~1.5 k

The code below is optimized for (my) readability, so I am sure that it could be improved. At least in my engine, unrolling the loops slow things down.

Miguel
PS: I hope that the code is self-explanatory.

Code: Select all


/* not_A is a bitboard with all bits turned on except on column A */
/* not_H is a bitboard with all bits turned on except on column H */

/* grow in all directions */
static uint64_t
grow &#40;uint64_t x&#41;
&#123;
	x |= &#40;x << 1&#41; & not_A; 
	x |= &#40;x >> 1&#41; & not_H;	
	x |=  x >> 8;
	x |=  x << 8;
	return x;
&#125;

/* grow laterally like a rook */
static uint64_t
growlat &#40;uint64_t x&#41;
&#123;
	uint64_t y = x;
	x |= &#40;x << 1&#41; & not_A;
	x |= &#40;x >> 1&#41; & not_H;	
	y |=  y >> 8;
	y |=  y << 8;
	return x | y;	
&#125;


extern uint64_t
B_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;BISHOP&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
R_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;ROOK&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
Q_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini, h;

	r = reach&#91;QUEEN&#93;&#91;from&#93;;
	ini = U64&#40;1&#41; << from;
	
	g = grow &#40;ini&#41; & ~ini;
	h = growlat &#40;g & occ&#41;;
	h &= ~g;
	
	corral = r & ~occ;
	corral &= ~h;

	g &= ~occ;

	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	return &#40;grow&#40;g&#41; & r&#41; & ~ini & ~h ;
&#125;
Would have to think about the speed, but there is yet another way to do with with fewer loops.

I'll describe this for bishops only. The rooks will be just as obvious and queen combine both.

Code: Select all


1.  I assume some sort of array b_attacks&#91;64&#93; that has a bitboard representation of all the possible squares a bishop on square s can reach on an empty board.

2.  b_moves = b_attacks&#91;square&#93;;

3.  Now we need to determine where each diagonal is blocked.  so we do blockers = b_moves & occupied.

4.  we now use a 4x loop &#40;for the four directions&#41;.  we need a new set of bitboards like plus7dir&#91;64&#93;, plus9dir&#91;64&#93;, minus7dir&#91;64&#93; and minus9dir&#91;64&#93; where these are a subset of the oritinal b_attacks&#91;sq&#93; for each diagonal originating at sq.

     4a.  we take blockers & plus7dir&#91;sq&#93; to find out where that diagonal is blocked.  Since this is the + direction, I assume that matches the bit numbers which means the bit numbers for each square in this direction is increasing.

     4b.  We now compute b =LSB&#40;x&#41; where x is the bitboard computed in step 4a.  This gives us the _first_ blocker down that diagonal.  We then compute b_moves &= ~plus7dir&#91;blocker&#93;.
explanation: We first found all the bits on the diagonals that a bishop can move to on an open board. We then found the square(s) where that diagonal is blocked. We locate the first, which is the only blocker that counts. Now we take the diagonal bits in that same direction, from the blocking square, and complement them. That makes the diagonal bits beyond the blocker zero (0) and all other bits on the board one (1). When we and that with the bishop moves, we 'chop off' the diagonal beyond that blocker.

4c. We repeat that for the remaining 3 diagonals, noting that for the minus7 and minus9 dir we have to use MSB() instead since we are looking in the backward directions with respect to bit numbers.

This works and is fast. It is nearly as fast as rotated bitboards or magic bitboards. And certainly more cache-friendly. But "nearly" isn't always "good enough". :) I'm using magic generation which is as fast as anything I have seen, and the code is quite simple. The tables are a bit big, but doesn't seem to hurt performance as the magic operations are faster, but the table lookups hurt a bit. Overall this is a gain.
Is what you are describing faster than this:

Code: Select all

        h->moves&#91;id&#93; = (&#40;ray7p&#91;fs&#93; & belowInc&#91;FirstBit&#40;ray7p&#91;fs&#93; & aPieces&#41;&#93;)
                     |  &#40;ray9p&#91;fs&#93; & belowInc&#91;FirstBit&#40;ray9p&#91;fs&#93; & aPieces&#41;&#93;)
                     |  &#40;ray7m&#91;fs&#93; & aboveInc&#91;LastBit&#40;ray7m&#91;fs&#93; & aPieces&#41;&#93;)
                     |  &#40;ray9m&#91;fs&#93; & aboveInc&#91;LastBit&#40;ray9m&#91;fs&#93; & aPieces&#41;&#93;));

:and why?

Thanks!
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Yet another bitboard attack generator

Post by bob »

Michael Sherwin wrote:
bob wrote:
michiguel wrote:Gaviota uses the following functions to find the squares attacked by a given slider in a given board with blocking pieces (occ). I dropped my own implementation of rotated bitboards because the code was akward and carrying extra bitboards all over the place was a nuisance. This implementation is not faster or better, but at least it is not slower in my engine. It is certainly more compact, readable and clearer. It only needs a little supporting table. Most importantly (at least for me), It allows me to do something different than everybody else. It is more fun.

I checked CPW to see whether there is anything similar. Apparently, this would be a modification of the Dumb7fill method. The difference is that I do attacks in all directions in parallel, including the queen (with a little trick).

The method assume that we have a table
uint64_t reach [MAXPIECES] [MAXSQUARES];
where each bitboard there represents the attacks of a given slider from a given square in an empty board. I generate the table at start up with a 0x88 procedure :-), but it is small enough to hardwire it into the program.
You can replace this with
uint64_t reach_bishop [64];
uint64_t reach_queen [64];
uint64_t reach_rook [64];

That is about ~1.5 k

The code below is optimized for (my) readability, so I am sure that it could be improved. At least in my engine, unrolling the loops slow things down.

Miguel
PS: I hope that the code is self-explanatory.

Code: Select all


/* not_A is a bitboard with all bits turned on except on column A */
/* not_H is a bitboard with all bits turned on except on column H */

/* grow in all directions */
static uint64_t
grow &#40;uint64_t x&#41;
&#123;
	x |= &#40;x << 1&#41; & not_A; 
	x |= &#40;x >> 1&#41; & not_H;	
	x |=  x >> 8;
	x |=  x << 8;
	return x;
&#125;

/* grow laterally like a rook */
static uint64_t
growlat &#40;uint64_t x&#41;
&#123;
	uint64_t y = x;
	x |= &#40;x << 1&#41; & not_A;
	x |= &#40;x >> 1&#41; & not_H;	
	y |=  y >> 8;
	y |=  y << 8;
	return x | y;	
&#125;


extern uint64_t
B_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;BISHOP&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
R_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;ROOK&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
Q_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini, h;

	r = reach&#91;QUEEN&#93;&#91;from&#93;;
	ini = U64&#40;1&#41; << from;
	
	g = grow &#40;ini&#41; & ~ini;
	h = growlat &#40;g & occ&#41;;
	h &= ~g;
	
	corral = r & ~occ;
	corral &= ~h;

	g &= ~occ;

	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	return &#40;grow&#40;g&#41; & r&#41; & ~ini & ~h ;
&#125;
Would have to think about the speed, but there is yet another way to do with with fewer loops.

I'll describe this for bishops only. The rooks will be just as obvious and queen combine both.

Code: Select all


1.  I assume some sort of array b_attacks&#91;64&#93; that has a bitboard representation of all the possible squares a bishop on square s can reach on an empty board.

2.  b_moves = b_attacks&#91;square&#93;;

3.  Now we need to determine where each diagonal is blocked.  so we do blockers = b_moves & occupied.

4.  we now use a 4x loop &#40;for the four directions&#41;.  we need a new set of bitboards like plus7dir&#91;64&#93;, plus9dir&#91;64&#93;, minus7dir&#91;64&#93; and minus9dir&#91;64&#93; where these are a subset of the oritinal b_attacks&#91;sq&#93; for each diagonal originating at sq.

     4a.  we take blockers & plus7dir&#91;sq&#93; to find out where that diagonal is blocked.  Since this is the + direction, I assume that matches the bit numbers which means the bit numbers for each square in this direction is increasing.

     4b.  We now compute b =LSB&#40;x&#41; where x is the bitboard computed in step 4a.  This gives us the _first_ blocker down that diagonal.  We then compute b_moves &= ~plus7dir&#91;blocker&#93;.
explanation: We first found all the bits on the diagonals that a bishop can move to on an open board. We then found the square(s) where that diagonal is blocked. We locate the first, which is the only blocker that counts. Now we take the diagonal bits in that same direction, from the blocking square, and complement them. That makes the diagonal bits beyond the blocker zero (0) and all other bits on the board one (1). When we and that with the bishop moves, we 'chop off' the diagonal beyond that blocker.

4c. We repeat that for the remaining 3 diagonals, noting that for the minus7 and minus9 dir we have to use MSB() instead since we are looking in the backward directions with respect to bit numbers.

This works and is fast. It is nearly as fast as rotated bitboards or magic bitboards. And certainly more cache-friendly. But "nearly" isn't always "good enough". :) I'm using magic generation which is as fast as anything I have seen, and the code is quite simple. The tables are a bit big, but doesn't seem to hurt performance as the magic operations are faster, but the table lookups hurt a bit. Overall this is a gain.
Is what you are describing faster than this:

Code: Select all

        h->moves&#91;id&#93; = (&#40;ray7p&#91;fs&#93; & belowInc&#91;FirstBit&#40;ray7p&#91;fs&#93; & aPieces&#41;&#93;)
                     |  &#40;ray9p&#91;fs&#93; & belowInc&#91;FirstBit&#40;ray9p&#91;fs&#93; & aPieces&#41;&#93;)
                     |  &#40;ray7m&#91;fs&#93; & aboveInc&#91;LastBit&#40;ray7m&#91;fs&#93; & aPieces&#41;&#93;)
                     |  &#40;ray9m&#91;fs&#93; & aboveInc&#91;LastBit&#40;ray9m&#91;fs&#93; & aPieces&#41;&#93;));

:and why?

Thanks!
Hard to say. You are doing each ray separately. I did effectively the same thing, except I set all rays at once, and then clipped each one off at the first blocking piece. Whether you use a loop 4x or unroll the loop into 4 pieces as above is probably unimportant as a compiler would likely unroll anyway. Main thing is to avoid branches.
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Yet another bitboard attack generator

Post by michiguel »

bob wrote:
michiguel wrote:Gaviota uses the following functions to find the squares attacked by a given slider in a given board with blocking pieces (occ). I dropped my own implementation of rotated bitboards because the code was akward and carrying extra bitboards all over the place was a nuisance. This implementation is not faster or better, but at least it is not slower in my engine. It is certainly more compact, readable and clearer. It only needs a little supporting table. Most importantly (at least for me), It allows me to do something different than everybody else. It is more fun.

I checked CPW to see whether there is anything similar. Apparently, this would be a modification of the Dumb7fill method. The difference is that I do attacks in all directions in parallel, including the queen (with a little trick).

The method assume that we have a table
uint64_t reach [MAXPIECES] [MAXSQUARES];
where each bitboard there represents the attacks of a given slider from a given square in an empty board. I generate the table at start up with a 0x88 procedure :-), but it is small enough to hardwire it into the program.
You can replace this with
uint64_t reach_bishop [64];
uint64_t reach_queen [64];
uint64_t reach_rook [64];

That is about ~1.5 k

The code below is optimized for (my) readability, so I am sure that it could be improved. At least in my engine, unrolling the loops slow things down.

Miguel
PS: I hope that the code is self-explanatory.

Code: Select all


/* not_A is a bitboard with all bits turned on except on column A */
/* not_H is a bitboard with all bits turned on except on column H */

/* grow in all directions */
static uint64_t
grow &#40;uint64_t x&#41;
&#123;
	x |= &#40;x << 1&#41; & not_A; 
	x |= &#40;x >> 1&#41; & not_H;	
	x |=  x >> 8;
	x |=  x << 8;
	return x;
&#125;

/* grow laterally like a rook */
static uint64_t
growlat &#40;uint64_t x&#41;
&#123;
	uint64_t y = x;
	x |= &#40;x << 1&#41; & not_A;
	x |= &#40;x >> 1&#41; & not_H;	
	y |=  y >> 8;
	y |=  y << 8;
	return x | y;	
&#125;


extern uint64_t
B_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;BISHOP&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
R_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini;
	r = reach&#91;ROOK&#93;&#91;from&#93;;
	corral = r & ~occ;
	ini = U64&#40;1&#41; << from;

	g = ini;
	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	
	return &#40;grow&#40;g&#41; & r&#41; & ~ini;
&#125;

extern uint64_t
Q_attacks &#40;SQUARE from, uint64_t occ&#41;
&#123;
	uint64_t g, corral, prev, r, ini, h;

	r = reach&#91;QUEEN&#93;&#91;from&#93;;
	ini = U64&#40;1&#41; << from;
	
	g = grow &#40;ini&#41; & ~ini;
	h = growlat &#40;g & occ&#41;;
	h &= ~g;
	
	corral = r & ~occ;
	corral &= ~h;

	g &= ~occ;

	do  &#123;
		prev = g;
		g = grow &#40;prev&#41; & corral;
	&#125; while &#40;prev != g&#41;;
	g |= ini;
	return &#40;grow&#40;g&#41; & r&#41; & ~ini & ~h ;
&#125;
Would have to think about the speed, but there is yet another way to do with with fewer loops.

I'll describe this for bishops only. The rooks will be just as obvious and queen combine both.

Code: Select all


1.  I assume some sort of array b_attacks&#91;64&#93; that has a bitboard representation of all the possible squares a bishop on square s can reach on an empty board.

2.  b_moves = b_attacks&#91;square&#93;;

3.  Now we need to determine where each diagonal is blocked.  so we do blockers = b_moves & occupied.

4.  we now use a 4x loop &#40;for the four directions&#41;.  we need a new set of bitboards like plus7dir&#91;64&#93;, plus9dir&#91;64&#93;, minus7dir&#91;64&#93; and minus9dir&#91;64&#93; where these are a subset of the oritinal b_attacks&#91;sq&#93; for each diagonal originating at sq.

     4a.  we take blockers & plus7dir&#91;sq&#93; to find out where that diagonal is blocked.  Since this is the + direction, I assume that matches the bit numbers which means the bit numbers for each square in this direction is increasing.

     4b.  We now compute b =LSB&#40;x&#41; where x is the bitboard computed in step 4a.  This gives us the _first_ blocker down that diagonal.  We then compute b_moves &= ~plus7dir&#91;blocker&#93;.
explanation: We first found all the bits on the diagonals that a bishop can move to on an open board. We then found the square(s) where that diagonal is blocked. We locate the first, which is the only blocker that counts. Now we take the diagonal bits in that same direction, from the blocking square, and complement them. That makes the diagonal bits beyond the blocker zero (0) and all other bits on the board one (1). When we and that with the bishop moves, we 'chop off' the diagonal beyond that blocker.

4c. We repeat that for the remaining 3 diagonals, noting that for the minus7 and minus9 dir we have to use MSB() instead since we are looking in the backward directions with respect to bit numbers.

This works and is fast. It is nearly as fast as rotated bitboards or magic bitboards. And certainly more cache-friendly. But "nearly" isn't always "good enough". :) I'm using magic generation which is as fast as anything I have seen, and the code is quite simple. The tables are a bit big, but doesn't seem to hurt performance as the magic operations are faster, but the table lookups hurt a bit. Overall this is a gain.
How do you use MSB? with assembler? I try to avoid that. Still, even if I use an algorithm in C it may be fast. I will try it later.
I use a similar concept to find if two pieces attack each other in some other places of the program, but I do not need to use MSB there, because I always start from the lowest bit.

Miguel
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Yet another bitboard attack generator

Post by bob »

I simply use inline asm. MSB is bsr, LSB is bsf.
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Yet another bitboard attack generator

Post by michiguel »

bob wrote:I simply use inline asm. MSB is bsr, LSB is bsf.
That's what I figured. I generally try to avoid inline asm because my engine is not fast at all, and the only thing I do is to compromise portability. But you made me think about the possibility to have an MSB that is iterative. It may be fine because this bitboards will not be very populated.

Code: Select all

uint64_t
msb &#40;uint64_t x&#41;
&#123; 
    uint64_t y;
    do &#123;
        y = x;
        x &= x - 1;
    &#125; while &#40;x&#41;;
    return y;
&#125;
Miguel