The wrong way

Sven · Post by **Sven** » Sat Jan 02, 2016 12:02 am

hgm wrote:But this is awful for left shifts. The natural carry propagation would do all that shifting and ORing in a single instruction (O-2R).

It might even be competitive to keep two copies of the board similar to rotated bitboard, but one rotated 180 degrees, so that you can do the O-2R trick in both directions. Especially if you do not store a whole board per memory cell, but just a ray. Even in 32-bit words you could store two copies of a ray of 15 squares, so it would work upto 15x15 boards. The board would be divided over many memory cells, but each move would only touch the 4 rays through the from- and the 8 rays through the to-square, so it isn't really important how many words the entire board takes.

Please don't blame me for the algorithm, I am not its author, I have only referred to it.

If you want to propose an improved version of it then go ahead.

I hope you are aware of the purpose of the eastAttacksFrom() and similar functions; otherwise see below (where "X" is just a marker for visualization, not part of the bitboards; also note: LSB = A1 = lower left corner, MSB = H8 = upper right corner):

Code: Select all

bbRook:
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . 1 . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .

bbEmpty:
1 1 1 1 . 1 1 .
1 1 . 1 1 1 . 1
1 1 1 1 1 1 1 1
1 1 1 X 1 1 . 1
1 1 1 . 1 1 1 1
1 1 1 1 1 1 1 1
1 1 . 1 1 1 1 1
. 1 1 1 1 1 1 1

northAttacks:
. . . 1 . . . .
. . . 1 . . . .
. . . 1 . . . .
. . . X . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .

eastAttacks:
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . X 1 1 1 .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .

southAttacks:
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . X . . . .
. . . 1 . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .

westAttacks:
. . . . . . . .
. . . . . . . .
. . . . . . . .
1 1 1 X . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .

hgm · Post by **hgm** » Sat Jan 02, 2016 10:26 am

Sven Schüle wrote:Please don't blame me for the algorithm, I am not its author, I have only referred to it.

If you want to propose an improved version of it then go ahead.

You refer to it in a context that implies you recommend it. The normal way of doing this would be:

Code: Select all

uint64_t eastAttacksFrom(uint64_t bbRook, uint64_t bbOccupied)
{
    bbOccupied |= BB_FILE_H;
    uint64_t flood = bbOccupied - 2*bbRook;
    return   flood ^ bbOccupied;
}

If you set the bits of a bitboard square by square, you are off worse than with mailbox: there at least you would not have to unpack the bits first to see if the square was occupied.

Sven · Post by **Sven** » Sat Jan 02, 2016 1:17 pm

hgm wrote:
Sven Schüle wrote:Please don't blame me for the algorithm, I am not its author, I have only referred to it.

If you want to propose an improved version of it then go ahead.
You refer to it in a context that implies you recommend it. The normal way of doing this would be:
Code: Select all
uint64_t eastAttacksFrom(uint64_t bbRook, uint64_t bbOccupied)
{
    bbOccupied |= BB_FILE_H;
    uint64_t flood = bbOccupied - 2*bbRook;
    return   flood ^ bbOccupied;
}
If you set the bits of a bitboard square by square, you are off worse than with mailbox: there at least you would not have to unpack the bits first to see if the square was occupied.

Thanks, I'll check your proposal later today. Looks interesting.

hgm · Post by **hgm** » Sat Jan 02, 2016 2:01 pm

Note that, although I wrote it myself because I thought it was obvious, I cannot be credited with the invention, as it was already known (as commonly happens with obvious things), under the name O^(O-2R) trick.

As usual, you can force the carry (borrow, really) propagation to skip bits by clearing them. (You can see this as having the slider move not along files or (anti-)diagonals, but as a raster scan along ranks after having removed all obstructions not on the ray where it should really move, and at the end masking away everything not on that ray.)

Unfortunately this only works for slides towards the MSB. Although for files and (anti-)diagonal slides you can repair that by reversing the byte order, (and at the end reversing it back), for which there is an x86 instruction. Unfortunately there aren't x86 instructions to reverse the bit order. So moving towards the LSB along a rank remains a problem.

Sven · Post by **Sven** » Sat Jan 02, 2016 2:22 pm

hgm wrote:Note that, although I wrote it myself because I thought it was obvious, I cannot be credited with the invention, as it was already known (as commonly happens with obvious things), under the name O^(O-2R) trick.

As usual, you can force the carry (borrow, really) propagation to skip bits by clearing them. (You can see this as having the slider move not along files or (anti-)diagonals, but as a raster scan along ranks after having removed all obstructions not on the ray where it should really move, and at the end masking away everything not on that ray.)

Unfortunately this only works for slides towards the MSB. Although for files and (anti-)diagonal slides you can repair that by reversing the byte order, (and at the end reversing it back), for which there is an x86 instruction. Unfortunately there aren't x86 instructions to reverse the bit order. So moving towards the LSB along a rank remains a problem.

I have not tried it yet. But the main problem I see already now is that your proposal, although it is most probably faster for the specific case you describe, is not suitable for the purpose I was recommending it for. It requires
- different solutions for different sliding directions,
- additional coding effort for some of them,
and as far as I can see without testing (but I may be wrong here) it also requires an additional infrastructure since I cannot pass the bitboard of all occupied squares on the whole board as 'bbOccupied' parameter but need to restrict it to one ray (the "eastern" ray in the given example), for which I need to provide precalculated data.

All of that is "trivial" when seeing it in the context of a mature engine. But then it is also "obsolete" since there are much faster solutions than "dumb7fill" or the improved version you have mentioned. However, in the context we are discussing in this thread I still believe that "dumb7fill" is a better compromise:
- one common solution for all 8 directions,
- no additional infrastructure or precalculated data needed,
- and therefore it can be used immediately.

Nevertheless I will take a deeper look into your partially better version as soon as I have time for it.

EDIT: O^(O-2R) is also known under the name "Hyperbola Quintessence", see the link above. The description confirms what you say regarding its use for rank attacks towards the LSB, i.e. "westAttacks" in the context above. It is obvious that "Hyperbola Quintessence" outperforms "dumb7fill" but my arguments given above are still valid. For a "professional" approach I would still vote for magic bitboards, of course.

hgm · Post by **hgm** » Sat Jan 02, 2016 4:05 pm

Sven Schüle wrote:However, in the context we are discussing in this thread I still believe that "dumb7fill" is a better compromise:
- one common solution for all 8 directions,
- no additional infrastructure or precalculated data needed,
- and therefore it can be used immediately.

Except that it is intrinsically slower than mailbox. So recommending it to someone that hopes to improve the speed of a mailbox design by converting to bitboards it is a bummer. Since you used separate routines for each direction I didn't see any harm in using different methods in each. Note that the difference between bbOccupied and bbEmpty can of course be fixed by complementing it at run time; it would still be much faster.

I am not sure that magic bitboards are better than O^O-2R. Naively counting instructions might make it look that way, but magic bitboards use huge tables that cause enormous cache pressure, while O^O-2R requires virtually nothing. Calculating all directions separately can very efficiently exploit the CPU's capability to do 3 ALU instructions in parallel, and then OR them together (where with magic bitboards you are waiting for the L1-cache misses for each complete move pattern to be resolved one at a time). The O^O-2R trick can furthermore be sped up by storing not bitboards but bitfiles/bitranks/bitdiagonals that contain both the ray and the inverted ray (allowing handling of 16x16 boards without any additional effort even in a 32-bit architecture). I would never recommend magics as a bitboard technique. Not even on 8x8.

Sven · Post by **Sven** » Sat Jan 02, 2016 5:12 pm

hgm wrote:
Sven Schüle wrote:However, in the context we are discussing in this thread I still believe that "dumb7fill" is a better compromise:
- one common solution for all 8 directions,
- no additional infrastructure or precalculated data needed,
- and therefore it can be used immediately.
Except that it is intrinsically slower than mailbox. So recommending it to someone that hopes to improve the speed of a mailbox design by converting to bitboards it is a bummer. Since you used separate routines for each direction I didn't see any harm in using different methods in each. Note that the difference between bbOccupied and bbEmpty can of course be fixed by complementing it at run time; it would still be much faster.

I am not sure that magic bitboards are better than O^O-2R. Naively counting instructions might make it look that way, but magic bitboards use huge tables that cause enormous cache pressure, while O^O-2R requires virtually nothing. Calculating all directions separately can very efficiently exploit the CPU's capability to do 3 ALU instructions in parallel, and then OR them together (where with magic bitboards you are waiting for the L1-cache misses for each complete move pattern to be resolved one at a time). The O^O-2R trick can furthermore be sped up by storing not bitboards but bitfiles/bitranks/bitdiagonals that contain both the ray and the inverted ray (allowing handling of 16x16 boards without any additional effort even in a 32-bit architecture). I would never recommend magics as a bitboard technique. Not even on 8x8.

It seems I need to improve my English ... so that it can be understood better in the Netherlands

For those from other countries, what you wrote may appear confusing since I clearly stated the context (a better bitboard solution) and the goal of my proposal to Henk (better code in general).

I also ask you kindly to refrain from further "hijacking" of Henk's thread for teaching us about your broad and deep experience in writing a bitboard chess program.

hgm · Post by **hgm** » Sat Jan 02, 2016 5:51 pm

Perhaps you can understand the following English, then:

Warning the OP that the 'advice' he gets is rather dubious or bad hardly counts as "thread hijacking". And 'a better bitboard solution'? Better than what? What you presented both sucks from the POV of speed as well as code quality.

Henk furthermore only seems to care about making his move generator 10% faster. So it is actually you who seems to have hijacked the thread for teaching bitboard techniques that would certainly not acheive his goal.

Sven · Post by **Sven** » Sun Jan 03, 2016 11:56 am

hgm wrote:Perhaps you can understand the following English, then:

Warning the OP that the 'advice' he gets is rather dubious or bad hardly counts as "thread hijacking". And 'a better bitboard solution'? Better than what? What you presented both sucks from the POV of speed as well as code quality.

Henk furthermore only seems to care about making his move generator 10% faster. So it is actually you who seems to have hijacked the thread for teaching bitboard techniques that would certainly not acheive his goal.

That the "dumb7fill" solution would "suck from code quality" is a hilarious claim which I also consider to be wrong, I don't see where you derive that from. You need zero additional code for it, no tables etc., the code immediately works. Most other bitboard techniques are faster but require at least some precalculated data. There are even simpler solutions than "dumb7fill" based on loops, I remember Michael Hoffmann posted one some time ago. Also Bob posted a clever way for a direct calculation. No doubt about "dumb7fill" not being the best one performance-wise, someone who wants optimal speed should take magics or another very fast solution and must accept some additional work for its initial setup.

But the advice given to the OP by many people, including myself, was *not* to focus too much on speed. If you would read this thread more carefully you would find my words when I proposed "dumb7fill" to him. It is strange to see how you repeatedly ignore the context in which things have been said. Also there has certainly been a sufficient number of discussions about speed of various bitboard attack getter solutions, CPW is full of information about that, and I see no need to add one totally useless discussion to it since the whole point here is a different one.

As to your question "better than what?", the answer is obvious, just try to read the code the OP has posted several times. Maybe my term "better bitboard solution" was not precise enough, my point was not to repeat all that low-level bit-twiddling code dozens of times all over the program (certainly appearing not only in the move generator) but to use simple and small functions that can be reused at various places (getBishopAttacks(), getRookAttacks() or whatever name you give to them). This along with an implementation of these functions that is simple and small as well (small because it needs very few code in total) increases code quality, certainly in relation to the code that has been posted by the OP. The "dumb7fill" algorithm is just one example for it, certainly not the best one but one that can be taken for a start. Whatever funny reason you may have to raise your doubts about that, I don't care

hgm · Post by **hgm** » Sun Jan 03, 2016 1:31 pm

Well, so we have different optinions there. Doesn't seem a reason to get nasty...

I dislike the dumbfill code because it is repetitive in the lines in the routines for a single direction (making it more bug prone, as you might mis-count the number of repetitions), as well as needing separate routines for a lot of different directions. And O^O-2R also does not need pre-calculated data. But I don't see that as an advantage other than for speed. Bitboards heavily make use of pre-calculated tables, for the leapers if not for the sliders. If you want to use them, better get used to it.

'Kindergarten bitboards' would seem a much more educative first step to get acquainted with bitboards and teach you the elementary techniques. Dumbfill more seems an example of how to not do things.

The wrong way

Re: The wrong way

Re: The wrong way

Re: The wrong way

Re: The wrong way

Re: The wrong way

Re: The wrong way

Re: The wrong way

Re: The wrong way

Re: The wrong way

Re: The wrong way