bitboards like mailbox

tcusr · Post by **tcusr** » Tue Nov 16, 2021 11:43 pm

i've been busy in the past days and now i can finally continue the development of the engine.
now that i have move generation out of the way i was wondering what board representation should i use. i've always used 6 bitboards for piece types and 2 for colours but only because stockfish does it and without actually considering other alternatives, what do you guys use? also, more importantly, does it really matter?

anyway implementing KS with a gcc vector resulted in 800 (!?) instructions, a normal KS to one direction is about 20, am i missing something?

dangi12012 · Post by **dangi12012** » Wed Nov 17, 2021 12:31 am

tcusr wrote: ↑Tue Nov 16, 2021 11:43 pm i've been busy in the past days and now i can finally continue the development of the engine.
now that i have move generation out of the way i was wondering what board representation should i use. i've always used 6 bitboards for piece types and 2 for colours but only because stockfish does it and without actually considering other alternatives, what do you guys use? also, more importantly, does it really matter?

anyway implementing KS with a gcc vector resulted in 800 (!?) instructions, a normal KS to one direction is about 20, am i missing something?

Its funny - my movegenerator is so optimized I can measure with confidence a single AND/XOR etc. instruction less in the code. Thats why this question also came up for me - This was fastest for me:

Code: Select all

struct Board {
    const map BPawn;
    const map BKnight;
    const map BBishop;
    const map BRook;
    const map BQueen;
    const map BKing;

    const map WPawn;
    const map WKnight;
    const map WBishop;
    const map WRook;
    const map WQueen;
    const map WKing;

    const map Black;
    const map White;
    const map Occ;

    constexpr Board(
        map bp, map bn, map bb, map br, map bq, map bk,
        map wp, map wn, map wb, map wr, map wq, map wk) :
        BPawn(bp), BKnight(bn), BBishop(bb), BRook(br), BQueen(bq), BKing(bk),
        WPawn(wp), WKnight(wn), WBishop(wb), WRook(wr), WQueen(wq), WKing(wk),
        Black(bp | bn | bb | br | bq | bk),
        White(wp | wn | wb | wr | wq | wk),
        Occ(Black | White)
    {

    }

I tried with 8 like you did too and it was 4% slower. Because for every single move you have to infer the color for every piece before moving with Pawn & Black.

If you go for compactness you can do a bitboard with 4 boards. - by vertically storing a 4bit code for each square. (This can also store all gamedata like enpassant). Then a single board fits in a single AVX2 register which also has HUGE advantages.
https://www.chessprogramming.org/Quad-Bitboards
4 Board = 16 states stored in 4 bits vertically:
- Empty
- BLACKMOVE_WhiteKing- (single bit where the white king stands but not inside the 12 pieces but as a standalone state for color to move)
- Special - On first and last rank these bits mark the castling rooks. The other squares mark Enpassant target pawns.
- 12 Pieces

With 4 AVX2 instructions and some AND instructions you can unpack this 4 board stored in a single AVX2 register to its 8 board form.

You can go for 3 boards - but that is totally unpractical.
http://www.talkchess.com/forum3/viewtop ... ds#p909698

My recommendation:
Pick 8 - or 4. But 6 is just another copy of stockfish and we need more different ideas implemented and not a copy of existing conventions.
Go for 8. I implemented 4 and it gave me headaches.

tcusr · Post by **tcusr** » Wed Nov 17, 2021 6:14 pm

dangi12012 wrote: ↑Wed Nov 17, 2021 12:31 am
tcusr wrote: ↑Tue Nov 16, 2021 11:43 pm i've been busy in the past days and now i can finally continue the development of the engine.
now that i have move generation out of the way i was wondering what board representation should i use. i've always used 6 bitboards for piece types and 2 for colours but only because stockfish does it and without actually considering other alternatives, what do you guys use? also, more importantly, does it really matter?

anyway implementing KS with a gcc vector resulted in 800 (!?) instructions, a normal KS to one direction is about 20, am i missing something?
Its funny - my movegenerator is so optimized I can measure with confidence a single AND/XOR etc. instruction less in the code. Thats why this question also came up for me - This was fastest for me:
Code: Select all
struct Board {
    const map BPawn;
    const map BKnight;
    const map BBishop;
    const map BRook;
    const map BQueen;
    const map BKing;

    const map WPawn;
    const map WKnight;
    const map WBishop;
    const map WRook;
    const map WQueen;
    const map WKing;

    const map Black;
    const map White;
    const map Occ;

    constexpr Board(
        map bp, map bn, map bb, map br, map bq, map bk,
        map wp, map wn, map wb, map wr, map wq, map wk) :
        BPawn(bp), BKnight(bn), BBishop(bb), BRook(br), BQueen(bq), BKing(bk),
        WPawn(wp), WKnight(wn), WBishop(wb), WRook(wr), WQueen(wq), WKing(wk),
        Black(bp | bn | bb | br | bq | bk),
        White(wp | wn | wb | wr | wq | wk),
        Occ(Black | White)
    {

    }
I tried with 8 like you did too and it was 4% slower. Because for every single move you have to infer the color for every piece before moving with Pawn & Black.

If you go for compactness you can do a bitboard with 4 boards. - by vertically storing a 4bit code for each square. (This can also store all gamedata like enpassant). Then a single board fits in a single AVX2 register which also has HUGE advantages.
https://www.chessprogramming.org/Quad-Bitboards
4 Board = 16 states stored in 4 bits vertically:
- Empty
- BLACKMOVE_WhiteKing- (single bit where the white king stands but not inside the 12 pieces but as a standalone state for color to move)
- Special - On first and last rank these bits mark the castling rooks. The other squares mark Enpassant target pawns.
- 12 Pieces

With 4 AVX2 instructions and some AND instructions you can unpack this 4 board stored in a single AVX2 register to its 8 board form.

You can go for 3 boards - but that is totally unpractical.
http://www.talkchess.com/forum3/viewtop ... ds#p909698

My recommendation:
Pick 8 - or 4. But 6 is just another copy of stockfish and we need more different ideas implemented and not a copy of existing conventions.
Go for 8. I implemented 4 and it gave me headaches.

thanks dangi, i don't understand what you mean by 8, i used stockfish way which is also 8 (6 for types, 2 for color).

anyway now that i'm writing the move generator i noticed a possible problem for the sorting of moves. i know that at one point i have to sort both quiets and captures but since i'm generating them together i always have to traverse the whole list before ordering them (this is specially concerning for captures because in some positions there are none). is this a problem or the higher number of moves to sort is insignificant? if it matters i use std::sort to do the sorting

hgm · Post by **hgm** » Wed Nov 17, 2021 6:24 pm

It would be more efficient to generate them separately. Split the move set in moves to empty squares and moves to enemy squares, by ANDing it with ~occupied or color[opponent], and use a separate extraction loop for each.

An alternative, which I also use in my mailbox engines, is test on move by move basis whether the to-square is empty, and add captures to the start of the list, and non-captures to the end (basically treating the list as a back-to-back stack). You can then easily loop through captures only.

tcusr · Post by **tcusr** » Wed Nov 17, 2021 6:31 pm

hgm wrote: ↑Wed Nov 17, 2021 6:24 pm It would be more efficient to generate them separately. Split the move set in moves to empty squares and moves to enemy squares, by ANDing it with ~occupied or color[opponent], and use a separate extraction loop for each.

An alternative, which I also use in my mailbox engines, is test on move by move basis whether the to-square is empty, and add captures to the start of the list, and non-captures to the end (basically treating the list as a back-to-back stack). You can then easily loop through captures only.

ok thanks. btw i'm back to the mailbox-like generator so i really can't do your first method. but i can do the second, i can keep track of number of captures and use it as a offset for std::sort (std::sort(movelist.begin(), movelist.begin() + n_captures))

hgm · Post by **hgm** » Wed Nov 17, 2021 6:43 pm

Engines usually do not sort al moves at once, but extract the next move in line, and then immediately sort it. If the move produces a beta cutoff, you then don't have to sort the rest. This can be especially adventageous for the captures, as the chance that the first or second move will produce a cutoff is quite large.

dangi12012 · Post by **dangi12012** » Wed Nov 17, 2021 7:10 pm

hgm wrote: ↑Wed Nov 17, 2021 6:24 pm It would be more efficient to generate them separately. Split the move set in moves to empty squares and moves to enemy squares, by ANDing it with ~occupied or color[opponent], and use a separate extraction loop for each.

An alternative, which I also use in my mailbox engines, is test on move by move basis whether the to-square is empty, and add captures to the start of the list, and non-captures to the end (basically treating the list as a back-to-back stack). You can then easily loop through captures only.

That comes with the cost of one IF multiplied by all tested squares vs Bitboards which can do this in a single and operation for all squares at once and loop over all remaining bits (branchless).
If statements inside a loop are a huge performance penalty.

tcusr · Post by **tcusr** » Wed Nov 17, 2021 8:29 pm

hgm wrote: ↑Wed Nov 17, 2021 6:43 pm Engines usually do not sort al moves at once, but extract the next move in line, and then immediately sort it. If the move produces a beta cutoff, you then don't have to sort the rest. This can be especially adventageous for the captures, as the chance that the first or second move will produce a cutoff is quite large.

alright thanks.
but i still don't get why i should generate the quiets and captures separately, is it because it's more convenient to score them?
https://github.com/maksimKorzh/chess_pr ... rch.c#L263 if use this method to sort the next move like you said it doesn't matter if the move is right after our current index or is the last one of the list, it still needs to go to the end of the list

Mergi · Post by **Mergi** » Wed Nov 17, 2021 8:44 pm

Using std::sort is a huge waste, unless you have absolutely abysmal move evaluation.

In c++ for convenience, I use a struct like this

Code: Select all

    struct ScoredMove {
        Move move;
        Score score;

        std::strong_ordering operator<=>(const ScoredMove &other) const {
            return score <=> other.score;
        }
    };

After you generate each move, you assign a score to them as well. After that you simply pick the best move using std::max_element, swap the last move from the move list to the best move's place, and decrease the size of the move list by 1.

Generating moves in stages (captures, quiets, ...) can potentially save a lot of time. With any half decent move ordering, you will make a node cutoff fairly early on in most cases, so you can potentially save a huge amount of time by not having to generate, score and sort as many moves.

hgm · Post by **hgm** » Wed Nov 17, 2021 8:56 pm

dangi12012 wrote: ↑Wed Nov 17, 2021 7:10 pm
hgm wrote: ↑Wed Nov 17, 2021 6:24 pm An alternative, which I also use in my mailbox engines, is test on move by move basis whether the to-square is empty, and add captures to the start of the list, and non-captures to the end (basically treating the list as a back-to-back stack). You can then easily loop through captures only.
That comes with the cost of one IF multiplied by all tested squares vs Bitboards which can do this in a single and operation for all squares at once and loop over all remaining bits (branchless).
If statements inside a loop are a huge performance penalty.

True, but if you are clever you can avoid it. If you put the stackpointers in an array, you can access them as sptr[!victim] to decide whether the move is pushed on the capture or the non-capture stack. Even if you want the stacks to grow in opposit direction you can do this through sptr[!victim] += sinc[!victim].

Note that the loop to extract the bits is also not entirely branchless, as you have to break out of the loop after an unpredictable number of iterations. So each loop will give you a misprediction.

Of course the fast way to do this is like I did in the mailbox trials: not making a list of the captures, but store them as a set created by ORing together and appending the elements of the attack map, use a single bit-extraction loop for extracting the captures, and immediately search them as they extract. That would only have one mispredict for the loop that extracted the captures; the loop that created the captures set would have a fixed number of iterations.

bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox