bitboards like mailbox

tcusr · Post by **tcusr** » Wed Nov 17, 2021 8:57 pm

Mergi wrote: ↑Wed Nov 17, 2021 8:44 pm Using std::sort is a huge waste, unless you have absolutely abysmal move evaluation.

I use a struct like this
Code: Select all
    struct ScoredMove {
        Move move;
        Score score;

        std::strong_ordering operator<=>(const ScoredMove &other) const {
            return score <=> other.score;
        }
    };
After you generate each move, you assign a score to them as well. After that you simply pick the best move using std::max_element, swap the last move from the move list with this best move, and decrease the size of the move list by 1.

thanks, everything is more clear now.
also, after reading a bit of code from olithink https://github.com/olithink/OliThink/bl ... ink.c#L987, i understand why we generate the moves separately. if we get a beta cutoff when trying captures we completely avoid the generation of the other moves

hgm · Post by **hgm** » Wed Nov 17, 2021 9:01 pm

tcusr wrote: ↑Wed Nov 17, 2021 8:29 pm alright thanks.
but i still don't get why i should generate the quiets and captures separately, is it because it's more convenient to score them?
https://github.com/maksimKorzh/chess_pr ... rch.c#L263 if use this method to sort the next move like you said it doesn't matter if the move is right after our current index or is the last one of the list, it still needs to go to the end of the list

Well, the captures usually need different treatment that the non-captures. E.g. their sort key would be calculated in a different way (e.g. MVV/LVA versus history). And if you prevent them from getting mixed in the piece list, you don't have to search for captures in the entire list. If you generate captures and non-captures in different loops, you can tailor those loops for what you want done with them. If they are treated in the same loop, you would have to figure out whether you are dealing with a capture or non-capture for each move individually.

tcusr · Post by **tcusr** » Wed Nov 17, 2021 9:07 pm

hgm wrote: ↑Wed Nov 17, 2021 9:01 pm
tcusr wrote: ↑Wed Nov 17, 2021 8:29 pm alright thanks.
but i still don't get why i should generate the quiets and captures separately, is it because it's more convenient to score them?
https://github.com/maksimKorzh/chess_pr ... rch.c#L263 if use this method to sort the next move like you said it doesn't matter if the move is right after our current index or is the last one of the list, it still needs to go to the end of the list
Well, the captures usually need different treatment that the non-captures. E.g. their sort key would be calculated in a different way (e.g. MVV/LVA versus history). And if you prevent them from getting mixed in the piece list, you don't have to search for captures in the entire list. If you generate captures and non-captures in different loops, you can tailor those loops for what you want done with them. If they are treated in the same loop, you would have to figure out whether you are dealing with a capture or non-capture for each move individually.

thanks, you've cleared another doubt. one of the engines i use as a reference does exactly what you say (https://github.com/algerbrex/blunder/bl ... ch.go#L568) but i suppose this is a matter of preference to keep things simple.

hgm · Post by **hgm** » Thu Nov 18, 2021 10:32 am

Note that the method for sorting the captures that I used in the mailbox trials can also be used in less advanced designs, to save on sorting time. You just take two static tables attackerTable[256], victimTable[256], which you initialize with all possible (attacker, victim) combinations, in MVV/LVA order. Each capture can then be represented by the index in those tables. You then associate each index value with a bit in a 256-bit data set, which initially is cleared. For each capture you generate, you set the corresponding bit to 1, rather than writing it in some move-list array. You can then use a bit-extraction loop for retrieving the captures without further sorting.

OliverBr · Post by **OliverBr** » Thu Nov 18, 2021 11:22 pm

tcusr wrote: ↑Wed Nov 17, 2021 8:57 pm also, after reading a bit of code from olithink https://github.com/olithink/OliThink/bl ... ink.c#L987, i understand why we generate the moves separately. if we get a beta cutoff when trying captures we completely avoid the generation of the other moves

Very well said.
First (1) the move from Hash/TTTable will be used. If this gets a bet cutoff, there is no need to generate more moves.
Second (2) noisy moves (typically captures and promotions are being generated). They often get cutoffs, so the generation of
Third(3) quiet moves is not necessary.

tcusr · Post by **tcusr** » Thu Nov 18, 2021 11:53 pm

OliverBr wrote: ↑Thu Nov 18, 2021 11:22 pm
tcusr wrote: ↑Wed Nov 17, 2021 8:57 pm also, after reading a bit of code from olithink https://github.com/olithink/OliThink/bl ... ink.c#L987, i understand why we generate the moves separately. if we get a beta cutoff when trying captures we completely avoid the generation of the other moves
Very well said.
First (1) the move from Hash/TTTable will be used. If this gets a bet cutoff, there is no need to generate more moves.
Second (2) noisy moves (typically captures and promotions are being generated). They often get cutoffs, so the generation of
Third(3) quiet moves is not necessary.

my idea is only fast for the generation of quiet moves, i guess it's time abandon it now and go back to magic bitboards

hgm · Post by **hgm** » Fri Nov 19, 2021 10:15 am

Indeed, simple mailbox is very fast for generating non-captures, but when you want to generate only captures it is hardly faster than generating all moves. Sliders move in such a way that you basically get all non-captures as a free side effect when traveling towards the piece to be captured. The problem of course is that in an engine you are only interested in generating captures in about 85% of the nodes, because these are QS nodes.

This problem can of course be solved through the use of a neighbor table. Then generation of a slider capture doesn't need to examine any intermediate square along the ray, but directly goes to the target through

Code: Select all

toSqr = neighbor[fromSqr][direction];

You would still have to test whether there is an enemy piece on the toSquare, though: it could also be a friendly piece, or off board. But at least for generating captures by a rook you would only have to examine 4 squares, no matter how far away those targets are. The price is of course that you have to update the neighbor table before you start to generate moves. But on a capture (~95% of all moves!) that is not very hard, as only a single square gets evacuated for those. Then you only have to make the neighbors of that evacuated square refer to each other.

dangi12012 · Post by **dangi12012** » Fri Nov 19, 2021 3:07 pm

hgm wrote: ↑Fri Nov 19, 2021 10:15 am Indeed, simple mailbox is very fast for generating non-captures, but when you want to generate only captures it is hardly faster than generating all moves. Sliders move in such a way that you basically get all non-captures as a free side effect when traveling towards the piece to be captured. The problem of course is that in an engine you are only interested in generating captures in about 85% of the nodes, because these are QS nodes.

This problem can of course be solved through the use of a neighbor table. Then generation of a slider capture doesn't need to examine any intermediate square along the ray, but directly goes to the target through
Code: Select all
toSqr = neighbor[fromSqr][direction];
You would still have to test whether there is an enemy piece on the toSquare, though: it could also be a friendly piece, or off board. But at least for generating captures by a rook you would only have to examine 4 squares, no matter how far away those targets are. The price is of course that you have to update the neighbor table before you start to generate moves. But on a capture (~95% of all moves!) that is not very hard, as only a single square gets evacuated for those. Then you only have to make the neighbors of that evacuated square refer to each other.

The incremental nature of chess makes sliding lookup a 100% nop. I have a version where the Seemap and Attackmap are different. On normal move only ever 2 squares change - and you only need to update the seemap+atkmap of the pieces that see those squares.
That is slower for a full movegen - but for 95% attacks it could be very much faster.
An interesting side effect is that pawns that are stuck dont even need a single instruction wasted on them to confirm they are still stuck.

The "sensitivity list" of a pawn and all other pieces never gets activated when the change is not visible to them.

tcusr · Post by **tcusr** » Fri Nov 19, 2021 3:26 pm

hgm wrote: ↑Fri Nov 19, 2021 10:15 am Indeed, simple mailbox is very fast for generating non-captures, but when you want to generate only captures it is hardly faster than generating all moves. Sliders move in such a way that you basically get all non-captures as a free side effect when traveling towards the piece to be captured. The problem of course is that in an engine you are only interested in generating captures in about 85% of the nodes, because these are QS nodes.

This problem can of course be solved through the use of a neighbor table. Then generation of a slider capture doesn't need to examine any intermediate square along the ray, but directly goes to the target through
Code: Select all
toSqr = neighbor[fromSqr][direction];
You would still have to test whether there is an enemy piece on the toSquare, though: it could also be a friendly piece, or off board. But at least for generating captures by a rook you would only have to examine 4 squares, no matter how far away those targets are. The price is of course that you have to update the neighbor table before you start to generate moves. But on a capture (~95% of all moves!) that is not very hard, as only a single square gets evacuated for those. Then you only have to make the neighbors of that evacuated square refer to each other.

i really like this idea but i swear i can't wrap my head on how to make it work, especially with bitboards

btw i think the solution has been hiding under my nose all along. in the first post of this thread i linked a thread where a guy wrote a function to get sliding attacks on the fly that was as fast as kindergarten bitboards. olithink uses kindergarten bitboards and is rated 2900. this is proof that it's fast enough
i modified the code bit to use lookup tables instead of calculating the rank/file masks on the fly, they are small so it's not a problem

Code: Select all

#include <cstdint>

// kindergarten bitboards from olithink

#define u64 uint64_t
#define RATT1(f) rays[(f << 6) | key000(BOARD, f)]
#define RATT2(f) rays[(f << 6) | key090(BOARD, f) | 0x1000]

inline int key000(u64 b, int f) {
	return (int) ((b >> ((f & 56) + 1)) & 0x3F);
}

inline int key090(u64 b, int f) {
	u64 _b = (b >> (f&7)) & 0x0101010101010101LL;
	return (int)((_b * 0x0080402010080400LL) >> 58);
}

extern u64 rays[], BOARD;

u64 ratt(u64 b, int f) {
    return RATT1(f) | RATT2(f);
}

// on the fly method

inline uint64_t slide_arithmetic(uint64_t piece, uint64_t line, uint64_t block) {
    // mask blockers
    block = block & ~piece & line;

    // split the line into upper and lower rays
    uint64_t bottom = piece-UINT64_C(1);

    // for the upper part we can use the x^(x-1) trick to fill in from the bottom
    uint64_t masked_up = block & ~bottom;
    uint64_t blocked_up = (masked_up ^ (masked_up-UINT64_C(1)));

    // for the bottom we use CLZ + a shift to fill in from the top
    uint64_t masked_down = block & bottom;
    uint64_t blocked_down = (UINT64_C(0x7FFFFFFFFFFFFFFF) >> __builtin_clzll(masked_down|UINT64_C(1)));
    
    // the intersection of the two is the move set after masking with the line
    return (blocked_up ^ blocked_down) & line;
}

extern u64 rookfile[], rookrank[];

uint64_t ratt2(int square, uint64_t blockers) {
    return 
        slide_arithmetic(1ull<<square, rookfile[square], blockers) ^
        slide_arithmetic(1ull<<square, rookrank[square], blockers);
}

hgm · Post by **hgm** » Fri Nov 19, 2021 5:32 pm

Well, it is not meant to work with bitboards, because it is not needed for bitboards. There is no penalty there for generating both captures and non-captures when you need only the latter, as all moves are generated simultaneously. And by ANDing with the opponent bitboard you are left with the captures, and that takes only a single instruction. The neighbor table is just an aid to do efficient capture-only generation with a mailbox board representation.

There is hardly any relation between speed and strength, BTW. An engine can be 1000 times slower, and still 1000 Elo stronger.

bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox

Re: bitboards like mailbox