Faster Movelist in Mailslot Engines?

dangi12012 · Post by **dangi12012** » Mon Nov 29, 2021 8:59 pm

This applies to sliders:
Instead of sending out Rays and putting elements into the list one by one -
wouldnt it be much faster to have bulk ray elements ready?

So that the algorithm sends out a ray and just returns the number of squares it can walk. So for example Rook North: 5.
Then this is just an offset into a preprepared list that contains these 5 elements with from/to squares already stored.

The movelist would become a list of pointers and these point to readonly memory of all possible moves in a bulk size of 0-7.

Could be interesting or am I missing something?

hgm · Post by **hgm** » Mon Nov 29, 2021 9:10 pm

I guess you are missing that to make alpha-beta pruning work effectively, the order in which the moves are searched is crucial. So the moves along the ray are unlikely to be searched sequentially. This is the main reason for putting them in a move list, and not search them immediately as they get out of the move generator. You can then first sort the list by suspected quality of the moves, and then search them starting with the suspected best.

So if there is a capture at the end of the ray you would first want to search that, and all other captures, before you get to playing any of the other moves along the ray. And the latter would probably be sorted by their history score, which could be completely different. So that the 2nd move along the ray most be searched 12th, the 4th move 20th and the first move 28th.

maksimKorzh · Post by **maksimKorzh** » Mon Nov 29, 2021 10:24 pm

Yes, it would be a good idea if only move ordering is not crucial. Maybe if MCTS is used it might be an option.

hgm · Post by **hgm** » Mon Nov 29, 2021 10:53 pm

There is no reason to prepare a list; the squares along the ray can be generated incrementally by just adding the step to the to-square (rather than incrementing the list index and fetching the next square). This is what micro-Max does. Basically this is directly searching the moves that come out of the move generator, as this is how the inner loop of the move generator generated the to-squares.

dangi12012 · Post by **dangi12012** » Tue Nov 30, 2021 1:54 am

hgm wrote: ↑Mon Nov 29, 2021 10:53 pm There is no reason to prepare a list; the squares along the ray can be generated incrementally by just adding the step to the to-square (rather than incrementing the list index and fetching the next square). This is what micro-Max does. Basically this is directly searching the moves that come out of the move generator, as this is how the inner loop of the move generator generated the to-squares.

Think of bitboards: the answer for the lookup becomes not a bitboard of the attacked squares but an offset into pre-prepared movelists. Knights also dont return a uint64_t but a pointer into an movelist.

You skip the whole most inner loop!

With fancy magic your lookup would return the correct offset into a movelist and you are done.

emadsen · Post by **emadsen** » Tue Nov 30, 2021 3:57 am

Magic bitboards are based off occupancy. They provide no information about what's occupying the squares. They give you a move list. So what? What's an unsorted move list good for? Perft.

HGM is correct. In a chess engine- you know, a program that actually plays a game of chess- the order moves are searched is of a paramount importance.

I'm shaking my head at the noise pollution here. An answer is given based on one's experience actually writing a functioning chess engine. And the reply is a a command that someone else should really try this other technique- a really brilliant idea. Do your homework. Write it and report your results. Why all the rumination and theorizing and prodding others to take initiative? Do the work.

We'd be spared a lot of noise when the brilliant ideas fizzle out.

Chessnut1071 · Post by **Chessnut1071** » Tue Nov 30, 2021 6:23 am

dangi12012 wrote: ↑Mon Nov 29, 2021 8:59 pm This applies to sliders:
Instead of sending out Rays and putting elements into the list one by one -
wouldnt it be much faster to have bulk ray elements ready?

So that the algorithm sends out a ray and just returns the number of squares it can walk. So for example Rook North: 5.
Then this is just an offset into a preprepared list that contains these 5 elements with from/to squares already stored.

The movelist would become a list of pointers and these point to readonly memory of all possible moves in a bulk size of 0-7.

Could be interesting or am I missing something?

I think you're describing the same thing I did: I have a 64,2,8,8 matrix that looks something like the code below. The numbers refer to squares 2 and 3 (g1 & h1) where the 2nd element provides the number of moves down that each of the 8 rays (vector). Rays with 0 aren't even searched. That approach increased my engine speed by 20x. You need a blocker mask which terminates the ray with a simple array lookup and a check mask for both black & white giving a fast check & discovered check flag as well. There's a lot of numbers, but, they're all pre-computed. I thought most mailbox routines were programmed this way; however, I'm not an expert on mailboxes. I'm not sure how bitboards can be any faster although I'm trying to find out.

2,7,10,18,26,34,42,50,58,
2,0,0,0,0,0,0,0,0,
2,5,3,4,5,6,7,0,0,
2,2,1,0,0,0,0,0,0,
2,5,11,20,29,38,47,0,0,
2,0,0,0,0,0,0,0,0,
2,2,9,16,0,0,0,0,0,
2,0,0,0,0,0,0,0,0,

3,7,11,19,27,35,43,51,59,
3,0,0,0,0,0,0,0,0,
3,4,4,5,6,7,0,0,0,
3,3,2,1,0,0,0,0,0,
3,4,12,21,30,39,0,0,0,
3,0,0,0,0,0,0,0,0,
3,3,10,17,24,0,0,0,0,
3,0,0,0,0,0,0,0,0,

I suggest you program it and experiment yourself. If you don't want to type in the matrix, I'll list if for you.

The legal move order must be found by an optimized evaluation function. From what I found so far, bit boards may be faster at move generation, but, they're weak on byte info on the metrics like check, discovered check, capture, pins and material gain/loss. The evaluation function will be the largest reduction in search time and the information above is faster in mailboxes, I THINK!

hgm · Post by **hgm** » Tue Nov 30, 2021 8:50 am

dangi12012 wrote: ↑Tue Nov 30, 2021 1:54 am
hgm wrote: ↑Mon Nov 29, 2021 10:53 pm There is no reason to prepare a list; the squares along the ray can be generated incrementally by just adding the step to the to-square (rather than incrementing the list index and fetching the next square). This is what micro-Max does. Basically this is directly searching the moves that come out of the move generator, as this is how the inner loop of the move generator generated the to-squares.
Think of bitboards: the answer for the lookup becomes not a bitboard of the attacked squares but an offset into pre-prepared movelists. Knights also dont return a uint64_t but a pointer into an movelist.

You skip the whole most inner loop!

With fancy magic your lookup would return the correct offset into a movelist and you are done.

You would not skip any loop, but just replace one loop by another. You would still have to loop through the pre-computed list. The on-board move targets of a knight can be tabulated as a function of its location, irrespective of board representation. The advantage of bitboards would be that you can select the type of moves you want from this with bitwise boolean operations on the entire board. (E.g. only captures.) My experience in the mailbox trials, however, was rhat it was faster to give the loop a fixed number of iterations, than to make that number variable by doing the minimum number of iterations. So in the end I just looped over all 8 moves, even the off-board ones, and just do the same operation on al of them. Making sure it was a dummy for the off-board case. So the bitboard trick might do more harm than good.

dangi12012 · Post by **dangi12012** » Tue Nov 30, 2021 10:48 pm

Chessnut1071 wrote: ↑Tue Nov 30, 2021 6:23 am
dangi12012 wrote: ↑Mon Nov 29, 2021 8:59 pm This applies to sliders:
Instead of sending out Rays and putting elements into the list one by one -
wouldnt it be much faster to have bulk ray elements ready?

So that the algorithm sends out a ray and just returns the number of squares it can walk. So for example Rook North: 5.
Then this is just an offset into a preprepared list that contains these 5 elements with from/to squares already stored.

The movelist would become a list of pointers and these point to readonly memory of all possible moves in a bulk size of 0-7.

Could be interesting or am I missing something?
I think you're describing the same thing I did: I have a 64,2,8,8 matrix that looks something like the code below. The numbers refer to squares 2 and 3 (g1 & h1) where the 2nd element provides the number of moves down that each of the 8 rays (vector). Rays with 0 aren't even searched. That approach increased my engine speed by 20x. You need a blocker mask which terminates the ray with a simple array lookup and a check mask for both black & white giving a fast check & discovered check flag as well. There's a lot of numbers, but, they're all pre-computed. I thought most mailbox routines were programmed this way; however, I'm not an expert on mailboxes. I'm not sure how bitboards can be any faster although I'm trying to find out.

2,7,10,18,26,34,42,50,58,
2,0,0,0,0,0,0,0,0,
2,5,3,4,5,6,7,0,0,
2,2,1,0,0,0,0,0,0,
2,5,11,20,29,38,47,0,0,
2,0,0,0,0,0,0,0,0,
2,2,9,16,0,0,0,0,0,
2,0,0,0,0,0,0,0,0,

3,7,11,19,27,35,43,51,59,
3,0,0,0,0,0,0,0,0,
3,4,4,5,6,7,0,0,0,
3,3,2,1,0,0,0,0,0,
3,4,12,21,30,39,0,0,0,
3,0,0,0,0,0,0,0,0,
3,3,10,17,24,0,0,0,0,
3,0,0,0,0,0,0,0,0,

I suggest you program it and experiment yourself. If you don't want to type in the matrix, I'll list if for you.

The legal move order must be found by an optimized evaluation function. From what I found so far, bit boards may be faster at move generation, but, they're weak on byte info on the metrics like check, discovered check, capture, pins and material gain/loss. The evaluation function will be the largest reduction in search time and the information above is faster in mailboxes, I THINK!

Yes instead of having your cpu generate a list - sort it and make a move on that. You skip the "generate list" step and point to a preprepared list in memory.
I think the generation part can be skipped fully and precomputed tables like yours help - its funny because I have literally the same implemented already:

This loop knows how far to walk in each of the 8 directions fo all 64 squares:

Code: Select all

static int dirc[8];
for (int sq = 0, x = 0, y = 0; sq < 64; sq++, x = sq & 7, y = sq >> 3) {
	int EA = dirc[0] = max(7 - x, 0); int WE = dirc[4] = x;
	int SO = dirc[6] = max(7 - y, 0); int NO = dirc[2] = y;
	int SE = dirc[7] = min(EA, SO);   int NW = dirc[3] = min(WE, NO);
	int SW = dirc[5] = min(WE, SO);   int NE = dirc[1] = min(EA, NO);
}

Its very smart to not pollute your own thinking process by "the way its usually done".
Bitboards are superior in every way except this:
- For evaluation you would like the board to be 64x8 bit integers as input into a neural network.

But no one implemented a single bit neural network for chess TO DATE. There your inputs can be the bitboard directly and the first layer is XOR + POPCNT which yields 64x8 bit integers as output. With bitboards you also have a changemask of the 2-4 bits that changed on the board so you could even do that incrementally and maintain the first input layer like NNUE does.

There are so many ideas that are not implemented because chess engines is not an economic market to develop for profesionally.

Sopel · Post by **Sopel** » Tue Nov 30, 2021 11:47 pm

dangi12012 wrote: ↑Tue Nov 30, 2021 10:48 pm There are so many ideas that are not implemented because chess engines is not an economic market to develop for profesionally.

Or they are just not sound ideas.

Faster Movelist in Mailslot Engines?

Faster Movelist in Mailslot Engines?

Re: Faster Movelist in Mailslot Engines?

Re: Faster Movelist in Mailslot Engines?

Re: Faster Movelist in Mailslot Engines?

Re: Faster Movelist in Mailslot Engines?

Re: Faster Movelist in Mailslot Engines?

Re: Faster Movelist in Mailslot Engines?

Re: Faster Movelist in Mailslot Engines?

Re: Faster Movelist in Mailslot Engines?

Re: Faster Movelist in Mailslot Engines?