Bitboard techniques in Xiangqi

Discussion of chess software programming and technical issues.

Moderator: Ras

Han Chengye
Posts: 23
Joined: Sun Feb 08, 2009 3:54 am

Re: Bitboard techniques in Xiangqi

Post by Han Chengye »

bob wrote:I would personally not call them "inefficient" just because they need more than 64 bits. The original chess 4.x used 64 bit values on a 60 bit (CDC) computer. Tom Truscott developed a bitboard version of Duchess in 1977 and used a 32 bit IBM mainframe to do 64 bit operations (there were no 64 bit values on that box either).

Their "data density" makes up for the extra instruction or two needed to deal with longer word sizes...

I suppose one could use newer boxes with 128 bit SSE operations if need be...
The first code i read carefully is crafty-18.15, and i print some parts for reading. At that time, it's hard for me and i can't image that i can talk engine problem with you......
I can't find crafty-21.2, where is it? it's the latest using rotated bitboard, and
may it be easy read than before, i think.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Bitboard techniques in Xiangqi

Post by bob »

Han Chengye wrote:
bob wrote:I would personally not call them "inefficient" just because they need more than 64 bits. The original chess 4.x used 64 bit values on a 60 bit (CDC) computer. Tom Truscott developed a bitboard version of Duchess in 1977 and used a 32 bit IBM mainframe to do 64 bit operations (there were no 64 bit values on that box either).

Their "data density" makes up for the extra instruction or two needed to deal with longer word sizes...

I suppose one could use newer boxes with 128 bit SSE operations if need be...
The first code i read carefully is crafty-18.15, and i print some parts for reading. At that time, it's hard for me and i can't image that i can talk engine problem with you......
I can't find crafty-21.2, where is it? it's the latest using rotated bitboard, and
may it be easy read than before, i think.
The 22.x versions are easier to read. They use magic move generation which is simpler than rotated stuff. And they have no duplicated code for black and white, so the code is significantly smaller. All recent versions are on my ftp box (source code only).


ftp.cis.uab.edu/pub/hyatt/source
Han Chengye
Posts: 23
Joined: Sun Feb 08, 2009 3:54 am

Re: Bitboard techniques in Xiangqi

Post by Han Chengye »

Yes, i see.
That's why i want to use magic move generation in XiangQi, but there're a lot of things to do.
User avatar
hgm
Posts: 28429
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Bitboard techniques in Xiangqi

Post by hgm »

Han Chengye wrote:yes,i plan to use mailbox, but your bitmap approach seems attractive.
Just using a word to represent the attack-sets of the king,advisers,pawns and elephants in the half part of board, am i right?
Yes, that's right.

I am still in doubt if I should use this combined attack set in the conventional bitboard way of intersecing it with a (partial) bitboard for all opponent pieces, and then extracting bits one by one, or if it is better to loop through the opponent's move list and intersect the combined attack map just with the square mask for that single piece. The latter could generate the captures in order of decreasing victim value, while otherwise you would need storing and sorting of the moves. (But on average there might be so few moves in the intersections that there is nothing to sort.)

A hybrid between those two methods would be to keep 3 different bitmaps: one for Rooks, one for Cannons+Horses and one for (promoted) Pawns, as the order in which Cannons and Horses are captured does not really matter. In stead of having to loop through (potentially) 11 pieces, you would loop through only 3 piece-type classes, and then extract the bits for all victims of that class. This would require an extra level of indirection in the MakeMove(), though: in stead of storing a mask for each piece (i.e. in the piece list) to indicate which bits it should test in the (P,E,A,K) attack set, you now need to store a pointer to the combined bitmap for its piece type. And the latter should be updated by both clearing the bits corresponding to the fromSquare, and setting those for the toSquare (requiring two accesses to the table that translates square number to bitmap), while the other would only involve overwriting the currently stored bitmap with that for the toSquare. Despite that somewhat slower MakeMove, it might still be the overall winner.
Han Chengye
Posts: 23
Joined: Sun Feb 08, 2009 3:54 am

Re: Bitboard techniques in Xiangqi

Post by Han Chengye »

Another idea: for the fixed combination of K,A,E position , R,N,C also have a
lot of offensive combinations。Maybe we should construct a tactical combination database? Just like Chinook?
User avatar
hgm
Posts: 28429
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Bitboard techniques in Xiangqi

Post by hgm »

I don't know enough about Chinook, and neither are a good enough Xiangqi player to have a sensible opinion on that. :(

I have started coding my XQ engine based on the following design, now:

There is a piece list, which contains for eache piece both a pointer to a bitmap (u64 *typeMap), and a pointer to a board of bitmaps (u64 *squareMask), indicating which bits in the typeMap have to change when the piece enters or leaves a given square. So the MakeMove() routine contains a statement

typeMap[piece] ^= squareMask[piece][from] ^ squareMask[piece][to] ^squareMask[victim][to];

This is used for different purposes on different pieces, though: for the weak pieces (unpromoted P,E,A,K) it contains their packed attack maps. This can have multiple bits set per square, as a single piece on a single square can attack multiple squares. But all set bits belong to the same piece. For the strong pieces (promoted P (=Q), H, C, R) it contains the squares they are on, with the same encoding as the attack sets. This can also have multiple bits set, as a given square can occur in several attack sets. (e.g. e1 kan be in the attack set of K and each advisor.) C and H share the same map (since they are approximately equally valuable).

The piece list is divided in 4 'victim groups': Rooks, Cannons+Horses, Advisors+promoted Pawns, Elephants+unpromoted Pawns. King is not in a victim group, as moves that put the King in check are never made. The capture moves are generated by 3 nested loops: the outer loop iterates through the victim groups, starting with R, (Most Valuable Victim), and ending with E+P. For each victim group we then iterate through the possible attckers, low to high. This start by intersecting the position map of the (opponent's) victim group with the attack set of the weak pieces. The E+P group uses a dummy map for this, which is always zero. After that, the middle loop iterates through the remaining attackers individually (Q,H,C,R), and then performs a 0x88-style alignment test in the inner loop with all possible victims of the current victim group. (For Q attackers probing the board directly in 3 places for attacking Q around the victim might actually be faster than a 0x88 test on all P and Q.)

There are potentially 15 victims in the combined victim groups, and typically 6 pieces (other than PEAK) to attack them. This would require 60 attack tests, consisting of subtracting the square numbers of attacker and victm, an using the difference for a table lookup (to typically find there is no alignment; in the rare case there is alignment a ray scan is necessary to see if there is an actual capture). For comparison: using a 'direct' move generator (starting from the moving piece, rather than the victim) would require generating of moves in 8 directions each for 2 Horses, and 4 drections each for 2 Cannons and 2 Rooks. In total 32 directions, and in half of those (for C+R) a ray scan would always be required. (And the Horses also need testing if their moves are blocked.) It is not obvious which is more work.

But as the board grows emptier, the work involved in direct generation decreases linearly, or even worse, as the ra scans tend to reach further. With the 'inverted' generation (towards victims) the work decreases quadratically, as both nmber of victims and attackers decreases. In addition, the direct generation always has to be performed upto the end, to be sure you have found the MVV, and then requires sorting, or at least extraction of themove with the best MVV/LVA. The inverted generation does produce the captures in MVV/LVA order, so you can search them immediately as they are generated. In cut-nodes this means you might not need many attack tests at all, becase the first move you find already causes a cutoff. And in all-nodes, the lightest victims might be futile, and they are most abundant. E.g. if currentEval < alpha - 200 all captures on E+P are futile, (E=150, P=100), and that exempts 7 of the 15 victims, cutting the work nearly by 50%. So even with all material still on the board, I think the inverted capture generation is competitive, and it starts winning big time when the material thins out.