Perft as a Measure of Speed...
Moderator: Ras
-
hgm
- Posts: 28454
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Perft as a Measure of Speed...
In a bitboard there are usually multiple boards that have to be updated. E.g. on moving a Queen you would need to update occupancy, color, diagonal sliders, orthogonal sliders. All 64 bits. And the same for removing the victim, if the move was a capture. With mailbox it is just moving a single byte to the new square, and clearing the old.
-
AlvaroBegue
- Posts: 932
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: Perft as a Measure of Speed...
Oh, I see. Well, I use bitboards, but I simply have one bitboard per piece type, and I use magics for sliding attacks, so I don't need diagonal sliders or orthogonal sliders. I do need to recompute color and occupancy, but I recompute them every time, which just takes a few clock cycles (6 is a quick guess, but I'll try to measure it).hgm wrote:In a bitboard there are usually multiple boards that have to be updated. E.g. on moving a Queen you would need to update occupancy, color, diagonal sliders, orthogonal sliders. All 64 bits. And the same for removing the victim, if the move was a capture. With mailbox it is just moving a single byte to the new square, and clearing the old.
-
hgm
- Posts: 28454
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Perft as a Measure of Speed...
Well, that is even worse, right? With 6 piece types the color boards each need you to OR 6 boards, which you then have to or together to get occupancy (12 loads, 11 ORs, 3 stores, plus 2 loads, 2 XORs and 2 stores to update the mover and victim piece types). Updating color incrementally (and occupancy from those) would just require 2 loads, 2 XOR, 3 stores (so 6 loads, 4 XOR, 1 OR and 5 stores in total).
Mailbox would require 1 load, 1 XOR (to create a zero) and 2 stores.
This does not take into account saving stuff for the later UnMake (which in the mailbox case would just be 1 load, 2 stores).
Having Q on separate boards from R and B would require you to alway test two boards if you want to know if a piece is attacked from a given direction. (E.g. the RookAttacks board on the King square would have to be ANDed with both opponent Q and R to detect if you are in check. Which loses you time there.
Mailbox would require 1 load, 1 XOR (to create a zero) and 2 stores.
This does not take into account saving stuff for the later UnMake (which in the mailbox case would just be 1 load, 2 stores).
Having Q on separate boards from R and B would require you to alway test two boards if you want to know if a piece is attacked from a given direction. (E.g. the RookAttacks board on the King square would have to be ANDed with both opponent Q and R to detect if you are in check. Which loses you time there.
-
AlvaroBegue
- Posts: 932
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: Perft as a Measure of Speed...
I am not sure that counting operations the way you are doing is very relevant to how long it's going to take the CPU to perform these operations. ORs are ridiculously simple to implement in hardware, and it's very likely the CPU will be able to parallelize a bunch of them. Stores and loads from L1 cache are incredibly fast too.
Doing an extra OR (like taking the union of rook and queen) can often be scheduled together with other operations, so it's effectively free.
My best attempt at measuring how long it takes to compute those 11 ORs tells me it takes 5 cycles.
Doing an extra OR (like taking the union of rook and queen) can often be scheduled together with other operations, so it's effectively free.
My best attempt at measuring how long it takes to compute those 11 ORs tells me it takes 5 cycles.
-
Rein Halbersma
- Posts: 751
- Joined: Tue May 22, 2007 11:13 am
Re: Perft as a Measure of Speed...
Isn't having 2 color boards + 6 piece_type boards a good compromise? MakeMove takes 2 updates (one for the color, one for the piece type) and using Occupied takes 2 loads + 1 xor. Bitboards are a tradeoff: you gain some by doing pattern matching across squares in parallel, but you lose some by needing supporting data to also get patterns within squares (i.e. which color/piece occupy a single square).hgm wrote:Well, that is even worse, right? With 6 piece types the color boards each need you to OR 6 boards, which you then have to or together to get occupancy (12 loads, 11 ORs, 3 stores, plus 2 loads, 2 XORs and 2 stores to update the mover and victim piece types). Updating color incrementally (and occupancy from those) would just require 2 loads, 2 XOR, 3 stores (so 6 loads, 4 XOR, 1 OR and 5 stores in total).
Mailbox would require 1 load, 1 XOR (to create a zero) and 2 stores.
This does not take into account saving stuff for the later UnMake (which in the mailbox case would just be 1 load, 2 stores).
Having Q on separate boards from R and B would require you to alway test two boards if you want to know if a piece is attacked from a given direction. (E.g. the RookAttacks board on the King square would have to be ANDed with both opponent Q and R to detect if you are in check. Which loses you time there.
-
hgm
- Posts: 28454
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Perft as a Measure of Speed...
Well, let me stress I have never written a bitboard program, but I am not sure why it would matter whether you have a full set of type-boards for each color or share them. You would only update one of them anyway (or two, for a capture). It seems just a nuisance that you would have to add an extra &myPieces or &hisPieces in every test you do. I cannot imagine situations in which you would be interested to get all Bishops (say) of both colors.
The way I imagine a bitboard MakeMove would work is:
The way I imagine a bitboard MakeMove would work is:
Code: Select all
Make:
change = square[move->from] | square[move->to];
capture = square[move->to] & occupied;
pieceType = ?; // from move struct?
myPieces[pieceType] ^= change;
myPieces[COLOR] ^= change;
victimType = ?; // not sure how you would get this at all
hisPieces[victimType] ^= capture;
hisPieces[COLOR] ^= capture;
occupied = myPieces[COLOR] | hisPieces[COLOR];
UnMake:
myPieces[pieceType] ^= change;
myPieces[COLOR] ^= change;
hisPieces[victimType] ^= capture;
hisPieces[COLOR] ^= capture;
occupied = myPieces[COLOR] | hisPieces[COLOR];
-
hgm
- Posts: 28454
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Perft as a Measure of Speed...
That holds for all operations. Individually they are all 'free', because they are scheduled together with something else.AlvaroBegue wrote:I am not sure that counting operations the way you are doing is very relevant to how long it's going to take the CPU to perform these operations. ORs are ridiculously simple to implement in hardware, and it's very likely the CPU will be able to parallelize a bunch of them. Stores and loads from L1 cache are incredibly fast too.
Doing an extra OR (like taking the union of rook and queen) can often be scheduled together with other operations, so it's effectively free.
But if they would not have been executed 'for free', some other instruction would have been executed for free in their place, which now takes time. In the end the only thing that matters is how much you want the CPU to do. When it has to do half as much, it will be twice as fast. That is true whether it can do 1 or 10 instructions at a time.
-
AlvaroBegue
- Posts: 932
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: Perft as a Measure of Speed...
That's only partially correct. Very often code has long dependency chains (the parameters of an operation depend on the result of previous operations), where the processor has to churn through them one at a time. This code is cheap to execute because it's easy for a superscalar CPU to perform some of these operations in parallel.hgm wrote:That holds for all operations. Individually they are all 'free', because they are scheduled together with something else.AlvaroBegue wrote:I am not sure that counting operations the way you are doing is very relevant to how long it's going to take the CPU to perform these operations. ORs are ridiculously simple to implement in hardware, and it's very likely the CPU will be able to parallelize a bunch of them. Stores and loads from L1 cache are incredibly fast too.
Doing an extra OR (like taking the union of rook and queen) can often be scheduled together with other operations, so it's effectively free.
But if they would not have been executed 'for free', some other instruction would have been executed for free in their place, which now takes time. In the end the only thing that matters is how much you want the CPU to do. When it has to do half as much, it will be twice as fast. That is true whether it can do 1 or 10 instructions at a time.
An example of code that has dependency chains:
Code: Select all
u64 rook_attacks(int from, u64 occupied) {
MagicInfo const &magic_info = rook_attack_table[from];
unsigned scrambled = ((occupied & magic_info.mask) * magic_info.magic) >> magic_info.shift;
return magic_info.table[scrambled];
}
In a program that has a lot of code like that (and I am afraid mine does), performing the occasional OR operation turns out to be free.
-
Evert
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: Perft as a Measure of Speed...
I can. For instance: I skip bishop (rook, knight) evaluation if there are no bishops (rooks, knights) remaining, which is a simple test. If I want to classify an ending it's convenient to test for the presence of queens (say) using "if (board[QUEEN] != 0)". I'm sure there are other examples.hgm wrote:I cannot imagine situations in which you would be interested to get all Bishops (say) of both colors.
Typically though you do want to know about pieces of a particular side.
The reason I use 6 piece types + 2 colours in Jazz is that I use copy-make and copying less stuff means it's faster.
-
hgm
- Posts: 28454
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Perft as a Measure of Speed...
This was true on Pentium I. But-of-order execution would completely hide that, by executing other dependency chains that are independent of this one (like calculating the attack set of the other Rook, or of the Bishops) in parallel with this one.AlvaroBegue wrote:That's only partially correct. Very often code has long dependency chains (the parameters of an operation depend on the result of previous operations), where the processor has to churn through them one at a time.