Perft search speed bottleneck

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

JarvisXerxes
Posts: 20
Joined: Thu May 02, 2013 5:52 pm
Location: United States

Re: Perft search speed bottleneck

Post by JarvisXerxes »

tpetzke wrote:Hi Jim,

it is a design decision to make. If you include the captured piece type and also moved piece type in the move data structure you don't need a reference board for some operations (e.g. for sorting the moves with a MVV/LVA scheme)

As long as your moves are not classes that you allocate with new whenever you need one you can't make much wrong here. The speed difference of the several methods will not determine the strength of your engine in this phase of development.

Thomas...
I changed my move class to a 16-bit word yesterday. The problem is I'm keeping both a board reference and the info inside the move. I just cut the redundancy to make my code cleaner. No speed gain or loss detected so far.
Life is a statistical distribution.
JarvisXerxes
Posts: 20
Joined: Thu May 02, 2013 5:52 pm
Location: United States

Re: Perft search speed bottleneck

Post by JarvisXerxes »

Sven Schüle wrote: No. (Sorry for replying instead of Lucas here ...)
There are two different things you can count in perft:

a) the number of leaves exactly at ply N (regardless whether you actually visit them or "only" know they exist), which is the desired result of the perft run to be compared to a trusted reference number,
b) the number of nodes visited (interior + leaves).

Counting a) is a requirement of the method itself. Counting b) is optional, I would recommend it for the purpose of movegen speed comparison between different programs. Comparing based on a) is less meaningful if only one engine does bulk counting, or if the legality testing methods differ significantly. Numbers for a) are much more impressive, though, when using bulk counting and an efficient legality check ...

For anyone actually comparing perft speeds it is also required to use exactly the same starting position (or to average over the same set of positions) and to use the same hardware.

Sven
Thanks Sven! Apparently I've misunderstood some terminology here ...
Life is a statistical distribution.
JarvisXerxes
Posts: 20
Joined: Thu May 02, 2013 5:52 pm
Location: United States

Re: Perft search speed bottleneck

Post by JarvisXerxes »

Oh, another related question to all:

Is the classical popLSB() an expensive operation? Currently my movegen is filled with the popLSB(). For example, when I generate a knight's move, the pseudocode looks like:

Code: Select all

U64 TempPiece = Knights[us];  // internally stored Knights' bitboard.
while (TempPiece)
{
    int from = popLSB(TempPiece);
    U64 TempTo = knight_attack(occupancy);  // precalculated knight's attack mask
    while (TempTo)
    {
        int to = popLSB(TempTo); 
        addMoveToStack(Move(from, to));
    }
}
The nested loop looks very expensive. But I have no idea how to avoid this - other than keeping another internal array that tracks all the piece positions, which doesn't sound cool.

Thanks.
Life is a statistical distribution.
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Perft search speed bottleneck

Post by lucasart »

JarvisXerxes wrote:Oh, another related question to all:

Is the classical popLSB() an expensive operation? Currently my movegen is filled with the popLSB(). For example, when I generate a knight's move, the pseudocode looks like:

Code: Select all

U64 TempPiece = Knights[us];  // internally stored Knights' bitboard.
while (TempPiece)
{
    int from = popLSB(TempPiece);
    U64 TempTo = knight_attack(occupancy);  // precalculated knight's attack mask
    while (TempTo)
    {
        int to = popLSB(TempTo); 
        addMoveToStack(Move(from, to));
    }
}
The nested loop looks very expensive. But I have no idea how to avoid this - other than keeping another internal array that tracks all the piece positions, which doesn't sound cool.

Thanks.
This is perfectly normal and there's no way around it. From what I understaood, your problem is not there. Your problem is that you need to eliminate all this business of playing/testing/undoing all pseudo-legal moves. This is horribly expensive. Perhaps you could restrict this expensive waste of time to the special cases: check escape, en-passant, castling, promotion. These cases are relatively rare, so you still benefit from a first order optimization, before attacking the second order one.

What I recommend is to use legal moves, which will make things much easier down the line. The only difference is that you need to be a little smarter when you calculate your

Code: Select all

U64 TempTo = knight_attack(occupancy);  // precalculated knight's attack mask
If the knight is pinned (pinned being a precalc bitboard that you will need later anyway), then just '&' this with the "pin ray' (the line between king and the knight, that the knight must stay on). Or you can go the pseudo-legal route and code a (complicated) function that determines if a pseudo=-legal move is legel *without* playing it.

But in any case, you need to get rid of this play/test/undo. This is your number 1 problem. The rest is for optimizations later (like how to best encode moves for example).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Patrice Duhamel
Posts: 193
Joined: Sat May 25, 2013 11:17 am
Location: France
Full name: Patrice Duhamel

Re: Perft search speed bottleneck

Post by Patrice Duhamel »

Chan Rasjid wrote: 1) store move with an integer.
2) lower 12 bits for to/from square.
3) next 3 bits for promote type - if any; or it means not a promotion move. Komodo has minor other uses for these 3 bits; pawn push x2 ?
4) remaining 1 bit of lower 16 bits - unused; or for later use to set checking moves when info available.
5) upper 16 bits for other use - order scores for capture moves; or unused for quiet moves.
About not using the moved piece type and captured piece type, it's not a problem when you have to check the move validity ?
(for the hash move and killer moves in a staged move generation).
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Perft search speed bottleneck

Post by Sven »

lucasart wrote:Perhaps you could restrict this expensive waste of time to the special cases: check escape, en-passant, castling, promotion.
Check escape, en-passant, king moves, moves of pinned pieces.

Promotion is no special case regarding legality check. And testing castling moves is not sufficient, all king moves can potentially be illegal. The additional special case of castling is any possible attack to the square the king jumps over (F1, D1, F8, D8).

Sven
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Perft search speed bottleneck

Post by hgm »

Patrice Duhamel wrote:About not using the moved piece type and captured piece type, it's not a problem when you have to check the move validity ?
(for the hash move and killer moves in a staged move generation).
Checking if the moves are valid should not be any more difficult, as when you know the from- and to-squares you can just look on the board to see what is there (which should not be any more expensive than looking in the move). The question could be if a valid move is really the move you want. For hash moves that should be no problem, as they are specific for the current position. Killer moves come from another position, but I don't think the killer algorithm worries about piece type. Killers are supposed to come from sibblings (some engines even explicitly clear the killer slots of the next level when the start a new node), and as the opponent has the move there all the opponent pieces should be in the same place in all the daughters.
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: Perft search speed bottleneck

Post by syzygy »

lucasart wrote:Your problem is that you need to eliminate all this business of playing/testing/undoing all pseudo-legal moves. This is horribly expensive. Perhaps you could restrict this expensive waste of time to the special cases: check escape, en-passant, castling, promotion. These cases are relatively rare, so you still benefit from a first order optimization, before attacking the second order one.
I note again that all of this only applies to perft speed and has no bearing on the performance of a chess engine.

I've now experimented a bit with efficiently checking legality before making the move, but I am still losing nps compared to simply playing/testing/undoing. My perft speed has gone through the roof, but that does not buy me anything useful.
But in any case, you need to get rid of this play/test/undo. This is your number 1 problem. The rest is for optimizations later (like how to best encode moves for example).
If the OP's goal is to write a chess engine, there is absolutely no need to get rid of play/test/undo. The number 1 problem, if it is still there, is focussing on perft speed.
Chan Rasjid
Posts: 588
Joined: Thu Mar 09, 2006 4:47 pm
Location: Singapore

Re: Perft search speed bottleneck

Post by Chan Rasjid »

Patrice Duhamel wrote:
Chan Rasjid wrote: 1) store move with an integer.
2) lower 12 bits for to/from square.
3) next 3 bits for promote type - if any; or it means not a promotion move. Komodo has minor other uses for these 3 bits; pawn push x2 ?
4) remaining 1 bit of lower 16 bits - unused; or for later use to set checking moves when info available.
5) upper 16 bits for other use - order scores for capture moves; or unused for quiet moves.
About not using the moved piece type and captured piece type, it's not a problem when you have to check the move validity ?
(for the hash move and killer moves in a staged move generation).
Most bitboard chess programs have these as basics :

Code: Select all

//Pawn = 1, Bishop = 2, Knight = 3, Rook = 4, King = 5, Queen = 6

u64 bitboard[2 = color][8 = pctype]
//updated incrementally in makemove/unmake()

unsigned char board[64];
// 0 == square empty; otherwise, the bits 1-3 stores the pctype; bit 4 stores color 0/1  
// array updated incrementally in makemove/unmake()
Search() lists through the generated move list and call search() after selecting and making a move. If our move integer only has from/to/promote, then we get the movepc and capturedpc by retrieving from board[from/to].

I would be surprised if it is possible to have a more efficient method that may do without the array board[64]; it is still a necessary array although the info could be retrieved from bitboard[2][8].

hashmove: If a probe of the transposition table finds a hashmove, it is played and searched immediately before generating any moves; a hashmove is always valid. It is lucky if it gives a fail high. Otherwise, we generate the moves and go on. Better to remember to skip the same hashmove and not search it twice.

killers: if there is a match from the killer table and the move list, it is searched first; it is not guaranteed that a killer is valid.

Best Regards,
Rasjid.
elcabesa
Posts: 855
Joined: Sun May 23, 2010 1:32 pm

Re: Perft search speed bottleneck

Post by elcabesa »

If the knight is pinned (pinned being a precalc bitboard that you will need later anyway), then just '&' this with the "pin ray' (the line between king and the knight, that the knight must stay on). Or you can go the pseudo-legal route and code a (complicated) function that determines if a pseudo=-legal move is legel *without* playing it.
the example is not so right. a pinned rook, queen bishop or pawn can moe only in the pin ray direction but a pinned knight can't move, it's even simpler.
if you have a pinned knight you can avoid to calculate his movement.