Perft speed optimization (renamed)

mvanthoor · Post by **mvanthoor** » Sat Oct 10, 2020 1:22 pm

hgm wrote: ↑Sat Oct 10, 2020 9:43 am
mvanthoor wrote: ↑Sat Oct 10, 2020 1:58 amI think you got this wrong. If you don't do any tricks such as bulk-counting, it doesn't matter if your engine uses a legal or pseudo-legal move generator. The reason is that in perft, you have to make ALL moves, and you have to check them ALL for legality. It doesn't matter if you do this in the move generator, or in make_move().
That isn't really true. Your move generator can still do 'bulk legality checking' on the moves of non-royal pieces. E.g. when generating Queen moves, you could first test whether removing the Queen puts you in check. If it doesn't, none of the actual Queen moves will put you in check, so you won't have to test those. If it does, you know the Queen is pinned, and the check test you already did probably tells you along which ray it is pinned. You can then resort to a special generator code section that only generates the moves along that ray, and you would not have to test any of these for legality either. You don't only save the effort of the tests, but also the effort of generating the moves that would test negative.

This is roughly how qperft does it: before move generation it determines which pieces are pinned (by running through the slider section of the opponent's piece list, using alignment with the King as a first fast filter to rule out most sliders from being pinners). It then applies the on-ray generation to these pieces, and excludes them from the normal move generation.

Only the King moves have to be tested for stumbling into check. Even here you could do some form of bulk testing; with most board representations a test whether a given piece attacks the square the King is on would with virtually no extra work reveal all squares that are attacked by that piece in the King neighborhood. So instead of testing for attacks on the King after each of its moves you could test for attacks on the King neighborhood in absence of the King, and then only generate the King moves to the safe squares. Qperft doesn't use this. In positions where the King is boxed in by frienly pieces it wouldn't save very much anyway.

This all assumes you are not in check to begin with; for in-check positions other considerations apply, and many engines have a separate move generator for those. A cheap filter there is to test first whether the moves resolve the existing check (which most will not), and don't bother with those that don't.

You're right, I should have been clearer. This is a very specific move generator optimization, and not a perft trick; this will make perft faster, but it'll also make the chess engine itself faster.

I was speaking about a perft function without any tricks and without any special move generator optimizations (The entire thread is basically about getting make(), unmake() and associated functions to be as fast as possible, so no tricks or advanced move generation... yet

). In that case, you -will- have to check legality of all moves, and it doesn't matter if you do it in the move generator, or in make(). It's just a different place for checking the same move.

mvanthoor · Post by **mvanthoor** » Sat Oct 10, 2020 1:45 pm

BlueStar wrote: ↑Sat Oct 10, 2020 5:13 am
I'm so itching to play Rustic now lol. I don't care about the outcome even, I just think it is wonderful, we have been on almost the same pace for so long.

I'd love to have a copy of your engine as well.

With regard to outcome, I can't even make a guess. In some area's, Rustic is very fast compared to other engines just starting out, but in other area's, it doesn't even have functionality (yet). No killers, no history heuristic, no transposition table.... version 1 is very fast but also very stupid.

You may have seen the Youtube tutorial series by Maksim Korzh, author of Wukong and BBC. The first is a 0x88 type engine, the other uses magic bitboards. Rustic obliterates both engines with regard to speed. It is 14 times faster than Wukong, and 3 times faster than BBC in perft, while it already -includes- incremental material count and PSQT updates and the other two do not (to my knowledge). The inclusion of the incremental updates will probably make Rustic's evaluation much faster than those of Wukong and BBC.

HOWEVER...

Wukong is already in the rating list. Strength is 1479 Elo, which is (if I''m honest) higher than I expected. Wukong and Rustic 1 will basically be on par with regard to features: they'll have a/b-search, qsearch, simple move ordering, material count and PSQT's, and that's basically it... but Rustic is 14 times faster. I just wonder what this will do to the rating. Will it mean basically nothing, or will it gain 2 extra ply for a 300-400 rating point advantage? I don't know. I have -no- freaking clue.

On the other hand:

BBC is also in the rating list. The rating is 1957. It has these features which Rustic doesn't (yet) have:

LMR (Late Move Reduction)
NMP (Null Move Pruning)
Transposition table (up to 128MB)
Basic pawn structure/mobility/king safety evaluation
Tapered evaluation

VICE (the tutorial engine by BlueFever Software / Richard Allbert) does NOT have these features, and it's about half the speed of BBC, and it's still 100 points higher in the rating list. Therefore, compared to VICE, BBC is weaker than I would have expected when taking its extra speed and features into account.

I'm mentioning these engines because I like the fact that there is someone who's putting chess programming information into another format (video instead of text), and because these engines are both at the beginning of their development, so I'm following them, alongside yours. (Of the engines that are now stronger than beginner level, been following Minic, Stash, Weiss, and Madchess.)

abulmo2 · Post by **abulmo2** » Sat Oct 10, 2020 2:57 pm

hgm wrote: ↑Sat Oct 10, 2020 10:08 am
BlueStar wrote: ↑Sat Oct 10, 2020 3:00 amMy perft makes the leaf call to increment a counter (rather than bulk count the size of the move list) but does not in this case evaluate the board at the leaf, precisely because my move generator only produces legal moves. I've keep this to keep my perft function as close to my search code. For at least my first release, I've wanted to try and keep my perft/search functions as close in structure as possible, so I know that if my perft is perfect, my search will be as well. It will be NADYA3.0 that starts to make bigger changes.
The disadvantage of this is that the perft speed no longer is related to the speed of the corresponding engine: in search the leave nodes typically do move generation, but make no moves. So for every MakeMove you do one MoveGen. Your perft, OTOH, will do some 30 MakeMoves for every MoveGen, so that the speed gets heavily dominated by that of MakeMove, which is usually completely irrelevant for the engine.

I do not think perft with bulk counting or not is closely related with the overall engine speed. I tried to replace the hyper quintessence algorithm with magic bitboards in Amoeba without success. In perft, magic bitboards gave a 25% speed gain, but when measuring the speed in search benchmarks I got slower results. Spending too much time in perft speed optimization is probably a waste of time. Try to optimize a whole working program, not an isolated part.

Sven · Post by **Sven** » Sat Oct 10, 2020 4:29 pm

mvanthoor wrote: ↑Sat Oct 10, 2020 1:22 pm I was speaking about a perft function without any tricks and without any special move generator optimizations (The entire thread is basically about getting make(), unmake() and associated functions to be as fast as possible, so no tricks or advanced move generation... yet ). In that case, you -will- have to check legality of all moves, and it doesn't matter if you do it in the move generator, or in make(). It's just a different place for checking the same move.

Why would "getting make(), unmake() and associated functions to be as fast as possible" not include the move generation functions? And why would you not want to use a simple solution that costs very little programming effort and results in a huge gain of speed, for both perft and search?

It is a well-known fact that checking pseudo-legal moves for legality can be done much more efficiently by using the knowledge whether certain types of moves can be illegal at all. Based on that knowledge, I would summarize HGM's remarks as follows:

A. When not being in check, most pseudo-legal moves are actually legal and do not require any additional legality check.

B. When being in check, most pseudo-legal moves are illegal so in this case either checking all moves for legality or providing a special check evasion generator is required (or a combination of both).

Regarding A., exceptions include king moves and moves of pinned pieces, as pointed out by HGM. Special attention should be paid to en-passant captures and castling moves. For reasons of code simplicity I usually include en-passant captures in the set of moves that will always be checked for legality. Pseudo-legal castling moves are illegal if the king would move through check, I usually include this test during castling move generation already to keep the "real" legality checking code as simple as possible.

The key point for both A and B is statistics: legality checking is necessary for a small subset of moves only. The saving you get by applying a "can this move ever be illegal?" filter is significant already if you do pseudo-legal move generation during search and only check for legality before actually making a move, and it is really huge for a search with a fully legal move generator (there you can't really go without it) and also for perft. Therefore I would consider it a waste of resources not to use such filtering, not only regarding search speed (i.e. playing strength) but also in terms of testing time (i.e. running perft test suites).

A similar argument (minimize testing time) applies to "bulk counting". Once you have a legal move generator that includes to perform the few legality checks that are actually required, you get a "bulk counting" perft routine for free and your debugging turnaround times during move generation development or optimization phase become much shorter. Furthermore both search and perft implementations can be kept a bit more simple since you do not have legality checking code in it and mate/stalemate detection can be done easier (no counting of legal moves inside the move loop and no postponing of mate/stalemate detection to a place behind the move loop).

And all that can be done without any magic. Just do the well-known stuff. All you need is a function that calculates the set of all pinned pieces and a small function like "can_move_be_illegal()" that basically returns "is_king_move OR is_ep_move OR is_move_of_pinned_piece OR is_in_check", where obviously the actual calculation of pins and of being in check or not is done outside in advance (and the latter is needed in any case and should already be present in each chess engine).

mvanthoor · Post by **mvanthoor** » Sat Oct 10, 2020 4:44 pm

Sven wrote: ↑Sat Oct 10, 2020 4:29 pm Why would "getting make(), unmake() and associated functions to be as fast as possible" not include the move generation functions? And why would you not want to use a simple solution that costs very little programming effort and results in a huge gain of speed, for both perft and search?

I'm using the perft routine to test the impact new features have on make, unmake, and the piece move functions. For example: adding incremental material and PSQT values. In unmake(), I can calculate "backward", or copy then from the previous board state. I wanted to see what was faster. (It's the copy, in my engine.) I add a function, and if it impacts make, unmake, or one of the piece move functions, I test the impact. If there are multiple ways of implementing this feature I test all of them and then keep the fastest.

After the engine starts playing in its most basic form (but as fast as it can be in that form), I'll add features such as the ones you describe above.

Sven · Post by **Sven** » Sat Oct 10, 2020 6:00 pm

mvanthoor wrote: ↑Sat Oct 10, 2020 4:44 pm
Sven wrote: ↑Sat Oct 10, 2020 4:29 pm Why would "getting make(), unmake() and associated functions to be as fast as possible" not include the move generation functions? And why would you not want to use a simple solution that costs very little programming effort and results in a huge gain of speed, for both perft and search?
I'm using the perft routine to test the impact new features have on make, unmake, and the piece move functions. For example: adding incremental material and PSQT values. In unmake(), I can calculate "backward", or copy then from the previous board state. I wanted to see what was faster. (It's the copy, in my engine.) I add a function, and if it impacts make, unmake, or one of the piece move functions, I test the impact. If there are multiple ways of implementing this feature I test all of them and then keep the fastest.

After the engine starts playing in its most basic form (but as fast as it can be in that form), I'll add features such as the ones you describe above.

Nobody prevents you from doing so. However, I get the impression that you are about to do a lot of micro-optimization that you might throw away later when you have actually implemented a search. I call it micro-optimization by intent because your speed measurements are solely based on perft which is not realistic in the context of real search where most of the processing time is spent in eval, move ordering, and the search code itself but not in movegen/make/unmake etc.

A possible alternative would be to focus on correctness of move generation, make/unmake and related functions, use perft to verify it (which is what it was meant for - not for measuring speed), then add search and a simple eval, then do the most basic speed optimizations in movegen/make/unmake etc., then build a strong engine around it, and *now* apply micro-optimizations to squeeze out a few more Elo points.

Of course one can live with a slow legality check. What I meant is: life is too short to waste a lot of time waiting 10 minutes or more for completion of a perft test suite, several times a day, that might as well complete in only one or two minutes.

By the way, writing a very basic alpha-beta search will take much less than an hour, more like a few minutes ...

mvanthoor · Post by **mvanthoor** » Sat Oct 10, 2020 7:51 pm

Sven wrote: ↑Sat Oct 10, 2020 6:00 pm Nobody prevents you from doing so. However, I get the impression that you are about to do a lot of micro-optimization that you might throw away later when you have actually implemented a search. I call it micro-optimization by intent because your speed measurements are solely based on perft which is not realistic in the context of real search where most of the processing time is spent in eval, move ordering, and the search code itself but not in movegen/make/unmake etc.

I understand; but still, if a function is called a few million times and consruct X is 20% faster than construct Y, in doing the exact same thing, then construct X is obviously the way to go. It doesn't really matter if the function is called by perft, or by a simple loop.

make/unmake and related functions (and the move generator) are already correct. This has been verified by the fact that "perftsuite.epd" (which can be found in the repositories of many engines), added by many positions from talkchess, will be executed correctly. (Those are 172 test positions.)

Of course one can live with a slow legality check. What I meant is: life is too short to waste a lot of time waiting 10 minutes or more for completion of a perft test suite, several times a day, that might as well complete in only one or two minutes.

By the way, writing a very basic alpha-beta search will take much less than an hour, more like a few minutes ...

The alpha/beta function is probably going to be written today or tomorrow, and that point, the engine can play in the console. The only thing left to do is to build the UCI component. Material count and PSQT is already implemented.

Adapting the move generator to take pinned pieces into account (and having a check evasion part) is high on the list to implement; possibly right after the transposition table

ChessRenewal · Post by **ChessRenewal** » Sat Oct 10, 2020 8:20 pm

I look forward very much to your explaining in detail how your engine works - as you say at your website https://rustic-chess.org/ - Complete documentation regarding its development and progression. This will complement nicely the youtube series on the BBC engine.

Perft speed optimization (renamed)

Re: Perft speed optimization (renamed)

Re: Perft speed optimization (renamed)

Re: Perft speed optimization (renamed)

Re: Perft speed optimization (renamed)

Re: Perft speed optimization (renamed)

Re: Perft speed optimization (renamed)

Re: Perft speed optimization (renamed)

Re: Perft speed optimization (renamed)