Perft speed optimization (renamed)

Terje · Post by **Terje** » Mon Apr 20, 2020 11:50 pm

mvanthoor wrote: ↑Mon Apr 20, 2020 11:43 pm
Terje wrote: ↑Mon Apr 20, 2020 11:23 pm Does cmd_make_move use the same makemove function used by perft?
Yes.

If no command matches the input, it is assumed that the user has typed a move such as "e2e4" or "b7b8q", and cmd_make_move() is called.

cmd_make_move uses:
-- "parse_move()" to convert the algebraic move into square and promotion piece numbers. If the squares and promotion piece numbers are all valid, we have a possible move. (The input could be "d2g8r", which are all legal squares and a legal promotion piece, but is never a legal move.)
-- "try_move()" to actually try and play the move that comes out of parse_move(). It uses the same make() function as perft() does. If this make() function returns "true", the move is legal, and made. If it returns "false", the move was illegal and has already been reversed.

All of this code to parse and try to make a move is not used when I type "perft" (enter). Then cmd_perft() is executed, which just calls "perft()".

If I change nothing else but just take out cmd_make_move() (by replacing it with either Rust's "do nothing", or CMD_CONTINUE), perft runs at its normal speed after compilation. The difference is that if I take out cmd_make_move(), then the code in cmd_make_move(), parse_move(), and try_move() is not compiled into the exe. When I type "perft" to run perft, that code _is_ in the exe (but not called touched during the perft run).

I wonder if the CPU could go on a tangent, trying to "predict" all of the parsing and move making and such from the match statement, and is then completely choked when I finally decide to type "perft" and it has to go a certain way. I don't know.

It could be that the compiler chose to inline makemove when it was only called from one place, but retained the function and used calls when it is now called from two. You could try to force it inline to see if this is the case.

mvanthoor · Post by **mvanthoor** » Tue Apr 21, 2020 12:47 am

Could that be it?

It already has an #inline[always] annotation because the function is too large to inline automatically. That would mean that the compiler _still_ decides on its own even when the always inline directive is set. If so, I would consider that a bug.

Even so, the text ui is only for testing during development, so I'll probably make a feature out of it that'll be off for a normal compile.

Terje · Post by **Terje** » Tue Apr 21, 2020 12:52 am

mvanthoor wrote: ↑Tue Apr 21, 2020 12:47 am Could that be it?

It already has an #inline[always] annotation because the function is too large to inline automatically. That would mean that the compiler _still_ decides on its own even when the always inline directive is set. If so, I would consider that a bug.

If you've already marked it to always inline I would assume it does so, and the problem is probably something else.

mvanthoor · Post by **mvanthoor** » Tue Apr 21, 2020 1:44 am

I've just quickly tried something...

1. With cmd_make_move() and its helper functions compiled into the program, perft is faster when I remove always inline from make() and unmake(). It approaches, but doesn't reach the speed of version 2. below.
2. With cmd_make_move() NOT in the program, it is at its fastest I put always inline before make() and unmake(). This is the fastest version; it runs perft in 88-89 seconds. Version 1 runs perft in 90-91 seconds.
3. When I remove always inline from square_attacked(), all versions are the same; 10% slower (98 seconds) compared to the fastest version 2.

If calling make() from multiple places changes the inline behavior and speed of the program, that's going to be possibly problematic, because it will at least be called in perft() and during the search.

How do other programs cope with this? I understand that move generation is dwarfed by search (and search functionality) and evaluation, but there has to be a way to not drop program speed by 10% because some code and function calls get added.

mvanthoor · Post by **mvanthoor** » Tue Apr 21, 2020 10:34 pm

Hi

I've refactored make() a bit more (combining and stripping zobrist-calculations where possible) and the perft-run is now down to 87.xx seconds (down from 89.xx).

Code: Select all

Perft 1: 20 (0 ms, inf leaves/sec)
Perft 2: 400 (0 ms, inf leaves/sec)
Perft 3: 8902 (0 ms, inf leaves/sec)
Perft 4: 197281 (5 ms, 39456200 leaves/sec)
Perft 5: 4865609 (133 ms, 36583526 leaves/sec)
Perft 6: 119060324 (3291 ms, 36177552 leaves/sec)
Perft 7: 3195901860 (87279 ms, 36617076 leaves/sec)
Total time spent: 90708 ms
Execution speed: 36601340 leaves/second

@Terje:

I finally gave in and implemented the CastlePerms array way of doing things from VICE / Weiss to be able to cut some helper functions from the playmove.rs module. It seems it only gained about 0.2 seconds, if that; well within the margin of error. (However, the code -is- shorter now.) I still have a seperate handling for when a rook is captured in the corner. Even though it also uses the CastlePerm array, the make() gets A LOT slower when I combine this with the castle permission handling in the same way you have done. With a lot, I mean like 10-12% slower.

While looking through your makemove, I noticed this:

Code: Select all

  if (pos->castlingRights && CastlePerm[from] ^ CastlePerm[to])
        HASH_CA,
        pos->castlingRights &= CastlePerm[from] & CastlePerm[to],
        HASH_CA;

Am I missing something, or has C adopted some Python-esque coding standards? Shouldn't this be:

Code: Select all

  if (pos->castlingRights && CastlePerm[from] ^ CastlePerm[to])
  {
        HASH_CA,
        pos->castlingRights &= CastlePerm[from] & CastlePerm[to],
        HASH_CA;
  }

It seems only your first HASH_CA is executed, but your adjustment of castling permissions and the second HASH_CA are always executed, regardless of the if-statement. If so, you have a few bugs around make/unmake, unless I'm mistaken...

Ras · Post by **Ras** » Tue Apr 21, 2020 10:55 pm

mvanthoor wrote: ↑Tue Apr 21, 2020 10:34 pmIt seems only your first HASH_CA is executed

The first two lines in the if clause have a comma, not a semicolon. So it's one statement, not three. That's why the braces aren't required.

Terje · Post by **Terje** » Tue Apr 21, 2020 11:01 pm

mvanthoor wrote: ↑Tue Apr 21, 2020 10:34 pm Hi I've refactored make() a bit more (combining and stripping zobrist-calculations where possible) and the perft-run is now down to 87.xx seconds (down from 89.xx).
Code: Select all
Perft 1: 20 (0 ms, inf leaves/sec)
Perft 2: 400 (0 ms, inf leaves/sec)
Perft 3: 8902 (0 ms, inf leaves/sec)
Perft 4: 197281 (5 ms, 39456200 leaves/sec)
Perft 5: 4865609 (133 ms, 36583526 leaves/sec)
Perft 6: 119060324 (3291 ms, 36177552 leaves/sec)
Perft 7: 3195901860 (87279 ms, 36617076 leaves/sec)
Total time spent: 90708 ms
Execution speed: 36601340 leaves/second
@Terje:

I finally gave in and implemented the CastlePerms array way of doing things from VICE / Weiss to be able to cut some helper functions from the playmove.rs module. It seems it only gained about 0.2 seconds, if that; well within the margin of error. (However, the code -is- shorter now.) I still have a seperate handling for when a rook is captured in the corner. Even though it also uses the CastlePerm array, the make() gets A LOT slower when I combine this with the castle permission handling in the same way you have done. With a lot, I mean like 10-12% slower.

While looking through your makemove, I noticed this:
Code: Select all
  if (pos->castlingRights && CastlePerm[from] ^ CastlePerm[to])
        HASH_CA,
        pos->castlingRights &= CastlePerm[from] & CastlePerm[to],
        HASH_CA;
Am I missing something, or has C adopted some Python-esque coding standards? Shouldn't this be:
Code: Select all
  if (pos->castlingRights && CastlePerm[from] ^ CastlePerm[to])
  {
        HASH_CA,
        pos->castlingRights &= CastlePerm[from] & CastlePerm[to],
        HASH_CA;
  }
It seems only your first HASH_CA is executed, but your adjustment of castling permissions and the second HASH_CA are always executed, regardless of the if-statement. If so, you have a few bugs around make/unmake, unless I'm mistaken...

Could you give a rough outline of how you handle the rook being captured, I'd love to try it out in Weiss.

Using , you can string together statements, with the full expression taking on the value of the final statement. The value of these expressions is not used for anything so it doesn't matter, and I like this aesthetic more as long as it's still clear what the code should do. (I may be using the terms statement/expression incorrectly.)

mvanthoor · Post by **mvanthoor** » Tue Apr 21, 2020 11:19 pm

Oh, ok. thanks to my poor eyesight and quite fast scan of your code, I missed the ,. If I'd seen it, I would have probably thought it a mistake. Stringing stuff like that isn't my preference. I also didn't know about this functionality.

A statement just does something, and is then finished.
An expression is evaluated, and then returns a value.

In Rust, it's actually somewhat confusing, because the ; determines if something is a statement or an expression. Took me some time to get used to. You can turn entire if-blocks into an expression by omitting the last semicolon after a statement. It will then return the current value of that statement.

Rook capturing in the corner to handle castling permissions now uses the castle perms array:

Code: Select all

    // If piece was captured with this move then remove it.
    if captured != PNONE {
        board.remove_piece(opponent, captured, to);
        
        // Change castling permissions on rook capture.
        if captured == ROOK && (board.game_state.castling > 0) {
            board.zobrist_castling();
            board.game_state.castling &= CASTLING_PERMS[to as usize];
            board.zobrist_castling();
        }
    }

The three statementsin "if catpured == ROOK" previously were a function "adjust_castling_when_rook_captured()", which had a match (switch) for the four corner squares. Now it substracts the castling permission that are connected to that corner square, just as you do. The difference is that you have folded this into one piece of code (the one I posted actually). If I do that as well, my code slows down by as much as 12%.

I've also tried the |= and ^= trick (with a CONST array with square bitboards) instead of set/clear_bit, but it also slows my code by as much as 15%. It's the second time I've tried this, actually; this time the slowdown is even bigger than last time.

I might try it later, again, when I finish the search and UCI-interface. It seems these "speed tricks" mainly help when the code gets bigger, and for some reason are actually detrimental if the code is smaller. Don't ask me why. Could be because of what was discussed earlier; memory layout, compiler optimizations, branch predictions.... I don't know. The only reason for me to switch to the Castling permissions array is not because it's faster (in my code, it isn't), but because it gets rid of three helper functions.

mvanthoor · Post by **mvanthoor** » Wed Apr 22, 2020 7:33 pm

@Terje: I've tried to compile the latest version of Weiss, but it seems the makefile is mssing. In the beginning of the topic you said to compile with "make dev" (which worked back then), but it doesn't do so now. How do I compile the latest version?

PS: I was looking through the reposity, and found Weiss 0.1 in the very beginning. The resemblance between Weiss' and Rustic's console modes is almost creepy. I SWEAR I've never seen Weiss' console mode before now (I never actually ran the engine before this topic). Taking into account that our make/unmake functions are also almost carbon copies leads me to believe we have a similar way of thinking. This will probably be 'assisted' by the fact that we have both followed VICE's tutorials. I started my engine from scratch, partly based on those tutorials, and you started from the engine itself, so it's only logical there are similarities... but look at this:

I don't know if you wrote the console yourself, or if VICE 1.0 has a console mode. I know it has a rudimentary way to put in moves, to get the search going before the UCI interface was built.

(I prefer "I" and "i" for pawns instead of "P" and "p". In the console, P and p are too similar to comfortably use with my poor eyesight.)

Terje · Post by **Terje** » Wed Apr 22, 2020 10:57 pm

mvanthoor wrote: ↑Wed Apr 22, 2020 7:33 pm @Terje: I've tried to compile the latest version of Weiss, but it seems the makefile is mssing. In the beginning of the topic you said to compile with "make dev" (which worked back then), but it doesn't do so now. How do I compile the latest version?

PS: I was looking through the reposity, and found Weiss 0.1 in the very beginning. The resemblance between Weiss' and Rustic's console modes is almost creepy. I SWEAR I've never seen Weiss' console mode before now (I never actually ran the engine before this topic). Taking into account that our make/unmake functions are also almost carbon copies leads me to believe we have a similar way of thinking. This will probably be 'assisted' by the fact that we have both followed VICE's tutorials. I started my engine from scratch, partly based on those tutorials, and you started from the engine itself, so it's only logical there are similarities... but look at this:

...

I don't know if you wrote the console yourself, or if VICE 1.0 has a console mode. I know it has a rudimentary way to put in moves, to get the search going before the UCI interface was built.

(I prefer "I" and "i" for pawns instead of "P" and "p". In the console, P and p are too similar to comfortably use with my poor eyesight.)

The makefile should be there, in the src folder.

That print is probably the same as in VICE. I changed it slightly at some point. More recently I have changed it up so it prints the board, the hash and the full FEN.

Perft speed optimization (renamed)

Re: Can't get Minic to run perft?

Re: Can't get Minic to run perft?

Re: Can't get Minic to run perft?

Re: Can't get Minic to run perft?

Re: Can't get Minic to run perft?

Re: Can't get Minic to run perft?

Re: Can't get Minic to run perft?

Re: Can't get Minic to run perft?

Re: Can't get Minic to run perft?

Re: Can't get Minic to run perft?