Perft speed optimization (renamed)

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Terje
Posts: 347
Joined: Tue Nov 19, 2019 4:34 am
Location: https://github.com/TerjeKir/weiss
Full name: Terje Kirstihagen

Re: Can't get Minic to run perft?

Post by Terje »

xr_a_y wrote: Tue Apr 07, 2020 10:45 am
abulmo2 wrote: Tue Apr 07, 2020 10:30 am IMHO, the reference program for perft without bulk counting nor hash table should be crafty. In my computer, it does perft 7 in 64.78s vs 3m30s for minic 2.00.
Quite a huge gap on your hardware !

On mine, Minic does 107 sec and Crafty 61 sec

What is your hardware and what Minic executable are you using ?
Had a go. Here Crafty took 68 sec, Weiss 104 sec, and Minic 154 sec (minic_2.05_mingw_x64_sse4.2.exe).
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Can't get Minic to run perft?

Post by mvanthoor »

abulmo2 wrote: Tue Apr 07, 2020 10:30 am IMHO, the reference program for perft without bulk counting nor hash table should be crafty. In my computer, it does perft 7 in 64.78s vs 3m30s for minic 2.00.
Crafty runs Perft 7 from the starting position in 55 seconds on my system. That's about half the time of Rustic. I assume Crafty has a more advanced move generator. It probably checks for being in check, double check, pinned pieces, etc... i.e., generating less illegal moves, and thus saving a lot of time making and then unmaking illegal moves.

Rustic doesn't do this (yet), so when the king is in check or if pieces are pinned, it still generates huge amounts of illegal moves. If you're in double check, all moves except king moves are illegal. These are optimizations that are going to be done later. First finish the search and uci protocol (and making my Rust code more idiomatic :| )

This evening I'll post some outputs.

@Terje:

In my case, Rustic is now slower when using the GNU toolchain for Rust. In the beginning when I started perft testing, it didn't matter, but for some reason (there were several Rust updates for example, since I neglected updating for some time), a GNU-toolchain compile now takes 120 seconds for Perft 7 from the starting position, while an MSVC compile takes 114 seconds. As the rustc compiler is the same in both toolchains, it seems MSVC's "link.exe" is more efficient in doing link-time optimization than GNU's ld.exe, on Windows.

Next to that, I probably also lost some speed, due to:
- Fixing two zobrist hash bugs (so there are now more zobrist XOR commands)
- Splitting a very long and ugly make_move() into several functions. To make this possible, it was necessary to take out a reference that went directly into the Board-struct. (Rust doesn't allow you to take mutliple mutable references into the same entity at the same time.) Therefore I now have to decide each time (in the new function) if I need to use the black or white pieces bitboard when setting or removing a piece. That check that's now done a huge amount of times as compared to just checking once and taking the pointer, will probably be responsible for some of the speed loss.

In this case, it's a speed vs. code readability/maintainability issue. The above two points lost about a second (113.2 vs 114.3) in perft 7 from the starting position, which at this point, is a speed loss of about 1%. I can live with that, getting much clearner code and better maintainability in return. In the end, the optimizations to the move generator itself will have a much bigger impact than individual statements in a function.

Also: was it difficult to switch from Fancy Magic bitboards to PEXT bitboards (didn't look into them yet), and was it worth it? Do you do one or more optimizations in the move generator? As Crafty runs slower on your system than it does on mine, I assume you have slower CPU; but still, on that slower CPU, Weiss is 10 seconds faster than Rustic is on my CPU.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Can't get Minic to run perft?

Post by xr_a_y »

Not sure about your point on invalid move.
On start position, Minic, who does nothing about checks and pinned pieces, gives this stat

pseudoNodes 3222676138
validNodes 3195901860

So there are not many invalid moves for perft 7.
Terje
Posts: 347
Joined: Tue Nov 19, 2019 4:34 am
Location: https://github.com/TerjeKir/weiss
Full name: Terje Kirstihagen

Re: Can't get Minic to run perft?

Post by Terje »

mvanthoor wrote: Tue Apr 07, 2020 1:05 pm ...
@Terje:
...
Also: was it difficult to switch from Fancy Magic bitboards to PEXT bitboards (didn't look into them yet), and was it worth it? Do you do one or more optimizations in the move generator? As Crafty runs slower on your system than it does on mine, I assume you have slower CPU; but still, on that slower CPU, Weiss is 10 seconds faster than Rustic is on my CPU.
Pext is far simpler than magic bitboards and adds a few % speed up, easily worth it imo. I like to think Weiss code is fairly readable and everything related to this is found in bitboard.c/h (look for USE_PEXT) if you want to take a look. The times posted here use magic bitboards tho.

I don't think Weiss does anything 'smart' in movegen. It generates all moves when in check etc. It's also split into noisy/quiet with no other way to gen both than first do one then the other which might have a negative impact since we always want all in perft.

Weiss adds scores for move ordering in movegen, and performs part of the evaluation (piece square table and phase calculation) in makemove, which I guess isn't done in Rustic. These are pointless for perft and commenting them out drops the time to 92 sec.
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Can't get Minic to run perft?

Post by mvanthoor »

xr_a_y wrote: Tue Apr 07, 2020 1:45 pm Not sure about your point on invalid move.
On start position, Minic, who does nothing about checks and pinned pieces, gives this stat

pseudoNodes 3222676138
validNodes 3195901860

So there are not many invalid moves for perft 7.
There are 26.774.278‬ invalid moves for perft 7.

If you could cut most of them from the generated pseudo-move-list, you could potentially save all of those calls to make_move and unmake_move. I don't know if it will cut the perft time in half (I don't think so), but it will make it a lot shorter. If this doesn't account for the entire extra speed of crafty's move generator, then I wonder what else does?
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Can't get Minic to run perft?

Post by mvanthoor »

Terje wrote: Tue Apr 07, 2020 2:10 pm
mvanthoor wrote: Tue Apr 07, 2020 1:05 pm ...
@Terje:
...
Also: was it difficult to switch from Fancy Magic bitboards to PEXT bitboards (didn't look into them yet), and was it worth it? Do you do one or more optimizations in the move generator? As Crafty runs slower on your system than it does on mine, I assume you have slower CPU; but still, on that slower CPU, Weiss is 10 seconds faster than Rustic is on my CPU.
Pext is far simpler than magic bitboards and adds a few % speed up, easily worth it imo. I like to think Weiss code is fairly readable and everything related to this is found in bitboard.c/h (look for USE_PEXT) if you want to take a look. The times posted here use magic bitboards tho.

I don't think Weiss does anything 'smart' in movegen. It generates all moves when in check etc. It's also split into noisy/quiet with no other way to gen both than first do one then the other which might have a negative impact since we always want all in perft.

Weiss adds scores for move ordering in movegen, and performs part of the evaluation (piece square table and phase calculation) in makemove, which I guess isn't done in Rustic. These are pointless for perft and commenting them out drops the time to 92 sec.
I'll have to compile Weiss this evening and run it with magic bitboards on my system. If you're not doing anything special in either move generation AND are doing more than necessary in make_move AND are still faster than Rustic, then not only Minic, but also myself would need to take a look at make_move. I might also try and take a look into compiler options. (I now have, what I believe, the fastest options set and I'm compiling for a Skylake CPU, which is correct for the 6700K.)

The only thing Rustic does in perft that's not necessary for perft without hashing is zobrist hash updates.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: Can't get Minic to run perft?

Post by xr_a_y »

mvanthoor wrote: Tue Apr 07, 2020 2:44 pm There are 26.774.278‬ invalid moves for perft 7.

If you could cut most of them from the generated pseudo-move-list, you could potentially save all of those calls to make_move and unmake_move. I don't know if it will cut the perft time in half (I don't think so), but it will make it a lot shorter. If this doesn't account for the entire extra speed of crafty's move generator, then I wonder what else does?
In Minic, apply (a.k.a make_move) is calling another function "movePiece" in which the Zobrist, bitboard and material updates are done.
Here's minic timers

Code: Select all

Generate     2399955906  7.96499%  5072213 473
Apply        17514366504  58.1268%  125049325 140
    Move Piece   3307831780  10.9781%  125043819 26
    IsAttacked   3173190728  10.5312%  125049325 25
In the 58% of time spent in apply, 10% (of whole elapsed) time is spent in movePiece and another 10% for checking the move is legal.
So that 38% of time is spent somewhere else !

Apply has many branches in Minic, this is maybe the part to be investigated.
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Can't get Minic to run perft?

Post by mvanthoor »

I just compiled and ran Weiss with magic bitboards on my system.

It runs perft 7 from the starting position in 88 seconds.

Here's the kicker... Weiss' MakeMove is almost a carbon copy of Rustic's, with only two differences:
- It uses the CastlePerms array. I didn't; just because I didnt want to. (I can obviously easily change that.)
- It does some things in a different order, or with local functions where I use functions in the board struct. Essentially, the functions are similar to almost the same. The same is true for TakeMove / unmake_move.

This isn't a surprise, because Weiss is based on VICE, and I watched all the VICE tutorials when I started out with Rustic to (re)learn some of the chess programming concepts and make sure I didn't forget anything.

I'm assuming the speed difference has, again, something to do with Rustic initializing memory where C doesn't. I can't believe a few differences in the order of which things are done, or some different locations of functions makes Rustic take a hit of 26 seconds as compared to Weiss.

First thing that I'm going to do is replace the vector holding "unmake_info" with an array. Indexing an array in Rust is somewhat faster than push()-ing onto a vector (even if the vector has enough capacity already), and of course "pop" removes the element; which is not strictly necessary.

Maybe I'll then implement the CastlePerm array as well, to get rid of the 6-armed match() ("switch with patterns") statement.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Terje
Posts: 347
Joined: Tue Nov 19, 2019 4:34 am
Location: https://github.com/TerjeKir/weiss
Full name: Terje Kirstihagen

Re: Can't get Minic to run perft?

Post by Terje »

mvanthoor wrote: Tue Apr 07, 2020 5:40 pm I just compiled and ran Weiss with magic bitboards on my system.

It runs perft 7 from the starting position in 88 seconds.

Here's the kicker... Weiss' MakeMove is almost a carbon copy of Rustic's, with only two differences:
- It uses the CastlePerms array. I didn't; just because I didnt want to. (I can obviously easily change that.)
- It does some things in a different order, or with local functions where I use functions in the board struct. Essentially, the functions are similar to almost the same. The same is true for TakeMove / unmake_move.

This isn't a surprise, because Weiss is based on VICE, and I watched all the VICE tutorials when I started out with Rustic to (re)learn some of the chess programming concepts and make sure I didn't forget anything.

I'm assuming the speed difference has, again, something to do with Rustic initializing memory where C doesn't. I can't believe a few differences in the order of which things are done, or some different locations of functions makes Rustic take a hit of 26 seconds as compared to Weiss.

First thing that I'm going to do is replace the vector holding "unmake_info" with an array. Indexing an array in Rust is somewhat faster than push()-ing onto a vector (even if the vector has enough capacity already), and of course "pop" removes the element; which is not strictly necessary.

Maybe I'll then implement the CastlePerm array as well, to get rid of the 6-armed match() ("switch with patterns") statement.
Thanks for comparing. It's nice to see Weiss is fairly fast, but I'm now very curious how Crafty does it so much faster!

I would guess our movegen is quite different?
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Can't get Minic to run perft?

Post by mvanthoor »

Terje wrote: Tue Apr 07, 2020 6:02 pm
mvanthoor wrote: Tue Apr 07, 2020 5:40 pm I just compiled and ran Weiss with magic bitboards on my system.

It runs perft 7 from the starting position in 88 seconds.

Here's the kicker... Weiss' MakeMove is almost a carbon copy of Rustic's, with only two differences:
- It uses the CastlePerms array. I didn't; just because I didnt want to. (I can obviously easily change that.)
- It does some things in a different order, or with local functions where I use functions in the board struct. Essentially, the functions are similar to almost the same. The same is true for TakeMove / unmake_move.

This isn't a surprise, because Weiss is based on VICE, and I watched all the VICE tutorials when I started out with Rustic to (re)learn some of the chess programming concepts and make sure I didn't forget anything.

I'm assuming the speed difference has, again, something to do with Rustic initializing memory where C doesn't. I can't believe a few differences in the order of which things are done, or some different locations of functions makes Rustic take a hit of 26 seconds as compared to Weiss.

First thing that I'm going to do is replace the vector holding "unmake_info" with an array. Indexing an array in Rust is somewhat faster than push()-ing onto a vector (even if the vector has enough capacity already), and of course "pop" removes the element; which is not strictly necessary.

Maybe I'll then implement the CastlePerm array as well, to get rid of the 6-armed match() ("switch with patterns") statement.
Thanks for comparing. It's nice to see Weiss is fairly fast, but I'm now very curious how Crafty does it so much faster!

I would guess our movegen is quite different?
I haven't looked at it yet, but I just do the normal magic bitboard lookup for sliders, and moves for other pieces are generated by using their "to" mask for each square. This is the function that generates moves for everything except pawns:
(I should rename bb_pieces and get_pieces() to somthing like bb_piece_of_type and get_piece_of_type() or something.)

Code: Select all

fn piece(piece: Piece, board: &Board, list: &mut MoveList) {
    let side = board.active_color as usize;
    let bb_occupancy = board.occupancy();
    let bb_own_pieces = board.bb_pieces[side];
    let mut bb_pieces = board.get_pieces(piece, side);
    while bb_pieces > 0 {
        let from = bits::next(&mut bb_pieces);
        let bb_target = match piece {
            QUEEN | ROOK | BISHOP => board.get_slider_attacks(piece, from, bb_occupancy),
            KING | KNIGHT => board.get_non_slider_attacks(piece, from),
            _ => 0,
        };
        let bb_moves = bb_target & !bb_own_pieces;
        add_move(board, piece, from, bb_moves, list);
    }
}
Compared to make_move and unmake_move, everything else is basically non_existent in the profiler.

In one other thread I sped up perft generation by a factor of 3 by taking out the local allocation of the move list. (I allocated a MoveListPool, with a list for each depth, beforehand; just to avoid safe code.) I'm fairly sure that make_move and unmake_move() will speed up a lot if I can take the push() and pop() out. I just quickly rewrote that part to use an array backed by a counter (wrapped in a struct), but it doesn't work. I must have made a mistake somewhere.

Rust also always wants to copy and clone (it often complains about needing the Copy or Clone trait on a struct); probably to avoid memory unsafety. For example, you can't create a big struct in a function, and then return a pointer to that memory (without dropping into unsafe rust), because the struct you just created, will be gone at the end of the function. So, you need to first create the struct, and then copy it out of the function.

I don't (yet) know a way to circumvent that. If I need to start going to unsafe rust all the time, and/or need to go and drop 'lifetime markers' everywhere (basically telling the compiler that *I* will ensure that a variable lives long enough to have a pointer to it), for performance reasons, then I could just as well have chosen C... and in that case, Rust will *NEVER* replace C. So I hope there's something I just don't know yet.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL