Had a go. Here Crafty took 68 sec, Weiss 104 sec, and Minic 154 sec (minic_2.05_mingw_x64_sse4.2.exe).
Perft speed optimization (renamed)
Moderators: hgm, Rebel, chrisw
-
- Posts: 347
- Joined: Tue Nov 19, 2019 4:34 am
- Location: https://github.com/TerjeKir/weiss
- Full name: Terje Kirstihagen
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Can't get Minic to run perft?
Crafty runs Perft 7 from the starting position in 55 seconds on my system. That's about half the time of Rustic. I assume Crafty has a more advanced move generator. It probably checks for being in check, double check, pinned pieces, etc... i.e., generating less illegal moves, and thus saving a lot of time making and then unmaking illegal moves.
Rustic doesn't do this (yet), so when the king is in check or if pieces are pinned, it still generates huge amounts of illegal moves. If you're in double check, all moves except king moves are illegal. These are optimizations that are going to be done later. First finish the search and uci protocol (and making my Rust code more idiomatic )
This evening I'll post some outputs.
@Terje:
In my case, Rustic is now slower when using the GNU toolchain for Rust. In the beginning when I started perft testing, it didn't matter, but for some reason (there were several Rust updates for example, since I neglected updating for some time), a GNU-toolchain compile now takes 120 seconds for Perft 7 from the starting position, while an MSVC compile takes 114 seconds. As the rustc compiler is the same in both toolchains, it seems MSVC's "link.exe" is more efficient in doing link-time optimization than GNU's ld.exe, on Windows.
Next to that, I probably also lost some speed, due to:
- Fixing two zobrist hash bugs (so there are now more zobrist XOR commands)
- Splitting a very long and ugly make_move() into several functions. To make this possible, it was necessary to take out a reference that went directly into the Board-struct. (Rust doesn't allow you to take mutliple mutable references into the same entity at the same time.) Therefore I now have to decide each time (in the new function) if I need to use the black or white pieces bitboard when setting or removing a piece. That check that's now done a huge amount of times as compared to just checking once and taking the pointer, will probably be responsible for some of the speed loss.
In this case, it's a speed vs. code readability/maintainability issue. The above two points lost about a second (113.2 vs 114.3) in perft 7 from the starting position, which at this point, is a speed loss of about 1%. I can live with that, getting much clearner code and better maintainability in return. In the end, the optimizations to the move generator itself will have a much bigger impact than individual statements in a function.
Also: was it difficult to switch from Fancy Magic bitboards to PEXT bitboards (didn't look into them yet), and was it worth it? Do you do one or more optimizations in the move generator? As Crafty runs slower on your system than it does on mine, I assume you have slower CPU; but still, on that slower CPU, Weiss is 10 seconds faster than Rustic is on my CPU.
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
Re: Can't get Minic to run perft?
Not sure about your point on invalid move.
On start position, Minic, who does nothing about checks and pinned pieces, gives this stat
pseudoNodes 3222676138
validNodes 3195901860
So there are not many invalid moves for perft 7.
On start position, Minic, who does nothing about checks and pinned pieces, gives this stat
pseudoNodes 3222676138
validNodes 3195901860
So there are not many invalid moves for perft 7.
-
- Posts: 347
- Joined: Tue Nov 19, 2019 4:34 am
- Location: https://github.com/TerjeKir/weiss
- Full name: Terje Kirstihagen
Re: Can't get Minic to run perft?
Pext is far simpler than magic bitboards and adds a few % speed up, easily worth it imo. I like to think Weiss code is fairly readable and everything related to this is found in bitboard.c/h (look for USE_PEXT) if you want to take a look. The times posted here use magic bitboards tho.mvanthoor wrote: ↑Tue Apr 07, 2020 1:05 pm ...
@Terje:
...
Also: was it difficult to switch from Fancy Magic bitboards to PEXT bitboards (didn't look into them yet), and was it worth it? Do you do one or more optimizations in the move generator? As Crafty runs slower on your system than it does on mine, I assume you have slower CPU; but still, on that slower CPU, Weiss is 10 seconds faster than Rustic is on my CPU.
I don't think Weiss does anything 'smart' in movegen. It generates all moves when in check etc. It's also split into noisy/quiet with no other way to gen both than first do one then the other which might have a negative impact since we always want all in perft.
Weiss adds scores for move ordering in movegen, and performs part of the evaluation (piece square table and phase calculation) in makemove, which I guess isn't done in Rustic. These are pointless for perft and commenting them out drops the time to 92 sec.
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Can't get Minic to run perft?
There are 26.774.278 invalid moves for perft 7.
If you could cut most of them from the generated pseudo-move-list, you could potentially save all of those calls to make_move and unmake_move. I don't know if it will cut the perft time in half (I don't think so), but it will make it a lot shorter. If this doesn't account for the entire extra speed of crafty's move generator, then I wonder what else does?
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Can't get Minic to run perft?
I'll have to compile Weiss this evening and run it with magic bitboards on my system. If you're not doing anything special in either move generation AND are doing more than necessary in make_move AND are still faster than Rustic, then not only Minic, but also myself would need to take a look at make_move. I might also try and take a look into compiler options. (I now have, what I believe, the fastest options set and I'm compiling for a Skylake CPU, which is correct for the 6700K.)Terje wrote: ↑Tue Apr 07, 2020 2:10 pmPext is far simpler than magic bitboards and adds a few % speed up, easily worth it imo. I like to think Weiss code is fairly readable and everything related to this is found in bitboard.c/h (look for USE_PEXT) if you want to take a look. The times posted here use magic bitboards tho.mvanthoor wrote: ↑Tue Apr 07, 2020 1:05 pm ...
@Terje:
...
Also: was it difficult to switch from Fancy Magic bitboards to PEXT bitboards (didn't look into them yet), and was it worth it? Do you do one or more optimizations in the move generator? As Crafty runs slower on your system than it does on mine, I assume you have slower CPU; but still, on that slower CPU, Weiss is 10 seconds faster than Rustic is on my CPU.
I don't think Weiss does anything 'smart' in movegen. It generates all moves when in check etc. It's also split into noisy/quiet with no other way to gen both than first do one then the other which might have a negative impact since we always want all in perft.
Weiss adds scores for move ordering in movegen, and performs part of the evaluation (piece square table and phase calculation) in makemove, which I guess isn't done in Rustic. These are pointless for perft and commenting them out drops the time to 92 sec.
The only thing Rustic does in perft that's not necessary for perft without hashing is zobrist hash updates.
-
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France
Re: Can't get Minic to run perft?
In Minic, apply (a.k.a make_move) is calling another function "movePiece" in which the Zobrist, bitboard and material updates are done.mvanthoor wrote: ↑Tue Apr 07, 2020 2:44 pm There are 26.774.278 invalid moves for perft 7.
If you could cut most of them from the generated pseudo-move-list, you could potentially save all of those calls to make_move and unmake_move. I don't know if it will cut the perft time in half (I don't think so), but it will make it a lot shorter. If this doesn't account for the entire extra speed of crafty's move generator, then I wonder what else does?
Here's minic timers
Code: Select all
Generate 2399955906 7.96499% 5072213 473
Apply 17514366504 58.1268% 125049325 140
Move Piece 3307831780 10.9781% 125043819 26
IsAttacked 3173190728 10.5312% 125049325 25
So that 38% of time is spent somewhere else !
Apply has many branches in Minic, this is maybe the part to be investigated.
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Can't get Minic to run perft?
I just compiled and ran Weiss with magic bitboards on my system.
It runs perft 7 from the starting position in 88 seconds.
Here's the kicker... Weiss' MakeMove is almost a carbon copy of Rustic's, with only two differences:
- It uses the CastlePerms array. I didn't; just because I didnt want to. (I can obviously easily change that.)
- It does some things in a different order, or with local functions where I use functions in the board struct. Essentially, the functions are similar to almost the same. The same is true for TakeMove / unmake_move.
This isn't a surprise, because Weiss is based on VICE, and I watched all the VICE tutorials when I started out with Rustic to (re)learn some of the chess programming concepts and make sure I didn't forget anything.
I'm assuming the speed difference has, again, something to do with Rustic initializing memory where C doesn't. I can't believe a few differences in the order of which things are done, or some different locations of functions makes Rustic take a hit of 26 seconds as compared to Weiss.
First thing that I'm going to do is replace the vector holding "unmake_info" with an array. Indexing an array in Rust is somewhat faster than push()-ing onto a vector (even if the vector has enough capacity already), and of course "pop" removes the element; which is not strictly necessary.
Maybe I'll then implement the CastlePerm array as well, to get rid of the 6-armed match() ("switch with patterns") statement.
It runs perft 7 from the starting position in 88 seconds.
Here's the kicker... Weiss' MakeMove is almost a carbon copy of Rustic's, with only two differences:
- It uses the CastlePerms array. I didn't; just because I didnt want to. (I can obviously easily change that.)
- It does some things in a different order, or with local functions where I use functions in the board struct. Essentially, the functions are similar to almost the same. The same is true for TakeMove / unmake_move.
This isn't a surprise, because Weiss is based on VICE, and I watched all the VICE tutorials when I started out with Rustic to (re)learn some of the chess programming concepts and make sure I didn't forget anything.
I'm assuming the speed difference has, again, something to do with Rustic initializing memory where C doesn't. I can't believe a few differences in the order of which things are done, or some different locations of functions makes Rustic take a hit of 26 seconds as compared to Weiss.
First thing that I'm going to do is replace the vector holding "unmake_info" with an array. Indexing an array in Rust is somewhat faster than push()-ing onto a vector (even if the vector has enough capacity already), and of course "pop" removes the element; which is not strictly necessary.
Maybe I'll then implement the CastlePerm array as well, to get rid of the 6-armed match() ("switch with patterns") statement.
-
- Posts: 347
- Joined: Tue Nov 19, 2019 4:34 am
- Location: https://github.com/TerjeKir/weiss
- Full name: Terje Kirstihagen
Re: Can't get Minic to run perft?
Thanks for comparing. It's nice to see Weiss is fairly fast, but I'm now very curious how Crafty does it so much faster!mvanthoor wrote: ↑Tue Apr 07, 2020 5:40 pm I just compiled and ran Weiss with magic bitboards on my system.
It runs perft 7 from the starting position in 88 seconds.
Here's the kicker... Weiss' MakeMove is almost a carbon copy of Rustic's, with only two differences:
- It uses the CastlePerms array. I didn't; just because I didnt want to. (I can obviously easily change that.)
- It does some things in a different order, or with local functions where I use functions in the board struct. Essentially, the functions are similar to almost the same. The same is true for TakeMove / unmake_move.
This isn't a surprise, because Weiss is based on VICE, and I watched all the VICE tutorials when I started out with Rustic to (re)learn some of the chess programming concepts and make sure I didn't forget anything.
I'm assuming the speed difference has, again, something to do with Rustic initializing memory where C doesn't. I can't believe a few differences in the order of which things are done, or some different locations of functions makes Rustic take a hit of 26 seconds as compared to Weiss.
First thing that I'm going to do is replace the vector holding "unmake_info" with an array. Indexing an array in Rust is somewhat faster than push()-ing onto a vector (even if the vector has enough capacity already), and of course "pop" removes the element; which is not strictly necessary.
Maybe I'll then implement the CastlePerm array as well, to get rid of the 6-armed match() ("switch with patterns") statement.
I would guess our movegen is quite different?
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Can't get Minic to run perft?
I haven't looked at it yet, but I just do the normal magic bitboard lookup for sliders, and moves for other pieces are generated by using their "to" mask for each square. This is the function that generates moves for everything except pawns:Terje wrote: ↑Tue Apr 07, 2020 6:02 pmThanks for comparing. It's nice to see Weiss is fairly fast, but I'm now very curious how Crafty does it so much faster!mvanthoor wrote: ↑Tue Apr 07, 2020 5:40 pm I just compiled and ran Weiss with magic bitboards on my system.
It runs perft 7 from the starting position in 88 seconds.
Here's the kicker... Weiss' MakeMove is almost a carbon copy of Rustic's, with only two differences:
- It uses the CastlePerms array. I didn't; just because I didnt want to. (I can obviously easily change that.)
- It does some things in a different order, or with local functions where I use functions in the board struct. Essentially, the functions are similar to almost the same. The same is true for TakeMove / unmake_move.
This isn't a surprise, because Weiss is based on VICE, and I watched all the VICE tutorials when I started out with Rustic to (re)learn some of the chess programming concepts and make sure I didn't forget anything.
I'm assuming the speed difference has, again, something to do with Rustic initializing memory where C doesn't. I can't believe a few differences in the order of which things are done, or some different locations of functions makes Rustic take a hit of 26 seconds as compared to Weiss.
First thing that I'm going to do is replace the vector holding "unmake_info" with an array. Indexing an array in Rust is somewhat faster than push()-ing onto a vector (even if the vector has enough capacity already), and of course "pop" removes the element; which is not strictly necessary.
Maybe I'll then implement the CastlePerm array as well, to get rid of the 6-armed match() ("switch with patterns") statement.
I would guess our movegen is quite different?
(I should rename bb_pieces and get_pieces() to somthing like bb_piece_of_type and get_piece_of_type() or something.)
Code: Select all
fn piece(piece: Piece, board: &Board, list: &mut MoveList) {
let side = board.active_color as usize;
let bb_occupancy = board.occupancy();
let bb_own_pieces = board.bb_pieces[side];
let mut bb_pieces = board.get_pieces(piece, side);
while bb_pieces > 0 {
let from = bits::next(&mut bb_pieces);
let bb_target = match piece {
QUEEN | ROOK | BISHOP => board.get_slider_attacks(piece, from, bb_occupancy),
KING | KNIGHT => board.get_non_slider_attacks(piece, from),
_ => 0,
};
let bb_moves = bb_target & !bb_own_pieces;
add_move(board, piece, from, bb_moves, list);
}
}
In one other thread I sped up perft generation by a factor of 3 by taking out the local allocation of the move list. (I allocated a MoveListPool, with a list for each depth, beforehand; just to avoid safe code.) I'm fairly sure that make_move and unmake_move() will speed up a lot if I can take the push() and pop() out. I just quickly rewrote that part to use an array backed by a counter (wrapped in a struct), but it doesn't work. I must have made a mistake somewhere.
Rust also always wants to copy and clone (it often complains about needing the Copy or Clone trait on a struct); probably to avoid memory unsafety. For example, you can't create a big struct in a function, and then return a pointer to that memory (without dropping into unsafe rust), because the struct you just created, will be gone at the end of the function. So, you need to first create the struct, and then copy it out of the function.
I don't (yet) know a way to circumvent that. If I need to start going to unsafe rust all the time, and/or need to go and drop 'lifetime markers' everywhere (basically telling the compiler that *I* will ensure that a variable lives long enough to have a pointer to it), for performance reasons, then I could just as well have chosen C... and in that case, Rust will *NEVER* replace C. So I hope there's something I just don't know yet.