Perft function too slow because of vcruntime140.dll

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Perft function too slow because of vcruntime140.dll

Post by Ras »

mvanthoor wrote: Tue Mar 24, 2020 8:30 pmYou mean an array?

Code: Select all

let mut list = [Move; 256]
Yeah something like that. I faintly remember that some traits are not implemented for such arrays above 32 elements.
That is what I started out with, and assigning new memory over and over again in each instance of the function is brutally slow.
Actually, allocating stack memory is brutally fast because it's just increasing the stack pointer of the CPU. That's as fast as it gets.

I think the difference to C is that stack variables are not zero initialised in C. Of course it's going to be slow if you have some zeroing loop going on for 256 elements even if it turns out that you only have 20 moves, like in the initial position itself. So the question would be how to get uninitialised stack arrays in Rust.
Rasmus Althoff
https://www.ct800.net
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Perft function too slow because of vcruntime140.dll

Post by mvanthoor »

Ras wrote: Tue Mar 24, 2020 8:43 pm I think the difference to C is that stack variables are not zero initialised in C. Of course it's going to be slow if you have some zeroing loop going on for 256 elements even if it turns out that you only have 20 moves, like in the initial position itself. So the question would be how to get uninitialised stack arrays in Rust.
Possibly. Rust requires that a variable/array is completely initialized before it's used. If you don't, it won't compile, so getting uninitialized arrays is probably impossible without using unsafe code.

Also, I've been running some more tests. I repeated each of them 5 times. Results matched in the order of being off by 1-10 milliseconds or so (a few moves per second).

Pool: Vector. Move list: Vector:

Peft 1: 20 (0 ms, inf moves/sec)
Peft 2: 400 (0 ms, inf moves/sec)
Peft 3: 8902 (0 ms, inf moves/sec)
Peft 4: 197281 (7 ms, 28183000 moves/sec)
Peft 5: 4865609 (169 ms, 28790585 moves/sec)
Peft 6: 119060324 (4856 ms, 24518188 moves/sec)
Peft 7: 3195901860 (113672 ms, 28115119 moves/sec)

Perft 7 runs at +/- 113.7 seconds.

====

Pool: Vector. Move list: Array:

Peft 1: 20 (0 ms, inf moves/sec)
Peft 2: 400 (0 ms, inf moves/sec)
Peft 3: 8902 (0 ms, inf moves/sec)
Peft 4: 197281 (7 ms, 28183000 moves/sec)
Peft 5: 4865609 (171 ms, 28453853 moves/sec)
Peft 6: 119060324 (4256 ms, 27974700 moves/sec)
Peft 7: 3195901860 (112512 ms, 28404986 moves/sec)

Perft 7 runs at +/- 112.5 seconds.

After many runs, perft 7 turned out to be about 1.0 to 1.3 seconds faster when the move list is an array instead of a vector. The difference is almost 300.000 moves per second.

====

Pool: Array. Movelist: Array.

Peft 1: 20 (0 ms, inf moves/sec)
Peft 2: 400 (0 ms, inf moves/sec)
Peft 3: 8902 (0 ms, inf moves/sec)
Peft 4: 197281 (7 ms, 28183000 moves/sec)
Peft 5: 4865609 (170 ms, 28621229 moves/sec)
Peft 6: 119060324 (4226 ms, 28173290 moves/sec)
Peft 7: 3195901860 (110367 ms, 28957042 moves/sec)

Perft 7 runs at 110.3 seconds.

This is when both the pool and the move list are an array. After numerous tests, it turns out that this is the fastest option, consistently. Compared with the vector solution, the difference in speed is about 840.000 moves/sec.

In time, it's a small difference. Even when doing short runs of a few seconds such as perft 6, it starts to be noticable with the fastest option being more than half a second faster than the slowest. (Perft 5 is particularly slow, for some reason. That one takes the biggest hit when using vectors.)

Note that the only difference between these options is indexing vectors using var[iindex] versus addressing arrays using var[index]. It can be concluded that indexing arrays is faster than indexing vectors in Rust. At least in the latest version 1.42. The difference is small, but it is consistently noticable and measurable.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Perft function too slow because of vcruntime140.dll

Post by mvanthoor »

This is the KiwiPete position, tested between "everything vectors" and "everything arrays". The latter is about 3.7 seconds faster in Perft 6; the difference in calculation speed is about 400.000 moves per second. The difference in speed is only 1.3%, but hey... it's noticable, measurable, and consistent. I'm not one to leave free speed hanging around. (And, because resizing a vector constantly is very expensive, the vectors are initialized using a certain minimum needed size, so there isn't much difference with regard to memory usage; or in-program usage for that matter.) I have another vector ("UnmakeInfo", used by unmake_move), which is used in make_move and unmake_move. I'll try that as an array next, and see if I can get another speedup.

Code: Select all

8   r . . . k . . r
7   i . i i q i b .
6   b n . . i n i .
5   . . . I N . . .
4   . i . . I . . .
3   . . N . . Q . i
2   I I I B B I I I
1   R . . . K . . R

    A B C D E F G H

Zobrist key:        17a25d12a8850d3e
Active Color:       White
Castling:           KQkq
En Passant:         -
Half-move clock:    0
Full-move number:   1

Peft 1: 48 (0 ms, inf moves/sec)
Peft 2: 2039 (0 ms, inf moves/sec)
Peft 3: 97862 (3 ms, 32620666 moves/sec)
Peft 4: 4085603 (143 ms, 28570650 moves/sec)
Peft 5: 193690690 (6406 ms, 30235824 moves/sec)
Peft 6: 8031647685 (277395 ms, 28953830 moves/sec)
Finished.

===

Zobrist key:        17a25d12a8850d3e
Active Color:       White
Castling:           KQkq
En Passant:         -
Half-move clock:    0
Full-move number:   1

Peft 1: 48 (0 ms, inf moves/sec)
Peft 2: 2039 (0 ms, inf moves/sec)
Peft 3: 97862 (3 ms, 32620666 moves/sec)
Peft 4: 4085603 (138 ms, 29605818 moves/sec)
Peft 5: 193690690 (6342 ms, 30540947 moves/sec)
Peft 6: 8031647685 (273659 ms, 29349108 moves/sec)
Finished.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Perft function too slow because of vcruntime140.dll

Post by hgm »

I always put moves on a separate global stack. That way I don't have to reserve some worst-case move array (that would remain largely unused) in every stack frame; every new invocation of Search can simply build its move list directly on top of the move list of its parent. Usually I reserve some empty space between the two, though, because I like to grow the move list in two directions. So that I can add captures at the start, and non-captures at the end. That saves me the work of extracting captures.

Anyway, you can initialize the global array that will contain the move stack at the start of the program if you want to make sure even unused elements are initialized.

If you dislike global arrays, you could also declare (and initialize) the array for the move stack in SearchRoot(), and pass the index of the first free element (or a pointer to that element) to Search().
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Perft function too slow because of vcruntime140.dll

Post by mvanthoor »

hgm wrote: Tue Mar 24, 2020 10:22 pm I always put moves on a separate global stack. That way I don't have to reserve some worst-case move array (that would remain largely unused) in every stack frame; every new invocation of Search can simply build its move list directly on top of the move list of its parent. Usually I reserve some empty space between the two, though, because I like to grow the move list in two directions. So that I can add captures at the start, and non-captures at the end. That saves me the work of extracting captures.

Anyway, you can initialize the global array that will contain the move stack at the start of the program if you want to make sure even unused elements are initialized.

If you dislike global arrays, you could also declare (and initialize) the array for the move stack in SearchRoot(), and pass the index of the first free element (or a pointer to that element) to Search().
Thanks for your input :)

Rust doesn't even have global variables. AFAIK, it's not even possible.

As I don't want to keep passing the ZobristRandoms, the MoveGenerator with its attacktables and so on down to every function, I have attached them to the Board struct using a reference, and created pass-through functions in Board.

(In an object-oriented language, this would have been called "composition".)

When you do that, the compiler will complain because it can't verify if the struct the reference points to (MoveGenerator) actually exists. Using a "lifetime annotation", you can tell the compiler that "MoveGenerator" will live at least as long as "Board" so Board can point safely to MoveGenerator.

As the compiler now knows the intended lifetimes of both objects, it can now check the code if this is actually true, and compile correctly if it is.

Maybe I'll create an all-encompassing struct containing read-only data (MoveGenerator, ZobristRandoms, etc...), and attach the lists that need changing to the board, as, up until now, Board is the only struct that actually changes.

The one thing I still find somewhat baffling is that the age-old fight of "Vectors with a certain size" vs. "Arrays backed with an index counter wrapped in a struct" doesn't seem to have ended, not even in 2020. The Vectors have definitely improved (they're only just 1-2% slower) but the arrays are still measurably and consistently faster.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Perft function too slow because of vcruntime140.dll

Post by hgm »

Well, I know nothing of object-oriented programming. But can't you make the move stack and stack pointer simply a field of the board struct? Apparently all routines are able to access the (components of) the board.
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Perft function too slow because of vcruntime140.dll

Post by mvanthoor »

Maybe I was a bit too excessive in my explanation above.

Yes, I can attach any item to the Board struct using a reference. (Which in Rust needs a lifetime verification, as I explained above.)

Rust is not an object oriented language. The closest it gets, are functions attached to the struct they work on, so you don't need to pass the struct to the function.

And yes, most functions will get the board passed in at this point. Some functions such as perft take a reference to the move generator as well, for example, but when I attach this to the board, that reference can be dropped.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
fabianVDW
Posts: 146
Joined: Fri Mar 15, 2019 8:46 pm
Location: Germany
Full name: Fabian von der Warth

Re: Perft function too slow because of vcruntime140.dll

Post by fabianVDW »

Ras wrote: Tue Mar 24, 2020 8:43 pm

I think the difference to C is that stack variables are not zero initialised in C. Of course it's going to be slow if you have some zeroing loop going on for 256 elements even if it turns out that you only have 20 moves, like in the initial position itself. So the question would be how to get uninitialised stack arrays in Rust.
Rust does zero initialize, but one can easily circumvent:

Code: Select all

use std::mem;

fn main() {
    let vec: [u32;32] = unsafe{ mem::MaybeUninit::uninit().assume_init()};
    println!("{:?}",vec)
}
This is using unsafe code, but it will arguably lead to better and nicer code than the ReservedMoveList container, if wrapped in a Safe API. For example I could imagine this as a MoveList struct:

Code: Select all

pub struct MoveList{
    counter:usize,
    data: [Move;MAX_MOVES]
}
impl MoveList{
   pub fn new() -> Self{
       let data = unsafe{ mem::MaybeUninit::uninit().assume_init()};
       MoveList{counter:0, data: data}
   }
   pub fn set_move(&mut self, mv: Move){
   	self.data[self.counter] = mv;
   	self.counter += 1;
   }
   pub fn get_move(&self, index: usize) -> Move{
       if index >= counter{
       		panic!("Index out of Range");
       }
       self.data[index]
   }
}
The provided wrapper is completly safe. Now every movegen call can instantiate a completly new MoveList without losing performance. The only deficit is the loss to look at other depths move_list.
Author of FabChess: https://github.com/fabianvdW/FabChess
A UCI compliant chess engine written in Rust.
FabChessWiki: https://github.com/fabianvdW/FabChess/wiki
fabianvonderwarth@gmail.com
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Perft function too slow because of vcruntime140.dll

Post by mvanthoor »

fabianVDW wrote: Wed Mar 25, 2020 11:28 am The provided wrapper is completly safe. Now every movegen call can instantiate a completly new MoveList without losing performance. The only deficit is the loss to look at other depths move_list.
Personally, I think using unsafe code to intentionally do something which the core language specifically prohibits, is uglier than writing a "workaround" like a move list pool. Using pools, be it for threads, applications, or stuff like printers, is a known way to share resources.

In my engine, one of the things I try to achieve is to write the engine without using unsafe blocks if at all possible, because the other engines I've seen in Rust (cicada and crabby; haven't studied FabChess in depth) use lots of unsafe code, and even some nightly features that have been scrapped/deprecated. Maybe it was necessary in Rust 4-5 years ago when those engines were written, but the language is much more mature now.

So, no unsafe or nightly-feature-only code if I can help it, even if I need to implement some creative/alternative solutions to the "normal" way of doing things.

(I even try to minimize dependencies on other crates. At this point, I only use "rand", which by itself pulls in some dependencies for itself.)
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
fabianVDW
Posts: 146
Joined: Fri Mar 15, 2019 8:46 pm
Location: Germany
Full name: Fabian von der Warth

Re: Perft function too slow because of vcruntime140.dll

Post by fabianVDW »

mvanthoor wrote: Wed Mar 25, 2020 5:49 pm
Personally, I think using unsafe code to intentionally do something which the core language specifically prohibits, is uglier than writing a "workaround" like a move list pool. Using pools, be it for threads, applications, or stuff like printers, is a known way to share resources.
I for myself can quite enjoy the powerful way Rust allows you to write a safe api over inherently unsafe operations. I shared your view for a long time, it only changed recently. That said, staying in safe Rust is just as fine, I just personally have a preference for more elegant code, and I don't necessarily think unsafe makes code less elegant, although it does in 99% of the cases (e.g. circumventing the borrow checker).
In my engine, one of the things I try to achieve is to write the engine without using unsafe blocks if at all possible, because the other engines I've seen in Rust (cicada and crabby; haven't studied FabChess in depth) use lots of unsafe code, and even some nightly features that have been scrapped/deprecated. Maybe it was necessary in Rust 4-5 years ago when those engines were written, but the language is much more mature now.
So, no unsafe or nightly-feature-only code if I can help it, even if I need to implement some creative/alternative solutions to the "normal" way of doing things.
I have this policy for FabChess aswell, but it terribly failed in multithreaded search. Upto a certain amount of threads, let's say 4-8, one is fine with safe Rust, but for TCEC with 176 Threads I had to wrap the cache and certain shared fields with unsafe. Again though, the idea here is to provide a safe api over the unsafe features, spending effort ensuring that the unsafe code is in fact, safe.
Author of FabChess: https://github.com/fabianvdW/FabChess
A UCI compliant chess engine written in Rust.
FabChessWiki: https://github.com/fabianvdW/FabChess/wiki
fabianvonderwarth@gmail.com