if statement and calculation faster than constant

mvanthoor · Post by **mvanthoor** » Sat Apr 11, 2020 3:40 pm

Terje wrote: ↑Sat Apr 11, 2020 2:23 pm
mvanthoor wrote: ↑Sat Apr 11, 2020 2:04 pm You need to check for move legality somewhere, somehow. For perft, it doesn't matter if you do it in the move genrator, or in make_move().
A third possiblity is to have a function which checks the legality of a move without making it. SF uses this in search before making the move so there is no chance it needs to be taken back immediately after making it, and the same could be used to get something closer to bulk counting. This could be a speedup of perft depending on how expensive it is to compute, and, since SF uses it, probably good for the chess engine part.

Check legality of a move without making it... even a human 'makes' a move in their head to check legality, so I'll have to look at the Stockfish code (which, to be honest, I find quite horrible) to see how this works.

By the way; I think HGM and Fabian were right; apart from cutting out superfluous code, watching for things such as unneeded memory allocations or replacing X with faster Y (where Y is more efficient or does less to reach the same conclusion as X, such as minimax vs a/b), optimization down to the statement level is completely and utterly useless.

I just added a function to Rustic which needs to initialize a variable in "GameState" by counting some stuff on the board (similar to material count). It is only called ONCE, during board initialization, and has NOTHING to do with perft... and still my perft 7 time rose from 86 to 92 (!) seconds.

Therefore I'll adopt HGM's approach; only optimize by replacing code with code that is objectively faster (such as replacing bubble-sort with qsort for example), and cut out inefficiencies. I'm not going to try and optimize down to the statement order again, if this optimization can be disrupted by adding a completely unreleated function somewhere else in the program, or even by compiling with a different version of Rust or for a different CPU. (Or even by running the binary on the same CPU but on a different system...)

it seems all of the optimization tricks I have in my toolkit for optimizing things such as PLC's and microcontrollers are useless when it comes to x64 compilers and CPU's.

mvanthoor · Post by **mvanthoor** » Mon Apr 13, 2020 11:42 am

Well, I think optimization is finished.

Perft 7 from the starting position runs at a speed of 89.x seconds (it fluctuates between 89.0 and 89.5.)
No hash, no bulk-counting, no move generator tricks.
With incremental Zobrist updates, material count updates, and updates of piece lists and occupation bitboards.

The "to ^ 8" trick, replacing "if side == WHITE" {to - 8} else {to + 8}" for the ep_square and pawn_square did speed up make() this time around. It did speed it up by almost 2M leaves/s, so this construction is clearly faster. I also managed to ditch the extra calculation in square_attacked(). (Asking for the pawn locations from the attack table by the "attacker ^ 1" didn't work for all positions, because I had no pawns on the first and last rank in that table, which is necessary for this trick to work. It uses the attack table for WHITE to find pawn locations for BLACK and the other way around. BLACK can obviously have pawns on the second rank, so to find them, WHITE must have pawns on the first rank. Those pawns will obviously never be used for playing; they're in the table to make this trick work. This is fixed now.) I've also force-inlined make() and unmake() which also dropped the time by 1.5 seconds or so.

I think I've finally managed to get all superfluous or inefficient code out. When I now profile Rustic, make(), unmake(), en square_attacked() are still the most-called functions obviously, but nothing in them sticks out as a hotspot anymore. The hotspots indicated by the profiler are actually the function header and the exit brace. I can't do anything about that, except inline them.

@ Terje:
Weiss and Rustic now run Perft 7 within half a second of one another on my system, with fluctuations in the 89.xx range. Sometimes, Weiss drops to the high 88.x range. I think that at this point, any difference comes down to the compiler. I compiled Weiss using gcc (on MSYS2/MinGW), and Rustic using the GNU toolchain for Windows. This is basically rustc / LLVM, linked by GNU's ld.exe. Compiling using the MSVC toolchain (rustc / LLVM + MS Link.exe) yields a binary that is about 6% slower.

Next step: a basic search function.

Code: Select all

Benchmarking perft 1-7:

8   r n b q k b n r
7   i i i i i i i i
6   . . . . . . . .
5   . . . . . . . .
4   . . . . . . . .
3   . . . . . . . .
2   I I I I I I I I
1   R N B Q K B N R

    A B C D E F G H

Zobrist key:        819aa694337673fb
Active Color:       White
Castling:           KQkq
En Passant:         -
Half-move clock:    0
Full-move number:   1

Perft 1: 20 (0 ms, inf leaves/sec)
Perft 2: 400 (0 ms, inf leaves/sec)
Perft 3: 8902 (0 ms, inf leaves/sec)
Perft 4: 197281 (6 ms, 32880166 leaves/sec)
Perft 5: 4865609 (138 ms, 35258036 leaves/sec)
Perft 6: 119060324 (3377 ms, 35256240 leaves/sec)
Perft 7: 3195901860 (89345 ms, 35770349 leaves/sec)
Total time spent: 92866 ms
Execution speed: 35750806 leaves/second
Rustic >

leanchess · Post by **leanchess** » Mon Apr 13, 2020 12:10 pm

Congratulations. If you're interested, there are some interesting (to me, at least) optimisation tricks in MPerft's Makefile.

I must say that my perft was fundamentally broken, so please disregard those timings I was raving about. The problem was I was only updating the occupancy BB and ANDing each piece's BB with that during subsequent plies. Now, if e.g. a white knight took the place of a captured white pawn, the pawn would become "resurrected" for whites' subsequent move.

I thought I was so clever... but you and Professor Knuth have been right.

Terje · Post by **Terje** » Mon Apr 13, 2020 12:57 pm

mvanthoor wrote: ↑Mon Apr 13, 2020 11:42 am The "to ^ 8" trick, replacing "if side == WHITE" {to - 8} else {to + 8}" for the ep_square and pawn_square did speed up make() this time around. It did speed it up by almost 2M leaves/s, so this construction is clearly faster. I also managed to ditch the extra calculation in square_attacked(). (Asking for the pawn locations from the attack table by the "attacker ^ 1" didn't work for all positions, because I had no pawns on the first and last rank in that table, which is necessary for this trick to work. It uses the attack table for WHITE to find pawn locations for BLACK and the other way around. BLACK can obviously have pawns on the second rank, so to find them, WHITE must have pawns on the first rank. Those pawns will obviously never be used for playing; they're in the table to make this trick work. This is fixed now.) I've also force-inlined make() and unmake() which also dropped the time by 1.5 seconds or so.

This reminds me of the first major bug I had in Weiss. I changed the way I initialized pawn attack bitboards, but didn't realize that with the new method I would be looking up squares 'behind' rank 1, or 'above' rank 8 in an array. This just found random bits in memory outside the array, instead of the empty bitboards they needed to be for this to work. (An 'if' to avoid initializing attacks on the relative last rank fixed it.) square_attacked() would then give wrong answers when checking if a color attacked its own first rank, due to the last ranks of pawnattacks being random bits instead of empty.

At that point in time, Weiss checked legality of each move when given a 'position fen ... moves ...' command and didn't make them if they were illegal. When an opponent (without the bug) moved their king to Weiss' first rank, sometimes the random bits now representing attacks by the opponent's pawns on that square hit one of Weiss' pawns. Then having ignored the last move for being 'illegal', Weiss would play for the opponents side resulting in an illegal move and game loss

(When it rains it pours, so of course I introduced a completely unrelated bug that ALSO resulted in playing for the wrong side before discovering the first bug existed, making debugging very hard)

Good luck with the search implementation

mvanthoor · Post by **mvanthoor** » Mon Apr 13, 2020 3:17 pm

Terje wrote: ↑Mon Apr 13, 2020 12:57 pm This reminds me of the first major bug I had in Weiss.

Personally I've got a very rigorous optimization and testing scheme. In my personal projects, I first write something that works and can be verified by testing. In this case, it's the move generator, make/unmake, and square_attacked. Then I tend to keep pounding away at functions, stripping out or replacing code, testing each change over and over again, until I'm absolutely convinced that I can't cut out more code and still keep it working as it should. I've reached that stage now, with Perft, as the profiler doesn't complain about anything in the functions anymore. The function entries and exists are actually the most expensive parts; which I've hopefully resolved by inlining them. The compiler won't do it by itself because they're bigger than Rust's default inline threshold. I don't want to raise that threshold, because in that case it would inline more functions; even those that are called very rarely ("move_rook_during_castling" comes to mind), and that can be detrimental.

If a function turns out to secretly be doing more than one thing, then I split it up during this process.

I _hate_ writing draft code and finishing an entire product with it, only to discover it performs poorly, and then having to go back... and back again... and again... to clean up code I've written weeks or possibly months before. I much prefer to write one function, and get it completely finished without any possibility of improvement, short of replacing it with a different algorithm or adding more smarts. (The obvious result would be a basic a/b-search, which after it works as well as it can, then gets move ordering, LMR, etc... tacked onto it.)

Good luck with the search implementation

Thanks

mvanthoor · Post by **mvanthoor** » Mon Apr 13, 2020 3:27 pm

leanchess wrote: ↑Mon Apr 13, 2020 12:10 pm Congratulations. If you're interested, there are some interesting (to me, at least) optimisation tricks in MPerft's Makefile.

Thanks

I'll go and see. Maybe I can use something from it, but Rust doesn't use make files. It has its own build system called Cargo. (Even so, because it uses LLVM, many options in Cargo are exactly the same as the ones in C/C++ make files.)

I must say that my perft was fundamentally broken, so please disregard those timings I was raving about. The problem was I was only updating the occupancy BB and ANDing each piece's BB with that during subsequent plies. Now, if e.g. a white knight took the place of a captured white pawn, the pawn would become "resurrected" for whites' subsequent move.

I thought I was so clever... but you and Professor Knuth have been right.

Don't fret about it; I've been bitten more than once by one of my own "smart" stuff, such as in the post above, where I thought I didn't need to initialize pawns on the first and last rank, because they'll never contain pawns....

(I wrote that code weeks and weeks ago.)

...EXCEPT when you start "abusing" the pawn attack table to find out the opponent's pawn locations. If you have a white pawn on E4, it can attack D5 and F5; but a BLACK pawn on D5 or F5 can also attack E4. So, if you want to know which BLACK pawns can possibly attack E4, then you can use the WHITE pawn attack table. That's the "attacker ^ 1" trick I've mentioned.

This only works if you also have "virtual" pawns on the first and last rank. You must have a "virtual" WHITE pawn on, for example, E1 attacking D2 and F2 (where in play, there can never BE a pawn because it promotes immediately), or you can't find the BLACK pawns that can attack E1.

Just don't start your optimization process before you get all the numbers correct. Then, optimize the functions that obtain those numbers, one by one, pruning and snipping at the code until it is as clean and minimal as possible. After THAT, start adding new functions and smarts, testing after each change, to keep your numbers correct.

I did it like that, and it worked well; except for the fact that I went a bit overboard and started optimizing down to the level of variable ordering and assignment, which for current-day compilers and CPU's is completely useless as pointed out and explained by HGM and Fabian.

fabianVDW · Post by **fabianVDW** » Mon Apr 13, 2020 4:26 pm

mvanthoor wrote: ↑Mon Apr 13, 2020 3:17 pm The function entries and exists are actually the most expensive parts; which I've hopefully resolved by inlining them. The compiler won't do it by itself because they're bigger than Rust's default inline threshold. I don't want to raise that threshold, because in that case it would inline more functions; even those that are called very rarely ("move_rook_during_castling" comes to mind), and that can be detrimental.

You can also annoate function in Rust with

Code: Select all

#[inline(always)]
pub fn some_function(){
}

I personally do this a lot so I can still split some functions for readability, but I still want the compiler to inline them.

mvanthoor · Post by **mvanthoor** » Mon Apr 13, 2020 4:30 pm

Oh, I should have been clearer.

That's exactly what I did. I didn't inline the function by hand.

(Even though I don't really like "always", because it will also inline on debug builds I'd want to profile. Using only "inline" doesn't seem to make a difference with using nothing at all.)

fabianVDW · Post by **fabianVDW** » Mon Apr 13, 2020 5:12 pm

mvanthoor wrote: ↑Mon Apr 13, 2020 4:30 pm Oh, I should have been clearer.

That's exactly what I did. I didn't inline the function by hand.

(Even though I don't really like "always", because it will also inline on debug builds I'd want to profile. Using only "inline" doesn't seem to make a difference with using nothing at all.)

Oh ok, thanks, I should try it without always aswell.

bob · Post by **bob** » Mon Apr 13, 2020 6:55 pm

mvanthoor wrote: ↑Fri Apr 10, 2020 10:23 pm Also, the speed progression from the time I started up until now (with reports split over several threads in this forum, on different subjects...)

Perft 7, starting position:
277 seconds (superfluous memory allocation -> replace with move list pool)
114 seconds (start optimization rounds here..., such as replacing vectors with counter-backed struct arrays)
104 seconds (start code cleanup/optimization here)
100 seconds
94 seconds
90 seconds
86 seconds

Some of these speed gains came about through discussion in those threads, and me learning some stuff with regard to several subjects that I hadn't encountered during my research (yet).

And I also got to fix two bugs in the process with regard to the incremental updates of zobrist hashing and material table.

All of the speed gains were by replacing / changing costly code with faster options, removing as many if/else checks as possible, and re-ordering if-statements in such a way that the option that would short-circuit it the soonest comes first.

For example, if you have a very costly if-check like this:

if (... && function() && ... && test() && ... && ... && moving_piece == KING) { ... }

It could mean that the entire chain needs to be evaluated, and then it cuts short if it's not the king that's moving. If you replace it with this:

if (moving_piece == KING && ... && ... && ... && .... && ... && ... ) { ... }

the program might very well get a lot faster, because the king rarely moves during most of the game.

At least, optimizations such as these were simple, and they did work for this engine.

Compile using the profile-guided-optimization option. It will handle that for you painlessly...

if statement and calculation faster than constant

Re: if statement and calculation faster than constant

Re: if statement and calculation faster than constant

Re: if statement and calculation faster than constant

Re: if statement and calculation faster than constant

Re: if statement and calculation faster than constant

Re: if statement and calculation faster than constant

Re: if statement and calculation faster than constant

Re: if statement and calculation faster than constant

Re: if statement and calculation faster than constant

Re: if statement and calculation faster than constant