Bit twiddling: what do you prefer?

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Bit twiddling: what do you prefer?

Post by mvanthoor »

dangi12012 wrote: Sat Oct 16, 2021 12:11 pm Wrong. That is only the case if the compiler can infer what sq is. Like if you use it in a for loop - but its exectly opposite if you use a not compiletime known value - like when you are deep in a calculation - then the compiler will emit that array into some memory location and you have a L1 lookup for [sq] but register calculation for <<.

Its 100% verifyably slower - just test it with random numbers :)
You continuously keep screaming "wrong" against everything everyone on here says. It's getting annoying.

The const fn builds a constant array. For the Rust compiler it is no difference if you say:

Code: Select all

x = (1u64 << sq) & stuff // direct shift
or

Code: Select all

x = BB_SQUARES[sq] & stuff // pre-calculated
or even

Code: Select all

x = bb_squares(sq) & stuff // contains the calculation of the first option
I tested it, both with positions and in a 2000 game match.

Maybe there are some situations where "1u64 << x" is faster than indexing into a constant SQUARES[x] array, but for a chess engine, at least in Rust, it makes no difference. I tested it myself yesterday, because Rustic uses the mask[sq] option (where mask is a const array, filled by a const fn function), and I replaced it by by a function that does a shift. In the end I replaced that by things such as "BB_RANK1 << rank" etc; so I didn't do it for squares only, but also for ranks and files. All thee options are completely equal.

I can understand that, if you store a normal array of random numbers in memory and then start indexing it, that this is slower than directly shifting the bits on the original number. However, that's not what happens in a chess engine. In a chess engine, all information is always known. Even the random numbers aren't random because they're either generated to always yield the same set, or actually pasted into the code.

That's also the reason why I often say: "doesn't make a difference _in a chess engine_", because a chess engine, at the very core, is only a tree walker which scores nodes using an evaluation function and finds the most optimal path (= best move for each side) through the tree. The rest is only techniques on cutting down on the tree, scoring the nodes better/more accurately, and obviously a way to generate the nodes in real-time because we can't build a computer large enough to just generate the entire tree, score the entire thing, and then find the optimal path; which would thus solve chess.

It has been done for draughts and/or checkers, though. In that game, the optimal path between a large opening book and an endgame tablebase is known.
Last edited by mvanthoor on Sat Oct 16, 2021 1:18 pm, edited 1 time in total.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Mergi
Posts: 127
Joined: Sat Aug 21, 2021 9:55 pm
Full name: Jen

Re: Bit twiddling: what do you prefer?

Post by Mergi »

There's a difference between "being faster" and "being worth no elo". Add 10 tiny improvements that each increase the engine speed just a little bit and you might end up increasing it enough overall to gain some ELO as well. Is << faster? It should be. Is it worth stressing over? Probably not.

I think measuring every single change only in terms of ELO impact is not necessarily a good thing either tbh. This being one such example, where obviously you cannot measure it in terms of ELO, but there's almost no way that a lookup would be faster.
Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Re: Bit twiddling: what do you prefer?

Post by Karlo Bala »

dangi12012 wrote: Sat Oct 16, 2021 12:11 pm
mvanthoor wrote: Sat Oct 16, 2021 11:29 am
dangi12012 wrote: Thu Oct 14, 2021 8:30 pm But TLDR: All your options 1) are better than the others. Maybe think about BMI1 intrisics like _blsr_u64 instead of b&(b-1). Your compiler might figure it out but sometimes doesnt. Never ever do mask[sq] instead of 1 << sq. 1 << sq is many times faster.
"mask[sq]" or "1 << sq" doesn't make a difference in Rust. I actually tested it yesterday.

When the array is initialized with a const fn (similar to constexpr in C++), then mask[sq] is inlined, just as if you used "1 << sq". There's no significant Elo difference in a match of 2000 games between engines that only differ in this detail.
Wrong. That is only the case if the compiler can infer what sq is. Like if you use it in a for loop - but its exectly opposite if you use a not compiletime known value - like when you are deep in a calculation - then the compiler will emit that array into some memory location and you have a L1 lookup for [sq] but register calculation for <<.

Its 100% verifyably slower - just test it with random numbers :)
It's not that simple. The point is to balance calculations with memory access. Any memory access could be hidden with enough calculations. For example, if you have something to do in the meantime, you can get TT data for free.
Best Regards,
Karlo Balla Jr.
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Bit twiddling: what do you prefer?

Post by mvanthoor »

Mergi wrote: Sat Oct 16, 2021 1:10 pm There's a difference between "being faster" and "being worth no elo". Add 10 tiny improvements that each increase the engine speed just a little bit and you might end up increasing it enough overall to gain some ELO as well. Is << faster? It should be. Is it worth stressing over? Probably not.

I think measuring every single change only in terms of ELO impact is not necessarily a good thing either tbh. This being one such example, where obviously you cannot measure it in terms of ELO, but there's almost no way that a lookup would be faster.
It probably isn't a lookup, because both the array and the function filling it are constants.

I'll probably keep the change in the code to using the inline functions with the shift instead of the mask[sq] solution, because it makes the code shorter; not because it gains any speed. The perft routine didn't get even half a second faster, apart from fluctuations within the error margin.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Joost Buijs
Posts: 1646
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Bit twiddling: what do you prefer?

Post by Joost Buijs »

mvanthoor wrote: Sat Oct 16, 2021 1:22 pm
Mergi wrote: Sat Oct 16, 2021 1:10 pm There's a difference between "being faster" and "being worth no elo". Add 10 tiny improvements that each increase the engine speed just a little bit and you might end up increasing it enough overall to gain some ELO as well. Is << faster? It should be. Is it worth stressing over? Probably not.

I think measuring every single change only in terms of ELO impact is not necessarily a good thing either tbh. This being one such example, where obviously you cannot measure it in terms of ELO, but there's almost no way that a lookup would be faster.
It probably isn't a lookup, because both the array and the function filling it are constants.

I'll probably keep the change in the code to using the inline functions with the shift instead of the mask[sq] solution, because it makes the code shorter; not because it gains any speed. The perft routine didn't get even half a second faster, apart from fluctuations within the error margin.
If sq is not constexpr in the code where you need mask[sq] it will be a lookup. I agree that in practice the speed difference between mask[sq] and 1ULL << sq is so small that it is hardly measurable and that that it won't affect the engines Elo.

It can be processor dependent too, maybe AMD Zen does worse on mask[sq] than Intel.