I am pretty sure modern compilers are smart enough to make that optimization. They are a little dumb sometimes when it comes to indirect memory access, but they are still very smart when only working with variables.AlvaroBegue wrote:I would have thought that this would be better (but needs to be tested, of course):Code: Select all
uint64_t data = (uint16_t)taken; data = (data << 16) | (uint16_t)dst; data = (data << 16) | (uint16_t)src;
The reason is that modifying `data' multiple times introduces data dependencies that might remove opportunities for parallel computation.Code: Select all
uint64_t data = (taken << 32) | (dst << 16) | src;
A little speed-up
Moderator: Ras
-
matthewlai
- Posts: 793
- Joined: Sun Aug 03, 2014 4:48 am
- Location: London, UK
Re: A little speed-up
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
-
matthewlai
- Posts: 793
- Joined: Sun Aug 03, 2014 4:48 am
- Location: London, UK
Re: A little speed-up
That is no longer the case in C99 by the way, but is true in previous standards.brtzsnr wrote:stegemma wrote:Code: Select all
typedef int16_t tsq; struct clsSimpleMove { union { uint64_t data; struct { int16_t src, dst; tsq taken, alfavalue; }; }; };
It's undefined behavior, don't do it. if you write to a union field and read from another field it is not guaranteed to get the same results. See for example http://stackoverflow.com/a/1812359/88622 . Moreover, the compiler is allowed to implement union using #define union struct.
http://www.open-std.org/jtc1/sc22/wg14/ ... dr_283.htm
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
-
syzygy
- Posts: 5949
- Joined: Tue Feb 28, 2012 11:56 pm
Re: A little speed-up
According to your link, it is not undefined behaviour anymore since C99. From the C99 standard:brtzsnr wrote:stegemma wrote:Code: Select all
typedef int16_t tsq; struct clsSimpleMove { union { uint64_t data; struct { int16_t src, dst; tsq taken, alfavalue; }; }; };
It's undefined behavior, don't do it. if you write to a union field and read from another field it is not guaranteed to get the same results. See for example http://stackoverflow.com/a/1812359/88622 . Moreover, the compiler is allowed to implement union using #define union struct.
So there is undefined behaviour only when the appropriate part of the object representation happens to be a "trap representation". For unsigned ints trap representations are impossible, as far as I can tell from par. 6.2.6.If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
Of course it is not ideal to depend on byte ordering, but everything can be solved by simply removing the accesses to the struct everywhere in the engine (replace by macros, for example).
It probably would be good as well to replace 16-bit ints for src and dst with 8-bits ints or, even better, 6-bit ints (so the 64-bit int move type can become a 32-bit int type or even a 16-bit int type).
It most probably is not. First of all, if all fields are anyway 16 bits, just use 16 bit ints. Bitfields are for efficiently packing bits into bytes.Marco Belli wrote:it's probably better to use c birfield.
Secondly, even if 6-bit values are used for src and dst, it is better to use explicit shift operations to access them. That way the programmer keeps control on how the memory is accessed. Processor caches get confused if the same memory location is accessed at different integer widths, and if you use bitfields you can't tell what the compiler will do. And the compiler can't really be blamed for not remembering whether it last accessed part of a bit-field struct using a 32-bit read or a 16-bit read.
Also with C11 the compiler is forbidden to write to objects that are not being written too (without such a guarantee writing correct threaded code becomes virtually impossible). I'm not completely sure how this applies to bitfields, since it seems practically impossible to update the value of a bitfield that is not aligned on byte boundaries without writing to neighbouring fields (unless the compiler emits expensive instructions to do this atomically), but it does seem to limit what a compiler can do.
-
syzygy
- Posts: 5949
- Joined: Tue Feb 28, 2012 11:56 pm
Re: A little speed-up
C11 in fact states that the members of a bit-field may not be updated concurrently by different threads (unless they are separated by a zero-length bit-field declaration). A sequence of adjacent bit-fields all having nonzero width counts as one "memory location".syzygy wrote:Also with C11 the compiler is forbidden to write to objects that are not being written too (without such a guarantee writing correct threaded code becomes virtually impossible). I'm not completely sure how this applies to bitfields, (...)
-
stegemma
- Posts: 859
- Joined: Mon Aug 10, 2009 10:05 pm
- Location: Italy
- Full name: Stefano Gemma
Re: A little speed-up
In my test, bitfields give worst performance. I've found some other not obvious things:
- defining inline functions in the header gives to me worst performance; say the PushMove function, for sample, works faster if defined as a simple function in the cpp part of the code (maybe because compiled code itself is shorter)
- if you have to pass something to a function in a class, is faster to store that value as a class member:
clsEngine2 style class gets a perft of about 53M nodes/s while clsEngine1 style classes gets about 40 M nodes/s; this is because i call PushMove for any move of a given piece type and it's faster to store the initial position (src) and the pointer to bits (pBits points to the piece bits).
This speed-up maybe depends on the compiler but are fully portables.
- defining inline functions in the header gives to me worst performance; say the PushMove function, for sample, works faster if defined as a simple function in the cpp part of the code (maybe because compiled code itself is shorter)
- if you have to pass something to a function in a class, is faster to store that value as a class member:
Code: Select all
class clsEngine1
{
...
bool PushMove(uint64_t *pBits, uint64_t src, uint64_t dst);
...
};
class clsEngine2
{
...
uint64_t *pBits, uint64_t src;
bool PushMove(uint64_t dst);
...
};
This speed-up maybe depends on the compiler but are fully portables.
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: A little speed-up
Bit fields are simply semantically bad. For example you KNOW that a value can't be more than six bits, so you shift and OR. The the compiler can't know that so it has to take care and remove excess bits. When I first wrote Crafty in C, I used bit fields. Took me less than a week to figure out that was not a good idea.stegemma wrote:In my test, bitfields give worst performance. I've found some other not obvious things:
- defining inline functions in the header gives to me worst performance; say the PushMove function, for sample, works faster if defined as a simple function in the cpp part of the code (maybe because compiled code itself is shorter)
- if you have to pass something to a function in a class, is faster to store that value as a class member:
clsEngine2 style class gets a perft of about 53M nodes/s while clsEngine1 style classes gets about 40 M nodes/s; this is because i call PushMove for any move of a given piece type and it's faster to store the initial position (src) and the pointer to bits (pBits points to the piece bits).Code: Select all
class clsEngine1 { ... bool PushMove(uint64_t *pBits, uint64_t src, uint64_t dst); ... }; class clsEngine2 { ... uint64_t *pBits, uint64_t src; bool PushMove(uint64_t dst); ... };
This speed-up maybe depends on the compiler but are fully portables.
-
Bo Persson
- Posts: 264
- Joined: Sat Mar 11, 2006 8:31 am
- Location: Malmö, Sweden
- Full name: Bo Persson
Re: A little speed-up
Or, "passing one parameter is faster than passing three"?stegemma wrote: - if you have to pass something to a function in a class, is faster to store that value as a class member:
clsEngine2 style class gets a perft of about 53M nodes/s while clsEngine1 style classes gets about 40 M nodes/s; this is because i call PushMove for any move of a given piece type and it's faster to store the initial position (src) and the pointer to bits (pBits points to the piece bits).Code: Select all
class clsEngine1 { ... bool PushMove(uint64_t *pBits, uint64_t src, uint64_t dst); ... }; class clsEngine2 { ... uint64_t *pBits, uint64_t src; bool PushMove(uint64_t dst); ... };
-
jdart
- Posts: 4429
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: A little speed-up
I do not recommend bitfields because the language does not guarantee packing and alignment.
--Jon
--Jon
-
jdart
- Posts: 4429
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: A little speed-up
Type-punning via unions is disallowed in C++. However since this idiom is widely used most compilers tolerate it (g++ may require -fno-strict-aliasing).
--Jon
--Jon
-
stegemma
- Posts: 859
- Joined: Mon Aug 10, 2009 10:05 pm
- Location: Italy
- Full name: Stefano Gemma
Re: A little speed-up
Of course, that's why it works. The called function need all of the 3 parameters but the first two doesn't change often, between calls, while the 3th change at any call. Somewhere the first 2 parameters must be saved and the better position is the class members.Bo Persson wrote:[...]
Or, "passing one parameter is faster than passing three"?