A little speed-up

Discussion of chess software programming and technical issues.

Moderator: Ras

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: A little speed-up

Post by matthewlai »

AlvaroBegue wrote:

Code: Select all

      uint64_t data = (uint16_t)taken; 
      data = (data << 16) | (uint16_t)dst; 
      data = (data << 16) | (uint16_t)src; 
I would have thought that this would be better (but needs to be tested, of course):

Code: Select all

    uint64_t data = (taken << 32) | (dst << 16) | src;
The reason is that modifying `data' multiple times introduces data dependencies that might remove opportunities for parallel computation.
I am pretty sure modern compilers are smart enough to make that optimization. They are a little dumb sometimes when it comes to indirect memory access, but they are still very smart when only working with variables.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: A little speed-up

Post by matthewlai »

brtzsnr wrote:
stegemma wrote:

Code: Select all


typedef int16_t tsq;

struct clsSimpleMove
{
	union
	{
		uint64_t data;
		struct
		{
			int16_t src, dst;
			tsq taken, alfavalue;
		};
	};
};

It's undefined behavior, don't do it. if you write to a union field and read from another field it is not guaranteed to get the same results. See for example http://stackoverflow.com/a/1812359/88622 . Moreover, the compiler is allowed to implement union using #define union struct.
That is no longer the case in C99 by the way, but is true in previous standards.

http://www.open-std.org/jtc1/sc22/wg14/ ... dr_283.htm
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
syzygy
Posts: 5949
Joined: Tue Feb 28, 2012 11:56 pm

Re: A little speed-up

Post by syzygy »

brtzsnr wrote:
stegemma wrote:

Code: Select all


typedef int16_t tsq;

struct clsSimpleMove
{
	union
	{
		uint64_t data;
		struct
		{
			int16_t src, dst;
			tsq taken, alfavalue;
		};
	};
};

It's undefined behavior, don't do it. if you write to a union field and read from another field it is not guaranteed to get the same results. See for example http://stackoverflow.com/a/1812359/88622 . Moreover, the compiler is allowed to implement union using #define union struct.
According to your link, it is not undefined behaviour anymore since C99. From the C99 standard:
If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
So there is undefined behaviour only when the appropriate part of the object representation happens to be a "trap representation". For unsigned ints trap representations are impossible, as far as I can tell from par. 6.2.6.

Of course it is not ideal to depend on byte ordering, but everything can be solved by simply removing the accesses to the struct everywhere in the engine (replace by macros, for example).

It probably would be good as well to replace 16-bit ints for src and dst with 8-bits ints or, even better, 6-bit ints (so the 64-bit int move type can become a 32-bit int type or even a 16-bit int type).
Marco Belli wrote:it's probably better to use c birfield.
It most probably is not. First of all, if all fields are anyway 16 bits, just use 16 bit ints. Bitfields are for efficiently packing bits into bytes.

Secondly, even if 6-bit values are used for src and dst, it is better to use explicit shift operations to access them. That way the programmer keeps control on how the memory is accessed. Processor caches get confused if the same memory location is accessed at different integer widths, and if you use bitfields you can't tell what the compiler will do. And the compiler can't really be blamed for not remembering whether it last accessed part of a bit-field struct using a 32-bit read or a 16-bit read.

Also with C11 the compiler is forbidden to write to objects that are not being written too (without such a guarantee writing correct threaded code becomes virtually impossible). I'm not completely sure how this applies to bitfields, since it seems practically impossible to update the value of a bitfield that is not aligned on byte boundaries without writing to neighbouring fields (unless the compiler emits expensive instructions to do this atomically), but it does seem to limit what a compiler can do.
syzygy
Posts: 5949
Joined: Tue Feb 28, 2012 11:56 pm

Re: A little speed-up

Post by syzygy »

syzygy wrote:Also with C11 the compiler is forbidden to write to objects that are not being written too (without such a guarantee writing correct threaded code becomes virtually impossible). I'm not completely sure how this applies to bitfields, (...)
C11 in fact states that the members of a bit-field may not be updated concurrently by different threads (unless they are separated by a zero-length bit-field declaration). A sequence of adjacent bit-fields all having nonzero width counts as one "memory location".
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: A little speed-up

Post by stegemma »

In my test, bitfields give worst performance. I've found some other not obvious things:

- defining inline functions in the header gives to me worst performance; say the PushMove function, for sample, works faster if defined as a simple function in the cpp part of the code (maybe because compiled code itself is shorter)

- if you have to pass something to a function in a class, is faster to store that value as a class member:

Code: Select all


class clsEngine1
{
...
    bool PushMove(uint64_t *pBits, uint64_t src, uint64_t dst);
...
};

class clsEngine2
{
...
    uint64_t *pBits, uint64_t src; 
    bool PushMove(uint64_t dst);
...
};
clsEngine2 style class gets a perft of about 53M nodes/s while clsEngine1 style classes gets about 40 M nodes/s; this is because i call PushMove for any move of a given piece type and it's faster to store the initial position (src) and the pointer to bits (pBits points to the piece bits).

This speed-up maybe depends on the compiler but are fully portables.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A little speed-up

Post by bob »

stegemma wrote:In my test, bitfields give worst performance. I've found some other not obvious things:

- defining inline functions in the header gives to me worst performance; say the PushMove function, for sample, works faster if defined as a simple function in the cpp part of the code (maybe because compiled code itself is shorter)

- if you have to pass something to a function in a class, is faster to store that value as a class member:

Code: Select all


class clsEngine1
{
...
    bool PushMove(uint64_t *pBits, uint64_t src, uint64_t dst);
...
};

class clsEngine2
{
...
    uint64_t *pBits, uint64_t src; 
    bool PushMove(uint64_t dst);
...
};
clsEngine2 style class gets a perft of about 53M nodes/s while clsEngine1 style classes gets about 40 M nodes/s; this is because i call PushMove for any move of a given piece type and it's faster to store the initial position (src) and the pointer to bits (pBits points to the piece bits).

This speed-up maybe depends on the compiler but are fully portables.
Bit fields are simply semantically bad. For example you KNOW that a value can't be more than six bits, so you shift and OR. The the compiler can't know that so it has to take care and remove excess bits. When I first wrote Crafty in C, I used bit fields. Took me less than a week to figure out that was not a good idea.
User avatar
Bo Persson
Posts: 264
Joined: Sat Mar 11, 2006 8:31 am
Location: Malmö, Sweden
Full name: Bo Persson

Re: A little speed-up

Post by Bo Persson »

stegemma wrote: - if you have to pass something to a function in a class, is faster to store that value as a class member:

Code: Select all


class clsEngine1
{
...
    bool PushMove(uint64_t *pBits, uint64_t src, uint64_t dst);
...
};

class clsEngine2
{
...
    uint64_t *pBits, uint64_t src; 
    bool PushMove(uint64_t dst);
...
};
clsEngine2 style class gets a perft of about 53M nodes/s while clsEngine1 style classes gets about 40 M nodes/s; this is because i call PushMove for any move of a given piece type and it's faster to store the initial position (src) and the pointer to bits (pBits points to the piece bits).
Or, "passing one parameter is faster than passing three"?
jdart
Posts: 4429
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: A little speed-up

Post by jdart »

I do not recommend bitfields because the language does not guarantee packing and alignment.

--Jon
jdart
Posts: 4429
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: A little speed-up

Post by jdart »

Type-punning via unions is disallowed in C++. However since this idiom is widely used most compilers tolerate it (g++ may require -fno-strict-aliasing).

--Jon
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: A little speed-up

Post by stegemma »

Bo Persson wrote:[...]
Or, "passing one parameter is faster than passing three"?
Of course, that's why it works. The called function need all of the 3 parameters but the first two doesn't change often, between calls, while the 3th change at any call. Somewhere the first 2 parameters must be saved and the better position is the class members.