Back to assembly

Discussion of chess software programming and technical issues.

stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Back to assembly

Post by stegemma »

sean_vn wrote: Hey, there is a lot of junk in the Intel instruction set, but bswap, haddps, rdrand, and some of the crc instructions do things that are not easily expressible in C. Sometimes you can gain.
If there is no special instruction you can exploit, then GCC or Java HotSpot will generally do better than you can.
There is something else that can't be done in C:

- choosing to use only registers (the register keyword doesn't always work)
- keeping the most used pointers in fixed registers (edi=move, esi=node, for example)
- calling a set of functions with values/pointers in fixed registers, not on the stack (a kind of __fastcall shared by multiple functions)
- returning both a value and a flag from a function (Carry set, for example)
- some magic with flags (maybe this one isn't true)

None of this has been proven to give you a faster program, but sometimes it does, I think.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Back to assembly

Post by matthewlai »

stegemma wrote:
sean_vn wrote: Hey, there is a lot of junk in the Intel instruction set, but bswap, haddps, rdrand, and some of the crc instructions do things that are not easily expressible in C. Sometimes you can gain.
If there is no special instruction you can exploit, then GCC or Java HotSpot will generally do better than you can.
There is something else that can't be done in C:

- choosing to use only registers (the register keyword doesn't always work)
- keeping the most used pointers in fixed registers (edi=move, esi=node, for example)
- calling a set of functions with values/pointers in fixed registers, not on the stack (a kind of __fastcall shared by multiple functions)
- returning both a value and a flag from a function (Carry set, for example)
- some magic with flags (maybe this one isn't true)

None of this has been proven to give you a faster program, but sometimes it does, I think.
Most/all modern compilers completely ignore the register keyword intentionally, because the optimizer can do better register allocation than most programmers. They will analyze your code and put the most used variables in registers. If you use fewer variables than you have registers, all variables will be in registers. They also try to put pointers in appropriate registers, etc. They just don't allow you to specify those allocations manually.

Returning with a flag is not really possible because that breaks ABI compatibility. So compilers will only do that if they can inline your function call. Or possibly with whole-program optimization.
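
A rough sketch of the nearest thing plain C offers to "value plus flag": return a small struct. The names `FlaggedInt` and `first_occupied` are made up for illustration and do not come from any engine in this thread. On the common x86-64 ABIs an 8-byte struct like this is, as far as I know, returned in a register rather than through memory, but the flag travels as ordinary data, not in Carry.

```c
#include <stdbool.h>

/* Hypothetical result type: a value plus a success flag (8 bytes on x86-64). */
typedef struct {
    int  value;  /* the "normal" return value */
    bool ok;     /* plays the role the Carry flag would play in assembly */
} FlaggedInt;

/* Hypothetical helper: index of the first non-empty square on a byte board.
   ok is false when every square is empty, and value is then meaningless. */
FlaggedInt first_occupied(const unsigned char *board, int size)
{
    FlaggedInt result = { 0, false };
    for (int i = 0; i < size; ++i) {
        if (board[i] != 0) {
            result.value = i;
            result.ok = true;
            break;
        }
    }
    return result;
}
```

The caller tests `result.ok` with a normal compare and branch, so this costs at least one instruction more than a hand-written `jc` right after the call.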
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: Back to assembly

Post by Aleks Peshkov »

stegemma wrote:
- keeping the most used pointers in fixed registers (edi=move, esi=node, for example)
- calling a set of functions with values/pointers in fixed registers, not on the stack (a kind of __fastcall shared by multiple functions)
It was possible even in 20-year-old versions of C compilers, before __fastcall was even proposed.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Back to assembly

Post by sje »

matthewlai wrote:Most/all modern compilers completely ignore the register keyword intentionally
Almost. They do check register variables for being able to fit into a register, and they prohibit the address-of operator & from being used on a register variable.
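
A small sketch of the second point in C (the function name `sum_array` is invented for the example). The commented-out line is the one a conforming C compiler has to reject, because C forbids taking the address of an object declared `register`. Whether `total` really ends up in a register is still left to the optimizer.

```c
/* Sums an array. 'register' here is only a hint the optimizer may ignore. */
int sum_array(const int *values, int count)
{
    register int total = 0;

    for (register int i = 0; i < count; ++i) {
        total += values[i];
    }

    /* int *p = &total;   not allowed in C: address of a register variable */
    return total;
}
```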
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Back to assembly

Post by stegemma »

Aleks Peshkov wrote:
stegemma wrote:
- keeping the most used pointers in fixed registers (edi=move, esi=node, for example)
- calling a set of functions with values/pointers in fixed registers, not on the stack (a kind of __fastcall shared by multiple functions)
It was possible even in 20-year-old versions of C compilers, before __fastcall was even proposed.
With __fastcall you know which registers are used to pass values to the function but, as I remember, you're not guaranteed that the registers keep their original values after the call. In C, I'm not sure that I can use registers as if they were global variables; this is what I mean.

As I said, whether this gives any performance gain still has to be proved: maybe not... maybe yes... I can only say after I have completed the next new engine.
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: Back to assembly

Post by Aleks Peshkov »

stegemma wrote: In C, I'm not sure that I can use registers as if they were global variables; this is what I mean.
Using registers globally is important when building virtual machines (or Forth machines with separate data and return stacks, as was done 20 years ago).
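
For what it's worth, GCC has an extension for exactly this (explicit global register variables). Below is a minimal sketch of a Forth-style data stack whose pointer lives in a fixed register. The `vm_*` names and the choice of r12 are assumptions made for the example, not anything from the thread. Clang, as far as I know, accepts this syntax only for a few non-allocatable registers, so the sketch falls back to an ordinary global everywhere except GCC on x86-64.

```c
/* GCC extension, x86-64 only: reserve r12 for the data stack pointer in this
   translation unit. On other compilers/targets, use a plain global instead. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
register long *vm_sp __asm__("r12");
#else
static long *vm_sp;
#endif

/* Point the stack pointer at caller-provided storage (stack grows upward). */
void vm_init(long *stack_base) { vm_sp = stack_base; }

/* Push and pop do no bounds checking; this is only a sketch. */
void vm_push(long value) { *vm_sp++ = value; }
long vm_pop(void)        { return *--vm_sp; }

/* Forth-style '+': replace the top two cells with their sum. */
void vm_add(void)
{
    long b = vm_pop();
    long a = vm_pop();
    vm_push(a + b);
}
```

Typical use would be `vm_init(cells); vm_push(2); vm_push(3); vm_add();` followed by `vm_pop()`. Code compiled without the declaration (libraries, callbacks) knows nothing about the reservation, which is why a callee-saved register is the usual choice.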
tej
Posts: 1
Joined: Mon Feb 23, 2015 6:53 pm

Re: Back to assembly

Post by tej »

now that I can use the full power of the new 64-bit registers
Hi, I've started experimenting with similar things, e.g. using AVX, AVX2 instructions, and AVX registers for permanent storage (holding values in them between function calls, using global register variables). I do these in C++ nowadays, but I would be interested in what fun things you can come up with. For example, I found out that the following loop:

Code:

  for (int i = 0; i < 64; ++i) {
    unsigned char temp = lsrc[i];

    if (temp != 0) {
      temp ^= 1;
    }
    ldst[i] = temp;
  }
for flipping a chessboard of 64 bytes, compiles into a few AVX2 instructions, working on 32 bytes at a time. It will probably be even better on AVX-512.
It is a bit harder to get these things to work in C++: there is no standard restrict keyword, you have to make sure the alignment is OK, etc.
So I started to learn about x86 intrinsics (there are a lot of them: https://software.intel.com/sites/landin ... sicsGuide/ )
Anyway, these kinds of tricks are usually not very useful, but they are really fun. Do you have some "tricks" to share?
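
For comparison, here is a sketch of the board-flipping loop above written as plain C99, where `restrict` is a standard keyword. The function name `flip_board` and the branch-free form are assumptions added for illustration; the behaviour is meant to match the loop in the post. Whether a given compiler actually turns it into AVX2 code depends on the compiler version and flags (for example `-O3 -mavx2` with GCC), so treat vectorization as likely, not guaranteed.

```c
#include <stddef.h>

enum { BOARD_SQUARES = 64 };

/* Flip the low bit of every non-empty square; empty squares (0) stay 0.
   'restrict' promises the compiler that dst and src do not overlap. */
void flip_board(unsigned char *restrict dst, const unsigned char *restrict src)
{
    for (size_t i = 0; i < BOARD_SQUARES; ++i) {
        unsigned char piece = src[i];
        /* (piece != 0) is 1 for occupied squares and 0 for empty ones,
           so the XOR only changes occupied squares. */
        dst[i] = (unsigned char)(piece ^ (piece != 0));
    }
}
```

Because of the `restrict` promise, calling it with `dst == src` (flipping in place) is not allowed with this signature.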
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Back to assembly

Post by stegemma »

tej wrote:[...]
Anyway, these kinds of tricks are usually not very useful, but they are really fun. Do you have some "tricks" to share?
The first thing I would try is just the "no-RAM" move generator, with everything using only CPU registers. Then maybe I could try some SIMD algorithm to generate multiple moves at once.

It takes time and I'm really busy with work, so don't expect results any time soon.