TalkChess.com
Hosted by Your Move Chess & Games

 Two small in-register-lookups Goto page 1, 2  Next
Author Message
Gerd Isenberg

Joined: 08 Mar 2006
Posts: 1785
Location: Hattingen, Germany

Posted: Mon Apr 23, 2007 10:28 pm    Post subject: Two small in-register-lookups

Color of square - assumes 32-bit shift by mod 32.
Is that portable for 32-bit ints?
(otherwise sq&31 or sq&15 is necessary)

 Code: unsigned int colorOfSquare (unsigned int sq) {    return  (0xAA55AA55 >> sq) & 1; }

Center-Distance in the 0..3 range as a folded [32]-element lookup of two bits:

 Code: 3 3 3 3 3 3 3 3  3 2 2 2 2 2 2 3  3 2 1 1 1 1 2 3  3 2 1 0 0 1 2 3  3 2 1 0 0 1 2 3  3 2 1 1 1 1 2 3  3 2 2 2 2 2 2 3  3 3 3 3 3 3 3 3

 Code: unsigned int centerDistance(unsigned int a) {     const u64 _13 = 0xffffeaabe55be41b;     a ^= (a - 32) >> 27;     return  (_13  >> 2*a) & 3; }

or 32-bit friendly:
 Code: unsigned int centerDistance(unsigned int a) {     a ^=  (a  -  32)  >> 27;     return ((-8274523 >> a) & 1)          | ((  -30842 >> a) & 2); }

Cheers,
Gerd
J. Wesley Cleveland

Joined: 01 Jul 2006
Posts: 622

Posted: Tue Apr 24, 2007 9:08 am    Post subject: Re: Two small in-register-lookups

Gerd Isenberg wrote:
Color of square - assumes 32-bit shift by mod 32.
Is that portable for 32-bit ints?
(otherwise sq&31 or sq&15 is necessary)

 Code: unsigned int colorOfSquare (unsigned int sq) {    return  (0xAA55AA55 >> sq) & 1; }

or
 Code: unsigned int colorOfSquare (unsigned int sq) {    return  ((sq>>3)^ sq) & 1; }
H.G.Muller

Joined: 10 Mar 2006
Posts: 12765
Location: Amsterdam

Posted: Tue Apr 24, 2007 9:51 am    Post subject: Re: Two small in-register-lookups

Why would you want to use in-register lookups? Fetching such simple things from a memory table should be very competative, if not faster. I noticed that these out-of-order CPUs often perform very poorly if you only use registers, it seems that there is some internal bottleneck on how many registers you can access per clock. Having some of the operands come from memory is usually more successful in keeping the ALUs busy.

On AMD machines load and stores are piggybacked on ALU makro ops, so there is no decoder penalty in using instructions with memory operands. uOp fusion in Intel machines now does the same (Pentium M and Core CPUs), while Pentium IV circumvented decoder limits by using a trace cache.

In addition, shifts are usually a bottleneck, as most CPUs have only one ALU capable of doing shifts, wich can become a bottleneck if your code uses them heavily. So better reserve them for what you really can't do without. (Plus, the i86 machine language requires a variable shift count to be in one specific register, leading to extra register-to-register moves to implement them.)

If you are worried about L1 footprint, you could pack several such tables in 64 byte. E.g. bits 0 and 1 the center distance, bit 2 the square color, etc.

 Code: colorOfSquare  = Data[sqr] & 4; // assuming it is used as a boolean, so 4=true centerDistance = Data[sqr] & 3;

(Btw, Joker uses Wesley's method for square color, but I am considering to change that.)
Gerd Isenberg

Joined: 08 Mar 2006
Posts: 1785
Location: Hattingen, Germany

Posted: Tue Apr 24, 2007 10:58 am    Post subject: Re: Two small in-register-lookups

jwes wrote:
Gerd Isenberg wrote:
Color of square - assumes 32-bit shift by mod 32.
Is that portable for 32-bit ints?
(otherwise sq&31 or sq&15 is necessary)

 Code: unsigned int colorOfSquare (unsigned int sq) {    return  (0xAA55AA55 >> sq) & 1; }

or
 Code: unsigned int colorOfSquare (unsigned int sq) {    return  ((sq>>3)^ sq) & 1; }

Yes, I am aware of that. It has one dependent instruction more and with my mapping (a1 == 0, -> dark square), I even need one additional xor 1 or not

 Code: ; ecx == sq mov eax, AA55AA55H shr eax, cl and eax, 1
Gerd Isenberg

Joined: 08 Mar 2006
Posts: 1785
Location: Hattingen, Germany

Posted: Tue Apr 24, 2007 11:00 am    Post subject: Re: Two small in-register-lookups

 hgm wrote: Why would you want to use in-register lookups? Fetching such simple things from a memory table should be very competative, if not faster. I noticed that these out-of-order CPUs often perform very poorly if you only use registers, it seems that there is some internal bottleneck on how many registers you can access per clock. Having some of the operands come from memory is usually more successful in keeping the ALUs busy.

Some alternative methods to compute such data - no serious claims intended. Quite modest register usage on x64, RAX (EAX) and CL, one variable shift. Sometimes with some other memory accesses around such computation may be helpfull to find the "best" balance between memory accesss and register computation...

hgm wrote:
On AMD machines load and stores are piggybacked on ALU makro ops, so there is no decoder penalty in using instructions with memory operands. uOp fusion in Intel machines now does the same (Pentium M and Core CPUs), while Pentium IV circumvented decoder limits by using a trace cache.

In addition, shifts are usually a bottleneck, as most CPUs have only one ALU capable of doing shifts, wich can become a bottleneck if your code uses them heavily. So better reserve them for what you really can't do without. (Plus, the i86 machine language requires a variable shift count to be in one specific register, leading to extra register-to-register moves to implement them.)

If you are worried about L1 footprint, you could pack several such tables in 64 byte. E.g. bits 0 and 1 the center distance, bit 2 the square color, etc.

 Code: colorOfSquare  = Data[sqr] & 4; // assuming it is used as a boolean, so 4=true centerDistance = Data[sqr] & 3;

(Btw, Joker uses Wesley's method for square color, but I am considering to change that.)
H.G.Muller

Joined: 10 Mar 2006
Posts: 12765
Location: Amsterdam

Posted: Tue Apr 24, 2007 12:37 pm    Post subject: Re: Two small in-register-lookups

You could of course replace the (sqr>>3^sqr)&1 by
 Code: 9*sqr & 8

for the purpose of making a boolean.

The multiplication could be implemented with a single LEA instruction:
 Code: movl  _sqr, %eax     leal (%eax, %eax, 8), %eax     andl  \$8, %eax
Gerd Isenberg

Joined: 08 Mar 2006
Posts: 1785
Location: Hattingen, Germany

Posted: Tue Apr 24, 2007 6:36 pm    Post subject: Re: Two small in-register-lookups

hgm wrote:
You could of course replace the (sqr>>3^sqr)&1 by
 Code: 9*sqr & 8

for the purpose of making a boolean.

The multiplication could be implemented with a single LEA instruction:
 Code: movl  _sqr, %eax     leal (%eax, %eax, 8), %eax     andl  \$8, %eax

Cute, lea takes only one register.
Since I need "1" for dark squares (square a1 == 0) and "0" for light squares, lea eax+8*eax+8 would be nice...

But... lea with scaling takes two cycles on K8.
And i can't force vc2005 to generate that code. It always uses imul -7

 Code: ((9*sqr+8)>>3) & 1

 Code: 0000  8D 44 C9 08  lea eax, [eax*8 + eax + 8]  0004  C1 E8 01     shr eax, 3  0007  83 E0 01     and eax, 1

But with sq already in ecx, i still would prefere the in-register lookup.
Codesize is exactly the same

 Code: 0000  B8 55 AA 55 AA mov eax, aa55aa55  0005  D3 E8          shr eax, cl  0007  83 E0 01       and eax, 1
Gerd Isenberg

Joined: 08 Mar 2006
Posts: 1785
Location: Hattingen, Germany

Posted: Tue Apr 24, 2007 8:11 pm    Post subject: Re: Two small in-register-lookups

Three of zillions versions for a boolean subroutine, to look whether two squares share same color. And what vc2005 makes out of it...

 Code: bool sameSquareColor(unsigned int sq1, unsigned int sq2) {     //return  ((0xAA55AA55 >> sq1) ^ (sq2*9 >> 3)) & 1;     //return  (( ((sq1 ^ sq2)>>3) ^ sq1 ^ sq2 ) ^ 1) & 1;     return  ( (0xAA55AA55 >> sq1) ^ (sq2>>3) ^ sq2 ) & 1; }

 Code: ?sameSquareColor@@YA_NII@Z PROC   00000   6b d2 f9    imul    edx, -7   00003   b8 55 aa 55 aa    mov    eax, aa55aa55H   00008   d3 e8       shr    eax, cl   0000a   c1 ea 03    shr    edx, 3   0000d   33 c2       xor    eax, edx   0000f   83 e0 01    and    eax, 1   00012   c3       ret    0

 Code: ?sameSquareColor@@YA_NII@Z PROC   00000   8b c1       mov    eax, ecx   00002   33 c2       xor    eax, edx   00004   c1 e8 03    shr    eax, 3   00007   f7 d0       not    eax   00009   33 c1       xor    eax, ecx   0000b   33 c2       xor    eax, edx   0000d   83 e0 01    and    eax, 1   00010   c3       ret    0

 Code: ?sameSquareColor@@YA_NII@Z PROC   00000   b8 55 aa 55 aa    mov    eax, aa55aa55H   00005   d3 e8       shr    eax, cl   00007   8b ca       mov    ecx, edx   00009   c1 e9 03    shr    ecx, 3   0000c   33 c1       xor    eax, ecx   0000e   33 c2       xor    eax, edx   00010   83 e0 01    and    eax, 1   00013   c3       ret    0

 Code: 00000000  _sameSquareColor PROC  00000000  8D 44 C9 08      lea eax, [rcx + 8*rcx + 8]  00000004  8D 0C D2      lea ecx, [rdx + 8*rdx]  00000007  33 C1      xor eax, ecx  00000009  C1 E8 03      shr eax, 3  0000000C  83 E0 01      and eax, 1  0000000F  C3         ret    0
H.G.Muller

Joined: 10 Mar 2006
Posts: 12765
Location: Amsterdam

Posted: Tue Apr 24, 2007 9:48 pm    Post subject: Re: Two small in-register-lookups

You can XOR the squares first, and then do the test for one square.

 Code: bool sameSquareColor(unsigned int sq1, unsigned int sq2) {     sq1 ^= sq2;     return  ( (0xAA55AA55 >> sq1) & 1); }
Gerd Isenberg

Joined: 08 Mar 2006
Posts: 1785
Location: Hattingen, Germany

Posted: Tue Apr 24, 2007 9:57 pm    Post subject: Re: Two small in-register-lookups

hgm wrote:
You can XOR the squares first, and then do the test for one square.

 Code: bool sameSquareColor(unsigned int sq1, unsigned int sq2) {     sq1 ^= sq2;     return  ( (0xAA55AA55 >> sq1) & 1); }

Of course! Looks quite competetive now.

 Code: ?sameSquareColor@@YA_NII@Z PROC   00000   33 ca       xor    ecx, edx   00002   b8 55 aa 55 aa    mov    eax, aa55aa55H   00005   d3 e8       shr    eax, cl   00007   83 e0 01    and    eax, 1
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First
 All times are GMTGoto page 1, 2  Next Page 1 of 2

 Jump to: Select a forum Computer Chess Club Forums----------------Computer Chess Club: General TopicsComputer Chess Club: Tournaments and MatchesComputer Chess Club: Programming and Technical DiscussionsComputer Chess Club: Engine Origins Other Forums----------------Chess Thinkers ForumForum Help and Suggestions
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum