ChessUSA.com TalkChess.com
Hosted by Your Move Chess & Games
 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Two small in-register-lookups
Goto page 1, 2  Next
 
Post new topic       TalkChess.com Forum Index -> Computer Chess Club: Programming and Technical Discussions Threaded
View previous topic :: View next topic  
Author Message
Gerd Isenberg



Joined: 08 Mar 2006
Posts: 1785
Location: Hattingen, Germany

PostPosted: Mon Apr 23, 2007 10:28 pm    Post subject: Two small in-register-lookups Reply to topic Reply with quote

Color of square - assumes 32-bit shift by mod 32.
Is that portable for 32-bit ints?
(otherwise sq&31 or sq&15 is necessary)

Code:
unsigned int colorOfSquare (unsigned int sq) {
   return  (0xAA55AA55 >> sq) & 1;
}


Center-Distance in the 0..3 range as a folded [32]-element lookup of two bits:

Code:
 3 3 3 3 3 3 3 3
 3 2 2 2 2 2 2 3
 3 2 1 1 1 1 2 3
 3 2 1 0 0 1 2 3
 3 2 1 0 0 1 2 3
 3 2 1 1 1 1 2 3
 3 2 2 2 2 2 2 3
 3 3 3 3 3 3 3 3

Code:
unsigned int centerDistance(unsigned int a) {
    const u64 _13 = 0xffffeaabe55be41b;
    a ^= (a - 32) >> 27;
    return  (_13  >> 2*a) & 3;
}

or 32-bit friendly:
Code:
unsigned int centerDistance(unsigned int a) {
    a ^=  (a  -  32)  >> 27;
    return ((-8274523 >> a) & 1)
         | ((  -30842 >> a) & 2);
}

Cheers,
Gerd
Back to top
View user's profile Send private message
J. Wesley Cleveland



Joined: 01 Jul 2006
Posts: 622

PostPosted: Tue Apr 24, 2007 9:08 am    Post subject: Re: Two small in-register-lookups Reply to topic Reply with quote

Gerd Isenberg wrote:
Color of square - assumes 32-bit shift by mod 32.
Is that portable for 32-bit ints?
(otherwise sq&31 or sq&15 is necessary)

Code:
unsigned int colorOfSquare (unsigned int sq) {
   return  (0xAA55AA55 >> sq) & 1;
}


or
Code:
unsigned int colorOfSquare (unsigned int sq) {
   return  ((sq>>3)^ sq) & 1;
}
Back to top
View user's profile Send private message
H.G.Muller



Joined: 10 Mar 2006
Posts: 12765
Location: Amsterdam

PostPosted: Tue Apr 24, 2007 9:51 am    Post subject: Re: Two small in-register-lookups Reply to topic Reply with quote

Why would you want to use in-register lookups? Fetching such simple things from a memory table should be very competative, if not faster. I noticed that these out-of-order CPUs often perform very poorly if you only use registers, it seems that there is some internal bottleneck on how many registers you can access per clock. Having some of the operands come from memory is usually more successful in keeping the ALUs busy.

On AMD machines load and stores are piggybacked on ALU makro ops, so there is no decoder penalty in using instructions with memory operands. uOp fusion in Intel machines now does the same (Pentium M and Core CPUs), while Pentium IV circumvented decoder limits by using a trace cache.

In addition, shifts are usually a bottleneck, as most CPUs have only one ALU capable of doing shifts, wich can become a bottleneck if your code uses them heavily. So better reserve them for what you really can't do without. (Plus, the i86 machine language requires a variable shift count to be in one specific register, leading to extra register-to-register moves to implement them.)

If you are worried about L1 footprint, you could pack several such tables in 64 byte. E.g. bits 0 and 1 the center distance, bit 2 the square color, etc.

Code:
colorOfSquare  = Data[sqr] & 4; // assuming it is used as a boolean, so 4=true
centerDistance = Data[sqr] & 3;

(Btw, Joker uses Wesley's method for square color, but I am considering to change that.)
Back to top
View user's profile Send private message Visit poster's website
Gerd Isenberg



Joined: 08 Mar 2006
Posts: 1785
Location: Hattingen, Germany

PostPosted: Tue Apr 24, 2007 10:58 am    Post subject: Re: Two small in-register-lookups Reply to topic Reply with quote

jwes wrote:
Gerd Isenberg wrote:
Color of square - assumes 32-bit shift by mod 32.
Is that portable for 32-bit ints?
(otherwise sq&31 or sq&15 is necessary)

Code:
unsigned int colorOfSquare (unsigned int sq) {
   return  (0xAA55AA55 >> sq) & 1;
}


or
Code:
unsigned int colorOfSquare (unsigned int sq) {
   return  ((sq>>3)^ sq) & 1;
}


Yes, I am aware of that. It has one dependent instruction more and with my mapping (a1 == 0, -> dark square), I even need one additional xor 1 or not Wink

Thus loading the immediate constant to eax, plus shift, mask looks not that bad, despite variable shift needs cl.

Code:
; ecx == sq
mov eax, AA55AA55H
shr eax, cl
and eax, 1
Back to top
View user's profile Send private message
Gerd Isenberg



Joined: 08 Mar 2006
Posts: 1785
Location: Hattingen, Germany

PostPosted: Tue Apr 24, 2007 11:00 am    Post subject: Re: Two small in-register-lookups Reply to topic Reply with quote

hgm wrote:
Why would you want to use in-register lookups? Fetching such simple things from a memory table should be very competative, if not faster. I noticed that these out-of-order CPUs often perform very poorly if you only use registers, it seems that there is some internal bottleneck on how many registers you can access per clock. Having some of the operands come from memory is usually more successful in keeping the ALUs busy.


Bit-twiddling addiction Wink
Some alternative methods to compute such data - no serious claims intended. Quite modest register usage on x64, RAX (EAX) and CL, one variable shift. Sometimes with some other memory accesses around such computation may be helpfull to find the "best" balance between memory accesss and register computation...


hgm wrote:
On AMD machines load and stores are piggybacked on ALU makro ops, so there is no decoder penalty in using instructions with memory operands. uOp fusion in Intel machines now does the same (Pentium M and Core CPUs), while Pentium IV circumvented decoder limits by using a trace cache.

In addition, shifts are usually a bottleneck, as most CPUs have only one ALU capable of doing shifts, wich can become a bottleneck if your code uses them heavily. So better reserve them for what you really can't do without. (Plus, the i86 machine language requires a variable shift count to be in one specific register, leading to extra register-to-register moves to implement them.)

If you are worried about L1 footprint, you could pack several such tables in 64 byte. E.g. bits 0 and 1 the center distance, bit 2 the square color, etc.

Code:
colorOfSquare  = Data[sqr] & 4; // assuming it is used as a boolean, so 4=true
centerDistance = Data[sqr] & 3;

(Btw, Joker uses Wesley's method for square color, but I am considering to change that.)
Back to top
View user's profile Send private message
H.G.Muller



Joined: 10 Mar 2006
Posts: 12765
Location: Amsterdam

PostPosted: Tue Apr 24, 2007 12:37 pm    Post subject: Re: Two small in-register-lookups Reply to topic Reply with quote

You could of course replace the (sqr>>3^sqr)&1 by
Code:
9*sqr & 8

for the purpose of making a boolean.

The multiplication could be implemented with a single LEA instruction:
Code:
    movl  _sqr, %eax
    leal (%eax, %eax, 8), %eax
    andl  $8, %eax
Back to top
View user's profile Send private message Visit poster's website
Gerd Isenberg



Joined: 08 Mar 2006
Posts: 1785
Location: Hattingen, Germany

PostPosted: Tue Apr 24, 2007 6:36 pm    Post subject: Re: Two small in-register-lookups Reply to topic Reply with quote

hgm wrote:
You could of course replace the (sqr>>3^sqr)&1 by
Code:
9*sqr & 8

for the purpose of making a boolean.

The multiplication could be implemented with a single LEA instruction:
Code:
    movl  _sqr, %eax
    leal (%eax, %eax, 8), %eax
    andl  $8, %eax


Cute, lea takes only one register.
Since I need "1" for dark squares (square a1 == 0) and "0" for light squares, lea eax+8*eax+8 would be nice...

But... lea with scaling takes two cycles on K8.
And i can't force vc2005 to generate that code. It always uses imul -7 Wink

Code:
((9*sqr+8)>>3) & 1

Code:
 0000  8D 44 C9 08  lea eax, [eax*8 + eax + 8]
 0004  C1 E8 01     shr eax, 3
 0007  83 E0 01     and eax, 1

But with sq already in ecx, i still would prefere the in-register lookup.
Codesize is exactly the same

Code:
 0000  B8 55 AA 55 AA mov eax, aa55aa55
 0005  D3 E8          shr eax, cl
 0007  83 E0 01       and eax, 1
Back to top
View user's profile Send private message
Gerd Isenberg



Joined: 08 Mar 2006
Posts: 1785
Location: Hattingen, Germany

PostPosted: Tue Apr 24, 2007 8:11 pm    Post subject: Re: Two small in-register-lookups Reply to topic Reply with quote

Three of zillions versions for a boolean subroutine, to look whether two squares share same color. And what vc2005 makes out of it...

Code:
bool sameSquareColor(unsigned int sq1, unsigned int sq2) {
    //return  ((0xAA55AA55 >> sq1) ^ (sq2*9 >> 3)) & 1;
    //return  (( ((sq1 ^ sq2)>>3) ^ sq1 ^ sq2 ) ^ 1) & 1;
    return  ( (0xAA55AA55 >> sq1) ^ (sq2>>3) ^ sq2 ) & 1;
}

Code:
?sameSquareColor@@YA_NII@Z PROC
  00000   6b d2 f9    imul    edx, -7
  00003   b8 55 aa 55 aa    mov    eax, aa55aa55H
  00008   d3 e8       shr    eax, cl
  0000a   c1 ea 03    shr    edx, 3
  0000d   33 c2       xor    eax, edx
  0000f   83 e0 01    and    eax, 1
  00012   c3       ret    0

Code:
?sameSquareColor@@YA_NII@Z PROC
  00000   8b c1       mov    eax, ecx
  00002   33 c2       xor    eax, edx
  00004   c1 e8 03    shr    eax, 3
  00007   f7 d0       not    eax
  00009   33 c1       xor    eax, ecx
  0000b   33 c2       xor    eax, edx
  0000d   83 e0 01    and    eax, 1
  00010   c3       ret    0

Code:

?sameSquareColor@@YA_NII@Z PROC
  00000   b8 55 aa 55 aa    mov    eax, aa55aa55H
  00005   d3 e8       shr    eax, cl
  00007   8b ca       mov    ecx, edx
  00009   c1 e9 03    shr    ecx, 3
  0000c   33 c1       xor    eax, ecx
  0000e   33 c2       xor    eax, edx
  00010   83 e0 01    and    eax, 1
  00013   c3       ret    0

handmade assembly (ml64)

Code:
 00000000  _sameSquareColor PROC
 00000000  8D 44 C9 08      lea eax, [rcx + 8*rcx + 8]
 00000004  8D 0C D2      lea ecx, [rdx + 8*rdx]
 00000007  33 C1      xor eax, ecx
 00000009  C1 E8 03      shr eax, 3
 0000000C  83 E0 01      and eax, 1
 0000000F  C3         ret    0
Back to top
View user's profile Send private message
H.G.Muller



Joined: 10 Mar 2006
Posts: 12765
Location: Amsterdam

PostPosted: Tue Apr 24, 2007 9:48 pm    Post subject: Re: Two small in-register-lookups Reply to topic Reply with quote

You can XOR the squares first, and then do the test for one square.

Code:
bool sameSquareColor(unsigned int sq1, unsigned int sq2) {
    sq1 ^= sq2;
    return  ( (0xAA55AA55 >> sq1) & 1);
}
Back to top
View user's profile Send private message Visit poster's website
Gerd Isenberg



Joined: 08 Mar 2006
Posts: 1785
Location: Hattingen, Germany

PostPosted: Tue Apr 24, 2007 9:57 pm    Post subject: Re: Two small in-register-lookups Reply to topic Reply with quote

hgm wrote:
You can XOR the squares first, and then do the test for one square.

Code:
bool sameSquareColor(unsigned int sq1, unsigned int sq2) {
    sq1 ^= sq2;
    return  ( (0xAA55AA55 >> sq1) & 1);
}


Of course! Looks quite competetive now.

Code:
?sameSquareColor@@YA_NII@Z PROC
  00000   33 ca       xor    ecx, edx
  00002   b8 55 aa 55 aa    mov    eax, aa55aa55H
  00005   d3 e8       shr    eax, cl
  00007   83 e0 01    and    eax, 1
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic       TalkChess.com Forum Index -> Computer Chess Club: Programming and Technical Discussions All times are GMT
Goto page 1, 2  Next
Threaded
Page 1 of 2

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum




Powered by phpBB © 2001, 2005 phpBB Group
Enhanced with Moby Threads