64 bits cpus

plattyaj

Re: 64 bits cpus

Post by plattyaj »

bob wrote: No. And it is likely that the locks will be an issue. You would probably have to write a set of lock/unlock/etc. that is (I assume) PPC compatible, since I suspect that is your target platform? Or you could make them use the pthread_mutex() stuff, which is a bit less efficient.

At some point in AIX's history, IBM made the mutex stuff do a spin loop for a couple of hundred cycles before giving up and dropping down to a true kernel lock. Probably not as efficient as something designed for the purpose, but it might not be such a killer either.
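
The spin-then-block idea can be approximated in user code on top of plain pthreads. The following is only a minimal sketch of the general technique, not IBM's implementation: `SPIN_TRIES`, `spin_then_block_lock` and `spin_then_block_unlock` are made-up names, and the spin count is an arbitrary placeholder that would need tuning.

```c
#include <pthread.h>

/* Hypothetical tuning value: trylock attempts before falling back to blocking. */
#define SPIN_TRIES 200

/* Poll the mutex a bounded number of times, then block in pthread_mutex_lock(). */
static void spin_then_block_lock(pthread_mutex_t *m)
{
    for (int i = 0; i < SPIN_TRIES; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return;                 /* acquired while spinning */
    }
    pthread_mutex_lock(m);          /* give up spinning and wait for the lock */
}

static void spin_then_block_unlock(pthread_mutex_t *m)
{
    pthread_mutex_unlock(m);
}
```

If the lock is normally held for only a few instructions, the spinning phase usually wins it and the blocking path is never taken.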

I haven't looked at your lock code, but I'm assuming compare-and-swap primitives are in there somewhere. Here's CAS for (32-bit) PPC:

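        .extern .compare_and_swap[pr]

        .csect  .ap_CompareAndSwap[pr],2
        .globl  .ap_CompareAndSwap[pr]
        sync                            # memory barrier before the compare-and-swap
        stu     1,-64(1)                # push a 64-byte stack frame (store r1 with update)
        mflr    0                       # copy the return address from the link register
        st      0,72(1)                 # save it in the caller's frame (64 + 8)
        bl      .compare_and_swap[pr]   # call the system compare_and_swap routine
        oril    0,0,0                   # no-op slot after the call
        l       12,72(1)                # reload the saved return address
        cal     1,64(1)                 # pop the stack frame
        mtlr    12                      # restore the link register
        br                              # return to the caller
        .long   0               # mark beginning of literal pool
        .short  12              # subprogram type is assembly
        .short  0               #

        .toc
        .csect  ap_CompareAndSwap[ds]
        .globl  ap_CompareAndSwap[ds]
        .long   .ap_CompareAndSwap[pr]  # function descriptor: entry point
        .long   TOC[TC0]                # function descriptor: TOC anchor
        .long   0                       # function descriptor: environment pointer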
Andy.
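
For anyone wondering how a lock gets built on top of a compare-and-swap primitive like the one wrapped above, here is a rough sketch. It swaps in C11 `<stdatomic.h>` for the AIX routine so that it compiles anywhere with a C11 compiler. The names `cas_lock_t`, `cas_try_lock`, `cas_lock` and `cas_unlock` are invented for illustration and are not taken from either poster's code.

```c
#include <stdatomic.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
typedef struct { atomic_int held; } cas_lock_t;

static void cas_lock_init(cas_lock_t *l)
{
    atomic_init(&l->held, 0);
}

/* One compare-and-swap attempt: returns 1 if the lock was acquired, else 0. */
static int cas_try_lock(cas_lock_t *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->held, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin until acquired; wait on plain loads between CAS attempts. */
static void cas_lock(cas_lock_t *l)
{
    while (!cas_try_lock(l)) {
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
            ;                       /* busy-wait until the lock looks free */
    }
}

static void cas_unlock(cas_lock_t *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

Waiting on plain loads and only retrying the CAS once the lock looks free keeps the waiters from hammering the cache line with atomic writes.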
bob

Re: 64 bits cpus

Post by bob »

Actually, I have seen more than one "spin for a bit, then block" implementation. Luckily, I no longer lock the hash and such, so lock performance is far less critical than it was 15 years ago when I did the parallel search in Crafty. Normal mutex-type locks won't be horrible today, although the quick spinlocks are a bit better overall.
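
The post does not say how the hash lock was dropped. One well-known way to do it is the XOR check: store the key XORed with the data, so that an entry torn by two threads writing at once fails verification on probe. The sketch below is an illustration of that general technique under that assumption; `hash_entry_t`, `hash_store` and `hash_probe` are hypothetical names, and the two-word entry layout is simplified.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical two-word entry: the first word holds key XOR data. */
typedef struct {
    _Atomic uint64_t key_xor_data;
    _Atomic uint64_t data;
} hash_entry_t;

/* Store without a lock; the two words may be interleaved with another writer. */
static void hash_store(hash_entry_t *e, uint64_t key, uint64_t data)
{
    atomic_store_explicit(&e->key_xor_data, key ^ data, memory_order_relaxed);
    atomic_store_explicit(&e->data, data, memory_order_relaxed);
}

/* Probe without a lock; returns 1 and fills *out only if the words are consistent. */
static int hash_probe(hash_entry_t *e, uint64_t key, uint64_t *out)
{
    uint64_t x = atomic_load_explicit(&e->key_xor_data, memory_order_relaxed);
    uint64_t d = atomic_load_explicit(&e->data, memory_order_relaxed);
    if ((x ^ d) != key)
        return 0;                   /* wrong position or torn entry: treat as a miss */
    *out = d;
    return 1;
}
```

A torn entry simply looks like a miss, which costs a little search effort but never feeds corrupted data into the search.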