The Raspberry Pi Thread

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

j_romang
Posts: 79
Joined: Mon May 16, 2011 2:52 am

Re: Ordering info for US residents

Post by j_romang »

The gcc __builtin_clzll is already platform-independent :wink:
Also, I benchmarked your solution against the builtin one:

Code:

#include <iostream>
#include <time.h>

#define MAX 100000000

typedef  unsigned long long int Bitboard;

inline int msb_matthew(Bitboard b) {
  int h, l, m;
  // Cast the high word explicitly so the operand is a single 32-bit register.
  __asm__("clz %0, %1" : "=r"(h) : "r"((unsigned int)(b >> 32)));
  __asm__("clz %0, %1" : "=r"(l) : "r"((unsigned int)(b)));
  m = ~((h - 32) >> 31); // ~0 (-1) if h >= 32, else 0
  return ((h+(m&l)) ^ 63);
}

inline int msb_builtin(Bitboard b) {
  return (63 - __builtin_clzll(b));
}

int main()
{
    int x;
    clock_t time;  // clock() returns clock_t
    
    x=0;
    time=clock();
    for(Bitboard b=0;b<MAX;b++)
    {
        x+=msb_matthew(b|0x1);
        x+=msb_matthew((b|0x1)<<32);
    }
    std::cout<<"matthew:"<<x<<':'<<clock()-time<<std::endl;

    x=0;
    time=clock();
    for(Bitboard b=0;b<MAX;b++)
    {
        x+=msb_builtin(b|0x1);
        x+=msb_builtin((b|0x1)<<32);
    }
    std::cout<<"builtin:"<<x<<':'<<clock()-time<<std::endl;
}
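
Output:
matthew:-458370044:10800000
builtin:-458370044:7730000

The builtin is the faster of the two on my ARMv7: about 7.7 s against 10.8 s, since clock() counts 1,000,000 ticks per second on Linux.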
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Ordering info for US residents

Post by mcostalba »

ZirconiumX wrote: Maybe that would be platform-independent enough for Marco's approval?
Hi Matthew,

I always think very carefully before adding platform-dependent code, because I'd like to get rid of it, not add more :-)

I merged Jean-Francois's patch because he reported a 2% speed-up (overall, not just for the function itself); with a bit less than that, I wouldn't have. In the case of msb I doubt the speed-up could matter: unlike lsb, it is not a performance-critical function, since it is used rarely and not in hot paths.

Anyhow, I would be much happier to merge patches other than platform-dependent code. Clean-ups or removal of useless code would be great. Of course, patches that add Elo-increasing functionality are welcome too, but they need great care to avoid bloating or slowing down the evaluation (for some reason, people's preferred target).

Marco
ZirconiumX
Posts: 1361
Joined: Sun Jul 17, 2011 11:14 am
Full name: Hannah Ravensloft

Re: Ordering info for US residents

Post by ZirconiumX »

When LZCNT is defined as "cntlzw" I get a worst-case speedup of 12% (92 knps to 103 knps) on my PowerPC computer, and when LZCNT is defined as "clz" I get a rough speedup of 12% on my Raspberry Pi (35 knps to 40 knps).

I'm pretty sure someone like Michael Hoffman might find this patch useful, because IIRC he has a K8 with very slow bitscan instructions.

I think Jean gets the speedup from his LSB method, for which ARMv6 and below would need an expensive routine to reverse the bits.

EDIT: __builtin_ctz() (or __builtin_ctzll() for a 64-bit bitboard) does what we need, though I'm not sure how fast it will be. Testers appreciated.

Matthew:out
tu ne cede malis, sed contra audentior ito
ZirconiumX
Posts: 1361
Joined: Sun Jul 17, 2011 11:14 am
Full name: Hannah Ravensloft

Re: Ordering info for US residents

Post by ZirconiumX »

Now I bring out the big guns!

Code:

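// Square and Bitboard are Stockfish's types (types.h); ARM (not Thumb) mode.
Square msb(Bitboard b) {
  uint32_t h = uint32_t(b >> 32);
  uint32_t l = uint32_t(b);
  uint32_t r;
  // One asm statement, so the flags set by CMP are still valid for the
  // conditional instructions: GCC does not keep flags between separate
  // asm statements, and the "cc" clobber tells it they are overwritten.
  __asm__("cmp   %1, #0\n\t"
          "clzeq %0, %2\n\t"       // high word empty: count zeros in low word
          "eoreq %0, %0, #31\n\t"  // 31 - clz(l)
          "clzne %0, %1\n\t"       // otherwise count zeros in high word
          "eorne %0, %0, #63"      // 63 - clz(h)
          : "=&r"(r)
          : "r"(h), "r"(l)
          : "cc");
  return Square(r);
}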
JuLieN
Posts: 2949
Joined: Mon May 05, 2008 12:16 pm
Location: Bordeaux (France)
Full name: Julien Marcel

Re: Ordering info for US residents

Post by JuLieN »

With Steven's permission I renamed this thread (it was previously "New chess computer US$25/UK£15"). :)
"The only good bug is a dead bug." (Don Dailey)
[Blog: http://tinyurl.com/predateur ] [Facebook: http://tinyurl.com/fbpredateur ] [MacEngines: http://tinyurl.com/macengines ]
JuLieN
Posts: 2949
Joined: Mon May 05, 2008 12:16 pm
Location: Bordeaux (France)
Full name: Julien Marcel

Re: Ordering info for US residents

Post by JuLieN »

Good news today: the Raspberry Pi Foundation was able to double the RasPi's RAM without changing the price tag. From now on, you'll get 512 MB of SDRAM when you buy a RasPi.

Also, since I find it easier to program on my iMac's big 27" screen, I built the tools needed to develop in assembly by cross-compiling from my Mac. I decided to share them with those of you who meet these two criteria:
- using an Intel Mac
- learning ARM assembly for the RasPi.
... hopefully I am not the only one on this forum! :lol:

Here's the readme:
MINIMAL TOOLS TO PROGRAM IN ASSEMBLY FOR THE RASPBERRY PI FROM MACOS X
----------------------------------------------------------------------

DISCLAIMER:
-----------
I only compiled these tools; they were developed by the GNU Project. To learn
more about GNU's binutils and/or get the source files, go to:
http://www.gnu.org/software/binutils/


This archive contains the following files:
------------------------------------------
- cross-as: the AS assembler from GNU's binutils. It runs on Intel Macs and produces a Raspberry Pi object file.
- cross-ld: the LD linker from GNU's binutils. It runs on Intel Macs and produces a Raspberry Pi executable file.
- hello.s: a simple "hello world" example in assembly for the Raspberry Pi. I'm not the author of this example, although I can't remember where I found it.

To assemble the example, open a terminal, cd to where you extracted the archive and enter:
> ./cross-as hello.s -o hello.o
> ./cross-ld hello.o -o hello

Then copy the executable to your RasPi, on a USB key for instance. To run it, you might have to set the executable bit again (type "chmod +x hello" in a terminal).

Have fun with assembly on the Raspberry Pi! :)
You can download the file here: http://julien.marcel.free.fr/public/cro ... -Intel.zip
j_romang
Posts: 79
Joined: Mon May 16, 2011 2:52 am

Re: Ordering info for US residents

Post by j_romang »

Yes, good news for the 512MB ram !
I just received an ARM board that is better suited to chess and development: an odroid-x (http://www.hardkernel.com/renewal_2011/ ... t_info.php), with 1 GB of RAM and a quad-core 1.4 GHz processor. Stockfish runs at more than 800,000 nps on it, and there is no need for cross-compilation tools :D