Speed comparison for various engines on ARM vs Core 2 Duo

Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Post by Tord Romstad »

Hi all,

I benchmarked a few chess engines on my iPod Touch (412 MHz ARM) and my iMac (2.8 GHz Core 2 Duo). On the Core 2 Duo, all programs were using a single core. The table below shows the ratios (nps on Core 2 Duo) / (nps on ARM) for the programs I tested:

Phalanx XXII:         29.11
Fruit 2.2:            42.1
Glaurung 1.2.1:       42.7
Strelka 2.0B:         85.7
Glaurung 2.2 magic:   91.4
Glaurung 2.2 hq:     122.0
"Glaurung 2.2 magic" is the public Glaurung 2.2, while "Glaurung 2.2 hq" is the same program with magic bitboards replaced by the hyperbola quintessence method invented by Gerd Isenberg and Aleks Peshkov. If anyone is interested, I can make the source code available.

It seems clear that bitboards of all flavors (Strelka uses rotated bitboards) are highly inefficient on the ARM, at least with the currently most popular techniques. It's probably time to develop a non-bitboard version of Glaurung 2...
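To separate the effect of raw clock speed from the effect of the architecture, the ratios above can be divided by the ratio of the two clock rates. The sketch below does only that arithmetic; the function name `per_clock_ratio` is an illustrative assumption, and clock rate is of course not the only difference between the two machines (caches, memory, and 32-bit versus 64-bit registers all play a part).

```cpp
#include <cassert>

// How many times more nodes per clock cycle the faster machine searches,
// given the raw nps ratio and both clock rates in MHz.
// Hypothetical helper for illustration; not part of any engine.
double per_clock_ratio(double nps_ratio, double fast_mhz, double slow_mhz) {
    assert(fast_mhz > 0.0 && slow_mhz > 0.0);
    return nps_ratio / (fast_mhz / slow_mhz);
}
```

With 2800 MHz and 412 MHz the clock ratio alone is about 6.8, so the per-clock gap runs from roughly 4.3 for Phalanx XXII to roughly 18 for Glaurung 2.2 hq.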

If somebody has any other open-source programs they want to see benchmarked, I am happy to do the test, as long as the program can be compiled for the ARM without too many changes. In particular, the program should compile cleanly on modern Unix systems, and not use any x86 assembly language except for bitscanning or reversal of the order of bytes in a word (I already have ARM assembly language equivalents for those).
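As an illustration of what a portable replacement for an x86 bit-scan instruction can look like, here is a minimal sketch that finds the lowest set bit of a 64-bit word using only 32-bit operations. It is not the ARM assembly mentioned above; the function names are assumptions, and the 32-bit step uses the widely published de Bruijn multiplication trick.

```cpp
#include <cassert>
#include <cstdint>

// Index (0..31) of the least significant set bit of a nonzero 32-bit word.
// Isolates the lowest bit, multiplies by a de Bruijn constant, and uses the
// top five bits of the product as a table index.
int bit_scan_forward32(std::uint32_t w) {
    static const int index32[32] = {
         0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
        31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
    };
    assert(w != 0);
    const std::uint32_t lowest = w & (0u - w);
    return index32[static_cast<std::uint32_t>(lowest * 0x077CB531u) >> 27];
}

// Index (0..63) of the least significant set bit of a nonzero 64-bit word,
// scanning the low half first so that no 64-bit arithmetic is needed.
int bit_scan_forward64(std::uint64_t b) {
    assert(b != 0);
    const std::uint32_t lo = static_cast<std::uint32_t>(b);
    if (lo != 0)
        return bit_scan_forward32(lo);
    return 32 + bit_scan_forward32(static_cast<std::uint32_t>(b >> 32));
}
```

Splitting the word into halves costs a branch, but keeps every operation inside a single 32-bit register.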

Hyperbola quintessence is a lot simpler and more elegant than magic multiplication, by the way. It also seems to perform just as well as magic multiplication in 64-bit mode on the Core 2 Duo. Unfortunately, magic multiplication is still a lot faster in 32-bit mode.
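For readers who have not met the technique, here is a minimal sketch of hyperbola quintessence for file and diagonal attacks. It is not taken from Glaurung 2.2 hq: the names `byte_swap`, `line_mask` and `hq_line_attacks` are illustrative assumptions, squares are assumed to be numbered a1 = 0 through h8 = 63, and a real engine would precompute the masks and use a hardware byte swap. Rank attacks are left out, because a byte swap does not reverse the order of squares within a rank.

```cpp
#include <cassert>
#include <cstdint>

using Bitboard = std::uint64_t;

// Portable byte reversal (flips the board vertically).
Bitboard byte_swap(Bitboard b) {
    b = ((b >> 8) & 0x00FF00FF00FF00FFULL) | ((b & 0x00FF00FF00FF00FFULL) << 8);
    b = ((b >> 16) & 0x0000FFFF0000FFFFULL) | ((b & 0x0000FFFF0000FFFFULL) << 16);
    return (b >> 32) | (b << 32);
}

// All squares on the line through `sq` along (df, dr) in both directions,
// excluding `sq` itself. dr must be nonzero for the byte-swap trick to work.
Bitboard line_mask(int sq, int df, int dr) {
    assert(sq >= 0 && sq < 64 && dr != 0);
    Bitboard mask = 0;
    for (int sign = -1; sign <= 1; sign += 2) {
        int f = sq % 8 + sign * df, r = sq / 8 + sign * dr;
        while (f >= 0 && f < 8 && r >= 0 && r < 8) {
            mask |= Bitboard(1) << (r * 8 + f);
            f += sign * df;
            r += sign * dr;
        }
    }
    return mask;
}

// Sliding attacks along one line. Subtracting the slider bit from the masked
// occupancy fills the squares up to the first blocker in the upward
// direction; doing the same on the byte-swapped board covers the downward
// direction.
Bitboard hq_line_attacks(Bitboard occ, int sq, Bitboard mask) {
    const Bitboard slider = Bitboard(1) << sq;
    Bitboard forward = occ & mask;
    Bitboard reverse = byte_swap(forward);
    forward -= slider;
    reverse -= byte_swap(slider);
    forward ^= byte_swap(reverse);
    return forward & mask;
}

Bitboard hq_file_attacks(Bitboard occ, int sq) {
    return hq_line_attacks(occ, sq, line_mask(sq, 0, 1));
}

Bitboard hq_bishop_attacks(Bitboard occ, int sq) {
    return hq_line_attacks(occ, sq, line_mask(sq, 1, 1))
         | hq_line_attacks(occ, sq, line_mask(sq, 1, -1));
}
```

The only per-square data needed is one mask per line, which is why no large precomputed attack tables are involved. The cost is several 64-bit subtractions and byte swaps per lookup, each of which takes more than one instruction on a 32-bit CPU.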

Tord
Greg Strong
Posts: 388
Joined: Sun Dec 21, 2008 6:57 pm
Location: Washington, DC

Re: Speed comparison for various engines on ARM vs Core 2 Du

Post by Greg Strong »

Actually, I'd be interested in the source code for Glaurung 2.2 hq, if it's not too much trouble...
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Speed comparison for various engines on ARM vs Core 2 Du

Post by mcostalba »

Tord Romstad wrote: It's probably time to develop a non-bitboard version of Glaurung 2...
Hi Tord,

What do you think of the possible improvements to magic bitboards by Lasse Hansen and Grant Osborne, namely table sharing and shift folding?

http://chessprogramming.wikispaces.com/Magic+Bitboards

See the sections "Stay Tuned" and "Incorporating the Shift".

These, especially the first, seem to halve the memory used compared with the standard case. On other architectures such as ARM, this reduced memory footprint could be a winning choice, and it seems good on the PC as well.
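To put the memory question in numbers, the sketch below counts the entries in the standard magic attack tables, which is the baseline any sharing scheme would be measured against. It does not implement table sharing or shift folding. The function names are illustrative assumptions, squares are numbered a1 = 0 through h8 = 63, and "plain" and "fancy" follow the usage on the wiki page linked above: a plain table gives every square the worst-case size, a fancy table gives each square exactly the size it needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bitboard = std::uint64_t;

// Number of set bits (portable loop; engines typically use a faster method).
int count_bits(Bitboard b) {
    int n = 0;
    for (; b != 0; b &= b - 1) ++n;
    return n;
}

// Squares whose occupancy can change the attack set of a rook or bishop on
// `sq`. The last square of each ray is left out: a piece standing there is
// attacked either way and can hide nothing behind it.
Bitboard relevant_mask(int sq, bool rook) {
    assert(sq >= 0 && sq < 64);
    static const int rook_dirs[4][2]   = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
    static const int bishop_dirs[4][2] = {{1, 1}, {1, -1}, {-1, 1}, {-1, -1}};
    const int (*dirs)[2] = rook ? rook_dirs : bishop_dirs;
    Bitboard mask = 0;
    for (int d = 0; d < 4; ++d) {
        const int df = dirs[d][0], dr = dirs[d][1];
        int f = sq % 8 + df, r = sq / 8 + dr;
        // Keep a square only if the ray continues past it.
        while (f + df >= 0 && f + df < 8 && r + dr >= 0 && r + dr < 8) {
            mask |= Bitboard(1) << (r * 8 + f);
            f += df;
            r += dr;
        }
    }
    return mask;
}

// Fancy layout: each square gets exactly 2^(its relevant bits) entries.
std::size_t fancy_entries(bool rook) {
    std::size_t total = 0;
    for (int sq = 0; sq < 64; ++sq)
        total += std::size_t(1) << count_bits(relevant_mask(sq, rook));
    return total;
}

// Plain layout: every square gets 2^(worst-case relevant bits) entries.
std::size_t plain_entries(bool rook) {
    int max_bits = 0;
    for (int sq = 0; sq < 64; ++sq) {
        const int bits = count_bits(relevant_mask(sq, rook));
        if (bits > max_bits) max_bits = bits;
    }
    return std::size_t(64) << max_bits;
}
```

Assuming 8-byte entries, the rook table comes to 2 MiB in the plain layout and 800 KiB in the fancy layout; the bishop table comes to 256 KiB and 41 KiB respectively. Either way the rook table is far larger than a small data cache.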

Does a reference implementation exist somewhere? Or do you know of an open-source engine that uses this technique?

Thanks
Marco
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Speed comparison for various engines on ARM vs Core 2 Du

Post by Gerd Isenberg »

Tord Romstad wrote: Hyperbola quintessence is a lot simpler and more elegant than magic multiplication, by the way. It also seems to perform just as well as magic multiplication in 64-bit mode on the Core 2 Duo. Unfortunately, magic multiplication is still a lot faster in 32-bit mode.

Tord
Yes, register usage of magic bitboards is magnificent - even in 32-bit mode.
Does that ARM thing have 64-bit SIMD integer instructions?
Gerd Isenberg

Re: Speed comparison for various engines on ARM vs Core 2 Du

Post by Gerd Isenberg »

Gerd Isenberg wrote: Does that ARM thing have 64-bit SIMD integer instructions?
Yes, see "NEON Features and Benefits".
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Speed comparison for various engines on ARM vs Core 2 Du

Post by Zach Wegner »

Tord Romstad wrote:Hi all,

I benchmarked a few chess engines on my iPod Touch (412 MHz ARM) and my iMac (2.8 GHz Core 2 Duo). On the Core 2 Duo, all programs were using a single core. The table below shows the ratios (nps on Core 2 Duo) / (nps on ARM) for the programs I tested:

Phalanx XXII:         29.11
Fruit 2.2:            42.1
Glaurung 1.2.1:       42.7
Strelka 2.0B:         85.7
Glaurung 2.2 magic:   91.4
Glaurung 2.2 hq:     122.0
"Glaurung 2.2 magic" is the public Glaurung 2.2, while "Glaurung 2.2 hq" is the same program with magic bitboards replaced by the hyperbola quintessence method invented by Gerd Isenberg and Aleks Peshkov. If anyone is interested, I can make the source code available.

It seems clear that bitboards of all flavors (Strelka uses rotated bitboards) are highly inefficient on the ARM, at least with the currently most popular techniques. It's probably time to develop a non-bitboard version of Glaurung 2...

If somebody has any other open-source programs they want to see benchmarked, I am happy to do the test, as long as the program can be compiled for the ARM without too many changes. In particular, the program should compile cleanly on modern Unix systems, and not use any x86 assembly language except for bitscanning or reversal of the order of bytes in a word (I already have ARM assembly language equivalents for those).

Hyperbola quintessence is a lot simpler and more elegant than magic multiplication, by the way. It also seems to perform just as well as magic multiplication in 64-bit mode on the Core 2 Duo. Unfortunately, magic multiplication is still a lot faster in 32-bit mode.

Tord
I'd like to see how ZCT's bitboards perform on it. My attack function is very 32-bit friendly. Of course, it compiles fine on Unix systems. 8-)
Tord Romstad

Re: Speed comparison for various engines on ARM vs Core 2 Du

Post by Tord Romstad »

Greg Strong wrote: Actually, I'd be interested in the source code for Glaurung 2.2 hq, if it's not too much trouble...
No problem at all. Here it is: http://www.glaurungchess.com/g22-hq.tar.gz

With the default settings, the program uses the 64-bit "bswapq" assembly language instruction, which is only available on x86-64. For a 32-bit binary, you will have to add "#define USE_32BIT_BSWAP" at the beginning of bitboard.h.
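A minimal sketch of what such a switch can look like is shown below. It is not the code from the archive: only the macro name `USE_32BIT_BSWAP` comes from the post, while the function names and the fallback logic are assumptions. The idea is to swap the bytes of each 32-bit half and then exchange the halves.

```cpp
#include <cstdint>

// Reverse the four bytes of a 32-bit word.
constexpr std::uint32_t byte_swap32(std::uint32_t w) {
    return (w >> 24) | ((w >> 8) & 0x0000FF00u)
         | ((w << 8) & 0x00FF0000u) | (w << 24);
}

// Reverse the eight bytes of a 64-bit word using two 32-bit swaps:
// the swapped low half becomes the new high half and vice versa.
constexpr std::uint64_t byte_swap64_via32(std::uint64_t b) {
    return (static_cast<std::uint64_t>(byte_swap32(static_cast<std::uint32_t>(b))) << 32)
         | byte_swap32(static_cast<std::uint32_t>(b >> 32));
}

static_assert(byte_swap64_via32(0x0102030405060708ULL) == 0x0807060504030201ULL,
              "byte order must be fully reversed");

// Hypothetical dispatcher: pick the 32-bit path when the macro is defined,
// otherwise let GCC or Clang emit a native byte swap where one exists.
inline std::uint64_t reverse_bytes(std::uint64_t b) {
#if defined(USE_32BIT_BSWAP) || !defined(__GNUC__)
    return byte_swap64_via32(b);
#else
    return __builtin_bswap64(b);
#endif
}
```

On a 32-bit target the two half-swaps are all the hardware can do anyway, so the 64-bit reversal costs at least twice as much as on x86-64.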

I particularly like that the initialization code is shorter and cleaner, and that I don't need any big precomputed constant arrays in my code. It's a pity it's not fast enough on 32-bit systems, otherwise I would probably have thrown magic bitboards out now.

Tord
Tord Romstad

Re: Speed comparison for various engines on ARM vs Core 2 Du

Post by Tord Romstad »

mcostalba wrote: Hi Tord,

What do you think of the possible improvements to magic bitboards by Lasse Hansen and Grant Osborne, namely table sharing and shift folding?

http://chessprogramming.wikispaces.com/Magic+Bitboards

See the sections "Stay Tuned" and "Incorporating the Shift".

I think it is at best a very minor improvement, and therefore I haven't found it worth trying so far.

mcostalba wrote: These, especially the first, seem to halve the memory used compared with the standard case. On other architectures such as ARM, this reduced memory footprint could be a winning choice, and it seems good on the PC as well.

I also thought the memory footprint was the problem on the ARM at first, but seeing how miserably rotated bitboards (Strelka) and HQ (the experimental Glaurung version) performed, I don't think there is much hope of making bitboards efficient on the ARM at all, at least not without lots of assembly language.

mcostalba wrote: Does a reference implementation exist somewhere? Or do you know of an open-source engine that uses this technique?

No, I am not aware of any. My impression is that most programmers are skeptical and don't believe these ideas will improve the speed of bitboard operations in practice. They are sufficiently interesting that they deserve to be tried, but everybody is waiting for somebody else to do it. :)

Tord
Tord Romstad

Re: Speed comparison for various engines on ARM vs Core 2 Du

Post by Tord Romstad »

Gerd Isenberg wrote:
Gerd Isenberg wrote: Does that ARM thing have 64-bit SIMD integer instructions?
Yes, see "NEON Features and Benefits".

Thanks for the pointer. I'll have a look and see if I find something I can use.

Tord
Tord Romstad

Re: Speed comparison for various engines on ARM vs Core 2 Du

Post by Tord Romstad »

Zach Wegner wrote:I'd like to see how ZCT's bitboards perform on it. My attack function is very 32-bit friendly. Of course, it compiles fine on Unix systems. 8-)
I would like to try, but it seems difficult to download the code. It looks like I am not allowed to check out the code from CVS:

cvs login: authorization failed: server zct.cvs.sourceforge.net rejected access to /cvsroot/zct for user anonymous
I can view the individual files online, but downloading them one by one is a little too much effort. Do you have a .tar.gz bundle somewhere?

Tord