Speed up factor when moving from 32 bit to 64 bit operations

sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Post by sje »

After successfully adapting Symbolic to run in native 64 bit mode, I have observed a speed improvement of about 30 percent. The number would have been even higher had the pointer representation size not doubled, with its associated memory bandwidth overhead. (Symbolic uses many one way and two way linked lists, and nearly all move data is passed by reference rather than by value.)
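
To make the pointer overhead concrete, here is a minimal sketch of a two way linked list node. The struct layouts and names are hypothetical and are not taken from Symbolic; the second struct only stands in for the footprint the same node would have with 32 bit links.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two way linked move node (not Symbolic's actual layout). */
struct move_node {
    struct move_node *prev;  /* 4 bytes on a 32 bit build, 8 on a 64 bit build */
    struct move_node *next;
    uint32_t move;           /* packed move data (assumed encoding) */
};

/* Stand-in for the 32 bit footprint: the same node with 32 bit links. */
struct move_node_32 {
    uint32_t prev;
    uint32_t next;
    uint32_t move;
};

/* Size of the node as compiled for the current target. */
size_t node_bytes_with_pointers(void) { return sizeof(struct move_node); }

/* Size of the node when every link is 32 bits wide. */
size_t node_bytes_with_32bit_links(void) { return sizeof(struct move_node_32); }
```

On a typical 64 bit target the pointer version comes out at 24 bytes (two 8 byte pointers, the 4 byte move, and 4 bytes of padding) against 12 bytes for the 32 bit layout, so every list traversal pulls about twice as much data through the cache.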

I haven't tested the ChessLisp interpreter yet, but as it is so heavily pointer dependent there may be no net speed improvement at all.
CRoberson
Posts: 2055
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Re: Speed up factor when moving from 32 bit to 64 bit operat

Post by CRoberson »

What you are seeing is consistent with the other reports.

If you use bitboard data structures, you get about 2x.
Otherwise, you get 1.3x to 1.4x gain in speed.

Of course, you nailed the performance issue. Anything heavy in pointers
doubles the memory and bandwidth needed for those pointers.

Telepath gained 2x in the 64 bit port. It uses bitboards and I passed
everything by value. I passed by value in order to make a port to
multi-processing or multi-threading much easier. I was quite stack
abusive.
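
As a rough illustration of why bitboard code benefits most from 64 bit registers, the sketch below shifts a board one rank up, first as a single 64 bit operation and then as a 32 bit build would have to do it, on two halves with a carry between them. The type and function names are hypothetical and not taken from Telepath or Symbolic.

```c
#include <stdint.h>

/* 64 bit build: the whole board moves in one shift. */
uint64_t shift_north64(uint64_t bb) {
    return bb << 8;
}

/* 32 bit build: the board is split into two halves (hypothetical layout). */
typedef struct {
    uint32_t lo;  /* ranks 1-4 */
    uint32_t hi;  /* ranks 5-8 */
} bb32_t;

/* The same shift needs two shifts plus a carry of rank 4 into rank 5. */
bb32_t shift_north32(bb32_t bb) {
    bb32_t r;
    r.hi = (bb.hi << 8) | (bb.lo >> 24);
    r.lo = bb.lo << 8;
    return r;
}
```

Every bitboard AND, OR, and shift pays a similar cost on a 32 bit build, which fits the roughly 2x figure reported for bitboard programs.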
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Speed up factor when moving from 32 bit to 64 bit operat

Post by sje »

Well, there might be a few more optimizations to try, but I doubt there will be any spectacular changes.

Before the changeover, rotated bitboards (selected via a compile time switch) slowed down the program by a few percent. Now, using them speeds up the program by about four percent. Then again, I also upgraded the main testing machine to use four ECC FB-DIMMs instead of two, so the CPU can slurp from twice as many memory chips at the same time; this likely changed the rotated/non-rotated results a bit.

I now have five GB total RAM on the machine, soon to go to eight GB and then to a full sixteen GB. I'll be doing some experiments with larger transposition tables at that point. The computer, a Mac Pro, has eight memory slots and can take 32 GB if one can afford the costly 4 GB FB-DIMMs. The box may also be able to handle 8 GB parts if they were manufactured; this would require a 36 bit address bus (64 GB total), and I think that's the limit of my machine.
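
The memory figures above can be checked with a little arithmetic. This is only a sketch with hypothetical helper names; it treats GB as 2^30 bytes.

```c
#include <stdint.h>

/* Bytes reachable with the given number of address bits (must be below 64). */
uint64_t addressable_bytes(unsigned address_bits) {
    return (uint64_t)1 << address_bits;
}

/* Total capacity in GB for a number of slots filled with equal-size DIMMs. */
uint64_t total_gb(unsigned slots, unsigned gb_per_dimm) {
    return (uint64_t)slots * gb_per_dimm;
}
```

Eight slots of 4 GB parts give 32 GB, eight slots of 8 GB parts give 64 GB, and 2^36 bytes is exactly 64 GB, so fully populating the machine with 8 GB parts would use the whole 36 bit address space.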