1) The (minor) problem with Xcode and crafty.c is that unless otherwise directed, Xcode will try to compile most of the files twice: once on their own and once as part of crafty.c, which includes them. In the Xcode world, it's relatively rare to see non-header files included in other files, so this is a reasonable default. I'd say that the diagnostics issued should be independent of the compilation scheme (all in one vs. separately in any order).
2) Most testing is being done on a Xeon (Woodcrest), not on a Core 2 Duo (Merom), so that may explain some of the overall timing difference.
3) Some/all of the diagnostics generated from compiling egtb.cpp are there no matter what as far as I can see. I suppose I could chase this further. I admit to an unthinking suspicion of C++ code that has numerous goto statements; I haven't used one of these in decades.
4) There's no need for:
#define lock_t volatile int
to appear in more than one place, yet it occurs in lock.h, chess.h, and egtb.cpp.
5) I'm not a big fan of PGO, and the lack of PGO use with g++ should be comparable to the lack of PGO use with icc, I'd guess. Could be wrong here, though.
6) The run-time failure diagnostic emitted by SharedMalloc() issues a repair hint limited to Linux. The hint doesn't work on OpenBSD, and it doesn't mention the four other kernel values that need to be tweaked. I note that the Mac shmmax limit of 4 MB is fairly generous, and if the total shared memory demand could fit into that, then a lot of grief could be avoided.
7) The darwinG5 symbol in the Makefile is not listed with the other target symbols in all places.
8) Symbolic doesn't use a single explicit, externally supplied host target symbol. Instead, it looks at (near-)universal preprocessor symbols like __ppc__, __ppc64__, __i386__, __x86_64__, etc. to figure out (a very few) host-specific issues. The closest easy fit with the Crafty Makefile and chess.h source would be to have a single "Apple" target host symbol trigger the appropriate secondary definitions. By the way, gcc/g++ running under OS X will always define "__APPLE__" along with the usual suspects.
9) An interim solution would be to build the ppc, ppc64, i386, and x86-64 binaries as four separate files and allow users to download the best for their system OR produce a single fat executable for download. There is still the problem of the SharedMalloc() demands; a separate note and script would be helpful.
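Points 6 and 9 ask for a repair hint that isn't Linux-only. A minimal sketch, assuming a hypothetical helper named ShmRepairHint() that a SharedMalloc()-style allocator could print when shmget() fails. The sysctl names are the ones these systems use for System V shared memory; the angle-bracket values are placeholders, not recommendations.

```c
/* Hypothetical helper: return a repair hint for the host OS instead of a
   Linux-only one. <bytes>, <n> and <pages> are placeholders. */
const char *ShmRepairHint(void)
{
#if defined(__APPLE__)
    /* Mac OS X wants all five kern.sysv values set together; shmmax must be
       a multiple of 4096 and shmall is counted in 4 KB pages. */
    return "add kern.sysv.shmmax=<bytes>, kern.sysv.shmmin=1, "
           "kern.sysv.shmmni=<n>, kern.sysv.shmseg=<n> and "
           "kern.sysv.shmall=<pages> to /etc/sysctl.conf, then reboot";
#elif defined(__OpenBSD__)
    return "sysctl kern.shminfo.shmmax=<bytes> (and check kern.shminfo.shmall)";
#elif defined(__linux__)
    return "sysctl -w kernel.shmmax=<bytes> (and check kernel.shmall)";
#else
    return "raise this system's System V shared memory limit (shmmax)";
#endif
}
```

Selecting the text at compile time keeps the hint in step with the binary that actually failed, which also suits the per-architecture downloads suggested in point 9.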
Building Crafty 22.0 on a Macintosh
One more thing about Xcode
One more thing about Xcode: it supports distributed compilation with a separate thread per core, and the cores can be distributed on the LAN. This means that non-inclusion of code files (e.g., dropping the crafty.c scheme in favor of separate compilation) results in a very fast build process. Actually, there might be more than one thread per core because of disk/network lag and that helps keep every core running at full throttle.
There might be some way of having Xcode do this with the Intel compiler as well.
Re: Building Crafty 22.0 on a Macintosh
6) 4 MB? This leaves not much space for the transposition table. 

Re: Building Crafty 22.0 on a Macintosh
Guetti wrote: 6) 4 MB? This leaves not much space for the transposition table.
Back in the Old Days when one could play with Chess 4.x on a 1960s CDC 6500 console, that program could scale its transposition table requirements down to a mere 256 entries. Each entry took a word and a half (96 bits) of ECS (Extended Core Storage). On that CDC CPU model, the program only scored a few hundred positions per second. Close to, but not quite as good as what I was able to do with a 1986 Macintosh Plus running at 8 MHz.
And yet, back then in the 1960s and early 1970s with punch cards, paper tape, and no Internet, the relative paucity of chess program authors seemed to produce new ideas and discoveries rather faster than is done today.
Does this tell us something?
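For scale, the table sizes mentioned so far work out as below. A small sketch: the 256 entries of 96 bits (12 bytes) and the 4 MB limit come from the posts; the 16-byte entry is an assumption standing in for a modern program's hash entry, not Crafty's documented layout.

```c
/* Bytes needed by a table of n entries, each entry_bytes long. */
unsigned long TableBytes(unsigned long n, unsigned long entry_bytes)
{
    return n * entry_bytes;
}

/* How many whole entries of entry_bytes fit in a segment of seg_bytes. */
unsigned long EntriesThatFit(unsigned long seg_bytes, unsigned long entry_bytes)
{
    return seg_bytes / entry_bytes;
}
```

TableBytes(256, 12) gives 3,072 bytes for the Chess 4.x table. A 4 MB (4,194,304-byte) segment holds 349,525 of those 12-byte entries, or 262,144 entries at the assumed 16 bytes, about a thousand times the Chess 4.x table.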
Re: Building Crafty 22.0 on a Macintosh
Guetti wrote: I wouldn't expect anybody with Intel processors to use gcc to compile crafty anyway; they would use the free Intel compiler, because it gives a huge performance boost.
sje wrote: Is the Intel compiler really that much better for C++ than g++ 4.2?
bob wrote: For me it is no comparison... 10-15% is the usual benefit I see when comparing PGO for Intel vs PGO for gcc (when it actually works; about 75% of the gcc versions seem to crash on PGO for crafty)...
Guetti wrote: This has improved. I recently built crafty with PGO using several gcc versions on Linux (4.1.1) and OS X (4.0.1, 4.2 and 4.3) and didn't encounter any problems. However, the speed gains varied a lot (5%-12%).
I've been using gcc forever, particularly on AMD boxes, and many versions have had problems in the PGO stuff although they worked well otherwise. In general, what I would see was a compiler crash on the second compile, when it tried to use the PGO data. Note that I do some PGO runs using parallel search, since I want the parallel code optimized also, and that was what seemed to corrupt the profile data files gcc produced...
Re: Building Crafty 22.0 on a Macintosh
sje wrote:
1) The (minor) problem with Xcode and crafty.c is that unless otherwise directed, Xcode will try to compile most of the files twice: once on their own and once as part of crafty.c, which includes them. In the Xcode world, it's relatively rare to see non-header files included in other files, so this is a reasonable default. I'd say that the diagnostics issued should be independent of the compilation scheme (all in one vs. separately in any order).
2) Most testing is being done on a Xeon (Woodcrest), not on a Core 2 Duo (Merom), so that may explain some of the overall timing difference.
3) Some/all of the diagnostics generated from compiling egtb.cpp are there no matter what as far as I can see. I suppose I could chase this further. I admit to an unthinking suspicion of C++ code that has numerous goto statements; I haven't used one of these in decades.
The occasional warning has always been present, and I have not tried to fix it since I didn't write the code. Just do not use the strict-aliasing option, because Eugene's code violates that rule and the resulting code will crash and burn. At one point, someone was going to write a C version of the egtb.cpp code, which I would have liked to see. But it didn't happen, probably due to the heavy use of templates that might make a pretty large .c file...
sje wrote:
4) There's no need for:
#define lock_t volatile int
to appear in more than one place, yet it occurs in lock.h, chess.h, and egtb.cpp.
I agree there. I'll fix this by putting it in chess.h where it belongs...
sje wrote:
5) I'm not a big fan of PGO, and the lack of PGO use with g++ should be comparable to the lack of PGO use with icc, I'd guess. Could be wrong here, though.
6) The run-time failure diagnostic emitted by SharedMalloc() issues a repair hint limited to Linux. The hint doesn't work on OpenBSD, and it doesn't mention the four other kernel values that need to be tweaked. I note that the Mac shmmax limit of 4 MB is fairly generous, and if the total shared memory demand could fit into that, then a lot of grief could be avoided.
4 MB is tiny. Crafty allocates hash using this, and 4 MB of hash would be puny indeed. It makes no sense to me for an OS to have a lot of memory and then limit shared memory segments to a max of 4 MB...
sje wrote:
7) The darwinG5 symbol in the Makefile is not listed with the other target symbols in all places.
8) Symbolic doesn't use a single explicit, externally supplied host target symbol. Instead, it looks at (near-)universal preprocessor symbols like __ppc__, __ppc64__, __i386__, __x86_64__, etc. to figure out (a very few) host-specific issues. The closest easy fit with the Crafty Makefile and chess.h source would be to have a single "Apple" target host symbol trigger the appropriate secondary definitions. By the way, gcc/g++ running under OS X will always define "__APPLE__" along with the usual suspects.
Your "near-universal" is the heart of the problem. I started out doing that, but so many problems were reported that I decided to go the "user declares the O/S target" route instead, and the complaints have dropped off to near zero.
sje wrote:
9) An interim solution would be to build the ppc, ppc64, i386, and x86-64 binaries as four separate files and allow users to download the best for their system OR produce a single fat executable for download. There is still the problem of the SharedMalloc() demands; a separate note and script would be helpful.
I completely gave up on this. There are multiple flavors of Pentiums, with SSE, SSE2, MMX, 32 bit, 64 bit, you name it. And then there are the library versions. I can't install every possible static library version needed, which makes it problematic to provide executables that will work everywhere. I used to try, and gave up. Even Windows is not that compatible across all platforms and versions of Windows.
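The point 8 approach and the "user declares the target" approach can coexist. A minimal sketch, assuming hypothetical symbol names APPLE_HOST and HOST_64BIT (not symbols Crafty's chess.h actually uses): a Makefile-supplied -DAPPLE_HOST wins, and compiler-predefined macros (spelled as Apple's gcc spells them) are only a fallback.

```c
/* Honor an explicit -DAPPLE_HOST from the Makefile; otherwise derive it
   from the compiler's own __APPLE__ macro. APPLE_HOST is hypothetical. */
#if !defined(APPLE_HOST) && defined(__APPLE__)
#  define APPLE_HOST 1
#endif

/* Secondary definition triggered by the architecture macros.
   HOST_64BIT is a hypothetical name used only for illustration. */
#if defined(__x86_64__) || defined(__ppc64__)
#  define HOST_64BIT 1
#else
#  define HOST_64BIT 0
#endif

/* Report the architecture the compiler says it is targeting. */
const char *HostArch(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#elif defined(__ppc64__)
    return "ppc64";
#elif defined(__ppc__)
    return "ppc";
#else
    return "unknown";
#endif
}

/* 1 if this build treats the host as an Apple system, else 0. */
int HostIsApple(void)
{
#if defined(APPLE_HOST)
    return 1;
#else
    return 0;
#endif
}
```

Keeping the explicit symbol as the override preserves the behavior that cut down the complaints, while a plain build on a Mac still picks the right secondary definitions.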
Re: One more thing about Xcode
sje wrote: One more thing about Xcode: it supports distributed compilation with a separate thread per core, and the cores can be distributed on the LAN. This means that non-inclusion of code files (e.g., dropping the crafty.c scheme in favor of separate compilation) results in a very fast build process. Actually, there might be more than one thread per core because of disk/network lag and that helps keep every core running at full throttle.
There might be some way of having Xcode do this with the Intel compiler as well.
Quick compiles are the reason I have both alternatives. "make -j" will run one compile job per C source file, which finishes much faster since much of the compile process is I/O bound, and overlapping multiple compiles at once keeps CPU and I/O screaming along. But for max performance, one file is better because of better inlining.
Re: Building Crafty 22.0 on a Macintosh
Guetti wrote: 6) 4 MB? This leaves not much space for the transposition table.
sje wrote: Back in the Old Days when one could play with Chess 4.x on a 1960s CDC 6500 console, that program could scale its transposition table requirements down to a mere 256 entries. Each entry took a word and a half (96 bits) of ECS (Extended Core Storage). On that CDC CPU model, the program only scored a few hundred positions per second. Close to, but not quite as good as what I was able to do with a 1986 Macintosh Plus running at 8 MHz.
And yet, back then in the 1960s and early 1970s with punch cards, paper tape, and no Internet, the relative paucity of chess program authors seemed to produce new ideas and discoveries rather faster than is done today.
Does this tell us something?
Right, but on a CDC Cyber 176, chess 4.x searched a scorching 2,700 nodes per second (2.7K). The last CCT that I played in saw crafty hitting 20,000,000 nodes per second (20M). That's roughly 7,400 times faster. And it needs proportionally more hash.
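The speed comparison above is simple arithmetic. A small sketch, assuming (as the last post argues) that hash requirements grow linearly with search speed; the 16-byte entry size used in the note below is an assumption, not Crafty's documented layout.

```c
/* Ratio of two search speeds given in nodes per second. */
double SpeedRatio(double fast_nps, double slow_nps)
{
    return fast_nps / slow_nps;
}

/* Scale a table size by a speed ratio, assuming hash needs grow linearly
   with nodes searched per second (an assumption, not a measured law). */
unsigned long ScaledEntries(unsigned long base_entries, double ratio)
{
    return (unsigned long)(base_entries * ratio);
}
```

SpeedRatio(20e6, 2700.0) is about 7,407. Scaling Chess 4.x's 256 entries by that ratio gives about 1.9 million entries, roughly 29 MB at 16 bytes per entry. That is several times the 4 MB default limit.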

Re: Building Crafty 22.0 on a Macintosh
Diagnostics from compiling egtb.cpp:
/Users/sje/Projects/crafty/Source/egtb.cpp:83:1: warning: "lock_t" redefined
In file included from /Users/sje/Projects/crafty/Source/egtb.cpp:54:
/Users/sje/Projects/crafty/Source/lock.h:122:1: warning: this is the location of the previous definition
/Users/sje/Projects/crafty/Source/egtb.cpp:4484: warning: 'TB_CRC_CHECK' initialized and declared 'extern'
/Users/sje/Projects/crafty/Source/egtb.cpp:6303: warning: unused variable 'fWasError'
/Users/sje/Projects/crafty/Source/egtb.cpp:6304: warning: unused variable 'block'
/Users/sje/Projects/crafty/Source/egtb.cpp:6305: warning: unused variable 'rgbBuffer'
/Users/sje/Projects/crafty/Source/egtb.cpp:3249: warning: control may reach end of non-void function 'unsigned int IndEnPassant21B(square, square, square, square)' being inlined
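Most of these warnings follow well-known patterns. A minimal sketch of the usual fixes, written in C for consistency: tb_crc_check and IndexOrZero() are hypothetical stand-ins for TB_CRC_CHECK and IndEnPassant21B(), not the egtb.cpp code itself. The unused-variable warnings need no sketch; the variables can simply be deleted.

```c
/* "lock_t" redefined: define it once and guard it, so a second header
   (or egtb.cpp) that still carries the line becomes a no-op. */
#ifndef lock_t
#define lock_t volatile int
#endif

/* "initialized and declared 'extern'": keep extern on the declaration
   (normally in a header) and put the initializer on one definition. */
extern int tb_crc_check;   /* hypothetical stand-in for TB_CRC_CHECK */
int tb_crc_check = 0;

/* "control may reach end of non-void function": give every path an
   explicit return value. Hypothetical function for illustration. */
unsigned int IndexOrZero(int square, int ep_square)
{
    if (square == ep_square)
        return 1u;
    return 0u;
}
```

Guarding the macro is the stopgap; moving the single definition into chess.h, as proposed upthread, removes the duplicates outright.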
Re: Building Crafty 22.0 on a Macintosh
When trying to compile crafty with -DCPUS=4, I get the following (repeated many times):
/Users/sje/Projects/crafty/Source/lock.h:102: error: can't find a register in class 'MQ_REGS' while reloading 'asm'
when it hits the inline definition in lock.h:
static void __inline__ UnlockX86(volatile int *lock)
{
  int dummy;
  asm __volatile__("movl $0, (%1)":"=&q"(dummy)
                   :"q"(lock));
}
and also with the earlier LockX86() definition.
Without -DCPUS=4, the single core version runs at about 1.1 to 1.5 Mnps on the opening position on a 2.66 GHz Mac Pro quad core.
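One plausible cause is register pressure. On 32-bit x86 the "q" constraint limits operands to eax, ebx, ecx and edx, and UnlockX86() asks for two of them, one for an output it never uses. The sketch below is a possible replacement and has not been tested against lock.h. It uses a memory operand ("=m") for the store and xchg for acquisition, and falls back to GCC's __sync builtins (GCC 4.1 and later) on other architectures.

```c
/* Spin lock sketch: x86/x86-64 inline asm, GCC __sync builtins elsewhere. */
static __inline__ void LockX86(volatile int *lock)
{
    for (;;) {
#if defined(__i386__) || defined(__x86_64__)
        int old = 1;
        /* xchg with a memory operand is implicitly locked on x86. */
        __asm__ __volatile__("xchgl %0, %1"
                             : "+r"(old), "+m"(*lock)
                             :
                             : "memory");
#else
        int old = __sync_lock_test_and_set(lock, 1);
#endif
        if (old == 0)
            return;          /* lock acquired */
        while (*lock)        /* spin on plain reads until it looks free */
            ;
    }
}

static __inline__ void UnlockX86(volatile int *lock)
{
#if defined(__i386__) || defined(__x86_64__)
    /* An aligned 32-bit store releases the lock on x86. The memory operand
       lets the compiler address *lock directly instead of demanding an
       extra "q" register, and the "memory" clobber stops it from moving
       other memory accesses across the release. */
    __asm__ __volatile__("movl $0, %0" : "=m"(*lock) : : "memory");
#else
    __sync_lock_release(lock);
#endif
}
```

The dummy output is gone because the store produces nothing the caller needs. Dropping it, along with the "q" restriction, is the part most likely to matter for the reload error.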