AMD hex core
Moderators: hgm, Rebel, chrisw
-
- Posts: 228
- Joined: Sun Mar 12, 2006 3:11 pm
Re: AMD hex core
rbarreira wrote: No problem!
It seems your crafty had more configuration options than mine, any easy way to run it with the same parameters? (except with 6 threads of course)
This is my .craftyrc file:
mt=4
log=off
hash 4096M
hashp 64M
cache=5M
tbpath /data/CompressedTb
egtb=5
ponder on
info
resign 6
learn=7
exit
-
- Posts: 900
- Joined: Tue Apr 27, 2010 3:48 pm
Re: AMD hex core
Two problems: I was using crafty 23.2, and I was not compiling with -DPOPCNT before.
I'll run a new benchmark later with the new crafty, -DPOPCNT and some of your settings.
-
- Posts: 228
- Joined: Sun Mar 12, 2006 3:11 pm
Re: AMD hex core
rbarreira wrote: Two problems: I was using crafty 23.2, and I was not using -DPOPCNT before.
I'll run a new benchmark with the new crafty, DPOPCNT and some of your settings later.
Do not forget to use mt=6 for your CPU :-)
(The Intel CPU I have, a Q9550, does not have a hardware popcount.)
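For context on what "hardware popcount" buys a bitboard engine, here is a minimal sketch in C of the two ways to count set bits in a 64-bit word. The function names are made up for illustration, and this is not Crafty's code; what -DPOPCNT actually switches inside Crafty may differ. `__builtin_popcountll` is a GCC/Clang builtin and only turns into a single POPCNT instruction when the compiler is allowed to target a CPU that has it (for example with -mpopcnt); otherwise the compiler falls back to a software routine.

```c
#include <stdint.h>

/* Portable fallback: clear the lowest set bit until none remain.
   The loop runs once per set bit. */
static int popcount_soft(uint64_t b)
{
    int n = 0;
    while (b) {
        b &= b - 1;   /* drops the lowest set bit */
        n++;
    }
    return n;
}

/* GCC/Clang builtin; compiles to one POPCNT instruction only if the
   target CPU is declared to support it (e.g. -mpopcnt). */
static int popcount_builtin(uint64_t b)
{
    return __builtin_popcountll(b);
}
```

Both functions return the same count for every input; the difference is only in speed, which matters when the count is taken millions of times per second during search and evaluation.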
-
- Posts: 900
- Joined: Tue Apr 27, 2010 3:48 pm
Re: AMD hex core
Now my .craftyrc is equal to yours, except that it uses 6 CPUs and a 2048 MB hash (my system only has 4 GB of memory).
Using:
crafty 23.3.
gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
Makefile:
linux-amd64:
	$(MAKE) target=LINUX \
		CC=gcc CXX=g++ \
		CFLAGS='-Wall -pipe -fbranch-probabilities -fomit-frame-pointer -O3 -march=k8' \
		CXFLAGS='' \
		LDFLAGS='$(LDFLAGS) -lpthread -lstdc++' \
		opt='$(opt) -DINLINE64 -DCPUS=8 -DPOPCNT' \
		crafty-make
Result:
unable to open book file [./book.bin].
book is disabled
unable to open book file [./books.bin].
Warning-- xboard 'cores' option disabled
max threads set to 6.
Warning-- xboard 'memory' option disabled
hash table memory = 2048M bytes.
Warning-- xboard 'memory' option disabled
pawn hash table memory = 64M bytes.
EGTB cache memory = 5M bytes.
EGTB access enabled
using tbpath=/data/CompressedTb
0 piece tablebase files found
pondering enabled.
Crafty version 23.3
number of threads = 6
hash table memory = 2048M
pawn hash table memory = 64M
EGTB cache memory = 5M
60 moves/30 minutes 0 seconds primary time control
30 moves/15 minutes 0 seconds secondary time control
book frequency (freq)..............1.00
book static evaluation (eval)......0.10
book learning (learn)..............1.00
resign after 5 consecutive moves with score < -6.
book learning enabled
Crafty v23.3 (6 cpus)
White(1): bench
Running benchmark. . .
......
Total nodes: 431150642
Raw nodes per second: 18802906
Total elapsed time: 22.93
White(1): bench
Running benchmark. . .
......
Total nodes: 555185946
Raw nodes per second: 18903164
Total elapsed time: 29.37
White(1): bench
Running benchmark. . .
......
Total nodes: 485104754
Raw nodes per second: 18758884
Total elapsed time: 25.86
White(1): bench
Running benchmark. . .
......
Total nodes: 355304681
Raw nodes per second: 18563462
Total elapsed time: 19.14
White(1):
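To summarize the four bench runs above in one number, a small sketch in C that averages the reported "Raw nodes per second" figures and measures how far apart they are. The helper names and the `bench_nps` array are only for illustration; the values are copied from the log. The mean comes out at about 18.76 million nodes per second, and the runs differ from each other by less than 2% of that.

```c
#include <stddef.h>

/* "Raw nodes per second" from the four bench runs in the log above. */
static const double bench_nps[4] = {
    18802906.0, 18903164.0, 18758884.0, 18563462.0
};

/* Arithmetic mean of n samples (0 if n is 0). */
static double mean_nps(const double *nps, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += nps[i];
    return n ? sum / (double)n : 0.0;
}

/* Relative spread: (max - min) / mean, as a fraction. */
static double spread_nps(const double *nps, size_t n)
{
    if (n == 0)
        return 0.0;
    double lo = nps[0], hi = nps[0];
    for (size_t i = 1; i < n; i++) {
        if (nps[i] < lo) lo = nps[i];
        if (nps[i] > hi) hi = nps[i];
    }
    return (hi - lo) / mean_nps(nps, n);
}
```

Calling `mean_nps(bench_nps, 4)` and `spread_nps(bench_nps, 4)` gives the two figures; averaging several runs is a reasonable habit here because the total node count of a multi-threaded search varies from run to run, as the log shows.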
-
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: AMD hex core
rbarreira wrote: In my experience the Intel compiler either generates executables that don't run at all when they detect an AMD CPU (if one of the -x options is used) or generates a really crappy codepath which is selected at runtime for AMD CPUs (by default, or if one of the -ax options is used).
Even with the newest AMD CPUs an executable made by icc will run something that would probably work on a 80386 when it detects an AMD CPU. So it will probably be more efficient if I use gcc.
And in order to be able to do this, they need a total fubar sabotaged GCC.
You know, Intel has a bigger lookahead than AMD. So if you reschedule branches to be just outside AMD's lookahead, you can kill AMD with it without crippling Intel too much, and there are likewise tricks to kill Intel.
That's exactly what GCC has been doing for years, and there is zero excuse for GCC not to generate normal, simple, straightforward code in all those cases.
It's totally sabotaged there. It really still produces RISC-style code, and any attempt to get rid of that would directly result in the modification getting undone, because it "would slow down processor XYZ from 1980", which no one uses anymore. That argument keeps a big sabotage from being removed from GCC; removing it would directly speed things up by 10% or so, since PGO would also work better in those cases.
Another good example is Linus posting on this same topic; to quote him, "there is no reason to not use cmov now that both intels core as well as AMD have fast cmov handling".
A Polish guy replied to Linus: "but then it is slower at P4", thereby overruling Linus and ignoring all objective discussion and parameters like -march and -mtune. Three years after Linus's post, GCC still generates the sabotaged code, which basically keeps the branch or even rewrites the branch such that you need to jump inside the code, whereas a simple way to speed it up is to generate a cmov.
Under no condition can GCC produce efficient code for a big chess program.
If it did, all kinds of manufacturers would not be able to use the tricks they now use in their compilers to keep competitors from profiting. That's true for Intel as well as AMD.
The biggest scandal resulting from this is GCC's PGO pass. Where Intel C++ gets a 22% speedup there, GCC gets just a few percent.
Let's suppose now that GCC produced efficient code and was not deliberately sabotaged, even against the wishes of big guys like Linus. In that case there would be nowhere for manufacturers to hide; from a performance viewpoint they simply COULD NOT AFFORD to produce code that runs worse on their own CPUs.
Even with the sabotage there is not a big IPC difference between AMD and Intel CPUs. Many programmers know how to mess up, or let themselves be fooled too much. According to many of the top programmers, when they measure objectively, Nehalem is at most 5% faster in IPC than AMD processors when using Intel C++.
So it's all about how many GHz in total you can throw at it, after turning off hyperthreading tricks.
Vincent
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: AMD hex core
Unfortunately, as was alluded to earlier in the thread, for many years ICC has deliberately crippled performance on AMD chips by selecting a slower code path for them at run-time.
http://www.agner.org/optimize/blog/read.php?i=49
The latest compiler versions still do this.
If you want to use ICC, I would recommend patching out the CPUID vendor checks for 'GenuineIntel' that cause other vendors' chips to take an older/slower code path even if their CPUID return values indicate that they support the newer instruction sets. These checks are generated by ICC as part of the executable (and are also found in several Intel math libraries you might link against; however, chess programs are unlikely to be using those). Someone has surely written a tool to automate this by now. It can be done by hand, but it's probably too tedious for anything you're going to compile more than once.
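To make the "vendor check" concrete, here is a minimal sketch in C of how a program can read the CPUID vendor string and compare it against 'GenuineIntel'. This is not ICC's dispatcher code, only an illustration of the kind of test being described; the function names are invented, and the sketch assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`. CPUID leaf 0 returns the 12-character vendor string in the registers EBX, EDX, ECX, in that order.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#endif

/* Assemble the 12-character vendor string from the CPUID leaf 0
   registers, in the order EBX, EDX, ECX. Assumes a little-endian
   host, which is always the case on x86. */
static void vendor_from_regs(uint32_t ebx, uint32_t edx, uint32_t ecx,
                             char out[13])
{
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
}

/* Fill 'out' with this machine's vendor string.
   Returns 1 on success, 0 if CPUID is not available. */
static int cpu_vendor(char out[13])
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    vendor_from_regs(ebx, edx, ecx, out);
    return 1;
#else
    (void)out;
    return 0;
#endif
}

/* The vendor test itself: true only for Intel's vendor string. */
static int is_genuine_intel(const char *vendor)
{
    return strcmp(vendor, "GenuineIntel") == 0;
}
```

A dispatcher that branches on `is_genuine_intel` sends an AMD chip (vendor string "AuthenticAMD") down the fallback path regardless of its feature bits; one that looked only at the CPUID feature flags would not care about the vendor at all.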
-
- Posts: 900
- Joined: Tue Apr 27, 2010 3:48 pm
Re: AMD hex core
wgarvin wrote: If you want to use ICC, I would recommend patching out the CPUID vendor checks for 'GenuineIntel' that cause other vendors' chips to take an older/slower code path even if their CPUID return values indicate that they support the newer instruction sets.
It's actually very easy to override the vendor verification; you don't need to patch the executable. Just adding this to your source code will do the trick:
Code:
int __intel_cpu_indicator = 0;

// This function gets called automatically; don't call it yourself.
void __intel_cpu_indicator_init(void)
{
    // Pretend we're running on an Intel CPU with SSE 4.2 no matter what CPU
    // we're using (lower bits select other architectures).
    __intel_cpu_indicator = 0x8000;
}
Of course, Intel doesn't want to do this itself, since the current behavior keeps some benchmarks out there favoring Intel CPUs even though they might not really be faster in those cases...
-
- Posts: 155
- Joined: Mon Feb 15, 2010 9:33 am
- Location: New Zealand
Re: AMD hex core
diep wrote: And in order to be able to do this, they need a total fubar sabotaged GCC.
<snip>
Another good example is Linus posting on this same topic; to quote him, "there is no reason to not use cmov now that both intels core as well as AMD have fast cmov handling".
A Polish guy replied to Linus: "but then it is slower at P4", thereby overruling Linus and ignoring all objective discussion and parameters like -march and -mtune. Three years after Linus's post, GCC still generates the sabotaged code, which basically keeps the branch or even rewrites the branch such that you need to jump inside the code, whereas a simple way to speed it up is to generate a cmov.
Reality check:
Code:
$ cat test.c
int test( int n ) {
    if ( n == 0 ) n = 42;
    return n;
}
$ gcc -O3 -S test.c && cat test.s
        .text
        .align 4,0x90
        .globl _test
_test:
LFB2:
        pushq   %rbp
LCFI0:
        movq    %rsp, %rbp
LCFI1:
        testl   %edi, %edi
        movl    $42, %eax
        cmove   %eax, %edi
        movl    %edi, %eax
        leave
        ret