Crafty vs Stockfish

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Crafty vs Stockfish

Post by rbarreira »

bob wrote: For reasons unknown, gcc seems to be better for AMD processors _every_ time I compare them... Particularly for SMP code. But on Intel boxes, and all of our stuff is currently Intel, icc is far better.
There are a bunch of reasons for that.

icc generates crappy code for non-Intel (read: AMD) CPUs, which is one of the reasons why some benchmarks give a huge advantage to Intel CPUs (this even got cited in the recent lawsuits that Intel faced from the US department of something). God knows how many CPUs Intel sold due to the influence of this on CPU reviews.

It's actually possible and easy to override these verifications, but some of the latest extensions (SSE 4.1 maybe) are not fully compatible with AMD, so even if you just want some of the instructions you're screwed, because Intel did not implement CPU detection properly (which would be to use the CPUID instruction to detect what instructions are supported... a mechanism that Intel itself invented).
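
For reference, doing the detection properly is trivial. Here is a minimal sketch of the vendor-neutral way (assuming gcc or icc on x86-64 and the <cpuid.h> helper; the bit positions are the documented CPUID leaf-1 ECX feature flags):

Code: Select all

#include <cpuid.h>
#include <stdio.h>

/* Vendor-neutral capability check: ask CPUID leaf 1 which SSE levels
   the CPU actually supports, instead of keying on who made the chip. */
int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 1;                          /* CPUID leaf 1 not available */

    printf("SSE3:   %s\n", (ecx & (1u << 0))  ? "yes" : "no");
    printf("SSE4.1: %s\n", (ecx & (1u << 19)) ? "yes" : "no");
    printf("SSE4.2: %s\n", (ecx & (1u << 20)) ? "yes" : "no");
    return 0;
}
Any runtime dispatch built on those bits would pick the right code path on AMD and Intel alike.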

I like icc's optimization results, but the marketing decisions made with this compiler make it a can of worms for any serious project which might run on AMD CPUs.

Another reason is that AMD seems to contribute to gcc development:

http://developer.amd.com/cpu/gnu/Pages/default.aspx
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Crafty vs Stockfish

Post by Don »

On the head-to-head question, I did a quick study based on some existing data I had. My disclaimer is that I am only going to report the numbers without drawing any conclusions, so draw your own conclusions about the validity of this test.

In this particular set of programs, run at very fast Fischer time controls, I have this data:

Code: Select all

Rank Name                       Elo    +    - games score oppo. draws 
   1 sf18-12_2                 2992    5    4 29860   65%  2894   24% 
   2 Robbolito-handicapped-6s  2986    5    6 17124   63%  2891   25% 
   3 leiden                    2980    5    5 29859   63%  2897   29% 
   4 Komodo_1.0                2948    5    5 17124   57%  2891   28% 
   5 k-3015.26-3hard           2866    5    6 17124   45%  2891   25% 
   6 k-3015.48-ref             2860   10    9  3957   44%  2891   24% 
   7 spike-24s                 2700    5    5 29860   15%  2950   15% 

These programs are all run at different time controls so that there is no ridiculous disparity between them. The k- programs are weak versions of an experimental program I'm working on, and leiden, Komodo_1.0 and the k- versions are all heavily related programs.

Robbo is running twice as fast as Stockfish in order to be approximately equal, and Stockfish is running faster than Komodo so that Komodo is not too far behind. Spike is given more time than any other program.

In this test Stockfish is 6 Elo stronger.

I would like to note that in this test the Komodo-based programs never play each other, but the "foreign" programs play everyone.

I removed all games except those between Robbo and Stockfish and got this result:

Code: Select all

Rank Name                       Elo    +    - games score oppo. draws 
   1 sf18-12_2                    1    5    5  5708   50%    -1   33% 
   2 Robbolito-handicapped-6s    -1    5    5  5708   50%     1   33% 
In head-to-head play Stockfish is 2 Elo stronger. The error margins are too large to make any firm conclusions, but low enough to suggest that the effect (in this test) is minor, if any.

You can draw your own conclusions. Perhaps if I used a greater variety of programs we would see a more noticeable trend?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty vs Stockfish

Post by bob »

rbarreira wrote:
bob wrote: For reasons unknown, gcc seems to be better for AMD processors _every_ time I compare them... Particularly for SMP code. But on Intel boxes, and all of our stuff is currently Intel, icc is far better.
There are a bunch of reasons for that.

icc generates crappy code for non-Intel (read: AMD) CPUs, which is one of the reasons why some benchmarks give a huge advantage to Intel CPUs (this even got cited in the recent lawsuits that Intel faced from the US department of something). God knows how many CPUs Intel sold due to the influence of this on CPU reviews.
This would imply the compiler knows what it is generating code for. You can tell it explicitly what to target. I have not had the opportunity, very often, to compile using icc on AMD, and then run the binary on AMD and Intel, but I have done that once. AMD had given me access to a quad dual-core box a few years back, and I asked if I could try icc. They said fine, since the machine would be used only by me and the disks would be formatted once I was done. The binary ran poorly on AMD, ran fine on Intel.

I suppose it is possible that some things that work well on Intel work poorly on Opterons (or whatever). But we are talking about a _huge_ difference. I don't remember specifics, as this was years ago, but I seem to recall that an 8-CPU run produced only 2-3x the single-CPU NPS, where on Intel it is always around 7.8x or better. Yet on the Intel box I got the expected speed. So it is real, I just don't understand why.


It's actually possible and easy to override these verifications, but some of the latest extensions (SSE 4.1 maybe) are not fully compatible with AMD, so even if you just want some of the instructions you're screwed, because Intel did not implement CPU detection properly (which would be to use the CPUID instruction to detect what instructions are supported... a mechanism that Intel itself invented).

I like icc's optimization results, but the marketing decisions made with this compiler make it a can of worms for any serious project which might run on AMD CPUs.

Another reason is that AMD seems to contribute to gcc development:

http://developer.amd.com/cpu/gnu/Pages/default.aspx
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty vs Stockfish

Post by bob »

Don wrote: On the head-to-head question, I did a quick study based on some existing data I had. My disclaimer is that I am only going to report the numbers without drawing any conclusions, so draw your own conclusions about the validity of this test.

In this particular set of programs, run at very fast Fischer time controls, I have this data:

Code: Select all

Rank Name                       Elo    +    - games score oppo. draws 
   1 sf18-12_2                 2992    5    4 29860   65%  2894   24% 
   2 Robbolito-handicapped-6s  2986    5    6 17124   63%  2891   25% 
   3 leiden                    2980    5    5 29859   63%  2897   29% 
   4 Komodo_1.0                2948    5    5 17124   57%  2891   28% 
   5 k-3015.26-3hard           2866    5    6 17124   45%  2891   25% 
   6 k-3015.48-ref             2860   10    9  3957   44%  2891   24% 
   7 spike-24s                 2700    5    5 29860   15%  2950   15% 

These programs are all run at different time controls so that there is no ridiculous disparity between them. The k- programs are weak versions of an experimental program I'm working on, and leiden, Komodo_1.0 and the k- versions are all heavily related programs.

Robbo is running twice as fast as Stockfish in order to be approximately equal, and Stockfish is running faster than Komodo so that Komodo is not too far behind. Spike is given more time than any other program.

In this test Stockfish is 6 Elo stronger.

I would like to note that in this test the Komodo-based programs never play each other, but the "foreign" programs play everyone.

I removed all games except those between Robbo and Stockfish and got this result:

Code: Select all

Rank Name                       Elo    +    - games score oppo. draws 
   1 sf18-12_2                    1    5    5  5708   50%    -1   33% 
   2 Robbolito-handicapped-6s    -1    5    5  5708   50%     1   33% 
In head-to-head play Stockfish is 2 Elo stronger. The error margins are too large to make any firm conclusions, but low enough to suggest that the effect (in this test) is minor, if any.

You can draw your own conclusions. Perhaps if I used a greater variety of programs we would see a more noticeable trend?
It depends on what you want.

Do you want the Elo to _accurately_ predict the outcome of games between two specific players? If so, use only head-to-head games, and the Elo will be _extremely_ accurate for those two programs. Notice, for the record, that the absolute value of the Elo is meaningless anyway; the only thing that matters is the Elo gap between the two players.

Do you want a rough idea of how everybody stacks up against everybody else, knowing that the individual Elo numbers become less meaningful for anything but this rough ordering? If so, munge all the PGN together and run it through bayeselo. You get a pretty good ranking from top to bottom, but you really cannot expect to take any two Elo numbers, compare them, and use that to predict head-to-head results.

So there are two different objectives, each with its own way of getting there. But statistically, the statement "program X is N Elo better than program Y" has a very specific meaning, because N is supposed to specify a very accurate winning/losing ratio for those two programs. When you mix in other programs, you lose that specificity to gain an overall view of who is best and who is worst, but the "how much better or worse" is significantly less accurate as a result.
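
To put a number on that "very specific meaning": under the usual logistic Elo formula, the expected score for an N-Elo edge is 1/(1+10^(-N/400)), so the 6 and 2 Elo gaps mentioned above correspond to roughly 50.9% and 50.3% expected scores. A quick sketch (the formula is the standard logistic one; the sample gaps are just the ones from this thread):

Code: Select all

#include <math.h>
#include <stdio.h>

/* Expected score of the stronger side when it is "diff" Elo ahead,
   under the standard logistic Elo model. */
static double expected_score(double diff)
{
    return 1.0 / (1.0 + pow(10.0, -diff / 400.0));
}

int main(void)
{
    printf("+6 Elo   -> expected score %.3f\n", expected_score(6.0));   /* ~0.509 */
    printf("+2 Elo   -> expected score %.3f\n", expected_score(2.0));   /* ~0.503 */
    printf("+100 Elo -> expected score %.3f\n", expected_score(100.0)); /* ~0.640 */
    return 0;
}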

A clear example of "you can't have your cake and eat it too..."
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty vs Stockfish

Post by bob »

BubbaTough wrote:If the goal is to compare the two open source programs, it seems MORE fair to compile both the same way, as Bob is doing. The fact that there are versions of Stockfish that use some improved compiling approach may be useful in explaining some of the strength gap on public lists, but is less of a criticism of Bob's methodology than a way of pointing out a slight weakness in using the public lists to compare open source programs (in my opinion).

-Sam
For obvious reasons, I always want to "dance with the one what brung ya'". That is, since I have been using Intel's compiler for years, and doing PGO for years, I want to test like that, because it has, on rare occasions, pointed out a compiler bug, or a program bug that was only exposed by heavy-handed optimization. And since I know how to do it, I do it for everyone. And it has broken a program here and there, which is painful. But on my cluster, everything is as close to being fair as I can make it.

I can't always run with equal hash, since some programs use a power of 2 as I do today. Some use 3/4 of a power of 2, as I did a couple of years ago. Some use ungodly numbers. I get 'em all within a power of 2. For hash, I am using 256M for Crafty, and I get as close to that as I can, without going over, for everyone else. Many can deal with 256M exactly, which simplifies life.

I am a bit surprised about some of the Elo misunderstandings I am reading, however: not understanding that the more players you add, the less accurate Elo is for predicting the outcome between two specific players, or that the more you add, the better you are able to rank the players from best to worst, so long as you take the individual Elo numbers with a grain of salt.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Crafty vs Stockfish

Post by rbarreira »

bob wrote:
rbarreira wrote:
bob wrote: For reasons unknown, gcc seems to be better for AMD processors _every_ time I compare them... Particularly for SMP code. But on Intel boxes, and all of our stuff is currently Intel, icc is far better.
There are a bunch of reasons for that.

icc generates crappy code for non-Intel (read: AMD) CPUs, which is one of the reasons why some benchmarks give a huge advantage to Intel CPUs (this even got cited in the recent lawsuits that Intel faced from the US department of something). God knows how many CPUs Intel sold due to the influence of this on CPU reviews.
This would imply the compiler knows what it is generating code for. You can tell it explicitly what to target. I have not had the opportunity, very often, to compile using icc on AMD, and then run the binary on AMD and Intel, but I have done that once. AMD had given me access to a quad dual-core box a few years back, and I asked if I could try icc. They said fine, since the machine would be used only by me and the disks would be formatted once I was done. The binary ran poorly on AMD, ran fine on Intel.

I suppose it is possible that some things that work well on Intel work poorly on Opterons (or whatever). But we are talking about a _huge_ difference. I don't remember specifics, as this was years ago, but I seem to recall that an 8-CPU run produced only 2-3x the single-CPU NPS, where on Intel it is always around 7.8x or better. Yet on the Intel box I got the expected speed. So it is real, I just don't understand why.


It's actually possible and easy to override these verifications, but some of the latest extensions (SSE 4.1 maybe) are not fully compatible with AMD, so even if you just want some of the instructions you're screwed, because Intel did not implement CPU detection properly (which would be to use the CPUID instruction to detect what instructions are supported... a mechanism that Intel itself invented).

I like icc's optimization results, but the marketing decisions made with this compiler make it a can of worms for any serious project which might run on AMD CPUs.

Another reason is that AMD seems to contribute to gcc development:

http://developer.amd.com/cpu/gnu/Pages/default.aspx
Apparently you don't know how the Intel compiler works; it's much worse than that.

It generates code that selects code paths at runtime, and it looks for the "GenuineIntel" vendor ID. That's how it gets such poor performance on AMD CPUs: it uses extremely stupid code paths on non-Intel CPUs. There are tons of people discussing this in other places...
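
For anyone curious, the vendor ID it keys on is just the 12-byte string that CPUID leaf 0 returns in EBX/EDX/ECX. A minimal sketch (again assuming gcc or icc and the <cpuid.h> helper):

Code: Select all

#include <cpuid.h>
#include <stdio.h>
#include <string.h>

/* CPUID leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX
   (in that order): "GenuineIntel" on Intel, "AuthenticAMD" on AMD.
   This is what the dispatched code checks before picking a code path. */
int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13];

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 1;

    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';

    printf("CPU vendor: %s\n", vendor);
    return 0;
}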

You can actually make the executable think that it's running on an Intel CPU by prepending this to one of your .c files:

Code: Select all

int __intel_cpu_indicator = 0;

void __intel_cpu_indicator_init()
{
    __intel_cpu_indicator = 0x8000; // Pretend we're running on an Intel CPU with SSE 4.2 no matter what CPU we're using
}
The only problem is that if you actually compile for SSE 4.2, not all the instructions are compatible and your code will crash. SSE3 should be fine, but will lose performance of course. There is no way to get around this problem, because Intel's dispatcher doesn't use the CPUID feature flags to detect capabilities as it should. Instead, it works based on CPU vendor and family.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Crafty vs Stockfish

Post by mcostalba »

bob wrote: But on Intel boxes, and all of our stuff is currently Intel, icc is far better.
In this case you may want to try with:

Code: Select all

make profile-build ARCH=x86-64-modern COMP=icc

BTW, what make command did you use to compile SF for this test?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty vs Stockfish

Post by bob »

rbarreira wrote:
bob wrote:
rbarreira wrote:
bob wrote: For reasons unknown, gcc seems to be better for AMD processors _every_ time I compare them... Particularly for SMP code. But on Intel boxes, and all of our stuff is currently Intel, icc is far better.
There are a bunch of reasons for that.

icc generates crappy code for non-Intel (read: AMD) CPUs, which is one of the reasons why some benchmarks give a huge advantage to Intel CPUs (this even got cited in the recent lawsuits that Intel faced from the US department of something). God knows how many CPUs Intel sold due to the influence of this on CPU reviews.
This would imply the compiler knows what it is generating code for. You can tell it explicitly what to target. I have not had the opportunity, very often, to compile using icc on AMD, and then run the binary on AMD and Intel, but I have done that once. AMD had given me access to a quad dual-core box a few years back, and I asked if I could try icc. They said fine, since the machine would be used only by me and the disks would be formatted once I was done. The binary ran poorly on AMD, ran fine on Intel.

I suppose it is possible that some things that work well on Intel work poorly on Opterons (or whatever). But we are talking about a _huge_ difference. I don't remember specifics, as this was years ago, but I seem to recall that an 8-CPU run produced only 2-3x the single-CPU NPS, where on Intel it is always around 7.8x or better. Yet on the Intel box I got the expected speed. So it is real, I just don't understand why.


It's actually possible and easy to override these verifications, but some of the latest extensions (SSE 4.1 maybe) are not fully compatible with AMD, so even if you just want some of the instructions you're screwed, because Intel did not implement CPU detection properly (which would be to use the CPUID instruction to detect what instructions are supported... a mechanism that Intel itself invented).

I like icc's optimization results, but the marketing decisions made with this compiler make it a can of worms for any serious project which might run on AMD CPUs.

Another reason is that AMD seems to contribute to gcc development:

http://developer.amd.com/cpu/gnu/Pages/default.aspx
Apparently you don't know how the Intel compiler works; it's much worse than that.

It generates code that selects code paths at runtime, and it looks for the "GenuineIntel" vendor ID. That's how it gets such poor performance on AMD CPUs: it uses extremely stupid code paths on non-Intel CPUs. There are tons of people discussing this in other places...
With the compiler versions I use, you don't get that. You can tell it _explicitly_ which architecture to produce code for so that you don't get that overhead. And I have looked at quite a bit of assembly output (xxx.S files) over the years and have not seen any architectural testing. Yes, commercial users might run into this since they want a "one size fits all" executable. But I always tell it explicitly which architecture to generate for. And at times I have had to lie to it, using a previous architecture level, because sometimes it would go just a bit ape-snot doing things that actually slowed things down (one can use prefetch hints and the like excessively and hurt rather than help performance, for example).


You can actually make the executable think that it's running on an Intel CPU by prepending this to one of your .c files:

Code: Select all

int __intel_cpu_indicator = 0;

void __intel_cpu_indicator_init()
{
    __intel_cpu_indicator = 0x8000; // Pretend we're running on an Intel CPU with SSE 4.2 no matter what CPU we're using
}
The only problem is that if you actually compile for SSE 4.2, not all the instructions are compatible and your code will crash. SSE3 should be fine, but will lose performance of course. There is no way to get around this problem, because Intel's dispatcher doesn't use the CPUID feature flags to detect capabilities as it should. Instead, it works based on CPU vendor and family.
However, in my quest for ultimate performance, I always tell it explicitly what I want and have not had to deal with that. I can't be certain it didn't do that in the AMD case, but I look often enough to suspect not. However, there could still be operations that have several possible code sequences, and it might be possible to choose the one that runs worst on AMD. I can't answer that one, as I have not looked for it, and I am not certain what works well on one but not on the other anyway. SSE stuff is irrelevant to me; CMOV is useful (talking about Crafty).
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty vs Stockfish

Post by bob »

mcostalba wrote:
bob wrote: But on Intel boxes, and all of our stuff is currently Intel, icc is far better.
In this case you may want to try with:

Code: Select all

make profile-build ARCH=x86-64-modern COMP=icc

BTW, what make command did you use to compile SF for this test?
"make". :)

I have to "roll my own" due to some library issues anyway, So I end up taking my basic (not distributed) Crafty Makefile, and changing the object files. A tweak on the profile part and off it goes. I believe I did copy your icc options, assuming that your group had tested and decided those were best. I just added those to my Makefile, which compiles with prof-gen, then runs 24 test positions, then re-compiles with prof-use...
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Crafty vs Stockfish

Post by rbarreira »

bob wrote: With the compiler versions I use, you don't get that. You can tell it _explicitly_ which architecture to produce code for so that you don't get that overhead. And I have looked at quite a bit of assembly output (xxx.S files) over the years and have not seen any architectural testing.
You are talking about the -x option, right? That's even worse, as it will produce a binary that will only run on Intel CPUs.

I just tried this: I created a simple "Hello World" program and compiled it with:

icc -xSSE3 icc.c -o icc

On an Intel CPU it works fine; on an AMD Phenom II X6 (i.e. AMD's newest CPU), this is what happens:

Code: Select all

ricardo@ricardo-desktop:~$ ./icc

Fatal Error: This program was not built to run on the processor in your system.
The allowed processors are: Intel(R) Pentium(R) 4 and compatible Intel processors with Streaming SIMD Extensions 3 (SSE3) instruction support.
Edit: even -xSSE2 fails on AMD CPUs... it prints a similar message to the above, saying that it only works on a Pentium 4 or compatible Intel processors.