Dear Jarkko,
It pains me to admit it, but this is a compiler bug all right (in 9.1; it is no longer present in 10.0). I downloaded the source and could reproduce the difference with “go depth 15” exactly, and debug it. By a strange but fitting coincidence, the bug was in my own module, namely automatic vectorization. Thanks to your sharp eye, I am able to correct this mistake in the 9.1 version! Ironic how my hobby and my job met here.
Thanks again,
Aart Bik
http://www.aartbik.com/
Compiler switches
- Posts: 822
- Joined: Fri Dec 01, 2006 10:46 pm
- Location: Mountain View, CA, USA
- Full name: Aart Bik
- Posts: 198
- Joined: Thu Mar 09, 2006 2:44 am
- Location: Helsinki, Finland
Re: Compiler switches
Nice that I could help, and that I was not imagining things, as is sometimes the case.
Can you, as an expert, say which switches would help get the most out of "your" compiler to make an even faster Toga executable? And with your fixed version, does /QxT make Toga any faster than /QxP on an E4300?
Jarkko
- Posts: 822
- Joined: Fri Dec 01, 2006 10:46 pm
- Location: Mountain View, CA, USA
- Full name: Aart Bik
Re: Compiler switches
Dear Jarkko,
FWIW, I just committed the compiler fix to our development workspace, which means that it will eventually find its way into a product update. As for your performance question, some good suggestions were already made in this thread. Below, I show some results with the fixed 9.1 and the upcoming 10.0 on a 2.4GHz Conroe. (Keep in mind that in the results you reported earlier for -QxT, the bug was exposed after about 9 seconds and changed the depth-15 variation a few seconds later; in the runs below, this is the only variation reported at depth 15.) Chess engines pose challenges for compiler optimization, partly due to the nature of the application and probably partly because most chess programmers understand compilers well enough to already do a lot of optimization at the source level. So I am glad to see that at least some performance benefits are obtained.
Thanks again for bringing this bug to my attention. One final comment: I did not peek at the Toga source other than to debug the compiler (the “weakness” of my own chess engine gives sufficient proof of that).
-O2 (9.1)
info multipv 1 depth 15 seldepth 44 score cp 16 time 24156 nodes 21493535 pv b1c3 g8f6 d2d4 d7d5 c1f4 c7c5 e2e3 c5d4 e3d4 d8b6 d1d3 c8d7 e1c1 b8a6 d3f3 a6b4 c1b1
-Qprof_use -O3 -Qipo -QxP (9.1)
info multipv 1 depth 15 seldepth 44 score cp 16 time 19172 nodes 21493535 pv b1c3 g8f6 d2d4 d7d5 c1f4 c7c5 e2e3 c5d4 e3d4 d8b6 d1d3 c8d7 e1c1 b8a6 d3f3 a6b4 c1b1
-Qprof_use -O3 -Qipo -QxT (9.1)
info multipv 1 depth 15 seldepth 44 score cp 16 time 19094 nodes 21493535 pv b1c3 g8f6 d2d4 d7d5 c1f4 c7c5 e2e3 c5d4 e3d4 d8b6 d1d3 c8d7 e1c1 b8a6 d3f3 a6b4 c1b1
-Qprof_use -O3 -Qipo -QxP (10.0)
info multipv 1 depth 15 seldepth 44 score cp 16 time 18828 nodes 21493535 pv b1c3 g8f6 d2d4 d7d5 c1f4 c7c5 e2e3 c5d4 e3d4 d8b6 d1d3 c8d7 e1c1 b8a6 d3f3 a6b4 c1b1
-Qprof_use -O3 -Qipo -QxT (10.0)
info multipv 1 depth 15 seldepth 44 score cp 16 time 18672 nodes 21493535 pv b1c3 g8f6 d2d4 d7d5 c1f4 c7c5 e2e3 c5d4 e3d4 d8b6 d1d3 c8d7 e1c1 b8a6 d3f3 a6b4 c1b1
Aart Bik
http://www.aartbik.com/
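All five runs above search exactly 21493535 nodes and print the same principal variation, so only the `time` field differs; in the UCI protocol it is reported in milliseconds. A minimal C++ sketch for turning such info lines into nodes per second and a speedup over the -O2 baseline follows. The helpers `infoField`, `nodesPerSecond` and `speedup` are hypothetical names, and the parser assumes well-formed, space-separated info lines.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: returns the integer that follows `key` in a UCI
// "info" line, or -1 if the key is missing or not followed by a number.
// It scans tokens naively, which is fine for "time" and "nodes" because
// those words never appear as moves in the "pv" part.
inline std::int64_t infoField(const std::string& line, const std::string& key) {
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        if (token == key) {
            std::int64_t value = 0;
            return (in >> value) ? value : -1;
        }
    }
    return -1;
}

// Nodes per second; UCI reports "time" in milliseconds.
inline std::int64_t nodesPerSecond(std::int64_t nodes, std::int64_t timeMs) {
    assert(timeMs > 0);
    return nodes * 1000 / timeMs;
}

// Speedup of a run over a baseline that searched the same number of nodes.
inline double speedup(std::int64_t baselineMs, std::int64_t timeMs) {
    assert(timeMs > 0);
    return static_cast<double>(baselineMs) / static_cast<double>(timeMs);
}
```

For example, `speedup(24156, 18672)` compares the 10.0 -QxT run with the 9.1 -O2 run. Over the four optimized runs above, the speedups come out at about 1.26 and 1.27 for 9.1 (-QxP, -QxT) and about 1.28 and 1.29 for 10.0.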
- Posts: 778
- Joined: Sat Jul 01, 2006 7:11 am
Re: Compiler switches
I read in the Intel optimization manual that the bit operations are now very fast on the Core 2 Duo. Does the Intel compiler use them? E.g., translate
if (x & (1 << n))
    doSomething();
x &= ~(1 << n);
to
BTR x,n
JNC xx
do something
xx:
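To make the proposed rewrite concrete, here is a small C++ model (not compiler output) of BTR on a 32-bit register: the carry flag receives the old value of bit n, with the offset taken modulo 32, and the bit is then cleared, so JNC skips the body exactly when the bit was clear. The names `btrModel`, `sourcePattern` and `btrPattern` are hypothetical, and unsigned 32-bit operands are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of BTR on a 32-bit register operand. The return value
// stands in for the carry flag; the CPU uses the bit offset modulo 32.
inline bool btrModel(std::uint32_t& x, std::uint32_t n) {
    const std::uint32_t mask = std::uint32_t{1} << (n % 32);
    const bool carry = (x & mask) != 0;  // CF = old value of the bit
    x &= ~mask;                          // then the bit is cleared
    return carry;
}

// Hypothetical version of the source pattern: test bit n, act, then clear it.
// Returns how often the "do something" branch ran (0 or 1).
inline int sourcePattern(std::uint32_t& x, std::uint32_t n) {
    assert(n < 32 && "in C++, 1u << n is undefined for n >= 32");
    int actions = 0;
    if (x & (std::uint32_t{1} << n))
        ++actions;                       // do something
    x &= ~(std::uint32_t{1} << n);
    return actions;
}

// Hypothetical version of the proposed BTR x,n / JNC xx sequence.
inline int btrPattern(std::uint32_t& x, std::uint32_t n) {
    int actions = 0;
    if (btrModel(x, n))                  // BTR x,n ; JNC xx
        ++actions;                       // do something
    return actions;                      // xx:
}
```

One caveat: in the BTR version the bit is already cleared while "do something" runs, so the two forms are only interchangeable if that code does not read bit n of x.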
- Posts: 822
- Joined: Fri Dec 01, 2006 10:46 pm
- Location: Mountain View, CA, USA
- Full name: Aart Bik
Re: Compiler switches
If you simply are referring to bit-test instructions, then yes, see below. If I miss a subtle detail in your question, please forgive my ignorance and elaborate.
int x, n;
if (x & (1 << n))
    global = 0;
translates by default (O2) to:
mov ecx, DWORD PTR [_n]
mov eax, 1
shl eax, cl
test DWORD PTR [_x], eax
je skip
mov DWORD PTR [_global], 0
skip:
but when compiled for Core 2 Duo (QxT) to:
mov eax, DWORD PTR [_x]
mov edx, DWORD PTR [_n]
bt eax, edx
jae skip
mov DWORD PTR [_global], 0
skip:
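Both sequences compute the same predicate: the -O2 code builds the mask with SHL and checks it with TEST, while BT copies bit n of x into the carry flag and JAE (jump if CF is 0) skips the store. A minimal C++ check of that equivalence over every bit position follows; the function names are hypothetical, and it assumes unsigned 32-bit operands instead of the `int` above. BT with a register operand uses n modulo 32, which is harmless here because the C++ shift is undefined when n is 32 or more.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the default sequence: MOV eax,1 / SHL eax,cl builds
// the mask, then TEST checks it against x.
inline bool viaShlTest(std::uint32_t x, std::uint32_t n) {
    assert(n < 32 && "in C++, 1u << n is undefined for n >= 32");
    return (x & (std::uint32_t{1} << n)) != 0;
}

// Hypothetical model of the QxT sequence: BT eax,edx copies bit (n mod 32)
// of eax into CF, and JAE skips the store when CF is 0.
inline bool viaBt(std::uint32_t x, std::uint32_t n) {
    return ((x >> (n % 32)) & 1u) != 0;
}

// True if both predicates agree for every bit position the C++ source
// defines, i.e. n in [0, 31].
inline bool agreeOnAllPositions(std::uint32_t x) {
    for (std::uint32_t n = 0; n < 32; ++n)
        if (viaShlTest(x, n) != viaBt(x, n))
            return false;
    return true;
}
```

For instance, `viaBt(0x80000000u, 31)` is true and `viaBt(0x80000000u, 30)` is false, matching what the shift-and-test form gives for the same arguments.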
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Compiler switches
Hi Aart,
abik wrote: If you simply are referring to bit-test instructions, then yes, see below. If I miss a subtle detail in your question, please forgive my ignorance and elaborate.
I guess Wesley's question was whether the compiler understands the semantics of resetting the bit, i.e., uses btr instead of bt. E.g., what is the assembly of this inlined bool bitTestAndReset routine:
inline bool bitTestAndReset(unsigned int &set, unsigned int bitIndex)
{
    unsigned int bit = 1u << bitIndex;  // bitIndex must be < 32
    bool isSet = (set & bit) != 0;      // old value of the bit (what BTR leaves in CF)
    set &= ~bit;                        // clear the bit
    return isSet;
}
when used in a condition like:
if (bitTestAndReset(x, n))
    doSomething();
Ideally, something like:
mov eax, DWORD PTR [_x]
mov edx, DWORD PTR [_n]
btr eax, edx
mov DWORD PTR [_x], eax
jnc skip
Alternatively, with the _bittestandreset intrinsic:
if (_bittestandreset(&x, n))
    doSomething();
Gerd
- Posts: 822
- Joined: Fri Dec 01, 2006 10:46 pm
- Location: Mountain View, CA, USA
- Full name: Aart Bik
Re: Compiler switches
Thanks for the detailed explanation, Gerd; it was very helpful. In that case the answer is unfortunately no, or perhaps not yet, as I am going to discuss this idea with our code-generator experts.