GCC bug?

michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

GCC bug?

Post by michiguel »

It seems there is a problem with inlining in a version of GCC.

My engine has always been a bitboard engine in which the bitboard type was a structure with two 32-bit ints (that approach was faster than a 64-bit int in a 32-bit environment, but clumsy for writing code). I converted it to use 64-bit ints (long overdue) and it was not very painful (most of the operations AND, OR, NOT, etc. were hidden in macros). To check that everything was fine, I compared the node counts from some positions and found a small discrepancy. After debugging, I found that the error went away if I introduced a printf() in a certain position (as part of the debugging process, a compiler switch activated it to dump the whole tree). This drove me nuts because it did not make any sense. You can imagine my frustration: I did not see the bug when dumping the tree, but I saw it when I did not dump the tree. I had to test which individual printf was the culprit in several parts of the program.
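For readers unfamiliar with the two representations, here is a minimal sketch of what such a conversion might look like. All names (bb_pair, bb64, BB_AND, and so on) are hypothetical and are not taken from the engine discussed here; the point is only that the struct version needs an operation per half, while the 64-bit version collapses to a single operator hidden behind a macro.

```c
#include <stdint.h>

/* Hypothetical names throughout: a sketch, not the engine's actual code. */

/* Old representation: a bitboard split into two 32-bit halves. */
typedef struct {
    uint32_t lo;   /* squares 0..31  */
    uint32_t hi;   /* squares 32..63 */
} bb_pair;

/* New representation: one 64-bit integer. */
typedef uint64_t bb64;

/* With the struct type, every operation is applied to each half. */
static bb_pair pair_and(bb_pair a, bb_pair b)
{
    bb_pair r = { a.lo & b.lo, a.hi & b.hi };
    return r;
}

static bb_pair pair_not(bb_pair a)
{
    bb_pair r = { (uint32_t)~a.lo, (uint32_t)~a.hi };
    return r;
}

/* With the 64-bit type, the same operations are single operators. */
#define BB_AND(a, b) ((a) & (b))
#define BB_OR(a, b)  ((a) | (b))
#define BB_NOT(a)    (~(a))

/* Conversion helper, useful for checking that both versions agree. */
static bb64 pair_to_bb64(bb_pair p)
{
    return ((bb64)p.hi << 32) | p.lo;
}
```

Comparing the two implementations through a helper like pair_to_bb64() on a handful of boards is one cheap way to confirm that the macro layer itself survived the conversion.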

I tested different combinations of optimization flags and all the problems were gone when I turned off inline functions with
gcc -O3 -fno-inline-functions or when I compiled with -O2

I tend to believe that this is a weird compiler bug. Has anybody experienced this problem? It is a painful situation because that flag is important. Maybe this will disappear if I compile on the 64-bit version of the OS. I have not tried that yet. I have 32-bit Ubuntu installed here.

the version I have is
gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)

Maybe I should update gcc, but I got lazy with the Ubuntu packages (4.1.2 is the last one they offered).

Miguel
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: GCC bug?

Post by Michel »

Hard to say. In my experience so-called gcc bugs are usually the result of subtly broken code.

Did you compile with -Wall? Did you try -std=c99? Did you try the Intel compiler?
Did you try to run your program under valgrind?

Regards,
Michel
glorfindel

Re: GCC bug?

Post by glorfindel »

That is a relatively old version of gcc that you are using. GCC 4.1 was released in February 2006 and the last bugfix release (4.1.2) in February 2007. Are you sure you can't easily install and try a newer version? At http://packages.ubuntu.com/intrepid/gcc it says 4.3.1 is available.

Apart from that, I have read in my Linux distribution's manual that
# -O3: This is the highest level of optimization possible, and also the riskiest. It will take a longer time to compile your code with this option, and in fact it should not be used system-wide with gcc 4.x. The behavior of gcc has changed significantly since version 3.x. In 3.x, -O3 has been shown to lead to marginally faster execution times over -O2, but this is no longer the case with gcc 4.x. Compiling all your packages with -O3 will result in larger binaries that require more memory, and will significantly increase the odds of compilation failure or unexpected program behavior (including errors). The downsides outweigh the benefits; remember the principle of diminishing returns. Using -O3 is not recommended for gcc 4.x.
Optimization -O3 is described as "the riskiest". I am not sure if this is supposed to be because of bugs in gcc, or because the programmer must be very careful to write perfect code.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: GCC bug?

Post by michiguel »

Michel wrote:Hard to say. In my experience so-called gcc bugs are usually the result of subtly broken code.
That is why I am not 100% relaxed.
Did you compile with -Wall? Did you try -std=c99? Did you try the Intel compiler?
Did you try to run your program under valgrind?

Regards,
Michel
I have -Wall and -std=c99 and it compiles cleanly. I have asserts all over the place and they are quiet with this.

I have not tried another compiler yet to check this problem. I do not have the Intel compiler installed yet, but I think this will give me an excuse to finally download it. I will try to compile it on Windows, because the code is portable. I will also try valgrind. I know I should try all this, but my message was somewhat cathartic. :-)

Still, I cannot imagine a reason for this behavior. Why would introducing a printf() in a specific spot cause a dozen or so more nodes to be counted in a 10-ply search? I have some sort of obsessive-compulsive disorder when it comes to bugs :-) and I can't stand it.

Miguel
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: GCC bug?

Post by michiguel »

glorfindel wrote: That is a relatively old version of gcc that you are using. GCC 4.1 was released in February 2006 and the last bugfix release (4.1.2) in February 2007. Are you sure you can't easily install and try a newer version? At http://packages.ubuntu.com/intrepid/gcc it says 4.3.1 is available.
Thanks, I think I missed that because I have Ubuntu 7.04 (Synaptic is not showing it to me) and the latest release is 8.10. Time to upgrade that too.
Apart from that, I have read in my linux distribution's manual that
# -O3: This is the highest level of optimization possible, and also the riskiest. It will take a longer time to compile your code with this option, and in fact it should not be used system-wide with gcc 4.x. The behavior of gcc has changed significantly since version 3.x. In 3.x, -O3 has been shown to lead to marginally faster execution times over -O2, but this is no longer the case with gcc 4.x. Compiling all your packages with -O3 will result in larger binaries that require more memory, and will significantly increase the odds of compilation failure or unexpected program behavior (including errors). The downsides outweigh the benefits; remember the principle of diminishing returns. Using -O3 is not recommended for gcc 4.x.
Optimization -O3 is described as "the riskiest". I am not sure if this is supposed to be because of bugs in gcc, or because the programmer must be very careful to write perfect code.
I guess the part about "unexpected program behavior" is a warning I did not heed.

Miguel
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: GCC bug?

Post by bob »

michiguel wrote:It seems there is a problem with inlining in a version of GCC.

It also sounds a lot like an uninitialized data error. Turning off inlining would alter the stack, since it will be used more due to the extra procedure calls.
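A minimal sketch of the kind of uninitialized-data pattern that can behave this way is given below. The names (move_t, pick_move) and the placeholder values are hypothetical and are not from the engine in question. The buggy form is shown only as a comment, because reading an indeterminate value has no defined result and would depend on whatever the stack happened to hold, which is exactly why an extra call such as printf() or a change in inlining can shift the outcome.

```c
/* Hypothetical move record: a sketch, not the engine's actual type. */
typedef struct {
    int from;
    int to;
    int score;
} move_t;

/*
 * Buggy pattern (comment only, so this block stays well defined):
 *
 *     int pick_move(int n_moves, move_t *out)
 *     {
 *         if (n_moves > 0) { ... fill *out ... }
 *         return n_moves > 0;
 *     }
 *
 *     move_t m;              // never initialized
 *     pick_move(0, &m);      // leaves m untouched
 *     total += m.score;      // reads leftover stack contents
 *
 * If the callee lives in another source file, the compiler may not
 * be able to warn about this even with -Wall.
 */

/* Defensive version: every path stores a defined value in *out. */
int pick_move(int n_moves, move_t *out)
{
    out->from = -1;
    out->to = -1;
    out->score = 0;

    if (n_moves > 0) {
        /* Placeholder values standing in for a real selection. */
        out->from = 12;
        out->to = 28;
        out->score = 100;
        return 1;
    }
    return 0;
}
```

Running the program under valgrind, as suggested earlier in the thread, is a complementary check, since it reports uses of uninitialized memory at run time rather than at compile time.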
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: GCC bug?

Post by bob »

michiguel wrote:
Michel wrote:Hard to say. In my experience so-called gcc bugs are usually the result of subtly broken code.
That is why I am not 100% relaxed.
Did you compile with -Wall? Did you try -std=c99? Did you try the Intel compiler?
Did you try to run your program under valgrind?

Regards,
Michel
I have -Wall and -std=c99 and it compiles cleanly. I have asserts all over the place and they are quiet with this.

I have not tried another compiler yet to check this problem. I do not have the Intel compiler installed yet, but I think this will give me an excuse to finally download it. I will try to compile it on Windows, because the code is portable. I will also try valgrind. I know I should try all this, but my message was somewhat cathartic. :-)

Still, I cannot imagine a reason for this behavior. Why would introducing a printf() in a specific spot cause a dozen or so more nodes to be counted in a 10-ply search? I have some sort of obsessive-compulsive disorder when it comes to bugs :-) and I can't stand it.

Miguel
(1) Do you compile as one giant source file so that the compiler can do the necessary dependency analysis to detect most uninitialized data issues?

(2) This can also be a nasty pointer issue, commonly caused by a bad array subscript, which compilers have no clue about detecting.
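Since the compiler will not catch a bad subscript, one common approach is to route array accesses through small helpers that assert the index is in range. The sketch below assumes a hypothetical per-ply table (killer, MAX_PLY); none of these names come from the engine under discussion.

```c
#include <assert.h>

#define MAX_PLY 64              /* hypothetical search depth limit */

static int killer[MAX_PLY];     /* hypothetical per-ply table */

/* Returns 1 when ply is a valid index into killer[], 0 otherwise. */
int ply_in_range(int ply)
{
    return ply >= 0 && ply < MAX_PLY;
}

/* Store a move for a ply; aborts on a bad index unless NDEBUG is defined. */
void set_killer(int ply, int move)
{
    assert(ply_in_range(ply));
    killer[ply] = move;
}

/* Fetch the move stored for a ply, with the same index check. */
int get_killer(int ply)
{
    assert(ply_in_range(ply));
    return killer[ply];
}
```

The checks vanish when the program is built with -DNDEBUG, so they cost nothing in a release build; the trade-off is that an out-of-range write in such a build silently corrupts a neighbouring object again.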
Dann Corbit
Posts: 12781
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: GCC bug?

Post by Dann Corbit »

Did you feed your program source through lint?

Sometimes, there can be undefined behavior and no compiler or lint tool will see it.

I guess that undefined behavior is the most likely culprit.
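As an illustration only, and not a diagnosis of the engine in this thread, here is one piece of undefined behavior that commonly appears when moving from 32-bit halves to 64-bit boards: shifting the int constant 1 by a square number that does not fit in an int. The macro and function names are hypothetical.

```c
#include <stdint.h>

/*
 * Buggy pattern (comment only):
 *
 *     #define SQ_BIT(sq) (1 << (sq))
 *
 * The constant 1 has type int, so with a 32-bit int the shift is
 * undefined for sq >= 31. It compiles cleanly and may appear to
 * work at one optimization level and fail at another.
 */

/* Well defined for 0 <= sq <= 63: the shift happens in 64 bits. */
#define SQ_BIT(sq) ((uint64_t)1 << (sq))

/* Returns 1 if the given square is set in the bitboard, else 0. */
int has_bit(uint64_t bb, int sq)
{
    return (bb & SQ_BIT(sq)) != 0;
}
```

Searching the macros for bare integer constants used in shifts or masks is a quick way to rule this class of problem in or out.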
tvrzsky
Posts: 128
Joined: Sat Sep 23, 2006 7:10 pm
Location: Prague

Re: GCC bug?

Post by tvrzsky »

michiguel wrote:It seems there is a problem with inlining in a version of GCC.

I experienced a similar problem a few years ago when I noticed that my search tree in Winboard engine mode differed from the tree in "console mode" (my customary input/output). It was really hard to find the reason, but after many months of hunting I finally localized the bug: some printf() calls in the root node. The bug manifested only when the printf format string included floating-point output (which was also the difference between Winboard and console mode). Unfortunately I cannot recall which version of gcc I was using at the time, the substance and mechanism of the bug, or how I solved it (I remember only that after checking the assembly I concluded that it was a compiler bug).
Filip
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: GCC bug?

Post by michiguel »

bob wrote:
michiguel wrote:
Michel wrote:Hard to say. In my experience so-called gcc bugs are usually the result of subtly broken code.
That is why I am not 100% relaxed.
Did you compile with -Wall? Did you try -std=c99? Did you try the Intel compiler?
Did you try to run your program under valgrind?

Regards,
Michel
I have -Wall and -std=c99 and it compiles cleanly. I have asserts all over the place and they are quiet with this.

I have not tried another compiler yet to check this problem. I do not have the Intel compiler installed yet, but I think this will give me an excuse to finally download it. I will try to compile it on Windows, because the code is portable. I will also try valgrind. I know I should try all this, but my message was somewhat cathartic. :-)

Still, I cannot imagine a reason for this behavior. Why would introducing a printf() in a specific spot cause a dozen or so more nodes to be counted in a 10-ply search? I have some sort of obsessive-compulsive disorder when it comes to bugs :-) and I can't stand it.

Miguel
(1) Do you compile as one giant source file so that the compiler can do the necessary dependency analysis to detect most uninitialized data issues?
No, but I will try. I think there is a GCC switch that does that. -fwhole-program with -combine does something like that, right?

Thanks,
Miguel

(2) This can also be a nasty pointer issue, commonly caused by a bad array subscript, which compilers have no clue about detecting.