Indeed, this is really annoying. Perhaps you should switch off -debug again to verify that the probem still regularly occurs. (I understand you made the engine ignore the undo commands now, so that it would not cost you any games or angry ICS admins.)
Behavior like this is usually associated with use of an uninitialized local variable, which gets an altered value when the fprintf used to maintain the log alters the stack content. I could try to make a special XBoard version that keeps a log in a more transparent way (copying all ICS and engine communication to a circular global buffer, and only dumping it to a file when the 'undo' is sent). Just copying stuff to memory usually has no side effects.
Bug in xboard 4.4.4; banned for ICC rated play!
Moderator: Ras
-
- Posts: 28391
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: xboard debug file w/FICS
The first thing I'm going to try is moving from a Core 2 Duo machine with 8 GB RAM to a Core i3 machine with 16 GB RAM. Should be no difference, right?hgm wrote:Indeed, this is really annoying. Perhaps you should switch off -debug again to verify that the probem still regularly occurs. (I understand you made the engine ignore the undo commands now, so that it would not cost you any games or angry ICS admins.)
Behavior like this is usually associated with use of an uninitialized local variable, which gets an altered value when the fprintf used to maintain the log alters the stack content. I could try to make a special XBoard version that keeps a log in a more transparent way (copying all ICS and engine communication to a circular global buffer, and only dumping it to a file when the 'undo' is sent). Just copying stuff to memory usually has no side effects.
If that works, I'll move it to PowerPC machine with OS/X 10.4.11 (about four years old) and try again. And then maybe to a 32 bit Linux box.
I would suggest instead of fiddling with the logger you might try substituting a super paranoid malloc/free library. This might pay off immediately and not after days of testing.
-
- Posts: 28391
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: xboard debug file w/FICS
Well, this might help for solving the malloc problem, but not for the 'undo'. And I am not sure whether such a paranoic malloc/free library exist, or where I should get one. I could write my own, of course, but this might make the error go away again. In fact using another library might already be enough to make the error go away (which is in fact the mostlikely explanation why you have this crash, and I don't). So perhaps what I should do is just write wrappers for the standard library, which keep a log of what is allocated (address and size) and what is freed, and perhaps a checksum of the 8 bytes just before the allocated address, so that it can be checked on free if the size info was corrupted.sje wrote:I would suggest instead of fiddling with the logger you might try substituting a super paranoid malloc/free library. This might pay off immediately and not after days of testing.
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: xboard debug file w/FICS
Looking at the 4.5.2a code, I count 76 calls to free() and 28 calls to malloc() although I may have missed a few. I'm from the old school of programming where each malloc() call should have a single free() call and vice versa. Perhaps enforcing this on the code might get rid of the modify-after-change bug. It should also be easier to debug.hgm wrote:Well, this might help for solving the malloc problem, but not for the 'undo'. And I am not sure whether such a paranoic malloc/free library exist, or where I should get one. I could write my own, of course, but this might make the error go away again. In fact using another library might already be enough to make the error go away (which is in fact the mostlikely explanation why you have this crash, and I don't). So perhaps what I should do is just write wrappers for the standard library, which keep a log of what is allocated (address and size) and what is freed, and perhaps a checksum of the 8 bytes just before the allocated address, so that it can be checked on free if the size info was corrupted.
Long, long ago I wrote a sbrk()/malloc()/free() library but have long since lost the source. One thing I remember was inserting a head and tail region on the allocated block that had magic and length recorded. Each free() call checked the magic and length on both ends and then zeroed the whole region.
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: xboard debug file w/FICS
Source for magic: http://en.wikipedia.org/wiki/Hexspeak
I remember the dead beef constant from my undergraduate days.
I remember the dead beef constant from my undergraduate days.
-
- Posts: 28391
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: xboard debug file w/FICS
Well, this will be hard to acheive, because in XBoard routines that return strings often return themin malloc'ed memory. And the routine can then be called from many places (presumably the reason for making it intoa routine in the first place), and in all these places the returned memory has to be freed when you are done with it.sje wrote:Looking at the 4.5.2a code, I count 76 calls to free() and 28 calls to malloc() although I may have missed a few. I'm from the old school of programming where each malloc() call should have a single free() call and vice versa. Perhaps enforcing this on the code might get rid of the modify-after-change bug. It should also be easier to debug.
[/quote]Long, long ago I wrote a sbrk()/malloc()/free() library but have long since lost the source. One thing I remember was inserting a head and tail region on the allocated block that had magic and length recorded. Each free() call checked the magic and length on both ends and then zeroed the whole region.[/quote]
I guess that Windows XP must do something like that, because freeing of non-malloc'ed memory never seems to lead to crashed there.
I will cook up a special test version, and push it to my on-line repository later this afternoon.
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: xboard debug file w/FICS
The memory of the old code is slowly returning; I also recall writing an explicit sanity checker routine that would run through both of the used and free region lists checking all pointers, lengths, and magics on its journey. Quite useful.
But in the long run, having to deal with C allocation bugs was just another reason to move to C++ with explicit constructors and destructors.
But in the long run, having to deal with C allocation bugs was just another reason to move to C++ with explicit constructors and destructors.
-
- Posts: 28391
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: xboard debug file w/FICS
OK, I pushed a test version to the 'sje' branch of my on-line repository. It contains wrappers for the malloc/free routines:
It produces the debug output on stdout, so it is probably best to run it as "./xboard > errlog". It aborts XBoard on a faulty free.
Strangely enough it already aborts when I load a large PGN with 41k games in it (this heavily uses malloc, as the routine to parse a PGN tag mallocs memory to hold it).Problem is that it does not always abort, and when it aborts, it aborts in a different place every time. Like it doesn't really have to do anything with the task at hand, but in stead is due to some asynchronous process changing memory bytes randomly (or a hardware memory failure...).
Output from just before a typical crash:
Behind every malloc'ed address I print the two words just before the allocated area in hex. It seems the word just before it contains the allocated area length + 1, based on where it allocates the next fresh area. For this area it is 0x38, and the area is freed almost immediately after use. Every time it needs an area of this size (to store a PGN tag) it re-uses this particular one, if I go back through the output I see this particular area at 0x90e3be8 has been used zillions of times before. But now the free find an area length of 0x38, while on the malloc of it just before it was 0x39.
The word before it has changed from 0 to 0x18. I am not sure this word is part of the allocation scheme, though.Sometimes it seems to contain ASCII data: 686369 = ich and a5d22 = "]\n, a likely part of a player name or a PGN tag. The memory just before the corrupted length field does not look like there has been overrun from the previous allocated area: 35 22 5d 0a = 5"]\n , a likely end of a PGN tag (the rest is apparently damaged when this area was freed just before the crash).
Code: Select all
#define ALLOCSIZE 1024*1024
typedef struct {
void *address;
int checksum;
} AllocInfo;
AllocInfo allocInfo[ALLOCSIZE];
int allocPtr = 0;
void *MyMalloc(int size)
{
void *res = malloc(size);
allocInfo[allocPtr].address = res;
allocInfo[allocPtr].checksum = ((int*)res)[-1] + ((int*)res)[-2];
if(allocPtr < ALLOCSIZE) allocPtr++; else printf("overflow\n"), exit(0);
printf("malloc %x %x,%x\n", res,((int*)res)[-2], ((int*)res)[-1]);
return res;
}
void *MyCalloc(int nr, int size)
{
void *res = calloc(nr, size);
allocInfo[allocPtr].address = res;
allocInfo[allocPtr].checksum = ((int*)res)[-1] + ((int*)res)[-2];
if(allocPtr < ALLOCSIZE) allocPtr++; else printf("overflow\n"), exit(0);
printf("calloc %x\n", res);
return res;
}
void MyFree(void *addr)
{
int i, k;
char buf[100], *p;
for(i=0; i<=allocPtr; i++) {
if(allocInfo[i].address == addr) {
if(allocInfo[i].checksum != ((int*)addr)[-1] + ((int*)addr)[-2]) {
printf("corrupted free %x, old=%x, new=%x\n", addr, allocInfo[i].checksum, ((int*)addr)[-1] + ((int*)addr)[-2]);
strncpy(buf, addr, 70); buf[70] = 0;
printf("content = '%s'\n", buf);
printf("before =");
for(k=0; k<32; k++) printf(" %02x",((char*)addr)[k-32]);
printf("\n");
break;
} else {
allocInfo[i] = allocInfo[--allocPtr];
printf("free %x\n", addr);
free(addr);
return;
}
}
}
printf("bad free %x\n", addr);
exit(0);
}
Strangely enough it already aborts when I load a large PGN with 41k games in it (this heavily uses malloc, as the routine to parse a PGN tag mallocs memory to hold it).Problem is that it does not always abort, and when it aborts, it aborts in a different place every time. Like it doesn't really have to do anything with the task at hand, but in stead is due to some asynchronous process changing memory bytes randomly (or a hardware memory failure...).
Output from just before a typical crash:
Code: Select all
malloc 90e3bd0 0,19
malloc 90f4be8 0,31
free 90e3bd0
malloc 90f4c18 a5d22,51
free 90f4be8
malloc 90f4c68 a5d22,61
malloc 90f4be8 0,31
malloc 90f4cc8 0,11
malloc 90f4cd8 0,11
malloc 90f4ce8 35,11
malloc 90f4cf8 0,29
malloc 90f4d20 0,21
malloc 90e3bd0 0,19
malloc 90f4d40 6863,31
free 90e3bd0
malloc 90f4d70 a,51
free 90f4d40
malloc 90f4dc0 0,61
malloc 90f4d40 6863,31
malloc 90f4e20 0,11
malloc 90f4e30 0,11
malloc 90f4e40 35,11
malloc 90f4e50 0,21
malloc 90f4e70 686369,29
malloc 90e3bd0 0,19
malloc 90f4e98 0,31
free 90e3bd0
malloc 90f4ec8 0,51
free 90f4e98
malloc 90f4f18 0,61
malloc 90f4e98 0,31
malloc 90f4f78 0,11
malloc 90f4f88 0,11
malloc 90f4f98 35,11
malloc 90f4fa8 0,29
malloc 90f4fd0 0,29
malloc 90e3bd0 0,19
malloc 90e3be8 0,39
free 90e3bd0
malloc 90f4ff8 0,51
corrupted free 90e3be8, old=39, new=50
content = '[Board "5"]
[WhiteTeam "Vologda region"]
'
before = 00 00 00 00 19 00 00 00 00 24 ffffffae 00 00 24 ffffffae 00 35 22 5d 0a 00 00 00 00 18 00 00 00 38 00 00 00
bad free 90e3be8
The word before it has changed from 0 to 0x18. I am not sure this word is part of the allocation scheme, though.Sometimes it seems to contain ASCII data: 686369 = ich and a5d22 = "]\n, a likely part of a player name or a PGN tag. The memory just before the corrupted length field does not look like there has been overrun from the previous allocated area: 35 22 5d 0a = 5"]\n , a likely end of a PGN tag (the rest is apparently damaged when this area was freed just before the crash).
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: xboard debug file w/FICS
Here's a technique I used when writing MyMalloc() and MyFree(), adjusted below for 64 bit systems:
1) Two separate two-way linked lists freelest and usedlist; both start empty.
2) Four global static pointers: usedlisthead usedlisttail freelisthead freelisttail; all start as 0 (NULL).
3) MyMalloc(size_t length) allocates a region sized as header + length + tailer; header includes prev/next pointers for the list, the requested length, and also eight bytes magic 0xa5a5a5a5a5a5a5a5; tailer has the same eight byte magic followed by a copy of the length and the prev/next pointers. Both header and tailer are 40 bytes long. MyMalloc tries to grab a block from the free list; if it can't than malloc() is called to put a usable block on the free list and MyMalloc than grabs that. Before returning, MyMalloc connects the block to the used list.
4) MyFree(void *ptr) checks everything tha can be checked and then clears the block user data before moving the chunk to the free list.
5) SanityCheck() runs through both lists and checks everything that can be checked. Extra paranoia can be handled by calling SanityCheck at the beginning of MyMalloc and at the end of MyFree.
6) At the end of the program, the used list can be tested to see if all allocated objects have been freed per good programming practice. For extra points, the free list can be deallocated by calls to free.
NOTE: the pointer returned from MyMalloc is actually the pointer value returned from the original malloc call PLUS the length of the header (40 bytes).
NOTE: for thread safety, list access must be guarded by a mutex (spinlock is best here).
1) Two separate two-way linked lists freelest and usedlist; both start empty.
2) Four global static pointers: usedlisthead usedlisttail freelisthead freelisttail; all start as 0 (NULL).
3) MyMalloc(size_t length) allocates a region sized as header + length + tailer; header includes prev/next pointers for the list, the requested length, and also eight bytes magic 0xa5a5a5a5a5a5a5a5; tailer has the same eight byte magic followed by a copy of the length and the prev/next pointers. Both header and tailer are 40 bytes long. MyMalloc tries to grab a block from the free list; if it can't than malloc() is called to put a usable block on the free list and MyMalloc than grabs that. Before returning, MyMalloc connects the block to the used list.
4) MyFree(void *ptr) checks everything tha can be checked and then clears the block user data before moving the chunk to the free list.
5) SanityCheck() runs through both lists and checks everything that can be checked. Extra paranoia can be handled by calling SanityCheck at the beginning of MyMalloc and at the end of MyFree.
6) At the end of the program, the used list can be tested to see if all allocated objects have been freed per good programming practice. For extra points, the free list can be deallocated by calls to free.
NOTE: the pointer returned from MyMalloc is actually the pointer value returned from the original malloc call PLUS the length of the header (40 bytes).
NOTE: for thread safety, list access must be guarded by a mutex (spinlock is best here).
-
- Posts: 28391
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: xboard debug file w/FICS
Yes, but the purpose is not to write a new malloc, but to reproduce an error that occur with the existing one, and print diagnostics when it occurs. So in any case I want to exactly the same malloc calls as would be done when you call it directly.