bob wrote: That is certainly a problem on the flavors of Linux I run, as I had to debug it about a week or so prior to 23.6 being released.
The problem I had was that there were obvious overwrites on a line here and there, and an occasional "gap" where no data was written to a few specific bytes leaving zeros/garbage there. Inspecting the logs would expose that. Most everything I had done regarding fprintf() was inside a lock if multiple threads could do output (something VERY rare in Crafty except for when debugging the SMP code). But printing the null-window fail high could happen from any thread and it caused problems until it was protected via a lock, then the problem disappeared permanently.
You mean this code?
Code:
Lock(lock_io);
Print(2, " %2i %s %2s ", iteration_depth,
    Display2Times(end_time - start_time), fh_indicator);
if (display_options & 64)
  Print(2, "%d. ", move_number);
if ((display_options & 64) && !wtm)
  Print(2, "... ");
Print(2, "%s! ", OutputMove(tree, tree->curmv[1], 1, wtm));
Print(2, "(%c%s) \n", (wtm) ? '>' : '<',
    DisplayEvaluationKibitz(value, wtm));
Unlock(lock_io);
You're doing multiple Print()s here resulting in a single line of output, so yes, you have to lock. I said so from the start.
There is already a Lock(lock_root) around this piece of code, but I suppose other threads may issue Print()s from other places. Those other Print()s can land between the individual Print()s in the code above, resulting in corrupted lines. Had you looked carefully, you could have pieced the fragments back together.
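To illustrate the point about several Print()s making up one line: a rough sketch of the alternative, which is to build the whole line in a local buffer and hand it to stdio in a single call. FormatFailHigh() and PrintFailHigh() are hypothetical names, not Crafty functions, and the time, move and evaluation strings are passed in as plain parameters instead of coming from Display2Times(), OutputMove() and DisplayEvaluationKibitz(). It assumes the output goes to one FILE stream on a POSIX-compliant stdio.

```c
#include <stdio.h>

/* Hypothetical helper, not Crafty code: build the whole fail-high line in
   the caller's buffer. Returns the snprintf() result, i.e. the number of
   characters the full line needs (excluding the terminator), or a negative
   value on error. Output is truncated if the buffer is too small. */
int FormatFailHigh(char *buf, size_t size, int depth, const char *time_str,
                   const char *fh, int show_number, int move_number, int wtm,
                   const char *move, const char *eval) {
  char prefix[32] = "";

  /* Optional "23. " (white to move) or "23. ... " (black to move) prefix. */
  if (show_number)
    snprintf(prefix, sizeof(prefix), "%d. %s", move_number,
             wtm ? "" : "... ");
  return snprintf(buf, size, " %2i %s %2s %s%s! (%c%s) \n", depth, time_str,
                  fh, prefix, move, wtm ? '>' : '<', eval);
}

/* One stdio call per line: output from other threads on the same stream can
   land before or after this line, but not in the middle of it. */
void PrintFailHigh(FILE *out, int depth, const char *time_str, const char *fh,
                   int show_number, int move_number, int wtm,
                   const char *move, const char *eval) {
  char line[256];

  if (FormatFailHigh(line, sizeof(line), depth, time_str, fh, show_number,
                     move_number, wtm, move, eval) > 0)
    fputs(line, out);
}
```

A call such as PrintFailHigh(stdout, 12, "1.50", "++", 1, 23, 0, "Nf3", "+0.35") produces the same text as the sequence of Print()s above, but as one unit. If the line also has to go to a second stream (a log file, say), the two copies are still written by separate calls, so an application-level lock like lock_io remains the simpler choice there.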
bob wrote: So yes, I have unintentionally tried this, and I have seen the resulting corrupted log file with garbage in the middle, and lines partially overwriting previous (shorter) lines...
Of course, you're using multiple fprintf()s to write a single line....
bob wrote: The problem seems to not be the actual characters being written, it seems that fprintf() updates the file position pointer outside the lock. If thread A writes 120 bytes, then thread b writes 80 bytes, every now and then the file pointer comes out wrong.
Nothing to do with "file position pointers" getting confused. Your Print()s simply got interleaved with other Print()s.
If you saw anything other than interleaved output from individual Print()s, it was not on a POSIX-compliant platform.
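For reference, POSIX requires each stdio call to behave as if it takes the stream's lock for the duration of the call, and it exposes that same lock through flockfile() and funlockfile(). A minimal sketch of grouping several fprintf() calls into one uninterrupted line follows. PrintMoveLine() is a hypothetical helper, not Crafty code, and it only covers a single FILE stream.

```c
#define _POSIX_C_SOURCE 200809L /* for flockfile()/funlockfile() */
#include <stdio.h>

/* Hypothetical helper: emit one logical line using several fprintf() calls
   while holding the stream's own lock, so that stdio calls made by other
   threads on the same stream have to wait until the line is complete. */
void PrintMoveLine(FILE *out, int move_number, int wtm, const char *move) {
  flockfile(out);
  fprintf(out, "%d. ", move_number);
  if (!wtm)
    fprintf(out, "... ");
  fprintf(out, "%s!\n", move);
  funlockfile(out);
}
```

The stream lock is recursive for the owning thread, so the fprintf() calls inside the locked region do not deadlock. Without the flockfile() pair each fprintf() is still atomic on its own, which is exactly why the damage shows up as whole Print() fragments in the wrong order and not as scrambled bytes.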