WinBoard 4.7.2 released

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Ferdy
Posts: 4840
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: WinBoard 4.7.2 released

Post by Ferdy »

hgm wrote:Just to be sure:

Is the debug from the last crash posted completely, or did the forum software clip off the end because it thought the posting was too long?

The other two seem to end in an incompletely printed line, which was supposed to print input it received from the (first or second) engine. That strongly suggests there is some memory corruption in what it tries to print. There could be a buffer overflow somewhere. What it tries to print should in itself be pretty innocent (a features command received from the engine).

I am afraid this will be pretty hard to find. It seems to crash while busy executing an fprintf. That can happen if the fprintf would somehow try to follow invalid pointers, e.g. with a %s format pointing to an invalid memory location that causes a segfault. But the strings it is in the middle of printing should not contain a %s there. In fact they already are the result of following a pointer for a %s format. It is hard to see how printing such a string could run into invalid memory, as just before that string must have been written in the same memory, when received from the engine. So if it was invalid, it should have segfaulted there, before even starting to print any of it.

Do you ever have any crashes when you run with only 4 instances on this tourney. (After 3 instances crashed, the remaining ones seem to go on pretty long.) And if you would add two more participants, so that a cycle last six games, would only the 7th instance crash? If this would be the case, it suggests that there somehow is a problem in starting a new cycle.
It is the forum software, here is the last part of debug.

Code: Select all

4349174 <second&#58; # moves_to_go = 30
4349174 <second&#58; # TimeLeft=45130.00ms. CT=4513ms, MT=2256.00ms  AT=1436.00ms.
4349174 <second&#58;   2 -1965      0         39 e1d2 e2e1q d2c2
4349174 <second&#58;   3 -1615      0        410 e1f2 e2e1q f2g2 a4b4
4349174 <second&#58;   4 -1634      0       1267 e1f2 e2e1q f2g2 b5c6 g2h3 a4b4
4349190 <second&#58;   5 -1648      1       4377 e1f2 e2e1q f2g2 e1e3 g2h2 a4b4
4349190 <second&#58;   6 -29990      1      15716 e1f2 e2e1q f2g2 e1e3 g2h1 b5c6 h1h2 e3f2 h2h3 c6d7
4349206 <second&#58;   7 -29990      3      30513 e1f2 e2e1q f2g2 e1e3 g2h1 b5c6 h1h2 e3f2 h2h3 c6d7
4349221 <second&#58;   8 -29990      4      53663 e1f2 e2e1q f2g2 e1e3 g2h1 b5c6 h1h2 e3f2 h2h3 c6d7
4349268 <second&#58;   9 -29990      9      98441 e1f2 e2e1q f2g2 e1e3 g2h1 b5c6 h1h2 e3f2 h2h3 c6d7
4350284 <second&#58;  10 -29990    111    1378768 e1f2 e2e1q f2g2 e1e3 g2h1 b5c6 h1h2 e3f2 h2h3 c6d7
4350612 <second&#58; move e1f2
4350612 >first &#58; time 3876
4350612 >first &#58; otim 4370
book hit = &#40;NULL&#41;
4350612 >first &#58; usermove 4350612 >first &#58; e1f2
silence
4350612 <first &#58; 1 +9991 0 21 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 1 +0 0 0 +Mate_5
4350612 <first &#58; 2 +9991 0 42 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 2 +0 0 0 +Mate_5
4350612 <first &#58; 3 +9991 0 63 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 3 +0 0 0 +Mate_5
4350612 <first &#58; 4 +9991 0 84 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 4 +0 0 0 +Mate_5
4350612 <first &#58; 5 +9991 0 105 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 5 +0 0 0 +Mate_5
4350612 <first &#58; 6 +9991 0 362 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 6 +0 0 0 +Mate_5
4350612 <first &#58; 7 +9991 0 628 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 7 +0 0 0 +Mate_5
4350612 <first &#58; 8 +9991 0 631 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 8 +0 0 0 +Mate_5
4350612 <first &#58; 8 +9991 0 1050 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 8 +0 0 0 +Mate_5
4350612 <first &#58; 9 +9991 0 1053 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 9 +0 0 0 +Mate_5
4350612 <first &#58; 9 +9991 0 2487 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 9 +0 0 0 +Mate_5
4350612 <first &#58; 10 +9991 0 2490 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 10 +0 0 0 +Mate_5
4350612 <first &#58; 10 +9991 0 5864 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 10 +0 0 0 +Mate_5
4350612 <first &#58; 11 +9991 0 5867 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 11 +0 0 0 +Mate_5
4350612 <first &#58; 11 +9991 0 11270 e1=Q+ Kxg2 Qe3
4350612 <first &#58; 11 +0 0 0 +Mate_5
4350612 <first &#58; 11 +9991 0 11270 e1=Q+ Kxg2 Qe3
4350612 <first &#58; move e2e1q
4350612 >second&#58; time 4370
4350612 >second&#58; otim 3876
book hit = &#40;NULL&#41;
4350612 >second&#58; e2e1q
silence
4350628 <second&#58; # moves_to_go = 29
4350628 <second&#58; index i has value equal to 0
4350628 <second&#58; # TimeLeft=43700.00ms. CT=4370ms, MT=2185.00ms  AT=1437.00ms.
4350628 <second&#58;   2 -1632      0         92 f2g2
4350628 <second&#58;   2 -1615      0        139 f2g2 a4b4
4350628 <second&#58;   3 -1634      1        555 f2g2 b5c6 g2h3 a4b4
4350628 <second&#58;   4 -1658      1       2463 f2g2 e1e3 g2h1 a4b4
4350628 <second&#58;   5 -29992      1       9235 f2g2 e1e3 g2h1 b5c6 h1h2 e3f2 h2h3 c6d7
4350628 <second&#58;   6 -29992      1      20223 f2g2 e1e3 g2h1 b5c6 h1h2 e3f2 h2h3 c6d7
4350643 <second&#58;   7 -29992      3      35753 f2g2 e1e3 g2h1 b5c6 h1h2 e3f2 h2h3 c6d7
4350643 <second&#58;   8 -29992      3      52842 f2g2 e1e3 g2h1 b5c6 h1h2 e3f2 h2h3 c6d7
4350659 <second&#58;   9 -29992      4      94322 f2g2 e1e3 g2h1 b5c6 h1h2 e3f2 h2h3 c6d7
4352049 <second&#58; move f2g2
4352049 >first &#58; time 3876
4352049 >first &#58; otim 4226
book hit = &#40;NULL&#41;
4352049 >first &#58; usermove 4352049 >first &#58; f2g2
silence
4352049 <first &#58; 1 +2251 0 158 Qe3
4352049 <first &#58; 2 +9993 0 405 Qe3 Kh1 Bc6+
4352049 <first &#58; 2 +0 0 0 +Mate_4
4352049 <first &#58; 3 +9993 0 650 Qe3 Kh1 Bc6+
4352049 <first &#58; 3 +0 0 0 +Mate_4
4352049 <first &#58; 4 +9993 0 895 Qe3 Kh1 Bc6+
4352049 <first &#58; 4 +0 0 0 +Mate_4
4352049 <first &#58; 5 +9993 0 1140 Qe3 Kh1 Bc6+
4352049 <first &#58; 5 +0 0 0 +Mate_4
4352049 <first &#58; 6 +9993 0 1660 Qe3 Kh1 Bc6+
4352049 <first &#58; 6 +0 0 0 +Mate_4
4352049 <first &#58; 7 +9993 0 2210 Qe3 Kh1 Bc6+
4352049 <first &#58; 7 +0 0 0 +Mate_4
4352049 <first &#58; 8 +9993 0 2213 Qe3 Kh1 Bc6+
4352049 <first &#58; 8 +0 0 0 +Mate_4
4352049 <first &#58; 8 +9993 0 2854 Qe3 Kh1 Bc6+
4352049 <first &#58; 8 +0 0 0 +Mate_4
4352049 <first &#58; 9 +9993 0 2857 Qe3 Kh1 Bc6+
4352049 <first &#58; 9 +0 0 0 +Mate_4
4352049 <first &#58; 9 +9993 0 4428 Qe3 Kh1 Bc6+
4352049 <first &#58; 9 +0 0 0 +Mate_4
4352049 <first &#58; 9 +9993 0 4428 Qe3 Kh1 Bc6+
4352049 <first &#58; move e1e3
4352049 >second&#58; time 4226
4352049 >second&#58; otim 3876
book hit = &#40;NULL&#41;
4352049 >second&#58; e1e3
silence
4352065 <second&#58; # moves_to_go = 28
4352065 <second&#58; # TimeLeft=42260.00ms. CT=4226ms, MT=2113.00ms  AT=1437.00ms.
4352065 <second&#58;   2 -1648      0        102 g2h2
4352065 <second&#58;   2 -1648      0        141 g2h2 a4b4
4352065 <second&#58;   3 -29994      1        671 g2h1 b5c6 h1h2 e3f2 h2h3 c6d7
4352065 <second&#58;   4 -29994      1        797 g2h1 b5c6 h1h2 e3f2 h2h3 c6d7
4352065 <second&#58;   5 -29994      1       1039 g2h1 b5c6 h1h2 e3f2 h2h3 c6d7
4352065 <second&#58; move g2h1
4352065 >first &#58; time 3876
4352065 >first &#58; otim 4224
book hit = &#40;NULL&#41;
4352065 >first &#58; usermove 4352065 >first &#58; g2h1
silence
4352065 <first &#58; 1 +9995 0 46 Bc6+ Kh2 Qf2+
4352065 <first &#58; 1 +0 0 0 +Mate_3
4352065 <first &#58; 2 +9995 0 91 Bc6+ Kh2 Qf2+
4352065 <first &#58; 2 +0 0 0 +Mate_3
4352065 <first &#58; 3 +9995 0 136 Bc6+ Kh2 Qf2+
4352065 <first &#58; 3 +0 0 0 +Mate_3
4352065 <first &#58; 4 +9995 0 181 Bc6+ Kh2 Qf2+
4352065 <first &#58; 4 +0 0 0 +Mate_3
4352065 <first &#58; 5 +9995 0 226 Bc6+ Kh2 Qf2+
4352065 <first &#58; 5 +0 0 0 +Mate_3
4352065 <first &#58; 6 +9995 0 610 Bc6+ Kh2 Qf2+
4352065 <first &#58; 6 +0 0 0 +Mate_3
4352065 <first &#58; 7 +9995 0 995 Bc6+ Kh2 Qf2+
4352065 <first &#58; 7 +0 0 0 +Mate_3
4352065 <first &#58; 7 +9995 0 995 Bc6+ Kh2 Qf2+
4352065 <first &#58; move b5c6
4352065 >second&#58; time 4224
4352065 >second&#58; otim 3876
book hit = &#40;NULL&#41;
4352065 >second&#58; b5c6
silence
4352065 <second&#58; # moves_to_go = 27
4352065 <second&#58; # TimeLeft=42240.00ms. CT=4224ms, MT=2112.00m
I am running now with 4 instances.
User avatar
hgm
Posts: 27866
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: WinBoard 4.7.2 released

Post by hgm »

I suppose that last line is also clipped? (Originally ending in ms rather than m?)

One of the strange things here is that it now seems to crash during function calls that would not even be made when you would not create the debug file. Although fprintf woulld of course be called for other purposes as well (e.g. saving the PGN, updating the tourney file). So originally it could also have crashed in there.

It is hard to believe something could be wrong in fprintf, however. OTOH, I think we can exclude that the strings it tries to print here are in inaccessible memory. It is almost like there is a bug in the DLL that contains fprintf that would make it crash if too many programs try to print something simultaneously. This is why I am curious for the result of a test with only 4 instances. (The limit could be more than 4, as the engines might also be using fprintf from the same DLL, and in fact are pretty likely to also print something at the time the crash occurs.)
Ferdy
Posts: 4840
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: WinBoard 4.7.2 released

Post by Ferdy »

I suppose that last line is also clipped? (Originally ending in ms rather than m?)
It is what it is in the debug file.

After 84 games using 4 instances there are no crashes. I stopped the tourney and I will try the the next setup by adding 2 additional engines and using 7 instances.
User avatar
hgm
Posts: 27866
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: WinBoard 4.7.2 released

Post by hgm »

Ferdy wrote:
I suppose that last line is also clipped? (Originally ending in ms rather than m?)
It is what it is in the debug file.
Yes, but what I mean, is "is it less then what the engine had sent?". This was an engine-specific debug message, so only you can know how it was supposed to look. Is what the debug file contains the complete message, or does it look truncated to you?
Ferdy
Posts: 4840
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: WinBoard 4.7.2 released

Post by Ferdy »

hgm wrote:
Ferdy wrote:
I suppose that last line is also clipped? (Originally ending in ms rather than m?)
It is what it is in the debug file.
Yes, but what I mean, is "is it less then what the engine had sent?". This was an engine-specific debug message, so only you can know how it was supposed to look. Is what the debug file contains the complete message, or does it look truncated to you?
Yes, it is less than what the engine had sent.
Ferdy
Posts: 4840
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: WinBoard 4.7.2 released

Post by Ferdy »

hgm wrote:Just to be sure:

Is the debug from the last crash posted completely, or did the forum software clip off the end because it thought the posting was too long?

The other two seem to end in an incompletely printed line, which was supposed to print input it received from the (first or second) engine. That strongly suggests there is some memory corruption in what it tries to print. There could be a buffer overflow somewhere. What it tries to print should in itself be pretty innocent (a features command received from the engine).

I am afraid this will be pretty hard to find. It seems to crash while busy executing an fprintf. That can happen if the fprintf would somehow try to follow invalid pointers, e.g. with a %s format pointing to an invalid memory location that causes a segfault. But the strings it is in the middle of printing should not contain a %s there. In fact they already are the result of following a pointer for a %s format. It is hard to see how printing such a string could run into invalid memory, as just before that string must have been written in the same memory, when received from the engine. So if it was invalid, it should have segfaulted there, before even starting to print any of it.

Do you ever have any crashes when you run with only 4 instances on this tourney. (After 3 instances crashed, the remaining ones seem to go on pretty long.) And if you would add two more participants, so that a cycle last six games, would only the 7th instance crash? If this would be the case, it suggests that there somehow is a problem in starting a new cycle.
With additional 2 participants and using 7 instances, 3 had crashed, 4 are still running.
And it is strange, the 3 * in tourney file have no equivalent debug files.
User avatar
hgm
Posts: 27866
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: WinBoard 4.7.2 released

Post by hgm »

Well, I guess that can happen. It reserves the game (by writing * in the tourney file result string) when the previous game ends, and it switches the debug file only when the new game starts. The incomplete lines are probably at the end of the debug file of the previous game they played. (But it is not easy to figure out what game that would be.)

It could be significant they all crash in the same very short time window, however.

What happens if you run two different tourneys at once (say both with 4 engines, 4 WinBoard instances playing for one, and 3 for the other). Or if you run 7 completely independent tourneys each with one WinBoard instance simultaneously?
Ferdy
Posts: 4840
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: WinBoard 4.7.2 released

Post by Ferdy »

What happens if you run two different tourneys at once (say both with 4 engines, 4 WinBoard instances playing for one, and 3 for the other).
No problem so far, after 10 games and 30 games for 1 and 3 instances respectively for 2 different tourneys.
User avatar
hgm
Posts: 27866
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: WinBoard 4.7.2 released

Post by hgm »

OK, but that is 4 in total. So far everything is consistent with the hypothesis that running upto 4 WinBoard instances in parallel does not lead to any problems, but running 5 or more causes crashing.

What I want to establish is if this has anything with the tournament manager code I added to WinBoard. If we still see crashing when you run more than 4 independent old-fashioned two-player matches in parallel, I can rule out a very large part of the code, (like accessing the tourney file, as there would be none).
Ferdy
Posts: 4840
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: WinBoard 4.7.2 released

Post by Ferdy »

What I want to establish is if this has anything with the tournament manager code I added to WinBoard. If we still see crashing when you run more than 4 independent old-fashioned two-player matches in parallel, I can rule out a very large part of the code, (like accessing the tourney file, as there would be none).
No crashes so far, I started with 4 wb instances, after 1 game each, started the 5th instance, and after 4 games of this instance, started 6th and after 2 games of this instance started 7th instance, until now no crashes after 8 games of 7th instance.

I use the following in the startup dialogue with different engines and changing the debug files, t1.deb, t2.deb and so on.

Code: Select all

/mg 50 /tc 1 /mps 40 /inc -1 /sgf test1.pgn /lgf silver50.pgn /lgi -2 /debug /debugfile t1.deb