CCT Logon

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: CCT Logon (xboard 4.4.x bug)

Post by bob »

marcelk wrote:
hgm wrote:I don't seem to be able to provoke any hangs now. Perhaps it depends on time of day / server load. Unfortunately people start games against FairyMax all the time (and manage to lose... :lol: ), preventing me from starting new test games.

I wanted to redo the test, because it is not completely conclusive: if the XBoard ICS-input thread somehow hangs, it would probably also not print the input. And I had not tried to type any commands from the hanging player, to see if there was echo / response.

So this time I added a print statement to the debug every time the input routine returns. But alas, no more hangs...
If it is FICS eating moves then other interfaces should suffer just as much, but they don't. I for example have 3 accounts continuously playing under robofics/icsdrone clones, and no issues with missing moves, never ever, for more than a decade... Also FICS has been up for almost 500 days, so any server side issue must have been there for a while. Further consider that FICS is still a single-threaded server (still handling 2000 connections, bravo!), its behavior would be very predictable and repeatable. Any bug must have been seen frequently before.

My money is that something is happening inside xboard. Not a lot of 24/7 online computers run under xboard at all. They only show up in these CC tourneys.
Crafty certainly qualifies as 24/7, and on ICC there have been zero hangs over the past 3 years that I have logs for. And up until this event, I have not had problems on FICS. I used both xboard versions, 4.4.4 and 4.2.6, and both hung: 4.4.4 on Saturday, and 4.2.6 yesterday in the fifth and seventh rounds. It is certainly possible that we are seeing a race condition somewhere. Linux is pretty quick on process scheduling compared to Windows, so a race between input and output is possible, as is a buffer overflow or whatever else might happen.

But xboard has been working flawlessly for me for years, right up until Saturday when things went to hell in a handbasket. And the only thing that changed was that ICC was replaced by FICS. Little doubt that there is some difference between them that is causing a problem. Unfortunately, going to a 5+ year old version of xboard (I think the last one released by Tim Mann in fact) did not fix the problem...
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: CCT Logon (xboard 4.4.x bug)

Post by marcelk »

bob wrote: Crafty certainly qualifies as 24/7,
Not on FICS. It is hardly ever there and RD is over 100.
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: CCT Logon (xboard 4.4.x bug)

Post by jdart »

marcelk wrote: My money is that something is happening inside xboard. Not a lot of 24/7 online computers run under xboard at all. They only show up in these CC tourneys.
I'm not 24/7 but Arasan has been running on fics using Linux + xboard for many months now, playing hundreds of games. It worked ok until I switched to another machine in a different location.
mridul
Posts: 14
Joined: Sun Jan 23, 2011 1:41 pm

Re: CCT Logon (xboard 4.4.x bug)

Post by mridul »

Since you mentioned, just curious (and slightly OT), why did cct hosting move from icc to fics ? I am horribly out of touch with CC for a while now, so this might be an obvious q !


Thanks,
Mridul
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: CCT Logon (xboard 4.4.x bug)

Post by bob »

marcelk wrote:
bob wrote: Crafty certainly qualifies as 24/7,
Not on FICS. It is hardly ever there and RD is over 100.

I am not talking about FICS. I am talking about ICC, where it is on almost 100% of the time, year after year. With _zero_ hangs with either 4.2.6 or 4.4.x...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: CCT Logon (xboard 4.4.x bug)

Post by bob »

mridul wrote:Since you mentioned, just curious (and slightly OT), why did cct hosting move from icc to fics ? I am horribly out of touch with CC for a while now, so this might be an obvious q !


Thanks,
Mridul
I think because of the way ICC deals with guest accounts... On one hand, you have a server that is trying to make money. On the other hand you have one that is free. While free is good, you often get what you pay for...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: CCT Logon (xboard 4.4.x bug)

Post by bob »

jdart wrote:
marcelk wrote: My money is that something is happening inside xboard. Not a lot of 24/7 online computers run under xboard at all. They only show up in these CC tourneys.
I'm not 24/7 but Arasan has been running on fics using Linux + xboard for many months now, playing hundreds of games. It worked ok until I switched to another machine in a different location.
Which leads me to believe it is timing related. We have an impossibly fast internet connection at UAB. From my machine it is gigabit ethernet all the way to our gigabit connection to internet2. Typical ping times to FICS are around 60ms at busy times, better at other times.

The idea of a non-threaded server handling 2,000 connections certainly leaves room for some "indefinite postponement" due to the way you use FD_ISSET() to test each possible connection to see if there is data available or not...
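
To make the FD_ISSET() concern concrete, here is a minimal sketch of one pass of a single-threaded select() loop. Nothing in it is taken from the FICS source: the name service_pass, the one-byte read, and the per-pass budget are all assumptions made purely for illustration. It only shows that a scan which always starts at the first descriptor and does a bounded amount of work per pass can postpone the later descriptors for as long as the earlier ones stay busy.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper (not from FICS or XBoard): one pass of a
 * single-threaded server loop. It polls all n descriptors with select(),
 * then scans them with FD_ISSET() beginning at index `start`, and reads
 * one byte from at most `budget` readable descriptors. served[i] is
 * incremented for each descriptor that was handled.
 * Returns the number of descriptors handled, or -1 if select() fails. */
int service_pass(const int *fds, size_t n, size_t start, int budget, int *served)
{
    fd_set rset;
    struct timeval tv = {0, 0};      /* zero timeout: poll without blocking */
    int maxfd = -1;
    int handled = 0;

    FD_ZERO(&rset);
    for (size_t i = 0; i < n; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    if (select(maxfd + 1, &rset, NULL, NULL, &tv) < 0)
        return -1;

    for (size_t k = 0; k < n && handled < budget; k++) {
        size_t i = (start + k) % n;  /* start == 0 on every pass favours low indices */
        char c;
        if (FD_ISSET(fds[i], &rset) && read(fds[i], &c, 1) == 1) {
            served[i]++;
            handled++;
        }
    }
    return handled;
}
```

Calling this repeatedly with start fixed at 0 and a budget of 1 keeps serving the first busy descriptor and never reaches the second; advancing start on each pass spreads the service around. Whether FICS bounds its work per pass in any comparable way is not something this thread establishes.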
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: CCT Logon (xboard 4.4.x bug)

Post by Michel »

I can confirm what Marcel van Kervinck was saying. I have been running icsdrone bots on FICS for several years. Barring network outages, they essentially stay logged in forever. Never a loss for suspicious reasons. In short I have never seen any unpredictable behaviour from FICS.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: CCT Logon (xboard 4.4.x bug)

Post by michiguel »

hgm wrote:It is difficult to know for sure, as the error seems to have gone away now. I played dozens of bullet games, no hangs. This morning and yesterday the first or second game was a hang. I didn't change XBoard (I even went back to the version without the extra debug prints, to make sure these did not cause the error to go away...)

Tomorrow morning I will try again, to see if it is time-of-day related.
On my older machine (Ubuntu 10.04, 64-bit), FICS froze about 5 out of 6 times when I tried to connect on Friday, as I mentioned. I went back to see if I could reproduce the problem... I can't. I repeated the connection loop and it logs in perfectly every time now.

Heisenbug?

Miguel
hgm
Posts: 27795
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: CCT Logon (xboard 4.4.x bug)

Post by hgm »

The problem is that all this reasoning is based on logic, and logic does not apply, as we are dealing with computers...

This morning I could again provoke hangs. I have modified the lowest-level I/O callback in XBoard to print something when it returns, to make sure I can exclude the possibility that it hangs in some high-level processing routine for the input:

Code: Select all

void
DoInputCallback(closure, source, xid)
     caddr_t closure;
     int *source;
     XtInputId *xid;
{
    InputSource *is = (InputSource *) closure;
    int count;
    int error;
    char *p, *q;

    if (is->lineByLine) {
	count = read(is->fd, is->unused,
		     INPUT_SOURCE_BUF_SIZE - (is->unused - is->buf));
	if (count <= 0) {
	    (is->func)(is, is->closure, is->buf, count, count ? errno : 0);
fprintf(debugFP, "bad read\n");
	    return;
	}
	is->unused += count;
	p = is->buf;
	while (p < is->unused) {
	    q = memchr(p, '\n', is->unused - p);
	    if (q == NULL) break;
	    q++;
	    (is->func)(is, is->closure, p, q - p, 0);
	    p = q;
	}
	q = is->buf;
	while (p < is->unused) {
	    *q++ = *p++;
	}
	is->unused = q;
    } else {
	count = read(is->fd, is->buf, INPUT_SOURCE_BUF_SIZE);
	if (count == -1)
	  error = errno;
	else
	  error = 0;
	(is->func)(is, is->closure, is->buf, count, error);
    }
fprintf(debugFP, "successful read from %x\n", is); // ********** Added this *********
}
When the hang occurs, it seems communication to FICS is one way: I can type tell commands, they arrive at the opponent but are not echoed. Tells by the opponent do not arrive, neither am I informed if he resigns the hanging game, or quits FICS altogether.

The last part of the debug file is:
xboard.debug wrote:>ICS: e5h8\015\012
>ICS: kibitz !!! +0.79/9 (0.23 sec, 60637 nodes, 262 knps) PV=e5h8 g4h6 h8e5 h6f7 e5f6 e4e5 d6e5 a2a3 e7e6\015\012
nodes = 60637, 60637
move: e5h8
, parse: Bh8 (
)
AnimateMove: piece 24 slides from 4,4 to 7,7
successful read from 8344e90 // ***** This terminates read from the engine, responsible for move and kibitz to be sent to ICS
<ICS: \007\012\015<12> -------b ----p--- ---p---- --k---p- ---nPpN- p-BP-P-- K-----P- -------- W -1 0 0 0 0 4 19 FairyMax WBtester -1 1 0 10 11 6683 13762 49 B/e5-h8 (0:00.432) Bh8 1 1 0\012\015fics%
ics input 96, castling = -1 -1 -1 -1 -1 -1
wrap(count:1,width:80,line:8,len:1,*lp:7,src: \007
dest: \007
Parsing board: -------b ----p--- ---p---- --k---p- ---nPpN- p-BP-P-- K-----P- -------- W -1 0 0 0 0 4 19 FairyMax WBtester -1 1 0 10 11 6683 13762 49 B/e5-h8 (0:00.432) Bh8 1 1 0

load 8x8 board
parseboard 96, castling = -1 -1 -1 -1 -1 -1
accepted move Bh8 from ICS, parse it.
moveNum = 96
board = 0-8 x 8
move to parse: Bh8
Parser Qa1: yyleng=3, 24(-1,-1)-(7,7) = 0 ( )
Move parsed to 'Bh8 (0:00.432)'
nps: w=-1, b=-1
Display title 'FairyMax (10) vs. WBtester (11) {1 0}, gameInfo.variant = 0'
successful read from 8366ca8 // ***** This terminates read of board from ICS
<ICS: \012\015WBtester(U)(----)[19] kibitzes: !!! +0.74/7 (0.29 sec, 72146 nodes, 248 knps) \012\015\ PV=b3d4 d1c1 c8d7 e1d2 h1c1 d2c1 b7b5 c1d2\012\015(kibitzed to 1 player)\012\015fics%
ics input 96, castling = -1 -1 -1 -1 -1 -1
wrap(count:2,width:80,line:10,len:2,*lp:8,src: 1
dest: 1
successful read from 8366ca8 // ***** This terminates read of kibitz
>ICS: who\015\012
successful read from 8368cc8 // ***** This terminates read from console (where I typed who)
GameEnds(40, xboard exit, 2)
Interrupting first
502077 >first : force
502077 >first : ping 6
502077 >first : quit
As you can see, all processing of input from the various sources terminates successfully. (I added the comments behind it afterwards.) The ICS 'input source' has the address ending in ca8. Last activity there is when XBoard gets the kibitz line. XBoard processes it, and the call returns. At that point XBoard is ready and waiting for new input. That new input would be printed to the debug file before anything else is done. The opponent moved, but no new input appears in the log. There also is no response to the 'who' command I typed on the console.

I think this is pretty strong evidence that the problem is not in XBoard.

This was without using timeseal, btw. The only way to get ironclad proof that the problem is not in XBoard or in any of the X-libraries it links to would be to use an -icshelper like timeseal and let that make the log. I don't know how to do that, however.
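
For what it is worth, the logging part of such a helper could look roughly like the sketch below. This is not timeseal code and not anything XBoard ships: relay_logged is a made-up name, and the sketch assumes the helper already holds open descriptors for both sides (opening the connection to the server and the wiring through -icshelper are left out entirely). It copies one chunk from one descriptor to another and writes it to a log first, with non-printable bytes as octal escapes in the style of the xboard.debug lines above.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not part of timeseal or XBoard): copy one chunk of
 * data from `from` to `to`, recording it in `log` first. Non-printable
 * bytes and backslashes are written as three-digit octal escapes.
 * Returns the number of bytes relayed, 0 on end of file, -1 on error. */
long relay_logged(int from, int to, FILE *log, const char *tag)
{
    unsigned char buf[4096];
    ssize_t n;

    assert(log != NULL && tag != NULL);
    n = read(from, buf, sizeof buf);
    if (n <= 0)
        return (long)n;

    fprintf(log, "%s ", tag);
    for (ssize_t i = 0; i < n; i++) {
        if (buf[i] >= 32 && buf[i] < 127 && buf[i] != '\\')
            fputc(buf[i], log);
        else
            fprintf(log, "\\%03o", buf[i]);
    }
    fputc('\n', log);
    fflush(log);                     /* flush so the log survives a hang or a kill */

    for (ssize_t off = 0; off < n; ) {
        ssize_t w = write(to, buf + off, (size_t)(n - off));
        if (w <= 0)
            return -1;
        off += w;
    }
    return (long)n;
}
```

A helper built around this would call it once per direction whenever select() reports data, with a different tag for each direction. Since the log is written outside XBoard and the X libraries, a move that shows up there but not in xboard.debug would point at the client, and a move missing from both would point at the network or the server.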