CCT Logon

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: CCT Logon (xboard 4.4.x bug)

Post by bob »

Michel wrote:
There also is no response to the 'who' command I type on the console.
You typed

who CR/LF

whereas, as I pointed out above, the line terminator on FICS is LF/CR. Not saying this is the cause of the problem, but line termination issues often lead to weird behaviour when not properly addressed.
Who came up with LF/CR? Unix only uses LF. Windoze uses CR/LF normally. So a third termination style??? And why 2 characters, ever? TCP/IP is not message oriented, it is stream oriented and reads don't stop at any particular character, they retrieve _everything_ in the buffer...
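The stream-oriented point can be illustrated with a small local experiment. This is only a sketch, assuming a local `socket.socketpair()` stands in for the FICS connection; the two command strings and the 4-byte read size are made up for illustration. The reader gets back the same bytes, but cut at positions that have nothing to do with where the writer ended each line.

```python
import socket

def read_in_chunks(sock, total, chunk_size):
    """Read `total` bytes using recv calls of at most `chunk_size` bytes each."""
    chunks = []
    received = 0
    while received < total:
        data = sock.recv(min(chunk_size, total - received))
        if not data:  # peer closed the connection early
            break
        chunks.append(data)
        received += len(data)
    return chunks

# Hypothetical traffic: two lines written separately, each ending in LF/CR.
messages = [b"who\n\r", b"tell bob hi\n\r"]

writer, reader = socket.socketpair()
for message in messages:
    writer.sendall(message)

# The reader asks for 4 bytes at a time; line terminators play no role.
chunks = read_in_chunks(reader, sum(len(m) for m in messages), 4)
writer.close()
reader.close()

print(chunks)
```

The chunk boundaries depend only on the read size and on what happens to be buffered, so any line handling has to be done by the application on top of the byte stream.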
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: CCT Logon (xboard 4.4.x bug)

Post by bob »

Michel wrote:55 lightning games as guest played now. Both engines (gnuchess) use "sd 3" so they move immediately.

Plenty of communication delay and communication overload but no hangs so far.
This is not necessarily "overload". This is a tit-for-tat communication where one side can't swamp the other side. Kibitzing long PVs and other stuff might change this behaviour a bit.
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: CCT Logon (xboard 4.4.x bug)

Post by marcelk »

bob wrote:
Michel wrote:I can confirm what Marcel van Kervinck was saying. I have been running icsdrone bots on FICS for several years. Barring network outages, they essentially stay logged in forever. Never a loss for suspicious reasons. In short I have never seen any unpredictable behaviour from FICS.
This is not convincing, however. It could be that FICS is somehow doing something that breaks xboard but not others, even though what it is doing might be perfectly legal, or might violate the style 12 specifications. The known data is this, at the present.

Crafty hung regularly Saturday, less regularly Sunday, and I could not make it hang yesterday. Yet for years it has never hung on ICC except for those occasions where I introduced a bug into the program itself. It has played for the past year +, 24/7, with _zero_ hangs on ICC.

Others play on ICC with no problems. But when we moved to FICS for the CCT, things fell apart. Obviously it is related to FICS and xboard. Whether it is purely xboard or purely FICS is unknown.

But different interfaces can parse differently, have different timing issues, and different buffering issues. Who knows at present what is causing this...
It could be handling of consecutive lines that arrive as one 'read' when the network is congested or when the computer load is high, but arrive as two 'reads' when the network is clean.
It could be handling of lines that arrive in a buffer that is nearly full, so the line is split over two reads anyway, even though the entire line is received by the kernel. (xboard's buffer is 8kB though, seems large enough).
It could be the last, but with a LF/CR spread over the buffer boundary.
It could be the last, but with the fics% prompt spread over the buffer boundary. (FICS does prompting differently than ICC, ICC sends fewer of them.)
It could be a problem in some versions of the X event loop but not in others.

I'll try to make Rookie(C) spam all its internals as kibitz and play 1 0 games and see if I can make it hang. But it doesn't get 'communication limit' messages. I do get limits as a human. So it seems that computer accounts are not throttled.
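The cases listed above (lines merged into one read, a line split over two reads, an LF/CR pair or a `fics% ` prompt straddling a read boundary) can be exercised against a small line assembler. This is a minimal sketch, not XBoard's actual input code; the class name `LineAssembler` and the sample strings are hypothetical, and it assumes a prompt is always the literal text `fics% ` at the start of a line.

```python
class LineAssembler:
    """Collect raw chunks and emit complete lines, whatever the terminator."""

    def __init__(self, prompt="fics% "):
        self.pending = ""      # bytes of a line that has not ended yet
        self.prompt = prompt   # assumed prompt text, stripped when found

    def feed(self, chunk):
        """Add one read's worth of text; return the lines completed by it."""
        self.pending += chunk
        # Everything before the last LF is complete; the tail stays pending.
        *complete, self.pending = self.pending.split("\n")
        lines = []
        for raw in complete:
            # A CR may trail the line (CR/LF) or lead the next one (LF/CR).
            line = raw.strip("\r")
            if line.startswith(self.prompt):
                line = line[len(self.prompt):]
            lines.append(line)
        return lines


# Two lines arriving in a single read.
merged = LineAssembler().feed("line one\n\rline two\n\r")

# The same two lines, with the LF/CR pair and the prompt split across reads.
split = LineAssembler()
pieces = ["line o", "ne\n", "\rfi", "cs% line two\n\r"]
split_result = [split.feed(piece) for piece in pieces]

print(merged)
print(split_result)
```

Because the split happens only on LF and a stray CR is stripped from either end of a line, it does not matter on which side of a read boundary the CR lands. A line whose real content starts with the prompt text would be mangled by this sketch, which is one reason to treat it as an illustration only.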
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: CCT Logon (xboard 4.4.x bug)

Post by hgm »

Indeed, it could be all of that. But not in XBoard. At least in the hangs I experienced, which show exactly the same symptoms as the hangs during CCT: one player not getting a move the ICS has already accepted, and apparently not reachable by others on the server. (Bob reported no reaction when trying to reach Crafty with the zippypassword.)

The problem is that XBoard (and timeseal, when used) simply do not get anything from the server. So there is nothing for it to choke on.

I can add that combining lines is something that is impossible to control. XBoard does send the move and the kibitz in separate writes. But timeseal already combines them into one write towards the ICS. It just depends on when the respective processes involved get a time-slice of the CPU, which on heavily loaded systems is quite unpredictable.

On input XBoard is very well aware of and protected against all things you mention, btw.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: CCT Logon (xboard 4.4.x bug)

Post by bob »

hgm wrote:Indeed, it could be all of that. But not in XBoard. At least in the hangs I experienced, which show exactly the same symptoms as the hangs during CCT: one player not getting a move the ICS has already accepted, and apparently not reachable by others on the server. (Bob reported no reaction when trying to reach Crafty with the zippypassword.)

The problem is that XBoard (and timeseal, when used) simply do not get anything from the server. So there is nothing for it to choke on.

I can add that combining lines is something that is impossible to control. XBoard does send the move and the kibitz in separate writes. But timeseal already combines them into one write towards the ICS. It just depends on when the respective processes involved get a time-slice of the CPU, which on heavily loaded systems is quite unpredictable.

On input XBoard is very well aware of and protected against all things you mention, btw.
In further analysis, I am seeing the same thing you reported. It appears that the server simply "goes dead". I put Crafty into a mode where it kibitzes every PV change, and even after it hung (waiting on a move) I would (observing) see the occasional kibitz as the search went deeper and deeper. As an observer, I saw my opponent's move, while in Crafty's log (and then the xboard debug log once I turned that on) I saw zilch from the server. Tells didn't come thru. It was as though one side of the connection was gone, which points to a server glitch.

I have not yet tried to confirm the no kibitzing idea, but anecdotal evidence suggests that is a problem, because I put Crafty on FICS Friday afternoon to test everything including the new box I was using, and it ran fine all night. But I did not have kibitzing enabled except for selected opponents...
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: CCT Logon (xboard 4.4.x bug)

Post by michiguel »

bob wrote:
hgm wrote:Indeed, it could be all of that. But not in XBoard. At least in the hangs I experienced, which show exactly the same symptoms as the hangs during CCT: one player not getting a move the ICS has already accepted, and apparently not reachable by others on the server. (Bob reported no reaction when trying to reach Crafty with the zippypassword.)

The problem is that XBoard (and timeseal, when used) simply do not get anything from the server. So there is nothing for it to choke on.

I can add that combining lines is something that is impossible to control. XBoard does send the move and the kibitz in separate writes. But timeseal already combines them into one write towards the ICS. It just depends on when the respective processes involved get a time-slice of the CPU, which on heavily loaded systems is quite unpredictable.

On input XBoard is very well aware of and protected against all things you mention, btw.
In further analysis, I am seeing the same thing you reported. It appears that the server simply "goes dead". I put Crafty into a mode where it kibitzes every PV change, and even after it hung (waiting on a move) I would (observing) see the occasional kibitz as the search went deeper and deeper. As an observer, I saw my opponent's move, while in Crafty's log (and then the xboard debug log once I turned that on) I saw zilch from the server. Tells didn't come thru. It was as though one side of the connection was gone, which points to a server glitch.

I have not yet tried to confirm the no kibitzing idea, but anecdotal evidence suggests that is a problem, because I put Crafty on FICS Friday afternoon to test everything including the new box I was using, and it ran fine all night. But I did not have kibitzing enabled except for selected opponents...
Sending kibitzes is not a problem. I logged in as a human to chat and FICS froze on me. I cannot type that fast :-)
But, I was "receiving" kibitzes and tells.

This is another test I did. I was with two computers, one playing with Gaviota, and in another I was logged in with my human account "Gaucho". When FICS froze on the engine, I sent tells to "Gaviota" from "Gaucho" and Gaviota did not receive them.

Whatever happened at this point, the communication from FICS to the engine ceased to exist.

Miguel
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: CCT Logon (xboard 4.4.x bug)

Post by marcelk »

michiguel wrote: This is another test I did. I was with two computers, one playing with Gaviota, and in another I was logged in with my human account "Gaucho". When FICS froze on the engine, I sent tells to "Gaviota" from "Gaucho" and Gaviota did not receive them.

Whatever happened at this point, the communication from FICS to the engine ceased to exist.
Another possibility is routing. Some of us could be routed through a 'rogue' router/firewall/ISP that is somehow interfering. There are many roads to Rome... Here are my paths to FICS (withholding the first 3 hops to keep some privacy). I experienced no problems with these:

From mscp/blik's location:

traceroute to freechess.org (69.36.243.188), 30 hops max, 5000 byte packets
[...]
 4  xe-5-3-0.edge5.Amsterdam1.Level3.net (212.72.41.81)  8.470 ms  8.954 ms  9.184 ms
 5  ae-34-52.ebr2.Amsterdam1.Level3.net (4.69.139.161)  9.915 ms  10.393 ms  10.624 ms
 6  ae-48-48.ebr2.London1.Level3.net (4.69.143.82)  18.352 ms  12.429 ms  14.213 ms
 7  ae-42-42.ebr1.NewYork1.Level3.net (4.69.137.70)  81.650 ms  82.130 ms  82.360 ms
 8  ae-61-61.csw1.NewYork1.Level3.net (4.69.134.66)  92.834 ms  81.430 ms  81.908 ms
 9  ae-62-62.ebr2.NewYork1.Level3.net (4.69.148.33)  82.633 ms  82.864 ms  81.942 ms
10  ae-2-2.ebr4.SanJose1.Level3.net (4.69.135.185)  150.877 ms  164.099 ms  164.577 ms
11  ae-94-94.csw4.SanJose1.Level3.net (4.69.134.254)  159.559 ms  159.542 ms  160.021 ms
12  ae-42-99.car2.SanJose1.Level3.net (4.68.18.196)  154.255 ms  157.672 ms  157.641 ms
13  Layer42.car2.SanJose1.Level3.net (4.53.18.242)  156.621 ms  156.851 ms  156.833 ms
14  vl2.sw1.scl.layer42.net (69.36.225.134)  154.065 ms  154.294 ms  154.275 ms
15  fics.freechess.org (69.36.243.188)  150.251 ms  150.722 ms  150.955 ms
From Rookie's location:

traceroute to freechess.org (69.36.243.188), 64 hops max, 52 byte packets
[...]
 4  nl-asd-dc2-ias-csg01-ge-5-2-0-kpn.net (139.156.113.103)  26.148 ms
    nl-asd-dc2-ias-csg01-ge-4-0-0-kpn.net (139.156.113.143)  4.549 ms  9.713 ms
 5  ams-ix.ae1.cr1.ams2.nl.nlayer.net (195.69.145.219)  16.364 ms  5.800 ms  11.929 ms
 6  ae3-60g.cr1.ams2.nl.nlayer.net (69.22.139.238)  5.645 ms  5.110 ms  5.260 ms
 7  xe-2-0-0.cr1.lhr1.uk.nlayer.net (69.22.142.94)  45.592 ms  11.462 ms  13.454 ms
 8  xe-2-2-0.cr1.nyc3.us.nlayer.net (69.22.142.9)  103.334 ms  83.936 ms
    xe-7-0-0.cr1.nyc3.us.nlayer.net (69.22.142.30)  93.205 ms
 9  xe-4-3-0.cr1.pao1.us.nlayer.net (69.22.142.6)  158.488 ms  158.306 ms  157.117 ms
10  as8121.ae0-3001.cr1.pao1.us.nlayer.net (69.22.153.114)  159.656 ms  162.243 ms  164.064 ms
11  vl2.sw1.scl.layer42.net (69.36.225.134)  158.348 ms  157.468 ms  157.412 ms
12  fics.freechess.org (69.36.243.188)  158.041 ms  157.150 ms  156.670 ms
Please compare these with yours. Do the engines with connection troubles somehow come in through another path?
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: CCT Logon (xboard 4.4.x bug)

Post by Michel »

Whatever happened at this point, the communication from FICS to the engine ceased to exist.
Come to think of it, I now realize I may have seen this behaviour also with icsdrone, but very rarely (like once every couple of weeks). I attributed it to a network problem (which it probably is as Marcel was pointing out).

These days icsdrone simply logs out/in in case of a network problem and resumes the stored game. So the shortlog (which is the only log I consult regularly) doesn't show much.
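A log-out/log-in recovery like this needs some way to decide that the connection has gone quiet. The following is a hedged sketch of one possible trigger, a silence timer; it is not taken from icsdrone, whose actual detection logic is not described here. The class name `SilenceWatchdog`, the 60-second limit, and the injectable `clock` parameter are all assumptions for illustration.

```python
import time

class SilenceWatchdog:
    """Flag a connection as stale when nothing has arrived for too long."""

    def __init__(self, limit_seconds, clock=time.monotonic):
        self.limit = limit_seconds
        self.clock = clock            # injectable so the logic can be tested
        self.last_seen = clock()

    def saw_data(self):
        """Call whenever any bytes arrive from the server."""
        self.last_seen = self.clock()

    def is_stale(self):
        """True once the silence has lasted longer than the limit."""
        return self.clock() - self.last_seen > self.limit


# Simulated timeline using a fake clock instead of real waiting.
now = [0.0]
watchdog = SilenceWatchdog(60, clock=lambda: now[0])

now[0] = 30.0
stale_at_30 = watchdog.is_stale()   # only 30 s of silence so far

watchdog.saw_data()                 # server output seen at t = 30
now[0] = 80.0
stale_at_80 = watchdog.is_stale()   # 50 s since the last data

now[0] = 91.0
stale_at_91 = watchdog.is_stale()   # 61 s since the last data
```

Long silences are normal while an opponent is thinking, so a client would presumably combine a timer like this with a harmless command that forces the server to answer before concluding the link is dead.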
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: CCT Logon (xboard 4.4.x bug)

Post by bob »

Michel wrote:
Whatever happened at this point, the communication from FICS to the engine ceased to exist.
Come to think of it, I now realize I may have seen this behaviour also with icsdrone, but very rarely (like once every couple of weeks). I attributed it to a network problem (which it probably is as Marcel was pointing out).

These days icsdrone simply logs out/in in case of a network problem and resumes the stored game. So the shortlog (which is the only log I consult regularly) doesn't show much.
When I run on ICC/FICS, I redirect stdout from xboard to a log file (actually I append with >>). I went back and looked at this log from Sat/Sun, and what I saw was lots of activity in channel 64, lots of kibitzes, and then suddenly nothing. Next thing I saw was a login (where I had killed xboard). It would then run normally for a while, and then dead silence. Until I again killed xboard. At the point of the "silence" there was a lot of activity, then zero beyond that instant, which looks like FICS simply chose to stop sending. I wonder if there is a way to get a ^S from xboard, ever? That would certainly stop all output to it until a ^Q is sent (flow control). However, I don't see anything like that internally in xboard.
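One way to test the ^S idea would be to scan a capture of the bytes sent to the server for the flow-control characters. ^S is XOFF (byte 0x13) and ^Q is XON (byte 0x11). The sketch below assumes such a capture exists as raw bytes; the function name `find_flow_control` and the sample data are hypothetical, and whether anything between xboard and FICS would actually act on these bytes is not something this code can show.

```python
XOFF = 0x13  # Ctrl-S (DC3): "stop sending"
XON = 0x11   # Ctrl-Q (DC1): "resume sending"

def find_flow_control(data: bytes):
    """Return (offset, name) for every XOFF/XON byte found in `data`."""
    names = {XOFF: "XOFF", XON: "XON"}
    return [(offset, names[byte])
            for offset, byte in enumerate(data)
            if byte in names]


# Made-up outbound capture with one ^S and one ^Q planted in it.
sample = b"kibitz depth 12\n\x13tell bob hi\n\x11"
hits = find_flow_control(sample)
print(hits)
```

An XOFF with no later XON just before the point where the log goes silent would support the flow-control theory; an empty result would argue against it.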
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: CCT Logon (xboard 4.4.x bug)

Post by bob »

marcelk wrote:
michiguel wrote: This is another test I did. I was with two computers, one playing with Gaviota, and in another I was logged in with my human account "Gaucho". When FICS froze on the engine, I sent tells to "Gaviota" from "Gaucho" and Gaviota did not receive them.

Whatever happened at this point, the communication from FICS to the engine ceased to exist.
Another possibility is routing. Some of us could be routed through a 'rogue' router/firewall/ISP that is somehow interfering. There are many roads to Rome... Here are my paths to FICS (withholding the first 3 hops to keep some privacy). I experienced no problems with these:

From mscp/blik's location:

traceroute to freechess.org (69.36.243.188), 30 hops max, 5000 byte packets
[...]
 4  xe-5-3-0.edge5.Amsterdam1.Level3.net (212.72.41.81)  8.470 ms  8.954 ms  9.184 ms
 5  ae-34-52.ebr2.Amsterdam1.Level3.net (4.69.139.161)  9.915 ms  10.393 ms  10.624 ms
 6  ae-48-48.ebr2.London1.Level3.net (4.69.143.82)  18.352 ms  12.429 ms  14.213 ms
 7  ae-42-42.ebr1.NewYork1.Level3.net (4.69.137.70)  81.650 ms  82.130 ms  82.360 ms
 8  ae-61-61.csw1.NewYork1.Level3.net (4.69.134.66)  92.834 ms  81.430 ms  81.908 ms
 9  ae-62-62.ebr2.NewYork1.Level3.net (4.69.148.33)  82.633 ms  82.864 ms  81.942 ms
10  ae-2-2.ebr4.SanJose1.Level3.net (4.69.135.185)  150.877 ms  164.099 ms  164.577 ms
11  ae-94-94.csw4.SanJose1.Level3.net (4.69.134.254)  159.559 ms  159.542 ms  160.021 ms
12  ae-42-99.car2.SanJose1.Level3.net (4.68.18.196)  154.255 ms  157.672 ms  157.641 ms
13  Layer42.car2.SanJose1.Level3.net (4.53.18.242)  156.621 ms  156.851 ms  156.833 ms
14  vl2.sw1.scl.layer42.net (69.36.225.134)  154.065 ms  154.294 ms  154.275 ms
15  fics.freechess.org (69.36.243.188)  150.251 ms  150.722 ms  150.955 ms
From Rookie's location:

traceroute to freechess.org (69.36.243.188), 64 hops max, 52 byte packets
[...]
 4  nl-asd-dc2-ias-csg01-ge-5-2-0-kpn.net (139.156.113.103)  26.148 ms
    nl-asd-dc2-ias-csg01-ge-4-0-0-kpn.net (139.156.113.143)  4.549 ms  9.713 ms
 5  ams-ix.ae1.cr1.ams2.nl.nlayer.net (195.69.145.219)  16.364 ms  5.800 ms  11.929 ms
 6  ae3-60g.cr1.ams2.nl.nlayer.net (69.22.139.238)  5.645 ms  5.110 ms  5.260 ms
 7  xe-2-0-0.cr1.lhr1.uk.nlayer.net (69.22.142.94)  45.592 ms  11.462 ms  13.454 ms
 8  xe-2-2-0.cr1.nyc3.us.nlayer.net (69.22.142.9)  103.334 ms  83.936 ms
    xe-7-0-0.cr1.nyc3.us.nlayer.net (69.22.142.30)  93.205 ms
 9  xe-4-3-0.cr1.pao1.us.nlayer.net (69.22.142.6)  158.488 ms  158.306 ms  157.117 ms
10  as8121.ae0-3001.cr1.pao1.us.nlayer.net (69.22.153.114)  159.656 ms  162.243 ms  164.064 ms
11  vl2.sw1.scl.layer42.net (69.36.225.134)  158.348 ms  157.468 ms  157.412 ms
12  fics.freechess.org (69.36.243.188)  158.041 ms  157.150 ms  156.670 ms
Please compare these with yours. Do the engines with connection troubles somehow come in through another path?
I don't think this can be a routing issue. TCP/IP has a reasonably short retransmission timeout for each packet and would re-send if it gets no ACK from the remote end. I watched one hang that sat for over an hour. Crafty hung in round 7, and the log showed nothing for well over an hour, where eventually (I suppose) FICS closed the connection and then it re-logged in normally... A routing issue can't cause that unless the only path between the two machines is cut, and even then TCP/IP will report an error way before that length of time goes by.
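One nuance on error reporting: retransmission only runs while an endpoint has unacknowledged data in flight. A side that is purely waiting to receive has nothing to retransmit, so it learns about a dead path only if keepalive probes are enabled. As a hedged sketch of how a client could ask for that, assuming nothing about what xboard or timeseal actually set, the helper below turns on `SO_KEEPALIVE` and, where the platform exposes them, the per-socket probe timings. The function name and the timing values are illustrative assumptions.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the kernel to probe an idle TCP connection and report failure."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are platform specific, so set only those present.
    tuning = (("TCP_KEEPIDLE", idle),       # seconds idle before first probe
              ("TCP_KEEPINTVL", interval),  # seconds between probes
              ("TCP_KEEPCNT", count))       # failed probes before giving up
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


# Demonstrate on an unconnected socket; no network traffic is generated.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With settings like these, a receiver stuck on a cut path would get an error from its next read after roughly `idle + interval * count` seconds instead of waiting indefinitely.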