c-chess-cli


lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: c-chess-cli

Post by lucasart »

Ras wrote: Tue Nov 03, 2020 10:20 am
lucasart wrote: Tue Nov 03, 2020 2:01 am I like the CSV solution. Is there a format I can use that other tools use? E.g., doesn't Ordo have such an input format?
Ordo has only PGN as input and uses CSV as output. That would be the other solution if printing tournament stats (just the table without Elo) didn't make sense for c-chess-cli.

lucasart wrote: Tue Nov 03, 2020 6:05 am Do you have a real example where this zombie child scenario happens?
Yes, that's how I spotted it. Take Raven 1.1, modified in chess.c line 62 to check the return value of fgets() and exit if it's 0, and match it against Zevra 2.1.2 with 8 threads (I have a 4C/8T CPU). I can't reproduce it with fewer workers: in that case Zevra doesn't become unresponsive.

Zevra becomes unresponsive after about 10-20 games, and while the Zevra processes are killed (most of the time, but not always), the Raven ones linger. According to my system monitor, the lingering engines are sleeping in wait channel "pipe_wait", with FD 0 and 1 listed as open files of type pipe.

Using my engine instead of Raven has a similar effect, but I'm using read() directly on stdin (with error checking). The hanging processes of my engine are sleeping in waiting channel futex_wait_queue_me though.

It looks like the pipes aren't closed. I'm on kernel 5.4.0, but have also tried 5.8.0 - same results.

However, matching Raven and my engine works. Matching Demolito against Zevra works without Zevra becoming unresponsive. Pretty strange.

Killing c-chess-cli with CTRL-C before Zevra is unresponsive makes all processes exit as expected.

AndrewGrant wrote: Tue Nov 03, 2020 7:33 am One of the things I test for each engine I add to OpenBench is whether or not they respect the closure of stdin.
What's your test case for that (Linux)?
So, if I understand correctly, this has nothing to do with Zevra. The problem is in Raven, and should be reproducible directly by hand (without c-chess-cli) if you hit Ctrl+D.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: c-chess-cli

Post by Ras »

lucasart wrote: Wed Nov 04, 2020 1:24 am So, if I understand correctly, this has nothing to do with Zevra. The problem is in Raven, and should be reproducible directly by hand (without c-chess-cli) if you hit Ctrl+D.
Actually not. Zevra is only what triggers the abort chain, for unknown reasons. I've modified Raven to exit if fgets() returns 0, which is the simplest way of detecting a broken stdin. It still doesn't work. I chose Raven only because my own code is more complicated and I wanted to rule that out.
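
For readers who want to try this, the check described above looks roughly like the following. This is a minimal sketch and not Raven's actual code: `read_commands`, the buffer size and the handling of "quit" are made up for illustration. fgets() returns NULL (0) on end-of-file or on a read error, which is what an engine sees once every write end of its stdin pipe has been closed.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical engine input loop (not Raven's actual code).
// Reads lines until "quit" is received or the input stream ends.
// Returns the number of lines read. When fgets() returns NULL, the
// input is gone and the caller is expected to exit the process.
int read_commands(FILE *in)
{
    char line[4096];
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        count++;
        if (strncmp(line, "quit", 4) == 0)
            break;
        // ... parse and execute the command here ...
    }

    return count;
}
```

An engine would call `read_commands(stdin)` from main() and return right after it. If the process still lingers with such a loop in place, fgets() is never returning NULL, which means some write end of the pipe is still open somewhere.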

The problem with the zombie processes in concurrent matches is that the spawned engine processes in general inherit a lot of open pipe handles, as visible in the system monitor. That shouldn't happen because engine_spawn() tries to close pipe ends that aren't needed, but if you take a look at matches with a lot of concurrent games, it doesn't seem to work as intended. The engine processes have a lot of open pipe handles that they shouldn't have, and that's what keeps zombie processes mutually alive because pipes are reference counted.
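
The reference counting can be observed in isolation with a small probe. This is only a sketch under assumptions: `pipe_at_eof` is a made-up helper, not part of c-chess-cli, and it is meant for an empty pipe. It switches the read end to non-blocking mode so that the probe itself never hangs.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Probe the read end of an (empty) pipe.
// Returns 1 if read() reports end-of-file (no write end is open any more),
// 0 if read() would block (at least one write end is still open somewhere),
// and -1 on any other outcome.
int pipe_at_eof(int fd)
{
    char c;
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    ssize_t n = read(fd, &c, 1);

    if (n == 0)
        return 1;
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}
```

If the write end is duplicated with dup() and only one of the two copies is closed, the probe keeps returning 0. It returns 1 only after the last copy is closed. In the tournament case the extra copies are the ones inherited by the other engine processes.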
Rasmus Althoff
https://www.ct800.net
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: c-chess-cli

Post by lucasart »

Ras wrote: Wed Nov 04, 2020 1:38 am The problem with the zombie processes in concurrent matches is that the spawned engine processes in general inherit a lot of open pipe handles, as visible in the system monitor. That shouldn't happen because engine_spawn() tries to close pipe ends that aren't needed, but if you take a look at matches with a lot of concurrent games, it doesn't seem to work as intended. The engine processes have a lot of open pipe handles that they shouldn't have, and that's what keeps zombie processes mutually alive because pipes are reference counted.
Linux documentation actually talks about this problem, and seems to be saying that the correct solution is to use pipe2() with O_CLOEXEC, due to race conditions.
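
A minimal sketch of what that looks like, assuming Linux with glibc. `make_cloexec_pipe` is a hypothetical wrapper name used for illustration, not a function from c-chess-cli.

```c
#define _GNU_SOURCE  // pipe2() is not in POSIX; glibc needs this to declare it
#include <fcntl.h>
#include <unistd.h>

// Create a pipe whose two ends are close-on-exec from the very start.
// The flag is set atomically with the creation of the descriptors, so
// there is no window in which another thread could fork() and exec()
// a child that inherits them.
// Returns 0 on success, -1 on error (errno is set by pipe2).
int make_cloexec_pipe(int fds[2])
{
    return pipe2(fds, O_CLOEXEC);
}
```

The child that is supposed to use the pipe still gets it: dup2() onto fd 0 or 1 creates a new descriptor that does not have FD_CLOEXEC set, so it survives exec(), while the original descriptors are closed automatically.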

I will do some experiments. This is interesting. I don't want to just sweep the problem under the carpet with an engine-kill band-aid.
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: c-chess-cli

Post by lucasart »

lucasart wrote: Sat Oct 31, 2020 2:30 am
Ras wrote: Sat Oct 31, 2020 1:42 am 1) Adjustable log level. I'd like to choose between full log, or errors only. In the latter case, ideally only generating log files upon the first error in a thread. Like "-log errors" and "-log all" or so, possibly with some well-defined default if "-log" is given without log level.
Typically errors (defined vaguely as anything that goes wrong) are logged to stdout one way or another:
  • time losses: logged to stdout (and PGN).
  • disconnection: fatal error (c-chess-cli stops), logged to stdout.
  • engine crashes: logged to stdout. this is detected indirectly as an I/O error ("could not read from engine"), as its observable effect is that we get a broken pipe, which we can't read from (since the write end of the pipe was owned by the child process, which was terminated, and therefore the file handle was closed by the OS).
  • illegal move: logged to stdout (and PGN).
  • illegal moves in PV sent by engines: cutechess-cli logs them to stdout. c-chess-cli logs them in per thread log files as 'WARNING' messages, but remains silent on stdout. Perhaps, I should replicate the cutechess-cli behavior here ? Pro: user is forced to see them. Con: very spammy once you have an offending engine...
But I see your point. Unless you are looking at stdout the whole time, this is not practical. My bash-fu is somewhat limited, but perhaps there is a way to duplicate stdout (i.e. display it in the terminal and write it to a file at the same time). Still, you'd have to grep that, and you're not sure what you should be grep'ing for, since the various problems are not tagged consistently.
Actually, there is no need for such a selective logging feature. Just pipe into the 'tee' command from GNU coreutils: you can both watch stdout and log it to a file at the same time, for later review. This is the UNIX way: programs should be minimalistic and orthogonal, and easy to combine.
hgm
Posts: 27795
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: c-chess-cli

Post by hgm »

Ras wrote: Tue Nov 03, 2020 8:40 pm
AndrewGrant wrote: Tue Nov 03, 2020 8:36 pm I'm not sure why having multiple concurrency changes things?
Because reading stdin only gives an error if the pipe is closed. That however requires it to be closed on all other ends because it's reference counted. Since the pipes are duplicated also in the spawned engine processes, killing c-chess-cli doesn't close all other pipe ends so that reading stdin just gives a blocking call with no input. Basically, the zombies keep each other alive. With only one game, i.e. two engines, the second engine dupes the first one's stdin/out, but has no one duping its stdin/out. So the second engine exits, and then also the first one. At least, that's what I think happens.
Good point. Even without concurrency the second engine would inherit the GUI ends of the pipes to the first engine. And could abuse it for sending garbage to its opponent. (A 'quit' command would do the job nicely! :wink: )

I see that WinBoard guards against this possibility: it makes non-inheritable duplicates of the GUI ends and then closes the originals, before launching the engine child. XBoard doesn't seem to take any special measures, though; it just creates the pipes with pipe(2), and then forks, and closes the unused ends of the just-created pipes (but not of the pre-existing ones for the other engine). I had some problem there anyway in the past: when XBoard was launched with stdin closed (e.g. by a tournament manager program), one of the pipes was created with fd=0, and the order of dup and close that attempted to migrate it to fd=0 was then such that the pipe disappeared altogether.
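
The fd=0 pitfall can be avoided by checking whether the descriptor already sits on the target number before duplicating and closing. The following is a sketch of that guard only, not XBoard's actual code; `move_fd` is a hypothetical helper name.

```c
#include <unistd.h>

// Make `target` refer to the same open file as `fd`, then drop `fd`.
// If the pipe end already has the target number (which can happen when
// the parent was started with stdin closed, so that pipe() hands out
// fd 0), nothing is closed, because closing it would destroy the pipe.
// Returns 0 on success, -1 on error.
int move_fd(int fd, int target)
{
    if (fd == target)
        return 0;               // already in place: keep it open
    if (dup2(fd, target) == -1)
        return -1;
    return close(fd);
}
```

In the child, between fork() and exec(), this would be called as `move_fd(read_end, 0)` and `move_fd(write_end, 1)`, where `read_end` and `write_end` stand for the engine's ends of the two pipes.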

Good thing concurrency in XBoard is implemented by just launching multiple XBoard instances as separate processes.
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: c-chess-cli

Post by lucasart »

hgm wrote: Sun Nov 08, 2020 2:26 pm Good thing concurrency in XBoard is implemented by just launching multiple XBoard instances as separate processes.
Indeed, it is surprisingly difficult to correctly parallelise the fork/exec sequence with pipes (with lots of threads in a fork-race). After a fair amount of research and experimentation, I came to the conclusion that there are only 2 correct (simple) solutions:
  • Use pipe2() instead of pipe(), with flag=O_CLOEXEC, to atomically set the FD_CLOEXEC flags when creating the pipe. This only works on Linux (hence Android). MacOSX does not have pipe2(), nor any equivalent mechanism.
  • Close all the file descriptors from 3 upwards (you don't want to close 0, 1 and 2), all the way to sysconf(_SC_OPEN_MAX)-1, in the child, before exec(). This is ugly, and slooow, but it is the only simple way on MacOSX (and in general POSIX compatible). Most subprocess libraries out there do that (including Phobos for the D language, tiny-process-library for C++ used by Banksia, and many others I'm sure).
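
The second option amounts to a loop like the one below. This is a rough sketch, not the code from c-chess-cli or from any of the libraries mentioned: `close_fds` is a hypothetical helper, and the bounds are parameters only to keep the sketch easy to try out.

```c
#include <unistd.h>

// Close every descriptor in the half-open range [first, limit).
// Descriptors that are not open simply make close() fail with EBADF,
// which is ignored. Returns how many descriptors were actually closed.
int close_fds(int first, long limit)
{
    int closed = 0;

    for (long fd = first; fd < limit; fd++)
        if (close((int)fd) == 0)
            closed++;

    return closed;
}
```

In the child, between fork() and exec(), the call would be `close_fds(3, sysconf(_SC_OPEN_MAX))`. The cost grows with the descriptor limit, not with the number of descriptors that are really open, which is where the slowness comes from when the limit is large.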
hgm
Posts: 27795
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: c-chess-cli

Post by hgm »

Wouldn't it be possible to fork off processes rather than threads, with a shared memory area? This can mimic a multi-threaded process to a very high degree. But it does make the agents more independent, and in particular they would not automatically inherit each other's file descriptors.

I once used this method to quickly convert an engine that relied very heavily on global variables to SMP: just allocate the hash and EGT buffer as a shared memory area, instead of internally, and otherwise it is just business as usual for the individual processes. The 'main thread' just sees its hash table getting magically filled.
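
A minimal sketch of the idea, under assumptions: it uses an anonymous shared mapping created with mmap() before fork(), which is one common way to get such an area (System V or POSIX shared memory would work as well). The names `shared_alloc` and `run_worker` are invented for illustration.

```c
#define _DEFAULT_SOURCE  // for MAP_ANONYMOUS on glibc
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Allocate `size` bytes that remain shared between parent and children
// across fork(). Returns NULL on failure.
void *shared_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

// Fork a worker process that stores `value` in the shared slot, then
// wait for it. Returns 0 if the worker exited normally, -1 otherwise.
int run_worker(int *slot, int value)
{
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        *slot = value;  // written by the child, visible to the parent
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Descriptors that a worker opens after the fork() belong to that worker only, so pipes created by one worker are not visible to its siblings.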
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: c-chess-cli

Post by Ras »

lucasart wrote: Mon Nov 09, 2020 1:21 pm Use pipe2() instead of pipe(), with flag=O_CLOEXEC, to atomically set the FD_CLOEXEC flags when creating the pipe. This only works on Linux
Works so far, I'm running commit 285 under Linux. The engine processes also have the EPD and PGN files open, which they wouldn't with the pure POSIX solution, but I don't think that this is an issue. Also, the performance impact is probably smaller than it may look because you already avoid spawning engine processes if no engine switch is required.
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: c-chess-cli

Post by lucasart »

Ras wrote: Mon Nov 09, 2020 3:47 pm Works so far, I'm running commit 285 under Linux. The engine processes also have the EPD and PGN files open, which they wouldn't with the pure POSIX solution, but I don't think that this is an issue.
Yes, this is expected. The reason is that I open those files normally (with fopen), without setting FD_CLOEXEC. It's easy to fix this with fcntl(): no problem here, as it happens before threads are spawned, hence before the fork race.
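
For illustration, the fcntl() fix could look like this. It is a sketch rather than the actual c-chess-cli change, and `set_cloexec` is a made-up helper name.

```c
#include <fcntl.h>
#include <stdio.h>

// Mark the descriptor behind a stdio stream as close-on-exec, so that
// processes started later with exec() do not inherit it.
// Returns 0 on success, -1 on error.
int set_cloexec(FILE *f)
{
    int fd = fileno(f);
    if (fd == -1)
        return -1;

    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;

    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```

Setting the flag in a separate step after fopen() is only safe while the program is still single-threaded, which is the case described above.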
Ras wrote: Mon Nov 09, 2020 3:47 pm Also, the performance impact is probably smaller than it may look because you already avoid spawning engine processes if no engine switch is required.
Yes, once the tournament is on, there is not much impact. But what is noticeable is the boot time. It's when you start the 285*2 processes in parallel at the beginning that you should see a big difference.
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: c-chess-cli

Post by Ras »

lucasart wrote: Sat Nov 07, 2020 1:07 pm Actually, there is no need for such a selective logging feature. Just pipe into 'tee' command from GNU coreutils, you can both watch stdout, and log it to a file at the same time, for later review.
Sounds good. However, the "illegal move in PV" warning is only printed to stdout if full thread logging is active, because the stdout printf is also guarded by "if (w->log)". So tee won't be able to capture it unless full logging is active anyway.