Why not a linkable interface?

hgm
Posts: 27701
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Why not a linkable interface?

Post by hgm »

kbhearn wrote: Wouldn't need to be. Could be a pointer to a 64-byte array and a flags integer. Something that can be converted to a position in your engine's native format in < 10 lines.
A FEN can be parsed in 10 lines too. The only real difference between a FEN and an array is that the FEN run-length encodes the empty squares. So where in the parsing of the array you would have

Code: Select all

if(*board == '.') board[sqr++] = EMPTY;
you would have in the FEN case

Code: Select all

while(*fen > '0') (*fen)--, board[sqr++] = EMPTY;
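To make the run-length point concrete, here is a minimal sketch that expands the board field of a FEN into the kind of 64-character array discussed above. The function name expand_fen_board and the use of '.' for empty squares are assumptions made for illustration; unlike the one-liner above, it does not modify the FEN string.

```c
/* Expand the board field of a FEN into 64 characters, writing '.' for each
 * empty square. Returns the number of squares written (64 for a normal
 * board). Name and '.' convention are illustrative, not from any engine. */
int expand_fen_board(const char *fen, char out[64])
{
    int sqr = 0;
    for (; *fen && *fen != ' '; fen++) {
        if (*fen == '/') continue;              /* rank separator */
        if (*fen >= '1' && *fen <= '9') {       /* run of empty squares */
            int n = *fen - '0';
            while (n-- > 0 && sqr < 64) out[sqr++] = '.';
        } else if (sqr < 64) {
            out[sqr++] = *fen;                  /* piece letter, copied as-is */
        }
    }
    return sqr;
}
```

The digit branch is the only thing the array version would not need.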
Depends on move format. UCI's move format is reasonable: it is very rigid about exactly what it will send you, and you only have to pick out the two oddball o-o and o-o-o cases first.
UCI never uses OO castling. It sends castling as a King move in Chess, and as KxR in Chess960. OO would only be used in CECP, for Chess960.
CECP, I think, is in the same boat, but I remember some vagueness that led me to think a generic 'any-algebraic' notation parser might wind up being necessary, as opposed to a mostly fixed-width move with an optional promotion character.
This is only a feature for masochistic engines that explicitly request to be sent SAN. If you don't inflict that on yourself first, the GUI will never send SAN moves other than the OO / OOO in Chess960.
Both protocols have it. It's driven by the desire to use space as the separator some of the time and as an allowed mid-string character at other times. In my dream protocol one character (say 'tab') would be reserved as a universal argument separator, so that the first step could merely be to feed the string into a tokeniser and get a list of strings that could thereafter be treated as discrete arguments. As it stands, you strip off the command, and then depending on the command the next delimiter you need might be an '=', it might be a ' ', or it might be ' value ' (for a UCI example), causing you to have to write per-command tokenising.
OK, I see. But I think this whole idea of tokenising might be overdoing it. Which tokens are allowed and what you have to do with them (i.e. store as text string, read as number, and where) depends on the command anyway. In CECP it is never a problem to subject the remainder of the input line directly to an sscanf that specifies how to interpret the rest of the command (through %d, %f, %c, %s or %[^\n] format elements), and to which engine variables the values have to go. As in the example above.
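A minimal sketch of that sscanf style, for illustration only. The commands level, sd and setboard are CECP commands; the EngineState struct, its field names and the handler name are hypothetical. The base time is read as text here so that a value like '2:30' survives.

```c
#include <stdio.h>

/* Hypothetical engine state; names are illustrative only. */
struct EngineState {
    int mps, inc, depth;
    char base[16];    /* kept as text: may be "5" or "2:30" */
    char fen[128];
};

/* Each command carries its own format string, so no separate tokeniser is
 * needed. Returns 1 if the line matched one of the three commands, else 0. */
int handle_command(const char *line, struct EngineState *e)
{
    if (sscanf(line, "level %d %15s %d", &e->mps, e->base, &e->inc) == 3) return 1;
    if (sscanf(line, "sd %d", &e->depth) == 1) return 1;
    if (sscanf(line, "setboard %127[^\n]", e->fen) == 1) return 1;
    return 0;
}
```

The %[^\n] element is what lets setboard take the rest of the line, spaces included, as one argument.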
Move parsing! Specifically the part of the CECP protocol where, if you haven't set the (rejectable) feature usermove=1, you don't know whether what you're left with is a move, a command you forgot to catch, or something else.
I agree that the absence of a keyword announcing a move in the v1 protocol was a mistake. But I usually do write my engines such that they also recognize bare moves. Not because I am afraid that their feature usermove=1 would be rejected, but because it is so annoying to have to type "usermove" when you run the engine from the command line for testing purposes.

But moves are quite easy to recognize; apart from the Chess960 castlings, which you would have to recognize explicitly (but fortunately none of my engines plays Chess960), you only need to look at the second character of the input line. If it is a digit or '@', I consider the input line a move. This works irrespective of board size. Everything else you can reject as an unknown command. (In fact you could recognize the castlings from the first character being an 'O', as CECP never uses capitals in command keywords.)
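As a sketch, the test described here fits in a few lines. The function name looks_like_move is hypothetical, and castling is detected from a leading capital 'O', which covers both the O-O and the OO spelling.

```c
#include <ctype.h>

/* Returns 1 if the input line should be treated as a move, 0 otherwise.
 * A digit or '@' as second character marks a coordinate move or a drop;
 * a leading 'O' marks castling (no CECP command starts with a capital). */
int looks_like_move(const char *line)
{
    if (line[0] == '\0') return 0;              /* empty line: nothing to test */
    unsigned char second = (unsigned char)line[1];
    return line[0] == 'O' || second == '@' || isdigit(second);
}
```

Because only the second character is inspected, a move like a10b10 on a larger board is recognized just as well as e2e4.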
Accordingly, at the very least your move parser needs to be able to return false, to say 'no, this isn't a move, you idiot'.
Well, that is also debatable. The GUI should never send you moves with syntax errors. The "Illegal move" engine->GUI command is intended to allow delegating legality checking to the engine, by switching the GUI's legality checking off. (Usually because you are playing a variant and want to use the pieces in unorthodox ways that the GUI would object to.) But even in this case they should still be concatenations of two board squares, or a valid piece ID followed by '@' and a board square. Even with legality testing off the user has no way to generate moves to off-board squares, or plain gibberish. Not even if he types the move. The GUI has to perform the moves too, to stay in sync with the engine, and if he types something that the GUI would not be able to recognize as a move, it would never be transmitted to the engine even when legality checking is off.

If I decide that the input string is a move based on its 2nd character, I do

Code: Select all

from = input[0] - 'a' + 16*(input[1] - '1');
to = input[2] - 'a' + 16*(input[3] - '1');
without any hesitation or further checking. If the input was really gibberish the legality checking would probably reject it.
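A worked example of what those two lines compute, wrapped in a hypothetical decode_move function. Squares are numbered file + 16*rank, so e2 becomes 4 + 16*1 = 20 and e4 becomes 4 + 16*3 = 52. The on_board test is an addition of this sketch, not something the post relies on: with 16 squares per rank, any off-board square of an 8x8 board has a bit set outside 0x77.

```c
/* Decode a coordinate move such as "e2e4" into square numbers laid out as
 * file + 16*rank. No checking is done, exactly as in the two lines above. */
void decode_move(const char *input, int *from, int *to)
{
    *from = input[0] - 'a' + 16 * (input[1] - '1');
    *to   = input[2] - 'a' + 16 * (input[3] - '1');
}

/* Optional sanity test (an assumption of this sketch): a valid 8x8 square
 * uses only bits 0-2 for the file and bits 4-6 for the rank. */
int on_board(int sq)
{
    return (sq & ~0x77) == 0;
}
```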
In general, handling unexpected syntax gracefully is useful in order to write a log entry or send an error to the GUI saying 'hey, I don't know what this command was', to aid in debugging (probably I should've been able to handle it, so it's good to see what it was).
To me that seems overdoing it. One should not try to design something complex just to act as a debugging aid for something that was very simple in the first place. If you are really suffering from paranoia you could test for every sscanf you use to parse a command if it returns the expected number of parameters (and put a dummy %c at the end of the format to see if there was unexpected stuff appended to it). That way you would catch errors like reading the second parameter of 'level' as %d, which would fail if it ever encountered a non-integer minute specification like '2:30'. But that is really the only case where I would expect anyone to go wrong. (Although I admit I have seen a case where someone wanted to have a move argument to the 'force' command.)
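For the paranoid variant, a minimal sketch of the return-count check with a trailing dummy %c. The function name parse_level_strict is hypothetical; it deliberately reads the second parameter as %d to show how the check catches the '2:30' case.

```c
#include <stdio.h>

/* Returns 1 only if the line is exactly "level <int> <int> <int>".
 * sscanf returns the number of successful conversions: fewer than 3 means a
 * parameter failed to parse, 4 means something followed the last number. */
int parse_level_strict(const char *line, int *mps, int *base, int *inc)
{
    char extra;   /* dummy: only filled if unexpected text is appended */
    return sscanf(line, "level %d %d %d %c", mps, base, inc, &extra) == 3;
}
```

With this, "level 40 5 0" is accepted, while "level 40 2:30 0" stops at the colon after two conversions and is rejected.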

I think a more sensible approach to debugging would be to just have the engine print the value of all parameters that the command was supposed to set.
hgm

Re: Why not a linkable interface?

Post by hgm »

Nine-line FEN reader:

Code: Select all

int sqr = 0, epRank, virgin = 0; char stm, epFile, castling[80]; // stm = side to move ('w' or 'b')
do {
  if(*fen >= 'A') AddPiece(pieceEncode[*fen - 'A'], sqr++); else
  if(*fen != '/') while(*fen > '0') (*fen)--, board[sqr++] = EMPTY; // counts the digit down in place, so fen must be writable
} while(*++fen && *fen != ' ');
sscanf(fen, " %c %79s %c%d %d %d", &stm, castling, &epFile, &epRank, &cnt100, &moveNr);
if(epFile == '-') ClearEP(), moveNr = cnt100, cnt100 = epRank; // was out of phase
else SetEP(MAKE_SQUARE(epFile-'a', epRank-1));
for(fen=castling; *fen; fen++) virgin |= virginityEncode[*fen];
For a function SetPosition(char boardIn[64], int virginityIn, int epIn, int cnt100In, int moveNrIn) you would need something like

Code: Select all

int sqr;
for(sqr=0; sqr<64; sqr++) {
  if(boardIn[sqr]) AddPiece(pieceEncode[boardIn[sqr]], sqr);
  else board[sqr] = EMPTY;
}
if(epIn == NONE) ClearEP();
else SetEP(epIn);
virgin = virginityEncode[virginityIn];
cnt100 = cnt100In; moveNr = moveNrIn;
Not really a spectacular difference. Putting together the e.p. square from the coordinates is really the only extra line, and the line that processes the virginity field has to contain a loop in the FEN case.
kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 4:48 am

Re: Why not a linkable interface?

Post by kbhearn »

hgm wrote: To me that seems overdoing it. One should not try to design something complex just to act as a debugging aid for something that was very simple in the first place. If you are really suffering from paranoia you could test for every sscanf you use to parse a command if it returns the expected number of parameters (and put a dummy %c at the end of the format to see if there was unexpected stuff appended to it). That way you would catch errors like reading the second parameter of 'level' as %d, which would fail if it ever encountered a non-integer minute specification like '2:30'. But that is really the only case where I would expect anyone to go wrong. (Although I admit I have seen a case where someone wanted to have a move argument to the 'force' command.)

I think a more sensible approach to debugging would be to just have the engine print the value of all parameters that the command was supposed to set.
I think you hit the nail on the head with calling me paranoid :) I'm very much used to the pattern of never trusting input from outside your program, and of being able to report a problem when it's mangled rather than crashing in fun and unexpected ways. Your 9-line FEN parser is rather neat though; I didn't expect that to be possible even in the 'assume you're never given a malformed FEN' case (although to be honest I'd never thought about it). I'll have to consider more seriously an initial parsing approach of trusting input, for the sake of just getting past that point, and coming back to it 'later' if I ever care to release to the wild.

As for the tokenizing approach, I know it's not possible in the protocols the way they're written. It was just a dream of a hypothetical nonexistent protocol with simple uniform steps for parsing a command.

I'll have to read more on sscanf. I'm used to distrusting those format-string functions, but it may indeed cut out a lot of work, at least for the quick and dirty option.
hgm

Re: Why not a linkable interface?

Post by hgm »

Note that testing for validity of input, even if not strictly necessary, at least is possible with FEN input. It would not really be possible when the position is passed as an array. How could you know if someone passes it an array of 36 squares, rather than 64?
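A sketch of the kind of check a FEN allows, assuming a standard 8x8 board; the function name fen_board_ok is hypothetical. The string itself says how many ranks and squares it describes, which a bare array pointer cannot.

```c
/* Returns 1 if the board field of the FEN describes exactly 8 ranks of
 * 8 squares each, 0 otherwise. Piece letters are not validated here. */
int fen_board_ok(const char *fen)
{
    int ranks = 0, files = 0;
    for (; *fen && *fen != ' '; fen++) {
        if (*fen == '/') {
            if (files != 8) return 0;           /* rank too short or too long */
            ranks++;
            files = 0;
        } else if (*fen >= '1' && *fen <= '8') {
            files += *fen - '0';                /* run of empty squares */
        } else {
            files++;                            /* anything else is one piece */
        }
    }
    return ranks == 7 && files == 8;            /* 7 separators, last rank full */
}
```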

I am very lazy, and I trust XBoard. So some of my engines (in particular Shokidoki and HaQiKi D) do not even check the input move for legality. They just play whatever you tell them, even if it is with the opponent's pieces. I have become a big fan of the system where the engine tells a rule-agnostic GUI the game rules, the GUI then uses those to do legality checking, move highlighting and SAN generation, and the engine can rely on everything the GUI sends it being OK.

Scanf-family functions are indeed quite versatile, as a poor-man's parser.