kbhearn wrote: Wouldn't need to be. Could be a pointer to a 64 byte array and a flags integer. Something that can be converted to a position in your engine's native format in < 10 lines.

A FEN can be parsed in 10 lines too. The only difference between a FEN and an array is really that there is run-length encoding of the empty squares. So where in the parsing of the array you would have
Code:
if(*board == '.') board[sqr++] = EMPTY;

in the parsing of the FEN you would have

Code:
while(*fen > '0') (*fen)--, board[sqr++] = EMPTY;

kbhearn wrote: Depends on move format. UCI move format being very rigid as to exactly what it's going to send you and you just have to pick out the two oddball o-o and o-o-o cases first is reasonable.

UCI never uses OO castling. It uses a King move in Chess, and KxR in Chess960. Only in CECP for Chess960 would OO be used.
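To make the FEN point above concrete, here is a minimal sketch of a piece-placement parser in C. The function name `fen_to_board`, the `EMPTY` value, and the choice to store the raw FEN piece letter in each square are assumptions for illustration, not any engine's actual format. Unlike the one-liner above, which counts the digit down inside the FEN string itself, this version copies the count so the input can stay `const`.

```c
#include <ctype.h>

enum { EMPTY = 0 };   /* assumed encoding: 0 = empty, otherwise the FEN piece letter */

/* Fill board[0..63] from the piece-placement field of a FEN.
 * Index 0 is a8 and index 63 is h1, i.e. plain FEN order.
 * Returns 1 if exactly 64 squares were described, 0 otherwise. */
int fen_to_board(const char *fen, char board[64])
{
    int sqr = 0;
    for (; *fen && *fen != ' '; fen++) {
        if (*fen == '/') continue;                  /* rank separator carries no data */
        if (isdigit((unsigned char)*fen)) {         /* run-length encoded empty squares */
            int n = *fen - '0';
            if (sqr + n > 64) return 0;
            while (n-- > 0) board[sqr++] = EMPTY;
        } else {
            if (sqr >= 64) return 0;                /* more than 64 squares described */
            board[sqr++] = *fen;                    /* piece letter, e.g. 'K' or 'p' */
        }
    }
    return sqr == 64;
}
```

The side to move, castling rights and so on would still have to be read from the rest of the line; this only covers the part that corresponds to the 64 byte array.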
kbhearn wrote: CECP i think is in the same boat but i think i remembered some vagueness that led me to think a generic 'any-algebraic' notation parser might wind up being necessary as opposed to a mostly fixed-width move with optional promotion character.

This is only a feature for masochistic engines that explicitly request that they be sent SAN. If you don't inflict that on yourself first, the GUI will never send SAN moves other than the OO / OOO in Chess960.
kbhearn wrote: both protocols have it. it's driven by the desire to use space as the separator some of the time and then use space as an allowed mid-string character other times. in my dream protocol one character (say 'tab') would be reserved as a universal argument separator such that the first step could merely be to feed the string into a tokeniser and get a list of strings which thereafter could be treated as discrete arguments. As it stands it's strip off the command, and then depending on the command the next thing you might need to use as a delimiter might be an '=', it might be a ' ', it might be ' value ' (for a uci example) causing you to have to write per-command tokenising.

OK, I see. But I think this whole idea of tokenising might be overdoing it. Which tokens are allowed and what you have to do with them (i.e. store as text string, read as number, and where) depends on the command anyway. In CECP it is never a problem to subject the remainder of the input line directly to an sscanf that specifies how to interpret the rest of the command (through %d, %f, %c, %s or %[^\n] format elements), and to which engine variables the values have to go. As in the example above.
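A rough sketch of that per-command sscanf style, under assumptions: the struct `engine_state`, its field names and the function `handle_command` are hypothetical, and only three CECP commands (`time`, `level`, `setboard`) are shown. The point is that each format string both splits the arguments and says where they go, so no separate tokeniser is needed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical engine variables; the names are illustrative only. */
struct engine_state {
    int  mps, base_minutes, inc;   /* set by "level" */
    int  time_cs;                  /* set by "time" (centiseconds) */
    char fen[128];                 /* set by "setboard" */
};

/* Returns 1 if the line was a recognised command with well-formed arguments. */
int handle_command(const char *line, struct engine_state *st)
{
    char cmd[32];
    if (sscanf(line, "%31s", cmd) != 1) return 0;            /* empty line */

    if (!strcmp(cmd, "time"))
        return sscanf(line, "time %d", &st->time_cs) == 1;
    if (!strcmp(cmd, "level"))                               /* %d would choke on "2:30" */
        return sscanf(line, "level %d %d %d",
                      &st->mps, &st->base_minutes, &st->inc) == 3;
    if (!strcmp(cmd, "setboard"))                            /* argument may contain spaces */
        return sscanf(line, "setboard %127[^\n]", st->fen) == 1;
    return 0;                                                /* unknown command */
}
```

Here the space-inside-an-argument case (`setboard`) is handled by `%[^\n]`, while the space-as-separator case (`level`) is handled by repeated `%d`, both on the same untokenised line.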
kbhearn wrote: Move parsing! specifically the part of the CECP protocol where if you haven't set the (rejectable) feature usermove=1 you don't know if what you're left with is a move or a command you didn't remember to catch or what.

I agree that the absence of a keyword announcing a move in the v1 protocol was a mistake. But I usually do write my engines such that they also recognize bare moves. Not because I am afraid that their feature usermove=1 would be rejected, but because it is so annoying to have to type "usermove" when you run the engine from the command line for testing purposes.
But moves are quite easy to recognize. Apart from the Chess960 castlings, which you would have to recognize explicitly (but fortunately none of my engines plays Chess960), you only need to look at the second character of the input line. If it is a digit or '@' then the engine considers the input line a move. This works irrespective of board size. Everything else you can reject as an unknown command. (In fact you could recognize the castlings from the first character being an 'O', as CECP never uses capitals in command keywords.)
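A minimal sketch of that test in C. The function name `looks_like_move` is made up for illustration, and the castling check assumes the Chess960 castlings arrive spelled as "O-O" and "O-O-O".

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if the input line should be treated as a move, 0 if it is
 * a command (or empty). Only the second character is inspected, so this
 * does not depend on board size. */
int looks_like_move(const char *line)
{
    if (line[0] == '\0') return 0;                    /* nothing to look at */
    if (strncmp(line, "O-O", 3) == 0) return 1;       /* castling, recognized explicitly */
    return isdigit((unsigned char)line[1]) != 0       /* e2e4, a10b10, ... */
        || line[1] == '@';                            /* drop move such as P@e4 */
}
```

Commands such as `new`, `go` or `force` all have a letter in second position, so they fall through to the command handler.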
kbhearn wrote: Accordingly at the very least your move parser needs to be able to return false to you to say no this isn't a move you idiot.

Well, that is also debatable. The GUI should never send you moves with syntax errors. The "Illegal move" engine->GUI command is intended to allow delegating legality checking to the engine, by switching the GUI's legality checking off. (Usually because you are playing a variant and want to use the pieces in unorthodox ways that the GUI would object to.) But even in this case they should still be concatenations of two board squares, or a valid piece ID followed by '@' and a board square. Even with legality testing off the user has no way to generate moves to off-board squares, or plain gibberish. Not even if he types the move. The GUI has to perform the moves too, to stay in sync with the engine, and if he types something that the GUI would not be able to recognize as a move, it would never be transmitted to the engine even when legality checking is off.
If I decide that the input string is a move based on its 2nd character, I do
Code:
from = input[0] - 'a' + 16*(input[1] - '1');
to = input[2] - 'a' + 16*(input[3] - '1');
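For anyone who does want the decoder to be able to say "not a move", as the quoted post asks, a range check costs very little. The following is only a sketch: the name `parse_move` and its out-parameters are hypothetical, and it assumes an 8x8 board with the same file + 16*rank square numbering as the two lines above.

```c
/* Decode "e2e4"-style input into square numbers (file + 16*rank).
 * Returns 1 on success, 0 if any coordinate falls outside a1..h8.
 * A trailing promotion character, if present, is ignored here. */
int parse_move(const char *input, int *from, int *to)
{
    int c[4], i;
    for (i = 0; i < 4; i++) {
        c[i] = input[i] - ((i & 1) ? '1' : 'a');   /* even index: file, odd index: rank */
        if (c[i] < 0 || c[i] > 7) return 0;        /* also stops at an early '\0' */
    }
    *from = c[0] + 16 * c[1];
    *to   = c[2] + 16 * c[3];
    return 1;
}
```

With input `e2e4` this gives from = 20 and to = 52, the same values the unchecked expressions produce.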
kbhearn wrote: In general handling unexpected syntax gracefully is useful in order to write a log entry or send an error to the gui that 'hey, i don't know what this command was' to aid in debugging (probably i should've been able to handle it so it's good to see what it was).

To me that seems overdoing it. One should not try to design something complex just to act as a debugging aid for something that was very simple in the first place. If you are really suffering from paranoia you could test for every sscanf you use to parse a command if it returns the expected number of parameters (and put a dummy %c at the end of the format to see if there was unexpected stuff appended to it). That way you would catch errors like reading the second parameter of 'level' as %d, which would fail if it ever encountered a non-integer minute specification like '2:30'. But that is really the only case where I would expect anyone to go wrong. (Although I admit I have seen a case where someone wanted to have a move argument to the 'force' command.)
I think a more sensible approach to debugging would be to just have the engine print the value of all parameters that the command was supposed to set.
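Both ideas, the paranoid return-count check with a dummy %c and printing what the command actually set, fit in a few lines. This is a sketch under assumptions: `parse_level` is a hypothetical helper, and sending the debug text to stderr with a leading '#' is just one choice that keeps it out of the GUI's way.

```c
#include <stdio.h>

/* Returns 1 if the line is exactly "level MPS BASE INC" with three
 * integer arguments and nothing after them, 0 otherwise. */
int parse_level(const char *line, int *mps, int *base, int *inc)
{
    char junk;   /* dummy target: only filled if something follows the last number */
    int n = sscanf(line, "level %d %d %d %c", mps, base, inc, &junk);

    if (n != 3) {                                   /* too few values, or trailing stuff */
        fprintf(stderr, "# bad level command (%d fields): %s\n", n, line);
        return 0;
    }
    fprintf(stderr, "# level: mps=%d base=%d inc=%d\n", *mps, *base, *inc);
    return 1;
}
```

A line like `level 40 2:30 0` stops converting at the ':' and yields a count of 2, so the %d mistake shows up the first time it matters.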
