Lex / Yacc for PGN

lucasart · Post by **lucasart** » Sat Jun 22, 2013 11:39 am

I was trying to write a PGN parser by hand, and quickly realized that it's hellishly more complicated than it looks. Even the lexer is heavily context-sensitive...

Does anyone have a Lex and a Yacc file, so I don't have to start from scratch?

Henk · Post by **Henk** » Sat Jun 22, 2013 11:48 am

lucasart wrote:I was trying to write a PGN parser by hand, and quickly realized that it's hellishly more complicated than it looks. Even the lexer is heavily context-sensitive...

Does anyone have a Lex and a Yacc file, so I don't have to start from scratch?

Of course this doesn't answer your question.
But keep the scanner simple. Context should be handled in the parser, not in the scanner.
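Henk's advice can be illustrated with a small hand-written scanner, sketched here in C. It is only a sketch, loosely following the token classes of the PGN standard (symbols, strings, brace and semicolon comments, NAGs, punctuation). The names `TokenKind`, `Token`, `put` and `next_token` are hypothetical, and accepting '/' inside symbols is our own choice so that `1/2-1/2` comes out as a single token. The scanner never tries to decide whether `e4` is a move or `1-0` is a result; both are returned as plain symbols.

```c
#include <ctype.h>
#include <string.h>

/* Token classes loosely following the PGN standard (names are hypothetical). */
typedef enum {
    TOK_EOF, TOK_LBRACKET, TOK_RBRACKET, TOK_LPAREN, TOK_RPAREN,
    TOK_PERIOD, TOK_ASTERISK, TOK_NAG, TOK_STRING, TOK_COMMENT,
    TOK_SYMBOL, TOK_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    char text[256];            /* token text, silently truncated if longer */
} Token;

/* PGN symbol continuation characters, plus '/' (our addition for 1/2-1/2). */
static int is_symbol_cont(int c) {
    return c != '\0' && (isalnum(c) || strchr("_+#=:-/", c) != NULL);
}

/* Appends c to the token text if there is room. */
static void put(Token *t, size_t *n, char c) {
    if (*n < sizeof t->text - 1)
        t->text[(*n)++] = c;
}

/* Returns the next token at *p and advances *p past it. Moves (e4), move
 * numbers (1) and results such as 1-0 all come back as TOK_SYMBOL: deciding
 * what they mean is the parser's job, not the scanner's. */
static Token next_token(const char **p) {
    Token t = { TOK_EOF, "" };
    const char *s = *p;
    size_t n = 0;

    for (;;) {                                   /* skip blanks and ';' comments */
        while (isspace((unsigned char)*s)) s++;
        if (*s != ';') break;
        while (*s && *s != '\n') s++;
    }

    switch (*s) {
    case '\0': break;                            /* end of input: TOK_EOF */
    case '[': t.kind = TOK_LBRACKET; s++; break;
    case ']': t.kind = TOK_RBRACKET; s++; break;
    case '(': t.kind = TOK_LPAREN;   s++; break;
    case ')': t.kind = TOK_RPAREN;   s++; break;
    case '.': t.kind = TOK_PERIOD;   s++; break;
    case '*': t.kind = TOK_ASTERISK; s++; break;
    case '{':                                    /* brace comments do not nest */
        for (s++; *s && *s != '}'; s++) put(&t, &n, *s);
        t.kind = *s ? TOK_COMMENT : TOK_ERROR;
        if (*s) s++;
        break;
    case '"':                                    /* string with \" and \\ escapes */
        for (s++; *s && *s != '"'; s++) {
            if (*s == '\\' && s[1]) s++;
            put(&t, &n, *s);
        }
        t.kind = *s ? TOK_STRING : TOK_ERROR;
        if (*s) s++;
        break;
    case '$':                                    /* numeric annotation glyph */
        for (s++; isdigit((unsigned char)*s); s++) put(&t, &n, *s);
        t.kind = n ? TOK_NAG : TOK_ERROR;
        break;
    default:
        if (isalnum((unsigned char)*s)) {
            for (; is_symbol_cont((unsigned char)*s); s++) put(&t, &n, *s);
            t.kind = TOK_SYMBOL;
        } else {
            put(&t, &n, *s++);                   /* unexpected character */
            t.kind = TOK_ERROR;
        }
    }
    t.text[n] = '\0';
    *p = s;
    return t;
}
```

Fed `1. e4 {best by test} e5 1-0`, this yields SYMBOL "1", PERIOD, SYMBOL "e4", COMMENT "best by test", SYMBOL "e5", SYMBOL "1-0" and then EOF. A Yacc grammar (or a hand-written recursive-descent parser) would then decide which symbols are move numbers, moves and results, which keeps all the context in one place.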

Jim Ablett · Post by **Jim Ablett** » Sat Jun 22, 2013 12:00 pm

lucasart wrote:I was trying to write a PGN parser by hand, and quickly realized that it's hellishly more complicated than it looks. Even the lexer is heavily context-sensitive...

Does anyone have a Lex and a Yacc file, so I don't have to start from scratch?

Gnuchess has the Lex stuff I think.

Jim.

Sven · Post by **Sven** » Sat Jun 22, 2013 1:44 pm

Jim Ablett wrote:
lucasart wrote:I was trying to write a PGN parser by hand, and quickly realized that it's hellishly more complicated than it looks. Even the lexer is heavily context-sensitive...

Does anyone have a Lex and a Yacc file, so I don't have to start from scratch?
Gnuchess has the Lex stuff I think.

Jim.

Yes, in Gnuchess 6.0.3 there is "src/frontend/lexpgn.l". It refers to some engine context, such as Game[], GameCnt, MakeMove(), ValidateMove() and ParseEPD(), and it includes a "common.h" file that defines these functions and variables but also lots of other basic definitions unrelated to PGN scanning or parsing (the whole Board data structure, for instance). So it can't simply be extracted and used in a different context without some modification; it was probably never meant to be. It should be possible, though, to create a modified version with a minimal interface to the surrounding environment. For example, access to Game[] and GameCnt could be encapsulated in functions, so that it would no longer be necessary to include a file like "common.h".
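The kind of encapsulation Sven describes could look roughly like the following C sketch: the scanner talks to its host only through a small table of function pointers, so it no longer needs Game[], GameCnt or the Board definitions. Everything here is hypothetical and not taken from GNU Chess: `PgnHost`, `pgn_emit_moves` and the toy host are illustrative names, and a real host would validate each move on a board (the role ValidateMove()/MakeMove() play in GNU Chess) instead of just storing its text.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal interface between a PGN scanner and its host program.
 * The scanner only calls through these pointers, so it never needs the
 * engine's globals or a header like "common.h". */
typedef struct {
    void *ctx;                                          /* host-owned state */
    int  (*play_move)(void *ctx, const char *san);      /* returns 0 if rejected */
    void (*set_tag)(void *ctx, const char *name, const char *value);
} PgnHost;

/* What a lexer action might do with move tokens: forward each one to the
 * host and count how many were rejected. */
static int pgn_emit_moves(const PgnHost *h, const char *const sans[], int n) {
    int rejected = 0;
    for (int i = 0; i < n; i++)
        if (!h->play_move(h->ctx, sans[i]))
            rejected++;
    return rejected;
}

/* A toy host: it records the move text instead of validating it on a board. */
typedef struct {
    char moves[64][16];
    int  count;
    char event[64];
} ToyGame;

static int toy_play_move(void *ctx, const char *san) {
    ToyGame *g = ctx;
    if (g->count >= 64 || strlen(san) >= sizeof g->moves[0])
        return 0;                                       /* no room: reject */
    strcpy(g->moves[g->count++], san);
    return 1;
}

static void toy_set_tag(void *ctx, const char *name, const char *value) {
    ToyGame *g = ctx;
    if (strcmp(name, "Event") == 0)                     /* keep only [Event] */
        snprintf(g->event, sizeof g->event, "%s", value);
}
```

With this split, swapping GNU Chess's board code for another engine's would only mean supplying a different `PgnHost`; the scanner and grammar files stay untouched.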

Sven

Michel · Post by **Michel** » Sat Jun 22, 2013 1:47 pm

If you want to recycle the pgn parser in GNU Chess it is probably best to use GNU Chess 5.50 as a base.

It doesn't have a common.h.

This is the interface, as defined in pgn.h:

```c
/* Saving and loading a game */
void PGNSaveToFile (const char *file, game_t *game, const char *resultstr);
void PGNReadFromFile (game_t *game, const char *file);

/* Iterating over the games in a PGN file */
void PGNIterInit(pgn_iter_t *pgn_iter);
int PGNIterStart(pgn_iter_t *pgn_iter, const char *file);
void PGNIterClose(pgn_iter_t *pgn_iter);
int PGNIterNext (pgn_iter_t *pgn_iter, game_t *game);
```

Sven · Post by **Sven** » Sat Jun 22, 2013 1:55 pm

Michel wrote:If you want to recycle the pgn parser in GNU Chess it is probably best to use GNU Chess 5.50 as a base.

It doesn't have a common.h.

Oops, sorry, I was looking up the 6.x version.

Right, your 5.50 "lexpgn.l" has no "common.h"; it uses some other header files instead. At first glance it does seem to be a bit more decoupled from the engine, though.

Sven

Michel · Post by **Michel** » Sat Jun 22, 2013 2:08 pm

BTW, xboard also has a PGN parser, and it does not depend on lex/yacc.

lucasart · Post by **lucasart** » Sat Jun 22, 2013 2:29 pm

Thanks for the help. I do realize that the GNU parsing code has some dependencies, which in turn bring in more dependencies, etc. I just wanted to have one as an example to follow, and as a kind of tutorial (so that when I write my own Lex file and ask myself "how do you do this or that in Lex?", I have the answers in it). I'm sufficiently fluent in regular expressions, but still barely a Padawan in Lex & Yacc.

The Xboard PGN parser is surely a good place to start from too. I'll have a look, and if it's easy to extract from the rest, I'll take it. I'm just a bit worried that the Xboard source code, due to its generality (all the chess variants supported), may present a steep learning curve.

The third option is to keep sweating over my crappy hand-written lexer/parser, but in doing so I'm not really learning anything. At least Lex & Yacc are very powerful tools to learn, and can be reused in another context later.

hgm · Post by **hgm** » Sat Jun 22, 2013 3:25 pm

Old XBoard versions used to have a lex-generated parser (parser.l). I kicked it out because it was difficult to maintain, so newer XBoard versions contain a hand-written parser.c.

Support for variants does not hugely affect the parser, as XBoard uses SAN in all variants. It just means that the file ID is not necessarily limited to a-h, board ranks can be double digits, and the piece ID can be any letter rather than only PNBRQK. The parser does check whether the square coordinates and the piece are valid for the current variant. You would have to supply the functions CharToPiece and PieceToChar to do this checking, and make them handle the piece encoding used by the software you want to interface it with.
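In C, the variant-related checks hgm mentions boil down to reading a square as a letter followed by one or two digits, then validating it against the current board size, and validating piece letters against the current variant's pieces. A minimal sketch, assuming 0-based coordinates and a variant piece list passed as a string; `parse_square` and `char_to_piece` are hypothetical names (the latter only plays the role of XBoard's CharToPiece and does not reproduce its real signature or piece encoding).

```c
#include <ctype.h>
#include <string.h>

/* Parses a square such as "e4" or "j10" on a board with `files` files and
 * `ranks` ranks. Stores 0-based file/rank and returns the number of
 * characters consumed, or 0 if the text is not a square on this board. */
static int parse_square(const char *s, int files, int ranks, int *file, int *rank) {
    if (!islower((unsigned char)s[0]) || !isdigit((unsigned char)s[1]) || s[1] == '0')
        return 0;                          /* need a letter, then a rank 1.. */
    int f = s[0] - 'a';
    int r = 0, len = 1;
    while (len < 3 && isdigit((unsigned char)s[len]))   /* at most two digits */
        r = r * 10 + (s[len++] - '0');
    if (f >= files || r > ranks)
        return 0;                          /* off the board for this variant */
    *file = f;
    *rank = r - 1;
    return len;
}

/* Hypothetical stand-in for CharToPiece: returns the index of a piece letter
 * in the variant's piece list, or -1 if the variant has no such piece. */
static int char_to_piece(char c, const char *variant_pieces) {
    const char *p = strchr(variant_pieces, c);
    return (c != '\0' && p != NULL) ? (int)(p - variant_pieces) : -1;
}
```

For example, `parse_square("j10", 10, 10, &f, &r)` accepts the square on a 10x10 board, while the same text is rejected for an 8x8 board.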

A trickier thing is that it depends on XBoard's move-generation code, through the routines TestLegality and Disambiguate. Both of these generate all legal moves for a given position. TestLegality checks whether a fully specified input move is among them. Disambiguate does the same, but allows 'wild cards' for those items that were not specified in the move: it counts the legal moves that match the items that were specified, and returns the result as a fully specified move (warning if there was more than one match). The XBoard versions of these routines (in the file moves.c) depend heavily on XBoard's internal representation of chess positions, which is likely very different from what you would want.
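The wild-card matching hgm describes for Disambiguate can be written independently of any board representation, provided the caller already has a list of legal moves from its own move generator. The C sketch below is not XBoard's code: the `Move` struct, the `ANY` marker and `disambiguate()` are hypothetical, and a real implementation would also need fields such as capture or castling flags.

```c
#include <stddef.h>

/* A fully specified move in a hypothetical encoding (0-based coordinates,
 * piece as a letter, promo = 0 for "no promotion"). */
typedef struct { int piece, from_file, from_rank, to_file, to_rank, promo; } Move;

#define ANY (-1)   /* wild card: this field was not given in the SAN text */

static int field_matches(int want, int have) {
    return want == ANY || want == have;
}

/* Counts the legal moves compatible with the partially specified `want` and
 * copies the first match into *out. A result of 0 means illegal, 1 means
 * uniquely resolved, and more than 1 means the SAN move was ambiguous. */
static int disambiguate(const Move *legal, size_t n, Move want, Move *out) {
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        const Move *m = &legal[i];
        if (field_matches(want.piece, m->piece) &&
            field_matches(want.from_file, m->from_file) &&
            field_matches(want.from_rank, m->from_rank) &&
            field_matches(want.to_file, m->to_file) &&
            field_matches(want.to_rank, m->to_rank) &&
            field_matches(want.promo, m->promo)) {
            if (count++ == 0)
                *out = *m;
        }
    }
    return count;
}
```

In this scheme a TestLegality-style check is the same call with no `ANY` fields, expecting a count of exactly 1. With knights on b1 and f3 that can both reach d2, `Nd2` yields 2 (ambiguous) while `Nbd2` yields 1.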
