Lex / Yacc for PGN

lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Lex / Yacc for PGN

Post by lucasart »

I was trying to write, by hand, a PGN parser, and quickly realized that it's hellishly more complicated than it looks. Already the lexer is heavily context sensitive...

Does anyone have a Lex and a Yacc file, so I don't start from scratch ?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Henk
Posts: 7251
Joined: Mon May 27, 2013 10:31 am

Re: Lex / Yacc for PGN

Post by Henk »

lucasart wrote:I was trying to write, by hand, a PGN parser, and quickly realized that it's hellishly more complicated than it looks. Already the lexer is heavily context sensitive...

Does anyone have a Lex and a Yacc file, so I don't start from scratch ?
Of course I'm not answering your question.
But keep the scanner simple. Context should be handled in the parser not the scanner.
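
A minimal sketch of that split, assuming a hand-rolled scanner in C rather than Lex. Every name here (`pgn_token`, `pgn_next_token`, `pgn_is_result`, `pgn_is_move_number`) is hypothetical and not taken from GNU Chess or XBoard. The scanner classifies a token by its first character only, and the question of whether a symbol is a move number, a SAN move or a result is left to the parser side. Accepting '/' inside symbols is an assumption made so that "1/2-1/2" stays in one piece.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; the names are illustrative only. */
typedef enum {
    TOK_EOF, TOK_LBRACKET, TOK_RBRACKET, TOK_LPAREN, TOK_RPAREN,
    TOK_PERIOD, TOK_STAR, TOK_STRING, TOK_COMMENT, TOK_NAG,
    TOK_SYMBOL, TOK_UNKNOWN
} pgn_tok_kind;

typedef struct {
    pgn_tok_kind kind;
    const char *start; /* first character of the token in the input */
    size_t len;        /* number of characters in the token */
} pgn_token;

/* Symbol continuation characters; '/' is an assumption (see lead-in). */
static int is_symbol_char(unsigned char c)
{
    return isalnum(c) || (c != '\0' && strchr("_+#=:-/", c) != NULL);
}

/* Scan one token at *pp and advance *pp past it. No parser state is used. */
pgn_token pgn_next_token(const char **pp)
{
    const char *p = *pp;
    pgn_token t;

    while (isspace((unsigned char)*p)) p++;
    t.start = p;
    t.kind = TOK_UNKNOWN;

    switch (*p) {
    case '\0': t.kind = TOK_EOF; break;
    case '[':  t.kind = TOK_LBRACKET; p++; break;
    case ']':  t.kind = TOK_RBRACKET; p++; break;
    case '(':  t.kind = TOK_LPAREN; p++; break;
    case ')':  t.kind = TOK_RPAREN; p++; break;
    case '.':  t.kind = TOK_PERIOD; p++; break;
    case '*':  t.kind = TOK_STAR; p++; break;
    case '"':  /* quoted string, backslash escapes the next character */
        t.kind = TOK_STRING;
        p++;
        while (*p && *p != '"') {
            if (*p == '\\' && p[1]) p++;
            p++;
        }
        if (*p == '"') p++;
        break;
    case '{':  /* brace comment, runs to the next '}' */
        t.kind = TOK_COMMENT;
        while (*p && *p != '}') p++;
        if (*p == '}') p++;
        break;
    case ';':  /* rest-of-line comment */
        t.kind = TOK_COMMENT;
        while (*p && *p != '\n') p++;
        break;
    case '$':  /* numeric annotation glyph */
        t.kind = TOK_NAG;
        p++;
        while (isdigit((unsigned char)*p)) p++;
        break;
    default:
        if (isalnum((unsigned char)*p)) {
            t.kind = TOK_SYMBOL;
            while (is_symbol_char((unsigned char)*p)) p++;
        } else {
            p++; /* unrecognised character such as '!' stays TOK_UNKNOWN */
        }
        break;
    }
    t.len = (size_t)(p - t.start);
    *pp = p;
    return t;
}

/* Parser-side checks: only here does a symbol get a meaning. */
int pgn_is_result(const pgn_token *t)
{
    static const char *const results[] = { "1-0", "0-1", "1/2-1/2" };
    size_t i;
    if (t->kind != TOK_SYMBOL) return 0;
    for (i = 0; i < 3; i++)
        if (strlen(results[i]) == t->len &&
            memcmp(results[i], t->start, t->len) == 0)
            return 1;
    return 0;
}

int pgn_is_move_number(const pgn_token *t)
{
    size_t i;
    if (t->kind != TOK_SYMBOL) return 0;
    for (i = 0; i < t->len; i++)
        if (!isdigit((unsigned char)t->start[i])) return 0;
    return t->len > 0;
}
```

Called in a loop on `1. e4 e5 1-0`, this yields symbol, period, symbol, symbol, symbol; the parser then decides that the first symbol is a move number and the last one a result.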
Jim Ablett
Posts: 2391
Joined: Fri Jul 14, 2006 7:56 am
Location: London, England
Full name: Jim Ablett

Re: Lex / Yacc for PGN

Post by Jim Ablett »

lucasart wrote:I was trying to write, by hand, a PGN parser, and quickly realized that it's hellishly more complicated than it looks. Already the lexer is heavily context sensitive...

Does anyone have a Lex and a Yacc file, so I don't start from scratch ?
Gnuchess has the Lex stuff I think.

Jim.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Lex / Yacc for PGN

Post by Sven »

Jim Ablett wrote:
lucasart wrote:I was trying to write, by hand, a PGN parser, and quickly realized that it's hellishly more complicated than it looks. Already the lexer is heavily context sensitive...

Does anyone have a Lex and a Yacc file, so I don't start from scratch ?
Gnuchess has the Lex stuff I think.

Jim.
Yes, in Gnuchess 6.0.3 there is "src/frontend/lexpgn.l". It has some context references like Game[], GameCnt, MakeMove(), ValidateMove(), or ParseEPD(), and it includes a "common.h" file defining these functions/variables but also lots of other basic definitions like the whole Board data structure which are not related to PGN scanning/parsing, so that it can't be simply extracted and used in a different context without some modification (it was probably never meant to be). It should be possible, though, to create a modified version that has a minimal interface to the surrounding environment. E.g. accessing Game[] and GameCnt could be encapsulated through functions so that it would no longer be necessary to include a file like "common.h".
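
One way such a minimal interface could look, as a rough sketch only: the scanner talks to a small table of callbacks instead of touching Game[] and GameCnt directly. The names `pgn_host`, `pgn_report_move`, `demo_game` and `demo_host` are hypothetical and do not exist in GNU Chess; the demo host just stores SAN strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical interface between the scanner and its surroundings. */
typedef struct pgn_host {
    void *ctx;                                    /* opaque host data */
    int (*add_move)(void *ctx, const char *san);  /* instead of writing Game[] */
    int (*move_count)(void *ctx);                 /* instead of reading GameCnt */
} pgn_host;

/* Stand-in for a scanner action: hand one SAN string to the host.
   Returns 0 on success and -1 if the host is missing or rejects the move. */
int pgn_report_move(const pgn_host *host, const char *san)
{
    if (host == NULL || host->add_move == NULL) return -1;
    return host->add_move(host->ctx, san);
}

/* Example host that only records the SAN text of each move. */
#define DEMO_MAX_MOVES 512

typedef struct {
    char san[DEMO_MAX_MOVES][8];
    int count;
} demo_game;

static int demo_add_move(void *ctx, const char *san)
{
    demo_game *g = ctx;
    if (g->count >= DEMO_MAX_MOVES || strlen(san) >= sizeof g->san[0])
        return -1;
    strcpy(g->san[g->count++], san);
    return 0;
}

static int demo_move_count(void *ctx)
{
    return ((demo_game *)ctx)->count;
}

/* Build a host table bound to one demo_game. */
pgn_host demo_host(demo_game *g)
{
    pgn_host h;
    h.ctx = g;
    h.add_move = demo_add_move;
    h.move_count = demo_move_count;
    return h;
}
```

With something like this in place, the scanner source would only need the small header that declares `pgn_host`, and each engine supplies its own callbacks.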

Sven
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Lex / Yacc for PGN

Post by Michel »

If you want to recycle the pgn parser in GNU Chess it is probably best to use GNU Chess 5.50 as a base.

It doesn't have a common.h.

This is the interface as defined in pgn.h:

void PGNSaveToFile (const char *file, game_t *game, const char *resultstr);
void PGNReadFromFile (game_t *game, const char *file);
void PGNIterInit(pgn_iter_t *pgn_iter);
int PGNIterStart(pgn_iter_t *pgn_iter, const char *file);
void PGNIterClose(pgn_iter_t *pgn_iter);
int PGNIterNext (pgn_iter_t *pgn_iter, game_t *game);
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Lex / Yacc for PGN

Post by Sven »

Michel wrote:If you want to recycle the pgn parser in GNU Chess it is probably best to use GNU Chess 5.50 as a base.

It doesn't have a common.h.
Oops, sorry, I was looking up the 6.x version.

Right, your 5.50 "lexpgn.l" has no "common.h"; instead it uses some other header files. At first glance it seems to be a bit more decoupled from the engine, though.

Sven
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Lex / Yacc for PGN

Post by Michel »

BTW, XBoard also has a PGN parser, and it does not depend on lex/yacc.
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Lex / Yacc for PGN

Post by lucasart »

Thanks for the help. I do realize that the GNU parsing code has some dependencies, which in turn bring in more dependencies, etc. I just wanted to have one as an example to follow, and as a kind of tutorial (so that when I write my own Lex file and ask myself, "how do you do this or that in Lex?", I have the answers in it). I'm sufficiently fluent in regular expressions, but still barely a Padawan in Lex & Yacc.

The Xboard PGN parser is surely a good place to start from too. I'll have a look, and if it's easy to extract from the rest, I'll take it. I'm just a bit worried that the Xboard source code, due to its generality (all the chess variants supported), may present a steep learning curve.

The third solution is to continue sweating over my crappy hand-written lexer/parser, but in doing so I'm not really learning anything. At least Lex & Yacc are very powerful tools to learn, and can be reused in another context later.
hgm
Posts: 28453
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Lex / Yacc for PGN

Post by hgm »

Old XBoard versions used to have a lex-generated parser (parser.l). I kicked it out because it was difficult to maintain, so newer XBoard versions contain a hand-written parser.c.

Support for variants does not hugely affect the parser, as XBoard uses SAN in all variants. It just means that the file ID is not necessarily limited to a-h, board ranks can be double digits, and the piece ID can be any letter rather than only PNBRQK. It does check whether the square coordinates and the piece are valid for the current variant. You would have to supply the functions CharToPiece and PieceToChar to do this checking, and take care of the piece encoding used by the software you want to interface it with.
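
To illustrate what those variant checks amount to, here is a rough sketch and not XBoard's code. `board_dims`, `parse_square`, `piece_letter_ok_fn` and `parse_piece_letter` are hypothetical names, the callback merely stands in for whatever a CharToPiece-style lookup would do (its real signature is not shown in this thread), and ranks are assumed to be numbered from 1.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical board size of the current variant. */
typedef struct {
    int files;
    int ranks;
} board_dims;

/* Parse a square such as "e4" or "j10" at s.
   On success store the zero-based file and rank and return the number of
   characters consumed; return 0 if s is not a valid square for this board.
   Assumes an ASCII-like character set where 'a'..'z' are contiguous. */
int parse_square(const char *s, board_dims d, int *file, int *rank)
{
    int f, r = 0, n = 1;

    if (!islower((unsigned char)s[0])) return 0;
    f = s[0] - 'a';
    if (f >= d.files) return 0;            /* file beyond the board edge */
    if (!isdigit((unsigned char)s[n])) return 0;
    while (isdigit((unsigned char)s[n])) { /* ranks may be double digits */
        r = r * 10 + (s[n] - '0');
        n++;
        if (r > d.ranks) return 0;
    }
    if (r < 1) return 0;
    *file = f;
    *rank = r - 1;
    return n;
}

/* Callback deciding whether a letter names a piece in the current variant. */
typedef int (*piece_letter_ok_fn)(char letter, void *variant_ctx);

/* Return 1 if s starts with an upper-case letter the variant accepts. */
int parse_piece_letter(const char *s, piece_letter_ok_fn ok, void *variant_ctx)
{
    if (!isupper((unsigned char)s[0])) return 0;
    return ok(s[0], variant_ctx) ? 1 : 0;
}

/* Example callback for orthodox chess only. */
int chess_letter_ok(char letter, void *variant_ctx)
{
    (void)variant_ctx;
    return letter != '\0' && strchr("PNBRQK", letter) != NULL;
}
```

On a 10x10 board `parse_square("j10", ...)` succeeds, while on an 8x8 board the same text is rejected; a variant with extra pieces would pass a more permissive callback than `chess_letter_ok`.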

The trickier thing is that it depends on XBoard's move-generation code, through the routines TestLegality and Disambiguate. Both of these generate all legal moves for a given position. TestLegality checks whether a fully specified input move is amongst those. Disambiguate does the same, but allows 'wild cards' for those items that were not specified in the move. It counts the legal moves that match the items that were specified, and returns the match as a fully specified move (and warns if there was more than one match). The XBoard versions of these routines (in the file moves.c) depend very much on XBoard's internal representation of Chess positions, which is likely very different from what you would want.
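
The matching idea can be sketched independently of any board representation, assuming your own engine can already fill an array with the legal moves of the position. This is not XBoard's implementation: `move_t`, `ANY` and `disambiguate` are hypothetical names, and promotion handling is reduced to a single field where 0 means no promotion.

```c
#include <stddef.h>

#define ANY (-1) /* wild card for an item the SAN text did not specify */

/* Hypothetical move record; files and ranks are zero-based. */
typedef struct {
    int piece;                 /* piece letter, e.g. 'N' */
    int from_file, from_rank;
    int to_file, to_rank;
    int promo;                 /* promotion piece letter, 0 for none */
} move_t;

static int field_matches(int wanted, int actual)
{
    return wanted == ANY || wanted == actual;
}

/* Count the legal moves compatible with 'partial'. The first match, if any,
   is copied to *out as a fully specified move. A return value of 0 means the
   move is illegal, 1 means it is unique, and more than 1 means ambiguous. */
int disambiguate(const move_t *legal, size_t n, const move_t *partial,
                 move_t *out)
{
    int matches = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        const move_t *m = &legal[i];
        if (field_matches(partial->piece, m->piece) &&
            field_matches(partial->from_file, m->from_file) &&
            field_matches(partial->from_rank, m->from_rank) &&
            field_matches(partial->to_file, m->to_file) &&
            field_matches(partial->to_rank, m->to_rank) &&
            field_matches(partial->promo, m->promo)) {
            if (matches == 0) *out = *m;
            matches++;
        }
    }
    return matches;
}
```

A legality test in the TestLegality sense then falls out as the special case where no field is `ANY` and the result is nonzero. With knights on g1 and d2, "Nf3" matches two moves, and "Ngf3" narrows it to one.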