trojanfoe wrote:..., however where does the PGN spec state that 'unexpected characters are skipped'?
What else would you do with them? In any parser it is crucial to have some form of error recovery. If the only form of error handling you implement is aborting with an error message at the first violation of the standard, it would be completely useless in practice, no matter how elaborate your implementation of the standard...
Not necessarily; aborting with an error is reasonable (parsing can always continue on the next game, which should be easy to find). The PGN spec doesn't mention them so it's reasonable to not expect them.
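A minimal sketch of that "abort this game, continue on the next one" idea, in Python. It assumes every game starts with an [Event tag at the beginning of a line (true for export-format PGN, not guaranteed for files in the wild). The names split_games and parse_all are made up for illustration, and parse_game stands for whatever per-game parser you already have, assumed to raise ValueError on bad input.

```python
import re

# Assumption: each game begins with an [Event "..."] tag at the start of a line.
GAME_START = re.compile(r'^\[Event\s', re.MULTILINE)

def split_games(text):
    """Return the chunks of text that each begin at an [Event tag."""
    starts = [m.start() for m in GAME_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]

def parse_all(text, parse_game):
    """Parse every game; on failure, record the error and resume at the next game."""
    games, errors = [], []
    for index, chunk in enumerate(split_games(text), start=1):
        try:
            games.append(parse_game(chunk))
        except ValueError as exc:
            # One bad game costs only that game; the error list is reported at the end.
            errors.append((index, str(exc)))
    return games, errors
```

Because the errors are collected instead of ending the run, the user sees every broken game in one pass and does not have to repair, rerun, repair, rerun.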
Oh, I am sure it is invalid PGN. The question is more what to do when you encounter invalid PGN, to please the user most.
I tried to build an opening book once from a huge PGN file with Polyglot. It was an absolute disaster. Polyglot always exited on the first error it encountered, and wouldn't tell you about any other error. So you had to repair it, run again, get the next error, repair it... And there were hundreds of errors!
I'm looking for a complex PGN file for testing a PGN parser. Something with complex variations and a variety of test cases.
I thought something like that would already be available, but I could not find anything through Google.
Anyone with a link to such a test PGN file?
Thanks in advance.
The "enormous.pgn" file on my FTP box is daunting. In one game, there are comments nested 17 levels deep. If that doesn't break your parser, it will probably work for almost everything. You will also find all the usual nonsense: castling written with the wrong characters (0-0 with zeros where it should be O-O with the letter O), moves like 1.e4 with no space, etc.
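For the zero-castling and missing-space cases, a small pre-pass in Python can repair the text before tokenizing. This is only a heuristic sketch: normalize_movetext is a made-up name, and it rewrites the whole string, so it will also touch matching text inside comments.

```python
import re

def normalize_movetext(movetext):
    """Repair two common deviations before tokenizing (heuristic sketch)."""
    # Castling typed with the digit zero: 0-0-0 -> O-O-O, 0-0 -> O-O.
    # The long form is handled first. The look-arounds refuse a neighbouring
    # digit or hyphen, and results such as 1-0 or 0-1 never match anyway.
    movetext = re.sub(r'(?<![\d-])0-0-0(?![\d-])', 'O-O-O', movetext)
    movetext = re.sub(r'(?<![\d-])0-0(?![\d-])', 'O-O', movetext)
    # Move number glued to the move: "1.e4" -> "1. e4", "3...Nf6" -> "3... Nf6".
    movetext = re.sub(r'(\d+\.+)(?=[A-Za-z])', r'\1 ', movetext)
    return movetext

print(normalize_movetext("1.e4 e5 2.Nf3 Nc6 3.Bc4 Bc5 4.0-0 Nf6 1-0"))
```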
That is actually faulty PGN. According to the standard, braces do not nest, and the first } closes the comment, even if there were 20 unclosed { before it. When you write
1. e4 { e5 { 2. Nf3 Nf6 } e6 } 2. c4
then e6 is part of the game, } is an unexpected garbage character.
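To see this on the example line, here is a rough Python sketch of a scanner that follows the "first } closes the comment" rule. The function name scan and the three labels are invented for illustration. Treating a stray } as a single garbage item and letting an unterminated comment run to the end of the text are recovery choices of this sketch, not something the standard prescribes.

```python
def scan(movetext):
    """Split movetext into ('comment', text), ('token', text) and ('garbage', ch)."""
    items, i, n = [], 0, len(movetext)
    while i < n:
        ch = movetext[i]
        if ch.isspace():
            i += 1
        elif ch == '{':
            # The comment runs to the NEXT '}', however many '{' come in between.
            end = movetext.find('}', i + 1)
            if end == -1:          # unterminated comment: take the rest of the text
                end = n
            items.append(('comment', movetext[i + 1:end].strip()))
            i = end + 1
        elif ch == '}':
            # A '}' outside any comment has no meaning; flag it and move on.
            items.append(('garbage', ch))
            i += 1
        else:
            j = i
            while j < n and not movetext[j].isspace() and movetext[j] not in '{}':
                j += 1
            items.append(('token', movetext[i:j]))
            i = j
    return items

for item in scan("1. e4 { e5 { 2. Nf3 Nf6 } e6 } 2. c4"):
    print(item)
```

The comment comes out as "e5 { 2. Nf3 Nf6", e6 is an ordinary token, and the second } is reported as garbage.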
hgm wrote:Oh, I am sure it is invalid PGN. The question is more what to do when you encounter invalid PGN, to please the user most.
I tried to build an opening book once from a huge PGN file with Polyglot. It was an absolute disaster. Polyglot always exited on the first error it encountered, and wouldn't tell you about any other error. So you had to repair it, run again, get the next error, repair it... And there were hundreds of errors!
Well I agree with that. I think the best solution is to provide a 'relaxed mode' where the user can elect to allow (and ignore) errors if they want. The correct behaviour is then chosen by the user, and developers don't need to second-guess what the user wants.
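One possible shape for such a switch, sketched in Python. Everything here is hypothetical: the PgnError class, the read_moves function and the deliberately loose TOKEN pattern (which is not a full SAN grammar) are made up for the example, and the input is assumed to be movetext with comments and variations already removed.

```python
import re

# Hypothetical, deliberately loose pattern for whitespace-separated movetext tokens.
TOKEN = re.compile(r'''
    \d+\.+                                                # move number: "1." or "1..."
  | O-O(?:-O)?[+#]?                                       # castling
  | [KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?[+#]?     # piece or pawn move
  | 1-0 | 0-1 | 1/2-1/2 | \*                              # game termination
''', re.VERBOSE)

class PgnError(ValueError):
    """Raised in strict mode at the first token that is not recognised."""

def read_moves(movetext, strict=True):
    """Return (tokens, warnings); strict mode raises instead of skipping."""
    moves, warnings = [], []
    for position, word in enumerate(movetext.split(), start=1):
        if TOKEN.fullmatch(word):
            moves.append(word)
        elif strict:
            raise PgnError(f"token {position}: unexpected {word!r}")
        else:
            # Relaxed mode: skip the junk but keep a note so nothing is hidden.
            warnings.append(f"token {position}: skipped {word!r}")
    return moves, warnings
```

With strict=False, read_moves("1. e4 e5 } 2. Nf3 *") keeps the six good tokens and returns one warning about the stray brace; with the default strict=True the same call raises PgnError.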
Comments may not be nested according to the PGN specification. Only variations can. It's the difference between {} and ().
5: Commentary
Comment text may appear in PGN data. There are two kinds of comments. The
first kind is the "rest of line" comment; this comment type starts with a
semicolon character and continues to the end of the line. The second kind
starts with a left brace character and continues to the next right brace
character. Comments cannot appear inside any token.
Brace comments do not nest; a left brace character appearing in a brace comment
loses its special meaning and is ignored. A semicolon appearing inside of a
brace comment loses its special meaning and is ignored. Braces appearing
inside of a semicolon comments lose their special meaning and are ignored.
An RAV (Recursive Annotation Variation) is a sequence of movetext containing
one or more moves enclosed in parentheses. An RAV is used to represent an
alternative variation. The alternate move sequence given by an RAV is one that
may be legally played by first unplaying the move that appears immediately
prior to the RAV. Because the RAV is a recursive construct, it may be nested.
There is theory and there is reality...
They are rarely the "same thing".
You can either rigidly follow the PGN standard, and fail to read a lot of PGN games that are perfectly useful, if not perfectly formed, or you can read what is actually out there.