looking for a complex PGN file for testing PGN parser

trojanfoe
Posts: 65
Joined: Sun Jul 31, 2011 11:57 am
Location: Waterlooville, Hampshire, UK

Re: looking for a complex PGN file for testing PGN parser

Post by trojanfoe »

trojanfoe wrote: Yeah that causes my parser problems, however where does the PGN spec state that 'unexpected characters are skipped'?

-A
Anyway, I added a 'relaxed parsing' mode to my PGN scanner/parser, and get this as a result:

[Event "This is an event with a ""]
[Site "This site has \\\\ xxx"]
[Date "1996.08.15"]
[Round "?"]
[White "White"]
[Black "Black"]
[SetUp "1"]
[FEN "1nbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/1NBQKBNR w Kk - 0 1"]
[Result "1/2-1/2"]

1. e4 (1. d4 {comment} 1... d5 $3 2. c4 $50 (2. Nf3 $2)) 1... e5 2. Nf3 {
Comment !} 2... Nc6 3. Nc3 Nf6 4. Bc4 Bc5 5. O-O O-O 1/2-1/2
hgm
Posts: 27701
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: looking for a complex PGN file for testing PGN parser

Post by hgm »

trojanfoe wrote:..., however where does the PGN spec state that 'unexpected characters are skipped'?
What else would you do with them? In any parser it is crucial to have some form of error recovery. If the only form of error handling you implement is aborting with an error message at the first violation of the standard, it would be completely useless in practice, no matter how elaborate your implementation of the standard...
trojanfoe
Posts: 65
Joined: Sun Jul 31, 2011 11:57 am
Location: Waterlooville, Hampshire, UK

Re: looking for a complex PGN file for testing PGN parser

Post by trojanfoe »

hgm wrote:
trojanfoe wrote:..., however where does the PGN spec state that 'unexpected characters are skipped'?
What else would you do with them? In any parser it is crucial to have some form of error recovery. If the only form of error handling you implement is aborting with an error message at the first violation of the standard, it would be completely useless in practice, no matter how elaborate your implementation of the standard...
Not necessarily; aborting with an error is reasonable (parsing can always continue on the next game, which should be easy to find). The PGN spec doesn't mention them so it's reasonable to not expect them.

-A
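The "abort the game, continue on the next one" recovery could look roughly like the sketch below. This is only an illustration in Python: `split_games`, `parse_all` and the `parse_game` callback are hypothetical names, and it assumes that every game starts with an [Event tag at the beginning of a line (a tag-like line inside a multi-line comment would fool it).

```python
import re

# Assumption: a new game begins wherever a line starts with an [Event tag.
GAME_START = re.compile(r'^\[Event\s', re.MULTILINE)

def split_games(pgn_text):
    """Split raw PGN text into per-game chunks at each [Event tag."""
    starts = [m.start() for m in GAME_START.finditer(pgn_text)]
    ends = starts[1:] + [len(pgn_text)]
    return [pgn_text[s:e].strip() for s, e in zip(starts, ends)]

def parse_all(pgn_text, parse_game):
    """Parse every game; a failure aborts only the game it occurs in."""
    games, errors = [], []
    for index, chunk in enumerate(split_games(pgn_text)):
        try:
            games.append(parse_game(chunk))
        except ValueError as exc:
            # Record the problem and resynchronise at the next game.
            errors.append((index, str(exc)))
    return games, errors
```

Here `parse_game` stands for whatever strict single-game parser is in use, as long as it raises `ValueError` on invalid input. The caller gets back both the games that parsed and a list of (game index, message) pairs for those that did not.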
hgm
Posts: 27701
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: looking for a complex PGN file for testing PGN parser

Post by hgm »

Well, the specs also don't say that you have to skip the rest of the game...
trojanfoe
Posts: 65
Joined: Sun Jul 31, 2011 11:57 am
Location: Waterlooville, Hampshire, UK

Re: looking for a complex PGN file for testing PGN parser

Post by trojanfoe »

hgm wrote:Well, the specs also don't say that you have to skip the rest of the game...
That is true. This clearly causes inconsistent behaviour and uncertainty about what is valid PGN and what isn't. Doesn't make for much of a standard.

-A
hgm
Posts: 27701
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: looking for a complex PGN file for testing PGN parser

Post by hgm »

Oh, I am sure it is invalid PGN. The question is more what to do when you encounter invalid PGN, to please the user most.

I tried to build an opening book once from a huge PGN file with Polyglot. It was an absolute disaster. Polyglot always exited on the first error it encountered, and wouldn't tell you about any other error. So you had to repair it, run again, get the next error, repair it... And there were hundreds of errors!
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: looking for a complex PGN file for testing PGN parser

Post by bob »

trojanfoe wrote:
bob wrote:
casaschi wrote:Hello.

I'm looking for a complex PGN file for testing a PGN parser. Something with complex variations and a variety of test cases.

I thought something like that should be already available, but I could not find anything from google.

Anyone with a link to such a test PGN file?

Thanks in advance.
The "enormous.pgn" file on my ftp box is daunting. In one game, there are comments nested 17 levels deep. If that doesn't break your parser, it will probably work for most everything. You will also find all the usual nonsense: wrong O-O characters (they should be the letter O, not the digit zero), moves like 1.e4 with no space, etc...
What do you mean by 'comment nesting'?
Something like this:

1. e4 e5 {1... d4 2. exd5 Qxd5 3. Nc3 Qa5 {3... Qd8 d4}}

Or something similar. Ditto for parens. All the stuff between the first { or ( and the matching closing one is not a part of the actual game...
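A tolerant reader that accepts this kind of nesting only needs a depth counter. The following is a minimal Python sketch, not taken from any existing engine; `strip_nested` is a hypothetical name, and the function deliberately goes beyond the PGN standard by letting braces nest. It returns just the main line.

```python
def strip_nested(movetext):
    """Return the main line, dropping {...} and (...) at any nesting depth."""
    brace_depth = 0  # nesting level of { } (tolerated, though non-standard)
    paren_depth = 0  # nesting level of ( ) variations
    kept = []
    for ch in movetext:
        if ch == '{':
            brace_depth += 1
        elif ch == '}':
            brace_depth = max(brace_depth - 1, 0)  # ignore a stray closer
        elif brace_depth > 0:
            continue  # inside a comment nothing else is special
        elif ch == '(':
            paren_depth += 1
        elif ch == ')':
            paren_depth = max(paren_depth - 1, 0)  # ignore a stray closer
        elif paren_depth == 0:
            kept.append(ch)
    # Collapse the whitespace left behind by the removed spans.
    return ' '.join(''.join(kept).split())
```

Applied to the line above, this leaves only "1. e4 e5".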
hgm
Posts: 27701
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: looking for a complex PGN file for testing PGN parser

Post by hgm »

That is actually faulty PGN. According to the standard, braces do not nest, and the first } closes the comment, even if there were 20 unclosed { before it. When you write

1. e4 { e5 { 2. Nf3 Nf6 } e6 } 2. c4

then e6 is part of the game, and the second } is an unexpected garbage character.
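For comparison, a reading that follows the standard literally can be sketched in a few lines of Python. `strip_comments_per_spec` is a hypothetical helper; it handles only brace comments (no semicolon comments or variations) and simply counts leftover } characters instead of deciding what to do with them.

```python
import re

# Per the standard, a brace comment runs from { to the NEXT }, so a { inside
# the comment is ordinary text and the first } always closes it.
BRACE_COMMENT = re.compile(r'\{[^}]*\}')

def strip_comments_per_spec(movetext):
    """Remove non-nesting brace comments; return (tokens, stray_closers)."""
    without = BRACE_COMMENT.sub(' ', movetext)
    # Pad any leftover } so it becomes a token of its own.
    tokens = without.replace('}', ' } ').split()
    moves = [t for t in tokens if t != '}']
    stray = len(tokens) - len(moves)
    return moves, stray
```

On the example above, the tokens that survive are 1., e4, e6, 2. and c4, with one stray } reported.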
trojanfoe
Posts: 65
Joined: Sun Jul 31, 2011 11:57 am
Location: Waterlooville, Hampshire, UK

Re: looking for a complex PGN file for testing PGN parser

Post by trojanfoe »

hgm wrote:Oh, I am sure it is invalid PGN. The question is more what to do when you encounter invalid PGN, to please the user most.

I tried to build an opening book once from a huge PGN file with Polyglot. It was an absolute disaster. Polyglot always exited on the first error it encountered, and wouldn't tell you about any other error. So you had to repair it, run again, get the next error, repair it... And there were hundreds of errors!
Well I agree with that. I think the best solution is to provide a 'relaxed mode' where the user can elect to allow (and ignore) errors if they want. The correct behaviour is then chosen by the user, and developers don't need to second-guess what the user wants.

-A
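A user-selectable relaxed mode might be structured like the Python sketch below. It is an illustration only, not the actual scanner discussed in this thread: `TOKEN` and `tokenize` are hypothetical names, the token pattern is heavily simplified, and the input is assumed to be movetext from which comments and variations have already been removed.

```python
import re

# Simplified token pattern (an assumption, not the full PGN grammar).
TOKEN = re.compile(r'''
    \d+\.(?:\.\.)?              # move number: "1." or "1..."
  | \$\d+                       # numeric annotation glyph
  | 1-0 | 0-1 | 1/2-1/2 | \*    # game result
  | O-O(?:-O)?[+\#]?            # castling
  | [KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?[+\#]?   # other SAN moves
''', re.VERBOSE)

def tokenize(movetext, relaxed=False):
    """Return (tokens, errors); strict mode raises on the first bad character."""
    tokens, errors = [], []
    pos = 0
    while pos < len(movetext):
        if movetext[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(movetext, pos)
        if match:
            tokens.append(match.group())
            pos = match.end()
        elif relaxed:
            # Remember the offending character, skip it, and carry on.
            errors.append((pos, movetext[pos]))
            pos += 1
        else:
            raise ValueError(f'unexpected {movetext[pos]!r} at offset {pos}')
    return tokens, errors
```

With `relaxed=True` the caller receives every problem in one pass as a list of (offset, character) pairs, which avoids the repair-and-rerun cycle described above. With the default strict setting the first violation still raises `ValueError`.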
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: looking for a complex PGN file for testing PGN parser

Post by bob »

Edmund wrote:just to be more precise.

Comments may not be nested according to the PGN specification. Only variations can. It's the difference between {} and ().
5: Commentary


Comment text may appear in PGN data. There are two kinds of comments. The
first kind is the "rest of line" comment; this comment type starts with a
semicolon character and continues to the end of the line. The second kind
starts with a left brace character and continues to the next right brace
character. Comments cannot appear inside any token.


Brace comments do not nest; a left brace character appearing in a brace comment
loses its special meaning and is ignored. A semicolon appearing inside of a
brace comment loses its special meaning and is ignored. Braces appearing
inside of a semicolon comments lose their special meaning and are ignored.
8.2.5: Movetext RAV (Recursive Annotation Variation)


An RAV (Recursive Annotation Variation) is a sequence of movetext containing
one or more moves enclosed in parentheses. An RAV is used to represent an
alternative variation. The alternate move sequence given by an RAV is one that
may be legally played by first unplaying the move that appears immediately
prior to the RAV. Because the RAV is a recursive construct, it may be nested.
There is theory and there is reality...

They are rarely the "same thing".

You can either rigidly follow the PGN standard, and fail to read a lot of PGN games that are perfectly useful, if not perfectly formed, or you can read what is actually out there. :)
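Since the RAV definition quoted above is recursive, variations map naturally onto a small recursive routine. The Python sketch below is one possible shape, with `parse_rav` as a hypothetical name; it assumes comments have already been stripped and that parentheses have been split off as separate tokens.

```python
def parse_rav(tokens):
    """Turn a flat token list into a nested list; each RAV becomes a sublist."""
    def parse(pos, depth):
        line = []
        while pos < len(tokens):
            tok = tokens[pos]
            if tok == '(':
                # Recurse: the variation is stored as a nested list.
                sub, pos = parse(pos + 1, depth + 1)
                line.append(sub)
            elif tok == ')':
                if depth == 0:
                    raise ValueError('unmatched )')
                return line, pos + 1
            else:
                line.append(tok)
                pos += 1
        if depth > 0:
            raise ValueError('unclosed (')
        return line, pos

    tree, _ = parse(0, 0)
    return tree

def split_parens(movetext):
    """Pad parentheses with spaces so they become tokens of their own."""
    return movetext.replace('(', ' ( ').replace(')', ' ) ').split()
```

For example, `parse_rav(split_parens("1. e4 (1. d4 d5 2. c4 (2. Nf3)) 1... e5"))` yields the main line as a list with the d4 variation as a sublist, which in turn contains the Nf3 sub-variation. A relaxed reader could catch the `ValueError` for unbalanced parentheses instead of giving up on the file.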