I am planning some modifications to the .pgn handling in my GUI (Tarrasch Chess GUI, www.triplehappy.com). I am hoping to get some useful feedback here before making commitments that are hard to back out of.
The changes I am going to make concern:
1) Semicolon comments
2) Line endings
At the moment I support both types of comments (brace comments and semicolon comments) for both reading and writing. I propose changing this so that Tarrasch never writes out semicolon comments. The reason is that some tools choke on semicolon comments (for example Chessbase 9; I don't have more recent versions to test). I find this very surprising, but I am a pragmatist, and I don't want my users to create .pgn files that are technically correct but then cause them problems. My main motivation for using semicolon comments was to allow closing brace characters in comments. I was trying to compensate for the fact that comments don't nest in .pgn. But I can't provide a complete solution with semicolon comments anyway, so I will find some other incomplete solution, e.g. "|>" translates to "}", at least within Tarrasch itself. Or some other escaping solution.
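As a minimal sketch of the escaping idea above (not Tarrasch's actual code): the function name `write_brace_comment` is hypothetical, and the "|>" stand-in is just the example given in the post. It always emits a brace comment and replaces each closing brace in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy comment text into dst as a PGN brace comment,
   replacing every '}' with the two-character stand-in "|>".
   Returns the number of chars written (excluding the terminator),
   or 0 if dst is too small. */
size_t write_brace_comment(const char *text, char *dst, size_t cap)
{
    size_t n = 0;

    assert(text != NULL && dst != NULL);
    if (cap < 3)
        return 0;                       /* need room for "{}" plus NUL */
    dst[n++] = '{';
    for (; *text; ++text) {
        size_t need = (*text == '}') ? 2 : 1;
        if (n + need + 2 > cap)
            return 0;                   /* keep room for '}' and NUL */
        if (*text == '}') {
            dst[n++] = '|';
            dst[n++] = '>';
        } else {
            dst[n++] = *text;
        }
    }
    dst[n++] = '}';
    dst[n] = '\0';
    return n;
}
```

With this sketch the text `a } b` is written as `{a |> b}`. It is incomplete in the way the post anticipates: a comment that already contains a literal "|>" cannot be told apart from an escaped brace when the file is read back.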
As for line endings, Tarrasch is a Windows only tool (at the moment). When Tarrasch writes .pgn files it uses DOS/Windows line endings (CR,LF). When Tarrasch reads .pgn files it is tolerant and accepts either CR,LF or Unix style LF endings. I propose changing Tarrasch so that it writes Unix style LF endings. It will continue to support either type of line ending when reading. I have been doing some Tarrasch development in a Unix environment recently, and the new approach will make it easier to have a cross-platform version of Tarrasch. Yesterday a user reported to me that both Scid and Scid vs PC reject .pgn files with Windows line endings even on Windows. I find that very surprising, astonishing really. Unix style text files are second class citizens on Windows, you cannot edit them with the most basic text processing tools on that platform (Notepad for example). I can understand writing Unix line endings, but I can't understand rejecting Windows line endings on Windows. Anyway, the whole incident reinforced my feeling that the pragmatic thing to do is to write Unix files on all platforms (and continue to accept either type of file when reading, on all platforms).
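For illustration only, writing LF on every platform in C mostly comes down to opening the output in binary mode, so the Windows runtime does not expand '\n' to CR,LF. The helper name `write_line_lf` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write one line of PGN text followed by a bare LF.
   The stream is assumed to be opened in binary mode (e.g. "wb"); in text
   mode the Windows C runtime would turn '\n' into CR,LF.
   Returns 0 on success, -1 on a write error. */
int write_line_lf(FILE *out, const char *line)
{
    assert(out != NULL && line != NULL);
    if (fputs(line, out) == EOF)
        return -1;
    if (fputc('\n', out) == EOF)
        return -1;
    return 0;
}
```

A caller would open the file with `fopen(path, "wb")` and call `write_line_lf` once per line; on Unix-like systems the "b" is accepted and has no effect.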
Constructive comments and feedback most welcome. Thanks in advance!
(I am particularly interested in knowing about important tools that do NOT choke on semicolon comments, just for my interest).
Advice on .pgn format issues for my chess GUI
-
Bill Forster
- Posts: 76
- Joined: Mon Sep 21, 2015 7:47 am
- Location: New Zealand
-
Ferdy
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Advice on .pgn format issues for my chess GUI
CB12 accepts a .pgn with a ;Ruy Lopez comment. But when you copy the game from CB to a text file, it converts the ;Ruy Lopez comment to {;Ruy Lopez}.
The versatile pgn-extract tool does not recognize ; comments, and neither do Hiarcs Chess Explorer (HCE) or the Aquarium chess GUI.
-
Evert
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: Advice on .pgn format issues for my chess GUI
Bill Forster wrote: Yesterday a user reported to me that both Scid and Scid vs PC reject .pgn files with Windows line endings even on Windows. I find that very surprising, astonishing really. Unix style text files are second class citizens on Windows, you cannot edit them with the most basic text processing tools on that platform (Notepad for example). I can understand writing Unix line endings, but I can't understand rejecting Windows line endings on Windows. Anyway, the whole incident reinforced my feeling that the pragmatic thing to do is to write Unix files on all platforms (and continue to accept either type of file when reading, on all platforms).
Scid is originally a Linux program, so not supporting Windows-style text files is probably an oversight when porting it.
Anyway, the correct way to handle text files is:
1. Support both CR and CR/LF line endings when reading a file (I suppose you should support LF endings too for completeness sake, but I don't think any current platform still uses those).
2. Write new files in whatever the native format is for the current platform (on Windows and in C, using "\n" to terminate a format string converts to "\r\n" when writing to a file opened in text mode; Linux, UNIX and OS X make no distinction between text and binary files).
3. If you're writing to an existing file, either convert the whole thing to the native format, or preserve the format it was in originally. Otherwise things are going to be messed up. Badly.
Having said all that, I'm actually all for everyone adopting CR line endings and getting rid of the whole messy distinction between "text" and "binary" files, so feel free to ignore point 2 above if you want.
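Point 3 above needs a way to tell which convention an existing file already uses. A minimal sketch, assuming the only two conventions that matter are Unix LF and Windows CR,LF; the names `eol_style` and `detect_eol` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EOL_NONE, EOL_LF, EOL_CRLF } eol_style;

/* Hypothetical helper: report the convention used by the first line break
   found in buf. The data must have been read in binary mode, otherwise
   the C runtime may already have removed the CRs. */
eol_style detect_eol(const char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] == '\n')
            return (i > 0 && buf[i - 1] == '\r') ? EOL_CRLF : EOL_LF;
    }
    return EOL_NONE;    /* no line break seen; caller picks a default */
}
```

When appending to an existing file, a writer could call this on the first block of the file and then emit the matching terminator. Files with mixed endings are not handled here; only the first break is inspected.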
-
hgm
- Posts: 27703
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Advice on .pgn format issues for my chess GUI
One correction: Linux uses LF (ascii 012 = 10), not CR (ascii 015) as line terminator.
-
jdart
- Posts: 4361
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Advice on .pgn format issues for my chess GUI
I think you should write files in the platform line ending mode, for the reason you stated: not doing so makes them unviewable in Notepad and similar Windows tools.
--Jon
-
hgm
- Posts: 27703
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Advice on .pgn format issues for my chess GUI
Not being able to process PGN files in Windows text format is obviously a bad SCID bug. I think it is always unwise to cater to bugs of other software. Problems should be fixed where they lie. If a work-around is required, it would be best to provide it in the form of a 'dos2linux' utility that removes the CR from a given file. People can then apply that to files they want to feed to SCID, and it has the advantage that it works for any PGN file, no matter what source it is from. PGN files created by other Windows Chess software would also be in Windows text format, as would be PGN games copied from websites (like this forum).
That being said, I can report the following: I regularly transfer text files between Linux and Windows, and WordPad, unlike NotePad, is actually able to understand the Linux format. It always saves it in native Windows text format, so I use it as a linux2dos utility. Since Ubuntu 10.04 the Linux 'edit' application does respect the original format of the edited file, and is actually smart enough to add the CR in lines you added when the surrounding lines also have a CR.
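The core of the 'dos2linux' work-around suggested earlier in this post could be as small as the following sketch. The function name `strip_cr_before_lf` is hypothetical, and it assumes the whole file has been read into memory in binary mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: drop the CR from every CR,LF pair in buf, in place.
   A CR that is not followed by LF is left untouched.
   Returns the new length of the data. */
size_t strip_cr_before_lf(char *buf, size_t len)
{
    size_t out = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            continue;               /* skip the CR, keep the LF */
        buf[out++] = buf[i];
    }
    return out;
}
```

Working on the whole buffer keeps the sketch simple; a streaming version would have to remember a CR that falls on the last byte of one chunk.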
-
mar
- Posts: 2552
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Advice on .pgn format issues for my chess GUI
Evert wrote: Having said all that, I'm actually all for everyone adopting CR line endings and getting rid of the whole messy distinction between "text" and "binary" files, so feel free to ignore point 2 above if you want.
Absolutely. When I write a text file as binary, I always use LF (10 dec or 0xa hex) as EOL.
Things are a bit more complicated though:
Unix uses LF,
ancient macs use CR
and MS-DOS/Windows use CR,LF, probably to allow text files to be sent directly to (ancient) printers.
Any lexer should be able to parse any EOL:
Code: Select all
0xa => report EOL
0xd => peek next char:
       0xa  => consume + report EOL
       else => unget + report EOL
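The rule above can be sketched in C as follows. This is one possible reading of the pseudocode, not code from any of the tools discussed; `read_line_any_eol` is a hypothetical name, and the stream is assumed to be opened in binary mode so that no translation happens before the function sees the bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Read one line into buf (at most cap-1 chars), accepting LF, CR or CR,LF
   as the terminator. The terminator is not stored.
   Returns 1 if a line was read, 0 at end of input.
   A line longer than the buffer is returned in pieces. */
int read_line_any_eol(FILE *in, char *buf, size_t cap)
{
    size_t n = 0;
    int c;

    assert(in != NULL && buf != NULL);
    if (cap < 2)
        return 0;
    while ((c = getc(in)) != EOF) {
        if (c == '\n')                  /* 0xa => EOL */
            break;
        if (c == '\r') {                /* 0xd => look at the next char */
            int next = getc(in);
            if (next != '\n' && next != EOF)
                ungetc(next, in);       /* lone CR: put the char back */
            break;
        }
        if (n + 1 == cap) {             /* buffer full: return what fits */
            ungetc(c, in);
            break;
        }
        buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return (c != EOF) || (n > 0);
}
```

A reader built this way treats `e4\r\ne5\rNf3\nNc6` as four lines regardless of the platform it runs on.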
-
Bill Forster
- Posts: 76
- Joined: Mon Sep 21, 2015 7:47 am
- Location: New Zealand
Re: Advice on .pgn format issues for my chess GUI
A summary of the feedback to date;
Semicolon comments; Many (most? nearly all?) tools don't support these. Nobody seems surprised. Changing Tarrasch so it doesn't write these comments seems to be a no-brainer.
EOL handling. This is much harder. Again nobody seems surprised that Scid, a popular and widely used tool (I imagine), has what seems to be a show-stopping flaw on Windows. It basically has no interoperability with other Windows chess programs. Maybe my google-fu is weak, but I haven't found any references to this elsewhere. It's surprising to me anyway. The consensus of opinion so far is that I should continue to use the most obvious candidate as best practice. That is; I should write .pgn according to the platform text convention and read .pgn files of either convention. I think the ancient (pre-Unix) Mac convention of CR line endings can be safely ignored at this point, so there are basically two relevant conventions Unix (LF) and Windows (CR,LF).
I still think there is a good case for changing to LF only on all platforms. The .pgn spec deprecates CR,LF. Windows seems to be losing this particular battle and most Windows tools seem quite happy to accept LF. Unix on the other hand seems to have the moral high ground (LF is simpler and better than CR,LF; CR,LF is nothing but a reflection of ancient technology) and Unix tends to make no concessions to Windows. Writing LF on all platforms and reading either LF or CR,LF on all platforms will avoid an important interoperability issue with Scid. I am trying to appeal to non-technical users, such users have no interest in command line utilities and even the idea of using Wordpad instead of Notepad (a useful trick - thank you) won't help Scid users who cannot open a Windows convention .pgn.
The feedback to date has been very useful, I was convinced I should change to writing LF on all platforms, but now I recognise that it's (unfortunately) not a no-brainer and needs more consideration.
-
Dann Corbit
- Posts: 12482
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Advice on .pgn format issues for my chess GUI
To write to the platform takes no special effort.
Reading using fgets() seems to work across Windows and Unix transparently.
For instance, I use this simple filter program to convert Unix to Windows eol sequences:
Code: Select all
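#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char string[32767];

/* Read one line from stdin into buffer and strip the trailing newline.
   Returns buffer on success, NULL at end of input or on bad arguments. */
char *getsafe(char *buffer, int count)
{
    char *result = buffer, *np;

    if ((buffer == NULL) || (count < 1))
        result = NULL;
    else if (count == 1)
        *result = '\0';
    else if ((result = fgets(buffer, count, stdin)) != NULL)
        if ((np = strchr(buffer, '\n')) != NULL)
            *np = '\0';
    return result;
}

int main(void)
{
    /* puts() appends '\n' again; a text-mode stdout on Windows
       writes that as CR,LF. */
    while (getsafe(string, sizeof string))
    {
        puts(string);
    }
    return 0;
}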
-
Bill Forster
- Posts: 76
- Joined: Mon Sep 21, 2015 7:47 am
- Location: New Zealand
Re: Advice on .pgn format issues for my chess GUI
The issue here is not the implementation details, but the specification. I don't have any problem with implementation.