New Interchange Protocol / Alternative to PGN
Moderator: Ras
-
mar
- Posts: 2673
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: New Interchange Protocol / Alternative to PGN
Yes, XML is a horrible, bloated piece of .... There are much simpler alternatives that are far more readable and straightforward to parse.
-
hgm
- Posts: 28458
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: New Interchange Protocol / Alternative to PGN
I must admit that I don't really see the need for the XML format. (But one could say the same for RTF format for storing MS Word documents.) It seems a bit like disassembling binary machine code. Nice for hackers that want to directly interfere with the format. But normal users would edit game files by loading the binary format in some application that can handle it, would edit it there, and then save it as binary again. I think there is only a very small niche for a readable format that is not optimized for reading by humans. Probably limited to programmers that want to check if their applications produce the proper format.
-
gcramer
- Posts: 40
- Joined: Mon Oct 28, 2013 11:21 pm
- Location: Bad Homburg, Germany
Re: New Interchange Protocol / Alternative to PGN
hgm wrote:
> The idea of the VariantMen tag is that it would supply all the rule knowledge relating to piece movement. This should enable the SAN parser to do legality checking, and catch moves that were illegal according to the rules of the variant.
Oops, this is possible without complex chess logic!? I will look into this soon; it is indeed a very fine feature for converting all chess variants.
hgm wrote:
> In the protocol extension used in the WinBoard Alien Edition, multi-leg moving is indicated by using the normal long-algebraic notation of WB protocol for each leg, and separating the legs by commas. This also captures multi-move variants like Marseillaise Chess, or non-standard castlings. For moves with a single piece that perform captures along the way, you could just concatenate all squares it visits.
Thanks for the hint. I will think about how to express this in CAN; it is probably close to LAN.
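To make the comma-separated multi-leg notation concrete, here is a minimal Python sketch that splits such a move string into legs and the squares each leg visits. The function name `parse_legs` and the square pattern are assumptions made for illustration, not the actual WinBoard Alien Edition grammar, and promotion suffixes are deliberately not handled.

```python
import re

# Assumed square syntax: one file letter followed by a rank number (e.g. "e4", "a10").
SQUARE = re.compile(r"[a-z][0-9]+")


def parse_legs(move: str) -> list[list[str]]:
    """Split a comma-separated multi-leg move into lists of visited squares."""
    legs = []
    for leg in move.split(","):
        squares = SQUARE.findall(leg)
        # A leg needs at least a from-square and a to-square, and must
        # consist of squares only (no stray characters in between).
        if len(squares) < 2 or "".join(squares) != leg:
            raise ValueError(f"malformed leg: {leg!r}")
        legs.append(squares)
    return legs
```

With this sketch, `parse_legs("e1g1,h1f1")` gives two legs of two squares each, and `parse_legs("a1c3e5")` gives a single leg that visits three squares.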
-
gcramer
- Posts: 40
- Joined: Mon Oct 28, 2013 11:21 pm
- Location: Bad Homburg, Germany
Re: New Interchange Protocol / Alternative to PGN
hgm wrote:
> I must admit that I don't really see the need for the XML format...
I have to rework the home page of C/CIF; I see that some important points are not yet clearly expressed:
XML is only the human-readable version, and it is the basis for the definition. XML looks bloated, but the basis for C/CIF is the structure, and XML does not bloat the structure; it is an appropriate format for defining structured formats. For me it's important to have a tool which satisfies the following goals:
1. It can be used as a readable format, although this is only sugar.
2. It can be used to define the format.
3. It can be used to describe the format.
4. It can be used to talk about a structure.
5. It can be used to test a structure.
6. It is easy to parse.
7. It is easy to write.
8. There are many tools for this format.
9. It is well known and well accepted, which also means that everybody understands this format.
10. It's an appropriate format to write examples (nobody could read the binary format).
11. Mapping between the text format (XML) and the binary format is easy, straightforward, and fast.
XML satisfies all of this, and when mapped to the binary format, which has the same structure, all the bloat will disappear.
Mapping from XML to the binary format will be done with opcodes, and with dictionaries for string compaction. I'm sure that the result will be much more compact than a PGN file, but I must confess that I don't have practical results for the binary format yet; I am relying on my experience with formal languages and compiler techniques. But before implementing everything for a new format, the format has to be defined. Because I'm using XML, the implementation is relatively easy; the writer for CIF is almost finished, except for some details. Most of the work (about 40%) was the mapping of the country codes, a real headache.
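The general idea of opcodes plus a string dictionary can be sketched in a few lines of Python. This is only an illustration of the technique and not the actual C/CIF encoding: the opcode values, the 16-bit index width, and the function name `encode` are all assumptions, and attributes and tail text are ignored.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical opcodes, chosen for this sketch only.
OP_OPEN, OP_TEXT, OP_CLOSE = 0x01, 0x02, 0x03


def encode(xml_text: str) -> tuple[bytes, list[str]]:
    """Encode an XML tree as an opcode stream plus a string dictionary."""
    strings: list[str] = []      # dictionary: index -> string
    index: dict[str, int] = {}   # reverse lookup: string -> index
    out = bytearray()

    def intern(s: str) -> int:
        # Each distinct string is stored once and referenced by its index.
        if s not in index:
            index[s] = len(strings)
            strings.append(s)
        return index[s]

    def walk(elem: ET.Element) -> None:
        # One opcode byte followed by a little-endian 16-bit dictionary index.
        out.extend(struct.pack("<BH", OP_OPEN, intern(elem.tag)))
        text = (elem.text or "").strip()
        if text:
            out.extend(struct.pack("<BH", OP_TEXT, intern(text)))
        for child in elem:
            walk(child)
        out.append(OP_CLOSE)

    walk(ET.fromstring(xml_text))
    return bytes(out), strings
```

In this sketch, closing tags shrink to a single byte, and a repeated string such as a player name appears only once in the dictionary, which is where the verbosity of the XML text form goes away.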
By the way: the home page of C/CIF also provides SVG graphics to show the structure of C/CIF. This graphic does not bloat the structure either; although it takes a lot of room on the page, it defines the raw structure (and can be used for navigation).
PS: In the past I did not like XML very much. It does look bloated, and XSD (XML Schema) is not easy to write. But while defining the format of C/CIF (initially I did this without XML) I discovered that XML is a good tool for the definition, and as a side effect the text format CIF became available.
-
mar
- Posts: 2673
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: New Interchange Protocol / Alternative to PGN
I agree that a binary representation is much more compact, and parsing a binary format is lightning fast compared to a text format.
Yet I think there may be some value in having a text representation (sometimes you want to edit/check quickly), but XML is the worst choice of all.
Even tinyxml (which is supposed to be tiny) is 150kb of C++ code (at least the 2007 version I have here). That's anything but tiny for a simple data exchange format.
But when I think about it, the less complicated the better so yes probably it would be better to stick with a simple binary format to keep things simple.
(hmm, pgn has a binary format that never became popular, at least I'm not aware of anyone using it)
While I appreciate Gregor's effort, his draft seems complicated to me (haven't checked thoroughly).
Even if he goes open source (if I want others to use my format, it is sort of mandatory to provide a reference implementation),
he can either write his own xml parser or add one more external dependency.
-
hgm
- Posts: 28458
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: New Interchange Protocol / Alternative to PGN
gcramer wrote:
> Oops, this is possible without complex chess logic!!? I will have a look into this stuff soon, it's indeed a very fine feature to convert all chess variants.
It depends on what you mean by 'complex Chess logic'. Of course it requires the parser to keep track of the game state (in particular board position), and know how the pieces move (when there are multiple pieces of the same type). But the VariantMen tag hands it all the knowledge needed to do that, by providing a Betza description of the participating non-standard pieces.
The point was that in principle it can be done. But it won't be as trivial or efficient as parsing CAN, of course. For many applications that would not matter, however.
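As a rough illustration of how a Betza description carries movement knowledge, the Python sketch below expands a string of basic Betza leaper atoms into board offsets. It covers only the atoms W, F, D, N and A without any modifiers or riders, the name `leaper_offsets` is made up for this example, and it says nothing about the actual syntax of the VariantMen tag.

```python
# Basic Betza leaper atoms and their (x, y) step before symmetry expansion.
ATOMS = {
    "W": (1, 0),  # wazir: one step orthogonally
    "F": (1, 1),  # ferz: one step diagonally
    "D": (2, 0),  # dabbaba: two steps orthogonally
    "N": (2, 1),  # knight
    "A": (2, 2),  # alfil: two steps diagonally
}


def leaper_offsets(betza: str) -> set[tuple[int, int]]:
    """Return all board offsets reachable by the given string of atoms."""
    offsets: set[tuple[int, int]] = set()
    for atom in betza:
        if atom not in ATOMS:
            raise ValueError(f"unsupported atom: {atom!r}")
        a, b = ATOMS[atom]
        # Apply the full 8-fold symmetry: swap coordinates and flip signs.
        for x, y in ((a, b), (b, a)):
            for sx in (1, -1):
                for sy in (1, -1):
                    offsets.add((sx * x, sy * y))
    return offsets
```

A parser that tracks the board position could use such an offset set to decide which piece of a given type is able to reach the destination square of a SAN move, for example `leaper_offsets("WF")` for a piece that steps like a king.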
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: New Interchange Protocol / Alternative to PGN
gcramer wrote:
> lucasart wrote:
> > This new XML format looks horrible. Far worse than PGN.
> Please do not compare the look and feel of PGN with C/CIF. PGN is designed to be human readable, but C/CIF is designed for a loss-free transfer of complex chess archives, PGN does not fit this goal at all. How can this format be far worse than PGN?
> Furthermore please keep in mind that the human readable XML is not the primary format, the binary format CCIF is primary, and this format is not human readable at all, but quite compact, and supporting all the features of a modern chess application. Conversion between CIF (XML format) and CCIF (binary format) is quite simple. The XML format CIF is only sugar, to have a human readable format, it's a 1:1 mapping of the binary format.
> Another point: it's absolutely impossible to transfer chess games from ChessBase to Scidb or vice versa with PGN, even if no additional data, like documents or videos, are involved. PGN does not know about the existence of the various languages in the world, and PGN does only know the existence of post commentaries and NAGs. But ChessBase and Scidb are quite more elaborated than PGN can support. In general a chess game is ruined with the transfer via PGN between ChessBase and Scidb.
> Shortly summarized: C/CIF is not a replacement or successor of PGN, it's an additional format. For the transfer of plain games PGN is the primary format. But for the transfer of complex archives PGN isn't usable at all.
A couple of points.
(1) yes, PGN is human-readable, which I consider an advantage. I like to be able to look at individual games, and wading through the XML stuff is a pain.
(2) I have PGN collections with tens of millions of games. I've not run into any problems at all other than raw file size. And that is a problem no matter what format you use when you start collecting all of the available online game archives.
(3) the basic problem remains. If someone wants to use your new format, they are going to have to write/add code to their engine, just as they do for PGN. Thankfully, PGN is really intended to encapsulate games and make them transferable, which it does quite well. All of the "chessbase" stuff you mention is intended for human consumption, not chess engines which could care less about what humans think as the game progresses.
(4) XML is certainly going to bloat things badly, size-wise.
-
gcramer
- Posts: 40
- Joined: Mon Oct 28, 2013 11:21 pm
- Location: Bad Homburg, Germany
Re: New Interchange Protocol / Alternative to PGN
bob wrote:
> (1) yes, PGN is human-readable, which I consider an advantage. I like to be able to look at individual games, and wading through the XML stuff is a pain.
Yes, for such tasks PGN is more appropriate. But it seems that there is a good chance of converting CAN notation to SAN without too much effort. This means it might be possible to build a simple viewer for this format that displays the much-appreciated SAN notation. C/CIF will not only provide a data interchange format; it is also planned that C/CIF will provide some tools for this format.
bob wrote:
> (2) I have PGN collections with tens of millions of games. I've not run into any problems at all other than raw file size. And that is a problem no matter what format you use when you start collecting all of the available online game archives.
I know the problem with the required space on disk. Probably the binary format CCIF will be significantly more compact than a PGN archive (CCIF will use some simple compaction techniques), but this is not yet tested.
bob wrote:
> (3) the basic problem remains. If someone wants to use your new format, they are going to have to write/add code to their engine, just as they do for PGN. Thankfully, PGN is really intended to encapsulate games and make them transferable, which it does quite well. All of the "chessbase" stuff you mention is intended for human consumption, not chess engines which could care less about what humans think as the game progresses.
I agree that C/CIF is currently not of interest for a chess engine, although this might depend on the chess variants the engine supports. But for chess applications like chess databases, PGN is not usable for loss-free data interchange. Even some basic features, for example multilingual comments, are not supported by PGN.
bob wrote:
> (4) XML is certainly going to bloat things badly, size-wise.
CIF (the XML format) is not the primary format; CIF defines the format. I'm optimistic that the primary format CCIF (a compact binary format) will produce smaller archives than PGN.
-
gcramer
- Posts: 40
- Joined: Mon Oct 28, 2013 11:21 pm
- Location: Bad Homburg, Germany
Re: New Interchange Protocol / Alternative to PGN
mar wrote:
> (hmm, pgn has a binary format that never became popular, at least I'm not aware of anyone using it)
I guess that the degree of compaction of this binary format is too small compared to .zip or .gz.
mar wrote:
> While I appreciate Gregor's effort, his draft seems complicated to me (haven't checked thoroughly).
C/CIF is defined for data interchange, and the plan is for this format to be useful for any application. This is not easy, because I cannot know the features of every application. So the format is defined in a way that lets any application add "extensions" without violating the defined standard.
And some details are in fact super-complicated, for example the application independent mapping of country codes. But the C/CIF library will provide many useful functions for the mapping (already written).
mar wrote:
> Even if he goes open source...
Of course, C/CIF will be open source, and it will provide a reference implementation.
mar wrote:
> ...he can either write his own xml parser or add one more external dependency.
It's always the same problem with dependencies. It is not a problem under Linux/Unix (for example, the XML library expat is available in any Linux distro), but Windows is a headache, as it does not provide any useful open libraries. Fortunately the usage of XML in C/CIF is low level, which means that writing my own XML parser for the Windows version might be quite an easy task. (All the HTML pages of the C/CIF project are generated with Tcl scripts; I've even written a super-simple XML parser in Tcl.)
-
kinderchocolate
- Posts: 454
- Joined: Mon Nov 01, 2010 6:55 am
- Full name: Ted Wong
Re: New Interchange Protocol / Alternative to PGN
Gregor, I'm not so sure your proposed format will get any wide acceptance. Everything you said is from a programmer's point of view. But the world is dictated by users, people who don't know anything about PGN parsing. They don't care whether you have 8000 or 80 million lines of code in your parser. All they care about is whether they can read a chess game. If they don't understand the new format, they won't use it, no GUI will use it, and you'll be forced to give up on it. You don't have a choice but to support the clumsy but easily readable PGN format.