Re: PGN standard, its improvement and standardization

Posted: Mon Oct 07, 2019 11:36 am
by hgm
niklasf wrote: Mon Oct 07, 2019 11:16 am
hgm wrote: Mon Oct 07, 2019 9:39 am Hiding such a PV in a normal comment (within braces) hides it from the normal display mechanisms, and thus should be considered a very bad practice.
Unfortunately the PV often coincides with the following moves of the game, and many GUIs don't support having distinct variations for the same moves. (And for a good reason, since most of the time that's not what users want.)
I am not completely sure what you mean. But the PGN standard doesn't forbid any sequence of legal moves from being included as a variation. So if GUIs do not support that, they are simply not PGN compliant, and should be fixed.

I don't understand the remark that "users do not want that". If they don't want to step through an included variation they can simply refrain from doing it. Annotated PVs being present as recursive variations doesn't clutter up the game text any more than those being present inside comments. Code that allows people to step through variations would not in itself care whether these variations were (partial) duplicates or not; it would require dedicated code to break this functionality and make the software PGN non-compliant.

Re: PGN standard, its improvement and standardization

Posted: Mon Oct 07, 2019 5:09 pm
by niklasf
hgm wrote: Mon Oct 07, 2019 11:36 am it would require dedicated code to break this functionality and make the software PGN non-compliant.
All it takes is choosing Node := OrderedMap<Move, Node> as the data structure for the game tree, which would otherwise be a natural choice.
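
A minimal sketch of why that choice merges duplicate lines, assuming moves are plain SAN strings and using a Python dict as the ordered map. The names Node and add_line are hypothetical and do not come from any particular GUI:

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """Game-tree node whose children are keyed by move (illustrative only)."""
    comment: str = ""
    # dict preserves insertion order, so it serves as an ordered map Move -> Node.
    children: dict[str, Node] = field(default_factory=dict)

    def add_line(self, moves: list[str], comment: str = "") -> Node:
        """Insert a sequence of moves, reusing any child that already exists."""
        node = self
        for move in moves:
            node = node.children.setdefault(move, Node())
        if comment:
            node.comment = comment
        return node


root = Node()
root.add_line(["e4", "e5", "Nf3"])                      # moves played in the game
root.add_line(["e4", "e5", "Nf3"], "engine PV, +0.3")   # PV repeating the same moves

# The PV did not become a separate variation: there is still one child per move.
print(len(root.children))  # 1
print(root.children["e4"].children["e5"].children["Nf3"].comment)  # engine PV, +0.3
```

Keeping both lines distinct would need a list of children per node rather than a map keyed by move.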

Re: PGN standard, its improvement and standardization

Posted: Mon Oct 07, 2019 6:09 pm
by Fulvio
lucasart wrote: Mon Oct 07, 2019 6:45 am The PGN format is an abortion…

But your proposals fail to address the real problem of PGN which is its (un)parsability
Some time ago I would have agreed, but then I rewrote the code for SCID and changed my mind.
The trick is to avoid a state-machine that reads the characters one at a time: https://sourceforge.net/p/chessx/code/H ... e.cpp#l746
Instead, using a lexer that identifies the next token is surprisingly easy:
https://sourceforge.net/p/scid/code/ci/ ... xer.h#l161
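
As a rough illustration of the token-at-a-time idea (this is not the SCID code, and the token names are my own), a movetext lexer can be sketched with one regular expression. It assumes SAN moves are validated later and ignores `;` and `%` line syntax:

```python
import re

# Token classes for PGN movetext; order matters because the first match wins.
TOKEN_RE = re.compile(r"""
    (?P<comment>\{[^}]*\})                      # brace comment
  | (?P<result>1-0|0-1|1/2-1/2|\*)              # game termination marker
  | (?P<nag>\$\d+)                              # numeric annotation glyph
  | (?P<movenum>\d+\.+)                         # move number such as "1." or "1..."
  | (?P<open>\()                                # start of a recursive variation
  | (?P<close>\))                               # end of a recursive variation
  | (?P<san>[A-Za-z][A-Za-z0-9=+\#\-]*[!?]*)    # SAN move text, not validated here
  | (?P<space>\s+)                              # whitespace between tokens
""", re.VERBOSE)


def tokenize(movetext):
    """Yield (kind, text) pairs for each token, skipping whitespace."""
    pos = 0
    while pos < len(movetext):
        m = TOKEN_RE.match(movetext, pos)
        if m is None:
            raise ValueError(f"unexpected character {movetext[pos]!r} at offset {pos}")
        if m.lastgroup != "space":
            yield m.lastgroup, m.group()
        pos = m.end()


for kind, text in tokenize("1. e4 {best} (1. d4 d5) 1... e5 $1 1-0"):
    print(kind, text)
```

A parser built on top of this only has to react to whole tokens (open, close, san, and so on) instead of tracking state per character.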

Re: PGN standard, its improvement and standardization

Posted: Mon Oct 07, 2019 8:56 pm
by jhaglund2

Code:

[Event "FIDE World Cup 2019"]
[Site "Khanty-Mansiysk RUS"]
[Date "2019.09.24"]
[Round "5.2"]
[White "Ding Liren"]
[Black "Grischuk,A"]
[Result "1-0"]
[WhiteElo "2811"]
[BlackElo "2759"]
[ECO "A20"]
[Opening "English opening"]
[TimeControl "40/240"] //e.g.
[Annotator "1. +0.53   1... +0.05"] //e.g.

1. c4 e5 2. g3 Nf6 3. Bg2 Bc5 4. d3 d5 5. cxd5 Nxd5 6. Nc3 Nb6 7. Nf3 Nc6 8. O-O
O-O 9. a3 a5 10. Na4 Nxa4 11. Qxa4 Nd4 12. Nxd4 Bxd4 13. Bd2 c6 14. e3 Bb6 15.
Bc3 Re8 16. Rfd1 Bd7 17. Rac1 h6 18. h3 Rb8 19. Rd2 Bc7 20. d4 c5 21. Qc2 exd4
22. exd4 c4 23. a4 Bd6 24. Rdd1 b6 25. Re1 Rxe1+ 26. Rxe1 Qc7 27. h4 Re8 28. Bd5
Rxe1+ 29. Bxe1 Be6 30. Bxe6 fxe6 31. Qe4 Kf7 32. Bc3 Bf8 33. d5 Qd6 34. dxe6+
Qxe6 35. Qb7+ Kg8 36. Bd4 Qf5 37. Kh2 Qc2 38. Qd5+ Kh7 39. Qf7 Qd3 40. Bc3 Qd6
41. Qxc4 Qg6 42. Bd4 1-0
This is what I think PGN headers should include at most. I don't think anything else is really good to know.

Adding nested evaluations should require making it another format (e.g., .PGN2 or .CB-PGN) coming from the source that added the evaluations.

Re: PGN standard, its improvement and standardization

Posted: Mon Oct 07, 2019 9:32 pm
by Dann Corbit
The only change I see as important, for the EPD section of the standard, is to make the e.p. field contain the target square if and only if the capture can actually be made.
Having two records for every double pawn push is utterly absurd.
I even think that if the possible capturing pawn is pinned the field should be left as '-' because the e.p. capture cannot happen.
Extra tags in the PGN headers can simply be ignored if you don't like them.
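
A partial sketch of the e.p. rule described above, under a loud assumption: it only checks that an enemy pawn stands next to the pawn that just advanced. It does not detect pins or checks, which would need a full legal-move generator. The function names are hypothetical:

```python
def expand_rank(rank):
    """Expand one FEN rank, e.g. '3pP3' -> '...pP...'."""
    return "".join("." * int(c) if c.isdigit() else c for c in rank)


def normalize_ep(fen):
    """Replace the e.p. field with '-' unless a pawn is placed to capture there.

    Assumption: only the presence of a capturing pawn is checked; pins and
    checks are ignored, so this is weaker than "the capture is legal".
    """
    placement, side, castling, ep, *rest = fen.split()
    if ep == "-":
        return fen
    ranks = [expand_rank(r) for r in placement.split("/")]  # ranks[0] is rank 8
    file = "abcdefgh".index(ep[0])
    # White to move captures from rank 5 (index 3); Black from rank 4 (index 4).
    row, pawn = (3, "P") if side == "w" else (4, "p")
    neighbours = [f for f in (file - 1, file + 1) if 0 <= f < 8]
    if not any(ranks[row][f] == pawn for f in neighbours):
        ep = "-"
    return " ".join([placement, side, castling, ep, *rest])


# After 1. e4 no black pawn can capture on e3, so the field collapses to '-'.
print(normalize_ep("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"))
# After 1. e4 a6 2. e5 d5 the white e5 pawn can capture on d6, so it is kept.
print(normalize_ep("rnbqkbnr/1pp1pppp/p7/3pP3/8/8/PPPP1PPP/RNBQKBNR w KQkq d6 0 3"))
```

With this normalization, the position after a double pawn push compares equal to the same position reached by single steps whenever no pawn is placed to capture.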

For EPD, an unimportant change would be to make the pm be the list of expected moves instead of the first move of the pv. Making the pm the first move of the pv is silly and redundant.

I also add my own EPD tags, but those can be ignored also, so I do not think extra tags should matter much.
The grammar for EPD tags is simple:
<tag> <value> <semicolon>
And so if <tag> is something you do not recognize, just ignore it.
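
A small sketch of that "ignore what you do not recognize" rule. The opcode set is an illustrative subset, and operands are assumed not to contain a semicolon (a quoted string with `;` inside would need a real tokenizer):

```python
KNOWN_OPCODES = {"acd", "bm", "ce", "id", "pv"}  # illustrative subset, not exhaustive


def parse_epd_operations(ops):
    """Split an EPD operation string into {opcode: operand}, dropping unknown opcodes.

    Simplification: operands are assumed not to contain ';'.
    """
    result = {}
    for chunk in ops.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        opcode, _, operand = chunk.partition(" ")
        if opcode in KNOWN_OPCODES:
            result[opcode] = operand.strip()
    return result


# 'xyz' is not a known opcode, so it is skipped without any error.
print(parse_epd_operations('bm Nf3; ce 25; xyz 1 2 3; id "test 1";'))
```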

Like to have:
I would like to see a standardization of the timing and evaluation data within PGN games. There are perhaps a dozen different methods currently in play, and that is a giant hassle to parse.

Re: PGN standard, its improvement and standardization

Posted: Tue Oct 08, 2019 2:34 am
by sovaz1997
lucasart wrote: Mon Oct 07, 2019 6:45 am The PGN format is an abortion…

But your proposals fail to address the real problem of PGN which is its (un)parsability, and its use of SAN (another abortion), which requires to code full chess rules and board state to just parse moves.

In any case, you will not change anything by posting in an internet forum. Ever heard the phrase "he who codes decides" ? That's how open source progresses. Code talks, talk walks.

So if you want to move the needle *you* need to do the work of writing the code to parse your new format, and writing converter tools that are easy to use to convert to and from PGN. Even then, people won't use it because they won't get any practical benefit from it. There should be useful tools (eg. GUI) that *directly* input and output from/to this format. And *you* will have to do the work here, again.
I agree. I think we need a binary format; we must get away from PGN. At the same time, for the format to become a standard, it needs good tools for translating to and from PGN, as well as a good GUI. I also see the point of separating the analysis score from the game score. Since an analysis can be performed with several engines, I think it should be possible to save the results of such an analysis with all the details. This will also be a more compact format than PGN (especially if standard tags are used). This week I will try to come up with a data organization.
Even if the format does not become widespread, I still see many advantages in it for myself.

The format will essentially be an open source application with the following features:
1) Translation to and from the PGN files of various shells;
2) Commands for working with the format (adding moves, adding attributes of moves, etc.). Naturally, recursive branching must be supported;
3) A test GUI to demonstrate the possibilities of the new format.

Re: PGN standard, its improvement and standardization

Posted: Tue Oct 08, 2019 2:39 am
by sovaz1997
I have not worked much with data serialization, but I think it will look something like this: each object is stored as a header (which gives the size of the body and the type of the object) followed by the body. Ideas from those who have worked with serialization are welcome in this thread. I would also like advice: is it worth using a serialization library, or is it better to write it from scratch?
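
One possible shape for such a header-plus-body record, sketched with Python's struct module. The field widths (1-byte type, 4-byte little-endian size) and the type codes TAGS and MOVES are assumptions made for illustration, not part of any agreed format:

```python
import struct

# Assumed layout: 1-byte record type, then 4-byte little-endian body size.
HEADER = struct.Struct("<BI")
TAGS, MOVES = 1, 2  # hypothetical type codes


def pack_record(rec_type, body):
    """Prefix the body with its type and length."""
    return HEADER.pack(rec_type, len(body)) + body


def unpack_records(buf):
    """Yield (type, body) pairs; the size field lets a reader skip unknown types."""
    offset = 0
    while offset < len(buf):
        rec_type, size = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        yield rec_type, buf[offset:offset + size]
        offset += size


blob = pack_record(TAGS, b"White=Ding Liren") + pack_record(MOVES, bytes([3, 1, 2]))
print(list(unpack_records(blob)))
```

Because every record carries its own size, a reader that does not understand a record type can jump over it, much like ignoring an unknown tag in PGN or EPD.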

Thank you all for your ideas, I already have a clearer vision of the new format.

Just a thought: it is important to store all the game data at the beginning and the moves at the end, so that nothing has to be rewritten. The game will be a tree, which in the trivial case has no branching.

Re: PGN standard, its improvement and standardization

Posted: Tue Oct 08, 2019 3:55 am
by Dann Corbit
There was an AMIGA program which had an interesting method to encode PGN games that was extremely compact.

For any given position, a move generator (obviously, all encodings would have to use the same generator) creates a list of legal PGN moves.
Then, the byte written to disk is the 8 bit move number.
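
The scheme can be sketched as follows. This is not the Amiga program's actual code: the move generator here is a hand-written stand-in table (toy_legal_moves is hypothetical), and sorting the move list is my own way of fixing a canonical order that encoder and decoder share:

```python
def encode_game(moves, legal_moves):
    """Encode each move as its index in the generator's move list, one byte each."""
    out = bytearray()
    history = []
    for move in moves:
        candidates = sorted(legal_moves(history))  # sorting fixes a canonical order
        out.append(candidates.index(move))         # bytearray rejects values > 255
        history.append(move)
    return bytes(out)


def decode_game(data, legal_moves):
    """Rebuild the move list by replaying the indices through the same generator."""
    history = []
    for index in data:
        history.append(sorted(legal_moves(history))[index])
    return history


def toy_legal_moves(history):
    """Stand-in generator: a real one would derive legal moves from the board."""
    table = {
        (): ["e4", "d4", "c4", "Nf3"],
        ("e4",): ["e5", "c5", "e6"],
        ("e4", "e5"): ["Nf3", "Bc4", "Nc3"],
    }
    return table[tuple(history)]


encoded = encode_game(["e4", "e5", "Nf3"], toy_legal_moves)
print(list(encoded))                          # [3, 1, 2]
print(decode_game(encoded, toy_legal_moves))  # ['e4', 'e5', 'Nf3']
```

The decoder only works if it uses exactly the same generator and ordering as the encoder, which is the constraint mentioned above.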

Another simple compaction would be to encode EPD positions with a binary form (it takes about 160 bits, IIRC).
And the PGN headers could be integer pairs where the first integer is the id for the tag name in a related database table and the second integer is the id for the value.
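
A sketch of the integer-pair idea for headers, with an in-memory class standing in for the database tables. Interner, encode_headers and decode_headers are hypothetical names:

```python
class Interner:
    """Assigns a stable integer id to each distinct string (stand-in for a DB table)."""

    def __init__(self):
        self.ids = {}
        self.values = []

    def intern(self, text):
        if text not in self.ids:
            self.ids[text] = len(self.values)
            self.values.append(text)
        return self.ids[text]


tag_names, tag_values = Interner(), Interner()


def encode_headers(headers):
    """Turn {tag: value} into a list of (tag id, value id) pairs."""
    return [(tag_names.intern(k), tag_values.intern(v)) for k, v in headers.items()]


def decode_headers(pairs):
    """Look the ids up again to recover the original {tag: value} mapping."""
    return {tag_names.values[k]: tag_values.values[v] for k, v in pairs}


game1 = encode_headers({"Event": "FIDE World Cup 2019", "White": "Ding Liren"})
game2 = encode_headers({"Event": "FIDE World Cup 2019", "White": "Grischuk,A"})
print(game1, game2)  # the shared Event header maps to the same pair in both games
```

Repeated strings such as event and player names are stored once, so each game only pays for a few small integers.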

It makes sense to have a character "player type" for human/alpha-beta/NN/Hybrid/Centaur/etc.
In the case of chess engines, I think the bare minimum to encode is something like this:

Code:

CREATE TABLE [dbo].[ChessEngine](
	[Engine] [varchar](255) NULL,
	[Elo] [int] NULL,
	[plus] [int] NULL,
	[minus] [int] NULL,
	[score_pct] [float] NULL,
	[draw_pct] [float] NULL,
	[games] [int] NULL
) 

Re: PGN standard, its improvement and standardization

Posted: Tue Oct 08, 2019 5:01 am
by Ferdy
sovaz1997 wrote: Sun Oct 06, 2019 2:47 pm Due to the fact that we have a huge number of different chess shells, I believe that it is necessary to create a new format for PGN files, which will standardize computer position evals, its analysis and additional data. With a single format, it will be much more convenient to use one PGN file in various chess shells.

For example we have TCEC comments format:

Code:

{d=16, sd=51, mt=161000, tl=7049000, s=219832, n=35024512, pv=Nb3 Be7 h4 Rb8 h5 b5 Bd3 f6 exf6 Nxf6 Nd4 Na5 Qe2 Bb4 Kb1 Rb7 Nb3 Nc4 Bd4 Bd6 h6 g6 g3 Rff7 Bxc4, tb=0, h=86.3, ph=0.0, wv=1.08, R50=49, Rd=-11, Rr=-11, mb=+0+0+0+0+0,}
There was a PGN enhancement proposal in 2001 from people at ChessBase, Chess Assistant, and others, and it is already in use today.
https://www.enpassant.dk/chess/palview/enhancedpgn.htm

Move comments like { [%clk 1:20:34] } or even { [%eval 25] } come from this enhancement proposal.
The commands clk, eval, and others are not fixed by this proposal, but with wider usage they may eventually be considered standard.

That d=16 can be converted to:
[%acd 16]
acd = analysis count depth, an opcode from the EPD standard.

That mt=161000 can be:
[%emt 0:02:41]
that is h:mm:ss
and emt=elapsed movetime

That s=219832 or speed or nps can be:
[%nps 219832]

Most people are now using [%eval 25], but
[%ce 25] is also good because ce comes from the EPD standard, where it means centipawn evaluation.

There is also:
[%eval 25,18]
where 18 is the depth. This is still valid according to the enhancement, but to make it clearer:
[%ce 25] [%acd 18]
1. e4 { [%ce 25] [%acd 18] [%emt 0:00:10] }

But with computer chess engines on fast TC, emt format can be extended to include ms or milliseconds.
[%emt 0:00:00:560]
h:mm:ss:mls
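
The conversions above for d, mt, and s can be sketched as a small function. This is only an illustration of the mapping described in this post. The function names are hypothetical, and other TCEC fields (pv, wv, and so on) are deliberately left out:

```python
import re


def format_emt(ms):
    """Milliseconds -> h:mm:ss, dropping any fractional second."""
    seconds = ms // 1000
    return f"{seconds // 3600}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def tcec_to_commands(comment):
    """Map d, mt and s from a TCEC-style comment to [%acd], [%emt] and [%nps]."""
    # Each field looks like key=value and ends at a comma or the closing brace.
    fields = dict(re.findall(r"(\w+)=([^,}]*)", comment))
    parts = []
    if "d" in fields:
        parts.append(f"[%acd {fields['d']}]")
    if "mt" in fields:
        parts.append(f"[%emt {format_emt(int(fields['mt']))}]")
    if "s" in fields:
        parts.append(f"[%nps {fields['s']}]")
    return "{ " + " ".join(parts) + " }"


print(tcec_to_commands("{d=16, sd=51, mt=161000, tl=7049000, s=219832, pv=Nb3 Be7 h4,}"))
# { [%acd 16] [%emt 0:02:41] [%nps 219832] }
```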


References:

pgn standard:
https://opensource.apple.com/source/Che ... andard.txt

pgn enhancements:
https://www.enpassant.dk/chess/palview/enhancedpgn.htm

Chess on xml:
http://www.saremba.de/chessgml/distribution.htm

Chess programming:
https://www.chessprogramming.org/Portable_Game_Notation

Others:
http://www.saremba.de/chessgml/standard ... mplete.htm
https://python-chess.readthedocs.io/en/latest/pgn.html

Re: PGN standard, its improvement and standardization

Posted: Tue Oct 08, 2019 6:26 am
by sovaz1997
Dann Corbit wrote: Tue Oct 08, 2019 3:55 am There was an AMIGA program which had an interesting method to encode PGN games that was extremely compact.

For any given position, a move generator (obviously, all encodings would have to use the same generator) creates a list of legal PGN moves.
Then, the byte written to disk is the 8 bit move number.

Another simple compaction would be to encode EPD positions with a binary form (it takes about 160 bits, IIRC).
And the PGN headers could be integer pairs where the first integer is the id for the tag name in a related database table and the second integer is the id for the value.

It makes sense to have a character "player type" for human/alpha-beta/NN/Hybrid/Centaur/etc.
In the case of chess engines, I think the bare minimum to encode is something like this:

Code:

CREATE TABLE [dbo].[ChessEngine](
	[Engine] [varchar](255) NULL,
	[Elo] [int] NULL,
	[plus] [int] NULL,
	[minus] [int] NULL,
	[score_pct] [float] NULL,
	[draw_pct] [float] NULL,
	[games] [int] NULL
) 
This is a great solution: storing each move in 1 byte. I will use it. I also have the following idea: do not use one file for many games. Instead, use a single merge file that contains links to the files with the games. I think that will be more efficient.