That's why I went on to create a format specification that allows both more compact storage and faster parsing than PGN while preserving its tag system. It's also arguably simpler to implement, since it doesn't require a move generator.
I dropped the idea of including comments and annotations within the file. While this is a killer feature for many, it's completely unnecessary for many others, and in a format whose goals are compactness and speed it would just be an inconvenience to support. Tradeoffs!
A rough sketch of the format:
It is designed specifically for chess and relies on assumptions based on chess rules. It may not work if a position arises that is not a legal Chess960 position.
Every file starts with a 32-byte file header containing a format identifier and flags.
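To make the fixed-size file header concrete, here is a minimal C++ sketch of writing and reading one. The "BCGN" magic string, the byte offsets, the compression level byte and the bit used for the headerless flag are all assumptions made for illustration; the actual layout is defined in the specification.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical layout, assumed for this sketch only:
// bytes 0..3 format identifier, byte 4 compression level, byte 5 flags,
// bytes 6..31 reserved (zero).
constexpr std::size_t fileHeaderSize = 32;
constexpr char fileMagic[4] = {'B', 'C', 'G', 'N'};
constexpr std::uint8_t flagHeaderless = 1u << 0;  // assumed bit position

struct FileHeader {
    std::uint8_t compressionLevel;  // 0 or 1
    std::uint8_t flags;             // file-scope flags such as "headerless"
};

using HeaderBytes = std::array<std::uint8_t, fileHeaderSize>;

HeaderBytes writeFileHeader(const FileHeader& h) {
    assert(h.compressionLevel <= 1);
    HeaderBytes out{};                      // zero-filled, so reserved bytes are 0
    std::memcpy(out.data(), fileMagic, 4);  // format identifier
    out[4] = h.compressionLevel;
    out[5] = h.flags;
    return out;
}

// Returns std::nullopt if the format identifier does not match.
std::optional<FileHeader> readFileHeader(const HeaderBytes& in) {
    if (std::memcmp(in.data(), fileMagic, 4) != 0) return std::nullopt;
    return FileHeader{in[4], in[5]};
}
```

Keeping the header at a fixed 32 bytes means a reader can validate the file and learn its settings with a single small read before touching any game data.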
Currently there are 2 compression levels:
- 0: 2 bytes per move, packing the from square, to square, move type and promoted piece. Cannot get simpler than that.
- 1: almost always 1 byte per move, 2 bytes in very unusual positions. It uses an idea similar to the basis for https://lichess.org/blog/Wqa7GiAAAOIpBL ... ompression but doesn't require legal move generation and is much, much faster. Cannot get smaller than that with (well, *almost*) fixed-width encodings.
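A minimal C++ sketch of both ideas follows. For level 0, the bit order (from square in the top 6 bits, then to square, move type and promoted piece) is an assumption and may differ from the real CompressedMove. For level 1, the sketch shows one hypothetical way to get a compact index without legal move generation: both sides compute the pseudo-legal destination bitboard of every piece of the side to move, in a fixed order, and a move is stored as its index into the concatenation of those sets, which fits in 1 byte while there are at most 256 candidates. This is an illustration of the general idea, not the actual BCGN level 1 algorithm, and it ignores promotion choices and castling.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// ---- Level 0 (2 bytes per move) ----
// Hypothetical bit layout, assumed for illustration:
// from (bits 15..10) | to (bits 9..4) | type (bits 3..2) | promoted piece (bits 1..0)
enum class MoveType : std::uint8_t { Normal, Promotion, Castle, EnPassant };
enum class PromotedPiece : std::uint8_t { Knight, Bishop, Rook, Queen };

struct Move {
    std::uint8_t from;       // square 0..63 (a1 = 0, h8 = 63)
    std::uint8_t to;         // square 0..63
    MoveType type;
    PromotedPiece promoted;  // only meaningful for MoveType::Promotion
};

std::uint16_t packMove(const Move& m) {
    assert(m.from < 64 && m.to < 64);
    return static_cast<std::uint16_t>((m.from << 10) | (m.to << 4) |
                                      (static_cast<unsigned>(m.type) << 2) |
                                      static_cast<unsigned>(m.promoted));
}

Move unpackMove(std::uint16_t v) {
    return Move{static_cast<std::uint8_t>(v >> 10),
                static_cast<std::uint8_t>((v >> 4) & 63),
                static_cast<MoveType>((v >> 2) & 3),
                static_cast<PromotedPiece>(v & 3)};
}

// ---- Level 1 idea (hypothetical, not the actual BCGN algorithm) ----
struct PieceTargets {
    int from;               // square of a piece of the side to move
    std::uint64_t targets;  // bitboard of its pseudo-legal destination squares
};

int popcount64(std::uint64_t bb) {
    int n = 0;
    for (; bb != 0; bb &= bb - 1) ++n;  // clear the lowest set bit each step
    return n;
}

// Pieces must be listed in a fixed order (e.g. by square) known to both sides.
int encodeIndex(const std::vector<PieceTargets>& pieces, int from, int to) {
    int offset = 0;
    for (const auto& p : pieces) {
        if (p.from == from) {
            assert((p.targets >> to) & 1);
            const std::uint64_t below = (std::uint64_t{1} << to) - 1;
            return offset + popcount64(p.targets & below);
        }
        offset += popcount64(p.targets);
    }
    assert(false && "moving piece not found");
    return -1;
}

std::pair<int, int> decodeIndex(const std::vector<PieceTargets>& pieces, int index) {
    for (const auto& p : pieces) {
        const int n = popcount64(p.targets);
        if (index < n) {
            std::uint64_t bb = p.targets;
            for (int i = 0; i < index; ++i) bb &= bb - 1;  // drop the first `index` targets
            int to = 0;
            while (((bb >> to) & 1) == 0) ++to;            // lowest remaining target
            return {p.from, to};
        }
        index -= n;
    }
    assert(false && "index out of range");
    return {-1, -1};
}

// One byte is enough while there are at most 256 candidate moves.
int bytesPerMove(const std::vector<PieceTargets>& pieces) {
    int total = 0;
    for (const auto& p : pieces) total += popcount64(p.targets);
    return total <= 256 ? 1 : 2;
}
```

For example, if only the knights on b1 and g1 could move, the candidates in order would be a3, c3, f3 and h3, so Ng1-h3 would be stored as index 3. Because the byte width depends only on the position, the decoder knows how many bytes to read without any extra length information.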
Each game's header contains:
- necessary information such as entry_size, header_size, ply_count, result, flags
- a custom start position if necessary
- common tags like date, white_elo, black_elo, white_player, black_player, round, site, event, eco
- optionally, additional tags as key-value pairs
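The exact encoding of the additional tags is in the specification; the C++ sketch below only illustrates one plausible compact way to store key-value pairs. The one-byte tag count, the one-byte length prefixes and the function names are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical encoding, assumed for this sketch only: a one-byte tag count,
// then each key and value as a one-byte length followed by the raw bytes.
using Bytes = std::vector<std::uint8_t>;
using Tags = std::vector<std::pair<std::string, std::string>>;

void writeString(Bytes& out, const std::string& s) {
    assert(s.size() <= 255);  // must fit the one-byte length prefix
    out.push_back(static_cast<std::uint8_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

std::string readString(const Bytes& in, std::size_t& pos) {
    if (pos >= in.size()) throw std::out_of_range("missing string length");
    const std::size_t len = in[pos++];
    if (pos + len > in.size()) throw std::out_of_range("truncated string");
    const auto first = in.begin() + static_cast<std::ptrdiff_t>(pos);
    std::string s(first, first + static_cast<std::ptrdiff_t>(len));
    pos += len;
    return s;
}

Bytes encodeAdditionalTags(const Tags& tags) {
    assert(tags.size() <= 255);
    Bytes out;
    out.push_back(static_cast<std::uint8_t>(tags.size()));
    for (const auto& [key, value] : tags) {
        writeString(out, key);
        writeString(out, value);
    }
    return out;
}

Tags decodeAdditionalTags(const Bytes& in) {
    if (in.empty()) throw std::out_of_range("missing tag count");
    std::size_t pos = 0;
    const std::size_t count = in[pos++];
    Tags tags;
    for (std::size_t i = 0; i < count; ++i) {
        std::string key = readString(in, pos);
        std::string value = readString(in, pos);
        tags.emplace_back(std::move(key), std::move(value));
    }
    return tags;
}
```

With this encoding a single tag {"A", "b"} takes 5 bytes: the count, then a length byte and one character for each of the key and value.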
In the future I may specify optional LZ4 compression on top of everything else.
There is also a file-scope flag, "headerless", that reduces the per-game header to the minimum, preserving only the result and ply_count, so the file is basically just movetext.
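As an illustration of how small a headerless game entry's header can get, here is a C++ sketch that packs the result and ply count into 2 bytes. The 2-bit result field, the 14-bit ply count limit and all names are assumptions for this example, not what the specification mandates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal per-game header for "headerless" files: only the
// result and the ply count survive. The 2-bit/14-bit split is an assumption
// made for this sketch, not the layout defined by the specification.
enum class GameResult : std::uint8_t { Unknown, WhiteWin, BlackWin, Draw };

struct MinimalGameHeader {
    std::uint16_t plyCount;  // number of half-moves in the movetext
    GameResult result;
};

std::uint16_t packMinimalHeader(const MinimalGameHeader& h) {
    assert(h.plyCount < (1u << 14));  // must fit in 14 bits
    return static_cast<std::uint16_t>(
        (static_cast<unsigned>(h.result) << 14) | h.plyCount);
}

MinimalGameHeader unpackMinimalHeader(std::uint16_t v) {
    return MinimalGameHeader{static_cast<std::uint16_t>(v & 0x3FFF),
                             static_cast<GameResult>(v >> 14)};
}
```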
Details, along with pseudo-code, can be found in the specification here: https://github.com/Sopel97/chess_pos_db ... /docs/bcgn
A reference implementation of a file writer and reader can be found in https://github.com/Sopel97/chess_pos_db ... ess/Bcgn.h and https://github.com/Sopel97/chess_pos_db ... s/Bcgn.cpp , with other relevant bits in https://github.com/Sopel97/chess_pos_db ... ss/Chess.h (CompressedMove), https://github.com/Sopel97/chess_pos_db ... Position.h (CompressedPosition) and https://github.com/Sopel97/chess_pos_db ... eIndex.cpp (compression level 1 move encoding/decoding).
I also ran some benchmarks on my oldish PC: https://github.com/Sopel97/chess_pos_db ... vs_bcgn.md
TL;DR: between 1.5x and 3x faster parsing than a corner-cutting PGN parser implementation.
I may write some standalone tools if it gains traction.
Would you want this format to become more widely used?
Would you support it in your software?
What would you change (better do this early on!)?