Open Chess Game Database Standard

phhnguyen
Posts: 1447
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Open Chess Game Database Standard

Post by phhnguyen »

Open Chess Game Database Standard (OCGDB)

What: an open standard for chess game databases
Purposes/targets: exchanging and sharing chess game databases between programs; good enough to be used directly by some web apps as well as other tools
License: the standard, code, and data are free for any use or purpose, under MIT or a less restrictive license
Status: starting/work in progress
Current direction: trying SQL/SQLite

Project link: https://github.com/nguyenpham/ocgdb
(There’s a small database of over 50,000 games in SQLite format, so you can try ideas and measure performance.)

You all are welcome to join and contribute!
https://banksiagui.com
The most feature-rich chess GUI, based on the open-source Banksia, the chess tournament manager
msarchet
Posts: 13
Joined: Tue Sep 21, 2021 4:07 am
Full name: Michael Sarchet

Re: Open Chess Game Database Standard

Post by msarchet »

This is really cool! Thanks for the work on this
Panzer (still very in progress) - C++ bitboard engine https://github.com/msarchet/panzer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

Yeah, an open standard is good. I really, really, really don't want a single move to contain a full FEN string.

Position = Hash + FEN or Quadboard or Triboard
Move = ID + ParentId (it's a tree in SQL, where NULL = root position)
Maybe?
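
A minimal sketch of the "Move = ID + ParentId" idea, assuming SQLite and a hypothetical `move` table (the table name, column names, and the `line_to` helper are illustrative and not part of any agreed schema). A move whose `parent_id` is NULL is played from the root position, and a recursive query walks the parent links back to the root:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE move (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES move(id),  -- NULL = played from the root position
        san       TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO move (id, parent_id, san) VALUES (?, ?, ?)",
    [
        (1, None, "e4"),
        (2, 1, "e5"),   # 1. e4 e5
        (3, 1, "c5"),   # 1. e4 c5 (a sibling branch sharing the same parent)
        (4, 2, "Nf3"),  # 1. e4 e5 2. Nf3
    ],
)

def line_to(move_id):
    """Walk parent links from a move back to the root and return the SAN line."""
    rows = conn.execute(
        """
        WITH RECURSIVE path(id, parent_id, san, depth) AS (
            SELECT id, parent_id, san, 0 FROM move WHERE id = ?
            UNION ALL
            SELECT m.id, m.parent_id, m.san, p.depth + 1
            FROM move AS m JOIN path AS p ON m.id = p.parent_id
        )
        SELECT san FROM path ORDER BY depth DESC
        """,
        (move_id,),
    ).fetchall()
    return [san for (san,) in rows]

print(line_to(4))  # ['e4', 'e5', 'Nf3']
```

Shared prefixes are stored only once in such a tree, at the cost of one row per move and a recursive query to rebuild a game. The position table (hash + FEN) is left out of this sketch.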

I have a tool for translating PGN to SQL. None of the existing PGN parsers could do 1 GB/s for a fast import of the Lichess databases. I can share it here.

Questions:
Only full games from the start?
Only legal positions?
Only legal piece configurations?
4-player chess?
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
phhnguyen
Posts: 1447
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

dangi12012 wrote: Thu Oct 21, 2021 12:44 am Yeah, an open standard is good. I really, really, really don't want a single move to contain a full FEN string.
No, the current sample is not like that. Each game has a starting FEN (empty if the game starts from the standard initial position) and a list of moves. At the moment the move list is just a text string of moves in SAN format (you can check that in the sample database, downloadable from the GitHub link).

In the coming time, we may try other formats to encode games, such as binary, mixed... Trial and error with benchmarks may lead us to a good solution.
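
To make the "starting FEN plus SAN text" layout concrete, here is a small sketch. The `games` table, its column names, the player names, and the `san_moves` helper are all assumptions for illustration; the real sample database may differ (including whether the stored text carries move numbers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE games (
        id    INTEGER PRIMARY KEY,
        white TEXT,
        black TEXT,
        fen   TEXT NOT NULL DEFAULT '',  -- empty = standard initial position
        moves TEXT NOT NULL              -- the whole game as one SAN string
    )
""")
conn.execute(
    "INSERT INTO games (white, black, moves) VALUES (?, ?, ?)",
    ("Player A", "Player B", "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6"),
)

def san_moves(move_text):
    """Split SAN move text into moves, dropping move numbers such as '1.'."""
    return [tok for tok in move_text.split() if not tok.rstrip(".").isdigit()]

fen, moves = conn.execute("SELECT fen, moves FROM games WHERE id = 1").fetchone()
starts_from_initial = (fen == "")
print(starts_from_initial, san_moves(moves))
```

Storing the moves as one string keeps each game in a single row, but any position search then has to replay the SAN text with a move generator.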
dangi12012 wrote: Thu Oct 21, 2021 12:44 am
Position = Hash + FEN or Quadboard or Triboard
Move = ID + ParentId (it's a tree in SQL, where NULL = root position)
Maybe?

I have a tool for translating PGN to SQL. None of the existing PGN parsers could do 1 GB/s for a fast import of the Lichess databases. I can share it here.

Questions:
Only full games from the start?
A game can start from any position since it contains the starting FEN.
dangi12012 wrote: Thu Oct 21, 2021 12:44 am Only legal positions?
Only legal piece configurations?
No limit. Users can do what they want.
dangi12012 wrote: Thu Oct 21, 2021 12:44 am 4-player chess?
No limit. Users can do what they want. The standard should be good enough for other chess variants.

From my experience, one database should contain games of one chess variant only (multiple variants should not be mixed in one database). There is a table "info" in which users can put information such as the variant:

Code:

INSERT INTO info(name, value) VALUES ('variant', 'standard')
BTW, nothing is fixed. Just a work in progress. We all can discuss and work together. That is a standard for us.
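
As a usage sketch for the "info" table: a reader could check the variant before touching any games. Only the two columns used by the INSERT above are known from this thread; the column types, the primary key, and the `get_info` helper are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: a simple key/value table with the two columns seen in the INSERT.
conn.execute("CREATE TABLE info (name TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO info(name, value) VALUES ('variant', 'standard')")

def get_info(key, default=None):
    """Return the value stored under `key`, or `default` if the key is absent."""
    row = conn.execute("SELECT value FROM info WHERE name = ?", (key,)).fetchone()
    return row[0] if row else default

variant = get_info("variant")
if variant != "standard":
    # A reader that only understands standard chess can refuse early.
    raise ValueError(f"unsupported variant: {variant!r}")
```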
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

phhnguyen wrote: Thu Oct 21, 2021 2:37 am
dangi12012 wrote: Thu Oct 21, 2021 12:44 am
No limit. Users can do what they want. The standard should be good enough for other chess variants.

From my experience, one database should contain games of one chess variant only (multiple variants should not be mixed in one database). There is a table "info" in which users can put information such as the variant:

Code:

INSERT INTO info(name, value) VALUES ('variant', 'standard')
BTW, nothing is fixed. Just a work in progress. We all can discuss and work together. That is a standard for us.

So maybe one DB with rule-compliant games only, like the Lichess database? That is, only games that start from a valid FEN and follow the normal rules of chess.
And another which allows positions with 32 pieces or more: 10 queens, pawns on the same rank as rooks, positions with triple check. That would be for puzzles etc.
odomobo
Posts: 96
Joined: Fri Jul 06, 2018 1:09 am
Location: Chicago, IL
Full name: Josh Odom

Re: Open Chess Game Database Standard

Post by odomobo »

In my opinion, you need to work on the requirements. As I read it, PGN already fulfills the requirements as stated.

Since one goal is for some applications to be able to use it directly, then part of the requirements should include what kinds of queries can be made into the database, and the rough performance requirements for those queries. Without good requirements, how can you verify that a given design meets the needs?

My 2c
Sopel
Posts: 389
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Open Chess Game Database Standard

Post by Sopel »

I've written most of this on discord, but I'll repeat the most important points here.

0. This SHOULD NOT standardize a database layout.

1. The format should be DBMS-agnostic. That means the storage format needs to be specified at the lowest level possible. Otherwise you'll limit adoption to a specific DBMS.

1.a. It should really be a binary format. Consider fixed endianness to make it less complicated.

2. The biggest drawbacks of PGN are: a) text bloat, b) move notation requiring disambiguation through legal moves, c) a standard that's impossible to comply with.

2.ab. This can be solved by using a binary format with move compression and general compression on top. I would suggest some move enumeration scheme for move compression https://www.chessprogramming.org/Encodi ... numeration (some can benefit greatly from pext, which is ubiquitous now) and zstd/snappy/lz4 for general compression on top. You can get to <6 bits per move while keeping the performance of a corner-cutting PGN parser, I know this from experience.

2.ab note. If you go with variable-length move compression, consider some form of arithmetic-coding-like packing to save the fractional bits on each move.
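
A rough sketch of what "saving the fractional bits" can look like, using mixed-radix packing as a simple stand-in for arithmetic coding. It assumes each move is stored as its index in a deterministically ordered legal-move list; the function names and the example counts and indices are made up for illustration:

```python
import math

def pack_moves(indices, counts):
    """Pack move indices into one integer using a mixed-radix representation.

    indices[i] is the position of the played move in the ordered legal-move
    list at ply i; counts[i] is the length of that list.
    """
    value = 0
    for idx, n in zip(reversed(indices), reversed(counts)):
        if not 0 <= idx < n:
            raise ValueError("move index out of range")
        value = value * n + idx
    return value

def unpack_moves(value, counts):
    """Recover the indices in playing order, given the same counts."""
    indices = []
    for n in counts:
        value, idx = divmod(value, n)
        indices.append(idx)
    return indices

counts = [20, 20, 22, 30]   # illustrative legal-move counts per ply
indices = [12, 12, 5, 7]    # illustrative indices of the moves played

packed = pack_moves(indices, counts)
fixed_bits = sum((n - 1).bit_length() for n in counts)  # whole bits per move
packed_bits = (math.prod(counts) - 1).bit_length()      # bits for the packed value
print(fixed_bits, packed_bits)  # 20 19
```

Decoding runs in playing order, so a real decoder can obtain each count from its move generator after applying the previous move. The packed value grows with game length, so an implementation would likely pack per block of moves rather than per game.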

2.c. Be strict, and consider implementation feasibility.

3. What data about a game is needed depends on use case. The format needs to be flexible, otherwise it will limit adoption. PGN solves this by having optional parts, like comments, variations, NAGs. I would go further and make it required to be specified in the metadata which optional parts are present, such that fetching a resource could specify what's desired and a conversion would happen on the fly. For example there are use cases where just movelist is needed and nothing else.

3.a. Stripping information should be as easy as possible and should be possible to do in a streaming manner.

4. Normalization will improve compressibility but will not make it easier to extract more information. For example, players Bob A. Smith and Bob Smith might be the same person but will not be considered the same during normalization. In other words, normalization should be an implementation detail and should not provide any guarantees.

5. Make the files concatenable through `cat`. This can be achieved by building files out of (fixed-)size blocks, with the block size at the start of each. See, for example, the .binpack format for Stockfish training data.
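
A minimal sketch of point 5, assuming a 4-byte little-endian size prefix per block. The header layout is an assumption made for this example and is not claimed to match .binpack:

```python
import io
import struct

HEADER = struct.Struct("<I")  # 4-byte little-endian payload size (assumed layout)

def write_block(stream, payload):
    """Write one block: size prefix followed by the payload bytes."""
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)

def read_blocks(stream):
    """Yield payloads until the stream ends, rejecting truncated blocks."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return
        if len(header) != HEADER.size:
            raise ValueError("truncated block header")
        (size,) = HEADER.unpack(header)
        payload = stream.read(size)
        if len(payload) != size:
            raise ValueError("truncated block payload")
        yield payload

file_a, file_b = io.BytesIO(), io.BytesIO()
write_block(file_a, b"games 1-100")
write_block(file_b, b"games 101-200")
write_block(file_b, b"games 201-250")

# Byte-wise concatenation, which is what `cat a b` would produce.
combined = io.BytesIO(file_a.getvalue() + file_b.getvalue())
blocks = list(read_blocks(combined))
```

Because every block is self-delimiting and there is no file-level header, the concatenation of two valid files is itself a valid file.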

6. Consider putting the variable sized metadata at the end of the block, to allow appending positions without rewriting the whole file.

7. Specify the range of supported positions. Normal chess? Chess960? Any position on an 8x8 board with any number of chess pieces? This will influence move compression and move "legality" validation.

8. For storing exact positions, consider either 24-byte fixed width https://codegolf.stackexchange.com/ques ... ompression or variable width through Huffman/arithmetic coding (Huffman should be fine due to power-of-2 counts for all pieces).
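
One layout that happens to fit in 24 bytes is a 64-bit occupancy mask followed by one 4-bit code for each of up to 32 pieces. The sketch below is a simplified illustration under that assumption and is not claimed to be the linked scheme: it stores the piece placement only, whereas a real format would also need side to move, castling rights, and en passant (for example via the four unused nibble codes). All names are hypothetical:

```python
PIECES = "PNBRQKpnbrqk"  # nibble codes 0-11; codes 12-15 are unused in this sketch

def pack_placement(fen):
    """Pack the piece-placement field of a FEN into exactly 24 bytes."""
    occupancy = 0
    nibbles = []
    square = 0  # squares in FEN order: a8, b8, ..., h1
    for ch in fen.split()[0]:
        if ch == "/":
            continue
        if ch.isdigit():
            square += int(ch)
            continue
        occupancy |= 1 << square
        nibbles.append(PIECES.index(ch))
        square += 1
    if square != 64 or len(nibbles) > 32:
        raise ValueError("unsupported position")
    nibbles += [0] * (32 - len(nibbles))  # padding, ignored when decoding
    packed = bytes(lo | (hi << 4) for lo, hi in zip(nibbles[0::2], nibbles[1::2]))
    return occupancy.to_bytes(8, "little") + packed

def unpack_placement(data):
    """Rebuild the FEN piece-placement field from the 24-byte form."""
    occupancy = int.from_bytes(data[:8], "little")
    nibbles = [n for byte in data[8:] for n in (byte & 0x0F, byte >> 4)]
    rows, next_piece = [], 0
    for rank in range(8):
        row, empty = "", 0
        for col in range(8):
            if (occupancy >> (rank * 8 + col)) & 1:
                row += (str(empty) if empty else "") + PIECES[nibbles[next_piece]]
                next_piece += 1
                empty = 0
            else:
                empty += 1
        rows.append(row + (str(empty) if empty else ""))
    return "/".join(rows)

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
packed_start = pack_placement(start)
```

The fixed width makes random access trivial, and the 32-piece limit is exactly the kind of restriction that the range of supported positions would have to spell out.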

9. Consider the ability to parse in parallel. The easiest way to enable it is to have the file be chunked, with the chunks having a relatively small upper bound on their size (for Stockfish training data we use chunks of ~1MB, and the upper bound, by specification, is 100MB).

10. Consider a character encoding standard. This is very important to have specified. PGN doesn't, and you end up with issues when you try to parse a Windows-1250 PGN as UTF-8. Consider UTF-8 as the ONLY standard way of encoding text.
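
The failure mode in point 10 is easy to reproduce. In this sketch the player name is a made-up example; `cp1250` is Python's codec name for Windows-1250:

```python
name = "Żak, Wojciech"        # hypothetical player name with a non-ASCII letter
raw = name.encode("cp1250")   # the bytes a Windows-1250 PGN file would contain

try:
    raw.decode("utf-8")       # what a UTF-8-only parser would attempt
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

round_trip = raw.decode("cp1250")  # only works if the reader knows the encoding
print(decoded_as_utf8, round_trip == name)  # False True
```

Without a mandated encoding the reader has to guess, and a wrong guess either fails like this or silently produces the wrong characters.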
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

Sopel wrote: Thu Oct 21, 2021 5:50 pm I've written most of this on discord, but I'll repeat the most important points here.

0. This SHOULD NOT standardize a database layout.

1. The format should be DBMS-agnostic. That means the storage format needs to be specified at the lowest level possible. Otherwise you'll limit adoption to a specific DBMS.

1.a. It should really be a binary format. Consider fixed endianness to make it less complicated.

2. The biggest drawbacks of PGN are: a) text bloat, b) move notation requiring disambiguation through legal moves, c) a standard that's impossible to comply with.
Exactly wrong. If you want a binary DB, this thread is not for you.
This would be an SQL representation that is more compact than PGN, contains all the information in any game, AND can be queried for any position fast enough for user interaction.

It would be a database that can be filled from this:
https://database.lichess.org/
or any other PGN. It could be an opening table or whatever list of games you want.

The cool thing here is that you cannot query a PGN. You cannot query a binary format efficiently without solving B-trees and other hard problems while maintaining something like data consistency.

If you need something that your engine can use during gameplay, you can still query the slower SQL database and transform it into your own proprietary binary format.

A simple SQL format is really the right choice.
wickedpotus
Posts: 147
Joined: Sun May 16, 2021 5:33 pm
Full name: Aron Rodgriges

Re: Open Chess Game Database Standard

Post by wickedpotus »

dangi12012 wrote: Thu Oct 21, 2021 6:09 pm
Exactly wrong. If you want a binary DB, this thread is not for you.
This would be an SQL representation that is more compact than PGN, contains all the information in any game, AND can be queried for any position fast enough for user interaction.

It would be a database that can be filled from this:
https://database.lichess.org/
or any other PGN. It could be an opening table or whatever list of games you want.

The cool thing here is that you cannot query a PGN. You cannot query a binary format efficiently without solving B-trees and other hard problems while maintaining something like data consistency.

If you need something that your engine can use during gameplay, you can still query the slower SQL database and transform it into your own proprietary binary format.

A simple SQL format is really the right choice.
SQL?? And bloated PGN-like storage formats?

If the objective is to have an open standard format for storing chess games that is suitable for a wide variety of applications, both of those directions seem counterproductive to the end goal.
Sopel
Posts: 389
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Open Chess Game Database Standard

Post by Sopel »

dangi12012 wrote: Thu Oct 21, 2021 6:09 pm Exactly wrong. If you want a binary DB, this thread is not for you.
This would be an SQL representation that is more compact than PGN, contains all the information in any game, AND can be queried for any position fast enough for user interaction.

It would be a database that can be filled from this:
https://database.lichess.org/
or any other PGN. It could be an opening table or whatever list of games you want.

The cool thing here is that you cannot query a PGN. You cannot query a binary format efficiently without solving B-trees and other hard problems while maintaining something like data consistency.

If you need something that your engine can use during gameplay, you can still query the slower SQL database and transform it into your own proprietary binary format.

A simple SQL format is really the right choice.
How do you envision the exchange format to be then? If it's gonna be tied to one DBMS then it's gonna be useless and might as well be another database software (not a standard). And please change your tone, you're being very annoying.