Open Chess Game Database Standard

dangi12012 · Post by **dangi12012** » Sat Dec 18, 2021 2:13 pm

Sopel wrote: ↑Sat Dec 18, 2021 1:04 pm
phhnguyen wrote: ↑Sat Dec 18, 2021 12:29 am
Sopel wrote: ↑Fri Dec 17, 2021 12:18 pm
phhnguyen wrote: ↑Fri Dec 17, 2021 6:14 am If you don’t need to get all results of a given hash key (hm, I doubt about that, sometimes the number of results is just small), you may get all game IDs by one read then query yourself other information one by one. So easy and so quick.
You really don't see a problem? This blob can get to hundreds of gigabytes for large databases.

This is pagination 101, come on... No one is talking about "one by one".
How come you have that number (hundreds of gigabytes)? I knew you have just made an estimate but it should have something to support it!
A single position adds 8 bytes to the blob. Lichess has ~2B games. That's 8GB for startpos blob, ~1-3GB for next position. Though this is less than my initial wrong estimate it is BAD. This is not how you do databases. It hurts me physically. I'll repeat again that one important word.

PAGINATION

and you're making simple queries actually HARD, by requiring querying more than needed, and requiring additional processing from the application and further queries

btw. this is also an important read https://en.wikipedia.org/wiki/Database_normalization

Why dont you just open your code editor and make a branch where you proof your wild claims?
From the beginning you just hated on this project without ever backing up anything. Just saying its bad is not a proof for anything. Its clear that you are not an SQL expert - just enough half knowledge that you think you are at the top in the dunning-kruger curve...

Sopel · Post by **Sopel** » Sat Dec 18, 2021 2:30 pm

dangi12012 wrote: ↑Sat Dec 18, 2021 2:13 pm
Sopel wrote: ↑Sat Dec 18, 2021 1:04 pm
phhnguyen wrote: ↑Sat Dec 18, 2021 12:29 am
Sopel wrote: ↑Fri Dec 17, 2021 12:18 pm
phhnguyen wrote: ↑Fri Dec 17, 2021 6:14 am If you don’t need to get all results of a given hash key (hm, I doubt about that, sometimes the number of results is just small), you may get all game IDs by one read then query yourself other information one by one. So easy and so quick.
You really don't see a problem? This blob can get to hundreds of gigabytes for large databases.

This is pagination 101, come on... No one is talking about "one by one".
How come you have that number (hundreds of gigabytes)? I knew you have just made an estimate but it should have something to support it!
A single position adds 8 bytes to the blob. Lichess has ~2B games. That's 8GB for startpos blob, ~1-3GB for next position. Though this is less than my initial wrong estimate it is BAD. This is not how you do databases. It hurts me physically. I'll repeat again that one important word.

PAGINATION

and you're making simple queries actually HARD, by requiring querying more than needed, and requiring additional processing from the application and further queries

btw. this is also an important read https://en.wikipedia.org/wiki/Database_normalization
Why dont you just open your code editor and make a branch where you proof your wild claims?
From the beginning you just hated on this project without ever backing up anything. Just saying its bad is not a proof for anything. Its clear that you are not an SQL expert - just enough half knowledge that you think you are at the top in the dunning-kruger curve...

There already is code proving me right, and it's the code before the blobs were introduced. Now, you might wonder, why didn't it look better in actual tests then? I'll tell you why. Because the tests didn't do PAGINATION like any normal person would.

dangi12012 · Post by **dangi12012** » Sat Dec 18, 2021 3:06 pm

Sopel wrote: ↑Sat Dec 18, 2021 2:30 pm There already is code proving me right, and it's the code before the blobs were introduced. Now, you might wonder, why didn't it look better in actual tests then? I'll tell you why. Because the tests didn't do PAGINATION like any normal person would.

Come on Sopel. Everyone knows you are a troll already. You just make up stuff at this point. OP asked you about how you come to 100GB? The truth is: Sopel loves to derail any topic he touches with unbacked claims and half truths.

phhnguyen: Notice how he never provides an alternative. I cant think of a better solution for querying in sql efficiently since the blob is the replacement for another table only containing two int columns as a mappig. (A blob- list replacement).

And you have PROVEN that its fast enough for actual gui use.
Now we just have to implement it into a gui

As far as i can see its ready to use?
What do you think phhnguyen?

Sopel · Post by **Sopel** » Sat Dec 18, 2021 3:21 pm

Well, if this "blob" implementation is preserved then SCID will be an obviously superior choice for positional queries, while also being much faster.

dangi12012 · Post by **dangi12012** » Sat Dec 18, 2021 3:24 pm

Sopel wrote: ↑Sat Dec 18, 2021 3:21 pm Well, if this "blob" implementation is preserved then SCID will be an obviously superior choice for positional queries, while also being much faster.

SCID cannot even store that many positions and games... Which you know also.
But you keep hating on SQL on amount of data that SCID cannot even store.

Troll be gone.

ndbd · Post by **ndbd** » Sun Dec 19, 2021 10:37 pm

I am lurking here and reading this thread with interest. First I want to say that I think BanksiaGUI is a great program and obviously Pham is a talented programmer. But:

There are two existing widely used database formats, CBH/CBF/CBV (Chessbase) and SCID. All the issues that are talked about in this thread have been mostly solved already with these formats. Moreover it is not difficult to enhance these formats. They are, after all, essentially binary database tables with custom index informations, ideally suited for chess.

The original intention here was to figure out if SQL or SQLlite creates an _easy_ and straight-forward way of maintaining, storing and accessing game information, and all that without required space not going rampant. With several tries required by a skilled programmer, it is clear that it's not straight-forward and so easy, and that thought must be put into everything.

The fact that Bill Forster (Tarrasch GUI, has a blog post about that) didn't get it working quickly and opted for an in-memory approach supports the above. Indeed, it is very likely that Shane Hudson also considered SQL when designing SCID.

There is a vast material out there in CBH/CBF/CBV and SCID, and it is most easy for users if they can keep their data in the format. It also makes it easy for them to transition from an existing program to another.

If the time to figure out how to get a clean fast and nice SQL implementation would have been spent to document the SCID file-format and implement it, a fast solution might already be in production. There are some issues with SCID4, like the hard-coded index value which limits databases to maximum of 2^32 games, but these things can be easily extended. It is also not difficult to imagine more tables (more indices) if one wants to index certain searches more efficiently. Even though position search is so fast in SCID, that it might not be required.

As for CBH/CBF/CBV, there is quite a degree of information out there, for example here github.com/antoyo/uncbv or even here on talkchess.

Again, I'd love to see database support coming to BanksiaGUI, but I think you guys make everything more difficult than it has to be. Also there are some things that I think are currently not addressed, like searching for comments within games...

emadsen · Post by **emadsen** » Mon Dec 20, 2021 3:26 am

I've been reading this thread too and find the SQL discussion interesting. Kudos to Nguyen Hong Pham for documenting his attempts to construct a SQL-based chess game database. I appreciate his effort and the details he's shared.

ndbd wrote: ↑Sun Dec 19, 2021 10:37 pm The original intention here was to figure out if SQL or SQLlite creates an _easy_ and straight-forward way of maintaining, storing and accessing game information, and all that without required space not going rampant. With several tries required by a skilled programmer, it is clear that it's not straight-forward and so easy, and that thought must be put into everything.

The fact that Bill Forster (Tarrasch GUI, has a blog post about that) didn't get it working quickly and opted for an in-memory approach supports the above. Indeed, it is very likely that Shane Hudson also considered SQL when designing SCID.

We'll find how well it performs once the product launches at the end of the month, but I'd like to point out the new version of Hiarcs Chess Explorer uses SQLite. See "HCE NextGen will internally use Sqlite database" stated by a developer on the Hiarcs team.

dangi12012 · Post by **dangi12012** » Mon Dec 20, 2021 1:49 pm

I can confirm that the sql works as a good and fast lookup method in C#

Of course the "Book explorer" like lichess has to be implemented. And icons are missing (work in progress).

I will release this gui as normal winforms controls in a nuget package.
Could not be easier to use.

With SCID i am not sure I could use any programming language I want to without writing a wrapper?
With sqlite the adapter to the db already exists for pretty much any language out there.

So having an "Open Chess Game Database Standard" that has preprepared databases is very nice to have!

mvanthoor · Post by **mvanthoor** » Mon Dec 20, 2021 3:22 pm

emadsen wrote: ↑Mon Dec 20, 2021 3:26 am We'll find how well it performs once the product launches at the end of the month, but I'd like to point out the new version of Hiarcs Chess Explorer uses SQLite. See "HCE NextGen will internally use Sqlite database" stated by a developer on the Hiarcs team.

It would be great if SQL can somehow be parallelized when searching and storing. The main problem in ChessBase (at least my old version 12 from 2012) is that it is completely single-threaded. I don't know if this is still the case with ChessBase 16, but if so, buying a computer with more cores is completely useless.

One of the greatest boons of using SQL to store data such as this is that it can be "easily" converted from one database to another. (It's just lots of work to do.)

dangi12012 · Post by **dangi12012** » Mon Dec 20, 2021 4:44 pm

mvanthoor wrote: ↑Mon Dec 20, 2021 3:22 pm It would be great if SQL can somehow be parallelized when searching and storing. The main problem in ChessBase (at least my old version 12 from 2012) is that it is completely single-threaded. I don't know if this is still the case with ChessBase 16, but if so, buying a computer with more cores is completely useless.

Actually it can be done in parallel because the switch to multithreading happend in the sqlite sourcecode 10 years ago.
https://stackoverflow.com/questions/406 ... ent-access

That goes for reading access (where you have real concurrency)
For inserts it will be singlethreaded via a mutex.

This scales really well accross multiple cores

Open Chess Game Database Standard

Re: Open Chess Game Database Standard

Re: Open Chess Game Database Standard

Re: Open Chess Game Database Standard

Re: Open Chess Game Database Standard

Re: Open Chess Game Database Standard

Re: Open Chess Game Database Standard

Re: Open Chess Game Database Standard

Re: Open Chess Game Database Standard

Re: Open Chess Game Database Standard

Re: Open Chess Game Database Standard