Open Chess Game Database Standard

Discussion of chess software programming and technical issues.

Moderator: Ras

Fulvio
Posts: 396
Joined: Fri Aug 12, 2016 8:43 pm

Re: Open Chess Game Database Standard

Post by Fulvio »

phhnguyen wrote: Tue Dec 28, 2021 5:25 am I believe the bitboards+indexes can be comparable with any binary one on position searching. We surely win in flexibility, say, users can query almost no limited kinds of position-searching.
A database of 3 million games is pretty small and those results still seem very slow to me.
However, if there is one thing for sure, it is that you will never win in flexibility: SCID includes a full Tcl interpreter (for example:
forum3/viewtopic.php?f=2&t=73826&p=8423 ... pt#p842350)
So, if you really want to go crazy, you can even create an sqlite database :wink: (https://www.sqlite.org/tclsqlite.html)
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Fulvio wrote: Tue Dec 28, 2021 8:13 am
phhnguyen wrote: Tue Dec 28, 2021 5:25 am I believe the bitboards+indexes can be comparable with any binary one on position searching. We surely win in flexibility, say, users can query almost no limited kinds of position-searching.
A database of 3 million games is pretty small and those results still seem very slow to me.
However, if there is one thing for sure, it is that you will never win in flexibility: SCID includes a full Tcl interpreter (for example:
forum3/viewtopic.php?f=2&t=73826&p=8423 ... pt#p842350)
We are talking about position-searching. I don't know yet if SCID (yes for SCIDvsPC) can do that task. Can you confirm?

BTW, I believe SQL itself is almost the strongest query language, it can cover almost all kinds of questions about data. Now we give it all bitboard data and functions (I will create some more functions time by time such as functions to create BB for a given rank, column...) thus it can cover almost all kinds of position-searching for chess. It should be equal to or wider coverage than any other query language.
Fulvio wrote: Tue Dec 28, 2021 8:13 am So, if you really want to go crazy, you can even create an sqlite database :wink: (https://www.sqlite.org/tclsqlite.html)
Hey, welcome to SQLite world!!! Hope you will support us by exporting your databases into a SQL format :D
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
brianr
Posts: 540
Joined: Thu Mar 09, 2006 3:01 pm
Full name: Brian Richardson

Re: Open Chess Game Database Standard

Post by brianr »

phhnguyen wrote: Tue Dec 28, 2021 8:36 am We are talking about position-searching. I don't know yet if SCID (yes for SCIDvsPC) can do that task. Can you confirm?
Yes.
See Search under Current Position (exact and many options), and extensive pattern matching as well (see Material Search, Common Patterns)
Fulvio
Posts: 396
Joined: Fri Aug 12, 2016 8:43 pm

Re: Open Chess Game Database Standard

Post by Fulvio »

phhnguyen wrote: Tue Dec 28, 2021 8:36 am We are talking about position-searching. I don't know yet if SCID (yes for SCIDvsPC) can do that task. Can you confirm?
Yes, of course
phhnguyen wrote: Tue Dec 28, 2021 8:36 am BTW, I believe SQL itself is almost the strongest query language, it can cover almost all kinds of questions about data. Now we give it all bitboard data and functions (I will create some more functions time by time such as functions to create BB for a given rank, column...) thus it can cover almost all kinds of position-searching for chess. It should be equal to or wider coverage than any other query language.
SQL is simple, many people already knows it, and it is very nice to use if you want to search the game's tags.
But with CQL they went pretty crazy and you'll never achive all that with just SQL:
http://www.gadycosteff.com/cql/examples.html
"at least 3 non-pinned non-pawn pieces by one side that have no legal moves in a position."
"stalemate in which each square in the king's field is guarded once only"
or
"Find games with the longest sequence of consecutive forced moves".
I don't know how many users will ever need to search for things like that.

And even that doesn't cover everything, and if I'm interested in a very peculiar search honestly it's easier for me to write a script in Tcl or even in C++.
For example here I searched a 55+ million games database:
forum3/viewtopic.php?f=7&t=73663&p=840196#p840196
- for position where the castling flags are irrelevant
- for position where en-passant is not legal because the pawn is pinned
- for position where en-passant is not legal because the king is in check.

Can you do that with SQL?
How long does it take to search in such a large database that doesn't fit in RAM?
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Fulvio wrote: Tue Dec 28, 2021 8:13 am A database of 3 million games is pretty small and those results still seem very slow to me.
However, if there is one thing for sure, it is that you will never win in flexibility: SCID includes a full Tcl interpreter (for example:
forum3/viewtopic.php?f=2&t=73826&p=8423 ... pt#p842350)
brianr wrote: Tue Dec 28, 2021 9:52 am
phhnguyen wrote: Tue Dec 28, 2021 8:36 am We are talking about position-searching. I don't know yet if SCID (yes for SCIDvsPC) can do that task. Can you confirm?
Yes.
See Search under Current Position (exact and many options), and extensive pattern matching as well (see Material Search, Common Patterns)
I have read carefully the link. In that, you have mentioned exact-position-matching, not kind of approximate-position-searching as we were talking. In my view, exact-position-matching is just a small part of approximate-position-searching.

All my wondering/curious questions are still valid: can SCID (and SCIDvsPC) do directly approximate-position-searching to answer such below questions and how long does it take?
- find all games with 3 White Queens
- find all games with White Pawns in d4, e5, f4, g4, and Black King in b7
- find all games with two Black Rooks in middle squares (d4, d5, e4, e5)
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Fulvio wrote: Tue Dec 28, 2021 10:08 am
phhnguyen wrote: Tue Dec 28, 2021 8:36 am We are talking about position-searching. I don't know yet if SCID (yes for SCIDvsPC) can do that task. Can you confirm?
Yes, of course
phhnguyen wrote: Tue Dec 28, 2021 8:36 am BTW, I believe SQL itself is almost the strongest query language, it can cover almost all kinds of questions about data. Now we give it all bitboard data and functions (I will create some more functions time by time such as functions to create BB for a given rank, column...) thus it can cover almost all kinds of position-searching for chess. It should be equal to or wider coverage than any other query language.
SQL is simple, many people already knows it, and it is very nice to use if you want to search the game's tags.
But with CQL they went pretty crazy and you'll never achive all that with just SQL:
http://www.gadycosteff.com/cql/examples.html
"at least 3 non-pinned non-pawn pieces by one side that have no legal moves in a position."
"stalemate in which each square in the king's field is guarded once only"
or
"Find games with the longest sequence of consecutive forced moves".
I don't know how many users will ever need to search for things like that.

And even that doesn't cover everything, and if I'm interested in a very peculiar search honestly it's easier for me to write a script in Tcl or even in C++.
For example here I searched a 55+ million games database:
forum3/viewtopic.php?f=7&t=73663&p=840196#p840196
- for position where the castling flags are irrelevant
- for position where en-passant is not legal because the pawn is pinned
- for position where en-passant is not legal because the king is in check.

Can you do that with SQL?
How long does it take to search in such a large database that doesn't fit in RAM?
All the questions you mentioned required information from previous moves (history) - they are not pure-position-searching, SQL itself can't answer directly such those questions. However, if we add some code to check move history, it is not hard to solve them, especially we have a very strong search engine, and we can search/use with some precomputing information (bitboards).
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
brianr
Posts: 540
Joined: Thu Mar 09, 2006 3:01 pm
Full name: Brian Richardson

Re: Open Chess Game Database Standard

Post by brianr »

phhnguyen wrote: Tue Dec 28, 2021 10:25 am
Fulvio wrote: Tue Dec 28, 2021 8:13 am A database of 3 million games is pretty small and those results still seem very slow to me.
However, if there is one thing for sure, it is that you will never win in flexibility: SCID includes a full Tcl interpreter (for example:
forum3/viewtopic.php?f=2&t=73826&p=8423 ... pt#p842350)
brianr wrote: Tue Dec 28, 2021 9:52 am
phhnguyen wrote: Tue Dec 28, 2021 8:36 am We are talking about position-searching. I don't know yet if SCID (yes for SCIDvsPC) can do that task. Can you confirm?
Yes.
See Search under Current Position (exact and many options), and extensive pattern matching as well (see Material Search, Common Patterns)
I have read carefully the link. In that, you have mentioned exact-position-matching, not kind of approximate-position-searching as we were talking. In my view, exact-position-matching is just a small part of approximate-position-searching.

All my wondering/curious questions are still valid: can SCID (and SCIDvsPC) do directly approximate-position-searching to answer such below questions and how long does it take?
- find all games with 3 White Queens
- find all games with White Pawns in d4, e5, f4, g4, and Black King in b7
- find all games with two Black Rooks in middle squares (d4, d5, e4, e5)
Yes.
Look at the SCID Material Search and under the Common Patterns button there are some columns.
The 2nd row is the piece.
The 3rd is the file.
The 4th is the rank.
The plus can add more.

Even more complex searches can be built up with the and/or options.
I can't really answer how long it takes for massive numbers of games.
In all my searches with up to about a million games it is nearly instantaneous.
Fulvio
Posts: 396
Joined: Fri Aug 12, 2016 8:43 pm

Re: Open Chess Game Database Standard

Post by Fulvio »

phhnguyen wrote: Tue Dec 28, 2021 10:25 am All my wondering/curious questions are still valid: can SCID (and SCIDvsPC) do directly approximate-position-searching to answer such below questions and how long does it take?
- find all games with 3 White Queens
- find all games with White Pawns in d4, e5, f4, g4, and Black King in b7
- find all games with two Black Rooks in middle squares (d4, d5, e4, e5)
Yes, it would be a lot easier if you took a few minutes to try directly.

Finding the 57 games with 3 white queens takes 2.18 seconds.
(so you can understand that these enthusiastic statements of yours leave me quite puzzled if it takes over 24 minutes to get a wrong result, probably because you have counted the same games multiple times).

But you should search things that makes sense from a chess point of view.
For example, you may want to study how to play IQP positions.
In SCID you would opens the "Material Search" window and choose in "common patterns" "White IQP" (positions that have a white pawn in the d file and do not have white pawns in the c and e files).
Maybe you also want to restrict it to positions with all the standard pieces (both colors with 1 queen, 2 rooks, 2 bishops, 2 knights and an indefinite number of pawns).
The search takes 5.72 seconds and there are 212,122 games, of which white has won 83,641 and lost 66,509.

How long does that search takes with SQL and how many users would be able to write that SQL statement?
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

Post by dangi12012 »

I see the questions here are contiously more and more calibrated towards a very specific subset of problems that are better in proprietary binary databases like SCID.

If someone needs that specific subset for his/her own gui - its really simple to add a blob column anywhere you like and just cast that memory (after binding the blob) to a scid database.

So you can have scid inside sql - since sql is the more malluable solution its really easy to extend sql with binary formats.

Expect SQL to perform "well enough" for gui use. Especially because it doesnt have any hard limits like number of games, number of gamges, number of unique players etc.
Now it even supports positional queries.

So "which specific inaccuracy is usually played by which grandmasters" is normally hard to answer - but with this DB its a 2 minute sql query writing effort - and 2s runtime effort.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Chess Game Database Standard

Post by phhnguyen »

Fulvio wrote: Tue Dec 28, 2021 12:26 pm Yes, it would be a lot easier if you took a few minutes to try directly.
Sorry, I am not a SCID user, felt hard to figure out how your search works.
Fulvio wrote: Tue Dec 28, 2021 12:26 pm Finding the 57 games with 3 white queens takes 2.18 seconds.
(so you can understand that these enthusiastic statements of yours leave me quite puzzled if it takes over 24 minutes to get a wrong result, probably because you have counted the same games multiple times).
You are right, my bad. I have counted all results returned by the search without checking and trimming the repeated ones.
Fulvio wrote: Tue Dec 28, 2021 12:26 pm But you should search things that makes sense from a chess point of view.
For example, you may want to study how to play IQP positions.
In SCID you would opens the "Material Search" window and choose in "common patterns" "White IQP" (positions that have a white pawn in the d file and do not have white pawns in the c and e files).
Maybe you also want to restrict it to positions with all the standard pieces (both colors with 1 queen, 2 rooks, 2 bishops, 2 knights and an indefinite number of pawns).
The search takes 5.72 seconds and there are 212,122 games, of which white has won 83,641 and lost 66,509.
Could you post images of how to set up that dialog box to answer the below questions? I tried several times but was not successful, thanks:
- Find all games that have 3 White Queens
- Find all games that have two Black Rooks in d4, d5, e4, e5
Fulvio wrote: Tue Dec 28, 2021 12:26 pm How long does that search takes with SQL
Frankly speaking, at the moment our test app took much more time than yours. However, that is still in development!
Fulvio wrote: Tue Dec 28, 2021 12:26 pm how many users would be able to write that SQL statement?
That is an easy issue. Average users will use some dialog boxes like yours. Our code is just a basement/low part and other GUIs/tools should add upper parts to it. But I guess advanced users may prefer to use directly SQL statements.

Having more choices is better, isn't it? ;)
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager