I noticed that I have forgotten two limits:
8) The variations/comments/NAGs counts are approximations, stored in only 4-bits and mapped to only 16 values: 0,1,2,3,4,5,6,7,8,9,10,15,20,30,40,50.
( https://sourceforge.net/p/scid/code/ci/ ... try.h#l293 )
The real value is rounded to the nearest available.
For example:
16 is rounded to 15
12 is rounded to 10
26 to 30
9) The half moves count is stored in 10-bits, so with a max value of 1023.
SCID4 database
Moderator: Ras
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: SCID4 database
You are wrong also. Test it please if you dont believe it...R. Tomasi wrote: ↑Wed Oct 13, 2021 4:41 pmhttps://medium.com/@JasonWyatt/squeezin ... e175f3c346dangi12012 wrote: ↑Wed Oct 13, 2021 4:30 pm100% Wrong.Rein Halbersma wrote: ↑Wed Oct 13, 2021 2:04 pmLookup on indices is O(log N), not O(1).dangi12012 wrote: ↑Mon Oct 11, 2021 10:38 pm As I told you - with SQL you get O(1) lookup performance on primary keys. You dont have to take it from me - literally anyone working in IT industry with some SQL experience can verify.
https://www.geeksforgeeks.org/sql-query-complexity/
O(log(N)) is for a binary search. Which you would have to do for a primary STRING key.
But not nor a primary INTEGER key. Which is a hashtable lookup like a C++ map. Its O(1).
Please dont invent a binary database format. It will be slower - debugging will take forever and at the end of the day you dont have a standard like sql everyone can rely on - and have experience with. Im not even starting with data integrity even during unexpected crashes... Thats what databases do for you
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 307
- Joined: Wed Sep 01, 2021 4:08 pm
- Location: Germany
- Full name: Roland Tomasi
Re: SCID4 database
You suggest encoding positions as integers?dangi12012 wrote: ↑Wed Oct 13, 2021 7:05 pmYou are wrong also. Test it please if you dont believe it...R. Tomasi wrote: ↑Wed Oct 13, 2021 4:41 pmhttps://medium.com/@JasonWyatt/squeezin ... e175f3c346dangi12012 wrote: ↑Wed Oct 13, 2021 4:30 pm100% Wrong.Rein Halbersma wrote: ↑Wed Oct 13, 2021 2:04 pmLookup on indices is O(log N), not O(1).dangi12012 wrote: ↑Mon Oct 11, 2021 10:38 pm As I told you - with SQL you get O(1) lookup performance on primary keys. You dont have to take it from me - literally anyone working in IT industry with some SQL experience can verify.
https://www.geeksforgeeks.org/sql-query-complexity/
O(log(N)) is for a binary search. Which you would have to do for a primary STRING key.
But not nor a primary INTEGER key. Which is a hashtable lookup like a C++ map. Its O(1).

Thank's for the link, though. Didn't know integral keys were O(1)!
-
- Posts: 396
- Joined: Fri Aug 12, 2016 8:43 pm
Re: SCID4 database
Directly from the sqlite website:dangi12012 wrote: ↑Wed Oct 13, 2021 7:05 pm You are wrong also. Test it please if you dont believe it...
https://www.geeksforgeeks.org/sql-query-complexity/
O(log(N)) is for a binary search. Which you would have to do for a primary STRING key.
But not nor a primary INTEGER key. Which is a hashtable lookup like a C++ map. Its O(1).
Please dont invent a binary database format. It will be slower - debugging will take forever and at the end of the day you dont have a standard like sql everyone can rely on - and have experience with. Im not even starting with data integrity even during unexpected crashes... Thats what databases do for you
"One technique for avoiding a full table scan is to do lookups by rowid (or by the equivalent INTEGER PRIMARY KEY)."
"Since the information is stored in the table in rowid order, SQLite can find the correct row using a binary search. If the table contains N elements, the time required to look up the desired row is proportional to logN rather than being proportional to N as in a full table scan."
https://www.sqlite.org/queryplanner.html
Now, your posts are useless, offtopic and full of unbacked claims.
So, can you please stop trolling? (and for the others please do not feed the troll)
Didn't you already wrote: "I wont post here anymore".
So, please, stick to that.
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: SCID4 database
Excatly OP wants good performance. You store the zobrist hash as Integer primary key - and a zobrist with a different seed to eliminate any and all collisions. This gives you O(1)R. Tomasi wrote: ↑Wed Oct 13, 2021 7:21 pmYou suggest encoding positions as integers?dangi12012 wrote: ↑Wed Oct 13, 2021 7:05 pmYou are wrong also. Test it please if you dont believe it...R. Tomasi wrote: ↑Wed Oct 13, 2021 4:41 pmhttps://medium.com/@JasonWyatt/squeezin ... e175f3c346dangi12012 wrote: ↑Wed Oct 13, 2021 4:30 pm100% Wrong.Rein Halbersma wrote: ↑Wed Oct 13, 2021 2:04 pmLookup on indices is O(log N), not O(1).dangi12012 wrote: ↑Mon Oct 11, 2021 10:38 pm As I told you - with SQL you get O(1) lookup performance on primary keys. You dont have to take it from me - literally anyone working in IT industry with some SQL experience can verify.
https://www.geeksforgeeks.org/sql-query-complexity/
O(log(N)) is for a binary search. Which you would have to do for a primary STRING key.
But not nor a primary INTEGER key. Which is a hashtable lookup like a C++ map. Its O(1).![]()
Thank's for the link, though. Didn't know integral keys were O(1)!
I know it works because I used it in my last years project to have a lookup for any and all positions from 2013 - 2020 that have ever been played on lichess. Including SF rating - outcome and all usernames.
Everything at once doesnt fit but up to 120GB or so sqlite is blazingly fast to lookup any position + how often it was reached..
I made my point clear that reinventing a custom DB will be a painful slow process and at the end of the day data integrity and features like deduplication and compression are much much harder to implement not to mention the advanced B tree implentation you need when looking stuff up.
What you will end up will be slower - error prone and no developer on the world can maintain it for you.
Maybe I will release my parser - since it transforms any of these .pgn.bz2 into batched highly compatible sql insert statements.
https://database.lichess.org/
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: SCID4 database
No; he suggested to just save the Zobrist of each position in each game. An 80 move game will have 160 positions. An 8 million game database will have 160 * 8.000.000 = 1.28 billion position rows. Storing the game_id and the zobrist_key (both 64-bit, or 16 bytes total) will come down to 20.48 billion bytes for this table, or about 19 GB if I'm not miscounting.
Then I haven't mentioned relations, or other information needed. That's the reason why chess databases mostly use binary formats. I think storing 8 million games in an SQL database will work fine, and it'll probably be fast enough for most tasks, but I'm not sure if it will be usable for a database containing millions of games. Chessbase databases (also a binary format) are definitely much smaller.
So it could be that performance is fine, but the space requirement will be comparatively huge.
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: SCID4 database
Your math is totally off. Its not about how many total positions but DISTINCT positions thats why we have a zobrist hash in the first place. 8 Million games dont need much storage at all. A few 100 MB. You dont store every position of every game. You store every distinct position once.mvanthoor wrote: ↑Thu Oct 14, 2021 1:50 pmNo; he suggested to just save the Zobrist of each position in each game. An 80 move game will have 160 positions. An 8 million game database will have 160 * 8.000.000 = 1.28 billion position rows. Storing the game_id and the zobrist_key (both 64-bit, or 16 bytes total) will come down to 20.48 billion bytes for this table, or about 19 GB if I'm not miscounting.
Then I haven't mentioned relations, or other information needed. That's the reason why chess databases mostly use binary formats. I think storing 8 million games in an SQL database will work fine, and it'll probably be fast enough for most tasks, but I'm not sure if it will be usable for a database containing millions of games. Chessbase databases (also a binary format) are definitely much smaller.
So it could be that performance is fine, but the space requirement will be comparatively huge.
For the OP: Lookup the lichess sourcecode its open source.
For a good binary implementation read the github for Syzygy
"The number of unique legal 7-piece positions is 423,836,835,667,331. Syzygy tablebases store all their information in 18.4 TB, so at around
0.35 bits per position. This is much more compact than the proprietary 100 TB Lomonosov tablebases."
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: SCID4 database
But you do have to store or reference a collection of those positions to know which of them occured in a game. That will still take a fair amount of storage. Or, you'll have to play through the game, and then look into that position table for every move, which will not be very fast if you have 8 million games to go.dangi12012 wrote: ↑Thu Oct 14, 2021 3:32 pm Your math is totally off. Its not about how many total positions but DISTINCT positions thats why we have a zobrist hash in the first place. 8 Million games dont need much storage at all. A few 100 MB. You dont store every position of every game. You store every distinct position once.
Don't get me wrong; in the future I want to look into writing my own user interface (to use with my own engine), and I'd rather use an SQL-database than my own binary format, or even SCID's format, because I already hit some of SCID's limits when testing it. But I'm still not convinced that SQL is the best choice.
-
- Posts: 96
- Joined: Fri Jul 06, 2018 1:09 am
- Location: Chicago, IL
- Full name: Josh Odom
Re: SCID4 database
As far as I'm concerned, a custom solution can always be made to outperform a generic one, and this is a case where performance matters a lot.
Instead of trying to make an open standard that works for everyone, I think it's more practical to make a very permissive standalone library which can then be used in other projects. That should lower its barrier of use (without sacrificing performance), which I think is the most important concern.
Instead of trying to make an open standard that works for everyone, I think it's more practical to make a very permissive standalone library which can then be used in other projects. That should lower its barrier of use (without sacrificing performance), which I think is the most important concern.
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: SCID4 database
Endgame Tablebases in SQL? Hell no. You are correct there.odomobo wrote: ↑Thu Oct 14, 2021 6:18 pm As far as I'm concerned, a custom solution can always be made to outperform a generic one, and this is a case where performance matters a lot.
Instead of trying to make an open standard that works for everyone, I think it's more practical to make a very permissive standalone library which can then be used in other projects. That should lower its barrier of use (without sacrificing performance), which I think is the most important concern.
OP asked about "similar performance to how lichess can lookup a position" for 8 Million games. This is a perfect job for a sql database. Not only is it fast enough for user interaction - but if you think about a new statistic you want evaluated you can write a query in 1 minute. ie. what is the most common opening etc.
Writing a new query that is efficient in a custom binary format - requires at least expert knowledge about the layout etc - and extra code just for a simple question.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer