Fastest way to find games that match a position in a large database

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Peperoni
Posts: 72
Joined: Sun Nov 01, 2020 5:27 pm
Full name: Richard Porti

Fastest way to find games that match a position in a large database

Post by Peperoni »

Hello,

I was wondering what would be the best way to find a specific position in a large database (from a FEN for example).
I noticed that for example in Chessbase, it is not instant at all when you look for the games that match a specific position.
However, they have a statistics that show the percentages reached in the current position and that is instant.
I am also wondering why. (maybe they don't count positions reached from other moves combinations?)

Thanks,
User avatar
towforce
Posts: 11572
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK

Re: Fastest way to find games that match a position in a large database

Post by towforce »

Here's some "get you started" information: https://www.google.com/search?q=algorit ... ase+tables

In order to make database searches quick, databases are generally indexed. The next question then becomes, "What are the best indexes to generate for a database of chess positions?"

Some ideas (just off the top of my head):

* number of pieces
* presence of particular pieces
* players

etc.

If you're building your own database, you can index on your favourite position features - rooks on open files, or whatever it happens to be.

If you're a chess database company like Chessbase, you're probably going to spend a lot of time considering how chess players are going to want to search your database, and choose the indexes accordingly.

If you're only building a small database, you might consider just looking at every position and comparing against the query.

The possibilities are endless...
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
Your brain doesn't work the way you want, so train it!
User avatar
towforce
Posts: 11572
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK

Re: Fastest way to find games that match a position in a large database

Post by towforce »

Peperoni wrote: Mon May 31, 2021 4:22 pmI noticed that for example in Chessbase, it is not instant at all when you look for the games that match a specific position. However, they have a statistics that show the percentages reached in the current position and that is instant.

This is just a guess, but maybe the second question can be answered from the indexes alone, whereas to find the actual position, you'd first look in the indexes, and these might return thousands of possible matches, each of which would then have to be retrieved and looked at to see whether it's an exact match.
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
Your brain doesn't work the way you want, so train it!
Fulvio
Posts: 395
Joined: Fri Aug 12, 2016 8:43 pm

Re: Fastest way to find games that match a position in a large database

Post by Fulvio »

Peperoni wrote: Mon May 31, 2021 4:22 pm I was wondering what would be the best way to find a specific position in a large database (from a FEN for example).
I noticed that for example in Chessbase, it is not instant at all when you look for the games that match a specific position.
However, they have a statistics that show the percentages reached in the current position and that is instant.
I am also wondering why. (maybe they don't count positions reached from other moves combinations?)
The statistics are probably pre-calculated and stored into an hash table.
Let's take an extreme example. A database with 1 million games of just 1 move: e4 or d4.
The hash table will contain only 2 entries and would be very fast to search into it.
On the contrary to obtain the list of matching games all the 1 million games need to be searched.
User avatar
gbtami
Posts: 389
Joined: Wed Sep 26, 2012 1:29 pm
Location: Hungary

Re: Fastest way to find games that match a position in a large database

Post by gbtami »

Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Fastest way to find games that match a position in a large database

Post by Pio »

towforce wrote: Mon May 31, 2021 4:59 pm Here's some "get you started" information: https://www.google.com/search?q=algorit ... ase+tables

In order to make database searches quick, databases are generally indexed. The next question then becomes, "What are the best indexes to generate for a database of chess positions?"

Some ideas (just off the top of my head):

* number of pieces
* presence of particular pieces
* players

etc.

If you're building your own database, you can index on your favourite position features - rooks on open files, or whatever it happens to be.

If you're a chess database company like Chessbase, you're probably going to spend a lot of time considering how chess players are going to want to search your database, and choose the indexes accordingly.

If you're only building a small database, you might consider just looking at every position and comparing against the query.

The possibilities are endless...
I gave an idea of how to make a fast database in the following thread http://talkchess.com/forum3/viewtopic.php?f=7&t=75698

My solution is compact and very fast (O(log n)) on both exact matches and partial matches.

If you also want to store games, you will have to add an additional table “games”, pointing to the “positions” table. I guess it would be smart making the “positions” table consist of a combined (game_id(bigint/identity/not null), position(see previous post for how to encode this)) as primary key and position also as foreign key to the games table. You will probably want to make it clustered on (position, game_id) since you want to get the different games fast for each position. The next problem will be to follow through a particular game without using too much space. One way of solving it is for a given position generate all the moves and generate new positions and for each position see if the game_id is there. This will also take O(log n) in time for each next continuation move in the game.

You will probably want to add additional information about the games in a third table, fourth … table where you store the opponents’ names, year of the match, where it took place and additional info.
User avatar
towforce
Posts: 11572
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK

Re: Fastest way to find games that match a position in a large database

Post by towforce »

Pio wrote: Tue Jun 01, 2021 1:07 amI gave an idea of how to make a fast database in the following thread http://talkchess.com/forum3/viewtopic.php?f=7&t=75698rth … table where you store the opponents’ names, year of the match, where it took place and additional info.

Interesting! Peperoni wants to write his own database from scratch. Nothing wrong with that, but I would personally use existing database software (link).
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
Your brain doesn't work the way you want, so train it!
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Fastest way to find games that match a position in a large database

Post by jdart »

ChessBase does have indexes, they called these "Search Boosters." See https://en.chessbase.com/post/chessbase ... -in-action.

Btw. it helps enormously to have your ChessBase databases on a SSD, preferably an M.2 SSD.
User avatar
towforce
Posts: 11572
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK

Re: Fastest way to find games that match a position in a large database

Post by towforce »

jdart wrote: Thu Jun 03, 2021 4:46 pm ChessBase does have indexes, they called these "Search Boosters." See https://en.chessbase.com/post/chessbase ... -in-action.

Btw. it helps enormously to have your ChessBase databases on a SSD, preferably an M.2 SSD.

I must remember to stop using the word "index" and instead use "search booster"! :lol:

Nice video, though, showing some of Chessbase's capabilities. There's no article text to speak of - just go straight to the article video below (6:46):

Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
Your brain doesn't work the way you want, so train it!
RogerC
Posts: 41
Joined: Tue Oct 29, 2019 8:33 pm
Location: French Polynesia
Full name: Roger C.

Re: Fastest way to find games that match a position in a large database

Post by RogerC »

Peperoni wrote: Mon May 31, 2021 4:22 pm Hello,

I was wondering what would be the best way to find a specific position in a large database (from a FEN for example).
I noticed that for example in Chessbase, it is not instant at all when you look for the games that match a specific position.
However, they have a statistics that show the percentages reached in the current position and that is instant.
I am also wondering why. (maybe they don't count positions reached from other moves combinations?)

Thanks,
Hi,

Try this online database : clic on "input FEN", enter the FEN and "OK" : you will have the answer. More than 13 billions analysed positions sor far.
You won't have the games but you'll have the variants to play and their scores.

https://www.chessdb.cn/queryc_en/

Best regards.