Best way to store games

hgm · Post by **hgm** » Sat Nov 07, 2020 12:59 pm

Well, for games that do not start from the standard position the initial position (which in PGN would be in the FEN tag) would have to be stored. And yes, your method of encoding such a position would be a good choice for that. But is practice only a very small number of stored games will not start from the standard position. (And even in games where material odds are common practice, such as in Shogi, they just start from a dozen or so standard positions, which could be indicated by a small integer.) Even for Chess960 you could just use the number of the shuffle to indicate the start position. And there probably don't even exist enough puzzles to make a database of hundreds of millions of those. So it seems that in any practice the issue of how you encode arbitrary FEN tags will have negligible impact on size and speed.

Indexing is another matter. This makes searches very fast, but will require a huge amount of pre-computation to compile the index.

The OP indicated in a later posting that he doen't want to use existing software. I suppose that also applies to using standard general-purpose database programs.

Pio · Post by **Pio** » Sat Nov 07, 2020 3:22 pm

hgm wrote: ↑Sat Nov 07, 2020 12:59 pm Well, for games that do not start from the standard position the initial position (which in PGN would be in the FEN tag) would have to be stored. And yes, your method of encoding such a position would be a good choice for that. But is practice only a very small number of stored games will not start from the standard position. (And even in games where material odds are common practice, such as in Shogi, they just start from a dozen or so standard positions, which could be indicated by a small integer.) Even for Chess960 you could just use the number of the shuffle to indicate the start position. And there probably don't even exist enough puzzles to make a database of hundreds of millions of those. So it seems that in any practice the issue of how you encode arbitrary FEN tags will have negligible impact on size and speed.

Indexing is another matter. This makes searches very fast, but will require a huge amount of pre-computation to compile the index. For exact positions matches you can simply sort lists of game numberss by a hash key indicating the position. For 'given position + arbitrary extra pieces' I doubt that indexing would be feasible; there are just too many possibilities for the query.

The OP indicated in a later posting that he doen't want to use existing software. I suppose that also applies to using standard general-purpose database programs.

If we speak of indexing it does not require a huge amount of pre-computation at all. It takes however lots of memory especially if the primary key column is big. I suggested using the occupancy bitboard as a clustered index of a table and that does not take any additional space whatsoever. It just puts the rows in order physically in the table. That means you can only have one clustered index per table. Using the clustered index on the occupancy bitboard (or as a combination of the occupancy bitboard and other info) is very good since it will put similar positions very close in memory and you will be able to fetch them all instantaneously.

hgm · Post by **hgm** » Sat Nov 07, 2020 4:23 pm

This assumes that the occupancy bitboard for the sought position is known. But this might not be the case in every search mode. E.g. in XBoard's "match if position is subset" any unoccupied square in the sought position could have been occupied as well.

Dann Corbit · Post by **Dann Corbit** » Sat Nov 07, 2020 6:38 pm

Pio wrote: ↑Sat Nov 07, 2020 8:40 am
Dann Corbit wrote: ↑Sat Nov 07, 2020 4:26 am
Peperoni wrote: ↑Fri Nov 06, 2020 11:01 pm Wow that's brilliant
That technique was invented by an Amiga chess program in the 1980's
Had no idea about it. Can you please send a link. Thanks!

Go here:
http://mirrors.xmission.com/aminet/game/board/
source code is in this file:
pigb4src.lha
The database was called pigbase.

Fulvio · Post by **Fulvio** » Mon Nov 09, 2020 1:00 pm

hgm wrote: ↑Fri Nov 06, 2020 11:23 pm

Code: Select all

pieceNr = pieces[moveCode];
step = steps[moveCode];
fromSqr = location[pieceNr];
toSqr = fromSqr + step & 0x77;
board[toSqr] = board[fromSqr];
board[fromSqr] = 0;
location[pieceNr] = toSqr;

This is similar to how the moves are stored in SCID:
https://github.com/benini/scid/blob/git ... buf.h#L298

In a previous message you suggested memorizing the type of the piece as well (for example this is the 2nd knight).
If it is possible to store in just a byte the moves for the king, the 8 pawns and 2 of each other type (including queens) that would be very interesting.
The from square could then be found with something like this:

Code: Select all

fromSqr = 0;
for (; fromSqr < 64; ++fromSqr) {
	if (board[fromSqr] == pieceType && --pieceNr <= 0) break;
}

Is it just a very interesting idea or is there some program that has tried to implement it?

Peperoni · Post by **Peperoni** » Fri Dec 04, 2020 1:14 pm

Hello,

I implemented the method given by H.G Muller and it works like a charm.
Now I am thinking about how to save the headers in a compact way.

Do you have any ideas?

Thanks

Bill Forster · Post by **Bill Forster** » Thu Dec 10, 2020 4:47 am

I come to this a little late unfortunately. I agonised over very similar issues for months, eventually refining very effective solutions. I wrote about the development of my thinking in two long blog posts. In the first one https://triplehappy.wordpress.com/2015/ ... mpression/ I described a one byte per move encoding scheme and in the second one https://triplehappy.wordpress.com/2016/ ... mentation/ I described fast position search through millions of in memory games stored using the one byte per move encoding.

It's not difficult to encode chess moves in a single byte (or even less), but the scheme I describe does have some nice properties that make it ideal for this application. Firstly it is very performant, using this scheme your program can speed through games almost as quickly as it would with a well implemented conventional source + destination encoding requiring 16 bits or more. Basically making each move involves just a handful of instructions, without many branches and essentially no looping. It is a strictly one byte per move format, every legal move in a legal chess position corresponds to a single unique one byte code. Like all one byte schemes, there is a slow mode (based on indexing into a legal move list), but there is no need for escape codes. The encoder/decoder smoothly selects slow mode whenever the side to move has more than two knights, or more than two rooks, or more than one bishop of the same colour, or more than two queens, or one queen and seven pawns. I experimentally determined that such positions constitute around 0.0004% of positions in practical games, and because slow mode is not *that* slow, it doesn't impact practical performance at all.

To search for positions I originally implemented a conventional database approach, with indexed SQL(lite) database tables of position hashes, much as you described it. Once I had my fast one byte per move game representation I abandoned the SQL approach though, because I found just brute force stepping through the games, even millions of games, was much more practical. It was almost as fast, and used much (much) less storage, and didn't involve a (very) slow indexing phase when building the database.

To speed the search process it's important to abandon games early once it is impossible for them to reach the search position, and I describe some tricks I use to achieve this. I have some more tricks in reserve for future even faster implementations, eg store 8 or 16 bit "index" flags for each game, each flag is associated with a commonly occurring piece/square combination, eg White pawn on e4. The WPe4 flag is set for a game if a white pawn ever occupies e4. This means the game doesn't need to be searched if the search position happens to have a White pawn on e4. In a decent proportion of searches this should dramatically reduce the number of games searched dramatically speeding the process. Simply using multiple cores to search in parallel is also available as speedup technique.

In any case, my existing solution is already very responsive and snappy, it's particularly fast at searching for positions early in games, which is a good match for the conventional database features that are implemented in Tarrasch Chess GUI my very practical, easy to use, free and open source chess GUI/database. The 91 Mb Windows setup.exe download has a modest but very useful 1.2 Million modern master game database already setup and ready to go less than a minute after you've downloaded the package. And all the source code is open on Github. Excuse the self promotion!

Best way to store games

Re: Best way to store games

Re: Best way to store games

Re: Best way to store games

Re: Best way to store games

Re: Best way to store games

Re: Best way to store games

Re: Best way to store games