Database snapshot

noobpwnftw
Posts: 360
Joined: Sun Nov 08, 2015 10:10 pm

Database snapshot

Post by noobpwnftw » Sat Jul 27, 2019 9:54 pm

For those who want to probe my database locally or for other unspecified reasons, here is a full database snapshot of my book project as of today:

ftp://ftp.chessdb.cn/pub/chessdb/data-s ... 190728.tar

The database contains about 3 billion unique chess positions, mostly connected to startpos. Each was analyzed by Stockfish to no less than 22 plies at the terminal nodes, with very wide multi-PV exploration, and the scores have been back-propagated using a weighted averaging function. For most positions there is also a special field (encoded as 'a0a0') marking the known shortest distance of the position from startpos.

Using this database snapshot is as simple as putting the data files under your database folder and launching the server. Still, I'd recommend using the online API and making feature requests if you need anything, since it is updated constantly and I have no plans to make snapshots like this very often (I am waiting for a contributor to make incremental snapshots possible).

This database snapshot is released into the public domain.

Dann Corbit
Posts: 10203
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Database snapshot

Post by Dann Corbit » Sun Jul 28, 2019 1:10 am

noobpwnftw wrote:
Sat Jul 27, 2019 9:54 pm
For those who want to probe my database locally or for other unspecified reasons, here is a full database snapshot of my book project as of today:

ftp://ftp.chessdb.cn/pub/chessdb/data-s ... 190728.tar

The database contains about 3 billion unique chess positions, mostly connected to startpos. Each was analyzed by Stockfish to no less than 22 plies at the terminal nodes, with very wide multi-PV exploration, and the scores have been back-propagated using a weighted averaging function. For most positions there is also a special field (encoded as 'a0a0') marking the known shortest distance of the position from startpos.

Using this database snapshot is as simple as putting the data files under your database folder and launching the server. Still, I'd recommend using the online API and making feature requests if you need anything, since it is updated constantly and I have no plans to make snapshots like this very often (I am waiting for a contributor to make incremental snapshots possible).

This database snapshot is released into the public domain.
Please leave it online for a while; I am on vacation and cannot download it right now.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

Ferdy
Posts: 4113
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: Database snapshot

Post by Ferdy » Sun Jul 28, 2019 2:31 am

Thanks for sharing.

I tried to probe from startpos with the following result.

Code:

    Move   Score  Rank       Note  winrate%
0   e2e4  15 (8)     2  ! (20-04)     50.61
1   d2d4  15 (4)     2  ! (20-03)     50.30
2   g1f3  15 (2)     2  ! (20-04)     50.15
3   g2g3  10 (2)     2  ! (20-07)     50.15
4   c2c4  10 (2)     2  ! (20-04)     50.15
5   d2d3       0     1  * (20-12)     50.00
6   c2c3       0     1  * (20-08)     50.00
7   e2e3       0     1  * (20-10)     50.00
8   b2b3       0     1  * (20-10)     50.00
9   b1c3       0     1  * (20-04)     50.00
10  a2a3       0     1  * (20-09)     50.00
11  h2h3      -1     1  * (20-09)     49.92
12  f2f4      -4     0  ? (20-14)     49.70
13  a2a4      -5     0  ? (20-11)     49.62
14  b2b4      -6     0  ? (20-11)     49.55
15  g1h3     -41     0  ? (20-05)     46.90
16  b1a3     -51     0  ? (20-01)     46.14
17  h2h4     -57     0  ? (20-01)     45.69
18  f2f3     -82     0  ? (20-01)     43.82
19  g2g4    -103     0  ? (20-01)     42.26
In this line:

Code:

0   e2e4  15 (8)     2  ! (20-04)     50.61
What is (8)?
Why rank 2 and not rank 1?
What is (20-04)?

For other positions there is no (value) in the Score column.

Code:

    Move  Score  Rank       Note  winrate%
0   e7g5    -31     2  ! (07-01)     47.65
1   h7h6    -52     0  ? (06-01)     46.07
2   e8g8    -68     0  ? (13-01)     44.87
3   c7c5    -77     0  ? (05-01)     44.19
4   b8c6    -78     0  ? (03-01)     44.12
5   a7a6    -89     0  ? (08-01)     43.30
6   c7c6   -107     0  ? (05-01)     41.96
7   f7f6   -121     0  ? (05-01)     40.93
8   b7b6   -123     0  ? (14-01)     40.79
9   d7b6   -126     0  ? (17-01)     40.57
10  d7f8   -161     0  ? (19-02)     38.04
11  g7g6   -170     0  ? (18-01)     37.40
12  f7f5   -208     0  ? (09-01)     34.74

noobpwnftw
Posts: 360
Joined: Sun Nov 08, 2015 10:10 pm

Re: Database snapshot

Post by noobpwnftw » Sun Jul 28, 2019 2:57 am

Code:

e2e4  15 (8)     2  ! (20-04)     50.61
This reads:
<Notation of the move> <adjusted score>(<real score>) <rank> <rank mark> (<# of known reply moves>-<# of good reply moves>) <winrate>

For rank, 2 > 1 > 0: rank=2 means a preferred move, rank=1 means a good alternative, and rank=0 means a bad move (also used when the position itself is bad).

Adjusted score only applies to startpos, mainly to normalize the above calculations.

Scores normally lie within ±10000; a magnitude above that means a known mate score, with the mated score at ±30000.
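As a rough illustration of the row format described above, here is a minimal Python sketch that splits one row into its fields. The names `BookRow`, `parse_row` and `is_mate_score` are made up for this example and are not part of the API; the regular expression is only fitted to the rows shown in this thread, so other output may need adjusting.

```python
import re
from typing import NamedTuple, Optional

# Pattern fitted to rows such as "0   e2e4  15 (8)     2  ! (20-04)     50.61".
# The leading row index and the "(real score)" part are both optional.
ROW = re.compile(
    r"^\s*(?:\d+\s+)?"
    r"(?P<move>[a-h][1-8][a-h][1-8][qrbn]?)\s+"
    r"(?P<score>-?\d+)\s*"
    r"(?:\((?P<real>-?\d+)\))?\s+"
    r"(?P<rank>[012])\s+"
    r"(?P<mark>[!*?])\s+"
    r"\((?P<known>\d+)-(?P<good>\d+)\)\s+"
    r"(?P<winrate>\d+(?:\.\d+)?)\s*$"
)

# Rank meanings as given in the post above.
RANK_MEANING = {2: "preferred move", 1: "good alternative", 0: "bad move"}


class BookRow(NamedTuple):
    move: str                  # move notation, e.g. "e2e4"
    score: int                 # adjusted score (equal to the real score away from startpos)
    real_score: Optional[int]  # only printed for startpos rows
    rank: int                  # 2, 1 or 0
    mark: str                  # "!", "*" or "?"
    known_replies: int         # number of known reply moves
    good_replies: int          # number of good reply moves
    winrate: float             # percent


def parse_row(line: str) -> BookRow:
    """Parse one row of the probe output into a BookRow."""
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"unrecognised row: {line!r}")
    real = m.group("real")
    return BookRow(
        move=m.group("move"),
        score=int(m.group("score")),
        real_score=int(real) if real is not None else None,
        rank=int(m.group("rank")),
        mark=m.group("mark"),
        known_replies=int(m.group("known")),
        good_replies=int(m.group("good")),
        winrate=float(m.group("winrate")),
    )


def is_mate_score(score: int) -> bool:
    """Scores beyond +-10000 are described above as known mate scores."""
    return abs(score) > 10000
```

For example, `parse_row("0   e2e4  15 (8)     2  ! (20-04)     50.61")` gives a row with `score=15`, `real_score=8`, `rank=2` and 20 known replies of which 4 are good.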

All these calculations are done in the API front end; the raw database just maps position keys to a set of moves, each of which maps to its eval score.
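The front-end formulas themselves are not given in this thread. As an observation only, the winrate% column in the tables posted earlier is reproduced by a plain logistic curve with a scale of 330 centipawns, applied to the real score rather than the adjusted one (for example 8 gives 50.61 and -41 gives 46.90). The sketch below encodes that fit; the constant 330 and the function name are assumptions inferred from the posted numbers, not a documented formula.

```python
import math

# Scale fitted to the winrate% values quoted in this thread.
# It is an inferred constant, not something the API documents.
FITTED_SCALE = 330.0


def winrate_percent(score_cp: float, scale: float = FITTED_SCALE) -> float:
    """Logistic mapping from a centipawn score to a win rate in percent."""
    return 100.0 / (1.0 + math.exp(-score_cp / scale))
```

If the front end changes its formula, this fit will silently stop matching, so it is better treated as a sanity check than as a replacement for the API's own winrate field.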

Position keys are binary-encoded FENs with white-black symmetry (of the two mirrored encodings, the one whose hex string form is smaller is used).

Ferdy
Posts: 4113
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: Database snapshot

Post by Ferdy » Sun Jul 28, 2019 3:53 am

noobpwnftw wrote:
Sun Jul 28, 2019 2:57 am

Code:

e2e4  15 (8)     2  ! (20-04)     50.61
This reads:
<Notation of the move> <adjusted score>(<real score>) <rank> <rank mark> (<# of known reply moves>-<# of good reply moves>) <winrate>

For rank, 2 > 1 > 0: rank=2 means a preferred move, rank=1 means a good alternative, and rank=0 means a bad move (also used when the position itself is bad).

Adjusted score only applies to startpos, mainly to normalize the above calculations.

Scores normally lie within ±10000; a magnitude above that means a known mate score, with the mated score at ±30000.

All these calculations are done in the API front end; the raw database just maps position keys to a set of moves, each of which maps to its eval score.

Position keys are binary-encoded FENs with white-black symmetry (of the two mirrored encodings, the one whose hex string form is smaller is used).
Thanks, got it.

noobpwnftw
Posts: 360
Joined: Sun Nov 08, 2015 10:10 pm

Re: Database snapshot

Post by noobpwnftw » Sun Jul 28, 2019 3:58 am

Binary FEN encoding has the following format:

Code:

<board unit>...<board unit><turn><special unit>...<special unit>
where each board unit has an 8-bit value of:
0 = 1 empty space
1 = 2 empty spaces
2 = 3 empty spaces
3 = p
4 = n
5 = b
6 = r
7 = q
8 = unused to avoid ambiguity
9 = k
a = P
b = N
c = B
d = R
e = Q
f = K

Turn is a 1 bit flag of 0 = white, 1 = black.

Each special unit, representing castling and ep information, has an 8-bit value of:
0 = none
1 = a
2 = b
3 = c
4 = d
5 = e
6 = f
7 = g
8 = h
9 = delimiter
a = K
b = Q
c = k
d = q
and the file of the ep square is stored as its numeric value from the list above.

The output is then trimmed of trailing zeros to produce the final position key.

Internally, moves are encoded as 16-bit values:

Code:

<4-bit src_rank><1-bit promotion flag><3-bit src_file><4-bit dst_rank><4-bit dst_file>
where, if the promotion flag is set, dst_rank is redefined as:
0 = q
1 = r
2 = b
3 = n
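The 16-bit move layout above can be unpacked with a few shifts and masks. The following is a minimal Python sketch under two assumptions that the post does not state: the fields are listed from the most significant bit to the least significant bit, and the value is handled as a plain integer (byte order on disk is not covered). Rank and file numbering is not specified either, so the raw field values are returned unchanged. `RawMove` and `unpack_move` are hypothetical names.

```python
from typing import NamedTuple, Optional

# Promotion codes from the list above: 0 = q, 1 = r, 2 = b, 3 = n.
PROMOTION_PIECES = "qrbn"


class RawMove(NamedTuple):
    src_rank: int
    src_file: int
    dst_rank: Optional[int]   # None when the field carries a promotion piece
    dst_file: int
    promotion: Optional[str]  # "q", "r", "b", "n" or None


def unpack_move(value: int) -> RawMove:
    """Split a 16-bit move into its fields, assuming MSB-first field order."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("move must fit in 16 bits")
    src_rank = (value >> 12) & 0xF    # top 4 bits
    promo_flag = (value >> 11) & 0x1  # next 1 bit
    src_file = (value >> 8) & 0x7     # next 3 bits
    dst_rank_field = (value >> 4) & 0xF
    dst_file = value & 0xF            # low 4 bits
    if promo_flag:
        if dst_rank_field >= len(PROMOTION_PIECES):
            raise ValueError("unknown promotion code")
        piece = PROMOTION_PIECES[dst_rank_field]
        return RawMove(src_rank, src_file, None, dst_file, piece)
    return RawMove(src_rank, src_file, dst_rank_field, dst_file, None)
```

Under these assumptions, `unpack_move(0x1434)` yields source rank 1, source file 4, destination rank 3 and destination file 4, with no promotion.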

noobpwnftw
Posts: 360
Joined: Sun Nov 08, 2015 10:10 pm

Re: Database snapshot

Post by noobpwnftw » Sun Jul 28, 2019 7:10 am

In the board unit list above, if there are more than 3 empty spaces, the first unit is set to 8 and the next unit is the number of empty spaces minus 4.
And a correction: turn is an 8-bit flag, not a 1-bit one.
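To make the board-unit rules concrete, here is a minimal Python sketch that turns the piece-placement field of a FEN into the list of unit values described above, including the rule for more than 3 empty spaces. It makes some assumptions the posts leave open: ranks are visited in FEN order, the "/" separators produce no unit, and empty runs do not continue across ranks. It stops at the list of unit values and does not attempt the turn flag, the special units, the byte packing or the trailing-zero trimming. `empty_run_units` and `board_units` are hypothetical names.

```python
# Unit values for pieces, copied from the list in the post above.
PIECE_UNITS = {
    "p": 0x3, "n": 0x4, "b": 0x5, "r": 0x6, "q": 0x7, "k": 0x9,
    "P": 0xA, "N": 0xB, "B": 0xC, "R": 0xD, "Q": 0xE, "K": 0xF,
}


def empty_run_units(count: int) -> list:
    """Units for a run of empty squares within one rank."""
    if not 1 <= count <= 8:
        raise ValueError("empty run must be 1..8 squares")
    if count <= 3:
        return [count - 1]        # 0, 1, 2 mean 1, 2, 3 empty squares
    return [0x8, count - 4]       # marker 8, then the count minus 4


def board_units(placement: str) -> list:
    """Encode a FEN piece-placement field as a list of board unit values.

    Assumes "/" adds no unit and runs of empties stop at the rank boundary.
    """
    units = []
    for rank in placement.split("/"):
        for ch in rank:
            if ch.isdigit():
                units.extend(empty_run_units(int(ch)))
            else:
                units.append(PIECE_UNITS[ch])
    return units
```

With these rules an empty rank (`8` in FEN) becomes the two units 8, 4, so the start position encodes to 40 units.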

Rebel
Posts: 4788
Joined: Thu Aug 18, 2011 10:04 am

Re: Database snapshot

Post by Rebel » Sun Jul 28, 2019 8:04 am

noobpwnftw wrote:
Sat Jul 27, 2019 9:54 pm
For those who want to probe my database locally or for other unspecified reasons, here is a full database snapshot of my book project as of today:

ftp://ftp.chessdb.cn/pub/chessdb/data-s ... 190728.tar

The database contains about 3 billion unique chess positions, mostly connected to startpos. Each was analyzed by Stockfish to no less than 22 plies at the terminal nodes, with very wide multi-PV exploration, and the scores have been back-propagated using a weighted averaging function. For most positions there is also a special field (encoded as 'a0a0') marking the known shortest distance of the position from startpos.

Using this database snapshot is as simple as putting the data files under your database folder and launching the server. Still, I'd recommend using the online API and making feature requests if you need anything, since it is updated constantly and I have no plans to make snapshots like this very often (I am waiting for a contributor to make incremental snapshots possible).

This database snapshot is released into the public domain.
By any chance, can you offer those 3 billion positions in EPD with SF score and depth, or a utility that converts your database to EPD?
90% of coding is debugging, the other 10% is writing bugs.

Ovyron
Posts: 2828
Joined: Tue Jul 03, 2007 2:30 am

Re: Database snapshot

Post by Ovyron » Sun Jul 28, 2019 10:22 am

noobpwnftw wrote:
Sat Jul 27, 2019 9:54 pm
analyzed by Stockfish with no less than 22 plies at terminal node
Interesting, my private database uses depth 22 as well. It looks like we independently found it to be optimal (depth 21 having considerably less quality, depth 23 being considerably slower)?
Ferdy wrote:
Sun Jul 28, 2019 2:31 am
I tried to probe from startpos with the following result.

Code:

    Move   Score  Rank       Note  winrate%
0   e2e4  15 (8)     2  ! (20-04)     50.61
1   d2d4  15 (4)     2  ! (20-03)     50.30
2   g1f3  15 (2)     2  ! (20-04)     50.15
3   g2g3  10 (2)     2  ! (20-07)     50.15
4   c2c4  10 (2)     2  ! (20-04)     50.15
5   d2d3       0     1  * (20-12)     50.00
6   c2c3       0     1  * (20-08)     50.00
7   e2e3       0     1  * (20-10)     50.00
8   b2b3       0     1  * (20-10)     50.00
9   b1c3       0     1  * (20-04)     50.00
10  a2a3       0     1  * (20-09)     50.00
11  h2h3      -1     1  * (20-09)     49.92
12  f2f4      -4     0  ? (20-14)     49.70
13  a2a4      -5     0  ? (20-11)     49.62
14  b2b4      -6     0  ? (20-11)     49.55
15  g1h3     -41     0  ? (20-05)     46.90
16  b1a3     -51     0  ? (20-01)     46.14
17  h2h4     -57     0  ? (20-01)     45.69
18  f2f3     -82     0  ? (20-01)     43.82
19  g2g4    -103     0  ? (20-01)     42.26
Surprising to see scores that high. Mine has everything at 0.00 except for 1.d4 which is 0.03 (all white tries have been refuted to a 0.00 score otherwise).

...

Oh, three billion means your database is 1000 times larger than mine :shock:

I'd like a way to check it online (see https://www.365chess.com/opening.php for an example).

noobpwnftw
Posts: 360
Joined: Sun Nov 08, 2015 10:10 pm

Re: Database snapshot

Post by noobpwnftw » Sun Jul 28, 2019 2:09 pm

Ovyron wrote:
Sun Jul 28, 2019 10:22 am
noobpwnftw wrote:
Sat Jul 27, 2019 9:54 pm
analyzed by Stockfish with no less than 22 plies at terminal node
Interesting, my private database uses depth 22 as well. It looks like we independently found it to be optimal (depth 21 having considerably less quality, depth 23 being considerably slower)?
Ferdy wrote:
Sun Jul 28, 2019 2:31 am
I tried to probe from startpos with the following result.

Code:

    Move   Score  Rank       Note  winrate%
0   e2e4  15 (8)     2  ! (20-04)     50.61
1   d2d4  15 (4)     2  ! (20-03)     50.30
2   g1f3  15 (2)     2  ! (20-04)     50.15
3   g2g3  10 (2)     2  ! (20-07)     50.15
4   c2c4  10 (2)     2  ! (20-04)     50.15
5   d2d3       0     1  * (20-12)     50.00
6   c2c3       0     1  * (20-08)     50.00
7   e2e3       0     1  * (20-10)     50.00
8   b2b3       0     1  * (20-10)     50.00
9   b1c3       0     1  * (20-04)     50.00
10  a2a3       0     1  * (20-09)     50.00
11  h2h3      -1     1  * (20-09)     49.92
12  f2f4      -4     0  ? (20-14)     49.70
13  a2a4      -5     0  ? (20-11)     49.62
14  b2b4      -6     0  ? (20-11)     49.55
15  g1h3     -41     0  ? (20-05)     46.90
16  b1a3     -51     0  ? (20-01)     46.14
17  h2h4     -57     0  ? (20-01)     45.69
18  f2f3     -82     0  ? (20-01)     43.82
19  g2g4    -103     0  ? (20-01)     42.26
Surprising to see scores that high. Mine has everything at 0.00 except for 1.d4 which is 0.03 (all white tries have been refuted to a 0.00 score otherwise).

...

Oh, three billion means your database is 1000 times larger than mine :shock:

I'd like a way to check it online (see https://www.365chess.com/opening.php for an example).
Depth 22 seems to be a good balance between quality and speed.

I have applied penalties to 0.00 scores in back-propagation; maybe that caused it.

For a nice GUI like that one, I wish someone would look up the data from my API so that nobody has to reinvent the wheel.
