"Perft" of an opening book

Discussion of chess software programming and technical issues.

Moderator: Ras

Robert Pope
Posts: 563
Joined: Sat Mar 25, 2006 8:27 pm
Location: USA
Full name: Robert Pope

"Perft" of an opening book

Post by Robert Pope »

I'm starting to work on a self-building opening book, and I want to roll my own instead of making a generic PolyGlot book. So I'm trying to get an idea ahead of time how many unique positions might reasonably be generated in a 8-move vs 10-move vs 15-move book, etc. , assuming only actual games get saved.

Is that a question that somebody with a large database of games can easily answer? Obviously there's no hard upper limit, but I'm guessing that for comp/comp games, there's a value that gets reached quickly by a million games, but grows much more slowly after that point.
User avatar
Ajedrecista
Posts: 2098
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: "Perft" of an opening book.

Post by Ajedrecista »

Hello Robert:
Robert Pope wrote: Mon Jul 10, 2023 2:07 am I'm starting to work on a self-building opening book, and I want to roll my own instead of making a generic PolyGlot book. So I'm trying to get an idea ahead of time how many unique positions might reasonably be generated in a 8-move vs 10-move vs 15-move book, etc. , assuming only actual games get saved.

Is that a question that somebody with a large database of games can easily answer? Obviously there's no hard upper limit, but I'm guessing that for comp/comp games, there's a value that gets reached quickly by a million games, but grows much more slowly after that point.
I will not answer your question exactly, but OEIS A083276 sequence is Number of distinct chess positions after n plies including differences due to availability and possibility of castling and en passant captures, that is, a kind of perft with unique positions, and could be helpful in your question. The highest known value nowadays is for 11 plies = 5.5 moves and that shall be understood as an upper bound to your question IMHO. The branching factor BF(n) = [a(n)]/[a(n-1)] is:

Code: Select all

 n          a(n)             BF(n)
------------------------------------
 1                  20     20.000000
 2                 400     20.000000
 3               5,362     13.405000
 4              72,078     13.442372
 5             822,518     11.411499
 6           9,417,681     11.449818
 7          96,400,068     10.236073
 8         988,187,354     10.250899
 9       9,183,421,888      9.293199
10      85,375,278,064      9.296674
11     726,155,461,002      8.505454
You can see the pattern: a(12) should be around 8.51 times a(11) at most (almost 6.18e+12), then slowly decreasing, so I expect that the branching factor of your problem will behave in a similar way.

Regards from Spain.

Ajedrecista.
Robert Pope
Posts: 563
Joined: Sat Mar 25, 2006 8:27 pm
Location: USA
Full name: Robert Pope

Re: "Perft" of an opening book

Post by Robert Pope »

Just to wrap up this thread, I built a 20 ply book using the "CCRL Training" dataset of 2 million games from lczero.org. For the first 800K games, I was adding on average 2.7 new positions per game. The next 800K added 1.95 positions/game, and the final 400K added 1.7 positions/game.

The growth in my opening book really didn't slow enough to be able to put a figure on a reasonable upper bound, but I can at least assume it will grow no faster than 1.7 additional positions/game on average (for a 20 ply book).