Open Opening Book Standard (OOBS)

phhnguyen
Posts: 1525
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Post by phhnguyen »

Open Opening Book Standard (OOBS)

We have released the first draft version of the Open Opening Book Standard (OOBS). In other words, it is a new format for chess opening book files.
The main ideas:
- use SQLite to store data and search
- all data is clear, meaningful, and full

In detail, we store FEN strings together with the win-draw-loss counts of the games that reached those positions.

Those books can be viewed and edited directly with any SQLite browser, such as SQLiteStudio or SQLite Browser. Below is a screenshot:

[Image: View, edit by SQLiteStudio]


The new format is good for playing as well as for studying, modifying, and editing. Users may use any of the SQLite browsers above to work with those books, without needing any chess-specific tool.

The project itself is an opening book tool. At the moment it can:
- Create opening books in the OBS standard (.obs.db3) and in Polyglot format (.bin). The inputs are PGN (.pgn) and/or OCGDB (.ocgdb.db3) files
- Query for data

More functions will be added later.

The link to the project is below. The samples folder contains an opening book, book-mb3.45.obs.db3, of 30 million positions, created from the MillionBase database (by Ed Schröder) of 3.45 million games.

https://github.com/nguyenpham/oobs

The project was inspired by the OCGDB project. Some code is taken from that project too, so it is already verified, and the tool has very high performance.

This is a work in progress. Many things are not fixed yet and may change later. Some research and studies about chess openings in general, and about this format, will be published.
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
phhnguyen
Posts: 1525
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Opening Book Standard (OOBS)

Post by phhnguyen »

Open Opening Book Standard (OOBS)

Main ideas/techniques in brief
- Use SQL/SQLite as the backbone/framework for storing data and querying information
- All information is clear, meaningful, complete, and understandable for both humans and machines


Why OOBS? Features/Highlights
- Open databases: users can easily understand the data structures, modify them, and convert to or from other database formats
- It is based on SQL, the strongest language for querying information. Users can work without chess-specific programs
- It can be used for all opening purposes, from tournaments to normal games
- All information is clear, meaningful, and understandable. It has no "magic" numbers such as hash keys (which mean nothing to humans, or even to computers, when they stand alone). It uses FEN strings instead
- MIT license: you may use it for any application or purpose without limits and without worrying about license conditions



Overview
Almost all chess opening books are in text or binary formats. 

1. Text

The most popular are PGN and EPD. Their strong points: they are human-readable, simple, editable with any ordinary text editor without specific software, and supported by almost all chess GUIs and tools. Their common problems: they are slow to process, require more computing power to retrieve information, are large, may contain errors, carry heavy redundancy, and lack information that is vital for book tasks.

Basically, they were not designed for opening books and have no dedicated structures or information for that purpose, such as weights for moves. As a consequence, they are mostly used as small books for simple uses such as tournaments (where each match just needs a given position to start from). They can hardly be used to play move by move, because they lack weights and matching positions is slow.


2. Binary
Those formats have some advantages: they are fast, compact, and have full opening book features. However, they also have some drawbacks:

- Hard to understand: each format requires a long list of descriptions and explanations of why things are done one way and not another
- Hard to change structures: adding, removing, or changing data types is usually a hard and painful task that requires serious changes to both data and code
- Hard to support in other apps: other programs may run into many problems, since they may use different programming languages and data structures. When hash keys are used, it is not easy for one app to reproduce the hash keys of another
- Hard to support in web apps: processing large binary files is not a strong point of scripting languages/web apps
- License compatibility: the code for some binary formats is under GPL or more restrictive licenses, so some programs may find it hard to adopt them, or cannot support them at all

Take a close look at the Polyglot format as an example. It is quite popular, a kind of de facto standard for chess. However, it has some drawbacks:

- Hash key: for each position, the book stores a hash key as its representation. Hash keys are actually "magic" numbers. They look random and have no meaning when standing alone. We practically can't convert one back into a chess position. It is also a big challenge for other programs to create hash keys that exactly match those of the original Polyglot program. Furthermore, even though we have accepted Polyglot hash keys as a de facto standard for ordinary chess, applying them to other chess variants may cause chaos, since those variants have no standard for their hash keys.
- Weight/score: when creating books, Polyglot calculates a weight for each move from the numbers of won/drawn/lost games (typically the formula is weight = 2 * wins + draws). All weights are then scaled down globally to fit into 16 bits. Those numbers are good for comparison (to find out which moves are better than others). However, there are some problems: 1) The calculation is fixed. The default formula completely ignores the number of losses. Some users may prefer to select moves by win/loss rates instead. 2) It is one-way: we can't get back the win-draw-loss numbers, which are very useful for many purposes. 3) It is inconsistent when merging or adding games to an existing book: all weights of a book may be scaled automatically, but the scale factor is not stored anywhere. Thus new scores from merged or added games can't be scaled like the original ones, and the resulting book may work inconsistently or even wrongly
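
To make points 1) and 2) concrete, here is a minimal Python sketch. It assumes only the formula quoted above (weight = 2 * wins + draws). The function name and the win/draw/loss numbers are made up for illustration, and the sketch does not reproduce Polyglot's real code or its scaling step.

```python
def polyglot_style_weight(wins: int, draws: int, losses: int) -> int:
    """Weight as quoted above: 2 * wins + draws. Losses are not used at all."""
    return 2 * wins + draws

# Three very different (win, draw, loss) records, made up for illustration.
records = [(10, 0, 0), (10, 0, 90), (0, 20, 5)]
weights = [polyglot_style_weight(w, d, l) for w, d, l in records]
print(weights)  # [20, 20, 20]
```

Once only the weight is stored, these three records can no longer be told apart, which is why the win-draw-loss numbers can't be recovered from a finished book.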

The advantages of binary formats were very important in the past, when computers were weak and Internet speed was slow. They are less important nowadays, since computers have become much stronger and the Internet much faster. That makes other formats more attractive.

Thus we think we need a new format. The requirements we drew up:
- Easier-to-understand data structures. It is best if everything describes itself
- Easy to support from different languages, apps, and web apps
- Fast enough to use directly
- Easy to support other chess variants
- Almost free for all (MIT or a less restrictive license)


SQL/SQLite

We have picked SQL/SQLite as the main tool/direction for developing the new opening book format. We know it has some drawbacks, such as being slower and larger in size.

However, it has some advantages:
- Data can be displayed in text form, readable for both humans and machines
- Easy to alter structures. Adding, removing, and changing fields is simple
- Easy to understand structures. They are almost self-describing
- Easy to write tools, converters
- Users can query directly
- Supported by a lot of tools
- SQL is a very strong and flexible way to make queries
- Comes with many strong, mature SQL engines and libraries


1. FEN strings
Instead of using "magic" keys such as hash keys, we use FEN strings to represent/store chess positions.

2. Win-Draw-Loss numbers
We store those numbers (win-draw-loss) directly with the position. Whenever a program wants a score (say, to compare moves), it can compute one on the fly from those numbers. Storing them prevents loss of information, which is very useful for studying as well as for updating and merging later.

3. Comments
Comments by users

4. Other fields
Polyglot has a field called "learn". It is rarely used, even though every entry must have it. Thus we don't define it or any other field; none of them is compulsory. However, users are totally free to add or remove fields. Because this is an SQL database, adding, removing, and having extra fields is simple and won't affect other fields. We don't have to worry about compatibility between them.


One or two tables?
Basically, we need to store FEN strings and the information that belongs to them, such as moves and the win rate (Win-Draw-Loss) of those moves. That is a one-to-many relationship between a FEN and its move/win-rate records. To save some space and to have a "perfect" design, the FEN strings should be stored in a separate table, say the FEN table, while the move/win-rate records should be stored in another one, say the MoveRate table, and the two would be related via their IDs (as a FOREIGN KEY).

However, after studying and experimenting, we decided to store all information in one table only (named Book). The database is a bit larger, but not by much (5-10%), compared with having two tables. The gain is that the design is simpler. More importantly, users can study and edit data much more easily.


Book Table
It is the main table. It should have these fields: ID, FEN, Move, Active, Win, Draw, Loss

1. ID
Just a normal ID number

2. FEN field
The FEN string of each chess position is stored in this field. However, a given position can have different FEN strings, because they may differ in halfmove clock and fullmove number. Thus, for comparing and searching, all FEN strings should have the halfmove clock set to 0 and the fullmove number set to 1. It means all FEN strings must end with "0 1".

3. Move
A move that can be made from the board of the FEN string. The move is in coordinate format.

4. Active
This branch/move may be used only if it is active. Basically, we use only 1 bit of that number. The other bits are reserved for later use

5. Win, Draw, Loss
Store the numbers of won (1-0), drawn (0.5-0.5), and lost (0-1) games that were used to create the book
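
The FEN rule in item 2 (every stored FEN ends with "0 1") can be sketched in a few lines of Python. The function name normalize_fen is hypothetical, not something taken from the project; the sketch only resets the two counters and leaves the other four fields untouched.

```python
def normalize_fen(fen: str) -> str:
    """Return the FEN with halfmove clock 0 and fullmove number 1."""
    fields = fen.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 FEN fields: " + fen)
    # Keep board, side to move, castling rights and en passant square;
    # replace (or append) the two counters.
    return " ".join(fields[:4] + ["0", "1"])

fen = "rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPP1PPP/RNBQKBNR w KQkq - 12 34"
print(normalize_fen(fen))
# rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPP1PPP/RNBQKBNR w KQkq - 0 1
```

A program would apply this to the position it has on the board before using the string as a search key.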

Info Table
Store some extra information:

1. Variant
Store chess variant information.

2. ItemCount
Stores the number of records in the Book table. It is useful for quickly retrieving that number, since querying with a "SELECT COUNT" statement could take a while on a huge database
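
As an illustration of the two tables described above, here is a minimal Python/sqlite3 sketch. The field names of the Book table come from the text; everything else is an assumption made for the example: the column types, the index, the Name/Value layout of the Info table, the 'standard' variant value, and the win/draw/loss numbers. The real schema in the project may differ.

```python
import sqlite3

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

conn = sqlite3.connect(":memory:")  # a real book would be a file such as book.obs.db3
conn.executescript("""
CREATE TABLE Info (Name TEXT PRIMARY KEY, Value TEXT);   -- assumed layout
CREATE TABLE Book (
    ID     INTEGER PRIMARY KEY AUTOINCREMENT,
    FEN    TEXT NOT NULL,
    Move   TEXT NOT NULL,
    Active INTEGER NOT NULL DEFAULT 1,
    Win    INTEGER NOT NULL DEFAULT 0,
    Draw   INTEGER NOT NULL DEFAULT 0,
    Loss   INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX BookFenIdx ON Book (FEN);                   -- assumed, for fast lookups
""")

# Made-up sample records for the starting position.
rows = [(START_FEN, "e2e4", 50, 30, 20), (START_FEN, "d2d4", 40, 40, 20)]
conn.executemany(
    "INSERT INTO Book (FEN, Move, Win, Draw, Loss) VALUES (?, ?, ?, ?, ?)", rows)
conn.execute("INSERT INTO Info (Name, Value) VALUES ('Variant', 'standard')")
# Cache the record count so readers don't need SELECT COUNT on a huge table.
conn.execute("INSERT INTO Info (Name, Value) SELECT 'ItemCount', COUNT(*) FROM Book")
conn.commit()

item_count = int(
    conn.execute("SELECT Value FROM Info WHERE Name = 'ItemCount'").fetchone()[0])
print(item_count)  # 2
```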


Abbreviation
We use both "OOBS" and "OBS".


File name extension .obs.db3

It should keep the normal extension of a database. For example, an SQLite database should have the extension .db3. It should have an additional extension, .obs, in front to distinguish it from other databases.
Examples of file names:

  book-mb345.obs.db3
  book-lichess-elite.obs.db3

Sample databases
There is a sample opening book in the samples folder:
- book-mb3.45.obs.db3. It is created from games in the MillionBase database (by Ed Schröder) of 3.45 million games. Settings: games of at least 40 plies, Elo >= 2000, repeat at least twice, take to ply 30, hit from 3 games.

On my computer (3.6 GHz quad-core i7, 16 GB RAM, from 2017), the program took about 14 minutes to convert a database in SQLite format (OCGDB) into the new book. The book contains about 30 million positions.

You may open it with any SQLite browser/tool and run some queries to understand its structure, speed, advantages, and disadvantages.


License
MIT: Almost totally free (requires just fair use). All documents, code, and data samples in this project are ready and free to integrate into other apps without worrying about license compatibility.
mar
Posts: 2667
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Open Opening Book Standard (OOBS)

Post by mar »

while this is commendable, how does it compare with say polyglot?
it's 16 bytes/entry so a SQL database will be likely much bigger with FENs
also what about the query performance? polyglot books have to use a binary search, but it's still very compact

in theory, using FENs avoids collisions, but in reality, it's a lot of data.
also - how do you canonicalize the FEN coming from an engine?
speaking of variants, even Chess960 uses either X-FEN or S-FEN, you also have to filter out invalid FENs, like invalid EP or castling rights

also implementing polyglot really is trivial (a pity we can't generate the hash keys though) - you propose to depend on SQLITE, which is awesome
but it's also order(s) of magnitude more code than a typical chess engine - seems like a total overkill to me

also the choice of C++ severely limits the use of OOBS in other languages, so C would be a much better choice I think (can interface with pretty much anything)

on the other hand, SQLITE should allow for easier book manipulation than polyglot, but I really don't see how this is relevant to chess engines - and why any engine author would want to switch from polyglot (or some proprietary binary format) to what you propose, sorry...
phhnguyen
Posts: 1525
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Opening Book Standard (OOBS)

Post by phhnguyen »

mar wrote: Tue May 03, 2022 8:40 pm
[...]

Below is a quick comparison between OOBS and Polyglot books. I think the comparison also holds for other formats.
  • Size: For the same amount of data, an OOBS book is about 3 times as large as a Polyglot book. However, we believe the extra size doesn't matter nowadays
  • Speed: I don't have any speed benchmark of OOBS at the moment, but OOBS books should be significantly slower than Polyglot ones. However, they can still answer any query instantly, which is good enough for any opening book task, even on very old hardware. As a reference point, we can also play from online books (such as those from chessdb.cn or Lichess). Online books are always slow, with high latency, yet we can use them for playing. A local book is always much quicker, fast enough for playing
  • Programming languages: SQLite is a C library, so developers won't have any problem integrating it (as well as OOBS) into their C programs. Furthermore, SQLite is a mature and very popular library, available from almost all programming languages (Java, Python, Swift...), and runs on almost all systems, so integration should not be an issue. The idea and the data structure of OOBS are simple too. Just create a few FEN strings and that is enough to query books. Developers need not use our code at all; they may write their own code to extract and modify data from books. Thus, developers should not have any problem integrating OOBS
From my own experience, directly supporting Polyglot books is a nightmare for developers: they have to create identical hash keys. Some have to change their code, and others have to write internal converters. Then they need to read data directly from the file and preprocess it first. Polyglot stores its data as big-endian, which differs from the native byte order of typical systems (little-endian), so an extra conversion step is needed. Then we need to understand and convert some "magic" numbers into chess moves. Sometimes we need to sort the data the way we want. All of that takes a lot of effort and is an error-prone process.
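
To illustrate the byte-order and "magic number" steps, here is a small Python sketch of reading one Polyglot entry. It relies on the published Polyglot layout (16 bytes per entry: 64-bit key, 16-bit move, 16-bit weight, 32-bit learn, all big-endian, with the move packed into bit fields). The key value below is an arbitrary number used only for the example, and castling, which Polyglot stores as the king moving to the rook's square, is not handled.

```python
import struct

# One book entry: key (uint64), move (uint16), weight (uint16), learn (uint32).
ENTRY = struct.Struct(">QHHI")  # ">" = big-endian, 16 bytes in total

def decode_move(move: int) -> str:
    """Turn the packed 16-bit move into coordinate notation, e.g. 'e2e4'."""
    files = "abcdefgh"
    to_file, to_rank = move & 7, (move >> 3) & 7
    from_file, from_rank = (move >> 6) & 7, (move >> 9) & 7
    promo = (move >> 12) & 7  # 0 = none, 1..4 = knight, bishop, rook, queen
    text = f"{files[from_file]}{from_rank + 1}{files[to_file]}{to_rank + 1}"
    return text + ("nbrq"[promo - 1] if 1 <= promo <= 4 else "")

# Build a fake entry (arbitrary key, move e2e4 = 796, weight 130) and read it back.
raw = ENTRY.pack(0x0123456789ABCDEF, 796, 130, 0)
key, move, weight, learn = ENTRY.unpack(raw)
print(decode_move(move), weight)  # e2e4 130

# Reading the same bytes as little-endian gives a wrong weight.
wrong_weight = struct.unpack("<QHHI", raw)[2]
print(wrong_weight)  # 33280
```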

In contrast, for OOBS, a developer just needs to find a suitable SQLite library and integrate it into their code, then write a few lines of code to query and extract data. That is all. There are very few chances to create bugs. They don't even need to calculate scores or sort the data, since all of that can be done in SQL statements: the SQL engine can do those jobs before returning results.
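
A minimal sketch of such a query in Python, using the sqlite3 module from the standard library. The table and field names follow the Book table described earlier in the thread; the column types, the helper name book_moves, the sample rows, and the choice of 2 * Win + Draw as the score are assumptions made for the example.

```python
import sqlite3

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

def book_moves(conn, fen):
    """Return (move, score) pairs for a position, best score first.

    Scoring and sorting are done by the SQL engine, not by the caller.
    """
    sql = """
        SELECT Move, 2 * Win + Draw AS Score
        FROM Book
        WHERE FEN = ? AND (Active & 1) = 1
        ORDER BY Score DESC
    """
    return conn.execute(sql, (fen,)).fetchall()

# A tiny in-memory book with made-up numbers; a real one is a .obs.db3 file.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Book (ID INTEGER PRIMARY KEY, FEN TEXT, Move TEXT,
                Active INTEGER, Win INTEGER, Draw INTEGER, Loss INTEGER)""")
conn.executemany(
    "INSERT INTO Book (FEN, Move, Active, Win, Draw, Loss) VALUES (?, ?, ?, ?, ?, ?)",
    [(START_FEN, "d2d4", 1, 40, 40, 20),
     (START_FEN, "e2e4", 1, 50, 30, 20),
     (START_FEN, "g2g4", 0, 1, 0, 9)])   # switched off via Active

moves = book_moves(conn, START_FEN)
print(moves)  # [('e2e4', 130), ('d2d4', 120)]
```

Changing the scoring formula means editing one line of SQL; the book file itself stays untouched.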

BTW, I rarely see an engine that works directly with opening books. Typically, that is the job of chess GUIs/tools. For almost all of us, opening books are a "black box": users don't really care about their formats or how to extract data. They care only whether their chess GUIs/tools support the books they have. If a new book runs, all is done! Frankly speaking, all popular book formats such as Polyglot, CTG, and ABK, as well as OOBS, can hold all basic opening information. Thus they can all fulfill the task and are quite enough for average users.

However, for advanced users, there are some reasons to use OOBS:
  • Flexible ways to calculate weights/scores: Polyglot uses a fixed, hard-coded formula to calculate scores from game results (from win/draw/loss, the formula is score = 2 * win + draw), and we can't change the formula later. OOBS can calculate scores on the fly, so users can change them to whatever they want, whenever they want. For example, someone may prefer another formula such as score = 3 * win + draw. IMHO, totally ignoring the number of losses is a flaw. We should count them, say, with formulas such as score = 3 * win + draw - 2 * loss, or score = (2 * win + draw) / loss
  • More information: Polyglot keeps only weights. Those numbers can be used for comparison but are meaningless by themselves. OOBS keeps all the win/draw/loss numbers. Users can extract much more information and draw conclusions, such as how "busy" a move is, its win rate, its chance to draw…
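
The two points above can be sketched in Python as follows. The formulas are the ones mentioned in the list; the function names, the stats helper, and the win/draw/loss numbers are made up for illustration.

```python
def score_polyglot_style(win, draw, loss):
    return 2 * win + draw              # losses ignored

def score_with_losses(win, draw, loss):
    return 3 * win + draw - 2 * loss   # one of the alternatives suggested above

def stats(win, draw, loss):
    """Derived figures that only exist because W/D/L are stored in full."""
    games = win + draw + loss
    if games == 0:
        return None
    return {"games": games,
            "win_rate": win / games,
            "draw_rate": draw / games,
            "score_rate": (win + 0.5 * draw) / games}

move_a = (50, 30, 20)   # made-up numbers
move_b = (60, 10, 80)

print(score_polyglot_style(*move_a), score_polyglot_style(*move_b))  # 130 130
print(score_with_losses(*move_a), score_with_losses(*move_b))        # 140 30
print(stats(*move_a)["games"], stats(*move_b)["games"])              # 100 150
```

With the first formula the two moves look equal; with the second, the many losses of move_b show up. Note that a formula that divides by the number of losses, such as (2 * win + draw) / loss, needs a guard for loss = 0.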
For bookmakers:
  • Easy to edit: they can find and change any value they want, comment on the changes, search, and turn moves on/off (via the field "Active"). It is easy to add traps and verify variants. Inserting and deleting records is simple too. All editing can be done with any SQLite tool. All numbers have full meaning and consistent values. In contrast, to modify a Polyglot book users need specific, complicated chess software (of which there is very little so far). There is not much to change either, say, only weights. As mentioned, weights are meaningless, especially when they stand alone and have been scaled. Thus how to change an existing one is a hard question. Any change could badly affect the quality of the book
  • Can merge, add: they can create a new book by merging other books and/or adding games. In contrast, merging and expanding Polyglot books is almost impossible, since the scale factors of the original ones are missing; expanding any book can easily break its correctness and consistency
  • Easy to verify and fix books: all records are quite independent of each other. The system can tolerate some errors in some records. Wrong data can be checked for and found too. For Polyglot, it is very hard to verify and find wrong data, since the numbers are all "magic" and some may not be reachable from the root to verify
  • It is probably the best solution for chess variants, since we don't have to standardize the set of hash keys/generators
Nevertheless, if users still prefer Polyglot books, they can easily convert from OOBS books. The conversion should be very fast. OOBS books keep more information, so we can convert at any time. However, it is one-way: we can't convert Polyglot books back to OOBS, since information is lacking.
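
As an illustration of the "can merge, add" point in the list above, here is a minimal Python/sqlite3 sketch that merges two books by summing their win/draw/loss counts per position and move. The reduced table layout (no ID or Active fields), the Merged table, and the sample numbers are assumptions made for the example; two in-memory databases stand in for two book files.

```python
import sqlite3

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
COLUMNS = "(FEN TEXT, Move TEXT, Win INTEGER, Draw INTEGER, Loss INTEGER)"

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS other")   # stands in for a second book file
conn.execute(f"CREATE TABLE main.Book {COLUMNS}")
conn.execute(f"CREATE TABLE other.Book {COLUMNS}")
conn.execute(f"CREATE TABLE main.Merged {COLUMNS}")

conn.executemany("INSERT INTO main.Book VALUES (?, ?, ?, ?, ?)",
                 [(START_FEN, "e2e4", 50, 30, 20), (START_FEN, "d2d4", 40, 40, 20)])
conn.executemany("INSERT INTO other.Book VALUES (?, ?, ?, ?, ?)",
                 [(START_FEN, "e2e4", 5, 3, 2), (START_FEN, "c2c4", 7, 2, 1)])

# Raw counts simply add up, so no scale factor is needed.
conn.execute("""
    INSERT INTO main.Merged
    SELECT FEN, Move, SUM(Win), SUM(Draw), SUM(Loss)
    FROM (SELECT * FROM main.Book UNION ALL SELECT * FROM other.Book)
    GROUP BY FEN, Move
""")
conn.commit()

merged = conn.execute(
    "SELECT Move, Win, Draw, Loss FROM main.Merged ORDER BY Move").fetchall()
print(merged)  # [('c2c4', 7, 2, 1), ('d2d4', 40, 40, 20), ('e2e4', 55, 33, 22)]
```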

Over time, more chess GUIs/tools will support OOBS. At some point, people won't care about book formats, only about their quality. We believe OOBS can help people develop and manage their books in much easier, more efficient ways.
Ferdy
Posts: 4851
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Open Opening Book Standard (OOBS)

Post by Ferdy »

phhnguyen wrote: Tue May 03, 2022 3:34 pm
[...]

It looks like all FENs have "0 1" at the end. Why not just make it an EPD?
From:
rnbqk2r/ppp2pp1/3p1B1p/2b1p3/2B1P3/2NP4/PPP2PPP/R2QK1NR b KQkq - 0 1
To:
rnbqk2r/ppp2pp1/3p1B1p/2b1p3/2B1P3/2NP4/PPP2PPP/R2QK1NR b KQkq -
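
A minimal Python sketch of that conversion, assuming it is enough to keep the first four FEN fields (board, side to move, castling rights, en passant square). The function name fen_to_epd is made up for the example.

```python
def fen_to_epd(fen: str) -> str:
    """Drop the halfmove clock and fullmove number from a FEN string."""
    fields = fen.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 FEN fields: " + fen)
    return " ".join(fields[:4])

fen = "rnbqk2r/ppp2pp1/3p1B1p/2b1p3/2B1P3/2NP4/PPP2PPP/R2QK1NR b KQkq - 0 1"
print(fen_to_epd(fen))
# rnbqk2r/ppp2pp1/3p1B1p/2b1p3/2B1P3/2NP4/PPP2PPP/R2QK1NR b KQkq -
```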

Regarding e.p square on:


[fen]r1bq1rk1/2p2ppp/pbnp1n2/1p2p3/3PP3/1BP2N1P/PP3PP1/RNBQR1K1 b - d3 0 1[/fen]


Perhaps it is better to remove the d3, as there is no black pawn that can capture on that square.
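
A rough Python sketch of that cleanup. It only checks whether a pawn of the side to move stands next to the double-advanced pawn; it does not check full legality (for example a pinned capturing pawn), so it is an approximation. The function names are made up for the example.

```python
def expand_rank(row: str) -> str:
    """Turn one FEN rank such as '3PP3' into 8 characters, '.' for empty."""
    return "".join("." * int(ch) if ch.isdigit() else ch for ch in row)

def drop_unusable_ep(fen: str) -> str:
    """Replace the en passant square with '-' if no pawn can capture there."""
    fields = fen.split()
    board, side, ep = fields[0], fields[1], fields[3]
    if ep == "-":
        return fen
    rows = [expand_rank(r) for r in board.split("/")]   # rows[0] is rank 8
    file_idx = "abcdefgh".index(ep[0])
    # The capturing pawn stands on rank 5 (White to move) or rank 4 (Black to move).
    pawn, rank = ("P", 5) if side == "w" else ("p", 4)
    row = rows[8 - rank]
    neighbours = [f for f in (file_idx - 1, file_idx + 1) if 0 <= f < 8]
    if not any(row[f] == pawn for f in neighbours):
        fields[3] = "-"
    return " ".join(fields)

fen = "r1bq1rk1/2p2ppp/pbnp1n2/1p2p3/3PP3/1BP2N1P/PP3PP1/RNBQR1K1 b - d3 0 1"
print(drop_unusable_ep(fen))
# r1bq1rk1/2p2ppp/pbnp1n2/1p2p3/3PP3/1BP2N1P/PP3PP1/RNBQR1K1 b - - 0 1
```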

Updated rule in the github repo that I tried to maintain.

16.1.3.4: En passant target square

The fourth field is the en passant target square. If there is no en passant
target square then the single character symbol "-" appears. If there is an en
passant target square then is represented by a lowercase file character
immediately followed by a rank digit. Obviously, the rank digit will be "3"
following a white pawn double advance (Black is the active color) or else be
the digit "6" after a black pawn double advance (White being the active color).

An en passant target square is given if and only if the last move was a pawn
advance of two squares and the active color has a legal en passant capture
move.

I am developing a game analyzer in which positions encountered and analyzed (epd, move, ce, depth, pv, engname) are saved in an sqlite3 db. When positions from other games are found in the db, I just retrieve them, with no more engine analysis needed. I can actually use OOBS in this regard, writing the game win/loss/draw stats as comments on the move made in the game and on the move recommended by the analyzing engine.

ce = centipawn evaluation
pv = principal variation 
engname = the engine id name that analyzes the position
Ferdy
Posts: 4851
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Open Opening Book Standard (OOBS)

Post by Ferdy »

I have downloaded book-mb3.54.obs.db3 and tried to find the stats for this position.


[fen]rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1[/fen]


No record is returned.
Ferdy
Posts: 4851
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Open Opening Book Standard (OOBS)

Post by Ferdy »

Sample Application of OOBS with the game analyzed by Stockfish 15.

[pgn][Event "American Cup Champ"]
[Site "Saint Louis USA"]
[Date "2022.04.23"]
[Round "2.3"]
[White "Sevian, Samuel"]
[Black "Caruana, Fabiano"]
[Result "0-1"]
[BlackElo "2781"]
[BlackFideId "2020009"]
[BlackTitle "GM"]
[ECO "D02"]
[EventDate "2022.04.20"]
[Opening "Queen's bishop game"]
[WhiteElo "2693"]
[WhiteFideId "2040506"]
[WhiteTitle "GM"]

1.d4 { book: 3.54_MillionBase, white_score_rate: 54.95%, } 1...Nf6 { book: 3.54_MillionBase, white_score_rate: 60.00%, } 2.Nf3 { book: 3.54_MillionBase, white_score_rate: 52.12%, } 2...d5 { book: 3.54_MillionBase, white_score_rate: 47.19%, } 3.Bf4 { book: 3.54_MillionBase, white_score_rate: 51.85%, } 3...c5 { book: 3.54_MillionBase, white_score_rate: 55.59%, } 4.e3 { book: 3.54_MillionBase, white_score_rate: 50.00%, 0.0/23, } 4...Nc6 { book: 3.54_MillionBase, white_score_rate: 53.29%, -0.11/23, } 5.Nbd2 { book: 3.54_MillionBase, white_score_rate: 58.71%, 0.12/24, } 5...Nh5 { book: 3.54_MillionBase, white_score_rate: 50.00%, -0.25/24, } ( 5...cxd4 { -0.12/23 } ) 6.Bg5 { book: 3.54_MillionBase, white_score_rate: 62.50%, -0.18/22, } ( 6.dxc5 Nxf4 7.exf4 Qa5 8.c4 { 0.25/24 } ) 6...h6 { book: 3.54_MillionBase, white_score_rate: 35.00%, 0.18/22, } 7.Bh4 { book: 3.54_MillionBase, white_score_rate: 65.00%, 0.03/23, } 7...g5 { book: 3.54_MillionBase, white_score_rate: 35.00%, 0.17/25, } 8.Ne5 { -0.22/25, } 8...Nxe5 { book: 3.54_MillionBase, white_score_rate: 41.67%, 0.18/25, } 9.dxe5 { book: 3.54_MillionBase, white_score_rate: 37.50%, -0.14/26, } 9...Ng7 { book: 3.54_MillionBase, white_score_rate: 62.50%, 0.01/25, } 10.Bg3 { book: 3.54_MillionBase, white_score_rate: 37.50%, 0.02/25, } 10...Nf5 { book: 3.54_MillionBase, white_score_rate: 62.50%, -0.09/24, } 11.e4 { -0.32/22, } ( 11.Qf3 { 0.09/25 } ) 11...Nxg3 $6 { -0.53/20, } ( 11...dxe4 12.Nxe4 Qa5+ 13.Qd2 Qxd2+ { 0.32/22 } ) 12.hxg3 { 0.53/20, White is slighly better, } 12...e6 $6 { -0.6/22, } ( 12...d4 { -0.45/20 } ) 13.f4 { -0.17/20, Position is equal, } ( 13.Qh5 Qa5 14.c3 c4 15.Be2 { 0.6/22 } ) 13...Bd7 $6 { -0.76/21, } ( 13...c4 { 0.17/20 } ) 14.exd5 { 0.76/21, White is slighly better, } 14...exd5 { -0.73/23, } 15.Qh5 { -0.17/21, Position is equal, } ( { Better is } 15.c4 { 0.89/23 } ) 15...Qb6 { -0.11/22, } ( 15...c4 16.O-O-O Qa5 17.Kb1 Qb6 { 0.17/21 } ) 16.O-O-O { 0.11/22, } 16...c4 { -0.26/22, } 17.c3 $2 { -1.1/21, 
} ( { Better is } 17.Qf3 Bc6 { 0.31/21 } ) 17...Bc5 { 0.6/21, Black is slighly better, } ( 17...O-O-O 18.Be2 { 1.1/21 } ) 18.Be2 { -0.6/21, } 18...Be3 { -0.09/25, Position is equal, } ( 18...Ba4 19.Bg4 Bxd1 20.Rxd1 Qg6 { 0.62/23 } ) 19.Bg4 { 0.09/25, } 19...O-O-O { -0.06/27, } 20.Kb1 $5 { 0.0/24, } ( 20.Bxd7+ Rxd7 21.Kb1 Qe6 22.fxg5 { 0.0/29 } ) 20...Bxg4 { 0.0/24, } 21.Qxg4+ { 0.0/27, } 21...Kb8 { -0.08/25, } ( 21...Qe6 22.Qf3 Bc5 23.Qh5 Rhg8 { 0.0/30 } ) 22.Qf5 { -0.47/24, } ( 22.Rhe1 Bxd2 23.Rxd2 d4 24.Rxd4 { 0.08/25 } ) 22...Qe6 { 0.47/24, } 23.Qc2 $6 { -0.5/23, } ( 23.Qxe6 { -0.27/24 } ) 23...gxf4 { 0.5/23, Black is slighly better, } 24.gxf4 { -0.38/26, } 24...Bxf4 { 0.45/25, } 25.Nf3 { -0.37/24, } 25...Bxe5 { 0.37/25, } 26.Rhe1 $6 { -0.6/22, } ( 26.Nxe5 Qxe5 27.Qf2 Qe4+ 28.Ka1 { -0.33/25 } ) 26...f6 { 0.6/22, } 27.Ka1 { -0.74/20, } ( 27.Re2 h5 28.Rde1 Rde8 29.a3 { -0.53/24 } ) 27...Rhg8 { 0.74/20, } 28.Nd4 $2 { -1.68/21, } ( 28.Re2 a6 { -0.86/22 } ) 28...Qg4 { 1.7/22, } ( 28...Qb6 29.Qe2 { 1.68/21 } ) 29.Nf5 $4 { -4.05/22, } ( { Best is } 29.Re2 Qg6 { -1.7/22 } ) 29...Qxg2 { 4.05/22, Black is winning, } 30.Qa4 { -4.65/21, } ( 30.Re2 Qg6 31.Re3 h5 32.Rh3 { -3.99/23 } ) 30...Rge8 { 3.72/22, } ( 30...h5 { 4.65/21 } ) 31.Nd6 { -8.28/22, } ( 31.a3 Rh8 { -3.72/22 } ) 31...Re7 { 2.34/22, Black is clearly better, } ( { Best is } 31...Bxc3 32.bxc3 Rxe1 33.Rxe1 Qd2 { 8.28/22 } ) 32.Rg1 $4 { -3.55/22, } ( 32.Nxc4 Ree8 { -2.34/22 } ) 32...Qf2 { 2.87/22, } ( 32...Qe2 33.Nf5 Re6 34.Rge1 Qf2 { 3.55/22 } ) 33.Nxc4 { -2.87/22, } 33...Qf4 { 2.34/21, } ( 33...Red7 34.Nxe5 fxe5 35.a3 d4 { 2.86/23 } ) 34.Qb3 { -2.53/22, } ( 34.Rgf1 Qg4 35.Rg1 Qc8 36.Nxe5 { -2.34/21 } ) 34...Rc7 { 0.63/23, Black is slighly better, } ( { Better is } 34...d4 35.Rgf1 Qg3 36.a3 d3 { 2.53/22 } ) 35.Na3 $2 { -2.33/21, } ( { Better is } 35.Nxe5 Qxe5 36.Qb4 f5 37.Qh4 { -0.63/23 } ) 35...Qe3 { 1.78/20, } ( 35...a6 36.Rxd5 Rcc8 37.Qd1 Bc7 { 2.33/21 } ) 36.Rge1 { -2.58/24, } ( 36.Nc2 Qb6 { -1.78/20 } ) 
36...Qc5 { 1.49/23, } ( { Better is } 36...Qb6 37.Qxb6 axb6 38.Rh1 Rh7 { 2.58/24 } ) 37.Nc2 { -2.0/23, } ( 37.Nb5 Rh7 38.Nd4 Qb6 39.Qc2 { -1.49/23 } ) 37...h5 { 1.45/24, } ( 37...Qc4 38.Rh1 Qxb3 39.axb3 Rh7 { 2.0/23 } ) 38.Nb4 $4 { -3.85/23, } ( { Best is } 38.Nd4 { -1.45/24 } ) 38...Qc4 { 3.85/23, Black is winning, } 39.Nxd5 { -4.12/25, } ( 39.Rh1 { -3.68/22 } ) 39...Qxb3 { 4.12/25, } 40.axb3 { -4.31/25, } 40...Rh7 { 4.57/24, } 41.Re3 { -4.62/22, } ( 41.Rh1 h4 42.Rh3 Bg3 43.Rd3 { -4.34/25 } ) 41...h4 { 4.62/22, } 42.Rh3 { -4.38/22, } 42...Bg3 { 4.89/23, } 43.c4 { -4.89/19, } 43...f5 { 4.84/21, } 44.Rf1 { -5.07/22, } ( 44.Rd3 Rf7 { -5.05/21 } ) 44...Rf8 { 4.34/22, } ( 44...f4 45.Rf3 Rf7 46.b4 Re8 { 5.07/22 } ) 45.Rf3 { -4.91/22, } ( 45.Nf4 Re8 46.Kb1 Re4 47.Nd5 { -4.34/22 } ) 45...f4 { 4.91/22, } 46.Ka2 { -5.27/21, } ( 46.b4 Kc8 47.Ka2 Kd7 48.Rh1 { -4.99/20 } ) 46...Rh6 { 4.48/18, } ( 46...Kc8 47.b4 Kd7 48.b5 Ke6 { 5.27/21 } ) 47.b4 { -4.48/18, } 47...Rh5 { 4.21/19, } ( 47...Kc8 48.Kb3 Kd7 49.b5 Ke6 { 4.77/19 } ) 48.b5 { -4.57/21, } ( 48.Rh1 Kc8 49.b5 Kd7 50.Kb3 { -4.21/19 } ) 48...Re5 { 4.02/22, } ( 48...Kc8 { 4.57/21 } ) 49.Rh1 { -4.02/22, } 49...Re2 { 4.98/21, } ( 49...Re4 50.c5 Re5 51.Rd1 Rd8 { 4.67/21 } ) 50.Kb3 { -5.49/22, } ( 50.Rhf1 Re1 { -4.98/21 } ) 50...Rfe8 { 4.23/20, } ( { Better is } 50...Rf2 51.Rxf2 Bxf2 52.Kc2 Bg3 { 5.49/22 } ) 51.c5 { -6.17/22, } ( 51.Nxf4 R2e3+ { -4.23/20 } ) 51...R8e5 { 2.7/26, Black is clearly better, } ( { Best is } 51...R2e3+ 52.Nxe3 fxe3 53.Rf7 e2 { 6.17/22 } ) 52.Rd3 $4 { -5.84/20, } ( 52.Nxf4 R2e3+ 53.Rxe3 Rxe3+ 54.Ka2 { -2.7/26 } ) 52...f3 { 0.07/22, Position is equal, } ( { Best is } 52...Re1 53.Rxe1 { 5.84/20 } ) 53.c6 $4 { -5.41/20, } ( { Best is } 53.Rhd1 Re8 54.Rxf3 Rf2 55.Rfd3 { -0.07/22 } ) 53...bxc6 { 5.89/20, Black is winning, } ( 53...Rf5 54.Nc3 Re7 55.Nd5 Ref7 { 5.41/20 } ) 54.bxc6 { -5.89/20, } 54...f2 { 7.56/20, } 55.c7+ { -7.22/19, } 55...Kb7 { 6.36/20, } ( 55...Kc8 56.Rc1 Re8 57.Ra1 Kb7 { 7.86/20 } ) 56.Rc1 
{ -7.97/20, } ( 56.Ka2 Re8 57.Rhd1 Rc8 58.Rb3+ { -6.36/20 } ) 56...Re8 { 7.97/20, } 57.Rd4 { -8.08/21, } ( 57.Ka2 Rc8 58.Rb3+ Ka8 59.Rf3 { -8.14/20 } ) 57...a5 { 8.08/21, } 58.Nf6 { -10.58/18, } ( 58.Rf1 Re1 59.Rxf2 Bxf2 60.Rd3 { -8.82/19 } ) 58...Rc8 { 8.61/20, } ( 58...Re1 59.Rdd1 f1=Q 60.Rxe1 Qxe1 { 10.58/18 } ) 59.Rd5 { -12.23/19, } ( 59.Rf1 Re1 60.Rxf2 Bxf2 61.Ra4 { -8.61/20 } ) 59...Rxc7 { 8.81/19, } ( 59...Re1 60.Rcc5 f1=Q 61.Rb5+ Kxc7 { 12.23/19 } ) 60.Rb5+ { -16.33/18, } ( 60.Rf1 Re1 61.Rxf2 Bxf2 62.Ka2 { -8.81/19 } ) 60...Ka6 { 16.33/18, } 61.Rxc7 { -23.35/18, } ( 61.Rd1 Re1 62.Rf5 Rxd1 63.Rf3 { -18.38/16 } ) 61...Kxb5 { 23.35/18, } 0-1[/pgn]
hgm
Posts: 28396
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Open Opening Book Standard (OOBS)

Post by hgm »

Note that Polyglot format is designed for encoding a 'cooked book'. That is, all it is supposed to do is answer the question: "given this position, which move should be played with what probability?". All tasks that led to determining the answer (such as raw statistics of the games containing the position) have already been performed "off line", i.e. during creation of the book. The format is designed to distribute finished books in a compact way.

So in particular the format is not designed for holding intermediate results during book building, which should still be elaborated on. For expanding the set of games from which a book is built, one should simply keep the PGN on which the original book was based, and concatenate it with the additional PGN games. (In building books with WinBoard I used a trick for that, though: it uses the 'learning fields' of the Polyglot book to store raw WDL statistics that, together with the weight, allow reconstruction of the complete statistics, so that WinBoard can later add more games to a book it created itself.)

One should thus view a Polyglot book as an executable, where the PGN file from which it was made is the source code. As we know, modifying a program by hex-editing the executable is not the recommended way. One alters the source code, and recompiles.

Renormalization of the weights on overflow is a bit of a non-issue. The weights are 16-bit, and can thus run up to 64K (65,535). Only a few positions very close to the initial one would ever occur more often than that in the game set.
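
To illustrate how small that renormalization step is, here is a minimal Python sketch. It assumes the weights are plain per-move play counts, and the name `fit_weights_u16` is made up for this example; it is not taken from Polyglot or WinBoard. The counts of one position are scaled down only if the largest one does not fit in 16 bits.

```python
U16_MAX = 0xFFFF  # largest value a 16-bit weight field can hold


def fit_weights_u16(counts):
    """Scale raw per-move counts so that every one fits in 16 bits.

    Counts that already fit are returned unchanged. Otherwise all of
    them are scaled by the same factor, and a move that was played at
    least once keeps a weight of at least 1 so it does not vanish.
    """
    biggest = max(counts, default=0)
    if biggest <= U16_MAX:
        return list(counts)
    return [max(1, c * U16_MAX // biggest) if c > 0 else 0 for c in counts]
```

Because every move of the position is scaled by the same factor, the relative play probabilities are preserved up to rounding.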

That the way Polyglot (or WinBoard) derives the weights from the WDL statistics is stupid is an entirely different issue, which has nothing to do with the format. In principle the desirability of playing a move does not just depend on how often it was played and with what result in actual games, but also on the statistics of later positions. A move can have good statistics because people did not often find the refutation (perhaps because the refutation has not been known for long), but if a refutation exists, it should still be considered a poor move, and not be played. The games on which the book is based should be considered as the rollouts of an MCTS tree.

That the base keys for computing the position hash are given as a lengthy table, rather than an algorithmic prescription, is definitely an annoyance. OTOH, the table is posted online, and anyone can copy it. WinBoard uses an algorithmic prescription for expanding the standard table for handling larger boards and unorthodox piece types. To cure this, using an entirely new and unrelated format is overkill, though. One could just keep the format, but with an altered prescription for deriving the position key. Existing Polyglot books would not be easily converted to the new key standard, but that would hold for any change in format. These books were not designed for further processing, and that includes conversion to other formats.
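
As a rough idea of what an "algorithmic prescription" for the base keys could look like, the Python sketch below generates them from the SplitMix64 generator with a fixed seed. This is an assumption made for illustration only: it is neither Polyglot's published table nor WinBoard's expansion scheme, the seed value is arbitrary, and keys produced this way would not match existing Polyglot books. Only the count of 781 keys follows the Polyglot layout.

```python
MASK64 = (1 << 64) - 1


def splitmix64_stream(seed, n):
    """Return the first n 64-bit outputs of SplitMix64 for this seed."""
    out = []
    state = seed & MASK64
    for _ in range(n):
        state = (state + 0x9E3779B97F4A7C15) & MASK64
        z = state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        out.append(z ^ (z >> 31))
    return out


# Same number of keys as Polyglot: 12 piece kinds x 64 squares,
# 4 castling rights, 8 en passant files, 1 side-to-move key.
N_KEYS = 12 * 64 + 4 + 8 + 1
SEED = 2022  # hypothetical; a real standard would have to fix this value

KEYS = splitmix64_stream(SEED, N_KEYS)


def piece_key(piece_index, square):
    """Base key for piece kind 0..11 standing on square 0..63."""
    return KEYS[64 * piece_index + square]
```

With a prescription like this, a larger board or extra piece types only need a bigger `N_KEYS`; the generator itself stays a few lines.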

Some book builders might consider it an advantage that the format in which they release their work is not easily 'decompiled'.
Steve Maughan
Posts: 1298
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: Open Opening Book Standard (OOBS)

Post by Steve Maughan »

phhnguyen wrote: Wed May 04, 2022 5:45 am...Speed: I don’t have any speed benchmark of OOBS at the moment but they should be significantly slower than Polyglot ones...
An engine needs to query the book before starting each search. Speed becomes extremely important if you're testing ideas by playing games-in-ten-seconds. What would be the overhead of 60 queries?
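
One way to get a first estimate of that overhead is to time 60 indexed lookups with Python's built-in `sqlite3` module. The sketch below is only a toy measurement: the table and column names (`Book`, `FEN`, `Win`, `Draw`, `Loss`) are assumptions for illustration and have not been checked against the real OOBS schema, the keys are synthetic strings rather than FENs, and the database lives in memory, so a book on disk may well behave differently.

```python
import sqlite3
import time


def time_lookups(n_rows=100_000, n_queries=60):
    """Return (seconds, hits) for n_queries indexed lookups in a toy book."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Book (FEN TEXT PRIMARY KEY, Win INTEGER, "
        "Draw INTEGER, Loss INTEGER)"
    )
    # Synthetic keys stand in for FEN strings; only the lookup cost matters.
    con.executemany(
        "INSERT INTO Book VALUES (?, ?, ?, ?)",
        ((f"position-{i}", i % 7, i % 5, i % 3) for i in range(n_rows)),
    )
    con.commit()

    start = time.perf_counter()
    hits = 0
    for i in range(n_queries):
        key = f"position-{(i * 7919) % n_rows}"
        row = con.execute(
            "SELECT Win, Draw, Loss FROM Book WHERE FEN = ?", (key,)
        ).fetchone()
        hits += row is not None
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed, hits
```

Calling `time_lookups()` and dividing the elapsed time by the number of queries gives a per-query figure that can be compared with the time budget of a ten-second game.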

Steve
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine
phhnguyen
Posts: 1525
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Open Opening Book Standard (OOBS)

Post by phhnguyen »

Ferdy wrote: Wed May 04, 2022 8:13 am It looks like every FEN has "0 1" at the end. Why not just make them EPDs?
From:
rnbqk2r/ppp2pp1/3p1B1p/2b1p3/2B1P3/2NP4/PPP2PPP/R2QK1NR b KQkq - 0 1
To:
rnbqk2r/ppp2pp1/3p1B1p/2b1p3/2B1P3/2NP4/PPP2PPP/R2QK1NR b KQkq -

Regarding the e.p. square in:


[fen]r1bq1rk1/2p2ppp/pbnp1n2/1p2p3/3PP3/1BP2N1P/PP3PP1/RNBQR1K1 b - d3 0 1[/fen]


Perhaps it is better to remove the d3 as there is no black pawn that can capture in that location.

Updated rule in the github repo that I tried to maintain.

Code:

16.1.3.4: En passant target square

The fourth field is the en passant target square. If there is no en passant
target square then the single character symbol "-" appears. If there is an en
passant target square then is represented by a lowercase file character
immediately followed by a rank digit. Obviously, the rank digit will be "3"
following a white pawn double advance (Black is the active color) or else be
the digit "6" after a black pawn double advance (White being the active color).

An en passant target square is given if and only if the last move was a pawn
advance of two squares and the active color has a legal en passant capture
move.
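
The two suggestions above (drop the "0 1" counters, and keep the en passant square only when a capture is possible) could be sketched in Python roughly as follows. The function names are made up for this example. Note one deliberate simplification: the rule asks for a legal en passant capture, while this sketch only checks that a pawn of the side to move stands next to the target square, so it ignores pins and checks.

```python
def fen_to_epd(fen):
    """Return the four EPD fields of a FEN, with a cleaned en passant field."""
    placement, side, castling, ep = fen.split()[:4]
    if ep != "-" and not _pawn_can_capture(placement, side, ep):
        ep = "-"
    return " ".join((placement, side, castling, ep))


def _pawn_can_capture(placement, side, ep):
    """True if a pawn of the side to move attacks the en passant square."""
    file_idx = ord(ep[0]) - ord("a")
    # The capturing pawn stands on rank 5 (White to move, target on rank 6)
    # or on rank 4 (Black to move, target on rank 3).
    pawn, rank = ("P", 5) if side == "w" else ("p", 4)
    row = _expand_rank(placement.split("/")[8 - rank])
    return any(
        0 <= f < 8 and row[f] == pawn for f in (file_idx - 1, file_idx + 1)
    )


def _expand_rank(rank_text):
    """Expand FEN digits into that many '.' so the rank has 8 characters."""
    return "".join("." * int(ch) if ch.isdigit() else ch for ch in rank_text)
```

Applied to the position with "d3" quoted above, `fen_to_epd` returns the same placement, side and castling fields followed by "-", because no black pawn stands on c4 or e4.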

I am developing a game analyzer where the positions encountered and analyzed (epd, move, ce, depth, pv, engname) are saved in a sqlite3 db. When positions from other games are found in the db I will just retrieve them, so no more engine analysis needs to be done. I can actually use OOBS in this regard, writing the game win/loss/draw stats as comments on the move made in the game and on the move recommended by the analyzing engine.

Code:

ce = centipawn evaluation
pv = principal variation 
engname = the engine id name that analyzes the position
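
A minimal Python sketch of such an analysis cache, using the built-in `sqlite3` module, might look like this. The column names follow the fields listed above; the table name `analysis`, the function names, and the choice of (epd, engname) as primary key are assumptions for illustration, not the analyzer's actual design.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the cache and return the connection."""
    con = sqlite3.connect(path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS analysis (
               epd     TEXT NOT NULL,
               engname TEXT NOT NULL,
               move    TEXT,
               ce      INTEGER,
               depth   INTEGER,
               pv      TEXT,
               PRIMARY KEY (epd, engname))"""
    )
    return con


def lookup(con, epd, engname):
    """Return (move, ce, depth, pv) for a cached position, or None."""
    return con.execute(
        "SELECT move, ce, depth, pv FROM analysis "
        "WHERE epd = ? AND engname = ?",
        (epd, engname),
    ).fetchone()


def store(con, epd, engname, move, ce, depth, pv):
    """Insert an analysis result, replacing any earlier one for this key."""
    con.execute(
        "INSERT OR REPLACE INTO analysis "
        "(epd, engname, move, ce, depth, pv) VALUES (?, ?, ?, ?, ?, ?)",
        (epd, engname, move, ce, depth, pv),
    )
    con.commit()
```

The analyzer would call `lookup` first and run the engine only when it returns `None`, then `store` the fresh result.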

Good idea! Thanks.
Ferdy wrote: Wed May 04, 2022 10:31 am I have downloaded the book-mb3.54.obs.db3 and tried to find the stats for this position.


[fen]rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1[/fen]


No record is returned.
Your FEN string should be rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1
(the en passant square was missing)
