jja: convert CTG books to PolyGlot format (and more!)


chesskobra
Posts: 194
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: jja: convert CTG books to PolyGlot format (and more!)

Post by chesskobra »

Thanks a lot. Identical sha sum looks impressive. I will test on the weekend (especially the pgn output, which is what I am very interested in). Have been a little bit busy, so have not tested.
alpltl
Posts: 57
Joined: Tue Mar 14, 2023 3:04 pm
Location: Berlin
Full name: Ali Polatel

Re: jja: convert CTG books to PolyGlot format (and more!)

Post by alpltl »

chesskobra wrote: Wed Jun 21, 2023 12:55 pm Thanks a lot. Identical sha sum looks impressive. I will test on the weekend (especially the pgn output, which is what I am very interested in). Have been a little bit busy, so have not tested.
I'm so glad to hear from you and truly appreciate your enthusiasm to test our recent updates over the weekend. Your continuous support and dedication in identifying and testing for bugs in JJA has been immensely valuable to our project.

We understand everyone has their commitments and we really appreciate the time and effort you're taking out of your busy schedule to test our PGN output feature. This is an area we've been working hard on, and your input will definitely be instrumental in ensuring its quality and effectiveness.

I'm eagerly looking forward to hearing about your testing experiences and any feedback you may have. Thank you once again for being an integral part of our project.
Caissa-AI, Caissa-Test, and Caissa-X on LiChess
ChessWoB: Chess without Boundaries
jja: Jin, Jîyan, Azadî!
Follow @alip on Mastodon!
chesskobra
Posts: 194
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: jja: convert CTG books to PolyGlot format (and more!)

Post by chesskobra »

Could someone clarify whether my understanding of what a "line" in a book is, is correct?

I think a line is a maximal sequence of half moves starting from a given position (in the case of an opening book, starting from the start position) seen in a database that conforms to the min-game parameter. For example, if min-game is 3, and in the database there are at least 3 games with move sequence a,b,c,d, but at most 2 games with a,b,c,d,x for any x, then a,b,c,d is a line, but a,b,c,d,x is not for any x. Also, a,b is not a line even if it appears at least 3 times, since it is not a maximal sequence appearing at least 3 times. So maximal means a sequence that cannot be extended while still satisfying the min-game parameter. (There is still the question of what to do if the position after a,b,c,d,x,y,z appears at least 3 times in the database, but with different move orders, while many of the intermediate positions do not appear at least 3 times. But for now let us ignore this.)

In this sense, I would like to see lines in the pgn output from a book: in the example above, I would like it to print a,b,c,d as a line. Also, if min-game is 1, then all distinct complete games should be in the output.
alpltl
Posts: 57
Joined: Tue Mar 14, 2023 3:04 pm
Location: Berlin
Full name: Ali Polatel

Re: jja: convert CTG books to PolyGlot format (and more!)

Post by alpltl »

chesskobra wrote: Thu Jun 22, 2023 2:51 pm
Could someone clarify whether my understanding of what a "line" in a book is, is correct?

I think a line is a maximal sequence of half moves starting from a given position (in the case of an opening book, starting from the start position) seen in a database that conforms to the min-game parameter. For example, if min-game is 3, and in the database there are at least 3 games with move sequence a,b,c,d, but at most 2 games with a,b,c,d,x for any x, then a,b,c,d is a line, but a,b,c,d,x is not for any x. Also, a,b is not a line even if it appears at least 3 times, since it is not a maximal sequence appearing at least 3 times. So maximal means a sequence that cannot be extended while still satisfying the min-game parameter. (There is still the question of what to do if the position after a,b,c,d,x,y,z appears at least 3 times in the database, but with different move orders, while many of the intermediate positions do not appear at least 3 times. But for now let us ignore this.)

In this sense, I would like to see lines in the pgn output from a book: in the example above, I would like it to print a,b,c,d as a line. Also, if min-game is 1, then all distinct complete games should be in the output.
In the context of JJA, a 'line' refers to a sequence of half-moves (i.e., plies) originating from a specific position. Each half-move in the sequence leads to a new position on the board. The lines in an opening book represent different potential paths that a game can take, starting from the specified position.

The min-game parameter, used during the creation of the opening book from a database, establishes the minimum number of games in the database that must follow a specific sequence of half-moves for it to be included in the book as a line. If a sequence doesn't meet this criterion, it is not considered a valid line and will not be included in the book.

To use your example, if min-game is 3, then a line 'a, b, c, d' must occur in at least three games in the database to be included in the book. If there are only two games that continue with 'a, b, c, d, x' for any half-move x, then 'a, b, c, d, x' is not a valid line because it does not meet the min-game requirement. Therefore, 'a, b, c, d' is considered the maximal line in this context. 'a, b' is not considered a line even though it appears at least three times, because it is not maximal: it can be extended to 'a, b, c, d'.
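The definition discussed above can be sketched in a few lines of Python. This is only an illustration of the "maximal sequence that occurs at least min-game times" idea, not how jja builds or reads books; the function name `maximal_lines` and the toy move names are made up for the example.

```python
from collections import Counter


def maximal_lines(games, min_game):
    """Return move prefixes seen in at least `min_game` games that cannot be
    extended by one more move while still reaching `min_game` games."""
    counts = Counter()
    for game in games:
        # Every prefix of a game counts once for that game.
        for ply in range(1, len(game) + 1):
            counts[tuple(game[:ply])] += 1

    frequent = {prefix for prefix, n in counts.items() if n >= min_game}
    # A frequent prefix is not maximal if some frequent prefix extends it.
    extended = {prefix[:-1] for prefix in frequent if len(prefix) > 1}
    return sorted(frequent - extended)


# Toy database: three games share a,b,c,d; only two continue with x.
games = [
    ["a", "b", "c", "d", "x"],
    ["a", "b", "c", "d", "x"],
    ["a", "b", "c", "d", "y"],
]

print(maximal_lines(games, 3))  # only a,b,c,d survives
print(maximal_lines(games, 1))  # every distinct complete game
```

With min-game 3 the sketch reports a,b,c,d as the only line, and with min-game 1 it reports the two distinct complete games, matching the reading given in the post.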

If the min-game parameter is set to 1, then all unique complete games from the database will indeed be included in the opening book, as each unique game satisfies the condition of appearing at least once. Note, as we've seen before, that a single game may produce more than one variation when there are move repetitions in the game.

The max-ply option, which is used when querying the opening book, limits the depth of the variations considered, in terms of plies. For instance, if you have an opening line '1.e4 e5 2.Nf3 Nc6 3.Bb5', and you set max-ply to 3 when querying the book, the variations returned will only consider the first three half-moves, that is, up to '1.e4 e5 2.Nf3'.
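As a small illustration of the ply limit, here is a Python sketch that cuts a line at a given number of half-moves and prints it with move numbers. The helper names are hypothetical and this says nothing about how jja applies max-ply internally.

```python
def truncate_line(san_moves, max_ply):
    """Keep only the first `max_ply` half-moves of a line."""
    return san_moves[:max_ply]


def format_line(san_moves):
    """Render half-moves with move numbers, e.g. '1.e4 e5 2.Nf3'."""
    parts = []
    for index, move in enumerate(san_moves):
        if index % 2 == 0:
            # White's move: prefix with the full-move number.
            parts.append(f"{index // 2 + 1}.{move}")
        else:
            parts.append(move)
    return " ".join(parts)


line = ["e4", "e5", "Nf3", "Nc6", "Bb5"]
print(format_line(truncate_line(line, 3)))  # 1.e4 e5 2.Nf3
```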

Regarding your request to see specific lines in the PGN output, JJA can indeed produce PGN files from an opening book, and these will reflect the lines stored within the book. If the line 'a, b, c, d' is included in the book and max-ply is set high enough to include it, it will appear in the output.

In summary, your understanding of a line in an opening book is largely correct. A line is a maximal sequence of half-moves from a given position that appears at least min-game times in the game database used to create the book. The max-ply parameter then restricts the depth of the lines when querying the book. The output will then depend on the structure of the book (dictated by min-game) and the query parameters (such as max-ply).
chesskobra
Posts: 194
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: jja: convert CTG books to PolyGlot format (and more!)

Post by chesskobra »

I have uploaded a test to gitlab https://gitlab.com/beejaganita/jja-tests
I am seeing a few extra lines in the pgn output with -min-game 1 option. Please take a look. I am still a bit uncomfortable getting around a gitlab repo.
chesskobra
Posts: 194
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: jja: convert CTG books to PolyGlot format (and more!)

Post by chesskobra »

Please ignore the previous message. I have now created a very small test case with 2 games in the original pgn and 4 in the jja output. https://gitlab.com/beejaganita/jja-tests/

I think if a position arises by two paths, polyglot may create more lines that may not exist in the original pgn. For example, if there are two games a,b,c,x and c,b,a,y, then the output pgn from the polyglot book will have 4 lines: abcx, abcy, cbay, cbax, since the position after abc also arises after cba, which makes sense.
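The effect can be reproduced with a toy model in Python. It assumes, purely for illustration, that a position is identified by the set of moves played so far (so a,b,c and c,b,a reach the same position) and that no move repeats within a game; a real book would key positions by a hash of the board instead. The function names are made up.

```python
def build_book(games):
    """Map each position to the set of moves played from it, forgetting
    which game or move order led to that position."""
    book = {}
    for game in games:
        position = frozenset()
        for move in game:
            book.setdefault(position, set()).add(move)
            position = position | {move}
    return book


def enumerate_lines(book, position=frozenset(), prefix=()):
    """Walk the book from the start position and yield every complete line."""
    moves = book.get(position)
    if not moves:
        yield prefix
        return
    for move in sorted(moves):
        yield from enumerate_lines(book, position | {move}, prefix + (move,))


games = [["a", "b", "c", "x"], ["c", "b", "a", "y"]]
for line in enumerate_lines(build_book(games)):
    print("".join(line))
```

Two input games produce four output lines (abcx, abcy, cbax, cbay), because the shared position after abc and cba offers both x and y regardless of the path taken to reach it.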
alpltl
Posts: 57
Joined: Tue Mar 14, 2023 3:04 pm
Location: Berlin
Full name: Ali Polatel

Re: jja: convert CTG books to PolyGlot format (and more!)

Post by alpltl »

chesskobra wrote: Tue Jun 27, 2023 12:28 pm Please ignore the previous message. I have now created a very small test case with 2 games in the original pgn and 4 in the jja output. https://gitlab.com/beejaganita/jja-tests/

I think if a position arises by two paths, polyglot may create more lines that may not exist in the original pgn. For example, if there are two games a,b,c,x and c,b,a,y, then the output pgn from the polyglot book will have 4 lines: abcx, abcy, cbay, cbax, since the position after abc also arises after cba, which makes sense.
Thanks a lot for testing. That is correct. Lines with transpositions will appear as multiple lines, and this is due to the nature of the PolyGlot opening book format. Each entry stores a key of the position, which is a Zobrist hash (i.e., a random-looking 64-bit number), a compact 16-bit move representation, a weight that sets the move's priority, and a learn field which is commonly unused. Hence there is no information in a PolyGlot book entry about the previous position or the move that led to this position. Without this information it is not really possible to eliminate such transpositions in the PGN output. Having said that, I think it makes sense to have the transpositions in the PGN for a clearer view of the book.
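To make the entry layout concrete, here is a minimal Python sketch that packs and unpacks one entry and decodes the 16-bit move. It assumes the commonly documented PolyGlot layout (big-endian 64-bit key, 16-bit move, 16-bit weight, 32-bit learn, 16 bytes in total) and the usual bit packing of the move field. It is not jja's code, and the helper names are made up. The sample numbers are taken from the dump output shown later in this thread.

```python
import struct

# Assumed layout: big-endian u64 key, u16 move, u16 weight, u32 learn.
ENTRY = struct.Struct(">QHHI")
FILES = "abcdefgh"
PROMOTION = ["", "n", "b", "r", "q"]  # values 5..7 are not expected in a valid book


def decode_move(move):
    """Decode the 16-bit move field into a UCI-like string.
    Castling is left in its raw 'king takes own rook' form (e.g. e1h1)."""
    to_file = move & 7
    to_rank = (move >> 3) & 7
    from_file = (move >> 6) & 7
    from_rank = (move >> 9) & 7
    promotion = (move >> 12) & 7
    uci = f"{FILES[from_file]}{from_rank + 1}{FILES[to_file]}{to_rank + 1}"
    return uci + PROMOTION[promotion]


def parse_entry(raw):
    """Unpack one 16-byte entry; note there is no parent position in it."""
    key, move, weight, learn = ENTRY.unpack(raw)
    return {"key": key, "move": decode_move(move), "weight": weight, "learn": learn}


raw = ENTRY.pack(9384546495678726550, 3364, 65520, 0)
print(parse_entry(raw))
```

The decoded entry carries only the key, the move, and its weight, which is why two move orders reaching the same key cannot be told apart when writing PGN.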
alpltl
Posts: 57
Joined: Tue Mar 14, 2023 3:04 pm
Location: Berlin
Full name: Ali Polatel

Re: jja: convert CTG books to PolyGlot format (and more!)

Post by alpltl »

Also note, the latest git has the new subcommands dump and restore, which you can use to gain better insight into PolyGlot opening books. The latest git build which includes this feature is here for linux-glibc, linux-musl, and windows respectively. Below is a short demonstration of what you can do with them:

Code:

⇒  jja dump Perfect2023.bin | shuf | head -n3 # dump each entry as a JSON array to standard output.
[1180666540095889894,788,1,0]
[5141607501082753450,3742,1,0]
[5260937186738805552,3364,35280,0]
⇒  jja dump Perfect2023.bin | grep $(jja hash -pe4 | awk '{print $6}') # Look for a position in the dump.
[9384546495678726550,3364,65520,0]
[9384546495678726550,3234,43680,0]
[9384546495678726550,3242,1,0]
[9384546495678726550,3372,1,0]
⇒  jja find -pe4 Perfect2023.bin # Compare with the output above.
+---+------+--------+-------+
| * | UCI  | Weight | Learn |
+---+------+--------+-------+
| 1 | e7e5 | 65520  | 0     |
+---+------+--------+-------+
| 2 | c7c5 | 43680  | 0     |
+---+------+--------+-------+
| 3 | c7c6 | 1      | 0     |
+---+------+--------+-------+
| 4 | e7e6 | 1      | 0     |
+---+------+--------+-------+
⇒  jja dump Perfect2023.bin | awk -F, '$3 < 1000 {next} 1' | jja restore tmp1.bin # Create a new book only preserving entries with weight greater than or equal to 1000
Creating output PolyGlot opening book...
Success creating output PolyGlot opening book.
Reading PolyGlot dump from standard input...
Success reading 525 PolyGlot book entries from standard input into memory.
Writing output PolyGlot opening book...
⇒  { jja info Perfect2023.bin ; jja info tmp1.bin } | grep -i num # Compare number of entries in both books.
Number of entries: 3127
Number of entries: 525
⇒  jja dump Perfect2023.bin |\
        perl -MJSON -lane '$a = decode_json($_); $a->[2] = int(rand(65536)); print to_json($a, {ascii => 1, pretty => 0})' |\
        jja restore tmp2.bin # Randomize weights of all entries in the book.
Creating output PolyGlot opening book...
Success creating output PolyGlot opening book.
Reading PolyGlot dump from standard input...
Success reading 3127 PolyGlot book entries from standard input into memory.
Writing output PolyGlot opening book...
⇒  { jja find Perfect2023.bin; jja find tmp2.bin } # See how weights are randomized in the second book, and see how they are reverse-sorted by weight.
+---+------+--------+-------+
| * | UCI  | Weight | Learn |
+---+------+--------+-------+
| 1 | e2e4 | 65520  | 0     |
+---+------+--------+-------+
| 2 | g1f3 | 17035  | 0     |
+---+------+--------+-------+
| 3 | d2d4 | 39312  | 0     |
+---+------+--------+-------+
| 4 | c2c4 | 9172   | 0     |
+---+------+--------+-------+
+---+------+--------+-------+
| * | UCI  | Weight | Learn |
+---+------+--------+-------+
| 1 | d2d4 | 55843  | 0     |
+---+------+--------+-------+
| 2 | g1f3 | 35509  | 0     |
+---+------+--------+-------+
| 3 | c2c4 | 10182  | 0     |
+---+------+--------+-------+
| 4 | e2e4 | 2297   | 0     |
+---+------+--------+-------+
As you can see, the dump and restore subcommands provide an efficient way to manipulate PolyGlot opening books. Here I have shown two examples: one where we filter out entries whose weight is below a certain bound, and another where we randomize the weights of all entries in the book. This approach is versatile and can be used to perform many different operations on PolyGlot opening books easily and efficiently.
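For those who prefer Python over awk or perl, the weight filter from the demonstration can be sketched as follows. It assumes only what the dump output above shows, namely one JSON array `[key, move, weight, learn]` per line; the function name `filter_by_weight` is made up for this example.

```python
import json


def filter_by_weight(lines, min_weight):
    """Yield dump lines whose weight (third element) is at least `min_weight`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, move, weight, learn = json.loads(line)
        if weight >= min_weight:
            # Re-serialize compactly, one array per line, like the input.
            yield json.dumps([key, move, weight, learn], separators=(",", ":"))


sample = [
    "[1180666540095889894,788,1,0]",
    "[5141607501082753450,3742,1,0]",
    "[5260937186738805552,3364,35280,0]",
]
for entry in filter_by_weight(sample, 1000):
    print(entry)
```

A script wrapping this function and reading from standard input could take the place of the awk step between dump and restore; Python integers are arbitrary precision, so the 64-bit keys survive the JSON round trip unchanged.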

This version also has support for the BrainLearn experience file format in addition to all the book formats we support, and the dump/restore functionality may be used for BrainLearn experience files as well. I am going to release this as version 0.6.0 after testing and translation updates. Stay tuned :-)
alpltl
Posts: 57
Joined: Tue Mar 14, 2023 3:04 pm
Location: Berlin
Full name: Ali Polatel

Re: jja: convert CTG books to PolyGlot format (and more!)

Post by alpltl »

We're excited to announce that JJA has reached another milestone with the release of version 0.6.0! Following our previous version 0.5.0, this latest update brings substantial enhancements, critical fixes, and notable new features that are sure to make your chess analysis even more powerful. Read more about it in the blog post here.

Download

- Windows: jja-0.6.0.exe, sha512sum, signature
- Linux-Glibc: jja-0.6.0-glibc.bin, sha512sum, signature
- Linux-Musl: jja-0.6.0-musl.bin, sha512sum, signature

To build from source, use

Code:

cargo install jja
ChangeLog
## 0.6.0

- `jja::pgnfilt::Operator` and `jja::pgnfilt::LogicalOperator` implement `Eq` as
well as `PartialEq` now.
- new `restore` command to accompany the `dump` command which restores JSON
serialized PolyGlot or BrainLearn file entries into the given output file.
- fix `SIGPIPE` handling on UNIX systems so that `jja` does not panic when the
output is piped to another program such as a pager.
- find learned `-z <HASH>, --hash=<HASH>` to query PolyGlot opening books and
BrainLearn experience files by Zobrist hash.
- readonly support for BrainLearn experience file format. The subcommands `info`,
`dump`, and `find` are able to handle files in BrainLearn experience file format
with the extension `.exp`.
- improve polyglot key lookup by reading only the key rather than the whole entry
from file. The new public function `polyglotbook::PolyGlotBook::read_book_key` is
used for that.
- new `dump` command to dump the full contents of a PolyGlot opening book or a PGN
file. The dump format of the PolyGlot opening book is JSON, whereas for PGN files
this is CSV.
- **breaking change**: `polyglotbook::PolyGlotBook::lookup_moves` has been changed
to take a zobrist hash of a chess position as an argument rather than the
`shakmaty::Chess` position itself.
- hash learned `--signed` to print Zobrist hashes as signed decimal numbers.
- new module `jja::file` which exports utilities for binary file i/o.
- **breaking change**: `jja::polyglot::entry_{from,to}_file` have been renamed to
`jja::polyglot::bin_entry_{from,to}_file`.
- use the XorShift random number generator to randomly pick moves during book
matches using the match command. This algorithm is cryptographically insecure but
is very fast.
- fix the match command panicking in certain cases when there is a book lookup miss.
- Print more detailed build information on long version, `--version` output.
- Memory map CTG files to speed up random access in return for increased memory
usage. This brings a new dependency upon the crate `memmap`.
- use buffered read/write in interacting with PolyGlot books which reduces the
read/write system calls by a huge margin and thereby improves performance. The
`book` member of `PolyGlotBook` is now a `BufReader<File>` rather than a `File`
which is a **breaking change**.
- important fix for calculating PolyGlot compatible Zobrist hashes wrt.
en-passant legality. In PolyGlot format, en-passant moves are only pseudo-legal
whereas previously the `jja::hash::zobrist_hash` function mistakenly checked
for full legality.
- **breaking change**: new type `ctgbook::CtgTree` which holds the new return value
of the functions `CtgBook::extract_all`, and `CtgBook::extract_all2`. The `tree`
element of `CtgEntry` which was used by these functions has also been dropped and
the functions have been implemented in a much more performant way using
considerably less memory. As a result, most ctg to abk/polyglot conversions are
almost twice as fast.
- **breaking change**: `ctgbook::CtgEntry` member `uci`'s type has been changed from
`String` to `shakmaty::uci::Uci`, and the `nags` member has been renamed to `nag`
and its type has been changed from `Option<String>` to `Option<Nag>`.
- ctg move comment entries were parsed and silently discarded; they are no longer
parsed. Moreover, `ctgbook::CtgEntry` no longer has a `comment` member, which is a
**breaking change**.
- **breaking change**: `ctg::colored_uci` function now accepts a
`shakmaty::uci::Uci` rather than a UCI string. The order of the function arguments
is also changed.
- ctg: new type `Ctg::Nag` to abstract CTG NAG (Numeric Annotation Glyph) entries
- obk entries with zero weight, are now assigned weight `1` during PolyGlot
conversion to prevent skipping these entries. We plan to make this
user-configurable in the future.
- use the standard `IsTerminal` trait, and drop the dependency on `is_terminal`
crate
- use the standard `to_le_bytes()`, `to_be_bytes()`, and drop the dependency on
`byteorder` crate
- hash learned `-e=<MODE>`, `--enpassant-mode=<MODE>` to select when to include
the en-passant square in Zobrist hash calculation.
- important fix for encoding castling moves during book conversions to the
PolyGlot format. See details in the respective issue.
- **breaking change**: `polyglot.from_uci` now expects a `bool` argument
to correctly encode castling positions by determining whether the king to move is
on their starting square.
- bump minimum supported Rust version (MSRV) from `1.64` to `1.70` due to `shakmaty` bump
- drop dependency on the unmaintained and insecure `chrono` crate
- `PolyGlotBook` has two new public functions: `find_book_key`, and
`read_book_entry`
- hash learned `-x`, `--hex` to print hash as hexadecimal rather than decimal
- upgrade `pgn-reader` crate from `0.24` to `0.25`
- upgrade `shakmaty` crate from `0.25` to `0.26`
- optimize various ctg functions
- info learned to print the total number of positions for `ctg` opening books
- **breaking change**: `CtgBook::num_entries` has been renamed to `total_positions`.
- **breaking change**: `CtgBook::total_pages` function accepts a reference to
`self`, rather than consuming `self`. The return type is `usize` now.
- **breaking change**: Drop `close` functions of `CtgBook` and `PolyGlotBook`,
improve `CtgBook` to close the cto file immediately after open, and keep a `File`,
rather than an `Option<File>` in `CtgBook`. The now unused `path` element of
`CtgBook` is also dropped.
- find: simplify & optimize polyglot entry lookup
- display license, version and author information in "--help" output
- set minimum supported Rust version (MSRV) to "1.64" as determined by "cargo-msrv"
- fix a bug which caused key and epd to be displayed incorrectly in editor screen
- edit: implement "--rescale" for PolyGlot books. When specified, edit will rescale
weights of all entries in the book, rather than a single entry. This is useful to
quickly correct/optimize PolyGlot books which were generated without weight scaling.
- fix build with "i18n" feature disabled
- upgrade "tempfile" crate from `3.5` to `3.6`
- upgrade "rust-embed" crate from 6.6 to 6.7
- upgrade "once_cell" crate from 1.17 to 1.18
- upgrade "ctrlc" crate from 3.3 to 3.4
- important fix for CTG to PolyGlot weight conversion which caused all entries in
CTG books with missing performance information to be skipped from the output
PolyGlot book. Read more about it in the respective issue.
- make learned `-H`, `--hashcode` argument to skip duplicate games based on the
HashCode PGN tag. PGN files may be tagged using "pgn-extract --addhashcode"
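
As an illustration of the XorShift item in the changelog, here is a minimal Python sketch of weighted move selection driven by a xorshift generator. The changelog does not say which variant jja uses, so the 64-bit 13/7/17 shift triple below is an assumption, and the function names are made up. The sample weights are taken from the find output earlier in the thread.

```python
MASK64 = (1 << 64) - 1


def xorshift64(state):
    """One step of a 64-bit xorshift generator (assumed 13/7/17 variant).
    The state must be non-zero. Fast, but not cryptographically secure."""
    state ^= (state << 13) & MASK64
    state ^= state >> 7
    state ^= (state << 17) & MASK64
    return state


def pick_move(entries, state):
    """Pick a move with probability proportional to its weight.
    `entries` is a list of (uci, weight); returns (uci, new_state)."""
    total = sum(weight for _, weight in entries)
    if total == 0:
        raise ValueError("no entry has a positive weight")
    state = xorshift64(state)
    target = state % total  # slight modulo bias, acceptable for a sketch
    for uci, weight in entries:
        if target < weight:
            return uci, state
        target -= weight
    raise AssertionError("unreachable: target is always below the total weight")


entries = [("e7e5", 65520), ("c7c5", 43680), ("c7c6", 1), ("e7e6", 1)]
state = 0x9E3779B97F4A7C15  # arbitrary non-zero seed
for _ in range(5):
    move, state = pick_move(entries, state)
    print(move)
```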
alpltl
Posts: 57
Joined: Tue Mar 14, 2023 3:04 pm
Location: Berlin
Full name: Ali Polatel

Re: jja: convert CTG books to PolyGlot format (and more!)

Post by alpltl »

Small note: for those who are interested, I am microblogging my progress on jja, opening book generation, and other chess programming experiments on my Mastodon here. Feel free to follow if you find it interesting.