FEN compression

lucasart · Post by **lucasart** » Wed Mar 17, 2021 1:35 am

I was looking at the NNUE training code for compressing FEN, which uses Huffman encoding for pieces. See PackedSfen in https://github.com/glinscott/nnue-pytor ... _formats.h

But I think we can do better than this, using pdep (for compression) and pext (for decompression).

Compression algorithm

Code: Select all

itemname                   | bits           | definition
--------------------------------------------|-----------
occupied                   | 64             |                  
pdep(pawns, occupied)      | cnt(occupied)  | 
pdep(knights, remaining)   | cnt(remaining) | remaining = occupied & ~pawns
pdep(bishops, remaining)   | cnt(remaining) | remaining &= ~bishops
pdep(rooks, remaining)     | cnt(remaining) | remaining &= ~rooks        
pdep(queens, remaining)    | cnt(remaining) | remaining &= ~queens
kings = remaining          | 0              |
turn                       | 1              |
ep                         | 1              |
pdep(epSquare, candidates) | cnt(candidates)| candidates = ((white & pawns & rank4) << 8) | ((black & pawns & rank6) >> 8)
pdep(castleRooks, rooks)   | cnt(rooks)     | castleRooks = rooks with associated castling rights (any color)
rule50                     | 7              |
fullMove                   | 10             | useless: conveys no position information

Note the use of castleRooks which handles Chess960 nicely (and Chess as a particular case of Chess960).

It's difficult to put upper bounds on these cnt(). And those will tend to be way higher than actual averages on real samples. Only implementing it and compressing large random FEN collections will tell the practical compression ratio.

Decompression algorithm
Left as an exercise to the reader

Is this useful? Absolutely not. But it's fun!

mar · Post by **mar** » Wed Mar 17, 2021 2:31 am

I don't know about PDEP/PEXT but from the docs it seems you actually want pext for compression and pdep for decompression?
Nice idea - the point is the trainer compresses FENs this way in memory I suppose.
Where do you handle (piece) color? I guess you could easily encode say black pieces in a similar way of what you already do.

Your PEXT/PDEP should be faster, because decompressing Huffman bit by bit is not the best idea ever; even though most data will be empty squares.
One could probably do better by using a simple LUT, say grab byte, lookup, store symbols, return excess bits to accumulator, repeat.
Maybe even multiple symbols at once might work, but would it be worth the effort and the extra complexity? probably not, assuming it's not really about performance.

AndrewGrant · Post by **AndrewGrant** » Wed Mar 17, 2021 2:39 am

Not nearly as good as yours, although trivial to decompress and matches my internal use during training. Load times are pretty fast, to the point where speeding it up is a non-factor for me, but I imagine your scheme would be faster, trading off disk reads for decompression.

Code: Select all

Occupancy     : 8-bytes
Evaluation    : 2-bytes
Turn          : 1-bytes
White King Sq : 1-bytes
Black King Sq : 1-bytes
Piece Count   : 1-bytes
Packed Pieces : 1-bytes per two pieces (16 max, 12 average)

= Average of 26-bytes, max of 30-bytes.

Code: Select all

    while (fgets(line, 256, fin) != NULL) {

        boardFromFEN(board, line, 0);

        char *tail   = strstr(line, "] ");
        int16_t eval = atoi(tail + strlen("] "));
        int8_t turn  = board->turn;

        uint64_t white  = board->colours[WHITE];
        uint64_t black  = board->colours[BLACK];
        uint64_t pieces = white | black;

        uint8_t count = popcount(pieces);
        uint8_t wksq  = getlsb(white & board->pieces[KING]);
        uint8_t bksq  = getlsb(black & board->pieces[KING]);

        uint8_t types[32] = {0};
        uint8_t packed[16] = {0};

        fwrite(&pieces, sizeof(uint64_t), 1, fout);
        fwrite(&eval,   sizeof(int16_t ), 1, fout);
        fwrite(&turn,   sizeof(uint8_t ), 1, fout);
        fwrite(&wksq,   sizeof(uint8_t ), 1, fout);
        fwrite(&bksq,   sizeof(uint8_t ), 1, fout);
        fwrite(&count,  sizeof(uint8_t ), 1, fout);

        for (int i = 0; pieces; i++) {
            int sq = poplsb(&pieces);
            types[i] = encode_piece(board->squares[sq]);
        }

        for (int i = 0; i < 16; i++)
            packed[i] = pack_pieces(types[i*2], types[i*2+1]);

        fwrite(packed, sizeof(uint8_t), (count + 1) / 2, fout);
    }

Guess I'll add that I have Ryzens, so PEXT/PDEP are kinda no-gos

lucasart · Post by **lucasart** » Wed Mar 17, 2021 4:07 am

mar wrote: ↑Wed Mar 17, 2021 2:31 am I don't know about PDEP/PEXT but from the docs it seems you actually want pext for compression and pdep for decompression?
Nice idea - the point is the trainer compresses FENs this way in memory I suppose.
Where do you handle (piece) color? I guess you could easily encode say black pieces in a similar way of what you already do.

Good points. Forgot color and inverted pdep/pext. For color, we just need cnt(occupied) bits to store pext(white, occupied) (and deduce black as occupied & ~white).

Corrected version

Code: Select all

itemname                   | bits           | definition
--------------------------------------------|-----------
occupied                   | 64             |                  
pext(white,occupied)       | cnt(occupied)  | 
black = occupied & ~white  | 0              |
pext(pawns, occupied)      | cnt(occupied)  | 
pext(knights, remaining)   | cnt(remaining) | remaining = occupied & ~pawns
pext(bishops, remaining)   | cnt(remaining) | remaining &= ~bishops
pext(rooks, remaining)     | cnt(remaining) | remaining &= ~rooks        
pext(queens, remaining)    | cnt(remaining) | remaining &= ~queens
kings = remaining          | 0              |
turn                       | 1              |
ep                         | 1              |
pext(epSquare, candidates) | cnt(candidates)| candidates = ((white & pawns & rank4) << 8) | ((black & pawns & rank6) >> 8)
pext(castleRooks, rooks)   | cnt(rooks)     | castleRooks = rooks with associated castling rights (any color)
rule50                     | 7              |
fullMove                   | 10             | useless: conveys no position information

lucasart · Post by **lucasart** » Wed Mar 17, 2021 5:04 am

AndrewGrant wrote: ↑Wed Mar 17, 2021 2:39 am Not nearly as good as yours, although trivial to decompress and matches my internal use during training. Load times are pretty fast, to the point where speeding it up is a non-factor for me, but I imagine your scheme would be faster, trading off disk reads for decompression.

Code: Select all

Occupancy     : 8-bytes
Evaluation    : 2-bytes
Turn          : 1-bytes
White King Sq : 1-bytes
Black King Sq : 1-bytes
Piece Count   : 1-bytes
Packed Pieces : 1-bytes per two pieces (16 max, 12 average)

= Average of 26-bytes, max of 30-bytes.

Code: Select all

    while (fgets(line, 256, fin) != NULL) {

        boardFromFEN(board, line, 0);

        char *tail   = strstr(line, "] ");
        int16_t eval = atoi(tail + strlen("] "));
        int8_t turn  = board->turn;

        uint64_t white  = board->colours[WHITE];
        uint64_t black  = board->colours[BLACK];
        uint64_t pieces = white | black;

        uint8_t count = popcount(pieces);
        uint8_t wksq  = getlsb(white & board->pieces[KING]);
        uint8_t bksq  = getlsb(black & board->pieces[KING]);

        uint8_t types[32] = {0};
        uint8_t packed[16] = {0};

        fwrite(&pieces, sizeof(uint64_t), 1, fout);
        fwrite(&eval,   sizeof(int16_t ), 1, fout);
        fwrite(&turn,   sizeof(uint8_t ), 1, fout);
        fwrite(&wksq,   sizeof(uint8_t ), 1, fout);
        fwrite(&bksq,   sizeof(uint8_t ), 1, fout);
        fwrite(&count,  sizeof(uint8_t ), 1, fout);

        for (int i = 0; pieces; i++) {
            int sq = poplsb(&pieces);
            types[i] = encode_piece(board->squares[sq]);
        }

        for (int i = 0; i < 16; i++)
            packed[i] = pack_pieces(types[i*2], types[i*2+1]);

        fwrite(packed, sizeof(uint8_t), (count + 1) / 2, fout);
    }

Guess I'll add that I have Ryzens, so PEXT/PDEP are kinda no-gos

This is already better, and simpler, than the one from NNUE folks. And yes, pext/pdep are a massive pain to compute in software. Given that Intel is being obsoleted by AMD, and x86 is also being obsoleted by ARM, building the future on pext/pdep is probably not such a good idea…

Dann Corbit · Post by **Dann Corbit** » Wed Mar 17, 2021 5:58 am

I believe Uri Blass wrote a FEN compressor that squeezes a fen below 165 bits

lucasart · Post by **lucasart** » Wed Mar 17, 2021 6:37 am

Dann Corbit wrote: ↑Wed Mar 17, 2021 5:58 am I believe Uri Blass wrote a FEN compressor that squeezes a fen below 165 bits

I ran the experience with my pext algorithm, playing 100 games (demolito vs. itself at depth=8), and computing the average compressed size for all encountered positions. I ran it twice: with and without adjudication (no adjudication is favorable, of course, as games end in long sequence of positions that contain little information with few pieces):
* no adjudication: avg = 138 bits
* resign (3 moves 700bp) + draw (8 moves 10cp): avg = 142 bits

Beat that

PS: In fact, it's even 10 bits less than stated above, once you remove the fullMove, which is totally useless (it's just an annotation, contains zero position information relevant to tuning an eval for example).

AndrewGrant · Post by **AndrewGrant** » Wed Mar 17, 2021 10:15 am

lucasart wrote: ↑Wed Mar 17, 2021 5:04 am
AndrewGrant wrote: ↑Wed Mar 17, 2021 2:39 am
Code: Select all
Occupancy     : 8-bytes
Evaluation    : 2-bytes
Turn          : 1-bytes
White King Sq : 1-bytes
Black King Sq : 1-bytes
Piece Count   : 1-bytes
Packed Pieces : 1-bytes per two pieces (16 max, 12 average)

= Average of 26-bytes, max of 30-bytes.
This is already better, and simpler, than the one from NNUE folks. And yes, pext/pdep are a massive pain to compute in software. Given that Intel is being obsoleted by AMD, and x86 is also being obsoleted by ARM, building the future on pext/pdep is probably not such a good idea…

Really? I looked at that file but its a few thousand lines and it was not immediately clear where to look for the compression scheme.
Do you know off hand if the Stockfish trainer continually reloads the data back into memory, or if it is a one-time process?

lucasart · Post by **lucasart** » Wed Mar 17, 2021 2:11 pm

Actually, it can be improved even further, because the candidate squares for en-passant are only of the opposite(turn).

Code: Select all

itemname                   | bits           | definition
--------------------------------------------|-----------
occupied                   | 64             |                  
pext(white,occupied)       | cnt(occupied)  | 
black = occupied & ~white  | 0              |
pext(pawns, occupied)      | cnt(occupied)  | 
pext(knights, remaining)   | cnt(remaining) | remaining = occupied & ~pawns
pext(bishops, remaining)   | cnt(remaining) | remaining &= ~bishops
pext(rooks, remaining)     | cnt(remaining) | remaining &= ~rooks        
pext(queens, remaining)    | cnt(remaining) | remaining &= ~queens
kings = remaining          | 0              |
turn                       | 1              |
ep                         | 1              |
pext(epSquare, candidates) | cnt(candidates)| candidates = turn == black ? ((white & pawns & rank4) >> 8) : ((black & pawns & rank6) << 8)
pext(castleRooks, rooks)   | cnt(rooks)     | castleRooks = rooks with associated castling rights (any color)
rule50                     | 7              |
fullMove                   | 10             | useless: conveys no position information

This will probably average 1 bit lower, or 124 bits with adjudication and 120 without, once you exclude the non-position part (rule50, fullMove).

gladius · Post by **gladius** » Wed Mar 17, 2021 4:14 pm

AndrewGrant wrote: ↑Wed Mar 17, 2021 10:15 am
lucasart wrote: ↑Wed Mar 17, 2021 5:04 am
AndrewGrant wrote: ↑Wed Mar 17, 2021 2:39 am
Code: Select all
Occupancy     : 8-bytes
Evaluation    : 2-bytes
Turn          : 1-bytes
White King Sq : 1-bytes
Black King Sq : 1-bytes
Piece Count   : 1-bytes
Packed Pieces : 1-bytes per two pieces (16 max, 12 average)

= Average of 26-bytes, max of 30-bytes.
This is already better, and simpler, than the one from NNUE folks. And yes, pext/pdep are a massive pain to compute in software. Given that Intel is being obsoleted by AMD, and x86 is also being obsoleted by ARM, building the future on pext/pdep is probably not such a good idea…
Really? I looked at that file but its a few thousand lines and it was not immediately clear where to look for the compression scheme.
Do you know off hand if the Stockfish trainer continually reloads the data back into memory, or if it is a one-time process?

It continually loads the data from disk. Each batch is thrown away after being used.

FEN compression

FEN compression

Re: FEN compression

Re: FEN compression

Re: FEN compression

Re: FEN compression

Re: FEN compression

Re: FEN compression

Re: FEN compression

Re: FEN compression

Re: FEN compression