Binary Sort

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
Rebel
Posts: 7299
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Binary Sort

Post by Rebel »

I browsed the internet for finding an util that can sort a binary file with a key > 64 bits and found nothing useful and so I wrote one myself. It's slow but it works. It's slow because every time the binary keys are converted to a string first before a comparison with strcmp can be made.

The example code checks sfen *.binpack files on double positions, in the binpack format the position is stored in 256 bytes and thus the sort key has the extreme size of 32 bytes.

The sfen_quicksort() routine is based on an article I once found in a magazine decades before the internet which I saved and so now and then comes in handy.

https://github.com/egh-s/Bsort

Download at - https://github.com/egh-s/Bsort/releases/tag/Bsort
90% of coding is debugging, the other 10% is writing bugs.
Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: Binary Sort

Post by Sesse »

Use memcmp() instead of strcmp().
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Binary Sort

Post by dangi12012 »

Rebel wrote: Thu Nov 18, 2021 10:32 am I browsed the internet for finding an util that can sort a binary file with a key > 64 bits and found nothing useful and so I wrote one myself. It's slow but it works. It's slow because every time the binary keys are converted to a string first before a comparison with strcmp can be made.

The example code checks sfen *.binpack files on double positions, in the binpack format the position is stored in 256 bytes and thus the sort key has the extreme size of 32 bytes.

The sfen_quicksort() routine is based on an article I once found in a magazine decades before the internet which I saved and so now and then comes in handy.

https://github.com/egh-s/Bsort

Download at - https://github.com/egh-s/Bsort/releases/tag/Bsort
Thats because you dont need to have a util or library for that...
Have a class and overload the < operator and you can std::sort instantly. The C++ STL does the job for you.

Code: Select all

struct uint256{
     uint64_t a,b,c,d;
     bool operator < (const uint64_t &x) const
     {
       if (a < x.a) return true;
       if (b < x.b) return true;
       if (c < x.c) return true;
       if (d < x.d) return true;
       return false;
     }
}
You can even sort the file inplace if you memmap the file...
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Joost Buijs
Posts: 1632
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Binary Sort

Post by Joost Buijs »

For my NN experiments I use a 64 bit hash key which seems to be more than enough to distinguish between positions. I checked it several times with more than 2 billion positions by comparing the hash keys with the positions and I never found a single collision. Even when there are collisions their number is so small that it won't have any impact on training a network.

I remove doubles by storing the positions and scores in a binary tree with the hashes as key, it takes a little bit more memory but with 128GB RAM it is still manageable.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Binary Sort

Post by dangi12012 »

Joost Buijs wrote: Thu Nov 18, 2021 12:07 pm For my NN experiments I use a 64 bit hash key which seems to be more than enough to distinguish between positions. I checked it several times with more than 2 billion positions by comparing the hash keys with the positions and I never found a single collision. Even when there are collisions their number is so small that it won't have any impact on training a network.

I remove doubles by storing the positions and scores in a binary tree with the hashes as key, it takes a little bit more memory but with 128GB RAM it is still manageable.
If I read https://github.com/official-stockfish/S ... 7/src/tt.h
I can see they are doing clustering and reduced the key to 16 bits.
But maybe I read it wrong?
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Joost Buijs
Posts: 1632
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Binary Sort

Post by Joost Buijs »

Yeah, Stockfish seems to use a 16 bit key for the TT. I never looked at it, so I assume together with the index they will have more bits. Somehow collisions don't seem to hurt, but I don't like it at all.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Binary Sort

Post by dangi12012 »

Joost Buijs wrote: Thu Nov 18, 2021 12:24 pm Yeah, Stockfish seems to use a 16 bit key for the TT. I never looked at it, so I assume together with the index they will have more bits. Somehow collisions don't seem to hurt, but I don't like it at all.
I think its a good idea to do a clustered 32 TT (which they had last time I looked) because games that are 4gb apart normally dont have intersections anymore. (So its even better than a single big map for a 64 bit TT).
I am sure they use the other bits too since they only have 64 bit hashes anyway. What they can do is use multiplyupper and multiplylower which is faster than a full 64 bit multiply - because they split into 32 bits and less.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Binary Sort

Post by hgm »

Why would they ever need to multiply? I never looked at their code, but I suppose they simply use as many bits as needed (by ANDing with a mask) to index the TT (which won't be more than 48), and store 16 other bits as signature to distinguish entries that map to the same bucket index-wise.
mar
Posts: 2655
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Binary Sort

Post by mar »

dangi12012 wrote: Thu Nov 18, 2021 12:02 pm

Code: Select all

struct uint256{
     uint64_t a,b,c,d;
     bool operator < (const uint64_t &x) const
     {
       if (a < x.a) return true;
       if (b < x.b) return true;
       if (c < x.c) return true;
       if (d < x.d) return true;
       return false;
     }
}
this code is plain wrong as you don't handle inequality on particular places
{2,1,2,3} compares less than {1,1,2,4}, for example
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Binary Sort

Post by hgm »

Code: Select all

struct uint256{
     uint64_t a,b,c,d;
     bool operator < (const uint64_t &x) const
     {
       if (a < x.a) return true;
       if (a > x.a) return false;
       if (b < x.b) return true;
       if (b > x.b) return false;
       if (c < x.c) return true;
       if (c > x.c) return false;
       if (d < x.d) return true;
       return false;
     }
}