Algorithm for Performing a Position Search in a Database

mcostalba · Post by **mcostalba** » Wed Jan 18, 2012 8:41 am

Dave_N wrote:I think having a setting for number of moves distance from the position solves most of the problems, and then simply scanning the database for matching hash entries

I suggest you to go in the other direction: instead of scanning the DB for "similar" positions it is better you first compute a set of hash keys of what you call "similar positions" whatever you may want to define it, for instance you can do a depth 2 search from you current position and retrieve all the keys of the nodes found. Than you look up the DB for that keys.

A better approach IMHO is to do a normal deep search (say for 30 seconds) from your root position with, for instance, multi pv set to 5, then at the end collect just the PV lines up to depth 3-4 or whatever you may want. In this way you avoid looking up the DB for silly / unrealistic positions and just grab what is sensible.

You need to modify an existing engine to do the above. But IMHO is the best approach.

Dave_N · Post by **Dave_N** » Wed Jan 18, 2012 9:08 am

mcostalba wrote:
Dave_N wrote:I think having a setting for number of moves distance from the position solves most of the problems, and then simply scanning the database for matching hash entries
I suggest you to go in the other direction: instead of scanning the DB for "similar" positions it is better you first compute a set of hash keys of what you call "similar positions" whatever you may want to define it, for instance you can do a depth 2 search from you current position and retrieve all the keys of the nodes found. Than you look up the DB for that keys.

A better approach IMHO is to do a normal deep search (say for 30 seconds) from your root position with, for instance, multi pv set to 5, then at the end collect just the PV lines up to depth 3-4 or whatever you may want. In this way you avoid looking up the DB for silly / unrealistic positions and just grab what is sensible.

You need to modify an existing engine to do the above. But IMHO is the best approach.

sounds interesting, I would probably modify an engine to keep more than just the PV's and to keep some of the options (after all they are Mpv options I guess). Here is an example ...

Code: Select all


root -
           line 1 -
                 line 1a -
                 line 1b -
                 line 1c -
           line 2 -
                 line 2a -
                 line 2b -
                 ... etc
           line 3 -
                 ... etc

hgm · Post by **hgm** » Wed Jan 18, 2012 9:31 am

For WinBoard's position search I dropped the idea of using hash keys to speed up the process pretty quickly. The reason is that it really only works satisfactorily for exact matches, and that most of the search modes need fuzziness that is very hard to implement with hash keys.

Also, when looking for a match, the hash keys hardly offered an advantage: when you use a much larger representation, like a nibble board, or even a byte board, it will exceed the word length of the computer, and thus require multiple compare instruction to establish they are equal. But equality will be a rare case, and virtually all positions will be rejected at the first compare. So it does not really matter how large the representation is, and how many compares it would take to do the full comparison, if the first compare acts as a filter that rejects 99% of the cases immediately.

You could, for instance, use a bitboard representation, where the first board indicates Pawns. Then any position with a different Pawn structure would be rejected after the first 64-bit compare. And looking for an exact Pawn structure, but fuzzy pieces, is a much more common user request than looking for fuzzy Pawn structure, so the chances the user will request something that renders this 'first filter' ineffective is quite small. In a bitboard representation you can also easily mask out what you are not interested in; this makes it easy to implement another very common search mode, to find a position with all the given pieces on their current squares, and possibly extra material. You can then simply mask out all squares that were empty in the requested position, before doing the comparisons.

As the purpose of WinBoard was not to make the fastest-ever position search (the requirement that it should work for all board sizes and with 22 piece types would not be able to beat one tailored for 8x8 boards with only 6 piece types anyway), I did not even bother with all that. I just compare by looping over a mailbox board, having a mask board that tells which squares to ignore. This has the advantage that board update when scanning through the games becomes entirely trivial. I profiled the resulting code, and the total time was not heavily dominated by the compare routine. So no spectacular gain would be possible by going to a more complex representation, and any gain would at least have been partly offset by more complex board updating.

So what I do is simply keep a mailbox board and piece-type counts, and compare those to the sought position, with the possibility to mask what you don't want to see. I use different comparison routines for the various search modes, as some of them require numeric comparison of the piece counts, rather than just ignoring them or requiring an exact match. This made it quite easy to implement all WinBoard's search modes:

1 - Exact match on the entire position
2 - Match any position that can be made from the given one by placing extra pieces
3 - Match the Pawn structure, with the same piece material anywhere on the board
4 - Find positions with exactly the same material
5 - Find positions with the specified material, and possible extra material up to given maxima for each piece type, whether this extra material must be balanced (i.e. same for black and white)
6 - Same as 5, but the extra material can also be imbalanced.

For 4-6 I only have to pay attention to the piece counts.

Don · Post by **Don** » Sun Jan 22, 2012 4:12 pm

mcostalba wrote:
Dave_N wrote:I think having a setting for number of moves distance from the position solves most of the problems, and then simply scanning the database for matching hash entries
I suggest you to go in the other direction: instead of scanning the DB for "similar" positions it is better you first compute a set of hash keys of what you call "similar positions" whatever you may want to define it, for instance you can do a depth 2 search from you current position and retrieve all the keys of the nodes found. Than you look up the DB for that keys.

A better approach IMHO is to do a normal deep search (say for 30 seconds) from your root position with, for instance, multi pv set to 5, then at the end collect just the PV lines up to depth 3-4 or whatever you may want. In this way you avoid looking up the DB for silly / unrealistic positions and just grab what is sensible.

You need to modify an existing engine to do the above. But IMHO is the best approach.

Awesome idea!

Algorithm for Performing a Position Search in a Database

Re: Algorithm for Performing a Position Search in a Database

Re: Algorithm for Performing a Position Search in a Database

Re: Algorithm for Performing a Position Search in a Database

Re: Algorithm for Performing a Position Search in a Database