speed up or avoiding move sorting

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: speed up or avoiding move sorting

Post by BeyondCritics »

Stockfish first tries hash moves, good captures, killers and "counter moves", and only then the quiet moves, which it sorts by a heuristic, static value.
At the last three plies Stockfish considers only moves with heuristic eval > 0 (see https://github.com/official-stockfish/S ... k.cpp#L241). It uses std::partition to shuffle them to the front.
Curiously, it uses insertion sort in every case.
The problem with insertion sort is that it is Omega(n^2), although it is faster than plain quicksort for very few items. Typical tests with sorting random numbers show a break-even point around 10 or 20 items.
That is fewer than the average move count of 30 and far fewer than the maximum move count of 216.
Therefore I did a quick test at home, replacing the custom insertion sort with std::sort, and compared the performance (using profile builds for each). Lo and behold, I got a tiny throughput improvement of 1% on my machine. Likely not significant, but since std::sort is far more predictable I think it is preferable.
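For illustration, a minimal sketch of that comparison. The ScoredMove struct and both function names are made up here, standing in for Stockfish's ExtMove and its move-ordering code; this is not the actual Stockfish implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scored move, standing in for Stockfish's ExtMove.
struct ScoredMove {
    std::uint16_t move;   // encoded move (placeholder)
    int           score;  // heuristic ordering score
};

// Insertion sort as used for move ordering: stable, but O(n^2) worst case.
void insertion_sort(std::vector<ScoredMove>& m) {
    for (std::size_t i = 1; i < m.size(); ++i) {
        ScoredMove tmp = m[i];
        std::size_t j = i;
        for (; j > 0 && m[j - 1].score < tmp.score; --j)
            m[j] = m[j - 1];           // shift smaller scores backwards
        m[j] = tmp;
    }
}

// The drop-in alternative: std::sort with a descending-score comparator.
// Not stable, but move ordering does not rely on stability.
void std_sort_moves(std::vector<ScoredMove>& m) {
    std::sort(m.begin(), m.end(),
              [](const ScoredMove& a, const ScoredMove& b) {
                  return a.score > b.score;
              });
}
```

Both produce the same descending score order; the difference is only the worst-case behaviour on long move lists.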
The more I ponder it, the more I think that insertion sort is a bad fit for the incremental development style typical nowadays, since it punishes every increase of the branching factor sharply, wiping out possible tiny improvements.
The Stockfish sources contain a questionable comment praising the stability of insertion sort; that property is currently not needed in Stockfish.
Also, the move-and-eval data structure of Stockfish is not optimal for 64-bit builds: it wastes about 50% of memory throughput.
You could try to use selection sort in predicted CUT nodes, this is just an idea (since you asked).
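The usual form of this idea is a lazy "pick best": one selection-sort pass per move actually searched, so a cutoff on the first move costs only O(n) rather than a full sort. A sketch with hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical scored move; the real layout in an engine would differ.
struct ScoredMove { int move; int score; };

// Swap the best-scoring move in [idx, end) to position idx and return it.
// At an expected CUT node you hope this first pick already fails high,
// so sorting the rest of the list would have been wasted work.
ScoredMove pick_best(std::vector<ScoredMove>& m, std::size_t idx) {
    std::size_t best = idx;
    for (std::size_t i = idx + 1; i < m.size(); ++i)
        if (m[i].score > m[best].score)
            best = i;
    std::swap(m[idx], m[best]);
    return m[idx];
}
```

Each call costs one linear pass, so searching k moves out of n costs O(k*n); for k = 1 or 2, as predicted at CUT nodes, that beats any full sort.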
You could try counting sort or radix sort, since they run in linear time. But this could cause problems if the distribution of your evals is very ragged.
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: speed up or avoiding move sorting

Post by hgm »

Well, it is questionable whether there is any benefit in sorting very precisely on what at best is a very rough estimate of a move's merit. The benefit must come from pushing the obviously crappy moves to the back, and searching the 10% that do have some merit before them. The order within those 10% is basically a pure gamble. So just scattering the moves over a number of 'merit bins', and treating the bins in order of descending merit, is a solution that is both trivial and fast.

Since the number of merit bins can be made much larger than the number of moves, this would give almost perfect sorting anyway, as almost no bin would contain more than a single move. It also makes it possible to exploit the fact that you have to be prepared for the worst case w.r.t. the number of moves: if you typically have 40 moves but 256 slots to store them, you could already scatter the moves over the slots when you first store them, based on their heuristic merit. If the merit measure is an 8-bit value, you can use it as the index of the move-list slot where you store the move (or the next empty one, if that slot is already taken). Things would only get pathological in positions where you indeed have close to 216 moves.
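A minimal sketch of that slot-scatter idea, with all names invented for illustration. The 8-bit merit is used directly as the slot index, and collisions fall through to the next lower free slot, so reading the array from the top yields moves in roughly descending merit order:

```cpp
#include <cassert>
#include <cstdint>

// One of 256 slots; empty slots are marked unused.
struct Slot {
    std::uint16_t move = 0;
    bool          used = false;
};

struct BinnedMoveList {
    Slot slots[256] = {};

    // Store a move at the slot given by its 8-bit merit, or the next
    // free slot below it on collision. The pathological case of the
    // list overflowing (near 216 moves) is ignored in this sketch.
    void add(std::uint16_t move, std::uint8_t merit) {
        int i = merit;
        while (i >= 0 && slots[i].used)
            --i;
        if (i >= 0) {
            slots[i].move = move;
            slots[i].used = true;
        }
    }
};
```

With 40 moves in 256 slots, collisions are rare, so this single placement pass already does almost all of the sorting.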
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: speed up or avoiding move sorting

Post by brtzsnr »

Nice! Using the mantissa to group floats into bins has been used for very fast sorting. I never thought of using it here. I once wrote an implementation of sorting this way [1] (not my idea, just the implementation).

[1] http://codegolf.stackexchange.com/a/2787/32
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: speed up or avoiding move sorting

Post by brtzsnr »

mar wrote:Anyway, move sorting is never the real bottleneck so unless you're really desperate there are better ways to gain elo.
In my engine, evaluation takes 18% and move sorting takes 8%, roughly half of that. I now get about 2.1 Mnps in the midgame with queens on the board and 3.3 Mnps towards the end with only pawns on the board. With a cached evaluation, once you speed up the search internals you can get crazy speed with a reasonably good eval. Move sorting seems like a more basic problem than eval feature engineering, so I believe it should be easier to solve.

Otherwise the problems are orthogonal. I can squeeze Elo from both :)
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: speed up or avoiding move sorting

Post by hgm »

Well, reading a float as an int is a cheap way to get the approximate base-2 logarithm. (I still remembered that from writing my own floating-point math library in 6809 machine code! :) ) And it seemed to me that it is the ratio of the history scores, rather than their absolute difference, that decides how important it is to search one move before the other. So taking equally sized bins for the logarithm seems the logical thing to do.
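A small sketch of that float-bits trick. For positive IEEE-754 floats the raw bit pattern increases monotonically with the value, and because the exponent sits in the high bits, the integer view behaves like a scaled-and-offset log2. The helper names here are mine; memcpy is the well-defined way to read the bits in C++:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as an unsigned integer (no UB).
std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Approximate log2 for positive floats: subtract the exponent bias
// (127 << 23 = 0x3f800000) and rescale by 2^-23. Exact at powers of
// two; off by at most ~0.09 in between.
float approx_log2(float f) {
    return (float_bits(f) - 0x3f800000u) * (1.0f / (1u << 23));
}
```

The monotonicity of the bit pattern is also what makes radix-sorting of positive floats by their integer view work at all.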

Binning is a very fast first step towards complete sorting when the number of bins approaches the number of elements to sort, because the order between the bins is given and there are only few elements in each bin. The fraction of bins with multiple elements can be made very low by having more bins than elements: e.g., for homogeneously distributed data, having 10 times more bins than elements leaves only 4.5% of the non-empty bins with 2 elements, and only 0.15% with 3.

At some point the time starts to be dominated by finding the non-empty bins amongst the empty ones, however. But with bit-extraction techniques you can directly extract the filled bins out of a group of 64, basically reducing the overhead due to the empty bins by a factor of 64, racing through those 64 at a time. So the optimum shifts to having just a few populated bins per group of 64, making the probability of more than two elements per bin vanishingly small. And since having many bins means having narrow bins, for this application you probably don't care about the order of the moves in the small fraction of bins that do contain multiple moves.
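A sketch of that bit-extraction step: one 64-bit occupancy word per group of 64 bins, with set bits extracted one at a time, the same trick as serializing a bitboard. The function name is made up, and a real implementation would use std::countl_zero (C++20) or a clz intrinsic instead of the portable loop used here:

```cpp
#include <cassert>
#include <cstdint>

// Write the indices of all set bits of occ into out, highest first
// (i.e. visit filled bins in descending merit order within the group).
// Empty bins cost nothing: the loop runs once per *filled* bin.
int filled_bins_desc(std::uint64_t occ, int* out) {
    int n = 0;
    while (occ) {
        // Find the highest set bit. Real code would use a clz
        // intrinsic; a plain loop keeps this sketch portable.
        int hi = 0;
        for (std::uint64_t b = occ; b >>= 1; )
            ++hi;
        out[n++] = hi;
        occ ^= 1ULL << hi;   // clear the bin we just visited
    }
    return n;
}
```

With, say, 3 filled bins out of 64, only 3 iterations run, which is the factor-64 reduction in empty-bin overhead.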