Testing for Move Ordering Improvements

Cheney · Post by **Cheney** » Sat Mar 25, 2017 4:16 pm

Hello

I am wondering what is the best way to test fine tuning of move ordering?

I follow a traditional approach in this order:
* Hash
* Captures
* Killers
* Silent moves:
** Promotions
** Special Pawns
** Checking moves
** Castling
** History moves

I do keep a percentage stat on my killers and how often they cause a BETA. I used to keep a stat on how often the PV move is in the top 10% of moves and specifically the hash move.

When I reorder some of the silent moves, the stats hardly change. I believe that as I progress deeper into the area of ordering, many nodes, for example, do not have promotions and even when they do they are not necessarily always the best PV move. This should apply to each type of move I am ordering.

I have added ordering for passed pawns and even pawn pushes to 7th and 6th ranks when in the "silent move" category (the special pawns I refer to above). The question is how to truly identify if it is working there.

I have tested a set depth perspective on several test positions. The time to reach that depth with the ordered special pawn moves is +/- 5ms but the node count/moves play has increased. So, although moves per second increased, I do not really think the engine is any faster. Or am I wrong here?

I am thinking about tracking the frequency at which these moves are encountered and if they are the PV or CUT move. I already have the structured code, just need to do it

Is my line of thinking correct? Or, is there something else I can track or look at to prove out I am ordering moves optimally for both CUT and PV with any of the silent moves? Maybe I should not bother with ordering the special pawn moves (or others), but I am doing this to also help flag the moves when I am ready to implement extensions and futility pruning.

Thank you

cetormenter · Post by **cetormenter** » Sat Mar 25, 2017 6:00 pm

For move ordering in Nirvana I do,

Hash,
Capture promotes,
Promotes,
Winning (see > 0) captures,
Killers,
Castle moves,
History ordered moves,
Losing captures

Move ordering is very important because there are a lot of ways to "double dip" with it. I have tried collecting statistics on cut off but it's pretty hard to get unbiased results as they are all based off your current move ordering. The best way is to just test it out in games. It's hard to get a repsentative set of positions as a test suite.

Ferdy · Post by **Ferdy** » Sun Mar 26, 2017 8:19 am

Cheney wrote:Hello

I am wondering what is the best way to test fine tuning of move ordering?

I follow a traditional approach in this order:
* Hash
* Captures
* Killers
* Silent moves:
** Promotions
** Special Pawns
** Checking moves
** Castling
** History moves

I do keep a percentage stat on my killers and how often they cause a BETA. I used to keep a stat on how often the PV move is in the top 10% of moves and specifically the hash move.

When I reorder some of the silent moves, the stats hardly change. I believe that as I progress deeper into the area of ordering, many nodes, for example, do not have promotions and even when they do they are not necessarily always the best PV move. This should apply to each type of move I am ordering.

I have added ordering for passed pawns and even pawn pushes to 7th and 6th ranks when in the "silent move" category (the special pawns I refer to above). The question is how to truly identify if it is working there.

I have tested a set depth perspective on several test positions. The time to reach that depth with the ordered special pawn moves is +/- 5ms but the node count/moves play has increased. So, although moves per second increased, I do not really think the engine is any faster. Or am I wrong here?

I am thinking about tracking the frequency at which these moves are encountered and if they are the PV or CUT move. I already have the structured code, just need to do it

Is my line of thinking correct? Or, is there something else I can track or look at to prove out I am ordering moves optimally for both CUT and PV with any of the silent moves? Maybe I should not bother with ordering the special pawn moves (or others), but I am doing this to also help flag the moves when I am ready to implement extensions and futility pruning.

Thank you

Testing by games is of course the best approach to measure effectiveness of move ordering. But I have other suggestion. Create a test suite of positions and measure the correct moves it finds and the time it takes to find the best move. Something like a depth limited test. So you have number of correct moves it find and the total time it takes to find those moves based on depth limit. To improve this system the test position is of type STS where there are more than one moves with points so the engine will now be compared with the points it gets. Even if bm is not found there are other moves where the engine can get points.

Code: Select all

2q2rk1/1bpnbppp/1p1pp3/r7/p1PPPB1P/2PQ1NP1/P4PB1/3RR1K1 w - - acd 27; bm Nh2; eco "E16"; Ae "asmFishW_2017-03-15_popcnt"; c6 "20 16 14"; c7 "Nh2 Bh3 Be3"; c8 "100 63 45"; c9 "f3h2 g2h3 f4e3";

The c8 opcode contains the points that the engine may get depending on the bestmove it considered after searching to a fixed depth.
When comparing engine perf, the first criteria to consider is the total points it gets since what is the point of improvement if it could not find best moves in a position, then the time it takes to finished the test as an engine with good move ordering is generally faster.
Perhaps start at 1000 test positions or even the sts positions can be used here already. This system is intended for comparing same engine with the only difference being the move ordering scheme.
Writing a script to do this thing is not that difficult, but selecting test positions would take more time. For example if you have move ordering for passers as you mentioned, you need to find a test positions with passer as one of the top best moves. You can then examine the result for that test position to see if it find the top best moves and how fast. Then do the final actual (more) game tests to the scheme which looked promising from this kind of test suite.

I will start creating the script to know how this would worked out.

hgm · Post by **hgm** » Sun Mar 26, 2017 11:09 am

Be sure to gather statictics on this for each remaining depth separately; nodes close to the root have entirely different stats than those near the leaves, but when you lump everything together, you basically only see what happens in the leaves.

In deep searches hash move or null move should have a very high cut rate. When you often need to search very many moves here before you find the cut-move, it suggests a failure of your IID scheme. (Note that with IID you should group the stats with the depth of the iteration, not of the ultimate depth of the node.)

Very shallow searches often do not have a hash move, and then the rest of your move ordering starts to become important. Note that the ordering only has an effect in cut-nodes, and that the dominant task of those in an alpha-beta search is to quickly and decisively refute all the idiocies that are tried in the all-node parent. The opponent is offering his material to us there all the time, and when doing nothing (i.e. null move) is not an option, grabbing the fattest piece on offer usually is good enough to subdue him for another while. Only very rarely he poses a serious threat without first accumulating so many hanging pieces that you have to worry about the actual threat. A counter-move heuristic for non-captures might be helpful here.

What I once did was have the engine print all positions with a very late cut move to a file. To see if the cut-move would seem obvious to a human, and imagine what would be needed to have the engine understand that too.

petero2 · Post by **petero2** » Sun Mar 26, 2017 11:49 am

hgm wrote:Note that the ordering only has an effect in cut-nodes

This is true for a full-width search algorithm but when considering LMR the situation is more complicated.

The move ordering can in that case change the node type. If move ordering causes an important move to be searched later in the move list it may be reduced so much that the thing that makes the move important is not seen by the search. This can change the calculated score of the move in either direction, depending on if "the thing" is good or bad for the current side.

LMR will typically redo the search with full depth if the reduced depth search did not fail low, but the case where the reduced depth search fails low but a full depth search would not have failed low is typically not detected.

This phenomenon can cause an engine to not "see" the best move for a very long time during analysis, but if you "show" the move to the engine by playing it on the board, the engine will quickly see that the move is actually best.

hgm · Post by **hgm** » Sun Mar 26, 2017 12:50 pm

Indeed, if you use the ordering also to determine reduction, it becomes another matter. But in principle these are not the same. "Has a high probability to fail high" is something completely independent from "needs a large depth to see the merit".

So perhaps it would be better to maintain two history scores: one updated by low-depth searches, one updated by high-depth searches. Moves with a much better 'deep history' than 'shallow history' (relative to the maximum history score in the respective tables) should not be reduced. The best history of the two should decide the order.

So it would be perfectly possible that an earlier late move is searched with a reduced depth, because its shallow history is pretty good, and a move with a somewhat lower deep-history score (and therefore searched later) is not reduced, because it had a totally appalling shallow-history score. Of course it would be beneficial in general to postpone deeper searches to the end of the list, if they do not have a significantly larger probability to fail high than a shallower search, in the hope that a cutoff by one of the shallower moves makes the expensive deep search unnecessary.

Ferdy · Post by **Ferdy** » Sun Mar 26, 2017 6:08 pm

Tried this test suite approach similar to STS scoring but with added time measurement and fixed depth search.

v353 has promotion ahead of killers
v354 has the killer ahead of promotion

A. Test suite result using 1500 pos from STS

Code: Select all

Test File    &#58; STS1-STS15_LAN_v3.epd
Total Pos    &#58; 1500
Search Depth &#58; 12

Engine                      Pts   MaxPts  Pts&#40;%)  Time&#40;ms&#41;
Deuterium v2017.1.35.353  10626    15000   70.84    241799
Deuterium v2017.1.35.354  10658    15000   71.05    230962

The new v354 is better in both pts and time.

B. Actual game test at TC 15s + 0.1s inc/move after 2000 games.

Code: Select all

Score of D354 vs D353&#58; 519 - 522 - 959 &#91;0.499&#93; 2000
Elo difference&#58; -0.52 +/- 10.97
SPRT&#58; llr -0.321, lbound -2.94, ubound 2.94

This need more games.
It would also be interesting if the test sets are increased, since this is only a fixed depth 12 search which is fast to complete. And would also be interesting to try different depth searches.

I will upload the script to github later.

flok · Post by **flok** » Sun Mar 26, 2017 8:27 pm

I'm surprised of all the specific testing that is done. I thought that for movesorting it would be enough to check the TTD, time-to-depth?

cdani · Post by **cdani** » Sun Mar 26, 2017 10:15 pm

flok wrote:I'm surprised of all the specific testing that is done. I thought that for movesorting it would be enough to check the TTD, time-to-depth?

Not at all! Every bit of change must be tested to death with a lot of games

hgm · Post by **hgm** » Sun Mar 26, 2017 10:27 pm

Not if they only affect speed.

Testing for Move Ordering Improvements

Testing for Move Ordering Improvements

Re: Testing for Move Ordering Improvements

Re: Testing for Move Ordering Improvements

Re: Testing for Move Ordering Improvements

Re: Testing for Move Ordering Improvements

Re: Testing for Move Ordering Improvements

Re: Testing for Move Ordering Improvements

Re: Testing for Move Ordering Improvements

Re: Testing for Move Ordering Improvements

Re: Testing for Move Ordering Improvements