Alternatives to History Heuristics

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Re: Alternatives to History Heuristics

Post by Edsel Apostol »

Edsel Apostol wrote:
mcostalba wrote:
mcostalba wrote:
Edsel Apostol wrote: I have this idea to use a new history table every fifth ply (subject to tuning), initializing it at every 5th first-grandchild node of each root move and again at the succeeding 5th first grandchild of that node. This way a history table will only be used for 5 plies and only by that particular branch of the tree.
Yes, I had a _similar_ idea :-) but still no time to test it.

Actually my idea is slightly different, but at the same time _very_ different in its possible results.

If tests will be good I will post results.
OK, you posted yours, so it is only fair that I post mine:

It is similar to yours, but instead of nodes "of each root move" I only consider PV nodes, and of those nodes I only consider the first move, i.e. the PV move.

IOW, I set up a history table for the first move of the root move list.

All the following moves will use this table.

When the fifth ply is reached and we are in the PV, I set up another table for the _first_ move of the PV node: all the moves _below_ that move will use this new table.

In my scheme a search of, say, 20 plies uses only 5 tables; in your scheme you have many more, and I have doubts it can work ;-)

Note that in Stockfish (I don't know about your Twisted Logic) history is weighted by depth, so even if the first main table ends up storing data from side branches at very high ply, this should not compromise history quality, because their relative weight is very small.

My scheme also handles nicely ;-) the asymmetry between the size of the PV move's sub-tree and that of the remaining ones (which are much smaller).

I think this scheme comes closest to the concept of one history table per subtree, and I expect it will work.
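
For illustration, a minimal sketch of how such per-PV-subtree tables could be selected; all names and the fixed table count are assumptions for the sketch, not actual Stockfish code:

#include <string.h>

#define TABLE_SWITCH_PLY 5
#define MAX_TABLES 8                        /* enough for searches of ~40 plies */

typedef struct { int score[2][8][64]; } HistoryTable;   /* [side][piece][to] */

static HistoryTable subtree_tables[MAX_TABLES];

/* Called when the _first_ (PV) move of a PV node at a 5-ply boundary starts a
   new subtree: clear the table for that depth band and hand it to the subtree.
   All other moves keep using the table inherited from their nearest PV ancestor. */
static HistoryTable *new_subtree_table(int pv_ply)
{
    int idx = pv_ply / TABLE_SWITCH_PLY;
    if (idx >= MAX_TABLES)
        idx = MAX_TABLES - 1;
    memset(&subtree_tables[idx], 0, sizeof subtree_tables[idx]);
    return &subtree_tables[idx];
}
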
Your scheme is what I thought of first, before I came up with my idea. I've just realized that my idea will only work in a fixed single-depth search. I forgot to take into account that it will be used in an iterative deepening framework. :oops:

The reason I scrapped that first idea in favor of mine is that the fifth or tenth ply of the PV move may not be positionally close to the fifth or tenth ply of non-PV nodes, so the contents of the history table from the PV node might not be representative of the subtrees of non-PV nodes.

I think the most promising idea would be to use the number of pawns to index which history table to use. I came up with this because I think that a position with, for example, 5 pawns for White and 5 pawns for Black will be close to other positions with a similar number of pawns, so the history data from these positions will be very similar and the values can be trusted. This scheme is independent of ply and depth, so it will not be affected by deeper searches.
I haven't tried my idea above yet, but I think the data this scheme produces can be reused later in the game, so it only needs to be cleared at every new game. It is even possible to save the values when the program exits and retrieve them when it runs again. It would serve as a set of precomputed piece-square tables that learns with every new game.
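
A minimal sketch of that save/load step, assuming a pawn-count-indexed table like the HistTable shown elsewhere in this thread; the file path and function names are just placeholders:

#include <stdio.h>
#include <string.h>

/* [white pawns 0..8][black pawns 0..8][side][piece][to] */
static int PawnCountHist[9][9][2][8][64];

static void save_history(const char *path)          /* call when the program exits */
{
    FILE *f = fopen(path, "wb");
    if (f) {
        fwrite(PawnCountHist, sizeof PawnCountHist, 1, f);
        fclose(f);
    }
}

static void load_history(const char *path)          /* call when the program starts */
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return;
    if (fread(PawnCountHist, sizeof PawnCountHist, 1, f) != 1)
        memset(PawnCountHist, 0, sizeof PawnCountHist);   /* short or corrupt file */
    fclose(f);
}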

This idea can be taken to the extreme by using the position's pawn structure as the index to its own history table. If there were a way to hash the position's pawn structure into fewer than a hundred distinct indices, we would have a persistent history table that learns every time the engine searches.

I can't explain my idea very clearly. It revolves around my theory that history values are relevant to any position as long as the pawn structure is similar or related.

Maybe someone clever here can invent a method that hashes a set of similar pawn structures to the same index.
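
As a crude starting point, one could hash the two pawn bitboards and fold the result into a small number of buckets; the constants and names below are only illustrative, and this does not yet map genuinely similar structures to the same index:

#include <stdint.h>

#define PAWN_HIST_BUCKETS 97        /* fewer than a hundred distinct indices */

static unsigned pawn_structure_bucket(uint64_t white_pawns, uint64_t black_pawns)
{
    uint64_t h = white_pawns * 0x9E3779B97F4A7C15ULL;   /* cheap bit mixing */
    h ^= black_pawns * 0xC2B2AE3D27D4EB4FULL;
    h ^= h >> 29;
    return (unsigned)(h % PAWN_HIST_BUCKETS);
}

/* the bucket then selects the table, e.g. int HistTable[PAWN_HIST_BUCKETS][2][8][64]; */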
Cardoso
Posts: 362
Joined: Thu Mar 16, 2006 7:39 pm
Location: Portugal
Full name: Alvaro Cardoso

Re: Alternatives to History Heuristics

Post by Cardoso »

Edsel Apostol wrote:
bob wrote:
rjgibert wrote:
bob wrote:
rjgibert wrote:
hgm wrote:
bob wrote: I've tried that, all the way from history[pc/from/to][ply] to history[pc/from/to][1]. The first is _very_ cache unfriendly. The latter is what everyone is currently using.

...
The discontinuity problem can be alleviated to a large extent by overlapping the bins and making them more fine-grained. As reading of the history tables occurs far more frequently than writing, the overhead involved in this can be put in the writing without too much impact on overall performance. I.e. use one bin per two plies, but store a cutoff at ply n not only in the [n,n+1] bin but also in the [n-2,n-1] and [n+2,n+3] bins. That way you span 6 plies.

The cache friendliness can perhaps be improved by having history[ply][pc/from/to].
I don't understand why you want to "overlap the bins." You are going to have some duplication of information between bins anyways, so I don't see the need to enforce this by overlapping.

Just let each bin populate itself naturally. I think that the "problem of borderline cases" is an illusion. If move x produces fail highs for both bin A and bin B, this will be reflected very quickly in both bins regardless. If the idea of multiple bins fails, I don't think it will be because of this issue. I just don't see it.
The overlap is needed because at ply=6, you care just as much about the history of ply=4 as you do for ply=8. If you just do plies 1-6, 7-12, then around that discontinuity point, you have less information than you do in the middle of the range.
That's a very decent explanation.

But you still don't need to overlap the counters. You can read from the corresponding bin plus the nearest adjacent bin and reconcile the values proportionately.

You would still update only one counter. Only the procedure for reading a value would involve two--one from each of the two bins.
Here's the issue. History ordering is "iffy" from the get-go. It is not a big Elo producer. Which means the cost has to be low. I tested this, and a ton of other ideas. Two examples.

You suggested using adjacent "bins". The problem with this is that you generally use a "selection sort" approach, which means you generate N moves, and then you make a pass over each of those N moves and compute a history score. If that doesn't fail high, you make a pass over the remaining N-1 moves, doing the same. This is non-trivial computation and it hurts overall.

I tried the opposite, that of doing the multiple bin computation when a fail-high occurs, since that happens far less often than the previous example. But it is still expensive, not cache-friendly, and I could not get it better than break-even at best... Which means it was of no use.

I spent about 2.5 months testing every history idea I could come up with, because I wanted to see if I could find something that would be complementary with LMR and the "counter" approach which makes the L in LMR actually mean something. I tried obvious ideas. Ideas used by others. Ideas that were off-the-wall. Nothing worked...
If you don't have anything to test on the cluster, could you please try my idea? I know it's not intuitive, but at least if it fails you could verify your theory that no form of history works. If it works, then all of us will benefit from it.

Here's the idea:

// history table indexed by [numwhitepawns][numblackpawns][side][piece][to]
// you could edit how the index is calculated to make it cache friendly
int HistTable[9][9][2][8][64];

When a beta cutoff occurs just increase the history counter by 1. No need to use depth and no need to scale.
Just to say I had your idea a few years ago, but applied to checkers.
It didn't help at all. I also tried many variants of having multiple history tables and nothing helped, although I ended up keeping one history table for every 4 plies; for reading I average the 3 history tables neighbouring the current ply/4, i.e. I add the three values and divide by three.
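
Something like this, just as a sketch of that read path (array bounds and names are illustrative):

#define PLY_BIN   4
#define NUM_BINS 16                           /* covers roughly 64 plies */

static int binned_hist[NUM_BINS][2][8][64];   /* [bin][side][piece][to] */

static int read_history(int ply, int side, int piece, int to)
{
    int bin = ply / PLY_BIN;
    int lo  = bin > 0 ? bin - 1 : bin;        /* clamp at the ends of the range */
    int hi  = bin < NUM_BINS - 1 ? bin + 1 : bin;
    /* add the three neighbouring bins and divide by three */
    return (binned_hist[lo][side][piece][to] +
            binned_hist[bin][side][piece][to] +
            binned_hist[hi][side][piece][to]) / 3;
}
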
I agree with Mike that a history table (or two or three) used only near the root would yield the best results.
Also I believe it works better for checkers than for chess.

regards,
Alvaro
Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Re: Alternatives to History Heuristics

Post by Edsel Apostol »

Cardoso wrote:
Edsel Apostol wrote:
bob wrote:
rjgibert wrote:
bob wrote:
rjgibert wrote:
hgm wrote:
bob wrote: I've tried that, all the way from history[pc/from/to][ply] to history[pc/from/to][1]. The first is _very_ cache unfriendly. The latter is what everyone is currently using.

...
The discontinuity problem can be alleviated to a large extent by overlapping the bins and making them more fine-grained. As reading of the history tables occurs far more frequently than writing, the overhead involved in this can be put in the writing without too much impact on overall performance. I.e. use one bin per two plies, but store a cutoff at ply n not only in the [n,n+1] bin but also in the [n-2,n-1] and [n+2,n+3] bins. That way you span 6 plies.

The cache friendliness can perhaps be improved by having history[ply][pc/from/to].
I don't understand why you want to "overlap the bins." You are going to have some duplication of information between bins anyways, so I don't see the need to enforce this by overlapping.

Just let each bin populate itself naturally. I think that the "problem of borderline cases" is an illusion. If move x produces fail highs for both bin A and bin B, this will be reflected very quickly in both bins regardless. If the idea of multiple bins fails, I don't think it will be because of this issue. I just don't see it.
The overlap is needed because at ply=6, you care just as much about the history of ply=4 as you do for ply=8. If you just do plies 1-6, 7-12, then around that discontinuity point, you have less information than you do in the middle of the range.
That's a very decent explanation.

But you still don't need to overlap the counters. You can read from the corresponding bin plus the nearest adjacent bin and reconcile the values proportionately.

You would still update only one counter. Only the procedure for reading a value would involve two--one from each of the two bins.
Here's the issue. History ordering is "iffy" from the get-go. It is not a big Elo producer. Which means the cost has to be low. I tested this, and a ton of other ideas. Two examples.

You suggested using adjacent "bins". The problem with this is that you generally use a "selection sort" approach, which means you generate N moves, and then you make a pass over each of those N moves and compute a history score. If that doesn't fail high, you make a pass over the remaining N-1 moves, doing the same. This is non-trivial computation and it hurts overall.

I tried the opposite, that of doing the multiple bin computation when a fail-high occurs, since that happens far less often than the previous example. But it is still expensive, not cache-friendly, and I could not get it better than break-even at best... Which means it was of no use.

I spent about 2.5 months testing every history idea I could come up with, because I wanted to see if I could find something that would be complementary with LMR and the "counter" approach which makes the L in LMR actually mean something. I tried obvious ideas. Ideas used by others. Ideas that were off-the-wall. Nothing worked...
If you don't have anything to test on the cluster, could you please try my idea? I know it's not intuitive, but at least if it fails you could verify your theory that no form of history works. If it works, then all of us will benefit from it.

Here's the idea:

// history table indexed by [numwhitepawns][numblackpawns][side][piece][to]
// you could edit how the index is calculated to make it cache friendly
int HistTable[9][9][2][8][64];

When a beta cutoff occurs just increase the history counter by 1. No need to use depth and no need to scale.
Just to say I had your idea a few years ago, but applied to checkers.
It didn't help at all. I also tried many variants of having multiple history tables and nothing helped, although I ended up keeping one history table for every 4 plies; for reading I average the 3 history tables neighbouring the current ply/4, i.e. I add the three values and divide by three.
I agree with Mike that a history table (or two or three) used only near the root would yield the best results.
Also I believe it works better for checkers than for chess.

regards,
Alvaro
Thanks Alvaro for sharing your ideas and results. It puts my idea further down my todo list. :)

Anyway, I have another idea. I don't know if this has been tried before. Here it goes:

Instead of using depth to fill in the values of the history table, store the number of nodes searched in the branch of the tree under that move.

This idea is somewhat related to what is in Stockfish/Glaurung, though there it is only used for sorting the root moves. Since I presume that using node counts to order the root moves is good, as it works in a strong engine like Stockfish, I would assume that it will work for other types of moves as well, be they at the leaves or anywhere else in the tree.

The idea can be pursued further by storing the nodes searched for each non-tactical move in the tree. The advantage of this idea is that it isn't depth dependent. Moves at shallow plies, meaning near the root, will have higher values than moves near the leaves. The disadvantage is that moves at deeper plies can still influence the scores at shallow plies.
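
A minimal sketch of that update, assuming the caller snapshots the global node counter before and after searching the move (the names are placeholders):

#include <stdint.h>

static uint64_t node_hist[2][8][64];          /* [side][piece][to] */

/* credit a quiet move with the number of nodes its subtree consumed */
static void history_update_by_nodes(int side, int piece, int to,
                                    uint64_t nodes_before, uint64_t nodes_after)
{
    node_hist[side][piece][to] += nodes_after - nodes_before;
}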

This idea can also be used for history pruning.

By the way, is there a good correlation between the number of nodes searched in the branch under a certain move and the quality of that move? I mean, if a lot of nodes are spent when a move is searched, does that mean the move is good? I think it does, but I might be wrong.

Too many ideas, too little time and resources to test. :evil: :)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Alternatives to History Heuristics

Post by bob »

Edsel Apostol wrote:
Cardoso wrote:
Edsel Apostol wrote:
bob wrote:
rjgibert wrote:
bob wrote:
rjgibert wrote:
hgm wrote:
bob wrote: I've tried that, all the way from history[pc/from/to][ply] to history[pc/from/to][1]. The first is _very_ cache unfriendly. The latter is what everyone is currently using.

...
The discontinuity problem can be alleviated to a large extent by overlapping the bins and making them more fine-grained. As reading of the history tables occurs far more frequently than writing, the overhead involved in this can be put in the writing without too much impact on overall performance. I.e. use one bin per two plies, but store a cutoff at ply n not only in the [n,n+1] bin but also in the [n-2,n-1] and [n+2,n+3] bins. That way you span 6 plies.

The cache friendliness can perhaps be improved by having history[ply][pc/from/to].
I don't understand why you want to "overlap the bins." You are going to have some duplication of information between bins anyways, so I don't see the need to enforce this by overlapping.

Just let each bin populate itself naturally. I think that the "problem of borderline cases" is an illusion. If move x produces fail highs for both bin A and bin B, this will be reflected very quickly in both bins regardless. If the idea of multiple bins fails, I don't think it will be because of this issue. I just don't see it.
The overlap is needed because at ply=6, you care just as much about the history of ply=4 as you do for ply=8. If you just do plies 1-6, 7-12, then around that discontinuity point, you have less information than you do in the middle of the range.
That's a very decent explanation.

But you still don't need to overlap the counters. You can read from the corresponding bin plus the nearest adjacent bin and reconcile the values proportionately.

You would still update only one counter. Only the procedure for reading a value would involve two--one from each of the two bins.
Here's the issue. History ordering is "iffy" from the get-go. It is not a big Elo producer. Which means the cost has to be low. I tested this, and a ton of other ideas. Two examples.

You suggested using adjacent "bins". The problem with this is that you generally use a "selection sort" approach, which means you generate N moves, and then you make a pass over each of those N moves and compute a history score. If that doesn't fail high, you make a pass over the remaining N-1 moves, doing the same. This is non-trivial computation and it hurts overall.

I tried the opposite, that of doing the multiple bin computation when a fail-high occurs, since that happens far less often than the previous example. But it is still expensive, not cache-friendly, and I could not get it better than break-even at best... Which means it was of no use.

I spent about 2.5 months testing every history idea I could come up with, because I wanted to see if I could find something that would be complementary with LMR and the "counter" approach which makes the L in LMR actually mean something. I tried obvious ideas. Ideas used by others. Ideas that were off-the-wall. Nothing worked...
If you don't have anything to test on the cluster, could you please try my idea? I know it's not intuitive, but at least if it fails you could verify your theory that no form of history works. If it works, then all of us will benefit from it.

Here's the idea:

// history table indexed by [numwhitepawns][numblackpawns][side][piece][to]
// you could edit how the index is calculated to make it cache friendly
int HistTable[9][9][2][8][64];

When a beta cutoff occurs just increase the history counter by 1. No need to use depth and no need to scale.
Just to say I had your idea a few years ago, but applied to checkers.
It didn't help at all. I also tried many variants of having multiple history tables and nothing helped, although I ended up keeping one history table for every 4 plies; for reading I average the 3 history tables neighbouring the current ply/4, i.e. I add the three values and divide by three.
I agree with Mike that a history table (or two or three) used only near the root would yield the best results.
Also I believe it works better for checkers than for chess.

regards,
Alvaro
Thanks Alvaro for sharing your ideas and results. It puts my idea further down my todo list. :)

Anyway, I have another idea. I don't know if this has been tried before. Here it goes:

Instead of using depth to fill in the values of the history table, store the number of nodes searched in the branch of the tree under that move.

This idea is somewhat related to what is in Stockfish/Glaurung, though there it is only used for sorting the root moves. Since I presume that using node counts to order the root moves is good, as it works in a strong engine like Stockfish, I would assume that it will work for other types of moves as well, be they at the leaves or anywhere else in the tree.

The idea can be pursued further by storing the nodes searched for each non-tactical move in the tree. The advantage of this idea is that it isn't depth dependent. Moves at shallow plies, meaning near the root, will have higher values than moves near the leaves. The disadvantage is that moves at deeper plies can still influence the scores at shallow plies.

This idea can also be used for history pruning.

By the way, is there a good correlation between the number of nodes searched in the branch under a certain move and the quality of that move? I mean, if a lot of nodes are spent when a move is searched, does that mean the move is good? I think it does, but I might be wrong.

Too many ideas, too little time and resources to test. :evil: :)
I doubt that there are many (at least old-timers) that have not tried that. If you look up Schaeffer's original paper on the history heuristic, he used 1<<depth. That's from the days when he could do 5-ply searches on a very slow Sun. 1<<depth will not work today, as it will obviously overflow a 32-bit history counter, and will actually overflow a 64-bit counter when we can reach depths of 40-50 plies on some variations. I believe I was the first to use depth*depth, which I tried as a solution to the overflow problem. Node counts are an issue for 32-bit machines, although they might well work for 64 bits. But the problem is that the size of the tree doesn't say much about the quality of the best move. Some paths are simply very deep because a losing sacrifice exposes the opponent's king to a zillion checks that all lead nowhere. Is the best move in that position really that useful?
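
To put numbers on that: 1<<depth already exceeds a signed 32-bit counter at depth 31, and repeated additions of such values can exhaust even 64 bits at the 40-50 ply depths mentioned above, whereas depth*depth is only 2500 at depth 50. A sketch of the depth-squared update (table layout and names are illustrative, not Crafty's actual code):

static int hh[2][8][64];                      /* [side][piece][to] */

static void history_good_move(int side, int piece, int to, int depth)
{
    hh[side][piece][to] += depth * depth;     /* bounded growth, unlike 1<<depth */
}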

I am convinced that the history idea simply doesn't work with today's searches nearly as well as it worked back in the days of pretty much fixed depth, no reductions, no forward pruning, etc...
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Alternatives to History Heuristics

Post by mcostalba »

bob wrote: But the problem is that the size of the tree doesn't say much about the quality of the best move. Some paths are simply very deep because a losing sacrifice exposes the opponent's king to a zillion checks that all lead nowhere. Is the best move in that position really that useful?
IMHO what counts is the _average_ behaviour of the tree-size metric.

It doesn't matter at all if it can _sometimes_ be wrong (as with most chess engine tricks, which are only statistically good rather than optimal in every position); what really counts is that on average it chooses good moves more often than bad ones, or at least does so more often than the alternative way of ordering the moves.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Alternatives to History Heuristics

Post by Gerd Isenberg »

bob wrote: I am convinced that the history idea simply doesn't work with today's searches nearly as well as it worked back in the days of pretty much fixed depth, no reductions, no forward pruning, etc...
One may argue: if HH worked in the past with smaller search depths (even for you), why shouldn't it work today in subtrees of about the same size (or average remaining depth) as the former full trees (which implies clearing the history table at the root of such a subtree)?

Because in PVS most of those subtrees already use null-windows?

Your intrinsic movegen order with hash move, winning captures, killers, and then knight, bishop, rook, queen, king and finally pawn moves, most advanced piece and target square first, is probably good enough for subtrees with a static eval far below alpha or far above beta. Btw., do you try all knight moves before any pawn moves (except hash and killer moves), or do you prefer more advanced pawns over less advanced knights?
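
For concreteness, a minimal sketch of the subtree-clearing idea mentioned above; the depth threshold and names are assumptions:

#include <string.h>

#define SUBTREE_DEPTH 8       /* "about the same remaining depth as former full trees" */

static int hh_table[2][8][64];                /* [side][piece][to] */

/* wipe the table at the root of each such subtree, so the accumulated counts
   only ever describe one subtree of roughly the old full-search size */
static void maybe_reset_history(int remaining_depth)
{
    if (remaining_depth == SUBTREE_DEPTH)
        memset(hh_table, 0, sizeof hh_table);
}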
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Alternatives to History Heuristics

Post by bob »

mcostalba wrote:
bob wrote: But the problem is that the size of the tree doesn't say much about the quality of the best move. Some paths are simply very deep because a losing sacrifice exposes the opponent's king to a zillion checks that all lead nowhere. Is the best move in that position really that useful?
IMHO what counts is the _average_ behaviour of the tree-size metric.

It doesn't matter at all if it can _sometimes_ be wrong (as with most chess engine tricks, which are only statistically good rather than optimal in every position); what really counts is that on average it chooses good moves more often than bad ones, or at least does so more often than the alternative way of ordering the moves.
I don't totally agree. Size of the tree is one aspect. Time to search that tree is another. This makes actual games the best measuring tool of all. Choose the ordering strategy that wins the most games, regardless of speed or tree size. Because the fastest/smallest is irrelevant unless it wins more games. There are subtle trade-offs that are easier to manage in real test games...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Alternatives to History Heuristics

Post by bob »

Gerd Isenberg wrote:
bob wrote: I am convinced that the history idea simply doesn't work with today's searches nearly as well as it worked back in the days of pretty much fixed depth, no reductions, no forward pruning, etc...
One may argue: if HH worked in the past with smaller search depths (even for you), why shouldn't it work today in subtrees of about the same size (or average remaining depth) as the former full trees (which implies clearing the history table at the root of such a subtree)?

Because in PVS most of those subtrees already use null-windows?

Your intrinsic movegen order with hash move, winning captures, killers, and then knight, bishop, rook, queen, king and finally pawn moves, most advanced piece and target square first, is probably good enough for subtrees with a static eval far below alpha or far above beta. Btw., do you try all knight moves before any pawn moves (except hash and killer moves), or do you prefer more advanced pawns over less advanced knights?
I generate them in piece order, so all knight moves get tried first. And with respect to pawns, I do this because, as a chess player, I realize that _every_ pawn move is a bad move. It weakens squares irrevocably, and makes it easier to attack that particular pawn, irrevocably. Yes, sometimes the loss is offset by an even bigger gain (wrecking the opponent's king safety, or seizing the center, or whatever). But holding off on the pawn moves until the end of the search has always worked well for me, dating back to pre-Cray Blitz days.

And as I said, yes, it is likely that the way I do things is good enough that I can't get any benefit from the history heuristic, which did work in older versions of Crafty to a very limited degree. HH was never a big winner for me, but I'll take a percent or two if I can.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Alternatives to History Heuristics

Post by mcostalba »

bob wrote: I don't totally agree.
I think we have said the same thing...more or less :-)