Alternatives to History Heuristics

Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Re: Alternatives to History Heuristics

Post by Edsel Apostol »

mcostalba wrote:
mcostalba wrote:
Edsel Apostol wrote: I have this idea to use a new history table every fifth ply (subject to tuning) and to initialize it at every 5th first grandchild node of each root move and the succeeding 5th first grandchild from that 5th first grandchild node. This way a history table will only be used for 5 plies and it will only be used by that certain branch of the tree.
Yes, I had a _similar_ idea :-) but still no time to test it.

Actually my idea is slightly different, but at the same time _very_ different in possible results.

If the tests are good I will post results.
Ok, you posted yours, so it is only fair that I post mine:

It is similar to yours, but instead of nodes "of each root move" I consider only PV nodes, and at each such node I consider only the first move, i.e. the PV move.

IOW I set up a history table for the first move of the root move list.

All the following moves will use this table.

When the fifth ply is reached and we are in the PV, I set up another table for the _first_ move of the PV node: all the moves _below_ that move will use this new table.

In my scheme, on a search of say 20 plies, I have only 5 tables; in your scheme you have many more, and I have doubts it can work ;-)

Note that in Stockfish (I don't know about your Twisted Logic) history is weighted by depth, so even if the first main table ends up storing data from side branches at very high ply, this should not compromise history quality, because their relative weight is very small.

My scheme also handles nicely ;-) the asymmetry between the size of the PV move's sub-tree and that of the remaining ones (which are much smaller).

I think this scheme comes closest to the concept of one history table per subtree, and I expect it to work.
Your scheme is what I thought of first, before I came up with my idea. I've just realized that my idea will only work in a fixed-depth search; I forgot to take into account that it will be used in an iterative deepening framework. :oops:

The reason why I scrapped that first idea in favor of mine is that the fifth or tenth ply of the PV move may not be close positionally to the fifth or tenth ply of non-PV nodes, so the contents of that history table from the PV node might not be representative of the subtree at non-PV nodes.

I think a more promising idea would be to use the number of pawns to index which history table to use. I came up with this idea because I think that a position with, for example, 5 pawns for White and 5 pawns for Black will be close to other positions with a similar number of pawns, so the history data from these positions will be very similar and their values can be trusted. This scheme is independent of ply and depth, so it will not be affected by deeper search.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Alternatives to History Heuristics

Post by mcostalba »

Edsel Apostol wrote: The reason why I scrapped that first idea in favor of mine is that the fifth or tenth ply of the PV move may not be close positionally to the fifth or tenth ply of non-PV nodes, so the contents of that history table from the PV node might not be representative of the subtree at non-PV nodes.
Of course they are not, but I don't think this is a problem. You use the same history table only for moves below your PV. Moves that are siblings of your PV will use the previous table.

Let me see if I am able to draw what I have described: you have three tables, a, b and c

Code: Select all

a-a-b-b-c-c
  | | | | |
  | | | | c
  | | | b-b
  | | b-b-b
  | a-a-a-a
  |
  a-a-a-a-a

Plies go from left to right. Along a vertical line are moves at the same node.

At the beginning you use table a; then at ply 3 the first move of the PV switches to b, and so do all the moves _below_ b, but not the siblings of b.

At ply 4 the first PV move goes to c, but the others remain on b, and there are others still on a.

The point is that the main sub-tree changes, and even if side branches keep using the same table this is not a problem, because they are usually much smaller than the main branch.
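
A minimal sketch in C of how this table switching could look; all names, array sizes and the switching interval are illustrative assumptions, not code from Stockfish or any other engine:

Code: Select all

/* One history table per PV segment: the first move of a PV node at a
   segment boundary descends into a fresh table; siblings keep using
   the parent's table. */
#define NUM_TABLES 5           /* enough for a ~20-ply search          */
#define SEGMENT    4           /* plies between switches (to be tuned) */

static int hist[NUM_TABLES][16][64];   /* [table][piece][to-square]    */

/* Each node passes a table index down to its children. */
int child_table(int parent_table, int ply, int is_pv, int is_first_move)
{
    if (is_pv && is_first_move && ply > 0 && ply % SEGMENT == 0
        && parent_table + 1 < NUM_TABLES)
        return parent_table + 1;   /* PV move starts a new table       */
    return parent_table;           /* siblings of the PV keep the old one */
}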
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Alternatives to History Heuristics

Post by bob »

rjgibert wrote:
hgm wrote:
bob wrote:I've tried that, all the way from history[pc/from/to][ply] to history[pc/from/to][1]. The first is _very_ cache unfriendly. The latter is what everyone is currently using.

...
The discontinuity problem can be alleviated to a large extent by overlapping the bins and making them more fine-grained. As reading of the history tables occurs far more frequently than writing, the overhead involved in this can be put in the writing without too much impact on overall performance. I.e. use one bin per two plies, but store a cutoff at ply n not only in the [n,n+1] bin but also in the [n-2,n-1] and [n+2,n+3] bins. Then you span 6 plies.

The cache friendliness can perhaps be improved by having history[ply][pc/from/to].
I don't understand why you want to "overlap the bins." You are going to have some duplication of information between bins anyways, so I don't see the need to enforce this by overlapping.

Just let each bin populate itself naturally. I think that the "problem of borderline cases" is an illusion. If move x produces fail highs for both bin A and bin B, this will be reflected very quickly in both bins regardless. If the idea of multiple bins fails, I don't think it will be because of this issue. I just don't see it.
The overlap is needed because at ply=6, you care just as much about the history of ply=4 as you do about that of ply=8. If you just do plies 1-6, 7-12, then around that discontinuity point you have less information than you do in the middle of the range.
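
To make the write-side overlap concrete, a minimal sketch in C of the scheme as hgm describes it (one bin per two plies, each cutoff also credited to the two neighbouring bins); the array layout and sizes are assumptions:

Code: Select all

#define MAX_PLY  64
#define NUM_BINS (MAX_PLY / 2)

static int hist[NUM_BINS][16][64];      /* [bin][piece][to-square]   */

/* A cutoff at ply n goes into the [n,n+1] bin and both neighbours,
   so every bin effectively spans 6 plies. */
void history_update(int ply, int piece, int to, int bonus)
{
    int bin = ply / 2;
    for (int b = bin - 1; b <= bin + 1; b++)
        if (b >= 0 && b < NUM_BINS)
            hist[b][piece][to] += bonus;
}

/* Reads stay cheap: one lookup in the bin covering this ply. */
int history_score(int ply, int piece, int to)
{
    return hist[ply / 2][piece][to];
}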
rjgibert
Posts: 317
Joined: Mon Jun 26, 2006 9:44 am

Re: Alternatives to History Heuristics

Post by rjgibert »

bob wrote:
rjgibert wrote:
hgm wrote:
bob wrote:I've tried that, all the way from history[pc/from/to][ply] to history[pc/from/to][1]. The first is _very_ cache unfriendly. The latter is what everyone is currently using.

...
The discontinuity problem can be alleviated to a large extent by overlapping the bins and making them more fine-grained. As reading of the history tables occurs far more frequently than writing, the overhead involved in this can be put in the writing without too much impact on overall performance. I.e. use one bin per two plies, but store a cutoff at ply n not only in the [n,n+1] bin but also in the [n-2,n-1] and [n+2,n+3] bins. Then you span 6 plies.

The cache friendliness can perhaps be improved by having history[ply][pc/from/to].
I don't understand why you want to "overlap the bins." You are going to have some duplication of information between bins anyways, so I don't see the need to enforce this by overlapping.

Just let each bin populate itself naturally. I think that the "problem of borderline cases" is an illusion. If move x produces fail highs for both bin A and bin B, this will be reflected very quickly in both bins regardless. If the idea of multiple bins fails, I don't think it will be because of this issue. I just don't see it.
The overlap is needed because at ply=6, you care just as much about the history of ply=4 as you do about that of ply=8. If you just do plies 1-6, 7-12, then around that discontinuity point you have less information than you do in the middle of the range.
That's a very decent explanation.

But you still don't need to overlap the counters. You can read from the corresponding bin plus the nearest adjacent bin and reconcile the values proportionately.

You would still update only one counter. Only the procedure for reading a value would involve two--one from each of the two bins.
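
A minimal sketch in C of this read-side alternative, assuming the same one-bin-per-two-plies layout as in the sketch above; the 3:1 blend is an arbitrary illustrative choice:

Code: Select all

#define MAX_PLY  64
#define NUM_BINS (MAX_PLY / 2)

static int hist[NUM_BINS][16][64];      /* updated in ONE bin only   */

/* Blend this ply's bin with the nearest adjacent bin: even plies sit
   next to the lower neighbour, odd plies next to the upper one. */
int history_score(int ply, int piece, int to)
{
    int bin  = ply / 2;
    int near = (ply & 1) ? bin + 1 : bin - 1;

    if (near < 0 || near >= NUM_BINS)
        return hist[bin][piece][to];

    /* weight the move's own bin 3:1 over its neighbour */
    return (3 * hist[bin][piece][to] + hist[near][piece][to]) / 4;
}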
Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Re: Alternatives to History Heuristics

Post by Edsel Apostol »

rjgibert wrote:
bob wrote:
rjgibert wrote:
hgm wrote:
bob wrote:I've tried that, all the way from history[pc/from/to][ply] to history[pc/from/to][1]. The first is _very_ cache unfriendly. The latter is what everyone is currently using.

...
The discontinuity problem can be alleviated to a large extent by overlapping the bins and making them more fine-grained. As reading of the history tables occurs far more frequently than writing, the overhead involved in this can be put in the writing without too much impact on overall performance. I.e. use one bin per two plies, but store a cutoff at ply n not only in the [n,n+1] bin but also in the [n-2,n-1] and [n+2,n+3] bins. Then you span 6 plies.

The cache friendliness can perhaps be improved by having history[ply][pc/from/to].
I don't understand why you want to "overlap the bins." You are going to have some duplication of information between bins anyways, so I don't see the need to enforce this by overlapping.

Just let each bin populate itself naturally. I think that the "problem of borderline cases" is an illusion. If move x produces fail highs for both bin A and bin B, this will be reflected very quickly in both bins regardless. If the idea of multiple bins fails, I don't think it will be because of this issue. I just don't see it.
The overlap is needed because at ply=6, you care just as much about the history of ply=4 as you do about that of ply=8. If you just do plies 1-6, 7-12, then around that discontinuity point you have less information than you do in the middle of the range.
That's a very decent explanation.

But you still don't need to overlap the counters. You can read from the corresponding bin plus the nearest adjacent bin and reconcile the values proportionately.

You would still update only one counter. Only the procedure for reading a value would involve two--one from each of the two bins.
HGM thinks that since reads of the table occur far more often than writes, it would probably be faster to put more of the calculation in the write part than in the read part. So instead of reading from two bins and reconciling the values proportionately as you've said, he has already done that in the write part, so when you read from a certain bin the overlap has already been applied. Only one read is needed, which saves a lot of calculation.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Alternatives to History Heuristics

Post by bob »

rjgibert wrote:
bob wrote:
rjgibert wrote:
hgm wrote:
bob wrote:I've tried that, all the way from history[pc/from/to][ply] to history[pc/from/to][1]. The first is _very_ cache unfriendly. The latter is what everyone is currently using.

...
The discontinuity problem can be alleviated to a large extent by overlapping the bins and making them more fine-grained. As reading of the history tables occurs far more frequently than writing, the overhead involved in this can be put in the writing without too much impact on overall performance. I.e. use one bin per two plies, but store a cutoff at ply n not only in the [n,n+1] bin but also in the [n-2,n-1] and [n+2,n+3] bins. Then you span 6 plies.

The cache friendliness can perhaps be improved by having history[ply][pc/from/to].
I don't understand why you want to "overlap the bins." You are going to have some duplication of information between bins anyways, so I don't see the need to enforce this by overlapping.

Just let each bin populate itself naturally. I think that the "problem of borderline cases" is an illusion. If move x produces fail highs for both bin A and bin B, this will be reflected very quickly in both bins regardless. If the idea of multiple bins fails, I don't think it will be because of this issue. I just don't see it.
The overlap is needed because at ply=6, you care just as much about the history of ply=4 as you do about that of ply=8. If you just do plies 1-6, 7-12, then around that discontinuity point you have less information than you do in the middle of the range.
That's a very decent explanation.

But you still don't need to overlap the counters. You can read from the corresponding bin plus the nearest adjacent bin and reconcile the values proportionately.

You would still update only one counter. Only the procedure for reading a value would involve two--one from each of the two bins.
Here's the issue. History ordering is "iffy" from the get-go. It is not a big Elo producer. Which means the cost has to be low. I tested this, and a ton of other ideas. Two examples.

You suggested using adjacent "bins". The problem with this is that you generally use a "selection sort" approach, which means you generate N moves, then make a pass over those N moves and compute a history score for each. If the best one doesn't fail high, you make a pass over the remaining N-1 moves, doing the same. This is non-trivial computation and it hurts overall.

I tried the opposite: doing the multiple-bin computation when a fail high occurs, since that is far less frequent than the previous example. But it is still expensive, not cache-friendly, and I could not get it to do better than break even at best... which means it was of no use.

I spent about 2.5 months testing every history idea I could come up with, because I wanted to see if I could find something that would be complementary with LMR and the "counter" approach which makes the L in LMR actually mean something. I tried obvious ideas. Ideas used by others. Ideas that were off-the-wall. Nothing worked...
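
For reference, a minimal sketch in C of the "selection sort" move picking described above; the Move type and the history_score() helper are assumptions. The point is that the score is recomputed on every pass, so any extra per-lookup cost (such as reading multiple bins) is multiplied:

Code: Select all

typedef struct { int piece, to; } Move;

extern int history_score(int ply, int piece, int to);

/* Bring the best-scoring remaining move into slot `start`.  Called
   once per move tried, so up to N passes over the list if no early
   fail high occurs. */
void pick_next(Move *moves, int n, int start, int ply)
{
    int best = start;
    int best_score = history_score(ply, moves[start].piece, moves[start].to);

    for (int i = start + 1; i < n; i++) {
        int s = history_score(ply, moves[i].piece, moves[i].to);
        if (s > best_score) { best = i; best_score = s; }
    }
    Move tmp = moves[start]; moves[start] = moves[best]; moves[best] = tmp;
}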
rjgibert
Posts: 317
Joined: Mon Jun 26, 2006 9:44 am

Re: Alternatives to History Heuristics

Post by rjgibert »

bob wrote:
[snip]

Here's the issue. History ordering is "iffy" from the get-go. It is not a big Elo producer. Which means the cost has to be low. I tested this, and a ton of other ideas. Two examples.

You suggested using adjacent "bins". The problem with this is that you generally use a "selection sort" approach, which means you generate N moves, then make a pass over those N moves and compute a history score for each. If the best one doesn't fail high, you make a pass over the remaining N-1 moves, doing the same. This is non-trivial computation and it hurts overall.

I tried the opposite: doing the multiple-bin computation when a fail high occurs, since that is far less frequent than the previous example. But it is still expensive, not cache-friendly, and I could not get it to do better than break even at best... which means it was of no use.

I spent about 2.5 months testing every history idea I could come up with, because I wanted to see if I could find something that would be complementary with LMR and the "counter" approach which makes the L in LMR actually mean something. I tried obvious ideas. Ideas used by others. Ideas that were off-the-wall. Nothing worked...
I got it. You did a lot of stuff. But that doesn't make it a dead issue.

One last idea, then I'll shut up.

This idea addresses the discontinuity problem in the simplest way possible. The discontinuity problem of adjacent bins causes some of the ply levels to be underrepresented in the bins, because they are borderline.

The simplest solution is to weight those borderline plies a little more heavily when we update a counter, by multiplying by a constant. Now what could be simpler than that?

We don't need to overlap the bins. We don't need to update more than one bin for a single fail-high move. We just weight the borderline plies more heavily.

This would address the discontinuity problem in a rough way, but in a computationally super simple way. Did you try this idea?
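
For concreteness, a minimal sketch in C of the weighting being proposed, assuming 6-ply bins as in bob's example; the factor of 2 is an arbitrary illustrative constant:

Code: Select all

#define BIN_SIZE 6
#define NUM_BINS (64 / BIN_SIZE + 1)

static int hist[NUM_BINS][16][64];

/* Still a single-counter update: only the bonus changes.  Cutoffs at
   the first or last ply of a bin (the borderline plies) count double. */
void history_update(int ply, int piece, int to, int bonus)
{
    int bin = ply / BIN_SIZE;
    int pos = ply % BIN_SIZE;

    if (pos == 0 || pos == BIN_SIZE - 1)
        bonus *= 2;                 /* weight borderline plies more  */

    hist[bin][piece][to] += bonus;
}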
Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Re: Alternatives to History Heuristics

Post by Edsel Apostol »

bob wrote:
rjgibert wrote:
bob wrote:
rjgibert wrote:
hgm wrote:
bob wrote:I've tried that, all the way from history[pc/from/to][ply] to history[pc/from/to][1]. The first is _very_ cache unfriendly. The latter is what everyone is currently using.

...
The discontinuity problem can be alleviated to a large extent by overlapping the bins and making them more fine-grained. As reading of the history tables occurs far more frequently than writing, the overhead involved in this can be put in the writing without too much impact on overall performance. I.e. use one bin per two plies, but store a cutoff at ply n not only in the [n,n+1] bin but also in the [n-2,n-1] and [n+2,n+3] bins. Then you span 6 plies.

The cache friendliness can perhaps be improved by having history[ply][pc/from/to].
I don't understand why you want to "overlap the bins." You are going to have some duplication of information between bins anyways, so I don't see the need to enforce this by overlapping.

Just let each bin populate itself naturally. I think that the "problem of borderline cases" is an illusion. If move x produces fail highs for both bin A and bin B, this will be reflected very quickly in both bins regardless. If the idea of multiple bins fails, I don't think it will be because of this issue. I just don't see it.
The overlap is needed because at ply=6, you care just as much about the history of ply=4 as you do about that of ply=8. If you just do plies 1-6, 7-12, then around that discontinuity point you have less information than you do in the middle of the range.
That's a very decent explanation.

But you still don't need to overlap the counters. You can read from the corresponding bin plus the nearest adjacent bin and reconcile the values proportionately.

You would still update only one counter. Only the procedure for reading a value would involve two--one from each of the two bins.
Here's the issue. History ordering is "iffy" from the get-go. It is not a big Elo producer. Which means the cost has to be low. I tested this, and a ton of other ideas. Two examples.

You suggested using adjacent "bins". The problem with this is that you generally use a "selection sort" approach, which means you generate N moves, then make a pass over those N moves and compute a history score for each. If the best one doesn't fail high, you make a pass over the remaining N-1 moves, doing the same. This is non-trivial computation and it hurts overall.

I tried the opposite: doing the multiple-bin computation when a fail high occurs, since that is far less frequent than the previous example. But it is still expensive, not cache-friendly, and I could not get it to do better than break even at best... which means it was of no use.

I spent about 2.5 months testing every history idea I could come up with, because I wanted to see if I could find something that would be complementary with LMR and the "counter" approach which makes the L in LMR actually mean something. I tried obvious ideas. Ideas used by others. Ideas that were off-the-wall. Nothing worked...
If you don't have anything to test with the cluster, can you please try my idea? I know it's not intuitive, but at least if it fails you could verify your theory that no form of history works. If it works, then all of us are going to benefit from it.

Here's the idea:

Code: Select all

// history table indexed by [numwhitepawns][numblackpawns][side][piece][to]
// you could edit how the index is calculated to make it cache friendly
int HistTable[9][9][2][8][64];

When a beta cutoff occurs, just increase the history counter by 1. No need to use depth and no need to scale.
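
A minimal sketch of the matching update and read, using the HistTable declaration above; the helper names are assumptions:

Code: Select all

/* Called on a beta cutoff: a plain +1, no depth term, no scaling. */
void history_update(int wp, int bp, int side, int piece, int to)
{
    HistTable[wp][bp][side][piece][to]++;
}

/* Move ordering reads the table for the current pawn configuration. */
int history_score(int wp, int bp, int side, int piece, int to)
{
    return HistTable[wp][bp][side][piece][to];
}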
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Alternatives to History Heuristics

Post by bob »

rjgibert wrote:
bob wrote:
[snip]

Here's the issue. History ordering is "iffy" from the get-go. It is not a big Elo producer. Which means the cost has to be low. I tested this, and a ton of other ideas. Two examples.

You suggested using adjacent "bins". The problem with this is that you generally use a "selection sort" approach, which means you generate N moves, then make a pass over those N moves and compute a history score for each. If the best one doesn't fail high, you make a pass over the remaining N-1 moves, doing the same. This is non-trivial computation and it hurts overall.

I tried the opposite: doing the multiple-bin computation when a fail high occurs, since that is far less frequent than the previous example. But it is still expensive, not cache-friendly, and I could not get it to do better than break even at best... which means it was of no use.

I spent about 2.5 months testing every history idea I could come up with, because I wanted to see if I could find something that would be complementary with LMR and the "counter" approach which makes the L in LMR actually mean something. I tried obvious ideas. Ideas used by others. Ideas that were off-the-wall. Nothing worked...
I got it. You did a lot of stuff. But that doesn't make it a dead issue.

One last idea, then I'll shut up.

This idea addresses the discontinuity problem in the simplest way possible. The discontinuity problem of adjacent bins causes some of the ply levels to be underrepresented in the bins, because they are borderline.

The simplest solution is to weight those borderline plies a little more heavily when we update a counter, by multiplying by a constant. Now what could be simpler than that?

We don't need to overlap the bins. We don't need to update more than one bin for a single fail-high move. We just weight the borderline plies more heavily.

This would address the discontinuity problem in a rough way, but in a computationally super simple way. Did you try this idea?
Here's the problem. We are at ply 5 and want to pick a good history move. We have just come back up the tree from a big search out at ply 35. I don't want the stuff from ply 35 to influence my move at ply 5 at all. And for the search out at ply 35, I don't want the information from ply 5 to influence me at all out there.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Alternatives to History Heuristics

Post by Michael Sherwin »

bob wrote:
rjgibert wrote:
bob wrote:
[snip]

Here's the issue. History ordering is "iffy" from the get-go. It is not a big Elo producer. Which means the cost has to be low. I tested this, and a ton of other ideas. Two examples.

You suggested using adjacent "bins". The problem with this is that you generally use a "selection sort" approach, which means you generate N moves, then make a pass over those N moves and compute a history score for each. If the best one doesn't fail high, you make a pass over the remaining N-1 moves, doing the same. This is non-trivial computation and it hurts overall.

I tried the opposite: doing the multiple-bin computation when a fail high occurs, since that is far less frequent than the previous example. But it is still expensive, not cache-friendly, and I could not get it to do better than break even at best... which means it was of no use.

I spent about 2.5 months testing every history idea I could come up with, because I wanted to see if I could find something that would be complementary with LMR and the "counter" approach which makes the L in LMR actually mean something. I tried obvious ideas. Ideas used by others. Ideas that were off-the-wall. Nothing worked...
I got it. You did a lot of stuff. But that doesn't make it a dead issue.

One last idea, then I'll shut up.

This idea addresses the discontinuity problem in the simplest way possible. The discontinuity problem of adjacent bins causes some of the ply levels to be underrepresented in the bins, because they are borderline.

The simplest solution is to weight those borderline plies a little more heavily when we update a counter, by multiplying by a constant. Now what could be simpler than that?

We don't need to overlap the bins. We don't need to update more than one bin for a single fail-high move. We just weight the borderline plies more heavily.

This would address the discontinuity problem in a rough way, but in a computationally super simple way. Did you try this idea?
Here's the problem. We are at ply 5 and want to pick a good history move. We have just come back up the tree from a big search out at ply 35. I don't want the stuff from ply 35 to influence my move at ply 5 at all. And for the search out at ply 35, I don't want the information from ply 5 to influence me at all out there.
Move ordering for the first N plies is more important than for the later plies. Therefore, I believe that much more computationally intensive and smarter algorithms are of greater value at shallow ply. If history tables can be used to improve move ordering, they would be best used only at deeper plies, within M of the leaf nodes.
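
A minimal sketch in C of that split; M, the helper names, and the depth convention (remaining depth shrinking toward the leaves) are all assumptions:

Code: Select all

#define M 4

extern int history_score(int ply, int piece, int to);
extern int static_order_score(int piece, int to);   /* smarter, costlier */

/* Spend effort near the root; fall back to cheap history ordering
   only within M plies of the leaves. */
int move_order_score(int depth_remaining, int ply, int piece, int to)
{
    if (depth_remaining <= M)
        return history_score(ply, piece, to);
    return static_order_score(piece, to);
}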