move count based pruning

bob · Post by **bob** » Wed Sep 08, 2010 6:29 pm

rbarreira wrote:
Don wrote: But what may be more important is the time control the programs were designed for. In a sense most programs are designed for human-like time controls on the platform of the day. I have a feeling that if we ran todays programs on yesterdays hardware at human-like time controls we would get a different answer than if we ran the same two programs on todays hardware.
I think you are definitely on to something here... Whether we run a new program on old hardware or an old program on new hardware, we're taking a fish out of the water. Today's programs probably do things that would be detrimental on old hardware (trusting the availability of large amounts of RAM, for one thing...), and old programs aren't tuned to take advantage of very fast hardware...

It's probably possible to get some reasonably good data out of these experiments anyway, but it might always have caveats.

There is really no accurate way to do this. So all we can hope for is just a simple "first approximation". Clearly had I had today's hardware in 1996, I would have tuned Crafty significantly different with regard to null-move R, and whatever else might have been appropriate (I am not an expert on 1996 crafty at the moment, still trying to get the damned thing to run on 64 bit hardware, which is not easy.)

silentshark · Post by **silentshark** » Wed Sep 08, 2010 7:42 pm

bob wrote:
silentshark wrote:
bob wrote:Ugh. This is not going so well. I used exactly your numbers, doing "LMP" in the last 4 plies after searching the number of moves you gave. I removed my forward pruning code (which is based on alpha, material score, and an error margin that gets larger as depth increases, but still limited to last 4 plies only.

Early results so far:
Code: Select all
    Crafty-23.4-2        2676    4    4 30000   66%  2552   22% 
    Crafty-23.4-1        2673    4    4 30000   66%  2552   22% 
    Crafty-23.4R03       2600    7    7  5891   56%  2551   25% 
You might want to give my forward pruning a try. I simply compare material-score + error-margin and compare to alpha. If that value is <= alpha, I prune. Only in last 4 plies. The 4 margins are 125, 125, 300, 300. I have margins for deeper depths but cut it off at remaining depth = 4. I've not had any better results pruning at 5 or 6, yet.
Hmm. That doesn't look too promising. Can you give it a whirl as an addition to all your normal foward pruning stuff?

If this implementation doesn't work, I wouldn't write LMP off. A few things:

- this seems to work for Stockfish, Toga (and others?)
- successful implementation is probably very program dependent
- move ordering of (non hash, non killer, non capture) moves may be important if you are doing LMP. iirc, for these moves Crafty applies no move ordering.

I'm still interested what the result of taking LMP out of Komodo or
Stockfish looks like.
here is the final data:
Code: Select all
   Crafty-23.4-2        2676    3    3 30000   66%  2553   22%
   Crafty-23.4-1        2673    3    3 30000   66%  2553   22%
         2596    3    3 30000   56%  2553   24%
   Crafty-23.4R03-2     2596    3    3 30000   56%  2553   24%
Now for some possible reasons. First, if I search the same position to the same depth using the two above versions, R03 (LMP) searches a tree at least 2x bigger, so my futility pruning code is significantly more aggressive. It is also more selective in that it doesn't prune moves that get us close to alpha, while dumping those that do not.

I prune in the last 4 plies, At any ply, before I prune, I must satisfy all of the following:

(1) this move is not a hash move, nor a capture with SEE >0, nor a killer move.

(2) I am not in check (I have plans to relax this a bit by asking "am I in check, and did I extend the move that checked me?" since I don't extend losing checks (according to SEE)).

(3) I did not extend the current move that I am considering pruning (this means this move is a safe check)

(4) I have searched at least one move of some sort (first move is never pruned).

(5) material score + error margin < alpha.

If I get thru those tests, then I start to test for pruning, which only happens in the last 4 plies using the 4 different error margins (indexed by depth) that I gave last night...

I believe this is yet another case of the reason why I dislike the concept of "move number pruning/reduction decisions."

Another issue here is that I already know that for typical middle-game positions, if Crafty fails high at any node, about 92% of the time it fails high on the first node. By the time I get thru the hash move, good captures, and killers, this number is way on up (I do not measure this now, but am going to run some tests shortly and will report the results) there, possibly as high as 99%. If that is so, then reducing everything beyond those moves makes a lot of sense, unless something marks them as "remarkable" (such as winning material, safe check, etc.).

Notice that in the above test, I removed futility pruning since they overlap with each other. But I did keep the reduction code so that moves that were not pruned away were still subject to my normal reductions.

You mentioned it is working for StockFish. I have not looked at their code (yet). Do they outright prune based only on the move's position in the list, or do they have the material+margin < alpha requirement as well, which really changes the way this works???

Thanks, bob, what's the difference between Crafty-23.4R03-1 and Crafty-23.4R03-2? Is Crafty-23.4R03-2 using LMP as well as all your usual pruning?

I don't see anything like a material + margin < alpha condition in the SF code for LMP. Other conditions are like yours, don't prune captures, don't prune if in check etc.

silentshark · Post by **silentshark** » Wed Sep 08, 2010 8:45 pm

bob wrote: Notice that in the above test, I removed futility pruning since they overlap with each other. But I did keep the reduction code so that moves that were not pruned away were still subject to my normal reductions.

just spotted this statement. I guess there is some overlap, but there will be many instances where they prune different subtrees. So why not try with both together (if you can spare the cluster time)

bob · Post by **bob** » Wed Sep 08, 2010 8:58 pm

Don wrote:
rbarreira wrote:
Don wrote: But what may be more important is the time control the programs were designed for. In a sense most programs are designed for human-like time controls on the platform of the day. I have a feeling that if we ran todays programs on yesterdays hardware at human-like time controls we would get a different answer than if we ran the same two programs on todays hardware.
I think you are definitely on to something here... Whether we run a new program on old hardware or an old program on new hardware, we're taking a fish out of the water. Today's programs probably do things that would be detrimental on old hardware (trusting the availability of large amounts of RAM, for one thing...), and old programs aren't tuned to take advantage of very fast hardware...

It's probably possible to get some reasonably good data out of these experiments anyway, but it might always have caveats.
Yes, it has to be taken with a grain of salt. Software and hardware tend to evolve together.

For example the MP issue. MP is not just hardware but represents a software advance too. So when we do these experiments with MP turned on in order to measure the "benefit" of hardware advances we will be pretending that software has nothing to do with it, even though it does.

One can rationalize that the parallel search is not chess-related. That it is more of a hardware optimization technique as opposed to a pure AI improvement. Whereas things like reductions, pruning, extensions, evaluations, etc are all related to the game being played...

There are bunch of the caveats that make this not an apples to apples comparison that I have come up with. The design of the program is based on the current hardware, we would write a "modern" program differently if it were run on older hardware. We depend on bigger cache sizes, we use compilers designed for new processors (although some have switches for older architectures), hash table sizes were often limited in the older programs which couldn't take advantage of todays memory sizes anyway. (But we should consider memory size on the hardware side of the issue.) Even though it may not be considered relevant, 64 bit programs are suddenly very popular in a time where 64 bit OS's are also very popular. Is this a hardware or software advance? Crafty has always been 64 bit most of the commodity OS's were 32 bit and probably could not take full advantage. I believe that on 64 bit hardware 64 bit programs have an advantage, but on 32 bit hardware I think it's probably a slight disadvantage.

I think a big issue may also be scalability. The results may depend very much on the time control - where running really fast would be more favorable for the older programs and running really slow for the newer programs. So if you run a modern program on ancient hardware where you only doing 3 or 4 ply searches, it's like putting a fast runner who is a slow starter in a really short race - he does not have a chance to really stretch his legs out.

Don · Post by **Don** » Wed Sep 08, 2010 9:16 pm

bob wrote:
Don wrote:
rbarreira wrote:
Don wrote: But what may be more important is the time control the programs were designed for. In a sense most programs are designed for human-like time controls on the platform of the day. I have a feeling that if we ran todays programs on yesterdays hardware at human-like time controls we would get a different answer than if we ran the same two programs on todays hardware.
I think you are definitely on to something here... Whether we run a new program on old hardware or an old program on new hardware, we're taking a fish out of the water. Today's programs probably do things that would be detrimental on old hardware (trusting the availability of large amounts of RAM, for one thing...), and old programs aren't tuned to take advantage of very fast hardware...

It's probably possible to get some reasonably good data out of these experiments anyway, but it might always have caveats.
Yes, it has to be taken with a grain of salt. Software and hardware tend to evolve together.

For example the MP issue. MP is not just hardware but represents a software advance too. So when we do these experiments with MP turned on in order to measure the "benefit" of hardware advances we will be pretending that software has nothing to do with it, even though it does.

One can rationalize that the parallel search is not chess-related. That it is more of a hardware optimization technique as opposed to a pure AI improvement. Whereas things like reductions, pruning, extensions, evaluations, etc are all related to the game being played...

One could make that rationalization but it doesn't fly with me. I have read several papers on parallel chess and including some you were involved with and there are special considerations for chess. I remember reading about simple root tree splitting algorithms in your early stuff (Cray Blitz) I think and then how you set out to improve it significantly.

So this is like saying that your hash table implementation in chess programs have nothing to do with chess because it's only a way to take advantage of memory which is a hardware thing. But how you do it is a huge thing, it's not irrelevant by any means.

I have a way to resolve this issue. I believe that's it's only a partial software issue and agree this is more about hardware - but software should be given some credit too. Clearly, if you have a good MP implementation such as in Crafty, you can run on as many cores as available. In other words, if you can run on 2 cores, you can run on any number of cores. So the software "credit" could be the amount of expected ELO gain for running on 2 cores vs 1 core. This is probably something like 40 ELO or so. I think Crafty gets about 100 ELO going to 4 cores. So we could credit 40 ELO for software advancements enabling MP and if we run on 4 cores credit 100 ELO for hardware advancement. And if you run on even more cores you don't keep giving credit for the software but consider it a constant.

There are bunch of the caveats that make this not an apples to apples comparison that I have come up with. The design of the program is based on the current hardware, we would write a "modern" program differently if it were run on older hardware. We depend on bigger cache sizes, we use compilers designed for new processors (although some have switches for older architectures), hash table sizes were often limited in the older programs which couldn't take advantage of todays memory sizes anyway. (But we should consider memory size on the hardware side of the issue.) Even though it may not be considered relevant, 64 bit programs are suddenly very popular in a time where 64 bit OS's are also very popular. Is this a hardware or software advance? Crafty has always been 64 bit most of the commodity OS's were 32 bit and probably could not take full advantage. I believe that on 64 bit hardware 64 bit programs have an advantage, but on 32 bit hardware I think it's probably a slight disadvantage.

I think a big issue may also be scalability. The results may depend very much on the time control - where running really fast would be more favorable for the older programs and running really slow for the newer programs. So if you run a modern program on ancient hardware where you only doing 3 or 4 ply searches, it's like putting a fast runner who is a slow starter in a really short race - he does not have a chance to really stretch his legs out.

bob · Post by **bob** » Wed Sep 08, 2010 10:28 pm

silentshark wrote:
bob wrote:
silentshark wrote:
bob wrote:Ugh. This is not going so well. I used exactly your numbers, doing "LMP" in the last 4 plies after searching the number of moves you gave. I removed my forward pruning code (which is based on alpha, material score, and an error margin that gets larger as depth increases, but still limited to last 4 plies only.

Early results so far:
Code: Select all
    Crafty-23.4-2        2676    4    4 30000   66%  2552   22% 
    Crafty-23.4-1        2673    4    4 30000   66%  2552   22% 
    Crafty-23.4R03       2600    7    7  5891   56%  2551   25% 
You might want to give my forward pruning a try. I simply compare material-score + error-margin and compare to alpha. If that value is <= alpha, I prune. Only in last 4 plies. The 4 margins are 125, 125, 300, 300. I have margins for deeper depths but cut it off at remaining depth = 4. I've not had any better results pruning at 5 or 6, yet.
Hmm. That doesn't look too promising. Can you give it a whirl as an addition to all your normal foward pruning stuff?

If this implementation doesn't work, I wouldn't write LMP off. A few things:

- this seems to work for Stockfish, Toga (and others?)
- successful implementation is probably very program dependent
- move ordering of (non hash, non killer, non capture) moves may be important if you are doing LMP. iirc, for these moves Crafty applies no move ordering.

I'm still interested what the result of taking LMP out of Komodo or
Stockfish looks like.
here is the final data:
Code: Select all
   Crafty-23.4-2        2676    3    3 30000   66%  2553   22%
   Crafty-23.4-1        2673    3    3 30000   66%  2553   22%
         2596    3    3 30000   56%  2553   24%
   Crafty-23.4R03-2     2596    3    3 30000   56%  2553   24%
Now for some possible reasons. First, if I search the same position to the same depth using the two above versions, R03 (LMP) searches a tree at least 2x bigger, so my futility pruning code is significantly more aggressive. It is also more selective in that it doesn't prune moves that get us close to alpha, while dumping those that do not.

I prune in the last 4 plies, At any ply, before I prune, I must satisfy all of the following:

(1) this move is not a hash move, nor a capture with SEE >0, nor a killer move.

(2) I am not in check (I have plans to relax this a bit by asking "am I in check, and did I extend the move that checked me?" since I don't extend losing checks (according to SEE)).

(3) I did not extend the current move that I am considering pruning (this means this move is a safe check)

(4) I have searched at least one move of some sort (first move is never pruned).

(5) material score + error margin < alpha.

If I get thru those tests, then I start to test for pruning, which only happens in the last 4 plies using the 4 different error margins (indexed by depth) that I gave last night...

I believe this is yet another case of the reason why I dislike the concept of "move number pruning/reduction decisions."

Another issue here is that I already know that for typical middle-game positions, if Crafty fails high at any node, about 92% of the time it fails high on the first node. By the time I get thru the hash move, good captures, and killers, this number is way on up (I do not measure this now, but am going to run some tests shortly and will report the results) there, possibly as high as 99%. If that is so, then reducing everything beyond those moves makes a lot of sense, unless something marks them as "remarkable" (such as winning material, safe check, etc.).

Notice that in the above test, I removed futility pruning since they overlap with each other. But I did keep the reduction code so that moves that were not pruned away were still subject to my normal reductions.

You mentioned it is working for StockFish. I have not looked at their code (yet). Do they outright prune based only on the move's position in the list, or do they have the material+margin < alpha requirement as well, which really changes the way this works???
Thanks, bob, what's the difference between Crafty-23.4R03-1 and Crafty-23.4R03-2? Is Crafty-23.4R03-2 using LMP as well as all your usual pruning?

I don't see anything like a material + margin < alpha condition in the SF code for LMP. Other conditions are like yours, don't prune captures, don't prune if in check etc.

They are identical. I usually do two runs if the results are important, to reduce the chances of getting one wild value out at 4 sigma or something... if both are close, they can be trusted pretty well...

bob · Post by **bob** » Wed Sep 08, 2010 10:29 pm

silentshark wrote:
bob wrote: Notice that in the above test, I removed futility pruning since they overlap with each other. But I did keep the reduction code so that moves that were not pruned away were still subject to my normal reductions.
just spotted this statement. I guess there is some overlap, but there will be many instances where they prune different subtrees. So why not try with both together (if you can spare the cluster time)

I can do both by just "stacking them." Letting my normal pruning stuff prune, and if it chooses to not prune, then letting your approach prune. Running now...

bob · Post by **bob** » Thu Sep 09, 2010 3:17 am

Don wrote:
bob wrote:
Don wrote:
rbarreira wrote:
Don wrote: But what may be more important is the time control the programs were designed for. In a sense most programs are designed for human-like time controls on the platform of the day. I have a feeling that if we ran todays programs on yesterdays hardware at human-like time controls we would get a different answer than if we ran the same two programs on todays hardware.
I think you are definitely on to something here... Whether we run a new program on old hardware or an old program on new hardware, we're taking a fish out of the water. Today's programs probably do things that would be detrimental on old hardware (trusting the availability of large amounts of RAM, for one thing...), and old programs aren't tuned to take advantage of very fast hardware...

It's probably possible to get some reasonably good data out of these experiments anyway, but it might always have caveats.
Yes, it has to be taken with a grain of salt. Software and hardware tend to evolve together.

For example the MP issue. MP is not just hardware but represents a software advance too. So when we do these experiments with MP turned on in order to measure the "benefit" of hardware advances we will be pretending that software has nothing to do with it, even though it does.

One can rationalize that the parallel search is not chess-related. That it is more of a hardware optimization technique as opposed to a pure AI improvement. Whereas things like reductions, pruning, extensions, evaluations, etc are all related to the game being played...
One could make that rationalization but it doesn't fly with me. I have read several papers on parallel chess and including some you were involved with and there are special considerations for chess. I remember reading about simple root tree splitting algorithms in your early stuff (Cray Blitz) I think and then how you set out to improve it significantly.

Reread the things I have written. Even the DTS algorithm was applied to completely uniform game trees, worst-first game trees, and chess. But nothing in DTS is related to chess, any more than there is any chess specific stuff in my current parallel search. The parallel stuff is intertwined with minimax-alpha/beta significantly, but AB search works for chess, checkers, othello, go, etc... equally well.

So this is like saying that your hash table implementation in chess programs have nothing to do with chess because it's only a way to take advantage of memory which is a hardware thing. But how you do it is a huge thing, it's not irrelevant by any means.

It is clearly a search enhancement, not a chess one. But search does play the game, so it is "related" pretty well. And yes, parallel search is also somewhat related, but it really is independent of the game itself. And it would do absolutely nothing without the hardware. If the argument is always going to revolve around "it takes software to use the new hardware" then this test is going to go nowhere, fast. Parallel search is much more akin to pure optimization activities than to game-playing activities. The knowledge of the program is no bigger, the search is no better, all we gain is the speed provided by the new hardware. Hashing is not a hardware related thing. It was done in the 60's in Greenblatt's program, and has been done ever since. Whether you use one table, or two, or buckets, all of that is optimization related, once we have hashing in the search. So hashing itself is a software development, although it is not recent and would not apply to our 1995 to present comparison since it existed 30 years earlier. How you hash is an optimization issue, not a chess issue, unless you add new information to the table besides what was explained by Greenblatt...

I have a way to resolve this issue. I believe that's it's only a partial software issue and agree this is more about hardware - but software should be given some credit too. Clearly, if you have a good MP implementation such as in Crafty, you can run on as many cores as available. In other words, if you can run on 2 cores, you can run on any number of cores. So the software "credit" could be the amount of expected ELO gain for running on 2 cores vs 1 core. This is probably something like 40 ELO or so. I think Crafty gets about 100 ELO going to 4 cores. So we could credit 40 ELO for software advancements enabling MP and if we run on 4 cores credit 100 ELO for hardware advancement. And if you run on even more cores you don't keep giving credit for the software but consider it a constant.

A middle-of-the-road compromise. My original thinking was that software is anything that makes the program stronger, without regard to hardware. Hardware is anything that makes the program stronger by offering more speed, even if the program does have to be modified to take advantage. For example, Gerd's impossible-to-grasp flood-fill stuff that relies on the FPU changes to make the FP registers turn into 64 bit int registers. Or now into 128 bit registers.

There are bunch of the caveats that make this not an apples to apples comparison that I have come up with. The design of the program is based on the current hardware, we would write a "modern" program differently if it were run on older hardware. We depend on bigger cache sizes, we use compilers designed for new processors (although some have switches for older architectures), hash table sizes were often limited in the older programs which couldn't take advantage of todays memory sizes anyway. (But we should consider memory size on the hardware side of the issue.) Even though it may not be considered relevant, 64 bit programs are suddenly very popular in a time where 64 bit OS's are also very popular. Is this a hardware or software advance? Crafty has always been 64 bit most of the commodity OS's were 32 bit and probably could not take full advantage. I believe that on 64 bit hardware 64 bit programs have an advantage, but on 32 bit hardware I think it's probably a slight disadvantage.

I think a big issue may also be scalability. The results may depend very much on the time control - where running really fast would be more favorable for the older programs and running really slow for the newer programs. So if you run a modern program on ancient hardware where you only doing 3 or 4 ply searches, it's like putting a fast runner who is a slow starter in a really short race - he does not have a chance to really stretch his legs out.

bob · Post by **bob** » Thu Sep 09, 2010 3:21 am

Without the bother of cut/paste, the combination picks up 30 Elo over the pure LMP approach. But it is still 40 below stock 23.4... I think the idea of not pruning moves that might fail high is what is helping 23.4 here. Just lopping them off seems to wreck things. Reducing is less drastic. But the funny part is that the way I prune is about 2x as aggressive as your counter-based method, since the trees are about 1/2 the size of those your LMP idea searches.

silentshark · Post by **silentshark** » Thu Sep 09, 2010 9:16 pm

bob wrote:Without the bother of cut/paste, the combination picks up 30 Elo over the pure LMP approach. But it is still 40 below stock 23.4... I think the idea of not pruning moves that might fail high is what is helping 23.4 here. Just lopping them off seems to wreck things. Reducing is less drastic. But the funny part is that the way I prune is about 2x as aggressive as your counter-based method, since the trees are about 1/2 the size of those your LMP idea searches.

Useful to hear your results. But before you write off this idea completely, Bob, remember that this (or similar) ideas are clearly working for some strong programs. Not Crafty though.

Thanks again for taking the time to run these tests.

move count based pruning

Re: move count based pruning

Re: move count based pruning

Re: move count based pruning

Re: move count based pruning

Re: move count based pruning

Re: move count based pruning

Re: move count based pruning

Re: move count based pruning

Re: move count based pruning - new results.

Re: move count based pruning - new results.