LMR and null move selectivity

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: LMR and null move selectivity

Post by bob »

rvida wrote:
mcostalba wrote: It is nevertheless interesting to see that it is not 100% accurate: a very small percentage of null moves that satisfy the conditions still fail high when they shouldn't, about 340 out of more than 120K, so less than 0.3%.

I didn't have the time to analyse further what kind of positions break the rule.
It can fail because you don't know which node type the hash entry comes from. An entry from a (former) PV node was reduced less and so carries a more accurate result; the result from a more heavily reduced non-PV search of the same node may differ.
Doesn't matter. I know what _kind_ of value the hash entry represents. And if it says "the best you can get is X", where X < beta, then a null move can't fail high unless you are in zugzwang, which is exactly where you do _not_ want to do a null-move search anyway.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: LMR and null move selectivity -- data

Post by bob »

Here are the results. I had previously run a test for two different value parameters, so I simply re-ran them, exactly the same except that the null-move test was tossed out. 23.1R11 is the code without the avoid_null code, 23.1R10 has it in...

Code: Select all

Name                  Elo    +    - games score oppo. draws
Crafty-23.1R10-50    2601    4    4 40000   48%  2618   22%
Crafty-23.1R10-75    2600    5    4 40000   48%  2618   22% 
Crafty-23.1R11-50    2598    4    4 40000   47%  2618   22% 
Crafty-23.1R11-75    2595    4    4 40000   47%  2618   22% 
So it is worth a couple of Elo, maybe. Based on the scoring percentage, removing it did seem to cost about one extra game out of every 100 played.

All played using the same starting positions, same opponents. PGNs were combined into one file to produce the above.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: LMR and null move selectivity

Post by mcostalba »

bob wrote: I don't see how that could happen. Null-move searches only fail high when our current position is so good that even if we don't make a move we are still winning. I don't see how the hash table could say "score is <= X" and yet a null-move search could fail high, _UNLESS_ you are catching zugzwang positions, since that can certainly happen in those, and it represents an error to boot.
I am not so sure about that.

When you search the same position twice you can get different results, even at the same search depth.

This is because you don't search _all_ the moves but just a fraction, and that fraction you search at varying depths: some moves you search deeper, some shallower, due to LMR.

The fraction of moves that you search is history dependent, and history is dynamic: it changes as the search goes on.

IOW, suppose the stored hash value came from a previous search where, due to history pruning, you didn't try a given winning move; after a while that winning move is perhaps discovered and its history score improves.

When you re-search the position with a null-move search, that winning move now has a good history score and is no longer pruned away, so the null search tries it, the move does its job, and the search returns a fail high.

I see two ways this can happen through history and LMR:

1) Inaccurate history causes a good move to be pruned.

2) Inaccurate history causes a good move to be ordered near the end of the list, where LMR is in action. The move is reduced, and the reduced depth is not enough to see that it is a winning move.


History, when combined with LMR, is dynamic and can vary as the search goes on, so what is written in the hash table is not exactly set in stone.
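To make the two failure modes above concrete, here is a generic sketch of how history counters commonly feed both pruning and move ordering/LMR. This is not Crafty's or any particular engine's code; the table layout, the threshold, and the reduction formula are purely illustrative.

Code: Select all

#define HISTORY_PRUNE_THRESHOLD 60       /* illustrative cutoff; real engines also age/decay counters */

/* Hypothetical history table, indexed by [piece][to-square], updated
 * whenever a quiet move causes a beta cutoff. */
static int history[16][64];

void history_good(int piece, int to, int depth) {
  history[piece][to] += depth * depth;   /* reward: sorts earlier next time */
}

/* Failure mode (1): history pruning.  A quiet move with a stale, low
 * history score may be skipped outright at shallow remaining depth,
 * even if it has since become the winning move. */
int history_prune(int piece, int to, int depth, int moves_tried) {
  return depth <= 3 && moves_tried > 4 &&
         history[piece][to] < HISTORY_PRUNE_THRESHOLD;
}

/* Failure mode (2): LMR.  Moves sorted late (often because of a low
 * history score) are searched to a reduced depth, which may be too
 * shallow to see that the move actually wins. */
int lmr_reduction(int moves_tried, int depth) {
  if (depth > 2 && moves_tried > 3)
    return 1 + (moves_tried > 8);        /* reduce by one or two plies */
  return 0;                              /* search at full depth */
}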
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: LMR and null move selectivity

Post by bob »

mcostalba wrote:
bob wrote: I don't see how that could happen. Null-move searches only fail high when our current position is so good that even if we don't make a move we are still winning. I don't see how the hash table could say "score is <= X" and yet a null-move search could fail high, _UNLESS_ you are catching zugzwang positions, since that can certainly happen in those, and it represents an error to boot.
I am not so sure about that.

When you search the same position twice you can get different results, even at the same search depth.
Only due to an occasional transposition table "boost". This is pretty rare. And given two consecutive searches with the same table, this is rarer still.

This is because you don't search _all_ the moves but just a fraction, and that fraction you search at varying depths: some moves you search deeper, some shallower, due to LMR.
But the same moves will get reduced if your code has been carefully checked.

The fraction of moves that you search is history dependent, and history is dynamic: it changes as the search goes on.

IOW, suppose the stored hash value came from a previous search where, due to history pruning, you didn't try a given winning move; after a while that winning move is perhaps discovered and its history score improves.

When you re-search the position with a null-move search, that winning move now has a good history score and is no longer pruned away, so the null search tries it, the move does its job, and the search returns a fail high.
This assumes one uses "history". I do not. Testing has shown it to be completely worthless. It might not hurt, but it never helps either.

I see two ways this can happen through history and LMR:

1) Inaccurate history causes a good move to be pruned.

2) Inaccurate history causes a good move to be ordered near the end of the list, where LMR is in action. The move is reduced, and the reduced depth is not enough to see that it is a winning move.


History, when combined with LMR, is dynamic and can vary as the search goes on, so what is written in the hash table is not exactly set in stone.
There is a moral to that, of course. Do _not_ use history for pruning decisions. It doesn't work.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: LMR and null move selectivity

Post by bob »

Tord Romstad wrote:
bob wrote:It is one of those things that is obviously correct,
It is only "obviously correct" if you don't use the null move search for anything besides pruning. Without the threat move returned by the null move search, it is harder to forward prune safely.
I don't agree. I have done a _ton_ of testing using this so-called "threat move" to control pruning. It really doesn't work. It might do fine on test positions, but when you test with real games, and a lot of them, I could not find anything that worked. I even tried your code directly, porting it to Crafty. I tried eliminating your "threat move" code completely and testing on our cluster. The results didn't change one bit, with or without it.

Another factor worth noting is that in the cases where this trick can be applied, the null move search will normally be even cheaper than usual, because most branches of the subtree will be pruned directly by transposition table lookups. I think a few extra nodes searched is a fair price to pay for getting a good threat move.

But of course, intuition isn't a very reliable tool in this case, and the only way to know is to test.
If I had found any positive benefit from this "threat move" then you might be right. But the best move returned by the null-move search is, IMHO, pretty worthless. Of what benefit is it to do anything with a move that is played after your opponent "passes"? It is an idea that sounds reasonable, but does not appear to do anything useful (or harmful either) in the testing I have done... After one "passes", the opponent almost always has a "threat". That is why null-move works. If there is no threat, and null fails low, we search normally. When the null move fails high, calling the move that produces this a "threat" is a misnomer. In fact, if you look at the trees, for most fail highs I am already so far ahead in material that even if I "pass" you can't win enough back to make this relevant. In most "fail lows" I simply lose material because I don't react to your previous move, not because you have some sort of threat that needs attention. I simply could not find any variation of this idea that worked, with one very old exception: the "threat detection" Chrilly described about 15 years ago or so. And I found that too expensive to be practical, although I saved the code and retest it from time to time out of curiosity.
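For readers following along, a bare-bones null-move sketch in C shows where the "threat move" under discussion comes from: it is the refutation found when the null search fails low. The types and helper functions (POSITION, make_null_move, hash_best_move, and so on) are placeholders, not the actual API of Crafty or Glaurung.

Code: Select all

typedef struct position POSITION;    /* engine position; details omitted */
typedef unsigned int MOVE;

/* Helpers assumed to exist in the host engine (hypothetical names). */
int  in_check(const POSITION *p);
int  has_non_pawn_material(const POSITION *p);
void make_null_move(POSITION *p);
void unmake_null_move(POSITION *p);
int  search(POSITION *p, int depth, int alpha, int beta);
MOVE hash_best_move(const POSITION *p);

/* Returns 1 if a null-move search fails high (caller can prune).
 * On a fail low, *threat receives the opponent's refutation, the move
 * some programs use to veto reductions/pruning of the real moves. */
int null_move_step(POSITION *p, int depth, int beta, MOVE *threat) {
  const int R = 3;                   /* typical null-move reduction */
  int score;

  *threat = 0;
  if (depth <= R + 1 || in_check(p) || !has_non_pawn_material(p))
    return 0;                        /* don't try a null move here */

  make_null_move(p);
  score = -search(p, depth - R - 1, -beta, -beta + 1);
  if (score < beta)                  /* failed low: remember the refutation */
    *threat = hash_best_move(p);     /* best reply in the null-move position */
  unmake_null_move(p);

  return score >= beta;              /* fail high => caller can prune */
}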
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: LMR and null move selectivity

Post by Don »

bob wrote:I really never measured this, although I suppose I could run a test on the cluster with and without to see if there is a significant difference. I'll start that running right now, just for giggles. It is one of those things that is obviously correct, and it does happen in the search in real games, and the cost is essentially zero. But as to the actual improvement, I will post the answer to this tonight.
Did you run the test? What was the result?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: LMR and null move selectivity

Post by bob »

Don wrote:
bob wrote:I really never measured this, although I suppose I could run a test on the cluster with and without to see if there is a significant difference. I'll start that running right now, just for giggles. It is one of those things that is obviously correct, and it does happen in the search in real games, and the cost is essentially zero. But as to the actual improvement, I will post the answer to this tonight.
Did you run the test? What was the result?
Posted in this thread, subject ends with "data"...

This seems to be worth winning an extra game every 100 games, or maybe 3-4 Elo max...
frankp
Posts: 228
Joined: Sun Mar 12, 2006 3:11 pm

Re: LMR and null move selectivity -- data

Post by frankp »

bob wrote:Here are the results. I had previously run a test for two different value parameters, so I simply re-ran them, exactly the same except that the null-move test was tossed out. 23.1R11 is the code without the avoid_null code, 23.1R10 has it in...

Code: Select all

Name                  Elo    +    - games score oppo. draws
Crafty-23.1R10-50    2601    4    4 40000   48%  2618   22%
Crafty-23.1R10-75    2600    5    4 40000   48%  2618   22% 
Crafty-23.1R11-50    2598    4    4 40000   47%  2618   22% 
Crafty-23.1R11-75    2595    4    4 40000   47%  2618   22% 
So it is worth a couple of Elo, maybe. Based on the scoring percentage, removing it did seem to cost about one extra game out of every 100 played.

All played using the same starting positions, same opponents. PGNs were combined into one file to produce the above.
Bob

Did you manage to automate 'bayeselo' to produce the above table, or are you just patient at typing and remembering the commands?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: LMR and null move selectivity -- data

Post by bob »

frankp wrote:
bob wrote:Here are the results. I had previously run a test for two different value parameters, so I simply re-ran them, exactly the same except that the null-move test was tossed out. 23.1R11 is the code without the avoid_null code, 23.1R10 has it in...

Code: Select all

Name                  Elo    +    - games score oppo. draws
Crafty-23.1R10-50    2601    4    4 40000   48%  2618   22%
Crafty-23.1R10-75    2600    5    4 40000   48%  2618   22% 
Crafty-23.1R11-50    2598    4    4 40000   47%  2618   22% 
Crafty-23.1R11-75    2595    4    4 40000   47%  2618   22% 
So it is worth a couple of Elo, maybe. Based on the scoring percentage, removing it did seem to cost about one extra game out of every 100 played.

All played using the same starting positions, same opponents. PGNs were combined into one file to produce the above.
Bob

Did you manage to automate 'bayeselo' to produce the above table, or are you just patient at typing and remembering the commands?
When I run a test, once it finishes, I combine the PGN from that test with the PGN from previous tests (where they contain versions I am comparing against) and then run the entire mess through bayeselo and append this to a results file automatically. All I need to do to run a test is:

(1) make a different version for each different idea; or

(2) if this is a single idea with a parameter I want to tune, I have code in Crafty (the "scale n" command) that will take any integer value and stuff it into a global variable "scale", which is used in the code I am testing.

(3) after doing one (or both) of the above, I have a shell script that I edit where I can change the time control if I wish (usually I do not), and then I simply tell it which versions to test and, if I am using the "scale" parameter, what values of "scale" to try for each version. The script prepares the cluster scripts that actually play the games, and then fires off instances of my referee for each node used, which takes those scripts and actually plays the indicated opponents using the indicated starting positions. When the match ends, the name tag in each PGN for that match is changed to "Crafty-xx.xRxx-scale", where xx.x is the normal version number and the xx in Rxx is a unique number indicating my test version (I also have Txx for versions from Tracy, etc.). When I finish I have an enormous PGN collection containing all versions, which gets pushed through bayeselo, and the result is added to a master results file. I also get directories of the form Crafty-xx.xRxx-scale-match which hold just the PGN from that specific version/sub-version/scale in case I want to look at them individually later.

As a general rule, all I change in the shell script is the version number, and occasionally add some "scale numbers" to tune a single parameter. Then I submit that script and watch as the thing plays something like 256 games at a time on the small cluster, or 540 games at a time on the big one.
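As an aside, the "scale n" mechanism described in (2) amounts to little more than parsing an integer into a global that the experimental code reads. A toy version (not Crafty's actual command parser) might look like this:

Code: Select all

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Global tuning knob read by whatever experimental code is being tested. */
int scale = 100;

/* Minimal "scale <n>" command handler: one binary can then be run with
 * many parameter values straight from the test scripts. */
int handle_scale_command(const char *line) {
  if (strncmp(line, "scale ", 6) == 0) {
    scale = atoi(line + 6);
    printf("scale set to %d\n", scale);
    return 1;
  }
  return 0;                          /* not the scale command */
}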

edit:

Almost forgot: I have a "stock" set of commands I pass to bayeselo. Before I execute bayeselo I create a file "master.pgn" which has all the PGN results I want to compare. Bayeselo reads that big file and spits out the results as you saw them above. I don't do anything special to get this; it is automatically produced (and updated) after each version completes a test run.
Last edited by bob on Tue Sep 01, 2009 9:16 pm, edited 1 time in total.
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: LMR and null move selectivity

Post by michiguel »

bob wrote:
Don wrote:
bob wrote:
xcomponent wrote:
Doesn't seem to hurt. But doesn't help either.
That's true when it comes down to LMR. For null move I would say that there is a noticeable difference. I mean "if(!pvnode)" for null move definitely hurts. I'm talking about my own tests, of course. Maybe someone would disagree.
Same for me. _Any_ restriction on where null-move is tried, with the exception of the transposition table trick, hurts performance in my cluster testing.
What is the transposition table trick?

I am now NOT reducing any move after a null move. This is mostly due to superstition. In testing it comes out 1 or 2 Elo stronger, but with a margin of error much greater than that (about 5), so it's not clear whether it's any real benefit or not.
This was first mentioned by someone I do not remember. The idea is this...

When you do a hash probe, often the draft is not sufficient and you can use the best move, but not the score/bound. But if you take the current depth and subtract the usual null-move reduction from it, you can check the draft against that limit, which is the depth a null-move search would use here anyway. If the draft is sufficient for that, but the probe result says the score or bound will _not_ fail high here, then we have just proved that a null-move search would not fail high here either. I return a flag "avoid_null = 1" to Search(), which then refuses to try the null move since we know it would be wasted.

If that isn't clear, let me know and I will do a more formal explanation.
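In rough C, the check amounts to something like the following; the entry layout, the names, and the exact null-move depth formula are illustrative, not Crafty's actual source:

Code: Select all

/* Minimal transposition-table entry; real entries pack this much tighter. */
typedef struct {
  int draft;                   /* remaining depth of the stored search     */
  int bound;                   /* BOUND_EXACT, BOUND_LOWER or BOUND_UPPER  */
  int value;
} TT_ENTRY;

enum { BOUND_EXACT, BOUND_LOWER, BOUND_UPPER };

/* Returns 1 ("avoid_null") when the table entry proves a null-move search
 * cannot fail high at this node.  depth is the current remaining depth and
 * R the null-move reduction, so depth - R - 1 is the depth the null-move
 * search would be run at.  If the entry's draft covers that depth and it
 * bounds the real score below beta, the null search is wasted effort. */
int avoid_null(const TT_ENTRY *tt, int depth, int R, int beta) {
  if (tt == NULL)
    return 0;
  return tt->draft >= depth - R - 1 &&
         tt->bound != BOUND_LOWER &&   /* entry caps the score from above */
         tt->value < beta;
}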
If I understand correctly, I have been doing exactly that since I added null move to Gaviota (~8 years ago)! I always thought everybody was doing something similar, since it is very simple. Of course, I never tested whether it was better or worse; I assumed that in the worst-case scenario it was neutral.

Miguel