Comparative nodes per second

Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Comparative nodes per second

Post by Uri Blass »

lkaufman wrote:
Uri Blass wrote:
lkaufman wrote:
bob wrote:
lkaufman wrote:
bob wrote:
lkaufman wrote:
jdart wrote:
Don't you have to do eval on the last four mainsearch plies in order to do futility pruning?
IIRC Crafty does futility based on material score only.

--Jon
What about for deciding whether to try null move or not? I don't see how Crafty could avoid calling the eval in the main search.
I always do null-move. I've tried all the alternatives and the only exception is that I use the hash table trick where entries with a draft that is too shallow can't be used to terminate the search, but they might have draft sufficient to tell me that a null-move search is hopeless.
Why would you want to waste time on null move if you are (let's say) a piece behind beta?
You can still have some unstoppable threat that the opponent can't fix even with two moves in a row... It is not really what I "want to do" but what has actually tested to be stronger in my program.

I do a null-move search only when all of the following conditions hold:

1. the previous ply did not try a null move;

2. a hash-table hit did not prove that a null-move search would likely fail low and be worthless;

3. this is a null-window search (i.e. not the first move along a PV path, although once the PV-node window collapses to zero, null is done);

4. the side on move is not in check;

5. the moving side has at least one piece left on the board.
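In C, the five conditions above might be combined like this (a minimal sketch; the struct and field names are illustrative, not Crafty's actual identifiers):

#include <stdbool.h>

/* Hypothetical per-node state; field names are illustrative only. */
typedef struct {
    bool prev_ply_tried_null;  /* condition 1 */
    bool hash_says_null_fails; /* condition 2 */
    bool null_window;          /* condition 3: zero-width window */
    bool in_check;             /* condition 4 */
    int  pieces_left;          /* condition 5: pieces (not pawns) for the side to move */
} Node;

/* The null-move search is tried only when all five conditions hold. */
static bool try_null_move(const Node *n) {
    return !n->prev_ply_tried_null  /* 1 */
        && !n->hash_says_null_fails /* 2 */
        && n->null_window           /* 3 */
        && !n->in_check             /* 4 */
        && n->pieces_left >= 1;     /* 5 */
}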
Have you actually tested "don't try null move if behind in material"? I would expect that to be better than what you are doing now.
I am not sure if I understand you correctly.

Do you suggest simply not doing null move for the side that has less material, regardless of beta?

I think that if some condition like this, one that does not depend on beta, works for Crafty, then replacing material-only with lazy evaluation should work even better.
No, I still meant to compare to beta. I agree that lazy eval should work better if it is nearly free, but my main point is that whether you use material only or lazy eval, you should do null move unless the score is below beta by some serious margin, maybe half a pawn. In other words, you want to do null move unless you are maybe 90% confident that the true current score is below beta.
It seems to me that your reply suggests something different.

From your earlier post:

"Using the crude numbers means that in general you would do null move unless a side is down a full pawn, which usually means his true eval is negative."

Avoiding null move only when a side is down a full pawn means that you do not compare to beta; otherwise, with equal material and beta at 0.1 pawns, you would not do null-move pruning.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Comparative nodes per second

Post by lkaufman »

Uri Blass wrote: It seems to me that your reply suggests something different.

From your earlier post:

"Using the crude numbers means that in general you would do null move unless a side is down a full pawn, which usually means his true eval is negative."

Avoiding null move only when a side is down a full pawn means that you do not compare to beta; otherwise, with equal material and beta at 0.1 pawns, you would not do null-move pruning.
Yes, you are right, my earlier post only makes sense if beta is also rounded to the nearest pawn. It's much better to use a margin.
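As a sketch in C, with the half-pawn figure from above expressed in centipawns (the margin value and the centipawn units are assumptions, not tested numbers):

/* Allow the null-move try unless the static (or lazy) eval is below
   beta by more than the margin; 50 = ~half a pawn in centipawns. */
#define NULL_MARGIN 50

static int allow_null(int eval, int beta) {
    return eval >= beta - NULL_MARGIN;
}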
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Comparative nodes per second

Post by bob »

Uri Blass wrote:
bob wrote:
dchoman wrote:
bob wrote:
Don wrote:
jdart wrote:
What about for deciding whether to try null move or not? I don't see how Crafty could avoid calling the eval in the main search.
No eval test is done in Crafty as a condition for null move. Arasan does not do this either. The efficacy of this is questionable.

--Jon
For us there is no question. We have tried many permutations of the general idea, from always doing the null move test regardless of the evaluation to restricting it to certain node types, etc.

What has always worked best is to do the null move pruning test only when the current evaluation is already >= beta. There are a lot of details in each program that could change this formula, of course, but for us it's not a close enough call to even be ambiguous.

What does Arasan do? Do you always do the null move test?
I decided to give this a cluster test, since that is one option I do not recall having tried in real games. Initially, with a few positional tests, it actually looks like a reasonable idea (I just added && eval >= beta to the end of the other null-move qualifiers, such as not being in check and the hash table not saying null is futile, so that the eval call is not done on every node, just on the ones that pass the earlier filters). It reduced the size of the tree and slowed the NPS, but with a net gain on just a few test positions. I am now running a cluster test to see if it is a real gain in terms of Elo...

More in a couple of hours...
I tried this in EXchess a few weeks ago and it was a small net win. One thing I realized is that it should not slow down the program very much because I would need to do an eval anyway if the null move drops straight into qsearch. So if the eval >= beta, and I do the null move, I keep the score and just use it (with the sign flipped) when qsearch is called rather than repeat the eval.

- Dan
The first run completed and showed "0 elo improvement", but with the +/-4 error bar that is not conclusive. Running another 30K games to refine the accuracy somewhat...

If it sticks at 0 gain, I am done. If it shows some promise, I might try to optimize a bit. But I have tried so many things with null-move in the past that this was almost certainly one of them, and there's generally a reason something like that is not in my code: it simply didn't work for me. I am going to look at this a bit further to see if there are any other potential things to try, again...
I think that 0 elo shows some promise, because it is possible that with some changes in the condition it becomes more than 0 elo.

The problem is that calling evaluation is expensive for you, and there are many cases when eval >= beta, so you do not save nodes in the search but you do lose speed. You may try replacing eval >= beta with (lazy_eval >= beta || eval >= beta), because checking this condition is faster in all the cases where lazy_eval >= beta.
I tested that also. And just a simple material test as well, the cheapest eval of all. Zero improvement. I am waiting on a machine to get re-booted and I will try to post the actual data. Just can't get to it right now...

If I recall correctly, just checking the material score actually hurt the elo by something like -20. The lazy eval test hurt by something like -7, and the full eval was a pure wash, no change...

I did leave my other constraints in place of course...
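A sketch of the short-circuit Uri suggests above, with Dan's reuse observation folded in as a comment; Position, lazy_eval() and full_eval() are hypothetical stand-ins, not actual Crafty or EXchess code:

typedef struct Position Position;

extern int lazy_eval(const Position *pos); /* cheap: material + PST, say */
extern int full_eval(const Position *pos); /* the full, expensive evaluation */

/* The || short-circuits, so full_eval() runs only when the cheap lazy
   score fails to clear beta. Per Dan's note above, a full eval computed
   here can be kept and reused (sign flipped) if the null move drops
   straight into qsearch. */
static int eval_clears_beta(const Position *pos, int beta) {
    return lazy_eval(pos) >= beta || full_eval(pos) >= beta;
}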
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Comparative nodes per second

Post by Don »

bob wrote: I tested that also. And just a simple material test as well, the cheapest eval of all. Zero improvement. I am waiting on a machine to get re-booted and I will try to post the actual data. Just can't get to it right now...

If I recall correctly, just checking the material score actually hurt the elo by something like -20. The lazy eval test hurt by something like -7, and the full eval was a pure wash, no change...

I did leave my other constraints in place of course...
What is pretty interesting to me is how any given idea can work well in one program and not in another. We have experimented many times with this idea, and in every case null pruning works best when we call the evaluation function and only do it when the score is >= beta. We have even tried using margins, so that we do null move pruning when the score is (for example) within half a pawn of beta. We think it should definitely help, but it doesn't. The reason we think it should help is that you could have very powerful threats that survive the null move and that the opponent cannot answer.

We usually revisit almost everything sooner or later, so I have no doubt we will eventually try again to see what happens!
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Comparative nodes per second

Post by lkaufman »

bob wrote: If I recall correctly, just checking the material score actually hurt the elo by something like -20. The lazy eval test hurt by something like -7, and the full eval was a pure wash, no change...

I did leave my other constraints in place of course...
Of course using just material score or even lazy score will hurt if you don't use a margin, because there is a high probability that the static score is above beta if the lazy score is just slightly below beta. Or do you mean that you used lazy eval with the lazy margin subtracted from beta for the comparison? That would be the right way to do it. I'm pretty sure that the reason full eval didn't work for you is just the overhead of the eval, which doesn't apply to us as we need the score anyway for other uses. If you got a zero result despite the cost, then obviously the idea would help if there were no cost.
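In code, the comparison Larry describes would look something like this (lazy_margin stands for whatever error bound the lazy eval guarantees; the name is illustrative):

/* A lazy score may understate the full eval by up to lazy_margin, so
   subtract it from beta: only a lazy score below even this reduced
   bound can safely veto the null move. */
static int lazy_allows_null(int lazy_score, int beta, int lazy_margin) {
    return lazy_score >= beta - lazy_margin;
}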
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Comparative nodes per second

Post by bob »

Don wrote:
bob wrote: I tested that also. And just a simple material test as well, the cheapest eval of all. Zero improvement. I am waiting on a machine to get re-booted and I will try to post the actual data. Just can't get to it right now...

If I recall correctly, just checking the material score actually hurt the elo by something like -20. The lazy eval test hurt by something like -7, and the full eval was a pure wash, no change...

I did leave my other constraints in place of course...
What is pretty interesting to me is how any given idea can work well in one program and not in another. We have experimented many times with this idea, and in every case null pruning works best when we call the evaluation function and only do it when the score is >= beta. We have even tried using margins, so that we do null move pruning when the score is (for example) within half a pawn of beta. We think it should definitely help, but it doesn't. The reason we think it should help is that you could have very powerful threats that survive the null move and that the opponent cannot answer.

We usually revisit almost everything sooner or later, so I have no doubt we will eventually try again to see what happens!
Here are some numbers:

Normal Crafty 23.5: 2655
avoiding null unless eval >= beta (+ standard rules): 2655
avoiding null unless fast_eval >= beta (+ standard rules): 2649
avoiding null unless material >= beta (+ standard rules): 2640

It is interesting that the full-eval version is equal. The tree averages a smaller size, but the NPS drops as well, apparently exactly offsetting the size savings with a slower search.

I will try a slower test to see if slower games make any difference, but this will take a couple of days... Just queued up a 60s+1s 30K game match for normal crafty, full eval > beta and material > beta... This might take a couple of days as the cluster is currently half-powered-down leaving just 64 nodes up...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Comparative nodes per second

Post by bob »

lkaufman wrote:
bob wrote: If I recall correctly, just checking the material score actually hurt the elo by something like -20. The lazy eval test hurt by something like -7, and the full eval was a pure wash, no change...

I did leave my other constraints in place of course...
Of course using just material score or even lazy score will hurt if you don't use a margin, because there is a high probability that the static score is above beta if the lazy score is just slightly below beta. Or do you mean that you used lazy eval with the lazy margin subtracted from beta for the comparison? That would be the right way to do it. I'm pretty sure that the reason full eval didn't work for you is just the overhead of the eval, which doesn't apply to us as we need the score anyway for other uses. If you got a zero result despite the cost, then obviously the idea would help if there were no cost.
Yes to the margin idea. But these were pretty fast games. I tried 4 different margins and reported the results for the best one.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Comparative nodes per second

Post by bob »

lkaufman wrote:
bob wrote: Have you actually tested "don't try null move if behind in material"? I would expect that to be better than what you are doing now.
Did you read my testing post? I actually tested "if eval < beta, don't do a null move". I also tried a "fast eval", and material only. It made absolutely no difference. I'll dig up the results and post when a server at the office gets rebooted so I can get in...
I didn't realize you had tested material only. That should avoid eval calls and so the benefit should be free. Do you use precise material values for this test or just 1-3-3-5-9? In this case the crude numbers should be best, because if you use precise values the side slightly behind in material (say bishop for knight for example) may actually be ahead in true eval. Using the crude numbers means that in general you would do null move unless a side is down a full pawn, which usually means his true eval is negative. If you didn't try this, I recommend doing so. I can hardly see how this could fail to help at least slightly.

Not quite sure what you mean by "precise values." What I used was simply the incrementally maintained material score, with things like pawn=100, knight=325, rook=500 and queen=1050, our cluster-optimized material values. Nothing else is added. For the "fast eval" test I added in PST values and a few other things, but did not do the major piece loops where most of the time is spent in eval...
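Those values, written out as C constants, together with the material-only gate that was tested (the bishop value is not given in the post, so the 325 below is an assumption):

/* Cluster-optimized values quoted above, in centipawns; BISHOP = 325
   is an assumption, since the post does not state it. */
enum { PAWN = 100, KNIGHT = 325, BISHOP = 325, ROOK = 500, QUEEN = 1050 };

/* The material-only gate: material is the incrementally maintained
   balance from the side to move's point of view. */
static int material_allows_null(int material, int beta) {
    return material >= beta;
}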
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Comparative nodes per second

Post by lkaufman »

bob wrote:
lkaufman wrote:
bob wrote: If I recall correctly, just checking the material score actually hurt the elo by something like -20. The lazy eval test hurt by something like -7, and the full eval was a pure wash, no change...

I did leave my other constraints in place of course...
Of course using just material score or even lazy score will hurt if you don't use a margin, because there is a high probability that the static score is above beta if the lazy score is just slightly below beta. Or do you mean that you used lazy eval with the lazy margin subtracted from beta for the comparison? That would be the right way to do it. I'm pretty sure that the reason full eval didn't work for you is just the overhead of the eval, which doesn't apply to us as we need the score anyway for other uses. If you got a zero result despite the cost, then obviously the idea would help if there were no cost.
Yes to the margin idea. But these were pretty fast games. I tried 4 different margins and reported the results for the best one.
Just out of curiosity, what were the best margins for the two cases (lazy and material)?

Since you found that the cost of doing the eval exactly balanced out the benefit, this suggests that you should try using this restriction only when depth is not too low, perhaps above your 4 plies of futility. The cost becomes insignificant if you are far enough from the leaves, but I don't think the same holds for the benefit.
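A sketch of that depth-limited variant; FUTILITY_DEPTH = 4 follows the "4 plies of futility" mentioned above, and full_eval() is a hypothetical stand-in:

typedef struct Position Position;
extern int full_eval(const Position *pos);

#define FUTILITY_DEPTH 4 /* per the "4 plies of futility" above */

/* Near the leaves the eval call costs too much relative to the benefit,
   so the eval >= beta restriction applies only at higher depths. */
static int depth_limited_gate(const Position *pos, int depth, int beta) {
    if (depth <= FUTILITY_DEPTH)
        return 1;                  /* shallow: skip the eval call entirely */
    return full_eval(pos) >= beta; /* deep: the cost is insignificant here */
}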
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Comparative nodes per second

Post by lkaufman »

bob wrote:
lkaufman wrote:
bob wrote: Have you actually tested "don't try null move if behind in material"? I would expect that to be better than what you are doing now.
Did you read my testing post? I actually tested "if eval < beta, don't do a null move". I also tried a "fast eval", and material only. It made absolutely no difference. I'll dig up the results and post when a server at the office gets rebooted so I can get in...
I didn't realize you had tested material only. That should avoid eval calls and so the benefit should be free. Do you use precise material values for this test or just 1-3-3-5-9? In this case the crude numbers should be best, because if you use precise values the side slightly behind in material (say bishop for knight for example) may actually be ahead in true eval. Using the crude numbers means that in general you would do null move unless a side is down a full pawn, which usually means his true eval is negative. If you didn't try this, I recommend doing so. I can hardly see how this could fail to help at least slightly.
Not quite sure what you mean by "precise values." What I used was simply the incrementally maintained material score, with things like pawn=100, knight=325, rook=500 and queen=1050, our cluster-optimized material values. Nothing else is added. For the "fast eval" test I added in PST values and a few other things, but did not do the major piece loops where most of the time is spent in eval...

I was mainly wondering whether you used identical values for bishop and knight or not. But this won't matter if you used a decent-sized margin. By the way, do you treat the bishop pair as part of the material value?