Comparative nodes per second


bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Comparative nodes per second

Post by bob »

lkaufman wrote:
bob wrote:
lkaufman wrote:
jdart wrote:
lkaufman wrote: 1. How does Ivanhoe achieve such a huge lead in NPS over Stockfish?
2. What is Ivanhoe giving up to achieve this speed? It can't just be simpler eval, as the speed gap is too large. But Ivanhoe would be totally crushed by Stockfish on an equal-nodes basis. Why?
a few comments:

1. As you probably know, NPS across programs is not highly correlated with playing strength. A very simple program can have a very high NPS and not be a good chess player. To take another example, Crafty has quite a high NPS relative to other programs, but is not nearly equal to IvanHoe in strength.

2. Chess programs are complex beasts, and all the pieces typically interact. So for example programs with different board representations will behave quite differently. Also pruning and evaluation decisions affect NPS quite a bit. Many programs do eval at every node because it is used in pruning decisions. If you don't do this, or do lazy eval, you will be faster, but not necessarily stronger.

3. On the x86 platform, cache and memory usage has a very large effect on speed, especially for multithreaded programs. A program can be optimized at a local level by the compiler but optimizing memory usage usually has to be done by design.

4. I haven't done this, but running the programs you are comparing under a good profiler (such as VTune) will answer some of your questions about the causes of the speed differences.

--Jon
Perhaps we'll try the profiler, but it won't answer the basic questions:
1. Is the reported NPS gap real or artificial?
2. If real, what accounts for SF being so much stronger than Ivanhoe on an equal nodes (rather than time) basis?

If the profiler shows much higher % of time spent in eval in SF, that could help explain #2. But if it shows no major differences in percentages, then what?
NPS is simply total nodes searched divided by the time taken to search them. Or, put another way, it is the inverse of the time taken to search a single typical node. There are lots of things one can do to change NPS. For example, use eval at all interior nodes. I typically see the q-search taking 80-90% of the total nodes searched, so adding eval to the interior nodes will slow the search down by about 10%. What if that gives you better move ordering, so that reaching a given depth now takes 25% fewer nodes? You are 10% slower in NPS, but you reach the same depth sooner overall. A net win.

Comparing NPS is not particularly interesting, any more than comparing the RPM readings on two different vehicles tells you anything at all about how fast they are going relative to each other, how much fuel they are consuming, or how much noise they are making.
Don't you have to do eval on the last four main-search plies in order to do futility pruning? If so, then not doing eval at interior nodes should only save about 1%. Anyway, the two programs being compared both do eval at interior nodes, as this is needed for pruning decisions.
I agree that comparing NPS in general is rather pointless, but these are open source programs, so in theory the differences should be fully explainable. I am pointing to it because the program that does more NPS is much stronger at hyperspeed chess but not at regular chess. The question is whether the first observation explains the second.
Crafty is too unlike any of the top programs for comparisons to be relevant. One big difference is movecount pruning, which you correctly credited all the way back to Greenblatt in MacHack but which for some reason you don't use in Crafty. It took forty years to recognize that he was right!
I don't think what Greenblatt described was anywhere near what we are talking about here. Greenblatt used static criteria to toss out (forward prune) moves, even at the root. He was not using dynamic search information, such as history counters, to order the moves on the fly and throw away the last few...
bob

Re: Comparative nodes per second (early results)

Post by bob »

Code:

   2 Crafty-23.5R22-1     2662    4    4 30000   64%  2548   22% 
   3 Crafty-23.5R22-2     2661    4    4 30000   64%  2548   22% 
   7 Crafty-23.5R20-2     2658    4    4 30000   63%  2548   22% 
   8 Crafty-23.5-2        2657    4    4 30000   63%  2548   22% 
  10 Crafty-23.4-2        2641    4    4 30000   62%  2548   21% 
  11 Crafty-23.4-1        2639    4    4 30000   62%  2548   21% 
  12 Crafty-23.5R23       2636    7    7  7406   61%  2548   21% 
23.5R22 is the best 23.5 to date. 23.5R23 is the same code but with the lazy-eval code disabled. Early results show a -26 Elo change; I will report final results a bit later...
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Comparative nodes per second

Post by lkaufman »

bob wrote:
lkaufman wrote:
jdart wrote:
Don't you have to do eval on the last four main-search plies in order to do futility pruning?
IIRC Crafty does futility based on material score only.

--Jon
What about for deciding whether to try null move or not? I don't see how Crafty could avoid calling the eval in the main search.
I always do null-move. I've tried all the alternatives and the only exception is that I use the hash table trick where entries with a draft that is too shallow can't be used to terminate the search, but they might have draft sufficient to tell me that a null-move search is hopeless.
Why would you want to waste time on null move if you are (let's say) a piece behind beta?
bob

Re: Comparative nodes per second

Post by bob »

Don wrote:
jdart wrote:
What about for deciding whether to try null move or not? I don't see how Crafty could avoid calling the eval in the main search.
No eval test is done in Crafty as a condition for null move. Arasan does not do this either. The efficacy of this is questionable.

--Jon
For us there is no question. We have tried many permutations of the general idea from always doing the null move test regardless of the evaluation to restricting it to certain node types, etc.

What has always worked best is to do the null-move pruning test only when the current evaluation is already >= beta. There are a lot of details in each program that could change this formula, of course - but for us it is not a close enough call to even be ambiguous.

What does Arasan do? Do you always do the null move test?
I decided to give this a cluster test, since that is one option I do not recall having tried in real games. Initially, on a few positional tests, it actually looks like a reasonable idea. (I just added && eval >= beta to the end of the rest of the null-move qualifiers, such as not being in check and the hash table not saying null is futile, so the eval call is not done at every node, just the ones that pass the earlier filters.) It reduced the size of the tree and slowed the NPS, but with a net gain on just a few test positions. I am now running a cluster test to see if it is a real gain in terms of Elo...

More in a couple of hours...
lkaufman

Re: Comparative nodes per second

Post by lkaufman »

bob wrote:
I don't think what Greenblatt described was anywhere near what we are talking about here. Greenblatt used static criteria to toss out (forward prune) moves, even at the root. He was not using dynamic search information, such as history counters, to order the moves on the fly and throw away the last few...
Well, he looked at a constant number of moves on the last ply, a bigger number on the previous one, etc., right? That is exactly what all the top programs do (with scoring conditions in some programs). Of course the criteria for selecting those moves are different now than forty-five years ago, but the principle is the same. If Greenblatt did this at the root, it was only because the computers then couldn't reach enough depth to justify going full width at the root. I don't think there is any real difference; Greenblatt deserves the credit for the idea.
bob

Re: Comparative nodes per second

Post by bob »

lkaufman wrote:
Well, he looked at a constant number of moves on the last ply, a bigger number on the previous one, etc., right? That is exactly what all the top programs do (with scoring conditions in some programs). Of course the criteria for selecting those moves are different now than forty-five years ago, but the principle is the same. If Greenblatt did this at the root, it was only because the computers then couldn't reach enough depth to justify going full width at the root. I don't think there is any real difference; Greenblatt deserves the credit for the idea.
What he did in tournaments was a tapered search: 15 moves at the root, ordered by plausibility, with the rest dumped and never searched. The next two plies got 9 moves, and the last two got 7, in the version of his code I once had and played around with on an old DEC PDP-10. No dynamic information at all. At the root, generate all moves, order them by the plausibility scores, then chop off all but the first 15. If you use history with LMR or LMP, the search is providing feedback as the iterations increase, to help recognize which moves are least likely to be useful versus most likely to help...

There's a huge leap from static forward pruning to dynamic forward pruning, where the search guides the pruning as it unfolds... It is much easier and more accurate to do an iteration and then, on the next iteration, use information from the last one to help choose what to discard, rather than doing it with absolutely no searching of any kind, just pure static criteria.
bob

Re: Comparative nodes per second

Post by bob »

bob wrote:
I decided to give this a cluster test since that is one option I do not recall having tried in real games. Initially, with a few positional tests, it actually looks like a reasonable idea (I just added && eval >= beta to the end of the rest of the null-move qualifiers such as not being in check, the hash table didn't say null is futile, etc..., so that the eval call is not done on every node, just the ones that pass the earlier filters). Reduced the size of the tree, and slowed the nps. But with a net gain on just a few test positions. Am now running a cluster test to see if it is a real gain in terms of Elo...

More in a couple of hours...
Early results are so-so: +2 Elo after 15,000 games, with a +/-5 error bar. I suspect it is one of those "no improvement" ideas for me at present, but I will let the test run twice to confirm...
dchoman
Posts: 171
Joined: Wed Dec 28, 2011 8:44 pm
Location: United States

Re: Comparative nodes per second

Post by dchoman »

bob wrote:
I decided to give this a cluster test since that is one option I do not recall having tried in real games. Initially, with a few positional tests, it actually looks like a reasonable idea (I just added && eval >= beta to the end of the rest of the null-move qualifiers such as not being in check, the hash table didn't say null is futile, etc..., so that the eval call is not done on every node, just the ones that pass the earlier filters). Reduced the size of the tree, and slowed the nps. But with a net gain on just a few test positions. Am now running a cluster test to see if it is a real gain in terms of Elo...

More in a couple of hours...
I tried this in EXchess a few weeks ago and it was a small net win. One thing I realized is that it should not slow down the program very much, because I would need to do an eval anyway if the null move drops straight into qsearch. So if eval >= beta and I do the null move, I keep the score and just use it (with the sign flipped) when qsearch is called, rather than repeating the eval.

- Dan
bob

Re: Comparative nodes per second

Post by bob »

dchoman wrote:
I tried this in EXchess a few weeks ago and it was a small net win. One thing I realized is that it should not slow down the program very much because I would need to do an eval anyway if the null move drops straight into qsearch. So if the eval >= beta, and I do the null move, I keep the score and just use it (with the sign flipped) when qsearch is called rather than repeat the eval.

- Dan
The first run completed and showed a 0 Elo improvement, but with a +/-4 error bar that is not very conclusive. Running another 30K games to refine the accuracy somewhat...

If it sticks at 0 gain, I am done. If it shows some promise, I might try to optimize a bit. But I have tried so many things with null-move in the past that this was almost certainly one of them, and there is generally a reason something like that is not in my code: it simply didn't work for me. I am going to look at this a bit further to see if there are any other potential things to try, again...
lkaufman

Re: Comparative nodes per second

Post by lkaufman »

bob wrote:
First run completed and showed "0 elo improvement" but one takes that with the +/-4 error bar to be not so conclusive. Running another 30K games to refine the accuracy somewhat...

If it sticks at 0 gain, I am done. If it shows some promise, I might try to optimize a bit. But I have tried so many things with null-move in the past, this was almost certainly one of them, and there's generally a reason something is not in my code like that, in that it simply didn't work for me. I am going to look at this a bit further to see if there are any other potential things to try, again..
Probably, for you, the cost of calling the eval offsets the benefit. For us (and all the top engines) the eval must be called for various other prunings anyway, so the benefit comes for free.