Comparative nodes per second

bob · Post by **bob** » Thu Apr 12, 2012 3:19 am

lkaufman wrote:
jdart wrote:
2. If real, what accounts for SF being so much stronger than Ivanhoe on an equal nodes (rather than time) basis?
I am not sure this is even a reasonable question. Real games are not conducted with fixed node counts so this test is not really indicative of playing strength. Also pruning and extensions are different in these programs. So for the same number of nodes you are visiting a very different slice of the search tree, and I'd expect results to be different.

--Jon
I am well aware of the effects of these differences, and of course results will differ. However I know from my own experiments that it is impossible to make reasonable changes to pruning and/or extensions in a program like Stockfish, Ivanhoe, or Komodo that would change the results AT FIXED NODES by anywhere near the margin of superiority SF exhibits over Ivanhoe on this basis. I suppose some of the margin of superiority is due to a better (but slower) eval, and perhaps some more is due to a difference in node counting. But the gap is so large that I am looking for some third factor. Something that would account for the huge disparity in the relative strength of the two engines in bullet but not in blitz or longer games. This is the phenomenon I really want to understand.

Just take stockfish and turn off LMR. Then run your fixed node test. The results will be significantly skewed. As jon said, fixed nodes makes sense when playing two versions of your program, but not when playing to different programs against each other. This has been rehashed over and over here. Ditto for fixed depth matches. Or anything other than fixed time matches. And fixed-time only works because that is the actual way real games are played.

lkaufman · Post by **lkaufman** » Thu Apr 12, 2012 3:35 am

bob wrote:
lkaufman wrote:
jdart wrote:
lkaufman wrote: 1. How does Ivanhoe achieve such a huge lead in NPS over Stockfish?
2. What is Ivanhoe giving up to achieve this speed? It can't just be simpler eval, as the speed gap is too large. But Ivanhoe would be totally crushed by Stockfish on an equal-nodes basis. Why?
a few comments:

1. As you probably know, NPS across programs is not highly correlated to playing strength. A very simple program can have very high NPS and not be a a good chessplayer. To take another example, Crafty has quite high NPS relative to other programs, but is not nearly equal to IvanHoe in strength.

2. Chess programs are complex beasts, and all the pieces typically interact. So for example programs with different board representations will behave quite differently. Also pruning and evaluation decisions affect NPS quite a bit. Many programs do eval at every node because it is used in pruning decisions. If you don't do this, or do lazy eval, you will be faster, but not necessarily stronger.

3. On the x86 platform, cache and memory usage has a very large effect on speed, especially for multithreaded programs. A program can be optimized at a local level by the compiler but optimizing memory usage usually has to be done by design.

4. I haven't done this but running the programs you are comparing under a good profiler (such as VTune) will answer some of your questions about causes of speed differences.

--Jon
Perhaps we'll try the profiler, but it won't answer the basic questions:
1. Is the reported NPS gap real or artificial?
2. If real, what accounts for SF being so much stronger than Ivanhoe on an equal nodes (rather than time) basis?

If the profiler shows much higher % of time spent in eval in SF, that could help explain #2. But if it shows no major differences in percentages, then what?
NPS is simply total nodes searched divided by time taken to search all of them. Or, in a simpler term, the time taken to search a single typical node, multiplied by the number of such nodes searched. There are lots of things one can do to change NPS. For example, use eval at all interior nodes. I typically see the q-search taking 80-90% of the total nodes searched, so adding eval to the interior nodes will slow it down by about 10%. What if that gives you better move ordering so that your tree is now 25% deeper on average? You are 10% slower, but getting to the same depth 25% faster. A net win.

Comparing NPS is not a particularly interesting number, any more than comparing the RPM reading on two different vehicles tells you anything at all about how fast they are going relative to each other, how much fuel they are consuming, or how much noise they are making.

Don't you have to do eval on the last four mainsearch plies in order to do futility pruning? If so then not doing eval at interior nodes should only save about 1%. Anyway the two programs being compared both do eval at interior nodes as this is needed for pruning decisions.
I agree that comparing NPS in general is rather pointless, but these are open source programs, so in theory the differences should be fully explainable. I am pointing to it because the program that does more NPS is much stronger at hyperspeed chess but not at regular chess. The question is whether the first observation explains the second.
Crafty is too unlike any of the top programs for comparisons to be relevant. One big difference is movecount pruning, which you correctly credited all the way back to Greenblatt in MacHack but which for some reason you don't use in Crafty. It took forty years to recognize that he was right!

BubbaTough · Post by **BubbaTough** » Thu Apr 12, 2012 4:32 am

lkaufman wrote: Crafty is too unlike any of the top programs for comparisons to be relevant.

So you think Stockfish is more like Igor than it is like Crafty? Is this based on elo, or looking at the code?

-Sam

lkaufman · Post by **lkaufman** » Thu Apr 12, 2012 4:50 am

BubbaTough wrote:
lkaufman wrote: Crafty is too unlike any of the top programs for comparisons to be relevant.
So you think Stockfish is more like Igor than it is like Crafty? Is this based on elo, or looking at the code?

-Sam

The code, of course. I'm only talking about search, not eval. I don't mean to imply that Stockfish bears any close relationship to any of the Ippos (or Rybka), they are very different, as is Komodo from either, especially from the Ippos. But the differences are much smaller than the differences between any of these engines and Crafty or pretty much any other pre-Rybka engine. We are all using the same basic ideas of heavy futility, movecount, and static null pruning, as well as singular extension of hashmoves. The pre-Rybka engines rarely used any of these ideas except perhaps futility pruning (and apparently some static null pruning in Rebel). The details differ greatly between Stockfish and the Ippos, enough so as to make their behavior at different time levels very different. Komodo is generally in between SF and the Ippos in most search issues where parameters must be chosen, and usually sides with Stockfish on issues that can only be done one way or the other.

BubbaTough · Post by **BubbaTough** » Thu Apr 12, 2012 5:10 am

lkaufman wrote:
BubbaTough wrote:
lkaufman wrote: Crafty is too unlike any of the top programs for comparisons to be relevant.
So you think Stockfish is more like Igor than it is like Crafty? Is this based on elo, or looking at the code?

-Sam
The code, of course. I'm only talking about search, not eval. I don't mean to imply that Stockfish bears any close relationship to any of the Ippos (or Rybka), they are very different, as is Komodo from either, especially from the Ippos. But the differences are much smaller than the differences between any of these engines and Crafty or pretty much any other pre-Rybka engine. We are all using the same basic ideas of heavy futility, movecount, and static null pruning, as well as singular extension of hashmoves. The pre-Rybka engines rarely used any of these ideas except perhaps futility pruning (and apparently some static null pruning in Rebel). The details differ greatly between Stockfish and the Ippos, enough so as to make their behavior at different time levels very different. Komodo is generally in between SF and the Ippos in most search issues where parameters must be chosen, and usually sides with Stockfish on issues that can only be done one way or the other.

You could well be right. I think of Stockfish as being more similar to Crafty than Igor, probably because I think it would not take that long for me to convert the crafty search into the stockfish search, while it would take me a long, long time to convert it to Igor search. Perhaps this is just because Igor is pretty confusingly written, and I haven't put much time into trying to figure it out. Perhaps if I studied it like you and Don seem to have I would see the similarities better.

-Sam

lkaufman · Post by **lkaufman** » Thu Apr 12, 2012 5:35 am

BubbaTough wrote:
lkaufman wrote:
BubbaTough wrote:
lkaufman wrote: Crafty is too unlike any of the top programs for comparisons to be relevant.
So you think Stockfish is more like Igor than it is like Crafty? Is this based on elo, or looking at the code?

-Sam
The code, of course. I'm only talking about search, not eval. I don't mean to imply that Stockfish bears any close relationship to any of the Ippos (or Rybka), they are very different, as is Komodo from either, especially from the Ippos. But the differences are much smaller than the differences between any of these engines and Crafty or pretty much any other pre-Rybka engine. We are all using the same basic ideas of heavy futility, movecount, and static null pruning, as well as singular extension of hashmoves. The pre-Rybka engines rarely used any of these ideas except perhaps futility pruning (and apparently some static null pruning in Rebel). The details differ greatly between Stockfish and the Ippos, enough so as to make their behavior at different time levels very different. Komodo is generally in between SF and the Ippos in most search issues where parameters must be chosen, and usually sides with Stockfish on issues that can only be done one way or the other.
You could well be right. I think of Stockfish as being more similar to Crafty than Igor, probably because I think it would not take that long for me to convert the crafty search into the stockfish search, while it would take me a long, long time to convert it to Igor search. Perhaps this is just because Igor is pretty confusingly written, and I haven't put much time into trying to figure it out. Perhaps if I studied it like you and Don seem to have I would see the similarities better.

-Sam

I believe Stockfish has more in common with Crafty than with the Ippos in terms of coding, in which area I am a novice. But I'm speaking about search algorithms, in which area I have some expertise. The Ippos do a lot of things differently from Stockfish, but these are mostly of minor importance compared to the things that they do differently from Crafty.
Another way to say it is that the Elo effect and the effect on the shape of the search tree is much greater for the differences between Stockfish and Crafty than for the differences between Stockfish and the Ippos. Their nearly identical ratings strongly hint at this, but of course they could theoretically be entirely dissimilar and still of equal rating.

jdart · Post by **jdart** » Thu Apr 12, 2012 4:37 pm

Don't you have to do eval on the last four mainsearch plies in order to do futility pruning?

IIRC Crafty does futility based on material score only.

--Jon

lkaufman · Post by **lkaufman** » Thu Apr 12, 2012 5:24 pm

jdart wrote:
Don't you have to do eval on the last four mainsearch plies in order to do futility pruning?
IIRC Crafty does futility based on material score only.

--Jon

What about for deciding whether to try null move or not? I don't see how Crafty could avoid calling the eval in the main search.

jdart · Post by **jdart** » Thu Apr 12, 2012 6:25 pm

What about for deciding whether to try null move or not? I don't see how Crafty could avoid calling the eval in the main search.

No eval test is done in Crafty as a condition for null move. Arasan does not do this either. The efficacy of this is questionable.

--Jon

Don · Post by **Don** » Thu Apr 12, 2012 6:33 pm

jdart wrote:
What about for deciding whether to try null move or not? I don't see how Crafty could avoid calling the eval in the main search.
No eval test is done in Crafty as a condition for null move. Arasan does not do this either. The efficacy of this is questionable.

--Jon

For us there is no question. We have tried many permutations of the general idea from always doing the null move test regardless of the evaluation to restricting it to certain node types, etc.

What has always worked best is to do null move pruning test only when the current evaluation is already >= beta. A lot of details in each program that could change this formula of course - but for us it's not a close enough call to even be ambiguous.

What does Arasan do? Do you always do the null move test?

Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second