Comparative nodes per second

lkaufman · Post by **lkaufman** » Wed Apr 11, 2012 4:28 pm

jdart wrote:
lkaufman wrote: 1. How does Ivanhoe achieve such a huge lead in NPS over Stockfish?
2. What is Ivanhoe giving up to achieve this speed? It can't just be simpler eval, as the speed gap is too large. But Ivanhoe would be totally crushed by Stockfish on an equal-nodes basis. Why?
a few comments:

1. As you probably know, NPS across programs is not highly correlated to playing strength. A very simple program can have very high NPS and not be a a good chessplayer. To take another example, Crafty has quite high NPS relative to other programs, but is not nearly equal to IvanHoe in strength.

2. Chess programs are complex beasts, and all the pieces typically interact. So for example programs with different board representations will behave quite differently. Also pruning and evaluation decisions affect NPS quite a bit. Many programs do eval at every node because it is used in pruning decisions. If you don't do this, or do lazy eval, you will be faster, but not necessarily stronger.

3. On the x86 platform, cache and memory usage has a very large effect on speed, especially for multithreaded programs. A program can be optimized at a local level by the compiler but optimizing memory usage usually has to be done by design.

4. I haven't done this but running the programs you are comparing under a good profiler (such as VTune) will answer some of your questions about causes of speed differences.

--Jon

Perhaps we'll try the profiler, but it won't answer the basic questions:
1. Is the reported NPS gap real or artificial?
2. If real, what accounts for SF being so much stronger than Ivanhoe on an equal nodes (rather than time) basis?

If the profiler shows much higher % of time spent in eval in SF, that could help explain #2. But if it shows no major differences in percentages, then what?

lkaufman · Post by **lkaufman** » Wed Apr 11, 2012 4:36 pm

BubbaTough wrote:
jdart wrote:
1. As you probably know, NPS across programs is not highly correlated to playing strength. A very simple program can have very high NPS and not be a a good chessplayer. To take another example, Crafty has quite high NPS relative to other programs, but is not nearly equal to IvanHoe in strength.

2. Chess programs are complex beasts, and all the pieces typically interact. So for example programs with different board representations will behave quite differently. Also pruning and evaluation decisions affect NPS quite a bit. Many programs do eval at every node because it is used in pruning decisions. If you don't do this, or do lazy eval, you will be faster, but not necessarily stronger.

--Jon
Exactly. #2 leads to #1. And #1 makes the question less interesting. Ivan and Stockfish are great examples of this. If memory serves, Ivan is designed to avoid eval whenever possible, and Stockfish is designed to take advantage of it when possible. The result is Ivan goes fast, and Stockfish slow. Both approaches have advantages and disadvantages, and comparing NPS across programs with such different approaches is not all that informative in my opinion.

-Sam

OK, at least you have given a clear answer: you believe the NPS difference is real and primarily attributable to time spent in eval. If you are right, nothing much to add. If this is not the main reason for the huge NPS gap, I would like very much to know what is. I would just point out that if SF spends 40% of time in eval (per Marco), in order to get a 5 to 3 speedup this would have to be reduced to NEGATIVE 16%!

jdart · Post by **jdart** » Wed Apr 11, 2012 5:06 pm

OK, at least you have given a clear answer: you believe the NPS difference is real and primarily attributable to time spent in eval. If you are right, nothing much to add. If this is not the main reason for the huge NPS gap, I would like very much to know what is. I would just point out that if SF spends 40% of time in eval (per Marco), in order to get a 5 to 3 speedup this would have to be reduced to NEGATIVE 16%!

To be clear Sam & I are saying that the difference is not just due to how fast eval is, but how often it is called. If you are calling eval at 100% of your nodes and another program calls it at 50% of the nodes, then you can have the same speed executing eval but still spend different amounts of time in that function.

--Jon

jdart · Post by **jdart** » Wed Apr 11, 2012 5:10 pm

2. If real, what accounts for SF being so much stronger than Ivanhoe on an equal nodes (rather than time) basis?

I am not sure this is even a reasonable question. Real games are not conducted with fixed node counts so this test is not really indicative of playing strength. Also pruning and extensions are different in these programs. So for the same number of nodes you are visiting a very different slice of the search tree, and I'd expect results to be different.

--Jon

lkaufman · Post by **lkaufman** » Wed Apr 11, 2012 5:59 pm

jdart wrote:
2. If real, what accounts for SF being so much stronger than Ivanhoe on an equal nodes (rather than time) basis?
I am not sure this is even a reasonable question. Real games are not conducted with fixed node counts so this test is not really indicative of playing strength. Also pruning and extensions are different in these programs. So for the same number of nodes you are visiting a very different slice of the search tree, and I'd expect results to be different.

--Jon

I am well aware of the effects of these differences, and of course results will differ. However I know from my own experiments that it is impossible to make reasonable changes to pruning and/or extensions in a program like Stockfish, Ivanhoe, or Komodo that would change the results AT FIXED NODES by anywhere near the margin of superiority SF exhibits over Ivanhoe on this basis. I suppose some of the margin of superiority is due to a better (but slower) eval, and perhaps some more is due to a difference in node counting. But the gap is so large that I am looking for some third factor. Something that would account for the huge disparity in the relative strength of the two engines in bullet but not in blitz or longer games. This is the phenomenon I really want to understand.

Don · Post by **Don** » Thu Apr 12, 2012 1:23 am

mcostalba wrote:
lkaufman wrote:I don't believe that such details would amount to more than a few percent in the given instance.
It's time to became a believer

SF updates nodes in do_move(), while Ivanhoe (and I guess also Komodo) updates nodes at the beginning of search(). This differs considerably because in case of null move search Ivanhoe counts 2 while SF counts 1. At the end the difference between updating in do_move() or in search() is of about 20-30% in nps if I don't remember wrong.

P.S: I really don't think Ivanhoe is faster than SF in comparable functions, actually I'd would not be surprised of the contrary (although impossible to prove): SF is really super tuned for speed, see for instance its perft performance.

Marco,

Komodo counts a node every time it makes a move. In fact if we count nodes as liberally as possible, which means even nodes we are going to forward prune we are still blown away by Ivanhoe in speed - and if we actually did the same number of nodes per second (even counting liberally) as Houdini (for example) we would be well over 100 ELO stronger than any other program including Houdini.

However, I am sure a lot of our strength is payed for by this low nodes per second, still it's pretty odd to be so slow. I don't think my coding is that horrible.

I recently profiled our program and the evaluation function is taking 37.3 percent of the time, that's the evaluation and all it's children. It is 20% if you don't include it's children such as the routine to calculate pawn structure or do population count, findbit, etc. But even if I got the evaluation for free I would still slower than Houdini although this would make us much stronger of course.

Pawn structure is 3% of the time, presumably due to pawn structure hashing. We even cache the evaluation function (but not in the regular hash table) and that does give us a modest speedup, but the 37.3 is even after caching.

I also noticed from looking at the profile that Komodo is a victim of being "nickel and dimed" to death. There are a LOT of routines that extract a few percent. Not counting children, we have 20 routines which cost more than 1 percent of the execution time. Other than evaluation there is no single routine I can look at and say, "aha!" there's the problem!

Don · Post by **Don** » Thu Apr 12, 2012 1:50 am

Don wrote:
mcostalba wrote:
lkaufman wrote:I don't believe that such details would amount to more than a few percent in the given instance.
It's time to became a believer

SF updates nodes in do_move(), while Ivanhoe (and I guess also Komodo) updates nodes at the beginning of search(). This differs considerably because in case of null move search Ivanhoe counts 2 while SF counts 1. At the end the difference between updating in do_move() or in search() is of about 20-30% in nps if I don't remember wrong.

P.S: I really don't think Ivanhoe is faster than SF in comparable functions, actually I'd would not be surprised of the contrary (although impossible to prove): SF is really super tuned for speed, see for instance its perft performance.
Marco,

Komodo counts a node every time it makes a move. In fact if we count nodes as liberally as possible, which means even nodes we are going to forward prune we are still blown away by Ivanhoe in speed - and if we actually did the same number of nodes per second (even counting liberally) as Houdini (for example) we would be well over 100 ELO stronger than any other program including Houdini.

However, I am sure a lot of our strength is payed for by this low nodes per second, still it's pretty odd to be so slow. I don't think my coding is that horrible.

I recently profiled our program and the evaluation function is taking 37.3 percent of the time, that's the evaluation and all it's children. It is 20% if you don't include it's children such as the routine to calculate pawn structure or do population count, findbit, etc. But even if I got the evaluation for free I would still slower than Houdini although this would make us much stronger of course.

Pawn structure is 3% of the time, presumably due to pawn structure hashing. We even cache the evaluation function (but not in the regular hash table) and that does give us a modest speedup, but the 37.3 is even after caching.

I also noticed from looking at the profile that Komodo is a victim of being "nickel and dimed" to death. There are a LOT of routines that extract a few percent. Not counting children, we have 20 routines which cost more than 1 percent of the execution time. Other than evaluation there is no single routine I can look at and say, "aha!" there's the problem!

I miscalculated the node advantage Houdini has over Komodo, it's substantial but not worth 100 ELO. It would give us an additional ply of search roughly.

Stockfish seems to be the fastest program I know of, in therms of stated iteration depth and branching factor. I don't really know about nodes per second. Like every good program out there there is a trade-off. If Komodo did that same depth, or conversely if Stockfish got as much out of a ply, the result would be spectacular beyond belief.

BubbaTough · Post by **BubbaTough** » Thu Apr 12, 2012 2:07 am

Don wrote:
Stockfish seems to be the fastest program I know of, in therms of stated iteration depth and branching factor. I don't really know about nodes per second. Like every good program out there there is a trade-off. If Komodo did that same depth, or conversely if Stockfish got as much out of a ply, the result would be spectacular beyond belief.

That is a very unusual definition of speed. Almost everyone else seems to mean NPS when they say speed, and mean branching factor when they say, well, branching factor.

Of course spending more time per node should translate into making better pruning and reduction decisions, thereby improving branching factor. So its not at all surprising that a program with less NPS (speed by my way of thinking) should have a better branching factor.

-Sam

bob · Post by **bob** » Thu Apr 12, 2012 3:10 am

lkaufman wrote:
bob wrote:
lkaufman wrote:Comparing the two strongest open-source engines, Stockfish and Ivanhoe (recent versions), I notice that although they are virtually identical in playing strength (except at bullet levels), the reported nodes per second in Ivanhoe is higher in about the ratio of 5 to 3. I am well aware that node counts can differ depending on details of how they are counted (whether to count illegal moves for example), I don't believe that such details would amount to more than a few percent in the given instance. A ratio of 5 to 3 is HUGE, far more than could plausibly be attributed to differences in the complexity of the eval function (in the given instance at least). Obviously, since the programs are of equal strength, this huge disparity must have a high price or must somehow be misleading. Both programs are highly selective, though details differ, so I doubt this is the answer.

So, two questions:

1. How does Ivanhoe achieve such a huge lead in NPS over Stockfish?
2. What is Ivanhoe giving up to achieve this speed? It can't just be simpler eval, as the speed gap is too large. But Ivanhoe would be totally crushed by Stockfish on an equal-nodes basis. Why?
Have you compared to Crafty? I know exactly how I count nodes, which would give a starting point that doesn't require a lot of effort to figure out...
Since Crafty doesn't do the massive move-count based pruning that is used in all the top engines, I don't believe the node-counting in Crafty would be very relevant to node-counting in these engines, where the counting or not of these pruned moves is so critical.

Not sure what you mean. Crafty does LMR aggressively. It does futility pruning in last 4 plies. However, I am not sure those affect overall NPS, which was what you were asking about. NPS doesn't seem to be affected by pruning/reductions in my testing. My NPS hasn't changed much over time unless hardware changes or I do major efficiency rewrites.

bob · Post by **bob** » Thu Apr 12, 2012 3:15 am

lkaufman wrote:
jdart wrote:
lkaufman wrote: 1. How does Ivanhoe achieve such a huge lead in NPS over Stockfish?
2. What is Ivanhoe giving up to achieve this speed? It can't just be simpler eval, as the speed gap is too large. But Ivanhoe would be totally crushed by Stockfish on an equal-nodes basis. Why?
a few comments:

1. As you probably know, NPS across programs is not highly correlated to playing strength. A very simple program can have very high NPS and not be a a good chessplayer. To take another example, Crafty has quite high NPS relative to other programs, but is not nearly equal to IvanHoe in strength.

2. Chess programs are complex beasts, and all the pieces typically interact. So for example programs with different board representations will behave quite differently. Also pruning and evaluation decisions affect NPS quite a bit. Many programs do eval at every node because it is used in pruning decisions. If you don't do this, or do lazy eval, you will be faster, but not necessarily stronger.

3. On the x86 platform, cache and memory usage has a very large effect on speed, especially for multithreaded programs. A program can be optimized at a local level by the compiler but optimizing memory usage usually has to be done by design.

4. I haven't done this but running the programs you are comparing under a good profiler (such as VTune) will answer some of your questions about causes of speed differences.

--Jon
Perhaps we'll try the profiler, but it won't answer the basic questions:
1. Is the reported NPS gap real or artificial?
2. If real, what accounts for SF being so much stronger than Ivanhoe on an equal nodes (rather than time) basis?

If the profiler shows much higher % of time spent in eval in SF, that could help explain #2. But if it shows no major differences in percentages, then what?

NPS is simply total nodes searched divided by time taken to search all of them. Or, in a simpler term, the time taken to search a single typical node, multiplied by the number of such nodes searched. There are lots of things one can do to change NPS. For example, use eval at all interior nodes. I typically see the q-search taking 80-90% of the total nodes searched, so adding eval to the interior nodes will slow it down by about 10%. What if that gives you better move ordering so that your tree is now 25% deeper on average? You are 10% slower, but getting to the same depth 25% faster. A net win.

Comparing NPS is not a particularly interesting number, any more than comparing the RPM reading on two different vehicles tells you anything at all about how fast they are going relative to each other, how much fuel they are consuming, or how much noise they are making.

Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second

Re: Comparative nodes per second