It seems unlikely that top programs would have an nps difference by a factor of 2:1.
Rasjid
Crafty has always been slower prior to castling, because the development code is executed everywhere. Once castling happens in the search, the development stuff does not get executed below that point. So as pieces are developed, that starts to happen more frequently. As far as the difference in NPS between robo* and Crafty, not a clue and not interested.
Chan Rasjid wrote:It seems unlikely that top programs would have an nps difference by a factor of 2:1.
Why?
I don't mean it too seriously, but a factor of 2x seems too good. Top programs are generally well designed, and they should be close in speed. What can bring the nps down by 50% is evaluation. So the SECRET of Robbo and SF should then be evaluation, as both are about equally "slow", which means they have very much more in their evaluation than Crafty.
Some of these programs discard moves before evaluating them, or even before make_move (skipping make and unmake), but still count them as nodes!
Normally these skipped moves and hash hits are not counted.
Rasjid
There are lots of things to comment on, but to take just two: a factor of 2x is completely unremarkable. Compare Hiarcs to Fritz. Of course one can compare Rybka to anything, but Rybka's numbers are bogus and way low.
Second, Crafty certainly does pruning in the last 4 plies, where it does do make and unmake (otherwise an evaluation would be worthless), but no recursive search calls are made.
Normally these skipped moves and hash hits are not counted.
The amount of work per node still affects NPS greatly. More pruning will lower the NPS: you do more work and you generate fewer successor positions.
Nps is inversely proportional to the "average work done within each node visited".
1) 70-80% of nodes are QS nodes, and the heaviest workload there is eval(); this seems to be the single most significant factor determining nps.
2) Pruning doesn't seem to affect nps significantly; the work done executing the pruning code is incurred whether or not a move is pruned, so there should be little change in the average work done per node.
3) (maybe) In general, "aggressive" search tricks like LMR, singular extensions and futility pruning don't affect nps much; they just let the search reach much greater depths.
Rasjid
That is my feeling. Our NPS doesn't change much over time, unless we do something new like the mobility look-up rather than calculation loop, which speeds up eval significantly, and hence NPS. I've not noticed any great change in Crafty's NPS, at least for as long as we have been doing the cluster testing and using the 8-core cluster node to play in tournaments. We've been averaging 20M or so for a long time, with a jump a version or two back with the mobility change...
I said it was 350,000 compared to Robbo's and SF's 1M nps; that was because my current program calls eval() at all internal nodes "blindly". It was just a simple plugging-in of eval() for testing; taking it away brought my nps back to about 600,000 - 700,000.
I think I have too much useless stuff in my current eval(). Also, my program has no serious optimization yet; there should be ways to speed things up. I will take Crafty's as a benchmark.
Chan Rasjid wrote:OK, my nps is not so low and bad after all;
I said it was 350,000 compared to Robbo's and SF's 1M nps; that was because my current program calls eval() at all internal nodes "blindly". It was just a simple plugging-in of eval() for testing; taking it away brought my nps back to about 600,000 - 700,000.
I think I have too much useless stuff in my current eval(). Also, my program has no serious optimization yet; there should be ways to speed things up. I will take Crafty's as a benchmark.
Rasjid
I have a version of Crafty that does an eval at all interior nodes. I was playing around with this in 23.2/23.3 when I was redoing the LMR and futility stuff. Interestingly, doing the interior-node eval didn't seem to affect NPS very much at all. Mainly because that is only 10% of the nodes, I suppose. But I did not find that it helped enough to keep at the time, so I saved it for later testing. You can make better LMR decisions if you know the eval before/after any particular move, but the gain there was offset by the speed loss, so it was break-even. And at break-even, I always choose "simpler"...
Chan Rasjid wrote:OK, my nps is not so low and bad after all;
I said it was 350,000 compared to Robbo's and SF's 1M nps; that was because my current program calls eval() at all internal nodes "blindly". It was just a simple plugging-in of eval() for testing; taking it away brought my nps back to about 600,000 - 700,000.
Rasjid
I doubt that the main reason for the slowdown is the internal-node evaluation. As previously mentioned, 70-80% of nodes are in the qsearch, so eliminating the evals from all internal nodes cannot bring you more than a 20-30% speedup. I think that evaluating all internal nodes cannot slow the engine down by more than 10-15%.
bob wrote:
...
Mainly because that is only 10% of the nodes I suppose.
...
Evaluating the node internally also saves the in-check test, and the attacks generated within eval may be used to generate moves.
How do you save the in-check test? I don't deal with check in the evaluation; that's a dynamic (rather than static) property best left to the search. Generating attacks is not a high-cost thing if you are using bitboards.
bob wrote:
...
Mainly because that is only 10% of the nodes I suppose.
...
Evaluating the node internally also saves the in-check test, and the attacks generated within eval may be used to generate moves.
How do you save the in-check test? I don't deal with check in the evaluation; that's a dynamic (rather than static) property best left to the search. Generating attacks is not a high-cost thing if you are using bitboards.
If you generate mobility in the standard way (create an attack mask, then count bits), you can save the attack masks for both sides and use them to detect illegal moves or moves that give check. Not much, but it could be useful.
Chan Rasjid wrote:OK, my nps is not so low and bad after all;
I said it was 350,000 compared to Robbo's and SF's 1M nps; that was because my current program calls eval() at all internal nodes "blindly". It was just a simple plugging-in of eval() for testing; taking it away brought my nps back to about 600,000 - 700,000.
Rasjid
I doubt that the main reason for the slowdown is the internal-node evaluation. As previously mentioned, 70-80% of nodes are in the qsearch, so eliminating the evals from all internal nodes cannot bring you more than a 20-30% speedup. I think that evaluating all internal nodes cannot slow the engine down by more than 10-15%.
1) internal eval() is called at every ply.
2) I have a futility evaluation at the start of eval(int beta); beta is passed as infinity, so there is no lazy evaluation for the internal eval(). The normal futility hit rate is only about 5-15%, so it should not be much different from the QS eval().
If the nps drop from internal-node evaluation should be about 10-15%, then I am not sure why I see about 50%. This 50% is confirmed in games played against other engines.