mcostalba wrote:Rein Halbersma wrote: That is because your current algorithm is not stable and mine is
Hi Rein,
yes you are right !
This makes to compare speeds a bit more difficult because searched nodes are little different. Under this non-optimal conditions current SF code _seems_ a bit faster of your fastest version, that is the "always swap" one.
It is not self evident to understand why, we have to consider the surrounding conditions in which that code is used:
1) It is manly used only for captures that are very few, something from 0 to 4-5 and the code actually kicks in when we have at least 2 captures.
2) The swap is done with a register and very well optimized by the compiler
3) std::max_element() is a function, is not inlined and so there is some overhead there.
4) Pointers are a little slower than values, for instance I have tried to change current SF version using Move* bestMovePtr instead of Move bestMove and results are slightly slower.
So overall I think your idea is good, but possibly it needs further tuning work.
Marco
P.S: std::swap() uses some ad-hoc gcc magic (_GLIBCXX_MOVE) that probably is faster then the usual tmp way. On MSVC instead the swap is implemented with the usual algorithm you can find in current SF.
Hi Marco,
I always find it more useful to get a correct program first and then worry about efficiency.
The current loop has a subtle side effect, it is not only unstable, but it also loses the best move picked from the move list and instead duplicates the first move. So pick_best is not even a partial unstable sort! Your current search might not depend on move order stability or the fact that you lose information, and for now your program might be correct. But someday a rewrite might occur that violates this hidden assumption. E.g. if you ever decide to write out iterative deepening as iteration instead of recursion, you cannot re-use the generated move list because pick_best_move dissipates information. It seems simply not worth the trouble.
Insisting on manually inlining std::max_element and/or std::swap doesn't make sense to me either. It is something the compiler should do for you, and the STL is source code templates (here already an advantage over C library!), so the compiler can do so if necessary. And if it doesn't do it even after pgo runs, then it simply doesn't matter anyway. But inlining would be at least an improvement on doing multiple swaps and overwriting the first entry of the move list.
The proposal reduces the pick_best from 13 lines with subtle side effects to 3 lines that guarantee a stable partial sort. It's simply more maintainable and readable. I am confident the change would not cost even a single ELO point (if anything, the reduced number of swaps should be a gain), but even if it did, then the simplicity alone would be worth that tiny cost.
Rein
P.S. in the current Stockfish implementation, there is still one micro-optimization left by writing T bestMove = *curMove (i.e. combine declaration and the assignment into a single initalization, but perhaps the compiler already generated this for you)