cdani wrote:
Kai, if you think it makes any sense, I can do a version of Stockfish and another of Andscacs that just plays by picking the move with the best eval.
And maybe someone can do a version of Giraffe that does the same.
If anyone does a compile like this, it should take the evaluation returned by the quiescence function, not the plain static eval, to avoid nonsense sacrifices that temporarily raise the evaluation.
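For concreteness, here is a minimal sketch of such a "best eval" move picker. The function names (generate_moves, make_move, unmake_move, qsearch) are placeholders for whatever the host engine provides, not any real engine's API:

```cpp
#include <limits>
#include <vector>

struct Move     { /* engine-specific move encoding */ };
struct Position { /* engine-specific position state */ };

// Placeholder declarations for the host engine's internals:
std::vector<Move> generate_moves(Position& pos);
void make_move(Position& pos, const Move& m);
void unmake_move(Position& pos, const Move& m);
int  qsearch(Position& pos, int alpha, int beta);  // quiescence score

// Pick the move whose quiescence eval (not raw static eval) is best,
// so temporary material swings from captures don't mislead the choice.
Move pick_best_eval_move(Position& pos) {
    const int INF = std::numeric_limits<int>::max() / 2;
    Move best{};
    int bestScore = -INF;
    for (const Move& m : generate_moves(pos)) {
        make_move(pos, m);
        // Child score is from the opponent's point of view; negate it.
        int score = -qsearch(pos, -INF, INF);
        unmake_move(pos, m);
        if (score > bestScore) {
            bestScore = score;
            best = m;
        }
    }
    return best;
}
```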
I must admit I simply do not understand quite a lot of what is going on here.
Does a 1-ply search with quiescence included guarantee the objectivity of the test? What if quiescence definitions and techniques differ among engines (and they do)? In that case, including quiescence search will still be biased (maybe some engines do part of the work of the static eval in quiescence).
And without quiescence, would that not penalize engines with better quiescence routines?
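For reference, the textbook quiescence search is just "stand pat on the static eval, then search captures only"; everything engines layer on top (checks, SEE filters, delta pruning) is exactly where their definitions diverge. A minimal sketch, again with placeholder function names rather than any particular engine's API:

```cpp
#include <vector>

struct Move     { /* engine-specific move encoding */ };
struct Position { /* engine-specific position state */ };

// Placeholder declarations for the host engine's internals:
int static_eval(Position& pos);
std::vector<Move> generate_captures(Position& pos);
void make_move(Position& pos, const Move& m);
void unmake_move(Position& pos, const Move& m);

// Minimal fail-hard quiescence search: stand pat, then captures only.
int qsearch(Position& pos, int alpha, int beta) {
    int standPat = static_eval(pos);   // score if we decline all captures
    if (standPat >= beta)
        return beta;                   // position already too good: cut off
    if (standPat > alpha)
        alpha = standPat;

    for (const Move& m : generate_captures(pos)) {
        make_move(pos, m);
        int score = -qsearch(pos, -beta, -alpha);
        unmake_move(pos, m);
        if (score >= beta)
            return beta;
        if (score > alpha)
            alpha = score;
    }
    return alpha;
}
```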
One thing that makes no sense at all to me is that SF has a worse eval than Andscacs and Houdini 3. I have always presumed Komodo's eval is better tuned, as a lot of SF term values simply make no sense, but SF's eval worse than that of engines more than 300 Elo weaker? That simply does not add up. I understand that searching deeper provides some additional advantages, like more frequently reaching positions with a sufficiently big eval to point at a clear winner, but, as a matter of fact, more than 3/4 of all possible nodes should be more or less quiet positions with no clear advantage to either side, where a precise eval is all-important. SF, as well as the weaker engines, should reach them equally often. How would SF perform so much better if its eval is so much worse? It simply does not make sense to me.
SF's eval is extensively tuned on the framework, much more so than that of the weaker engines. Why would it be so much worse tuned?
I guess the answer might lie in the fact that every engine reaches its relative optimum of eval tuning only within the particular framework of its own eval+search. SF tunes its eval at deeper nodes, so it does make sense that it is worse tuned only for shallower nodes. That does not necessarily mean its eval per se is worse; rather, because it is tuned at higher depths, it is not optimally tuned for shallower ones.
It is true that the static eval should remain the same at all nodes, but that does not necessarily mean tuning it under different search parameters should lead to identical results. I guess that, if the SF team wanted to tune specifically for 1-ply searches, they could run a couple of billion games per day and achieve a couple of hundred Elo of improvement within days or months. SF would become one of the best evaluators. But then it would certainly perform worse with the SF search activated. Why so? Does a better static eval not also lead to better game play overall?
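To make "tuning for 1-ply play" concrete: one well-known way to tune an eval in isolation is Texel-style tuning, which minimizes the error between a sigmoid of the static eval and actual game results over a large set of labeled positions. Below is a minimal sketch, assuming a hypothetical eval(pos, weights) function and a dataset of (position, result) pairs; this is not the Stockfish framework's method, which tunes by playing games with the full search:

```cpp
#include <cmath>
#include <vector>

struct Position { /* engine-specific position state */ };
struct Sample   { Position pos; double result; };  // 1 = win, 0.5 = draw, 0 = loss

// Hypothetical: evaluate pos in centipawns with the candidate weight set.
int eval(const Position& pos, const std::vector<int>& weights);

// Map a centipawn score to an expected game result in [0, 1].
double sigmoid(double cp, double k = 1.0 / 400.0) {
    return 1.0 / (1.0 + std::pow(10.0, -k * cp));
}

// Mean squared error of the eval's predictions over the dataset; a tuner
// adjusts one weight at a time (or follows gradients) to drive this down.
double tuning_error(const std::vector<Sample>& data,
                    const std::vector<int>& weights) {
    double e = 0.0;
    for (const Sample& s : data) {
        double diff = s.result - sigmoid(eval(s.pos, weights));
        e += diff * diff;
    }
    return e / static_cast<double>(data.size());
}
```

Weights tuned this way are optimal for predicting outcomes directly from static positions, i.e. for 1-ply play; nothing guarantees the same weights are optimal once filtered through a deep, selective search, which is exactly the asymmetry described above.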
There is something I clearly do not understand here.
Having watched so many games by different engines, I would still stand by my claim that SF's eval is better than that of a lot of weaker engines that would otherwise perform better at 1-ply searches. Maybe, until we understand all the intricacies of the engines, we should instead concentrate on simply comparing engine strength at regular time controls.