Comparing two versions of the same engine

Post by **bob** » Tue Oct 28, 2008 8:58 pm

Kempelen wrote:
bob wrote:
Kempelen wrote: 3) Sometimes, when comparing only evaluation changes, run fixed-ply tournaments.
You have to be very careful here. Fixed-depth testing distorts the results significantly. If you add a slow eval term, that program gets an unfair advantage at fixed depth: the slower eval is never penalized; the program simply takes longer to move than its opponent. I don't do _any_ fixed-depth testing myself.
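As a rough illustration of the point above, here is a minimal Python sketch under a simple uniform-tree model (an assumption, not something stated in this thread): if reaching depth d costs about EBF^d nodes, then a search with a fixed time budget loses log(slowdown)/log(EBF) plies when a new eval term divides its node rate by `slowdown`. The function names and example numbers are hypothetical.

```python
import math


def depth_reached(nps: float, seconds: float, ebf: float) -> float:
    """Depth reachable in `seconds` at `nps` nodes/second.

    Assumes a uniform tree: reaching depth d costs about ebf**d nodes.
    """
    return math.log(nps * seconds) / math.log(ebf)


def plies_lost(slowdown: float, ebf: float) -> float:
    """Depth lost at a fixed time budget when nps is divided by `slowdown`.

    Under the uniform-tree model this is log(slowdown) / log(ebf),
    independent of the time budget itself.
    """
    if slowdown <= 0 or ebf <= 1:
        raise ValueError("slowdown must be > 0 and ebf must be > 1")
    return math.log(slowdown) / math.log(ebf)


if __name__ == "__main__":
    # Hypothetical: a new eval term costs 20% speed; EBF assumed to be 2.
    print(f"{plies_lost(1.2, 2.0):.2f} plies lost at any fixed time control")
    # A fixed-depth match never sees this loss: both sides reach the same depth.
```

With an assumed EBF of 2, a 20% slowdown costs roughly a quarter of a ply at every time control, a cost that a fixed-depth match cannot register because both versions always reach the same depth.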

Well, I was thinking of running ply tournaments only when changing score values, not when adding new chess knowledge that needs extra execution time. I don't see any drawback in using ply tourneys for that kind of testing. Do you?

Hard to say. Changing a significant scoring term can speed you up or slow you down, because it affects move ordering. If it happens to make your trees bigger, a fixed-depth search won't show you the penalty. Again, I don't like the idea of fixed depth at all, since we don't play like that, and it is quite easy to fool yourself into believing something is bad or good when the opposite is true in real games...

If you do fixed-depth searches, the question you are asking is "independent of everything else, is this scoring change better or worse?" But that "independent of ..." clause is a killer. What if it slows you down by a factor of two because of its effect on the shape and size of the tree? You won't know that.
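One way to expose the hidden cost described above, sketched in Python as an assumption rather than anyone's actual procedure: record how many nodes each version needs to reach the same fixed depth on the same positions, take the geometric mean of the per-position ratios, and convert that ratio into plies under a simple uniform-tree model where depth d costs about EBF^d nodes (EBF assumed). Node counts stand in for time only if both versions search at the same speed; if the eval cost changed too, compare search times instead. `geometric_mean_ratio`, `hidden_ply_cost`, and the node counts are hypothetical.

```python
import math
from typing import Sequence


def geometric_mean_ratio(baseline: Sequence[int], candidate: Sequence[int]) -> float:
    """Geometric mean of candidate/baseline node counts, position by position.

    Both sequences hold the nodes each version searched on the same test
    positions at the same fixed depth. A geometric mean treats a 2x increase
    on one position and a 2x decrease on another as cancelling out.
    """
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("need two non-empty sequences of equal length")
    log_sum = sum(math.log(c / b) for b, c in zip(baseline, candidate))
    return math.exp(log_sum / len(baseline))


def hidden_ply_cost(ratio: float, ebf: float) -> float:
    """Plies a fixed-time search would lose for a tree `ratio` times larger."""
    return math.log(ratio) / math.log(ebf)


if __name__ == "__main__":
    # Hypothetical node counts for 4 positions, all searched to the same depth.
    old = [120_000, 450_000, 80_000, 1_000_000]
    new = [250_000, 900_000, 150_000, 2_100_000]
    r = geometric_mean_ratio(old, new)
    print(f"tree size ratio {r:.2f}, about {hidden_ply_cost(r, 2.0):.2f} plies at EBF 2")
```

With an assumed EBF of 2, a tree twice as large corresponds to one full ply lost at any fixed time control, even though a fixed-depth match scores both versions as if they cost the same.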
