Thanks for the question!
The testing strategy is fair and is specified exactly in the wiki.
I publish a new version only if the engine passes these tests:
a. at least the same strength in the match starting from positional characteristics
b. at least the same number of resolved positions
c. better in at least one of a and b.
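
As a rough illustration, the three conditions above could be written as a small check. This is only a sketch: the names EngineResult, match_strength, resolved_positions and should_publish are made up for the example, and I am assuming that strength is a single number from the match and that resolved positions is a simple count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineResult:
    # Hypothetical fields, chosen only for this illustration.
    match_strength: float      # result of the match (higher is better)
    resolved_positions: int    # number of test positions resolved


def should_publish(current: EngineResult, candidate: EngineResult) -> bool:
    """Return True if the candidate passes conditions a, b and c."""
    # a. at least the same strength in the match
    a_ok = candidate.match_strength >= current.match_strength
    # b. at least the same number of resolved positions
    b_ok = candidate.resolved_positions >= current.resolved_positions
    # c. strictly better in at least one of the two
    c_ok = (candidate.match_strength > current.match_strength
            or candidate.resolved_positions > current.resolved_positions)
    return a_ok and b_ok and c_ok


if __name__ == "__main__":
    old = EngineResult(match_strength=50.0, resolved_positions=100)
    new = EngineResult(match_strength=50.0, resolved_positions=103)
    print(should_publish(old, new))
```

With these rules a version that is merely equal on both tests is not published, and a gain on one test never compensates for a loss on the other.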
Andrea