TShackel wrote:
Hi,
I've been using the April 12th development version in most of my testing, since it proved to be several Elo stronger than Stockfish 6.0 in my own testing. However, I tested the April 12th development version against the most recent development version, and after 332 games the recent version is down by 2 Elo from the April 12th version. I know it's not up to a thousand games yet, but 332 is quite a few games to start drawing a conclusion from.
Does anyone have an idea why this is? Normally the most recent development version is stronger than the previous versions.
Sincerely,
Tim.

1/ You cannot conclude anything from seeing -2 Elo after 332 games. Detecting a 2 Elo regression with 95% confidence requires an SPRT(-2,0), which is *much* costlier than that.
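To put point 1 in rough numbers: the Python sketch below turns a win/draw/loss record into an Elo estimate with a 95% interval, and estimates how many games a ±2 Elo interval would need. Assumptions: the 60/210/62 split is hypothetical (only the -2 Elo total over 332 games was reported), it uses the plain logistic Elo formula with a normal approximation, and it is a fixed-sample confidence interval, not the SPRT(-2,0) mentioned above. The function names are made up for illustration.

```python
import math

def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) using a normal approximation to the mean score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result x in {1, 0.5, 0}: E[x^2] - E[x]^2
    var = (wins + 0.25 * draws) / n - score ** 2
    stderr = math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - z * stderr),
            elo_from_score(score + z * stderr))

def games_for_half_width(elo_half_width: float, per_game_sd: float,
                         z: float = 1.96) -> int:
    """Games needed so the interval is +/- elo_half_width near a 50% score."""
    # Slope of the Elo curve at score 0.5: dElo/dscore = 400 / (ln(10) * 0.25)
    slope = 400.0 / (math.log(10) * 0.25)
    score_half_width = elo_half_width / slope
    return math.ceil((z * per_game_sd / score_half_width) ** 2)

# Hypothetical 332-game record that comes out at roughly -2 Elo
wins, draws, losses = 60, 210, 62
elo, low, high = elo_interval(wins, draws, losses)
print(f"{elo:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}]")

n = wins + draws + losses
sd = math.sqrt((wins + 0.25 * draws) / n - ((wins + 0.5 * draws) / n) ** 2)
print("games for a +/-2 Elo interval:", games_for_half_width(2.0, sd))
```

With that made-up split the estimate is about -2 Elo, but the interval runs from roughly -25 to +21 Elo, and shrinking it to ±2 Elo takes on the order of 40,000 games at the same draw rate. An SPRT stops early when the evidence is clear, but separating a 2 Elo difference still costs many thousands of games, far more than 332.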
2/ You are using abrok.eu compiles, and anything you conclude from those could be the result of abrok.eu compiling poorly (e.g. not using the right version of MinGW-GCC, or not doing profile-guided optimization). Unless you compile both versions yourself, in exactly the same way, you cannot make any meaningful observation.
