@Ray: a poor choice of words, for which I apologize. I know I got an FRC rating quickly, for which I'm very grateful.
Gabor, let me explain (and then I'm bailing out because I'm already starting to pollute this thread, which goes against the promise I made to Guenther).
I measured 56-57% in self-play against 4.40 at very fast TC (10+0.1 and 60+0.6 sec). On the rating lists I usually get less than half of the Elo gain that score implies (in CCRL 40/15 for sure).
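To put a number on the 56-57%, here is a minimal sketch that converts a match score into an Elo difference. It assumes the usual logistic Elo model; the function name `score_to_elo` is just for illustration and is not taken from any testing tool.

```python
import math

def score_to_elo(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1), logistic model (assumed)."""
    return 400.0 * math.log10(score / (1.0 - score))

# Self-play scores quoted above
for pct in (56, 57):
    print(f"{pct}% -> {score_to_elo(pct / 100):+.1f} Elo")
```

Under that model, 56-57% works out to somewhere in the low-to-high 40s in self-play Elo, so "less than half of that" is around 20 Elo or below.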
The problem is how to extrapolate. I simply gave a conservative (hopefully lower bound) estimate, because I didn't want to lie about the improvement (unlike some people who make ridiculous claims and then act surprised - so I believe it's better to give a lower estimate).
With 4.40 the rating spread among the various rating lists was over 40 Elo (!), which is a lot for my taste. The outlier was CCRL 40/15, where I got about +35; I got +60 in CEGT 40/20 and CCRL 40/2 IIRC, and over 80 in FastGM and CCRL FRC. At this level, that's a huge discrepancy.
For example, Tucano 9 seems underrated in CCRL 40/15, and Amoeba 3.3 (which shows a clear improvement in CEGT 40/20) is "worse" in CCRL 40/15 after 700 games.
Could be error bars, could be the openings, could be the TC, the choice of opponents or even some problem with the engine; honestly, I don't know.
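For a feel of how wide the error bars can be after 700 games, here is a rough sketch. It assumes independent games, a normal approximation, the logistic Elo model and a 40% draw ratio; the draw ratio is purely a placeholder (not a figure from any rating list) and `elo_interval` is a hypothetical helper name.

```python
import math

def score_to_elo(score: float) -> float:
    """Elo difference implied by a score fraction, logistic model (assumed)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval(games: int, score: float = 0.5,
                 draw_ratio: float = 0.4, z: float = 1.96):
    """Approximate 95% interval in Elo for a measured score.

    Per-game variance with results 1, 0.5, 0 is score*(1-score) - draw_ratio/4.
    draw_ratio=0.4 is an assumed placeholder value.
    """
    variance = score * (1.0 - score) - draw_ratio / 4.0
    half_width = z * math.sqrt(variance / games)
    return score_to_elo(score - half_width), score_to_elo(score + half_width)

low, high = elo_interval(700)
print(f"700 games: {low:+.0f} to {high:+.0f} Elo")
```

With these assumptions the interval is roughly 20 Elo either side, so two lists disagreeing by a few tens of Elo after 700 games is not necessarily more than noise.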
CEGT 40/20 certainly shows more than my +15 guesstimate (around 50, which will likely drop as more games are played), so I still have hope that it's more than I expected.
While self-play is a big gamble (you can get anything between 25% and 90% of what you measure), it's still the way to test small Elo changes with many games. Considering the huge spread among rating lists, I think it's virtually impossible to give any reliable predictions until the engine is actually tested by independent testers (unless the improvement is like 150+ Elo in self-play, which is a lot at this level).