My session has now run for more than 20,000 samples. The "mean" values are still more or less centered on the experiment domain (which I take to mean that the sample points are spread everywhere rather than concentrated in a specific area), yet the reported Elo and win rate are positive.
Should I wait longer? Is it just luck, or are those "mean" values actually interesting? Why does CLOP never show "max" values in my result table?
26K games is already a significant resource investment, and more importantly, the "All" performance being high is completely abnormal.
If CLOP isn't able to output values that perform better than the baseline, it makes no sense that the full set of games played, including many that tried random garbage values, would show such a performance.
Small sample size is not a valid explanation: the 99% lower bound for "All" was still +6 Elo. So there is something else going on, something very wrong. For all I know, it could be as stupid as the baseline always getting to play black.
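To make the "not a small sample" argument concrete, here is a rough sketch of how an Elo estimate and a one-sided 99% lower bound can be derived from win/draw/loss counts. It assumes the usual logistic Elo model and a normal approximation; it is not necessarily how clop-gui computes its own bound, and the counts below are invented purely for illustration (they are not the numbers from this session).

```python
import math

def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_lower_bound(wins, draws, losses, z=2.326):
    """Return (elo, lower_bound) using a normal approximation of the mean score.

    z=2.326 corresponds to a one-sided 99% bound.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win=1, draw=0.5, loss=0).
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    std_error = math.sqrt(variance / n)
    return elo_from_score(score), elo_from_score(score - z * std_error)

# Hypothetical counts for a run of 26,000 games (illustrative only).
elo, lower = elo_with_lower_bound(wins=9500, draws=8000, losses=8500)
print(f"Elo estimate: {elo:+.1f}, 99% lower bound: {lower:+.1f}")
```

With this many games the interval is only a few Elo wide, so a clearly positive lower bound cannot be blamed on noise alone. It points to a systematic effect, such as a color imbalance.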
Interpreting statistics: clop does not estimate strength accurately
-------------------------------------------------------------------
Win rates displayed in clop-gui are biased. The win rate over samples with
w(x)=1 ("Central" column) is often too optimistic. The win rate over all
samples ("All" column) is pessimistic. clop cannot estimate accurately how
strong the program is at the maximum.
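A toy model shows why an average over all samples should sit below the win rate at the optimum. The sketch assumes a made-up quadratic win-rate curve over one parameter and evenly spread samples; CLOP's real sampling and regression are more involved, so treat the shape and the numbers as assumptions.

```python
def win_rate(x, peak=0.55, curvature=0.20):
    """Hypothetical win rate against the baseline for parameter x in [-1, 1]."""
    return peak - curvature * x * x

# Evenly spread sample points over the whole experiment domain.
n = 10001
xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

mean_all = sum(win_rate(x) for x in xs) / n   # analogue of the "All" column
best = max(win_rate(x) for x in xs)           # win rate at the optimum

print(f"win rate over all samples: {mean_all:.4f}")
print(f"win rate at the optimum:   {best:.4f}")
```

In this toy setup the optimum beats the baseline, but the average over the whole domain falls below 50%. That is the sense in which "All" is expected to be pessimistic.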
If "All" is pessimistic and "All" is already at +5 or +8 Elo, I don't know what to think.
I'd like to know if it is ok that I have nothing in my "Max" table (I have something in "mean").