Is it reasonably safe to assume that MatMoi 7.13.1d is stronger than MatMoi 7.14.0-see even if the confidence ranges overlap? If not, can I simply have them play perhaps 500 more games to see whether the ranges shrink enough that they no longer overlap, or do I need to run a new test with 3500+ games for each engine?
Code:
Rank Name               Elo    +    - games score oppo. draws
   1 Matheus             97   18   18  1200   65%   -15   20%
   2 Prophet             68   18   18  1200   61%   -15   14%
   3 Matant              63   18   18  1200   60%   -15   18%
   4 Monarch             62   17   17  1211   61%   -16   22%
   5 MatMoi 7.13.1d      -5   11   11  3000   46%    20   18%
   6 MatMoi 7.14.0-see  -26   11   11  3000   44%    20   18%
   7 MatMoi 7.12.4e     -71  182  182    11   32%    62    9%
   8 Sharper           -189   19   19  1200   28%   -15   16%
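
To make the question concrete, here is a rough Python sketch of the arithmetic behind it. It rests on assumptions that are not stated in the table: that the + and - columns are approximately 95% bounds (about 1.96 standard deviations), that the two MatMoi estimates are independent and normally distributed, and that the error margin shrinks like 1/sqrt(games). The function names are made up for illustration and are not part of any rating tool.

```python
import math


def superiority_probability(elo_a, margin_a, elo_b, margin_b, z95=1.96):
    """Approximate probability that engine A is truly stronger than engine B.

    Assumes each margin is a ~95% bound (z95 standard deviations) and that
    the two estimates are independent and normally distributed.
    """
    sigma_a = margin_a / z95
    sigma_b = margin_b / z95
    # Standard deviation of the difference of two independent estimates.
    sigma_diff = math.hypot(sigma_a, sigma_b)
    z = (elo_a - elo_b) / sigma_diff
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def scaled_margin(margin, games_now, games_later):
    """Error margin after more games, assuming it scales as 1/sqrt(games)."""
    return margin * math.sqrt(games_now / games_later)


# Values taken from the rating list above.
p = superiority_probability(-5, 11, -26, 11)
margin_3500 = scaled_margin(11, 3000, 3500)

print(f"P(7.13.1d stronger than 7.14.0-see) ~ {p:.3f}")
print(f"Margin after 3500 games each ~ +/-{margin_3500:.1f}")
```

Under these assumptions the probability comes out at roughly 0.996, and the margin after 3500 games each would be about 10.2, which would just separate the two ranges if the 21-point gap stayed the same. Since both engines played against the same opponents, the independence assumption is only approximate, so treat the numbers as a guide rather than a verdict.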