Sven Schüle wrote: I think I found my misunderstanding. It is win%_a + win%_b = 100 but I thought it were win%_a + win%_b + draw% = 100. With the correct way I now see how it works.
Yes, this is the correct interpretation.
Moreover, the mistake in the upper table that you posted is in how the formula is used.
The formula gives only 1 sigma, meaning 68% probability.
So if you get an Elo difference of N between engines A and B, there is a 68% chance that the real difference lies in the range [N-sigma, N+sigma].
If you want 95% certainty, take the range [N-2*sigma, N+2*sigma].
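To make the two ranges concrete, here is a minimal Python sketch. The function name elo_interval and the sample numbers (N = 20, sigma = 8) are made up for illustration and are not taken from the table.

```python
def elo_interval(n_elo, sigma, k=1):
    """Return the range [N - k*sigma, N + k*sigma] as a tuple."""
    return (n_elo - k * sigma, n_elo + k * sigma)

# Hypothetical measurement: N = 20 with sigma = 8.
print(elo_interval(20, 8, k=1))  # 1 sigma, about 68% -> (12, 28)
print(elo_interval(20, 8, k=2))  # 2 sigma, about 95% -> (4, 36)
```

The 95% range is simply twice as wide as the 68% one, so a result that looks clear at 1 sigma may still include zero at 2 sigma.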
For a gauntlet with many opponents, the exact formula becomes very complicated and cannot be calculated by hand (it needs a multidimensional Gaussian distribution approximation). Moreover, the sigma intervals can be asymmetric. Still, as a rule of thumb, you can use
sigma = 1.41*40/sqrt(num_games), which usually works well.
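A quick sketch of how the rule of thumb scales with the number of games, assuming the constants exactly as written above (they are copied from the formula, not derived here). The function name rule_of_thumb_sigma and the game counts are only examples, and the result is in whatever units the formula itself uses.

```python
import math

def rule_of_thumb_sigma(num_games):
    """Rule-of-thumb 1-sigma value: 1.41 * 40 / sqrt(num_games)."""
    if num_games <= 0:
        raise ValueError("num_games must be positive")
    return 1.41 * 40 / math.sqrt(num_games)

# Example game counts (arbitrary); each 4x increase halves sigma.
for games in (100, 400, 1600):
    s = rule_of_thumb_sigma(games)
    print(f"{games:5d} games: 1 sigma = {s:.2f}, 2 sigma = {2 * s:.2f}")
```

Because sigma falls with the square root of the number of games, you need four times as many games to halve the error margin.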