Tell me what you can determine from 10 games between A and B. You can't, with any confidence at all, decide which is best. You can't, with any confidence, decide that A and B are at least reliable and can play long matches without one crashing. You can't, with any confidence, decide that neither A nor B will uncork an illegal move or fail to recognize an opponent's legal move in a long match.
krazyken wrote: A. You are changing your statement. I was responding to "10 games is worthless for determining anything"; now you are switching to "10 games _is_ absolutely worthless for identifying which is best", which is statistically a completely different question. The first is false; the second is frequently true, especially given the qualification you added to it.
bob wrote: Sorry, but if the programs are within 100 (or even 200) Elo of each other, 10 games _is_ absolutely worthless for identifying which is best. Absolutely worthless...
krazyken wrote: Data points are never worthless. You are going to need much more than a 1000 game match to declare that both players get a 10 in a row as probable. If the two players are equal, the chance of one of them getting a 10 in a row is far less than 1%. The chance of both is far lower. True, 10 games has a large confidence interval, but it is far from worthless.
bob wrote: My point was that 10 games is worthless for determining anything. In a 1000 game match, you can probably find 10 games in a row where each side wins. If you trust 10 game results, that's up to you. I know the inaccuracy this involves.
yanquis1972 wrote: the results he posted are quite a bit better than nothing, obviously. you can glance at them & get a fairly good idea of TK's strength (i'd guess about 3000+ CCRL elo). even if i'm off by a long shot, i'm going to be a lot closer than i would if we just had random results, because chess is not random chaos. in fact it's much farther removed from chance than almost any game i can think of.
anyway, it's anyone's choice how to use their hardware & software, & by looking at the results posted & combining them with mine i can see that naum is probably not some kind of special poison for TK, but that its performance was what should be expected.
BTW, as far as 10 in a row needing more than 1000 games? See "the birthday paradox". You don't even need 1000 to be reasonably sure...
B. Picking 2 people out of a group in the Birthday Paradox has nothing to do with the problem of finding a particular streak in a series. The formula for finding the probability of a streak of wins with independent trials is:
(N - x + 1)(p^x)
Where N is the number of trials, x is the length of the streak, and p is the probability of winning. Depending on what p is, I slightly overstated the case before: the probability of finding a streak of 10 wins in 1000 games is close to 1%, not far less. The probability of both a streak of 10 wins and a streak of 10 losses is still far lower, though.
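For anyone who wants to check the streak arithmetic, here is a minimal Python sketch. It evaluates the expression quoted above next to an exact calculation for independent games. The function names and the sample values of p are assumptions made for illustration; nobody in the thread states p. One caveat worth knowing: (N - x + 1)(p^x) counts the expected number of positions where a streak could start, so it is an upper bound on the probability and only tracks it closely when the result is small. With N = 1000 and x = 10 it gives about 0.017 for p = 1/3, but about 0.97 for p = 1/2, so the conclusion depends heavily on the p you assume.

```python
def streak_bound(n, x, p):
    """The expression quoted in the post: (N - x + 1) * p**x.

    This is an upper bound on the streak probability, not the exact value.
    """
    return (n - x + 1) * p ** x


def streak_probability(n, x, p):
    """Exact probability of at least one run of x straight wins in n games.

    Assumes independent games with a fixed win probability p.
    state[k] is the probability that no run of x has occurred yet and the
    current trailing run of wins has length k (0 <= k < x).
    """
    state = [0.0] * x
    state[0] = 1.0
    hit = 0.0  # probability that a run of x wins has already occurred
    for _ in range(n):
        new = [0.0] * x
        for k, prob in enumerate(state):
            new[0] += prob * (1.0 - p)      # a non-win resets the run
            if k + 1 == x:
                hit += prob * p             # the run reaches length x
            else:
                new[k + 1] += prob * p      # the run grows by one
        state = new
    return hit


if __name__ == "__main__":
    # p values are illustrative assumptions, not figures from the thread.
    for p in (1 / 3, 1 / 2):
        print(p, streak_bound(1000, 10, p), streak_probability(1000, 10, p))
```

Running it with other values of p shows how quickly the answer moves: the lower the assumed win rate (for example because of draws), the closer the quoted expression sits to the exact figure.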
So exactly _what_ can you conclude from 10 games?