Throwing out draws to calculate Elo
Moderators: hgm, Rebel, chrisw
-
- Posts: 741
- Joined: Tue May 22, 2007 11:13 am
Re: Throwing out draws to calculate Elo
Dann Corbit wrote: ↑Fri Jul 03, 2020 2:24 am
If the draws do not matter in understanding who is stronger, why does the Elo calculation get a totally wrong answer if you set the draws to zero?
I do understand we are looking for a razor turning point and not a magnitude. But I think it should be more obvious which is stronger if we know an engine is ten times stronger instead of .01% stronger. Or it should affect the size of our confidence interval.

Because Elo difference and LOS are two different questions. Elo difference requires taking draws into account. It just translates to the expected score between two players. 8 wins out of 1 million games translates to a tiny Elo difference. LOS, on the other hand, is merely concerned with which player is the strongest, regardless of how much stronger. With 8 wins and no losses, no matter the number of games, the probability is 1 in 256 that equal players would produce this result. This is just how the math works out.
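A rough numerical sketch of the two questions, not taken from any rating tool. It assumes the usual logistic Elo formula applied to the score fraction; the helper names `elo_diff` and `chance_of_sweep` and the draw counts are made up for illustration.

```python
import math

def elo_diff(wins, draws, losses):
    """Elo difference implied by the score fraction (logistic Elo model)."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Undefined for a 100% or 0% score, so keep at least some draws or losses.
    return -400.0 * math.log10(1.0 / score - 1.0)

def chance_of_sweep(wins):
    """Chance that equal engines hand all `wins` decisive games to one named side."""
    return 0.5 ** wins

# Same 8-0 in decisive games, with more and more draws added.
for draws in (92, 9_992, 999_992):
    print(draws, round(elo_diff(8, draws, 0), 4), chance_of_sweep(8))
```

The Elo figure shrinks toward zero as draws are added, while the 1 in 256 figure does not depend on the draws at all.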
-
- Posts: 27828
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Throwing out draws to calculate Elo
Dann Corbit wrote: ↑Fri Jul 03, 2020 12:01 am
If I grab an entry from the database, chosen at random, how close is it to 0.5?

Close enough to reject with 95% confidence the hypothesis that the engines must be unequal, in 95% of the cases.
Close enough to reject it with 99% confidence in 99% of the cases.
etc.
-
- Posts: 12542
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Throwing out draws to calculate Elo
Rein Halbersma wrote: ↑Fri Jul 03, 2020 9:25 am
Because Elo difference and LOS are two different questions. Elo difference requires taking draws into account. It just translates to the expected score between two players. 8 wins out of 1 million games translates to a tiny Elo difference. LOS, on the other hand, is merely concerned with which player is the strongest, regardless of how much stronger. With 8 wins and no losses, no matter the number of games, the probability is 1 in 256 that equal players would produce this result. This is just how the math works out.

Do you actually believe that, in a file of 8 million games with 8 wins and no losses, the 8 wins are not statistical noise?
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 27828
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Throwing out draws to calculate Elo
There is no place for belief in mathematics. I would know that there is only a 1 in 256 likelihood that random noise on equal engines would achieve that result.
But you probably also believe that 8-0 in an 8-game match without draws would just be statistical noise. So the problem you have is not really related to the draws. It is related to your lack of understanding of what is statistically significant, and what not.
-
- Posts: 741
- Joined: Tue May 22, 2007 11:13 am
Re: Throwing out draws to calculate Elo
Dann Corbit wrote: ↑Fri Jul 03, 2020 9:36 am
Do you actually believe that, in a file of 8 million games with 8 wins and no losses, the 8 wins are not statistical noise?

For Elo difference, yes, the 8 wins out of a million are tiny compared to the standard error on the mean, about 1 / sqrt(million) ~ 1 in a thousand. So Elo-wise, the engines are within a nose-length. But there's no question that the nose-length is almost certainly in favor of the engine with 8 wins.
It's counter-intuitive, I know, but that's the way the math works out.
-
- Posts: 27828
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Throwing out draws to calculate Elo
I could add that in the Elo calculation the ratings would be outside each other's error bars. So the Elo says exactly the same thing as the LOS.
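One way to check this by hand is a normal approximation on the per-game score (1, 0.5 or 0). The sketch below assumes the error bar is estimated from the sample variance of that score, which is not necessarily what any particular Elo calculator does; `score_stats` is a hypothetical helper name.

```python
import math

def score_stats(wins, draws, losses):
    """Return (mean score, standard error, excess over 0.5 in standard errors)."""
    games = wins + draws + losses
    mean = (wins + 0.5 * draws) / games
    # Sample variance of the per-game score: win = 1, draw = 0.5, loss = 0.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / games
    std_err = math.sqrt(var / games)
    return mean, std_err, (mean - 0.5) / std_err

mean, std_err, z = score_stats(8, 999_992, 0)
print(f"score = {mean:.6f}, standard error = {std_err:.2e}, z = {z:.2f}")
```

With almost every game drawn, the per-game variance is tiny, so the standard error shrinks along with the score excess. The lead stays at roughly sqrt(8), about 2.8 standard errors, however many draws there are.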
-
- Posts: 12542
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Throwing out draws to calculate Elo
Rein Halbersma wrote: ↑Fri Jul 03, 2020 9:56 am
For Elo difference, yes, the 8 wins out of a million are tiny compared to the standard error on the mean, about 1 / sqrt(million) ~ 1 in a thousand. So Elo-wise, the engines are within a nose-length. But there's no question that the nose-length is almost certainly in favor of the engine with 8 wins.
It's counter-intuitive, I know, but that's the way the math works out.

It isn't the math that bothers me, it is the model.
If I flipped a coin 8 times and then gave you an Elo figure you would tell me it is useless.
If I play a thousand games and you throw out all but 8 and give me an LOS figure, it not only seems as bad as the Elo calculation, it seems worse. Because the Elo calculation uses the draws to determine strength and the LOS simply ignores them. And the randomness for a thousand games gives a large expected deviation. But this too does not matter.
I guess maybe it is just my brain wrestling with what I feel is common sense. But at any rate, I cannot feel convinced until I understand why these things should be so. And the math is not convincing because I feel the model *must* be wrong. We are ignoring the majority of the data, and relying on a tiny cherry-picked sample and assuming that the error bars are as tiny as the sample.
I shudder to think of it.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 27828
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Throwing out draws to calculate Elo
Dann Corbit wrote: ↑Fri Jul 03, 2020 10:14 am
If I flipped a coin 8 times and then gave you an Elo figure you would tell me it is useless.

And this misconception of yours is really at the root of your troubles.
Because if the result of those flips were 8-0 we would of course not say at all that it was useless. It would be highly significant, and make it very likely (i.e., >99% confidence) that the coin was not fair. Or, in the case of engines, that there must be a sizable Elo difference. Put it in your Elo calculator if you don't believe it.
Of course if the result had been 5-3, then it would have likely just been statistical noise.
We discussed all this in connection with the WCCC result. And apparently you learned nothing from it!
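For anyone who wants to check the 8-0 versus 5-3 comparison, here is a minimal sketch of the one-sided binomial tail for a fair coin. The function name `tail_probability` is made up for illustration.

```python
from math import comb

def tail_probability(heads, flips):
    """P(at least `heads` heads in `flips` flips of a fair coin)."""
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips

print(tail_probability(8, 8))  # 1/256  = 0.00390625
print(tail_probability(5, 8))  # 93/256 = 0.36328125
```

An 8-0 result is something a fair coin produces less than 1% of the time, whereas 5-3 or better turns up in more than a third of all 8-flip runs.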
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Throwing out draws to calculate Elo
Dann Corbit wrote: ↑Fri Jul 03, 2020 2:12 am
Yes and no, in that you can call it 1/256, but the width of the error bars on the measurement should be much, much wider. In fact, so wide as to render the data meaningless.
So the number is the same. But the quality of the number is lower. Lower to the point of uselessness.

syzygy wrote: ↑Fri Jul 03, 2020 3:27 am
What do you mean, error bar? Is 1/256 somehow more reliable if there were N=10^100 draws than if there were N=3 draws? Do you have any idea what it is you are saying?

Dann Corbit wrote: ↑Fri Jul 03, 2020 3:40 am
It is not more reliable, it is less reliable.
The number itself, 1/256, is the same. Yes, I agree with you about that. But the data spread we expect to see when you have 10^100 draws is enormous. In fact, it is very shocking that we only have 8 wins. So shocking that it makes the data points look much more like sports.

It is only shocking if there is no explanation for the large number of draws. Most likely there is an explanation, such as one of the explanations that were given already in this thread.
Under the assumption that both engines are equally strong (or more precisely: that each game A versus B has probabilities p, (1-2p), p for win, draw, loss), the probability that the first 8 wins go to A is 1/256. This is independent of N. (But N does give an estimation of p.)
You seem to get confused when N is really big (the likely result of p being very small). But under the hypothesis that the outcomes of games between A and B are distributed w,d,l as p,(1-2p),p, for some unknown value of p, the first 8 wins going to A has the same implication on whether or not we should reject the hypothesis, independent of the number of draws N.
However, if you would fix p to be 0.1 in your hypothesis (which is then no longer just about A being as strong as B), then N being 10^100 obviously means that your hypothesis should be rejected.
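The independence from N can also be checked numerically. The sketch below assumes the win/draw/loss model p, (1-2p), p stated above and sums, over every possible number of draws k, the probability that A scores 8 wins before any loss occurs. The function name and the tolerance are assumptions made for illustration.

```python
def prob_first_wins_all_to_a(p, wins=8, tol=1e-15):
    """P(A collects `wins` wins before any loss), with P(win) = P(loss) = p per game.

    Sums C(k + wins - 1, wins - 1) * (1 - 2p)**k * p**wins over k, the number
    of draws played before A's final win.
    """
    q = 1.0 - 2.0 * p      # draw probability
    term = p ** wins       # k = 0: no draws at all
    total = 0.0
    k = 0
    while True:
        total += term
        # Consecutive binomial coefficients differ by a factor (k + wins) / (k + 1).
        next_term = term * (k + wins) / (k + 1) * q
        k += 1
        # Stop once the terms are decreasing and negligible.
        if next_term < term and next_term < tol * total:
            break
        term = next_term
    return total

for p in (0.5, 0.1, 0.01, 0.001):
    print(p, prob_first_wins_all_to_a(p))
```

Every value of p gives 1/256 up to rounding. The draw rate changes how many games it takes to reach 8 decisive results, not how those decisive results split between equal engines.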
-
- Posts: 12038
- Joined: Mon Jul 07, 2008 10:50 pm
Re: Throwing out draws to calculate Elo
Dann Corbit wrote: ↑Fri Jul 03, 2020 9:36 am
Do you actually believe that, in a file of 8 million games with 8 wins and no losses, the 8 wins are not statistical noise?

Not been following this properly, but in your opinion what is the maximum number of games that you can have for the 8 wins not to be statistical noise?