For high t-values (3.5 and above in the coin and chess cases), stopping rules can be safely applied, though not very efficiently, in both Bayesian and frequentist approaches. I will re-post, slightly modified, something I already posted here in the past:

AlvaroBegue wrote:
> I am not proposing to use the p-value as a stopping rule. If you were to use the t-value of the posterior distribution in your Bayesian setup as a stopping rule, that wouldn't fare too well either. This is not at all what we are talking about.

Laskos wrote:
> I don't know why you are so infatuated with the p-value and the frequentist approach. Yes, it gives scientific predictions. But a p-value stopping rule is more of an art, and is not even sound on theoretical grounds. The Type I error is unbounded, and for an infinite number of games it reaches 100%. The p-value (LOS with an uninformed prior) can clumsily and inefficiently be used as a stopping rule in our computer chess case, because the divergence is logarithmic, so over some range of data the Type I error can be controlled. Did you ever see what Type I error accumulates for a stopping rule at p-value 0.05 over 1000 games? I could dig up my older posts here.

AlvaroBegue wrote:
> H.G.'s criticism of Bayesian inference is very common among scientists and engineers: if we can't agree on what the prior distribution looks like, we cannot agree on the conclusions.
> I tend to favor Bayesian statistics more than most, but in this particular case Kai seems to be using the prior to make it look like we have more information than we really do.
> So we have a coin; we flipped it 6 times and it came down heads 5 times and tails once. How certain are we that the coin is not fair? Well, for a Bayesian analysis you need to know something about the origin of the coin, so that you can put some a priori distribution on the hidden parameter p, the true probability of getting heads. Unfortunately, in the real world you don't know where the coin came from. So the best we can do is try to quantify the evidence in the observations we have. For instance, you can compute how often you would expect a result as lopsided as the one you observed if the coin were fair. In this case, it's 11% of the time. That's easy to interpret and doesn't depend on a prior that we can't agree on. That's why it's valuable. Go LOS!
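As a quick check on the 11% figure in the quote, here is a minimal sketch (plain Python; the function name is mine) of the one-sided computation: the probability of getting at least 5 heads in 6 flips of a fair coin.

```python
from math import comb

def p_value_at_least(heads: int, flips: int) -> float:
    """One-sided binomial p-value: P(#heads >= heads) for a fair coin."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips

# 5 or more heads out of 6: (C(6,5) + C(6,6)) / 2**6 = 7/64
print(round(p_value_at_least(5, 6), 3))  # 0.109, i.e. about 11%
```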
================================================================================================
Suppose a man comes to you with a coin and says: "whenever heads comes up I win a dollar, whenever tails comes up you win a dollar." You believe the coin is fair and start the game. Your prior for the coin is the following (a density symmetric around p = 0.5; the plot is not reproduced here):
Based on that, you estimate the a priori LOS of the coin at 50.0%: the game is fair.
After 5 tosses the result has come out unfavorably: 5 heads, 0 tails.
Based on that, you estimate the LOS of the coin with this prior at 55.3%.
With an "uninformed" (uniform) prior, the LOS is about 98.4% (t-value about 2.1), and you are not yet able to stop.
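The 98.4% figure can be reproduced directly: with a uniform Beta(1,1) prior, 5 wins and 0 losses give a Beta(6,1) posterior, and LOS = P(p > 0.5) = 1 - 0.5^6 = 63/64. A sketch (function name is mine; it integrates the posterior numerically so it also works for cases without a simple closed form):

```python
def los_uniform_prior(wins: int, losses: int, steps: int = 200_000) -> float:
    """LOS = P(p > 0.5) under the Beta(wins+1, losses+1) posterior that a
    uniform prior gives, via midpoint integration of the unnormalized density."""
    dx = 1.0 / steps
    total = upper = 0.0
    for i in range(steps):
        p = (i + 0.5) * dx
        w = p ** wins * (1.0 - p) ** losses  # unnormalized Beta density
        total += w
        if p > 0.5:
            upper += w
    return upper / total

# 5-0 with a uniform prior: posterior Beta(6,1), LOS = 1 - 0.5**6
print(round(los_uniform_prior(5, 0), 4))  # 0.9844
```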
But after 5-0 you begin to suspect that something is wrong ("gut feeling"). You don't like how the man behaves, his facial expression. You see some anomalies in the density distribution of the coin. You decide to adopt another prior for the coin, one favoring heads, i.e. favoring the man proposing the game, who looks dubious by now after 5-0 (again, the plot is not reproduced here):
Based on the new prior, you re-interpret the LOS of the coin after the same 5-0 as before (no more tosses): the t-value is now about 3.7. The stop after 5 tosses is justified, and you come to the conclusion that the man is cheating you; the coin is not fair.
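The original heads-favoring prior plot is not reproduced above, so the Beta(8, 2) below is a purely hypothetical stand-in for such a prior (as are the function and variable names); the point is only how much the same 5-0 data moves the LOS once the prior already leans toward heads.

```python
def los_beta_prior(a: float, b: float, wins: int, losses: int,
                   steps: int = 200_000) -> float:
    """LOS = P(p > 0.5) under the Beta(a + wins, b + losses) posterior,
    via midpoint integration of the unnormalized density."""
    dx = 1.0 / steps
    total = upper = 0.0
    for i in range(steps):
        p = (i + 0.5) * dx
        w = p ** (a + wins - 1) * (1.0 - p) ** (b + losses - 1)
        total += w
        if p > 0.5:
            upper += w
    return upper / total

# Same 5-0 data under two priors:
print(round(los_beta_prior(1, 1, 5, 0), 4))  # 0.9844, uniform prior
print(round(los_beta_prior(8, 2, 5, 0), 4))  # 0.9991, heads-favoring prior
```

The illustrative heads-favoring prior pushes the LOS well past the uniform-prior value on identical data, which is the mechanism behind the earlier stop.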
With the "uninformed" prior, you would reach the same conclusion only after about 10 tosses.
The practical difference between the approaches in this case is that I lose 5 dollars and you lose 10. The Bayesian framework allows for the "gut feeling" of humans, and humans are often good at it. Humans are "holistic" in their approach: they look not only at 5-0 in a binomial, but they also know that charlatans exist, that a coin should have uniform shape and density, and how to interpret dubious facial expressions. The precise shape of the prior doesn't matter too much, nor does whether the t-value is 3 or 5. The Bayesian approach seems to me to favor "qualitative" plausibility and "reasoning" about the real world over precise quantities in some formalized framework. And in phenomenology that is often better than playing with a more rigorous but vague null hypothesis in a formalized, specialized domain.
================================================================================================
Maybe that was not such a good example; it shows the arbitrariness of stopping and of ad hoc decisions more than anything else. Say, in the Bayesian approach, if I were a stupid human and kept the initial prior (first plot), I would need some 150 tosses to reach a reasonable t-value for a stop. But I guess an average human is not that stupid, and I guess an average human takes both a holistic and a Bayesian approach to a plethora of issues daily.
There are no formal problems with the cosmological constant in the Einstein field equations. The problem appears when physicists try to make "some sense" of it in the real, phenomenologically rich physical world. Issues like "naturalness" and even the "anthropic principle" (derided in the past) come to prominence, and this is closely related to the Bayesian approach.
Anyway, let's return to chess engines:
Code:
Score of Stockfish 8 vs Stockfish 7: 385 - 82 - 533 [0.651] 1000
ELO difference: 108.68 +/- 14.51
Finished match
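The reported Elo difference follows from the standard logistic relation Elo = 400 · log10(s / (1 - s)), where s is the average score; a quick check against the match output above (function and variable names are mine):

```python
from math import log10

def elo_diff(score: float) -> float:
    """Elo difference implied by an average score, logistic model:
    Elo = 400 * log10(s / (1 - s))."""
    return 400.0 * log10(score / (1.0 - score))

# Average score from the match above: (385 wins + 533/2 draws) / 1000 games
s = (385 + 533 / 2) / 1000  # 0.6515
print(round(elo_diff(s), 1))  # 108.7, matching the reported 108.68
```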
Sorry for this long, rambling post.