Likelihood Of Success (LOS) in the real world
LOS as usually understood and computed is a mathematical fiction. It uses a uniform prior (density 1 over the [0,1] range of the score in Chess). Humans probably have uniform priors only at birth. In the likes of the Stockfish Testing Framework and other development frameworks, the unconsciously assumed priors are so strong that LOSp (LOS with a non-uniform prior) is completely different from LOS with a uniform prior. LOSp depends on the prior and on Draws, besides Wins and Losses; LOS depends only on Wins and Losses.
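For reference, here is a minimal Python sketch of the uniform-prior LOS. The model is my assumption, not the original poster's Mathematica code: games are independent with win/draw/loss probabilities w, d, l, and the prior is uniform over them. The posterior is then Dirichlet(W+1, D+1, L+1), so w/(w+l) follows Beta(W+1, L+1) whatever D is, and LOS = P(w > l) reduces to a binomial sum.

```python
from math import comb


def los_uniform(wins: int, draws: int, losses: int) -> float:
    """LOS = P(w > l | W, D, L) with a uniform prior on (w, d, l).

    Assumed model: the posterior is Dirichlet(W+1, D+1, L+1), so w / (w + l)
    is Beta(W+1, L+1) regardless of the draw count. For integer shapes,
    P(Beta(a, b) > 1/2) = P(Binomial(a + b - 1, 1/2) <= a - 1).
    """
    _ = draws  # intentionally unused: draws do not affect uniform-prior LOS
    n = wins + losses + 1
    return sum(comb(n, j) for j in range(wins + 1)) / 2 ** n


print(los_uniform(1, 0, 0))  # 0.75
print(los_uniform(1, 5, 0))  # 0.75 again: the draws drop out
```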
An unnormalized prior for the Stockfish Testing Framework might look a bit scary:
[score*(1-score)]**1000
As scary as it seems, it assumes that the Elo differences between development versions are no larger than about 15 Elo points, which is a reasonable assumption for the Framework. LOS and LOSp for W=1, D=0, L=0 look as follows:
LOS = 0.75
LOSp = 0.517
One Win gives almost no information in the real Stockfish world. Suppose we now also have 5 consecutive Draws: W=1, D=5, L=0:
LOS = 0.75 again (independent of Draws)
LOSp = 0.553
The 5 Draws gave more information on LOSp than the 1 Win did: the Win moved it from 0.5 to 0.517, the Draws (with that Win already counted) from 0.517 to 0.553.
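A rough Python sketch of how LOSp can be computed numerically. It is not the Mathematica code mentioned in this post; the model is my assumption: independent games with probabilities w, d, l, a prior density proportional to [s*(1-s)]**k with s = w + d/2 (flat otherwise over w + d + l = 1), and LOSp = P(w > l | W, D, L) obtained with a plain midpoint rule. The function name los_prior and the grid sizes are illustrative choices.

```python
import math


def los_prior(wins: int, draws: int, losses: int, k: float,
              nx: int = 1000, ny: int = 100) -> float:
    """LOSp = P(w > l | W, D, L) with prior density ~ [s(1-s)]**k, s = w + d/2.

    Assumed model: independent games with probabilities w, d, l, and a prior
    that depends on (w, d, l) only through the score s. The coordinates
    x = w - l in [-1, 1] and y = w + l in [|x|, 1] cover the simplex with a
    constant Jacobian, so a midpoint rule on an (x, y) grid is enough.
    Since s(1-s) = (1 - x*x) / 4, the prior is a function of x alone.
    Meant for small game counts: the likelihood is not kept in log space.
    """
    hx = 2.0 / nx  # with an even nx, x = 0 falls on a cell boundary
    above = 0.0    # posterior mass with w > l, i.e. x > 0
    total = 0.0
    for i in range(nx):
        x = -1.0 + (i + 0.5) * hx
        prior = math.exp(k * math.log1p(-x * x))  # underflows to 0.0 in the tails
        if prior == 0.0:
            continue
        a = abs(x)
        hy = (1.0 - a) / ny
        inner = 0.0
        for j in range(ny):
            y = a + (j + 0.5) * hy
            w, l, d = (x + y) / 2.0, (y - x) / 2.0, 1.0 - y
            inner += w ** wins * d ** draws * l ** losses
        mass = prior * inner * hy
        total += mass
        if x > 0.0:
            above += mass
    return above / total


for k in (1000, 2):
    for wins, draws, losses in ((1, 0, 0), (1, 5, 0)):
        print(f"k={k} W={wins} D={draws} L={losses} "
              f"LOSp={los_prior(wins, draws, losses, k):.3f}")
```

Under these assumptions it should print values close to the 0.517, 0.553, 0.698 and 0.739 quoted in this post. Working in x = w - l and y = w + l keeps the prior one-dimensional and puts the w > l boundary on a grid line.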
Rating groups use more liberal Elo differences in direct matches, up to, say, 400 Elo points. A suitable prior is [score*(1-score)]**2. In this case the difference is less accentuated, but still visible in the W=1, D=0, L=0 case:
LOS = 0.75
LOSp = 0.698
With 5 Draws added, LOSp becomes 0.739, closer to 0.75 of the uniform prior.
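To relate the exponents 1000 and 2 to the 15 and 400 Elo figures, a small Python sketch. Reading "width" as about two standard deviations of the prior around s = 0.5 is my assumption, and so is the logistic Elo formula Elo = -400*log10(1/s - 1). With x = 2s - 1, the prior (1 - x*x)**k is a rescaled Beta(k+1, k+1), so Var(x) = 1/(2k + 3) exactly.

```python
import math


def elo_from_score(s: float) -> float:
    """Logistic Elo difference for an expected score s (assumed formula)."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def prior_spread_elo(k: float, n_sigma: float = 2.0) -> float:
    """Elo difference n_sigma standard deviations above s = 0.5 under [s(1-s)]**k.

    Assumption: "width" means a few standard deviations of the prior.
    With x = 2s - 1 the prior is (1 - x*x)**k, whose variance is 1 / (2k + 3).
    """
    sigma_s = 0.5 / math.sqrt(2.0 * k + 3.0)  # standard deviation of s
    return elo_from_score(0.5 + n_sigma * sigma_s)


for k in (1000, 2):
    print(k, round(prior_spread_elo(k, 1.0), 1), round(prior_spread_elo(k, 2.0), 1))
```

The two-sigma spread comes out at roughly 15.5 Elo for k = 1000 and roughly 340 Elo for k = 2, consistent with the figures above.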
Many people still use LOS as an empirical stopping rule; additional care must be taken, especially when one feels or knows the engines are very close in strength.
My computations were done in Mathematica for a general prior and general W, D, L; here I have just given some example results. I could post the code, but it's not very illuminating.

 Posts: 925
 Joined: Tue Mar 09, 2010 2:46 pm
 Location: New York
 Full name: Álvaro Begué (RuyDos)
Re: Likelihood Of Success (LOS) in the real world
I don't think of LOS in any Bayesian framework: LOS is what in other fields is called a p-value. It's a quantity that, under the null hypothesis that both players are equally strong (i.e., if the true Elo difference is 0), would be uniformly distributed in [0,1].
See https://en.wikipedia.org/wiki/Pvalue .
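A small Python sketch of that property, with assumptions of mine: a fixed number of decisive games (no draws), equal strength, and LOS from the usual uniform-prior formula. Because LOS takes only finitely many values it is only approximately uniform, but its mean under the null is exactly 1/2 and P(LOS > c) should come out near 1 - c.

```python
from math import comb


def los_uniform(wins: int, losses: int) -> float:
    """Uniform-prior LOS, P(Beta(W+1, L+1) > 1/2), as an exact binomial sum."""
    n = wins + losses + 1
    return sum(comb(n, j) for j in range(wins + 1)) / 2 ** n


def los_tail_under_null(n_games: int, threshold: float) -> float:
    """Exact P(LOS > threshold) when both engines are equally strong.

    Assumes n_games decisive games, each won by either side with probability 1/2.
    """
    hits = sum(comb(n_games, w) for w in range(n_games + 1)
               if los_uniform(w, n_games - w) > threshold)
    return hits / 2 ** n_games


def los_mean_under_null(n_games: int) -> float:
    """Exact expected LOS under the same null hypothesis."""
    return sum(comb(n_games, w) * los_uniform(w, n_games - w)
               for w in range(n_games + 1)) / 2 ** n_games


n = 200
print("mean LOS:", los_mean_under_null(n))
for c in (0.9, 0.95, 0.99):
    print(f"P(LOS > {c}) = {los_tail_under_null(n, c):.4f}")
```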
Re: Likelihood Of Success (LOS) in the real world
AlvaroBegue wrote: "I don't think of LOS in any Bayesian framework: LOS is what in other fields is called a p-value. It's a quantity that, under the null hypothesis that both players are equally strong (i.e., if the true Elo difference is 0), would be uniformly distributed in [0,1]. See https://en.wikipedia.org/wiki/Pvalue ."
LOS as a p-value is indeed defined with a uniform prior. So, a pretty bad quantity to use for us.

Re: Likelihood Of Success (LOS) in the real world
Laskos wrote: "LOS as a p-value is indeed defined with a uniform prior. So, a pretty bad quantity to use for us."
I don't understand what you are saying. LOS as a p-value doesn't use a prior at all, uniform or otherwise. It is a measure of statistical significance of a departure from the null hypothesis. There is really nothing wrong with it.
Re: Likelihood Of Success (LOS) in the real world
AlvaroBegue wrote: "I don't understand what you are saying. LOS as a p-value doesn't use a prior at all, uniform or otherwise. It is a measure of statistical significance of a departure from the null hypothesis. There is really nothing wrong with it."
Bayes' formula to derive LOS uses the uniform prior (which gives that nice closed form and the erf approximation). I used the same Bayes' formula to derive LOSp (non-uniform prior) with numerical results.
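For comparison, a Python sketch of the exact uniform-prior closed form next to the erf approximation usually quoted for LOS, 0.5*(1 + erf((W - L)/sqrt(2*(W + L)))). The function names are mine; the approximation replaces the Beta(W+1, L+1) posterior of p = w/(w+l) by a normal with standard deviation 1/(2*sqrt(W + L)), its value near p = 1/2.

```python
from math import comb, erf, sqrt


def los_exact(wins: int, losses: int) -> float:
    """Uniform-prior LOS: P(Beta(W+1, L+1) > 1/2) via a binomial sum."""
    n = wins + losses + 1
    return sum(comb(n, j) for j in range(wins + 1)) / 2 ** n


def los_erf(wins: int, losses: int) -> float:
    """Normal approximation: posterior of p treated as normal, centred at
    W/(W+L) with standard deviation 1/(2*sqrt(W+L)). Requires W + L > 0."""
    return 0.5 * (1.0 + erf((wins - losses) / sqrt(2.0 * (wins + losses))))


for wins, losses in ((1, 0), (6, 4), (60, 40), (600, 400)):
    print(wins, losses, round(los_exact(wins, losses), 4), round(los_erf(wins, losses), 4))
```

At W=1, L=0 the two differ noticeably (0.75 versus about 0.84); at 60:40 they agree to within about 0.001.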

Re: Likelihood Of Success (LOS) in the real world
Laskos wrote: "Bayes' formula to derive LOS uses the uniform prior (which gives that nice closed form and the erf approximation). I used the same Bayes' formula to derive LOSp (non-uniform prior) with numerical results."
Let me see if I understand what you are saying. We can consider one "heat" to be a small match between engine 1 and engine 2 where we continue playing games until we get a result that is not a draw. There is a true probability of engine 1 winning the heat, and we call it p.
We could discover something about p by using Bayesian statistics, where we start with some prior, we observe some results of heats and we then get a posterior probability. We might be interested in answering questions like "what is the probability that p is larger than 0.5?".
If we use a uniform prior, that probability is the LOS as it's usually defined. If we use a different prior (I think you suggest a Beta(1001,1001) distribution), we'll get an alternative definition (which will look a lot like assuming an initial tally of 1000 wins and 1000 losses).
Are we together so far?
What I am saying is that you can define LOS as a p-value of the results, which is a test of the plausibility of the null hypothesis. This is a frequentist approach to the problem, and not a Bayesian one. This is how I think of the meaning of LOS, and nothing else. It's still a very useful number, but it needs to be interpreted carefully, just like any p-value.
CPW wrote: "The likelihood of superiority (LOS) denotes the probability of a certain engine being stronger than another."
WHAAAT??
Now I see what your beef is about! Your Bayesian interpretation of LOS with a uniform prior would give some meaning to that sentence, but assuming a uniform prior is unreasonable. The other possibility is that whoever wrote that is being tripped up by a very common misunderstanding of p-values. So common, in fact, that it has its own Wikipedia page: https://en.wikipedia.org/wiki/Misunders ... f_pvalues
Re: Likelihood Of Success (LOS) in the real world
AlvaroBegue wrote: "CPW wrote: 'The likelihood of superiority (LOS) denotes the probability of a certain engine being stronger than another.' WHAAAT??"
Can any of you give a non-mathematical, plain-language sentence of what LOS is? Thanks!
Daniel José  http://www.andscacs.com

 Posts: 1327
 Joined: Sun Jul 17, 2011 9:14 am
Re: Likelihood Of Success (LOS) in the real world
cdani wrote: "Can any of you give a non-mathematical, plain-language sentence of what LOS is? Thanks!"
I'm going to say this as I understand it; if I'm wrong then we've both learned something.
Let's say you have a match between engine A and engine B. The Likelihood of superiority is the probability that A will win the match. If A is clearly stronger (wins more games), then the LOS will increase to a limit of 1. If A is clearly weaker (loses more games), the LOS will decrease to a limit of 0. If the two are equally strong (games are about equal), the LOS will be around the 0.5 mark.
Some people then use LOS > 0.99 or whatever to conclude A is stronger and LOS < 0.01 to conclude B is stronger.
The mathematicians among us say this is a bad idea and you should use the SPRT instead, like Stockfish does.
Some believe in the almighty dollar.
I believe in the almighty printf statement.
Re: Likelihood Of Success (LOS) in the real world
cdani wrote: "Can any of you give a non-mathematical, plain-language sentence of what LOS is? Thanks!"
In the Bayesian approach, P(w>l | W,D,L) is indeed the probability that w>l given a prior, and our usual LOS is the probability of w>l with a usually wrong, uniform prior. In the frequentist approach, LOS gives the plausibility of the null hypothesis: LOS of 50% gives 100% plausibility, LOS of 100% gives 0% plausibility. It gives no information on the probability of w>l, as it just tests the null hypothesis (p-value). So, if you want a (posterior) probability, use a reasonable prior and Bayes' formula to get LOSp, as I have shown in the OP.
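One way to make the "plausibility" reading concrete (an illustration of mine, not something stated in the thread): under the normal approximation LOS ≈ Φ(z), the usual two-sided p-value for the null hypothesis of equal strength is 2*min(LOS, 1 - LOS). It is 1 at LOS = 50% and 0 at LOS = 100%, matching the two end points above.

```python
def two_sided_p_value(los: float) -> float:
    """Two-sided p-value implied by a LOS value (assumed normal approximation).

    If LOS = Phi(z) for the standardized result z, the two-sided p-value
    2 * (1 - Phi(|z|)) equals 2 * min(LOS, 1 - LOS).
    """
    if not 0.0 <= los <= 1.0:
        raise ValueError("LOS must lie in [0, 1]")
    return 2.0 * min(los, 1.0 - los)


for los in (0.5, 0.9, 0.975, 0.99, 1.0):
    print(los, two_sided_p_value(los))
```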
Re: Likelihood Of Success (LOS) in the real world
There is some universality of LOSp with respect to the chosen prior: its width is all that matters. The priors (s*(1-s))**1000 and exp(-(s-0.5)**2 * 2500), both with widths of about 15 Elo points of difference, give similar results for LOSp, and results very different from the naive, uniform-prior LOS.
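A Python sketch for checking this kind of universality. The assumptions are mine, not the original poster's code: independent games with probabilities w, d, l, a prior given by its log-density in the score s = w + d/2 and flat otherwise on the simplex, and a midpoint rule over x = w - l, y = w + l. The helper name losp and the grid sizes are illustrative.

```python
import math


def losp(wins, draws, losses, log_prior, nx=1000, ny=100):
    """P(w > l | W, D, L) for a prior given by its log-density in the score s.

    Assumed model: games with probabilities w, d, l; prior density on the
    simplex proportional to exp(log_prior(s)) with s = w + d/2. Midpoint
    rule over x = w - l in [-1, 1] and y = w + l in [|x|, 1].
    """
    hx = 2.0 / nx
    xs = [-1.0 + (i + 0.5) * hx for i in range(nx)]
    log_p = [log_prior((1.0 + x) / 2.0) for x in xs]
    peak = max(log_p)  # subtract the maximum so exp() does not underflow everywhere
    above = total = 0.0
    for x, lp in zip(xs, log_p):
        prior = math.exp(lp - peak)
        if prior == 0.0:
            continue
        a = abs(x)
        hy = (1.0 - a) / ny
        inner = 0.0
        for j in range(ny):
            y = a + (j + 0.5) * hy
            w, l, d = (x + y) / 2.0, (y - x) / 2.0, 1.0 - y
            inner += w ** wins * d ** draws * l ** losses
        mass = prior * inner * hy
        total += mass
        if x > 0.0:
            above += mass
    return above / total


PRIORS = {
    "uniform": lambda s: 0.0,
    "(s*(1-s))**1000": lambda s: 1000.0 * math.log(s * (1.0 - s)),
    "exp(-(s-0.5)**2 * 2500)": lambda s: -2500.0 * (s - 0.5) ** 2,
}

for name, log_prior in PRIORS.items():
    print(f"{name:>24}: W=1,D=0,L=0 -> {losp(1, 0, 0, log_prior):.3f}, "
          f"W=1,D=5,L=0 -> {losp(1, 5, 0, log_prior):.3f}")
```

In this setup the two narrow priors give LOSp values close to each other and far from the 0.75 that the uniform prior gives for both W=1, D=0, L=0 and W=1, D=5, L=0.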