## Likelihood Of Success (LOS) in the real world

Posts: 10823
Joined: Wed Jul 26, 2006 8:21 pm

### Likelihood Of Success (LOS) in the real world

LOS as usually understood and computed is a mathematical fiction. It uses a uniform prior (a density of 1 over the score interval [0,1] in chess). Humans probably have uniform priors only at birth. In the Stockfish Testing Framework and similar development frameworks, the unconsciously assumed priors are so strong that LOSp (LOS with a non-uniform prior) differs substantially from LOS with a uniform prior. LOSp depends on the prior and on Draws, as well as on Wins and Losses. LOS depends only on Wins and Losses.

An unnormalized prior for the Stockfish Testing Framework might look a bit scary:

[score*(1-score)]**1000

As scary as it seems, it assumes that the Elo differences between development versions are no larger than 15 Elo points, which is a reasonable assumption for the Framework. For W=1, D=0, L=0, LOS and LOSp are as follows:

LOS = 0.75
LOSp = 0.517

One Win gives almost no information in the real Stockfish world. Suppose we now have 5 consecutive Draws as well: W=1, D=5, L=0:

LOS = 0.75 again (independent of Draws)
LOSp = 0.553

Given that Win, the 5 Draws moved LOSp more than the Win itself did.

Rating groups allow for larger Elo differences in direct matches, up to, say, 400 Elo points. A suitable prior is [score*(1-score)]**2. In this case the difference is less pronounced, but still visible in the W=1, D=0, L=0 case:

LOS = 0.75
LOSp = 0.698

With 5 Draws added, LOSp becomes 0.739, closer to 0.75 of the uniform prior.

Many people still use LOS as an empirical stopping rule. Additional care must be taken with it, especially when one feels or knows that the engines are very close in strength.

My computations were done in Mathematica for a general prior and general W, D, L; I have just shown some example results here. I could post the code, but it's not very illuminating.
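The Mathematica code was not posted, so the following Python sketch is only a guess at the model behind the figures above. It assumes a trinomial likelihood w^W * d^D * l^L, a flat measure over the (w, d, l) simplex multiplied by the unnormalized prior [score*(1-score)]**k with score = w + d/2, and it takes LOSp to be the posterior probability that w > l. The name `losp` and the grid sizes are illustrative choices, not anything from the original computation.

```python
def losp(wins, draws, losses, k=0, n_delta=1000, n_d=100):
    """Posterior P(w > l | W, D, L) by midpoint integration (assumed model).

    Prior (unnormalized): [score * (1 - score)] ** k, score = w + d / 2.
    k = 0 is the uniform prior, i.e. the usual LOS.
    """
    h = 2.0 / n_delta                       # n_delta is even, so delta = 0 is a cell edge
    above = total = 0.0
    for i in range(n_delta):
        delta = -1.0 + (i + 0.5) * h        # delta = w - l = 2 * score - 1
        # score * (1 - score) = (1 - delta**2) / 4; the constant 1/4 cancels in the ratio
        prior = (1.0 - delta * delta) ** k
        if prior == 0.0:                    # underflow far from delta = 0 for large k
            continue
        m = 1.0 - abs(delta)                # the draw rate d ranges over (0, m)
        hd = m / n_d
        acc = 0.0
        for j in range(n_d):
            d = (j + 0.5) * hd
            w = 0.5 * (1.0 - d + delta)
            l = 0.5 * (1.0 - d - delta)
            acc += w ** wins * d ** draws * l ** losses
        mass = prior * acc * hd
        total += mass
        if delta > 0.0:
            above += mass
    return above / total


if __name__ == "__main__":
    for k in (0, 1000, 2):
        for wdl in ((1, 0, 0), (1, 5, 0)):
            print(f"k={k:4d} W,D,L={wdl} LOSp={losp(*wdl, k=k):.3f}")
```

Under these assumptions the sketch returns values close to the ones quoted above (0.75 for k=0, about 0.517 and 0.553 for k=1000, about 0.698 and 0.739 for k=2), which suggests, but does not prove, that the model matches the original computation.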

AlvaroBegue
Posts: 925
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

### Re: Likelihood Of Success (LOS) in the real world

I don't think of LOS in any Bayesian framework: LOS is what in other fields is called a p-value. It's a quantity that, under the null hypothesis that both players are equally strong (i.e., if the true Elo difference is 0), would be uniformly distributed in [0,1].

See https://en.wikipedia.org/wiki/P-value .

Posts: 10823
Joined: Wed Jul 26, 2006 8:21 pm

### Re: Likelihood Of Success (LOS) in the real world

> AlvaroBegue wrote: I don't think of LOS in any Bayesian framework: LOS is what in other fields is called a p-value. It's a quantity that, under the null hypothesis that both players are equally strong (i.e., if the true Elo difference is 0), would be uniformly distributed in [0,1].
>
> See https://en.wikipedia.org/wiki/P-value .

LOS as p-value is indeed defined with a uniform prior. So, a pretty bad quantity to use for us.

AlvaroBegue
Posts: 925
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

### Re: Likelihood Of Success (LOS) in the real world

> LOS as p-value is indeed defined with a uniform prior. So, a pretty bad quantity to use for us.

I don't understand what you are saying. LOS as a p-value doesn't use a prior at all, uniform or otherwise. It is a measure of statistical significance of a departure from the null hypothesis. There is really nothing wrong with it.

Posts: 10823
Joined: Wed Jul 26, 2006 8:21 pm

### Re: Likelihood Of Success (LOS) in the real world

> AlvaroBegue wrote: I don't understand what you are saying. LOS as a p-value doesn't use a prior at all, uniform or otherwise. It is a measure of statistical significance of a departure from the null hypothesis. There is really nothing wrong with it.

Bayes' formula, as used to derive LOS, takes the uniform prior (which is what gives that nice closed form and the erf approximation). I used the same Bayes' formula to derive LOSp (non-uniform prior), with numerical results.
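For reference, here is a small Python sketch of the two uniform-prior quantities mentioned here. It assumes the usual reading of LOS as P(p > 0.5) under a Beta(W+1, L+1) posterior, where p is the win probability among decisive games, and it uses the erf approximation in the form commonly quoted for LOS. The function names are made up for the illustration.

```python
from math import comb, erf, sqrt


def los_exact(wins, losses):
    """P(p > 0.5) for a Beta(wins + 1, losses + 1) posterior (uniform prior).

    For integer counts this tail equals a binomial sum:
    P(Binomial(wins + losses + 1, 1/2) <= wins).
    """
    n = wins + losses + 1
    return sum(comb(n, j) for j in range(wins + 1)) / 2 ** n


def los_erf(wins, losses):
    """Normal (erf) approximation to the same quantity."""
    if wins + losses == 0:
        return 0.5                      # no decisive games, no information
    return 0.5 * (1.0 + erf((wins - losses) / sqrt(2.0 * (wins + losses))))


if __name__ == "__main__":
    for w, l in ((1, 0), (10, 5), (60, 40)):
        print(w, l, round(los_exact(w, l), 4), round(los_erf(w, l), 4))
```

For W=1, L=0 the exact value is 0.75, while the erf form gives roughly 0.84, so the approximation should only be trusted once there are a reasonable number of decisive games. Draws do not enter either formula.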

AlvaroBegue
Posts: 925
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

### Re: Likelihood Of Success (LOS) in the real world

> Bayes' formula to derive LOS uses the uniform prior (and thus giving that nice closed form and erf approximation). I used the same Bayes' formula to derive LOSp (non-uniform prior) with numerical results.

Let me see if I understand what you are saying. We can consider one "heat" to be a small match between engine 1 and engine 2 where we continue playing games until we get a result that is not a draw. There is a true probability of engine 1 winning the heat, and we call it p.

We could discover something about p by using Bayesian statistics, where we start with some prior, we observe some results of heats and we then get a posterior probability. We might be interested in answering questions like "what is the probability that p is larger than 0.5?".

If we use a uniform prior, that probability is the LOS as it's usually defined. If we use a different prior (I think you suggest a Beta(1001,1001) distribution), we'll get an alternative definition (which will look a lot like assuming an initial tally of 1000 wins and 1000 losses).
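A minimal Python sketch of this heat model, assuming integer Beta parameters so that the posterior tail can be written as an exact binomial sum. The function name `heat_los` and its parameters are hypothetical; `a = b = 1` is the uniform prior and `a = b = 1001` is the Beta(1001,1001) prior mentioned above.

```python
from math import comb


def heat_los(wins, losses, a=1, b=1):
    """P(p > 0.5) under a Beta(wins + a, losses + b) posterior.

    a and b are positive integers acting as prior pseudo-counts
    (a - 1 imaginary wins and b - 1 imaginary losses).
    """
    A = wins + a
    B = losses + b
    n = A + B - 1
    # For integer A and B: P(p > 1/2) = P(Binomial(n, 1/2) <= A - 1)
    return sum(comb(n, j) for j in range(A)) / 2 ** n


if __name__ == "__main__":
    print(heat_los(1, 0))               # uniform prior
    print(heat_los(1, 0, 1001, 1001))   # Beta(1001, 1001) prior
```

With one won heat, the uniform prior gives 0.75 and the Beta(1001,1001) prior gives about 0.509. Because heats discard draws, this is not the same model as a prior on the score with draws counted, so it need not reproduce the LOSp figures from the opening post.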

Are we together so far?

What I am saying is that you can define LOS as a p-value of the results, which is a test of the plausibility of the null hypothesis. This is a frequentist approach to the problem, and not a Bayesian one. This is how I think of the meaning of LOS, and nothing else. It's still a very useful number, but it needs to be interpreted carefully, just like any p-value.

> CPW wrote: The likelihood of superiority (LOS) denotes the probability of a certain engine being stronger than another.

WHAAAT??

Now I see what your beef is about! Your Bayesian interpretation of LOS with a uniform prior would give some meaning to that sentence, but assuming a uniform prior is unreasonable. The other possibility is that whoever wrote that is being tripped up by a very common misunderstanding of p-values. So common, in fact, that it has its own Wikipedia page: https://en.wikipedia.org/wiki/Misunders ... f_p-values

cdani
Posts: 2175
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra
Contact:

### Re: Likelihood Of Success (LOS) in the real world

> AlvaroBegue wrote:
>
> > CPW wrote: The likelihood of superiority (LOS) denotes the probability of a certain engine being stronger than another.
>
> WHAAAT??

Can any of you give a non-mathematical, plain-language sentence explaining what LOS is? Thanks!

ZirconiumX
Posts: 1327
Joined: Sun Jul 17, 2011 9:14 am

### Re: Likelihood Of Success (LOS) in the real world

> cdani wrote: Can any of you give a non mathematical plain language sentence of what is LOS? Thanks!

I'm going to say this as I understand it; if I'm wrong then we've both learned something.

Let's say you have a match between engine A and engine B. The Likelihood of superiority is the probability that A will win the match. If A is clearly stronger (wins more games), then the LOS will increase to a limit of 1. If A is clearly weaker (loses more games), the LOS will decrease to a limit of 0. If the two are equally strong (games are about equal), the LOS will be around the 0.5 mark.

Some people then use LOS > 0.99 or whatever to conclude A is stronger and LOS < 0.01 to conclude B is stronger.

The mathematicians among us say this is a bad idea and you should use the SPRT instead, like Stockfish does.

Posts: 10823
Joined: Wed Jul 26, 2006 8:21 pm

### Re: Likelihood Of Success (LOS) in the real world

> cdani wrote: Can any of you give a non mathematical plain language sentence of what is LOS? Thanks!

In the Bayesian approach, P(w>l | W,D,L) is indeed the probability that w>l under a given prior, and our usual LOS is that probability under a uniform prior, which is usually the wrong one. In the frequentist approach, LOS measures the plausibility of the null hypothesis: a LOS of 50% means 100% plausibility, and a LOS of 100% means 0% plausibility. It gives no information on the probability that w>l, because it only tests the null hypothesis (p-value). So, if you want a (posterior) probability, choose a reasonable prior and use Bayes' formula to get LOSp, as I showed in the OP.