Explain like I'm five: LOS formula

ZirconiumX · Post by **ZirconiumX** » Tue Apr 02, 2013 4:09 pm

I was testing Firenzina earlier today, when out of curiosity, I opened a minefield.

I asked on nTCEC how to calculate LOS, which resulted in a lot of answers, confusing me quite a bit.

Some people said you needed to include draws, some said discard them.
Some said use binomials, some said use a gaussian distribution.

I put one formula in, and got a LOS of 147%(!)

Is there a single formula that will work on give or take any calculator?

Matthew:out

Ajedrecista · Post by **Ajedrecista** » Tue Apr 02, 2013 5:09 pm

Hello Matthew:

ZirconiumX wrote:I was testing Firenzina earlier today, when out of curiosity, I opened a minefield.

I asked on nTCEC how to calculate LOS, which resulted in a lot of answers, confusing me quite a bit.

Some people said you needed to include draws, some said discard them.
Some said use binomials, some said use a gaussian distribution.

I put one formula in, and got a LOS of 147%(!)

Is there a single formula that will work on give or take any calculator?

Matthew:out

I assume a normal distribution, using the mean and the sample standard deviation of the results. My programme LOS_and_Elo_uncertainties_calculator (download link in my signature) does it this way. This one-sided test is intuitive for me. Ed Schroder kindly hosts here some calculations that I did... of course, take them with a lot of care.
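For readers who want to see the shape of such a calculation, here is a minimal sketch of a one-sided normal-approximation test built from the mean and sample standard deviation. It is my reading of the description above, not the code of the programme mentioned; the function name `los_normal` and the scoring of each game as 1, 0.5 or 0 are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

// One-sided normal-approximation LOS from a win/draw/loss record.
// Assumption: each game scores 1, 0.5 or 0 for the first engine.
double los_normal(int wins, int draws, int losses) {
  const int n = wins + draws + losses;
  assert(n > 1);  // the sample variance needs at least two games
  const double mean = (wins + 0.5 * draws) / n;
  // Sample variance of the per-game score (divides by n - 1).
  const double var = (wins * (1.0 - mean) * (1.0 - mean) +
                      draws * (0.5 - mean) * (0.5 - mean) +
                      losses * (0.0 - mean) * (0.0 - mean)) / (n - 1);
  assert(var > 0.0);  // identical results in every game give no spread
  const double std_error = std::sqrt(var / n);  // standard error of the mean
  const double z = (mean - 0.5) / std_error;
  return 0.5 * (1.0 + std::erf(z / std::sqrt(2.0)));  // standard normal CDF
}
```

Calling `los_normal(wins, draws, losses)` returns a value between 0 and 1; an even record gives exactly 0.5, and swapping wins and losses gives the complement.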

Another formula by Rémi Coulom is seen in the last equation of this post. These two methods give very similar results when the number of games played is high (some thousands of games).

I did a quick search on TalkChess and these are some long and interesting threads about LOS:

Likelihood of superiority

LOS calculation: Does the same result is always the same?

LOS (again)

Fast LOS estimation

There is a recent, interesting thread on Open Chess forum about LOS:

LOS

You might see some contradictions, bearing in mind that there are so many posts. One thing is sure:

ZirconiumX wrote:I put one formula in, and got a LOS of 147%(!)

0 < LOS < 1; the formula you applied (or the data you fed it) is clearly wrong.

Sorry for not explaining much about LOS, but I am not an expert in spite of my numerous posts on this subject. I hope that all the links will be useful for you.

Regards from Spain.

Ajedrecista.

Steve Maughan · Post by **Steve Maughan** » Tue Apr 02, 2013 5:45 pm

Hi Matthew,

Think of it this way. Given a particular score (e.g. 60 Wins, 20 Draws, 20 Losses = 70% of 100 games), what is the chance that if you tossed a coin you'd get 70% heads or better out of 100 goes? This is the "LOS" (with a binomial distribution).
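The coin-toss picture can be computed exactly. Below is a small sketch, assuming a fair coin and treating the 70% score as 70 heads out of 100; the function name `fair_coin_upper_tail` is made up for illustration. Note that this tail probability is itself a small number when the score is well above 50%; the figure near 1 that other posts in the thread report corresponds to its complement.

```cpp
#include <cassert>
#include <cmath>

// P(X >= k) for X ~ Binomial(n, 1/2): the chance of k or more heads
// in n tosses of a fair coin.
double fair_coin_upper_tail(int n, int k) {
  assert(n >= 0 && k >= 0);
  double total = 0.0;
  for (int j = k; j <= n; ++j) {
    // log of C(n, j) / 2^n, computed with lgamma to avoid overflow for large n
    const double log_term = std::lgamma(n + 1.0) - std::lgamma(j + 1.0) -
                            std::lgamma(n - j + 1.0) - n * std::log(2.0);
    total += std::exp(log_term);
  }
  return total;
}
```

For the example in the post one would call `fair_coin_upper_tail(100, 70)`.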

Hope that helps,

Steve

AlvaroBegue · Post by **AlvaroBegue** » Tue Apr 02, 2013 7:50 pm

This is what I use:


#include <cstdio>
#include <cstdlib>
#include <cmath>

int main(int argc, char **argv) {
  if (argc != 4) {
    std::printf("Wrong number of arguments.\n\nUsage:%s <wins> <losses> <draws>\n", argv[0]);
    return 1;
  }
  int wins = std::atoi(argv[1]);
  int losses = std::atoi(argv[2]);
  int draws = std::atoi(argv[3]);

  double winning_fraction = (wins + 0.5*draws) / (wins + losses + draws);
  std::printf("Winning fraction: %g\n", winning_fraction);
  double elo_difference = -std::log(1.0/winning_fraction-1.0)*400.0/std::log(10.0);
  std::printf("Elo difference: %+g\n", elo_difference);
  double p_value = .5 + .5 * std::erf((wins-losses)/std::sqrt(2.0*(wins+losses)));
  std::printf("p-value: %g\n", p_value);
}

I ignore draws, and I am probably approximating a binomial distribution with a Gaussian distribution, but I only intend to use this after playing over 100 games, so it won't matter.

lucasart · Post by **lucasart** » Wed Apr 03, 2013 1:17 am

ZirconiumX wrote: Some people said you needed to include draws, some said discard them.
Some said use binomials, some said use a gaussian distribution.

Yes, I know that spreads a lot of FUD (Fear Uncertainty Doubt). Here's what happens:
* the true distribution is neither binomial nor gaussian, it's actually trinomial.
* but the trinomial distribution is a royal pain to calculate, so for N "large enough" you can approximate it well with a gaussian distribution
* with or without draws is the _same_
- either you calculate the mean and standard deviation, and use gaussian quantiles
- or you use the erf() shortcut formula: this formula is again a shortcut that gives the same results for large values of N
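A short derivation may help show why the draw count drops out and why the two routes agree for large N. This is only a sketch: it assumes each game scores 1, 1/2 or 0, uses the population variance (dividing by N rather than N - 1), and writes W, D, L for the win, draw and loss counts with N = W + D + L.

```latex
\begin{align*}
\mu - \tfrac12 &= \frac{W + \tfrac12 D}{N} - \frac12 = \frac{W - L}{2N},\\
\sigma^2 &= \frac{W + L}{4N} - \left(\frac{W - L}{2N}\right)^2
  \approx \frac{W + L}{4N}
  \quad\text{when } (W - L)^2 \ll N\,(W + L),\\
z &= \frac{\mu - \tfrac12}{\sigma / \sqrt{N}} \approx \frac{W - L}{\sqrt{W + L}},\\
\Phi(z) &= \tfrac12 + \tfrac12\,\operatorname{erf}\!\left(\frac{z}{\sqrt{2}}\right)
  \approx \tfrac12 + \tfrac12\,\operatorname{erf}\!\left(\frac{W - L}{\sqrt{2\,(W + L)}}\right).
\end{align*}
```

The last line is the erf() expression from the C++ program earlier in the thread, and D no longer appears in it.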

lucasart · Post by **lucasart** » Wed Apr 03, 2013 1:23 am

IMO the REAL problem that tricks most people is early stopping. That's a far more complicated problem. Solutions exist, but let's keep it simple and assume that N (the number of games) is a pre-determined number and that you never stop early on a favorable outcome (or the result will be biased). What matters beyond the formula is to understand the hypotheses without which the "LOS formula" makes no sense at all:
* game results are identically distributed
* game results are independent (so you must flush the hash table before every game for example)
* N is a constant, that was pre-determined before the experiment.
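To make the early-stopping bias concrete, here is a small simulation sketch. It assumes two exactly equal engines, no draws (each game is a fair coin flip), and uses the erf() shortcut from earlier in the thread as the test; the names `simulate_early_stopping` and `FalsePositiveRates`, and the idea of a minimum number of games before peeking, are assumptions made for illustration.

```cpp
#include <cmath>
#include <random>

struct FalsePositiveRates {
  double fixed_n;  // test applied once, after all games have been played
  double peeking;  // test applied after every game, stopping at the first "pass"
};

// Simulates matches between two EQUAL engines and counts how often
// "p-value >= threshold" wrongly declares the first engine better.
FalsePositiveRates simulate_early_stopping(int trials, int games, int min_games,
                                           double threshold, unsigned seed) {
  std::mt19937 rng(seed);
  int fixed_hits = 0, peeking_hits = 0;
  for (int t = 0; t < trials; ++t) {
    int diff = 0;              // wins - losses so far
    bool ever_passed = false;  // would a peeking tester have stopped by now?
    for (int n = 1; n <= games; ++n) {
      diff += (rng() & 1u) ? 1 : -1;  // fair coin: win or loss
      // With no draws, n equals wins + losses.
      const double p = 0.5 + 0.5 * std::erf(diff / std::sqrt(2.0 * n));
      if (n >= min_games && p >= threshold) ever_passed = true;
      if (n == games && p >= threshold) ++fixed_hits;
    }
    if (ever_passed) ++peeking_hits;
  }
  return {static_cast<double>(fixed_hits) / trials,
          static_cast<double>(peeking_hits) / trials};
}
```

With something like `simulate_early_stopping(2000, 1000, 20, 0.95, 12345u)`, the fixed-N rate should land near the nominal 5%, while the peeking rate comes out well above it, even though the engines are identical.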

Michel · Post by **Michel** » Wed Apr 03, 2013 12:34 pm

Yes, I know that spreads a lot of FUD (Fear Uncertainty Doubt).

Let me put things straight on this one.

LOS is a purely Bayesian concept. In the frequentist world the "probability that engine A is stronger than engine B" could be 0 or 1, but nothing else.

Another way to see that LOS is Bayesian is the fact that one needs to specify a prior to define it.

The true formula for LOS is given by Remi in this post

http://talkchess.com/forum/viewtopic.ph ... 05&t=30624

(assuming uniform prior).

Now LOS is often confused with the p-value (this is what you compute)

http://en.wikipedia.org/wiki/P-value

p-value is a frequentist concept.

In the simple case of a match between 2 engines, the p-value happens
to be asymptotically equal to the LOS, but this is not entirely trivial to prove.
But of course there is no need to prove it. Confirming it numerically is
sufficient for practical applications.
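A numerical check along those lines could look like the sketch below. It is written under stated assumptions: draws are discarded, the prior on the win rate among decisive games is uniform, and the closed form used is the standard Beta/binomial identity. I have not checked it term by term against the formula in the linked post, and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Bayesian LOS with draws discarded and a uniform prior on the win rate p:
// P(p > 1/2 | wins, losses) with p ~ Beta(wins + 1, losses + 1).
// For integer counts this equals P(X <= wins), X ~ Binomial(wins + losses + 1, 1/2).
double los_uniform_prior(int wins, int losses) {
  assert(wins >= 0 && losses >= 0);
  const int m = wins + losses + 1;
  double total = 0.0;
  for (int j = 0; j <= wins; ++j) {
    // log of C(m, j) / 2^m, computed with lgamma to avoid overflow
    const double log_term = std::lgamma(m + 1.0) - std::lgamma(j + 1.0) -
                            std::lgamma(m - j + 1.0) - m * std::log(2.0);
    total += std::exp(log_term);
  }
  return total;
}

// The Gaussian shortcut (p-value) used earlier in the thread.
double p_value_erf(int wins, int losses) {
  assert(wins + losses > 0);
  return 0.5 + 0.5 * std::erf((wins - losses) / std::sqrt(2.0 * (wins + losses)));
}
```

Comparing `los_uniform_prior(w, l)` with `p_value_erf(w, l)` for growing game counts shows the two numbers moving closer together, which is the asymptotic agreement described above.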

lucasart · Post by **lucasart** » Thu Apr 04, 2013 2:02 am

Michel wrote: Now LOS is often confused with the p-value (this is what you compute)

Thank you for clarifying that. I'll call it p-value from now on, promised!

The key thing about that p-value is that you need to fix N in advance and play all N games before using it as a one-sided test. It is a common mistake to try to use it in sequential testing.

Regarding sequential testing: is the Bayesian LOS something that can be used as a stopping rule?

Michel · Post by **Michel** » Thu Apr 04, 2013 9:27 am

Regarding sequential testing: is the Bayesian LOS something that can be used as a stopping rule?

This is something I do not clearly understand myself. Given the definition of LOS

The probability that engine A is better than engine B, taking into account all
information we have up to now.

(it is a conditional probability) you'd think one should be able to use LOS in a stopping rule. But since LOS is almost equal to the p-value this clearly can't be the case.

There is a lot of stuff on the internet about "Bayesian stopping", but as far as I can tell it does not bring anything new to the table in a practical sense.

AlvaroBegue · Post by **AlvaroBegue** » Thu Apr 04, 2013 10:27 am

I have given this a little thought. This is what I think can be done in practice. Think of a situation where you are testing a proposed change to your engine.

You design an experiment by describing the stopping rule (say, stop when a player is ahead by 100 games, or when they have played 2,000 games, whichever happens first) and a rule for accepting the change (say, if the new version has beaten the old version by at least 50 games).

Now you can compute the distribution of the length of the experiment (measured in number of games) and a function that maps true Elo difference to probability of the change being accepted.

If you combine this with some prior for how good your changes are (people that have done systematic testing for years might have a good idea as to what that prior should be), you can measure the quality of the experiment design in expected Elo gain per game played. Now tweak it to maximize!
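The acceptance probability and the expected length of such a design can be computed exactly by stepping a probability distribution over the current lead (wins minus losses) one game at a time. The sketch below does that under one loud assumption that is not in the post: a fixed draw probability `draw_rate` that does not depend on the Elo difference. The names `evaluate_design` and `DesignResult` are hypothetical; the Elo-to-score mapping is the inverse of the logistic formula in the C++ program earlier in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct DesignResult {
  double accept_probability;  // chance that the change is accepted
  double expected_games;      // mean length of the experiment
};

// Design: stop when either side leads by stop_lead games or after max_games;
// accept the change if the new version's final lead is >= accept_lead.
// Assumption: draw_rate is a fixed draw probability, independent of elo.
DesignResult evaluate_design(double elo, double draw_rate, int stop_lead,
                             int max_games, int accept_lead) {
  assert(stop_lead > 0 && max_games > 0);
  assert(accept_lead <= stop_lead && accept_lead > -stop_lead);
  const double score = 1.0 / (1.0 + std::pow(10.0, -elo / 400.0));
  const double p_win = score - 0.5 * draw_rate;
  const double p_loss = 1.0 - score - 0.5 * draw_rate;
  assert(p_win >= 0.0 && p_loss >= 0.0);

  // prob[d + stop_lead] = probability the match is still running with lead d.
  const int top = 2 * stop_lead;
  std::vector<double> prob(top + 1, 0.0), next(top + 1, 0.0);
  prob[stop_lead] = 1.0;  // start with lead 0
  DesignResult result{0.0, 0.0};

  for (int game = 1; game <= max_games; ++game) {
    std::fill(next.begin(), next.end(), 0.0);
    for (int i = 1; i < top; ++i) {      // interior states are still running
      result.expected_games += prob[i];  // adds P(length >= game)
      next[i + 1] += prob[i] * p_win;
      next[i] += prob[i] * draw_rate;
      next[i - 1] += prob[i] * p_loss;
    }
    result.accept_probability += next[top];  // new version reached +stop_lead
    next[0] = next[top] = 0.0;               // both boundaries end the match
    prob.swap(next);
  }
  // Matches that went the full distance: accept if the final lead is enough.
  for (int i = 1; i < top; ++i)
    if (i - stop_lead >= accept_lead) result.accept_probability += prob[i];
  return result;
}
```

For the numbers in the post one would call `evaluate_design(elo, draw_rate, 100, 2000, 50)` over a range of `elo` values; weighting the results by a prior over `elo` then gives the expected Elo gain per game that the post suggests maximizing.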
