Explain like I'm five: LOS formula
Moderator: Ras
-
ZirconiumX
- Posts: 1361
- Joined: Sun Jul 17, 2011 11:14 am
- Full name: Hannah Ravensloft
Explain like I'm five: LOS formula
I was testing Firenzina earlier today when, out of curiosity, I opened a minefield.
I asked on nTCEC how to calculate LOS, which resulted in a lot of answers, confusing me quite a bit.
Some people said you needed to include draws, some said discard them.
Some said use binomials, some said use a gaussian distribution.
I put one formula in, and got a LOS of 147%(!)
Is there a single formula that will work on more or less any calculator?
Matthew:out
tu ne cede malis, sed contra audentior ito
-
Ajedrecista
- Posts: 2178
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Explain like I'm five: LOS formula.
Hello Matthew:
ZirconiumX wrote: I was testing Firenzina earlier today when, out of curiosity, I opened a minefield.
I asked on nTCEC how to calculate LOS, which resulted in a lot of answers, confusing me quite a bit.
Some people said you needed to include draws, some said discard them.
Some said use binomials, some said use a gaussian distribution.
I put one formula in, and got a LOS of 147%(!)
Is there a single formula that will work on more or less any calculator?
Matthew:out
I use an assumption of normal distribution with its mean and sample standard deviation. My programme LOS_and_Elo_uncertainties_calculator (download link in my signature) does it in this way. This one-sided test is intuitive for me. Ed Schroder kindly hosts here some calculations that I did... of course take them with a lot of care.
Another formula by Rémi Coulom is seen in the last equation of this post. These two methods give very similar results with a high number of played games (some thousands of games indeed).
I did a quick search on TalkChess and these are some long and interesting threads about LOS:
Likelihood of superiority
LOS calculation: Does the same result is always the same?
LOS (again)
Fast LOS estimation
There is a recent, interesting thread on Open Chess forum about LOS:
LOS
You might see some contradictions, bearing in mind that the number of posts is so high. One thing is sure:
ZirconiumX wrote: I put one formula in, and got a LOS of 147%(!)
0 < LOS < 1; the formula you applied (or the data you fed it) is clearly wrong.
Sorry for not explaining much about LOS, but I am not an expert in spite of my numerous posts on this subject. I hope that all the links will be useful for you.
Regards from Spain.
Ajedrecista.
-
Steve Maughan
- Posts: 1315
- Joined: Wed Mar 08, 2006 8:28 pm
- Location: Florida, USA
Re: Explain like I'm five: LOS formula
Hi Matthew,
Think of it this way. Given a particular score (e.g. 60 wins, 20 draws, 20 losses = 70% of 100 games), what is the chance that if you tossed a fair coin you'd get 70% heads or better out of 100 goes? The "LOS" is one minus that chance (with a binomial distribution): the less likely a fair coin is to do that well, the more likely the engine really is stronger.
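As a rough sketch of the coin-toss picture, the chance of a fair coin doing at least this well can be summed term by term. The function name is made up for illustration, and treating the match as n independent fair tosses is an assumption (draws do not fit that model exactly). Log-gamma is used only to keep the binomial coefficients from overflowing.

```cpp
#include <cassert>
#include <cmath>

// P(X >= k) for X ~ Binomial(n, 1/2): the chance that a fair coin
// shows at least k heads in n tosses.
double fair_coin_upper_tail(int n, int k) {
  assert(n >= 0 && k >= 0 && k <= n);
  double tail = 0.0;
  for (int i = k; i <= n; ++i) {
    // log of C(n, i) / 2^n, computed in log space to avoid overflow
    double log_term = std::lgamma(n + 1.0) - std::lgamma(i + 1.0) -
                      std::lgamma(n - i + 1.0) - n * std::log(2.0);
    tail += std::exp(log_term);
  }
  return tail;
}
```

For the example score, `1.0 - fair_coin_upper_tail(100, 70)` is the number usually quoted as LOS under this picture.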
Hope that helps,
Steve
-
AlvaroBegue
- Posts: 932
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: Explain like I'm five: LOS formula
This is what I use:
I ignore draws, and I am probably approximating a binomial distribution with a Gaussian distribution, but I only intend to use this after playing over 100 games, so it won't matter.
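Code:
  #include <cmath>
  #include <cstdio>
  #include <cstdlib>

  int main(int argc, char **argv) {
    if (argc != 4) {
      std::printf("Wrong number of arguments.\n\nUsage: %s <wins> <losses> <draws>\n", argv[0]);
      return 1;
    }

    int wins = std::atoi(argv[1]);
    int losses = std::atoi(argv[2]);
    int draws = std::atoi(argv[3]);
    // Without at least one decisive game the erf() argument below is 0/0.
    if (wins < 0 || losses < 0 || draws < 0 || wins + losses == 0) {
      std::printf("Need non-negative counts and at least one decisive game.\n");
      return 1;
    }

    // Score fraction: a draw counts as half a win.
    double winning_fraction = (wins + 0.5 * draws) / (wins + losses + draws);
    std::printf("Winning fraction: %g\n", winning_fraction);

    // Invert the logistic Elo curve; infinite for a 100% or 0% score.
    double elo_difference = -std::log(1.0 / winning_fraction - 1.0) * 400.0 / std::log(10.0);
    std::printf("Elo difference: %+g\n", elo_difference);

    // Gaussian approximation on decisive games only (draws ignored).
    double p_value = .5 + .5 * std::erf((wins - losses) / std::sqrt(2.0 * (wins + losses)));
    std::printf("p-value: %g\n", p_value);
    return 0;
  }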
-
lucasart
- Posts: 3243
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Explain like I'm five: LOS formula
ZirconiumX wrote: Some people said you needed to include draws, some said discard them.
Some said use binomials, some said use a gaussian distribution.
Yes, I know that spreads a lot of FUD (Fear Uncertainty Doubt). Here's what happens:
* the true distribution is neither binomial nor gaussian, it's actually trinomial.
* but the trinomial distribution is a royal pain to calculate, so for N "large enough" you can approximate it well with a gaussian distribution
* with or without draws is the _same_
- either you calculate the mean and standard deviation, and use gaussian quantiles
- or you use the erf() shortcut formula: this formula is again a shortcut that gives the same results for large values of N
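The two routes in the list above can be put side by side. This is only a sketch under my own assumptions: the function names are invented, the per-game score is taken as 1, 0.5 or 0, and the variance is divided by n (an n-1 divisor changes little for large n).

```cpp
#include <cassert>
#include <cmath>

// Standard normal CDF written with erf().
double normal_cdf(double z) { return 0.5 + 0.5 * std::erf(z / std::sqrt(2.0)); }

// Route 1: mean and standard deviation of the per-game score
// (win = 1, draw = 0.5, loss = 0), then a Gaussian quantile.
double los_from_mean_sd(int wins, int losses, int draws) {
  int n = wins + losses + draws;
  assert(wins >= 0 && losses >= 0 && draws >= 0 && wins + losses > 0);
  double mean = (wins + 0.5 * draws) / n;
  double variance = (wins * (1.0 - mean) * (1.0 - mean) +
                     draws * (0.5 - mean) * (0.5 - mean) +
                     losses * mean * mean) / n;
  double standard_error = std::sqrt(variance / n);
  return normal_cdf((mean - 0.5) / standard_error);
}

// Route 2: the erf() shortcut, which never looks at the draw count.
double los_from_erf_shortcut(int wins, int losses) {
  assert(wins >= 0 && losses >= 0 && wins + losses > 0);
  return 0.5 + 0.5 * std::erf((wins - losses) / std::sqrt(2.0 * (wins + losses)));
}
```

For a close match with many games, e.g. 510 wins, 490 losses and 1000 draws, the two functions should agree to several decimal places. They drift apart for lopsided scores, because route 1 measures the spread around the observed mean rather than around 50%.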
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
lucasart
- Posts: 3243
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Explain like I'm five: LOS formula
IMO the REAL problem that tricks most people is early stopping. That's a far more complicated problem. Solutions exist, but let's keep it simple and assume that N (number of games) is a pre-determined number and you never stop early on a favorable outcome (or the result will be biased). What matters beyond the formula is to understand the hypotheses without which the "LOS formula" makes no sense at all:
* game results are identically distributed
* game results are independent (so you must flush the hash table before every game, for example)
* N is a constant that was pre-determined before the experiment.
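To see why the fixed-N hypothesis matters, here is a small simulation sketch of two engines of equal strength. Everything in it is an assumption for illustration: the function name, the 50/50 game model without draws, and the idea of declaring a winner as soon as the Gaussian value exceeds a threshold.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Fraction of simulated matches between two EQUAL engines in which the first
// engine is declared better. With peeking, the value is checked after every
// game and the match stops at the first crossing; otherwise only after game N.
double false_positive_rate(bool peek_after_every_game, int n_games, int trials,
                           double threshold, unsigned seed) {
  assert(n_games > 0 && trials > 0);
  std::mt19937 rng(seed);
  std::bernoulli_distribution first_engine_wins(0.5);
  int declared_better = 0;
  for (int t = 0; t < trials; ++t) {
    int wins = 0, losses = 0;
    bool declared = false;
    for (int g = 1; g <= n_games && !declared; ++g) {
      if (first_engine_wins(rng)) ++wins; else ++losses;
      if (peek_after_every_game || g == n_games) {
        double value = 0.5 + 0.5 * std::erf((wins - losses) /
                                            std::sqrt(2.0 * (wins + losses)));
        declared = value > threshold;
      }
    }
    if (declared) ++declared_better;
  }
  return static_cast<double>(declared_better) / trials;
}
```

Calling it with `peek_after_every_game` set to `false` and then `true` (same `n_games`, `threshold` 0.95) should show the peeking variant declaring a winner far more often, even though neither engine is stronger.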
-
Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Explain like I'm five: LOS formula
lucasart wrote: Yes, I know that spreads a lot of FUD (Fear Uncertainty Doubt).
Let me put things straight on this one.
LOS is a purely Bayesian concept. In the frequentist world the "probability that engine A is stronger than engine B" could be 0 or 1, but nothing else.
Another way to see that LOS is Bayesian is the fact that one needs to specify a prior to define it.
The true formula for LOS is given by Remi in this post
http://talkchess.com/forum/viewtopic.ph ... 05&t=30624
(assuming uniform prior).
Now LOS is often confused with the p-value (this is what you compute)
http://en.wikipedia.org/wiki/P-value
p-value is a frequentist concept.
In the simple case of a match between 2 engines, the p-value happens to be asymptotically equal to the LOS, but this is not entirely trivial to prove.
But of course there is no need to prove it. Confirming it numerically is sufficient for practical applications.
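A numerical check along these lines could look like the sketch below. I have not reproduced the formula from the linked post; this is the textbook calculation under my own assumptions: draws are discarded, the prior on the win probability among decisive games is uniform, so the posterior is Beta(wins+1, losses+1), and the function names are invented. For integer counts the posterior probability that the win rate exceeds 1/2 reduces to a binomial sum.

```cpp
#include <cassert>
#include <cmath>

// Bayesian LOS with a uniform prior, draws discarded:
// P(p > 1/2) for p ~ Beta(wins + 1, losses + 1), which equals
// P(X <= wins) for X ~ Binomial(wins + losses + 1, 1/2).
double bayesian_los(int wins, int losses) {
  assert(wins >= 0 && losses >= 0);
  int n = wins + losses + 1;
  double total = 0.0;
  for (int k = 0; k <= wins; ++k) {
    // log of C(n, k) / 2^n, in log space to avoid overflow
    double log_term = std::lgamma(n + 1.0) - std::lgamma(k + 1.0) -
                      std::lgamma(n - k + 1.0) - n * std::log(2.0);
    total += std::exp(log_term);
  }
  return total;
}

// The Gaussian value from the erf() formula earlier in the thread.
double gaussian_value(int wins, int losses) {
  assert(wins >= 0 && losses >= 0 && wins + losses > 0);
  return 0.5 + 0.5 * std::erf((wins - losses) / std::sqrt(2.0 * (wins + losses)));
}
```

Comparing `bayesian_los(w, l)` with `gaussian_value(w, l)` for growing game counts is one way to confirm the asymptotic agreement numerically rather than prove it.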
-
lucasart
- Posts: 3243
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Explain like I'm five: LOS formula
Michel wrote: Now LOS is often confused with the p-value (this is what you compute)
Thank you for clarifying that. I'll call it p-value from now on, promised!
The key thing about that p-value is that you need to fix N in advance and play N games before using it as a one-sided test. It is a common mistake to try to use it in sequential testing.
Regarding sequential testing: is the Bayesian LOS something that can be used as a stopping rule?
-
Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Explain like I'm five: LOS formula
lucasart wrote: Regarding sequential testing: is the bayesian LOS something that can be used as a stopping rule ?
This is something I do not clearly understand myself. Given the definition of LOS: "The probability that engine A is better than engine B, taking into account all information we have up to now." (it is a conditional probability), you'd think one should be able to use LOS in a stopping rule. But since LOS is almost equal to the p-value this clearly can't be the case.
There is a lot of stuff on the internet about "Bayesian stopping" but as far as I can tell it does not bring anything new to the table in a practical sense.
-
AlvaroBegue
- Posts: 932
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: Explain like I'm five: LOS formula
I have given this a little thought. This is what I think can be done in practice. Think of a situation where you are testing a proposed change to your engine.
You design an experiment by describing the stopping rule (say, stop when a player is ahead by 100 games, or when they have played 2,000 games, whichever happens first) and a rule for accepting the change (say, if the new version has beaten the old version by at least 50 games).
Now you can compute the distribution of the length of the experiment (measured in number of games) and a function that maps true Elo difference to probability of the change being accepted.
If you combine this with some prior for how good your changes are (people that have done systematic testing for years might have a good idea as to what that prior should be), you can measure the quality of the experiment design in expected Elo gain per game played. Now tweak it to maximize!
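The design described above can be evaluated exactly by tracking the distribution of the lead (wins minus losses) game by game. This is a sketch with assumptions of my own: the names `DesignResult` and `evaluate_design` are hypothetical, the draw rate is a fixed input, the usual logistic Elo curve gives the expected score, and hitting the lead limit on the new version's side counts as acceptance.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct DesignResult {
  double accept_probability;  // chance the change is accepted
  double expected_games;      // mean length of the experiment in games
};

// Stop when either side leads by lead_to_stop, or after max_games.
// Accept if the new version ends at least accept_margin games ahead.
DesignResult evaluate_design(double elo, double draw_rate, int lead_to_stop,
                             int max_games, int accept_margin) {
  double score = 1.0 / (1.0 + std::pow(10.0, -elo / 400.0));
  double p_win = score - 0.5 * draw_rate;
  double p_loss = 1.0 - score - 0.5 * draw_rate;
  assert(p_win >= 0.0 && p_loss >= 0.0 && max_games > 0);
  assert(accept_margin > 0 && accept_margin <= lead_to_stop);

  // prob[d + lead_to_stop]: chance the match is still running with lead d.
  const int size = 2 * lead_to_stop + 1;
  std::vector<double> prob(size, 0.0);
  prob[lead_to_stop] = 1.0;
  double accept = 0.0, expected_games = 0.0;

  for (int g = 1; g <= max_games; ++g) {
    std::vector<double> next(size, 0.0);
    for (int i = 1; i + 1 < size; ++i) {  // only running (interior) states
      next[i] += prob[i] * draw_rate;
      next[i + 1] += prob[i] * p_win;
      next[i - 1] += prob[i] * p_loss;
    }
    // Mass that reached either lead limit stops here, after g games.
    accept += next[size - 1];
    expected_games += g * (next[size - 1] + next[0]);
    next[size - 1] = 0.0;
    next[0] = 0.0;
    prob.swap(next);
  }
  // Matches that ran the full distance are judged on the final lead.
  for (int i = 1; i + 1 < size; ++i) {
    expected_games += max_games * prob[i];
    if (i - lead_to_stop >= accept_margin) accept += prob[i];
  }
  return {accept, expected_games};
}
```

Evaluating this over a grid of `elo` values and weighting by a prior on how good changes tend to be would give the expected Elo gain per game for a design, which is the quantity to tweak.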