Question about Ordo
Moderator: Ras
I downloaded Ordo. It looks quite professional and is very fast. However, I noticed that the error bars computed by Ordo are much higher than those of BayesElo.
For example, I have a pgn where some players have about 100000 games. BayesElo gives +/- 2 for the error bars whereas Ordo gives +/- 25 (100 simulations).
Since Ordo is based on simulations, it should give quite accurate error bars. So what is going on?
EDIT: Maybe I did something wrong. On a second run the error bars were +/- 3.3, which is still too high but much more reasonable.
-
Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Question about Ordo
OK, it seems to depend on the value of the parameter -a. With -a 0 it gives 3.3. Without -a it gives 24.7.
-
michiguel
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Question about Ordo
Michel wrote:I downloaded Ordo. It looks quite professional and is very fast. However, I noticed that the error bars computed by Ordo are much higher than those of BayesElo.
For example, I have a pgn where some players have about 100000 games. BayesElo gives +/- 2 for the error bars whereas Ordo gives +/- 25 (100 simulations).
Since Ordo is based on simulations, it should give quite accurate error bars. So what is going on?
EDIT: Maybe I did something wrong. On a second run the error bars were +/- 3.3, which is still too high but much more reasonable.
The errors are +/- relative to a given reference. If the given reference is a player that played only 10 games, then the error will be big no matter how many games the other players played. By default, the reference is the average of the pool. If you have too many players with few games, the uncertainty of the players with many games (compared to the average of the pool) will be big too. For instance, let's assume you have
EngA vs EngB (10000 games)
EngA vs EngC (10 games)
EngB vs EngC (10 games)
The Elo errors for EngA and EngB will be big, because their errors relative to EngC are big. However, the relative strength between A and B is well defined. So, if you use EngA as the reference with the switches -a0 -AEngA, you will see that the error for EngB is small and the error for EngC is big (and of course, the error for EngA is zero, since it is the reference). You can play around with a file made up like the above to get a feeling for what Ordo is doing.
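To get a feel for the orders of magnitude, here is a back-of-the-envelope sketch (this is not Ordo's code; it ignores draws and colours and gives one sigma only, using the slope of the standard logistic Elo curve at 50%): the error of a head-to-head score shrinks like 1/sqrt(n), so any 10-game link dominates whatever is measured through it.
Code: Select all
#include <stdio.h>
#include <math.h>

/* Back-of-the-envelope only, not Ordo's code: standard error of a match
   score over n games (ignoring draws, p = 0.5), converted to Elo using
   the slope of the logistic curve at 50%, ln(10)/1600 score per Elo. */
int main(void)
{
	const double slope = log(10.0) / 1600.0;        /* d(score)/d(elo) at 50% */
	int n;
	for (n = 10; n <= 100000; n *= 10) {
		double se_score = sqrt(0.25 / (double) n); /* sqrt(p*(1-p)/n) */
		printf("%6d games: about +/- %.1f elo (1 sigma)\n", n, se_score / slope);
	}
	return 0;
}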
Another way to look at errors is to use the -efile.csv switch, which produces a csv file that can be opened with any spreadsheet. It contains a matrix with the errors between all pairs of players. That is the most straightforward way to interpret the results. For BayesElo, the most straightforward way to interpret the uncertainty is the LOS table. The errors can vary a lot depending on the options chosen, which can confuse the user. I bet that the csv matrix file in Ordo will give results that match the LOS table well.
I hope this helps (please let me know); otherwise, you can strip the moves from the pgn with ordoprep, zip it, and send it to me by email so I can take a look and make sure that what I said applies to your case. Of course, I am assuming there is no bug, but that is always possible.
Miguel
EDIT: Now I see your follow-up. I think what I said applies to your case.
-
Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Question about Ordo
Ok, thanks for the clarification.
(BTW: it should not be too hard to compute the covariance matrix without doing simulations I think.)
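(The standard recipe, I would guess, is the usual maximum-likelihood one: take H_ij = -d^2 log L / (dr_i dr_j) at the fitted ratings, pin one player (or the pool average) to remove the additive degeneracy, and invert; the diagonal of H^-1 then gives the rating variances relative to that reference.)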
-
Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Question about Ordo
When Ordo does "simulations" does it take draws into account?
-
michiguel
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Question about Ordo
Michel wrote:When Ordo does "simulations" does it take draws into account?
Yes, it does. But this is an area that could be elaborated a bit more. The key is the fraction of draws expected for a given delta rating. The way Ordo does it is with a formula that fits the results observed in most rating lists relatively well. For pragmatic reasons, I believe this is OK, but it can certainly be improved.
Miguel
PS: This is the equation that calculates the probabilities
Code: Select all
// performance expected for ratings a and b
static double
xpect (double a, double b)
{
return 1.0 / (1.0 + exp((b-a)*BETA));
}

static void
get_pWDL(double dr /*delta rating*/, double *pw, double *pd, double *pl)
{
double f, dc, pdra, pwin, plos;
bool_t switched;
switched = dr < 0;
if (switched) dr = -dr;
f = xpect (dr,0);
dc = 0.5 / (0.5 + 1.23 * exp(dr/175.0)); // <=== HERE
pwin = f * (1 - dc);
plos = 1 - f;
pdra = 1 - pwin - plos;
if (switched) {
*pw = plos;
*pd = pdra;
*pl = pwin;
} else {
*pw = pwin;
*pd = pdra;
*pl = plos;
}
return;
}
-
michiguel
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Question about Ordo
Michel wrote:Ok, thanks for the clarification.
(BTW: it should not be too hard to compute the covariance matrix without doing simulations I think.)
I think so, but I thought I could see how this works and provide something different. The methodology is really straightforward. Besides, I was really curious. One of the reasons I was curious to do it is that I figured that for tough cases the simulations could provide a different (and maybe more reasonable?) answer. Those covariance matrices also rely on assumptions.
Miguel
-
Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Question about Ordo
I am not sure I understand this correctly. By replacing some wins by draws you seem to change the expected score in such a way that it no longer fits the logistic model.
Code: Select all
f = xpect (dr,0);
dc = 0.5 / (0.5 + 1.23 * exp(dr/175.0)); // <=== HERE
pwin = f * (1 - dc);
plos = 1 - f;
pdra = 1 - pwin - plos;
For example for dr=0 I seem to be getting 0.43 for the expected score
instead of 0.5.
Unless beta is not what I think it is.... (I would guess it is -log(10)/400).
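Spelling out the arithmetic at dr=0 (where the value of BETA does not matter, since xpect gives 0.5 either way), here is the snippet above reduced to a standalone check:
Code: Select all
#include <stdio.h>

/* Standalone check of the posted get_pWDL arithmetic at dr = 0:
   xpect is 0.5 there whatever BETA is, and exp(0/175.0) = 1. */
int main(void)
{
	double f    = 0.5;
	double dc   = 0.5 / (0.5 + 1.23);   /* ~0.289 */
	double pwin = f * (1 - dc);         /* ~0.356 */
	double plos = 1 - f;                /* 0.5 */
	double pdra = 1 - pwin - plos;      /* ~0.144 */
	printf("expected score = %.2f\n", pwin + pdra / 2); /* prints 0.43 */
	return 0;
}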
-
michiguel
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Question about Ordo
Michel wrote:I am not sure I understand this correctly. By replacing some wins by draws you seem to change the expected score in such a way that it no longer fits the logistic model.
For example for dr=0 I seem to be getting 0.43 for the expected score instead of 0.5.
Unless beta is not what I think it is.... (I would guess it is -log(10)/400).
Thanks for taking a look at this. I should be less lazy and open-source it.
beta = 1/175, and it looks like you are right. A quick look tells me that it makes sense if I have
f = xpect (0,dr);
dc = 0.5 / (0.5 + 1.23 * exp(dr/175.0));
pwin = f * (1 - dc);
pdra = 2 * f * dc; // which is 2 * (f-pwin)
plos = 1 - pwin - pdra;
In that way, the probability of draws will increase relative to the probability of wins when you are weaker, and of course the performance expected from draws and wins matches the performance predicted by the formula.
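A quick standalone numerical check (assuming beta = 1/175 and keeping the 1.23 constant for now) that the corrected formulas give back the logistic expectation, i.e. pwin + pdra/2 = f:
Code: Select all
#include <stdio.h>
#include <math.h>

#define BETA (1.0/175.0)   /* assumed here, as discussed above */

int main(void)
{
	double dr;
	for (dr = 0; dr <= 400; dr += 100) {
		double f    = 1.0 / (1.0 + exp(dr * BETA));       /* xpect(0,dr) */
		double dc   = 0.5 / (0.5 + 1.23 * exp(dr * BETA));
		double pdra = 2 * f * dc;
		double pwin = f - pdra / 2;
		double plos = 1 - pwin - pdra;
		printf("dr=%3.0f  f=%.4f  pwin+pdra/2=%.4f  w+d+l=%.4f\n",
		       dr, f, pwin + pdra / 2, pwin + pdra + plos);
	}
	return 0;
}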
I need to take a closer look at that to see if it matches the data I had from the databases. If this mistake is confirmed, the ratings are fine but the errors are not accurate. The good news is that (almost?) nobody has used the simulations so far.
Miguel
-
michiguel
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Question about Ordo
I rechecked the data Adam sent me with statistics of draw rate versus rating difference, and this is what I will use:
Code: Select all
#define DRAWRATE_AT_EQUAL_STRENGTH 0.33
#define DRAWFACTOR (1/(2*(DRAWRATE_AT_EQUAL_STRENGTH))-0.5)
static void
get_pWDL(double dr /*delta rating*/, double *pw, double *pd, double *pl)
{
// Performance comprises wins and draws.
// if f is expected performance from 0 to 1.0, then
// f = pwin + pdraw/2
// from that, dc is the fraction of points that come from draws, not wins, so
// pdraw (probability of draw) = 2 * f * dc
// calculation of dc is an empirical formula to fit average data from CCRL:
// Draw rate of equal engines is near 0.33, and decays on uneven matches.
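// Sanity check of the constants: at dr = 0, dc = 0.5/(0.5 + DRAWFACTOR)
// = DRAWRATE_AT_EQUAL_STRENGTH, so pdra = 2 * 0.5 * 0.33 = 0.33 by construction.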
double f, dc, pdra, pwin, plos;
bool_t switched;
switched = dr < 0;
if (switched) dr = -dr;
f = xpect (0,dr);
dc = 0.5 / (0.5 + DRAWFACTOR * exp(dr*BETA));
pdra = 2 * f * dc;
pwin = f - pdra/2;
plos = 1 - pwin - pdra;
if (switched) {
*pw = plos;
*pd = pdra;
*pl = pwin;
} else {
*pw = pwin;
*pd = pdra;
*pl = plos;
}
return;
}
Miguel