Question about Ordo
Moderator: Ras
I downloaded Ordo. It looks quite professional and is very fast. However, I noticed that the error bars computed by Ordo are much higher than those of BayesElo.
For example, I have a pgn where some players have about 100000 games. BayesElo gives +/- 2 for the error bars whereas Ordo gives +/- 25 (100 simulations).
Since Ordo is based on simulations, it should give quite accurate error bars. So what is going on?
EDIT: Maybe I did something wrong. On a second run the error bars were +/- 3.3, which is still too high but much more reasonable.
-
Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Question about Ordo
OK, it seems to depend on the value of the parameter -a. With -a 0 it gives 3.3. Without -a it gives 24.7.
-
michiguel
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Question about Ordo
Michel wrote:I downloaded Ordo. It looks quite professional and is very fast. However, I noticed that the error bars computed by Ordo are much higher than those of BayesElo.
For example, I have a pgn where some players have about 100000 games. BayesElo gives +/- 2 for the error bars whereas Ordo gives +/- 25 (100 simulations).
Since Ordo is based on simulations, it should give quite accurate error bars. So what is going on?
EDIT: Maybe I did something wrong. On a second run the error bars were +/- 3.3, which is still too high but much more reasonable.
The errors are +/- relative to a given reference. If the given reference is a player that played only 10 games, then the error will be big no matter how many games the other players played. By default, the reference is the average of the pool. If you have too many players with few games, the uncertainty of the players with many games (compared to the average of the pool) will be big too. For instance, let's assume you have
EngA vs EngB (10000 games)
EngA vs EngC (10 games)
EngB vs EngC (10 games)
The Elo errors for EngA and EngB will be big, because their errors relative to EngC are big. However, the relative strength between A and B is well defined. So, if you use EngA as the reference with the switches -a0 -AEngA, you will see that the error for EngB is small and the error for EngC is big (and of course, the error for EngA is zero, since it is the reference). You can play around with a file made up like the above to get a feeling for what Ordo is doing.
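To get a feel for the orders of magnitude, here is a back-of-the-envelope sketch (this is not Ordo's code; it ignores draws and colours and gives one sigma only, using the slope of the standard logistic Elo curve at 50%): the error of a head-to-head score shrinks like 1/sqrt(n), so any 10-game link dominates whatever is measured through it.
Code: Select all
#include <stdio.h>
#include <math.h>

/* Back-of-the-envelope only, not Ordo's code: standard error of a match
   score over n games (ignoring draws, p = 0.5), converted to Elo using
   the slope of the logistic curve at 50%, ln(10)/1600 score per Elo. */
int main(void)
{
	const double slope = log(10.0) / 1600.0;        /* d(score)/d(elo) at 50% */
	int n;
	for (n = 10; n <= 100000; n *= 10) {
		double se_score = sqrt(0.25 / (double) n); /* sqrt(p*(1-p)/n) */
		printf("%6d games: about +/- %.1f elo (1 sigma)\n", n, se_score / slope);
	}
	return 0;
}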
Another way to look at errors is to use the -efile.csv switch, which produces a csv file that can be opened with any spreadsheet. It contains a matrix with the errors between all pairs of players. That is the most straightforward way to interpret the results. For BayesElo, the most straightforward way to interpret the uncertainty is the LOS table. The errors can vary a lot depending on the options chosen, which can confuse the user. I bet that the csv matrix file in Ordo will give results that match the LOS table well.
I hope this helps (please let me know); otherwise, you can strip the moves from the pgn with ordoprep, zip it, and send it to me by email so I can take a look and make sure that what I said applies to your case. Of course, I am assuming there is no bug, but that is always possible.
Miguel
EDIT: Now I see your follow-up. I think what I said applies to your case.
-
Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Question about Ordo
Ok, thanks for the clarification.
(BTW: it should not be too hard to compute the covariance matrix without doing simulations I think.)
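(The standard recipe, I would guess, is the usual maximum-likelihood one: take H_ij = -d^2 log L / (dr_i dr_j) at the fitted ratings, pin one player (or the pool average) to remove the additive degeneracy, and invert; the diagonal of H^-1 then gives the rating variances relative to that reference.)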
-
Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Question about Ordo
When Ordo does "simulations" does it take draws into account?
-
michiguel
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Question about Ordo
Michel wrote:When Ordo does "simulations" does it take draws into account?
Yes, it does. But this is an area that could be elaborated a bit more. The key is the fraction of draws expected for a given delta rating. The way Ordo does it is with a formula that fits the results observed in most rating lists relatively well. For pragmatic reasons, I believe this is OK, but it can certainly be improved.
Miguel
PS: This is the equation that calculates the probabilities
Code: Select all
// performance expected for ratings a and b
static double
xpect (double a, double b)
{
return 1.0 / (1.0 + exp((b-a)*BETA));
}

static void
get_pWDL(double dr /*delta rating*/, double *pw, double *pd, double *pl)
{
double f, dc, pdra, pwin, plos;
bool_t switched;
switched = dr < 0;
if (switched) dr = -dr;
f = xpect (dr,0);
dc = 0.5 / (0.5 + 1.23 * exp(dr/175.0)); // <=== HERE
pwin = f * (1 - dc);
plos = 1 - f;
pdra = 1 - pwin - plos;
if (switched) {
*pw = plos;
*pd = pdra;
*pl = pwin;
} else {
*pw = pwin;
*pd = pdra;
*pl = plos;
}
return;
}
-
michiguel
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Question about Ordo
Michel wrote:Ok, thanks for the clarification.
(BTW: it should not be too hard to compute the covariance matrix without doing simulations I think.)
I think so, but I thought I could see how this works and provide something different. The methodology is really straightforward. Besides, I was really curious. One of the reasons I was curious to do it is that I figured that for tough cases the simulations could provide a different (and maybe more reasonable?) answer. Those covariance matrices also rely on assumptions.
Miguel
-
Michel
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Question about Ordo
I am not sure I understand this correctly. By replacing some wins by draws you seem to change the expected score in such a way that it no longer fits the logistic model.
Code: Select all
f = xpect (dr,0);
dc = 0.5 / (0.5 + 1.23 * exp(dr/175.0)); // <=== HERE
pwin = f * (1 - dc);
plos = 1 - f;
pdra = 1 - pwin - plos;
For example for dr=0 I seem to be getting 0.43 for the expected score
instead of 0.5.
Unless beta is not what I think it is.... (I would guess it is -log(10)/400).
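Spelling out the arithmetic at dr=0 (where the value of BETA does not matter, since xpect gives 0.5 either way), here is the snippet above reduced to a standalone check:
Code: Select all
#include <stdio.h>

/* Standalone check of the posted get_pWDL arithmetic at dr = 0:
   xpect is 0.5 there whatever BETA is, and exp(0/175.0) = 1. */
int main(void)
{
	double f    = 0.5;
	double dc   = 0.5 / (0.5 + 1.23);   /* ~0.289 */
	double pwin = f * (1 - dc);         /* ~0.356 */
	double plos = 1 - f;                /* 0.5 */
	double pdra = 1 - pwin - plos;      /* ~0.144 */
	printf("expected score = %.2f\n", pwin + pdra / 2); /* prints 0.43 */
	return 0;
}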
-
michiguel
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Question about Ordo
Michel wrote:I am not sure I understand this correctly. By replacing some wins by draws you seem to change the expected score in such a way that it no longer fits the logistic model.
For example for dr=0 I seem to be getting 0.43 for the expected score instead of 0.5.
Unless beta is not what I think it is.... (I would guess it is -log(10)/400).
Thanks for taking a look at this. I should be less lazy and open-source it.
beta = 1/175, and it looks like you are right. A quick look tells me that it makes sense if I have
f = xpect (0,dr);
dc = 0.5 / (0.5 + 1.23 * exp(dr/175.0));
pwin = f * (1 - dc);
pdra = 2 * f * dc; // which is 2 * (f-pwin)
plos = 1 - pwin - pdra;
In that way, the probability of draws will increase relative to the probability of wins when you are weaker, and of course the performance expected from draws and wins matches the performance predicted by the formula.
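A quick standalone numerical check (assuming beta = 1/175 and keeping the 1.23 constant for now) that the corrected formulas give back the logistic expectation, i.e. pwin + pdra/2 = f:
Code: Select all
#include <stdio.h>
#include <math.h>

#define BETA (1.0/175.0)   /* assumed here, as discussed above */

int main(void)
{
	double dr;
	for (dr = 0; dr <= 400; dr += 100) {
		double f    = 1.0 / (1.0 + exp(dr * BETA));       /* xpect(0,dr) */
		double dc   = 0.5 / (0.5 + 1.23 * exp(dr * BETA));
		double pdra = 2 * f * dc;
		double pwin = f - pdra / 2;
		double plos = 1 - pwin - pdra;
		printf("dr=%3.0f  f=%.4f  pwin+pdra/2=%.4f  w+d+l=%.4f\n",
		       dr, f, pwin + pdra / 2, pwin + pdra + plos);
	}
	return 0;
}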
I need to take a closer look at that to see if it matches the data I had from the databases. If this mistake is confirmed, the ratings are fine but the errors are not accurate. The good news is that (almost?) nobody has used the simulations so far.
Miguel
-
michiguel
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Question about Ordo
I rechecked the data Adam sent me with statistics of draw rate versus rating difference, and this is what I will use:
Code: Select all
#define DRAWRATE_AT_EQUAL_STRENGTH 0.33
#define DRAWFACTOR (1/(2*(DRAWRATE_AT_EQUAL_STRENGTH))-0.5)
static void
get_pWDL(double dr /*delta rating*/, double *pw, double *pd, double *pl)
{
// Performance comprises wins and draws.
// if f is expected performance from 0 to 1.0, then
// f = pwin + pdraw/2
// from that, dc is the fraction of points that come from draws, not wins, so
// pdraw (probability of draw) = 2 * f * dc
// calculation of dc is an empirical formula to fit average data from CCRL:
// Draw rate of equal engines is near 0.33, and decays on uneven matches.
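// Sanity check of the constants: at dr = 0, dc = 0.5/(0.5 + DRAWFACTOR)
// = DRAWRATE_AT_EQUAL_STRENGTH, so pdra = 2 * 0.5 * 0.33 = 0.33 by construction.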
double f, dc, pdra, pwin, plos;
bool_t switched;
switched = dr < 0;
if (switched) dr = -dr;
f = xpect (0,dr);
dc = 0.5 / (0.5 + DRAWFACTOR * exp(dr*BETA));
pdra = 2 * f * dc;
pwin = f - pdra/2;
plos = 1 - pwin - pdra;
if (switched) {
*pw = plos;
*pd = pdra;
*pl = pwin;
} else {
*pw = pwin;
*pd = pdra;
*pl = plos;
}
return;
}
Miguel