Mathematically there seem to be two ways of pinning elo with the Ordo approach:
(1) Assume that the pinned engines have played a large number of games among each other with the expected score.
(2) Drop the elos of the pinned engines from the equations, and likewise drop the equations that match the scores of the pinned engines. In that way you keep the same number of variables and equations.
It is not clear to me if both methods are mathematically equivalent.
For BE there is no problem. BE maximizes a certain function (the likelihood function). Pinning just changes that function.
Ordo v0.7
Moderators: hgm, Rebel, chrisw
-
- Posts: 1539
- Joined: Thu Mar 09, 2006 2:02 pm
Re: Ordo v0.7
Just my 2 cents: 0.x is purely academic. The only "persons" who are looking at ratings are humans, and humans can't distinguish 10 Elo (my personal border), not to talk about 0.1. Even for engine development 1 Elo +/- is more than enough. Besides that, to calculate the Elo internally you can use as many digits as you like and you would be off 0.9 Elo max. Showing 0.x is "comic" (at least to me) as us humans can't see/feel/taste that.
michiguel wrote:
It is not comic if you play 80k games or more. At one point, you may have an error of 2 elo points or so. Generally, it is good to have two significant figures for the error, and the value should have the same number of decimals as the error. For those situations, one decimal is the way to go.
With fewer games, integer numbers will probably suffice, but if I have to choose an output, I pick the one that satisfies most situations, and that is what Ordo has now. I could make this variable and add a switch for it, but it would be overkill. If you really want to manipulate the output, Ordo already gives you the chance. Just select the .csv output (comma-separated values). That format is compatible with Excel or any other spreadsheet. Just double click it and format it any way you like.
Miguel
PS: Some of the issues that make BE alter its scale, in terms of what elo number equals what, were already discussed a lot. That is the origin of the discrepancy (I believe). I think Kai Laskos (and Michel too) is the one that followed this most closely. For Ordo, if you want to see what x rating points equal what probability to win, you can use the switch -T and you will get a table of probabilities (you can alter this if you want).
But, it doesn't hurt. If you want to do it ...
Yes, your winning-probability conversion is (as expected) about 7 Elo per 1% for the first 10 Elo of difference. I usually use that 7 Elo rule of thumb for close engines while a test is running.
Thanks for the tool again, I usually have a look at my games with it!
Bye
Ingo
-
- Posts: 1971
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Ordo v0.7.
Hello:
I did not expect changes in Ordo ratings between versions 0.6 and 0.7.
Vinvin wrote:
I ran Ordo 0.7 on my latest list ( http://www.talkchess.com/forum/viewtopic.php?t=48738 ):
Comparison to 0.6: 8 points difference at the top and 33 at the bottom.
By the way, I want to ask about this tool for comparing two files. Where can it be downloaded? What is its name? I find it very useful. Thanks in advance.
Sorry for going off-topic.
Regards from Spain.
Ajedrecista.
-
- Posts: 5228
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: Ordo v0.7.
It's an option in "Total Commander" ( http://www.ghisler.com/ ).
Ajedrecista wrote:
Hello:
I did not expect changes in Ordo ratings between versions 0.6 and 0.7.
Vinvin wrote:
I ran Ordo 0.7 on my latest list ( http://www.talkchess.com/forum/viewtopic.php?t=48738 ):
Comparison to 0.6: 8 points difference at the top and 33 at the bottom.
....
By the way, I want to ask about this tool for comparing two files. Where can it be downloaded? What is its name? I find it very useful. Thanks in advance.
Sorry for going off-topic.
Regards from Spain.
Ajedrecista.
Select 2 files and then go to the menu "Files" -> "Compare by content".
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Ordo v0.7
To pin the Elo rating all you have to do (and it's mathematically sound) is to add or subtract a constant from all engines' ratings. For example, if Shredder comes out at 2750 and you want it to be the reference program at 2800, you would add 50 Elo to all programs.
Michel wrote:Mathematically there seem to be two ways of pinning elo with the Ordo approach
(1) Assume that the pinned engines have played a large number of games
among each other with the expected score.
(2) Drop the elos of the pinned engines from the equations, and likewise drop the equations that match the scores of the pinned engines. In that way you keep the same number of variables and equations.
It is not clear to me if both methods are mathematically equivalent.
For BE there is no problem. BE maximizes a certain function (the Likelihood function). Pinning just changes that function.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Ordo v0.7
BayesElo can do that, using the "offset" command. But you can only pin one engine, obviously. So I don't really understand Michel's question. How could you pin several engines without inducing some strange distortion in the model?
Don wrote:
To pin the Elo rating all you have to do (and it's mathematically sound) is to add or subtract a constant from all engines. For example if Shredder comes out at 2750 and you want it to be the reference program at 2800 you would add 50 Elo to all programs.
Michel wrote:Mathematically there seem to be two ways of pinning elo with the Ordo approach
(1) Assume that the pinned engines have played a large number of games
among each other with the expected score.
(2) Drop the elos of the pinned engines from the equations, and likewise drop the equations that match the scores of the pinned engines. In that way you keep the same number of variables and equations.
It is not clear to me if both methods are mathematically equivalent.
For BE there is no problem. BE maximizes a certain function (the Likelihood function). Pinning just changes that function.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Ordo v0.7
Got it.
Michel wrote:
Typically you know the elo of the foreign engines you use for testing very accurately. After all, they may have played millions of games. I would simply like to prefeed that elo information to Ordo or BayesElo when running a new test.
I think I guess what you mean, but I am not sure. Could you give an example to illustrate what you need?
Currently I do this by having one large pgn that contains all tests I ever ran with the same set of foreign engines. But this is becoming very unwieldy.
So to give the requested example. Assume that X,Y,Z are foreign engines and a,b,c,d,e,f,g,... are test versions.
The information I have is a pgn with
X,Y,Z,a,b,c,d,e,f,g (*)
I run a test "h versus X,Y,Z"
To get accurate elo information I run say Ordo on
X,Y,Z,a,b,c,d,e,f,g,h
and consult the result.
What I would like to do is to prefeed Ordo the elo of X,Y,Z (known from (*)) and then run Ordo on
X,Y,Z,h
This won't be entirely the same of course, since the elo of X,Y,Z is not completely known (there are some small error bars remaining), but it would be good enough to compare different test versions, which typically have much larger error bars.
This is an old file I have on my computer (plus I added four "fake" games: engine x drew twice against spark, and twice against crafty). The rating is
Code: Select all
# ENGINE : RATING POINTS PLAYED (%)
1 spark : 2355.4 9320.0 16002 58.2%
2 toga-1.4 : 2321.1 8550.0 16000 53.4%
3 Gaviota_ke152 : 2319.4 21115.0 40000 52.8%
4 texel-1.01 : 2318.6 8493.0 16000 53.1%
5 glaurung-2.2 : 2317.7 8473.0 16000 53.0%
6 x : 2315.0 2.0 4 50.0%
7 Gaviota_853-rt2 : 2305.0 20298.5 40000 50.7%
8 fruit-051103 : 2293.4 7919.0 16000 49.5%
9 daydreamer-1.5x : 2291.8 7883.5 16000 49.3%
10 Gaviota_853 : 2288.2 19346.5 40000 48.4%
11 komodo-1.3-4s : 2283.6 7696.5 16000 48.1%
12 scorpio-2.7 : 2279.5 7603.5 16000 47.5%
13 Gaviota_851 : 2275.1 18607.5 40000 46.5%
14 crafty-23.4 : 2274.6 7493.5 16002 46.8%
15 critter-1.0-32-3s : 2261.8 7202.5 16000 45.0%
Code: Select all
# ENGINE : RATING POINTS PLAYED (%)
1 spark : 2350.0 9320.0 16002 58.2%
2 toga-1.4 : 2306.0 8550.0 16000 53.4%
3 Gaviota_ke152 : 2304.3 21115.0 40000 52.8%
4 texel-1.01 : 2303.5 8493.0 16000 53.1%
5 glaurung-2.2 : 2302.6 8473.0 16000 53.0%
6 x : 2300.0 2.0 4 50.0%
7 Gaviota_853-rt2 : 2289.9 20298.5 40000 50.7%
8 fruit-051103 : 2278.2 7919.0 16000 49.5%
9 daydreamer-1.5x : 2276.7 7883.5 16000 49.3%
10 Gaviota_853 : 2273.1 19346.5 40000 48.4%
11 komodo-1.3-4s : 2268.5 7696.5 16000 48.1%
12 scorpio-2.7 : 2264.4 7603.5 16000 47.5%
13 Gaviota_851 : 2260.0 18607.5 40000 46.5%
14 crafty-23.4 : 2250.0 7493.5 16002 46.8%
15 critter-1.0-32-3s : 2246.7 7202.5 16000 45.0%
Code: Select all
# ENGINE : RATING ERROR POINTS PLAYED (%)
1 spark : 2350.0 ---- 9320.0 16002 58.2%
2 toga-1.4 : 2306.0 5.2 8550.0 16000 53.4%
3 Gaviota_ke152 : 2304.3 3.8 21115.0 40000 52.8%
4 texel-1.01 : 2303.5 5.5 8493.0 16000 53.1%
5 glaurung-2.2 : 2302.6 5.0 8473.0 16000 53.0%
6 x : 2300.0 348.0 2.0 4 50.0%
7 Gaviota_853-rt2 : 2289.9 3.7 20298.5 40000 50.7%
8 fruit-051103 : 2278.2 5.8 7919.0 16000 49.5%
9 daydreamer-1.5x : 2276.7 5.0 7883.5 16000 49.3%
10 Gaviota_853 : 2273.1 3.9 19346.5 40000 48.4%
11 komodo-1.3-4s : 2268.5 5.6 7696.5 16000 48.1%
12 scorpio-2.7 : 2264.4 5.0 7603.5 16000 47.5%
13 Gaviota_851 : 2260.0 3.7 18607.5 40000 46.5%
14 crafty-23.4 : 2250.0 ---- 7493.5 16002 46.8%
15 critter-1.0-32-3s : 2246.7 5.8 7202.5 16000 45.0%
This hack was done by inserting at the beginning:
Code: Select all
{
int prefed = 0; // number of pins
int j;
for (j = 0; j < N_players; j++) {
Prefed[j] = FALSE;
if (!strcmp(Name[j], "spark") ) {
Prefed[j] = TRUE;
Ratingof[j] = 2350;
}
if (!strcmp(Name[j], "crafty-23.4") ) {
Prefed[j] = TRUE;
Ratingof[j] = 2250;
}
if (Prefed[j]) prefed++;
}
}
Code: Select all
for (j = 0; j < N_players; j++) {
if (Prefed[j]) continue;
.... // main calculation here, where the ratings are adjusted step by step
}
Miguel
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: Ordo v0.7
I think you should first read the example I have presented in my post.
lucasart wrote:
But you can only pin one engine, obviously. So I don't really understand Michel's question. How could you pin several engines without inducing some strange distortion in the model?
(in short: you know the elo of foreign engines accurately and you don't want to throw that information away for every new test)
For BayesElo there is no theoretical problem. Just fill in the pinned elos as constants in the likelihood function and maximize over the unpinned elos. The way MLE works implies that the matches between pinned engines (if there are such) will simply be ignored.
For Ordo it was initially a bit less clear to me what the correct theoretical solution is (I still need to read Miguel's post below). I was proposing to make the elos of the pinned engines constants and drop the corresponding equations for the scores. This is what you get if you think of the Ordo model as a "drawless" BayesElo model.
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: Ordo v0.7
First of all thanks for implementing this!
If I understand correctly, you are pinning these at an elo difference of 100, which is quite far from their measured difference (80 elo) in this pgn.
michiguel wrote:
Then I re-run this, "pinning" spark to 2350 and crafty to 2250.
This would of course only make sense if you had a lot more games somewhere else which indicated that the elo difference was really 80.
Yes, I see! That corresponds to the implementation I have in mind. I think it is theoretically the correct solution if you think of Ordo as implementing a "drawless" MLE estimator.
michiguel wrote:
I can do this just by converting the elo of crafty and spark to a constant, not a parameter. In each iteration, when it comes time to "adjust" crafty's or spark's rating, it just doesn't. That means it is treated as a constant.
Yes, of course. As I said above, pinning spark and crafty at 80 only makes sense if you have other information.
michiguel wrote:
So, spark and crafty will have 2350 and 2250, and the rest of the engines will adjust around these numbers. Engine x behaves as expected (it will be exactly at the average of both engines), but the rest is not that simple: they cannot compress so easily, since they have played each other so many games that it keeps them at a certain distance from each other.
-
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Ordo v0.7
Yes, I chose something different to observe the effect, particularly on "engine x".
Michel wrote:
First of all, thanks for implementing this!
If I understand correctly, you are pinning these at an elo difference of 100, which is quite far from their measured difference (80 elo) in this pgn.
michiguel wrote:
Then I re-run this, "pinning" spark to 2350 and crafty to 2250.
Yes, it would save reloading and recalculating the whole thing.
Michel wrote:
This would of course only make sense if you had a lot more games somewhere else which indicated that the elo difference was really 80.
Miguel