https://sites.google.com/site/gaviotach ... e/releases
Nothing spectacularly new, but I had several small changes and clean-ups done a while ago. If anybody used the "group connections" switch, the output is better. In addition, I fixed a recent bug in the simulations (thanks Michel), so I figured I'd release this.
Miguel
Ordo v0.7
Moderators: hgm, Rebel, chrisw
Re: Ordo v0.7
michiguel wrote:https://sites.google.com/site/gaviotach ... e/releases
Nothing spectacularly new, but I had several small changes and clean-ups done a while ago. If anybody used the "group connections" switch, the output is better. In addition, I fixed a recent bug in the simulations (thanks Michel), so I figured I'd release this.
Miguel
I'm scratching my head. I have never heard of this engine before. Does it just play FRC? If not, what is its rating and how have I missed it in rating lists? Enlighten me a bit please.
Thanks,
george
Re: Ordo v0.7
It is a program that computes Elo ratings, like Bayeselo and Elostat.
Re: Ordo v0.7
Adam Hair wrote:It is a program that computes Elo ratings, like Bayeselo and Elostat.
Thanks Adam. Problem solved.
Best,
george
Re: Ordo v0.7
One thing I am sorely missing in both Ordo and BayesElo is the ability to "pin" the elo of certain engines (for testing with foreign engines obviously).
With the Ordo approach it would mean that you assume that the pinned engines have played a very high number of games among themselves with the expected score.
In BayesElo it is just a question of modifying the LL computation. I guess I could do that myself but I have no time.
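To make the "modifying the LL computation" idea concrete, here is a minimal sketch of a maximum-likelihood Elo fit in which some engines are pinned. This is hypothetical illustration code, not BayesElo's or Ordo's actual implementation; draws enter simply as score 0.5 and the draw model is ignored. Ratings of pinned engines are held fixed while gradient ascent on the log-likelihood moves only the free ratings.

```python
import math

def fit_elo(games, ratings, pinned, iters=5000, lr=200.0):
    """Toy ML Elo fit with pinned ratings (a sketch, not BayesElo's code).
    games: (i, j, score_of_i) tuples with score in {0, 0.5, 1};
    ratings: dict name -> starting Elo; pinned: names held fixed."""
    r = dict(ratings)
    c = math.log(10) / 400.0              # scale factor of the gradient
    for _ in range(iters):
        grad = {k: 0.0 for k in r}
        for i, j, s in games:
            # expected score of i against j under the logistic Elo curve
            p = 1.0 / (1.0 + 10.0 ** ((r[j] - r[i]) / 400.0))
            grad[i] += (s - p) * c
            grad[j] -= (s - p) * c
        for k in r:
            if k not in pinned:           # pinned engines never move
                r[k] += lr * grad[k]
    return r

# Example: h scores 2.5/3 against X, which is pinned at 2800.
games = [("h", "X", 1), ("h", "X", 1), ("h", "X", 0.5)]
fitted = fit_elo(games, {"X": 2800.0, "h": 2800.0}, pinned={"X"})
# fitted["X"] stays 2800; fitted["h"] converges to 2800 + 400*log10(5), about 3080
```

The only change relative to an ordinary fit is the `if k not in pinned` guard, which is exactly the kind of small modification to the likelihood maximization Michel describes.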
Re: Ordo v0.7
Thx for the new version, I usually check from time to time how the IPON would look with something else:
ORDO:
Code:
# ENGINE : RATING POINTS PLAYED (%)
1 Houdini 3 STD : 3091.0 2262.0 2850 79.4%
2 Komodo CCT : 3055.5 2161.0 2850 75.8%
3 Stockfish 3 : 2994.0 1965.0 2850 68.9%
4 Critter 1.4a : 2987.3 1942.0 2850 68.1%
5 Deep Rybka 4.1 : 2964.8 1863.5 2850 65.4%
6 Gull 2.1 : 2953.2 1822.0 2850 63.9%
7 Chiron 1.5 : 2844.3 1408.5 2850 49.4%
8 Protector 1.5.0 : 2838.7 1386.5 2850 48.6%
9 Naum 4.2 : 2834.7 1371.0 2850 48.1%
10 Hannibal 1.3 : 2821.0 1317.5 2850 46.2%
11 Deep Fritz 13 32b : 2818.8 1309.0 2850 45.9%
12 HIARCS 14 WCSC 32b : 2812.9 1286.0 2850 45.1%
13 Deep Shredder 12 : 2800.0 1236.0 2850 43.4%
14 Deep Sjeng c't 2010 32b : 2777.7 1150.5 2850 40.4%
15 Spike 1.4 32b : 2763.1 1095.5 2850 38.4%
16 spark-1.0 : 2755.7 1067.5 2850 37.5%
17 Deep Junior 13.3 : 2743.9 1024.0 2850 35.9%
18 Booot 5.2.0 : 2732.7 983.0 2850 34.5%
19 Quazar 0.4 : 2724.7 954.0 2850 33.5%
20 Toga II 3.0 32b : 2708.1 895.5 2850 31.4%
Bayeselo mm 0 1:
Code:
Rank Name Elo + - games score oppo. draws
1 Houdini 3 STD 3071 10 10 2850 79% 2835 27%
2 Komodo CCT 3036 10 10 2850 76% 2837 32%
3 Stockfish 3 2976 9 9 2850 69% 2840 42%
4 Critter 1.4a 2974 10 9 2850 68% 2840 40%
5 Deep Rybka 4.1 2952 9 9 2850 65% 2841 42%
6 Gull 2.1 2944 9 9 2850 64% 2841 41%
7 Chiron 1.5 2844 9 9 2850 49% 2847 42%
8 Protector 1.5.0 2837 9 9 2850 49% 2847 45%
9 Naum 4.2 2832 9 9 2850 48% 2847 41%
10 Hannibal 1.3 2820 9 9 2850 46% 2848 42%
11 Deep Fritz 13 32b 2818 9 9 2850 46% 2848 40%
12 HIARCS 14 WCSC 32b 2812 9 9 2850 45% 2848 40%
13 Deep Shredder 12 2800 9 9 2850 43% 2849 41%
14 Deep Sjeng c't 2010 32b 2778 9 9 2850 40% 2850 41%
15 Spike 1.4 32b 2764 9 9 2850 38% 2851 41%
16 spark-1.0 2756 9 10 2850 37% 2851 39%
17 Deep Junior 13.3 2743 10 10 2850 36% 2852 34%
18 Booot 5.2.0 2737 9 9 2850 34% 2852 38%
19 Quazar 0.4 2726 10 10 2850 33% 2853 36%
20 Toga II 3.0 32b 2711 10 10 2850 31% 2854 36%
That is a remarkable difference.
Thx again
Ingo
PS: Elo given with 0.x precision is a bit comic (why not 0.xx or 0.xxx), just round properly to one full Elo point.
Re: Ordo v0.7
IWB wrote:Thx for the new version, I usually check from time to time how the IPON would look with something else: [...]
That is a remarkable difference. [...]
PS: Elo given with 0.x precision is a bit comic (why not 0.xx or 0.xxx), just round properly to one full Elo point.
It is not comic if you play 80k games or more. At that point, you may have an error of 2 Elo points or so. Generally, it is good to have two significant figures for the error, and the value should have the same number of decimals as the error. For those situations, one decimal is the way to go.
With fewer games, integer numbers will probably suffice, but if I have to choose one output, I pick the one that satisfies most situations, and that is what Ordo has now. I could make this variable and add a switch for it, but it would be overkill. If you really want to manipulate the output, Ordo already gives you the chance: just select .csv (comma-separated values) output. That format is compatible with Excel or any other spreadsheet. Just double-click it and format it any way you like.
Miguel
PS: Some of the issues that make BE alter its scale, in terms of which Elo number corresponds to what, have already been discussed a lot. That is the origin of the discrepancy (I believe). I think Kai Laskos (and Michel too) followed this most closely. For Ordo, if you want to see what x rating points correspond to in winning probability, you can use the -T switch and you will get a table of probabilities (which you can alter if you want).
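The rating-to-probability relation behind such a table can be illustrated with the standard logistic Elo curve. This is a sketch for intuition only; the exact curve and constants Ordo's -T table uses may differ.

```python
# Expected score for a rating difference d under the standard logistic
# Elo curve (Ordo's -T output may use a different curve or constants).
def expected_score(d):
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))

# A small difference-to-probability table in the spirit of "ordo -T".
for d in (0, 50, 100, 200, 400):
    print(f"{d:4d} -> {expected_score(d):.3f}")
```

Under this curve a 0-point difference gives an expected score of 0.500, and a 400-point difference gives 10/11, about 0.909.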
Re: Ordo v0.7
Michel wrote:One thing I am sorely missing in both Ordo and BayesElo is the ability to "pin" the elo of certain engines (for testing with foreign engines, obviously).
With the Ordo approach it would mean that you assume that the pinned engines have played a very high number of games among themselves with the expected score.
In BayesElo it is just a question of modifying the LL computation. I guess I could do that myself but I have no time.
I think I can guess what you mean, but I am not sure. Could you give an example to illustrate what you need?
Miguel
- Full name: Vincent Lejeune
Re: Ordo v0.7
I ran Ordo 0.7 on my latest list ( http://www.talkchess.com/forum/viewtopic.php?t=48738 ) :
Comparison to 0.6 : 8 points difference at the top and 33 at the bottom :
Re: Ordo v0.7
michiguel wrote:I think I can guess what you mean, but I am not sure. Could you give an example to illustrate what you need?
Typically you know the Elo of the foreign engines you use for testing very accurately; after all, they may have played millions of games. I would simply like to prefeed that Elo information to Ordo or BayesElo when running a new test.
Currently I do this by keeping one large pgn that contains all tests I have ever run with the same set of foreign engines, but this is becoming very unwieldy.
So, to give the requested example: assume that X,Y,Z are foreign engines and a,b,c,d,e,f,g,... are test versions.
The information I have is a pgn with
X,Y,Z,a,b,c,d,e,f,g (*)
I run a test "h versus X,Y,Z"
To get accurate elo information I run say Ordo on
X,Y,Z,a,b,c,d,e,f,g,h
and consult the result.
What I would like to do is to prefeed Ordo the elo of X,Y,Z (known from (*)) and then run Ordo on
X,Y,Z,h
This won't be entirely the same, of course, since the Elo of X,Y,Z is not completely known (some small error bars remain), but it would be good enough to compare different test versions, which typically have much larger error bars.
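The Ordo-style workaround described earlier (treat the pinned engines as if they "have played a very high number of games among themselves with the expected score") could be sketched as a preprocessing step. `virtual_anchor_games` is a hypothetical helper, not an existing Ordo or BayesElo feature: it generates heavily weighted pseudo-results among the pinned engines so that any Elo fitter is forced to reproduce their known rating gaps.

```python
def virtual_anchor_games(known_elos, n=100000):
    """Generate pseudo-results among pinned engines (hypothetical helper).
    known_elos: dict name -> known Elo.
    Returns (i, j, score_of_i, weight) tuples, one per engine pair,
    with the score set to the expected score implied by the known Elos
    and the weight set to n virtual games."""
    names = sorted(known_elos)
    out = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            i, j = names[a], names[b]
            d = known_elos[i] - known_elos[j]
            s = 1.0 / (1.0 + 10.0 ** (-d / 400.0))  # expected score of i vs j
            out.append((i, j, s, n))
    return out

# Usage: anchor X, Y, Z at their known ratings before fitting h's games.
pseudo = virtual_anchor_games({"X": 2800, "Y": 2900, "Z": 3000})
# Three weighted pair results: (X,Y), (X,Z), (Y,Z), 100000 games each.
```

Feeding these pseudo-games into the fit alongside the real "h versus X,Y,Z" results would approximate pinning: the huge virtual sample overwhelms anything the new games could say about the gaps among X, Y, Z.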