Ordo v0.7


michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Ordo v0.7

Post by michiguel »

https://sites.google.com/site/gaviotach ... e/releases

Nothing spectacularly new, but I had several small changes and clean-ups done a while ago. If anybody used the "group connections" switch, the output is better. In addition, I fixed a recent bug in the simulations (thanks Michel), so I figured I would release this.

Miguel
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: Ordo v0.7

Post by geots »

michiguel wrote:https://sites.google.com/site/gaviotach ... e/releases

Nothing spectacularly new, but I had several small changes and clean-ups done a while ago. If anybody used the "group connections" switch, the output is better. In addition, I fixed a recent bug in the simulations (thanks Michel), so I figured I would release this.

Miguel



I'm scratching my head. I have never heard of this engine before. Does it just play FRC? If not, what is its rating and how have I missed it in rating lists? Enlighten me a bit please.


Thanks,

george
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Ordo v0.7

Post by Adam Hair »

It is a program that computes Elo ratings, like Bayeselo and Elostat.
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: Ordo v0.7

Post by geots »

Adam Hair wrote:It is a program that computes Elo ratings, like Bayeselo and Elostat.





Thanks Adam. Problem solved.


Best,

george
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Ordo v0.7

Post by Michel »

One thing I am sorely missing in both Ordo and BayesElo is the ability to "pin" the elo of certain engines (for testing with foreign engines obviously).

With the Ordo approach it would mean that you assume that the pinned engines have played a very high number of games among themselves with the expected score.

In BayesElo it is just a question of modifying the LL computation. I guess I could do that myself but I have no time.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Ordo v0.7

Post by IWB »

Thx for the new version. I usually check from time to time how the IPON would look with something else:

ORDO:


   # ENGINE                     : RATING    POINTS  PLAYED    (%)
   1 Houdini 3 STD              : 3091.0    2262.0    2850   79.4%
   2 Komodo CCT                 : 3055.5    2161.0    2850   75.8%
   3 Stockfish 3                : 2994.0    1965.0    2850   68.9%
   4 Critter 1.4a               : 2987.3    1942.0    2850   68.1%
   5 Deep Rybka 4.1             : 2964.8    1863.5    2850   65.4%
   6 Gull 2.1                   : 2953.2    1822.0    2850   63.9%
   7 Chiron 1.5                 : 2844.3    1408.5    2850   49.4%
   8 Protector 1.5.0            : 2838.7    1386.5    2850   48.6%
   9 Naum 4.2                   : 2834.7    1371.0    2850   48.1%
  10 Hannibal 1.3               : 2821.0    1317.5    2850   46.2%
  11 Deep Fritz 13 32b          : 2818.8    1309.0    2850   45.9%
  12 HIARCS 14 WCSC 32b         : 2812.9    1286.0    2850   45.1%
  13 Deep Shredder 12           : 2800.0    1236.0    2850   43.4%
  14 Deep Sjeng c't 2010 32b    : 2777.7    1150.5    2850   40.4%
  15 Spike 1.4 32b              : 2763.1    1095.5    2850   38.4%
  16 spark-1.0                  : 2755.7    1067.5    2850   37.5%
  17 Deep Junior 13.3           : 2743.9    1024.0    2850   35.9%
  18 Booot 5.2.0                : 2732.7     983.0    2850   34.5%
  19 Quazar 0.4                 : 2724.7     954.0    2850   33.5%
  20 Toga II 3.0 32b            : 2708.1     895.5    2850   31.4%

Bayeselo mm 0 1:


Rank Name                      Elo    +    - games score oppo. draws 
   1 Houdini 3 STD            3071   10   10  2850   79%  2835   27% 
   2 Komodo CCT               3036   10   10  2850   76%  2837   32% 
   3 Stockfish 3              2976    9    9  2850   69%  2840   42% 
   4 Critter 1.4a             2974   10    9  2850   68%  2840   40% 
   5 Deep Rybka 4.1           2952    9    9  2850   65%  2841   42% 
   6 Gull 2.1                 2944    9    9  2850   64%  2841   41% 
   7 Chiron 1.5               2844    9    9  2850   49%  2847   42% 
   8 Protector 1.5.0          2837    9    9  2850   49%  2847   45% 
   9 Naum 4.2                 2832    9    9  2850   48%  2847   41% 
  10 Hannibal 1.3             2820    9    9  2850   46%  2848   42% 
  11 Deep Fritz 13 32b        2818    9    9  2850   46%  2848   40% 
  12 HIARCS 14 WCSC 32b       2812    9    9  2850   45%  2848   40% 
  13 Deep Shredder 12         2800    9    9  2850   43%  2849   41% 
  14 Deep Sjeng c't 2010 32b  2778    9    9  2850   40%  2850   41% 
  15 Spike 1.4 32b            2764    9    9  2850   38%  2851   41% 
  16 spark-1.0                2756    9   10  2850   37%  2851   39% 
  17 Deep Junior 13.3         2743   10   10  2850   36%  2852   34% 
  18 Booot 5.2.0              2737    9    9  2850   34%  2852   38% 
  19 Quazar 0.4               2726   10   10  2850   33%  2853   36% 
  20 Toga II 3.0 32b          2711   10   10  2850   31%  2854   36%
That is a remarkable difference.

Thx again
Ingo

PS: Elo given with 0.x precision is a bit comical (why not 0.xx or 0.xxx); just round properly to one full Elo point.
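
To put a number on the difference between the two lists, one can compare the gap between first and last place in each table. A minimal Python sketch, with the four ratings copied from the tables above (the variable names are only for illustration):

```python
# Top and bottom ratings copied from the two tables above.
ordo_top, ordo_bottom = 3091.0, 2708.1    # Houdini 3 STD, Toga II 3.0 32b (Ordo)
bayes_top, bayes_bottom = 3071.0, 2711.0  # same two engines (Bayeselo)

# Spread between first and last place in each list.
ordo_spread = ordo_top - ordo_bottom
bayes_spread = bayes_top - bayes_bottom

# How much wider the Ordo list is relative to the Bayeselo list.
ratio = ordo_spread / bayes_spread

print(f"Ordo spread:     {ordo_spread:.1f}")
print(f"Bayeselo spread: {bayes_spread:.1f}")
print(f"Ratio:           {ratio:.3f}")
```

The Ordo list comes out roughly 6% wider than the Bayeselo list, while engines in the middle (Chiron 1.5, Deep Shredder 12) agree to within a point, so the difference looks like a change of scale rather than a change of order.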
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Ordo v0.7

Post by michiguel »

IWB wrote: Thx for the new version. I usually check from time to time how the IPON would look with something else:

[...]

That is a remarkable difference.

Thx again
Ingo

PS: Elo given with 0.x precision is a bit comical (why not 0.xx or 0.xxx); just round properly to one full Elo point.

It is not comical if you play 80k games or more. At that point, you may have an error of 2 Elo points or so. Generally, it is good to have two significant figures for the error, and the value should have the same number of decimals as the error. For those situations, one decimal is the way to go.
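
The rule of thumb above (two significant figures for the error, and the same number of decimals for the value) can be sketched in Python. This is only an illustration: the function names are made up, and nothing here is taken from Ordo's source.

```python
import math


def decimals_for_error(error: float, sig_figs: int = 2) -> int:
    """Number of decimals needed to show `error` with `sig_figs` significant figures."""
    if error <= 0:
        raise ValueError("error must be positive")
    exponent = math.floor(math.log10(error))
    decimals = sig_figs - 1 - exponent
    # Rounding can push the error up a power of ten (9.96 -> 10.0),
    # which would show one significant figure too many; drop a decimal then.
    if math.floor(math.log10(round(error, decimals))) > exponent:
        decimals -= 1
    # Never print a negative number of decimals; fall back to integers.
    return max(decimals, 0)


def format_rating(value: float, error: float) -> str:
    """Format a rating and its error with a matching number of decimals."""
    d = decimals_for_error(error)
    return f"{value:.{d}f} +/- {error:.{d}f}"


print(format_rating(3091.04, 2.0))   # error of about 2 Elo: one decimal
print(format_rating(3071.4, 10.0))   # error of about 10 Elo: integers
```

With an error near 2 Elo the rating gets one decimal, and with an error near 10 Elo it is printed as an integer, which matches the two cases discussed in this thread.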

With fewer games, integers will probably suffice, but if I have to choose one output format, I pick the one that satisfies most situations, and that is what Ordo has now. I could make this variable and add a switch for it, but that would be overkill. If you really want to manipulate the output, Ordo already gives you the chance: just select .csv output (comma-separated values). That format is compatible with Excel or any other spreadsheet. Just double-click the file and format it any way you like.

Miguel
PS: Some of the issues that make BE alter its scale, in terms of what Elo number equals what, have already been discussed a lot. That is the origin of the discrepancy (I believe). I think Kai Laskos (and Michel too) is the one who followed this most closely. For Ordo, if you want to see what x rating points mean in terms of probability to win, you can use the switch -T and you will get a table of probabilities (you can alter this if you want).
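
For readers who want to see what such a table looks like, here is a minimal Python sketch of the conventional logistic Elo curve with a 400-point scale. It is an assumption that this matches the defaults behind Ordo's -T table; the function name and the chosen rating differences are illustrative only. With draws in play, the number is an expected score rather than a strict probability of winning.

```python
def expected_score(diff: float, scale: float = 400.0) -> float:
    """Expected score for a player rated `diff` points above the opponent
    (conventional logistic Elo curve; assumed, not taken from Ordo)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))


# Print a small table: rating difference -> expected score in percent.
for diff in (0, 10, 50, 100, 200, 400):
    print(f"{diff:4d}  {100 * expected_score(diff):5.1f}%")
```

A program that uses a different curve or scale maps the same results onto different Elo numbers, which is the kind of scale discrepancy described above.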
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Ordo v0.7

Post by michiguel »

Michel wrote:One thing I am sorely missing in both Ordo and BayesElo is the ability to "pin" the elo of certain engines (for testing with foreign engines obviously).

With the Ordo approach it would mean that you assume that the pinned engines have played a very high number of games among themselves with the expected score.

In BayesElo it is just a question of modifying the LL computation. I guess I could do that myself but I have no time.

I think I can guess what you mean, but I am not sure. Could you give an example to illustrate what you need?

Miguel
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Ordo v0.7

Post by Vinvin »

I ran Ordo 0.7 on my latest list ( http://www.talkchess.com/forum/viewtopic.php?t=48738 ) :

Comparison to 0.6 : 8 points difference at the top and 33 at the bottom :

[2 images]
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Ordo v0.7

Post by Michel »

michiguel wrote: I think I can guess what you mean, but I am not sure. Could you give an example to illustrate what you need?

Typically you know the Elo of the foreign engines you use for testing very accurately. After all, they may have played millions of games. I would simply like to prefeed that Elo information to Ordo or BayesElo when running a new test.

Currently I do this by having one large pgn that contains all tests I ever ran with the same set of foreign engines. But this is becoming very unwieldy.

So to give the requested example. Assume that X,Y,Z are foreign engines and a,b,c,d,e,f,g,... are test versions.

The information I have is a pgn with

X,Y,Z,a,b,c,d,e,f,g (*)

I run a test "h versus X,Y,Z"

To get accurate elo information I run say Ordo on

X,Y,Z,a,b,c,d,e,f,g,h

and consult the result.

What I would like to do is to prefeed Ordo the elo of X,Y,Z (known from (*)) and then run Ordo on

X,Y,Z,h

This won't be entirely the same, of course, since the Elo of X,Y,Z is not completely known (there are some small error bars remaining), but it would be good enough to compare different test versions, which typically have much larger error bars.
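
The idea of pinning can be sketched in a few lines of Python. This is not how Ordo or BayesElo work internally; it is a minimal illustration under two assumptions: the conventional logistic Elo curve with a 400-point scale, and a rating for h chosen so that its expected total score against the pinned engines equals the score it actually made. The function names, the pinned ratings, and the game results below are all hypothetical.

```python
def expected_score(diff, scale=400.0):
    """Expected score for a rating advantage of `diff` (logistic curve, assumed)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))


def fit_against_pinned(pinned, results, tol=1e-6):
    """Rating of a new engine given fixed ("pinned") opponent ratings.

    pinned:  opponent name -> fixed Elo
    results: opponent name -> (points scored by the new engine, games played)
    """
    total_points = sum(points for points, _ in results.values())
    total_games = sum(games for _, games in results.values())
    if not 0 < total_points < total_games:
        raise ValueError("rating is unbounded for a 0% or 100% score")

    def surplus(rating):
        # Expected points at this rating minus the points actually scored.
        expected = sum(games * expected_score(rating - pinned[name])
                       for name, (_, games) in results.items())
        return expected - total_points

    # The expected score grows with the rating, so bisection finds the root.
    lo = min(pinned.values()) - 4000.0
    hi = max(pinned.values()) + 4000.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if surplus(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical pinned ratings for X, Y, Z and hypothetical results for h.
pinned = {"X": 2800.0, "Y": 2900.0, "Z": 3000.0}
results = {"X": (50.0, 100), "Y": (50.0, 100), "Z": (50.0, 100)}
print(round(fit_against_pinned(pinned, results), 1))
```

Because the ratings of X, Y, Z never move, only the games of h are needed, which is the point of the request. The remaining uncertainty in the pinned ratings is ignored here, as the post itself notes.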