Hello Geert:
countrychess wrote:Code: Select all
CountryChess 02-07 Germany d2
Halle, 2012.11.08 - 2012.11.15
Score SB Oc Gh Qu So Pa Ol Br En Re iC Fe Am
------------------------------------------------------------------------------------
1: Octochess revision 4741 20.5 / 22 205.75 XX =1 11 11 11 =1 11 11 11 1= 11 11
2: Ghost 2.01 14.5 / 22 131.00 =0 XX 10 0= 01 00 11 11 =1 11 11 11
3: Quark v2.35 14.0 / 22 131.25 00 01 XX =0 10 11 =0 11 11 11 11 ==
4: Sorgenkind 0.4 12.5 / 22 119.00 00 1= =1 XX == 1= 00 0= 1= 01 11 11
5: PatzerUCI/FRC 12.0 / 22 116.75 00 10 01 == XX =1 11 0= 11 == 00 11
6: Olithink 5.32 JA 12.0 / 22 115.50 =0 11 00 0= =0 XX 10 11 1= 10 == 11
7: Bringer 1.9 10.0 / 22 95.50 00 00 =1 11 00 01 XX 00 01 =1 11 01
8: EnginMax 5.24 x64 9.5 / 22 86.50 00 00 00 1= 1= 00 11 XX 0= =1 1= 10
9: Resp 0.19 JA 9.0 / 22 74.00 00 =0 00 0= 00 0= 10 1= XX 1= 11 =1
10: iCE 0.3 v2750 8.0 / 22 74.50 0= 00 00 10 == 01 =0 =0 0= XX 01 11
11: Feuerstein 0.4.6.1 6.5 / 22 55.75 00 00 00 00 11 == 00 0= 00 10 XX 11
12: Amy 0.87b DC 3.5 / 22 38.00 00 00 == 00 00 00 10 01 =0 00 00 XX
------------------------------------------------------------------------------------
132 games: +55 =27 -50
OctoChess and Ghost are promoted to the first division; Feuerstein and Amy are relegated to the third division.
Games at the CountryChess blog.
Thanks again for your tournaments and sorry for this long post.
In just a few hours of my spare time, I have written a very clumsy rating programme that is valid only for Round Robin tournaments, where each player/engine has played the same number of games as the others. The programme is very simple, so it supports neither 0% nor 100% scores. Furthermore, it does not take into account the direct results of each engine-engine match: it works only with the overall percentage that each engine scored in the tournament.
The programme itself is very short: fewer than 250 lines of Fortran 95. I do not intend to release it because there is no guarantee of obtaining reasonable ratings; furthermore, it is very limited (Round Robin tournaments only). I wrote it just for fun.
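Although I am not releasing the Fortran source, the general idea behind a performance-based list can be sketched with the usual logistic relation between a score percentage p and an Elo difference, D = -400*log10(1/p - 1). The Python sketch below only illustrates that textbook formula; it is not my actual code, and the function names are invented for the example. It also shows why 0% and 100% scores are unsupported: the logarithm diverges there.

```python
import math

def performance_elo(points, games):
    """Elo performance implied by an overall score percentage,
    using the logistic model p = 1 / (1 + 10**(-D/400))."""
    p = points / games
    if p <= 0.0 or p >= 1.0:
        # log10 diverges here: 0% and 100% have no finite performance.
        raise ValueError("score must be strictly between 0% and 100%")
    return -400.0 * math.log10(1.0 / p - 1.0)

def centered_ratings(points_list, games, desired_mean=0.0):
    """Shift all performances so that their average equals the desired mean."""
    raw = [performance_elo(s, games) for s in points_list]
    offset = desired_mean - sum(raw) / len(raw)
    return [r + offset for r in raw]

# Points of the 12 engines in this 22-game Round Robin:
points = [20.5, 14.5, 14, 12.5, 12, 12, 10, 9.5, 9, 8, 6.5, 3.5]
ratings = centered_ratings(points, 22)
```

This simple formula already lands in the same ballpark as the lists below (the winner comes out around +446 here), but it does not reproduce my programme's output exactly, so take it only as a sketch.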
Taking the tournament of this thread as an example, I compute two different lists: one gets results similar to EloSTAT (as computed by BayesElo), and the other gets results similar to Ordo. I enter the points won by each engine into a text file:
Code: Select all
20.5
14.5
14
12.5
12
12
10
9.5
9
8
6.5
3.5
Then my programme calculates and prints all the ratings in a few milliseconds:
Code: Select all
Elo_ratings_for_Round_Robin_tournaments, ® 2012.
Write down the full name of the Notepad (including .txt), up to 64 characters:
a.txt
Write down the number of engines of the Round Robin tournament (up to 64):
12
Write down the number of games of each engine (up to 400000):
22
Write down your desired mean of ratings:
0
Write down the clock rate of the CPU (in GHz), only for timing the elapsed time
of the calculations:
3
End of the calculations. Elo performances have been saved in Elo_rating_list.txt
file.
Approximated elapsed time: 65 ms.
Thanks for using Elo_ratings_for_Round_Robin_tournaments. Press Enter to exit.
Here are the results:
Code: Select all
Round Robin with 12 engines and 22 games per engine.
Total number of games: 132 games.
405.67 (engine 01).
96.81 (engine 02).
81.08 (engine 03).
36.04 (engine 04).
21.49 (engine 05).
21.49 (engine 06).
-36.09 (engine 07).
-50.64 (engine 08).
-65.37 (engine 09).
-95.68 (engine 10).
-144.54 (engine 11).
-270.25 (engine 12).
Mean of ratings: 0.00 Elo.
Code: Select all
Round Robin with 12 engines and 22 games per engine.
Total number of games: 132 games.
Scale: 1.0909
442.55 (engine 01).
105.61 (engine 02).
88.45 (engine 03).
39.32 (engine 04).
23.45 (engine 05).
23.45 (engine 06).
-39.38 (engine 07).
-55.25 (engine 08).
-71.32 (engine 09).
-104.38 (engine 10).
-157.68 (engine 11).
-294.82 (engine 12).
Mean of ratings: 0.00 Elo.
These ratings are the same as those in the first list, simply multiplied by the scale factor.
I surely did not use the appropriate terms, such as 'scale' (by the way, I gave it the first value that came to my mind, so it is far from a good value), so please do not pay much attention to it.
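In any case, the relation between the two lists is easy to check numerically: multiplying the first printed list by 1.0909 reproduces the second one to within a hundredth of an Elo point (the small leftovers come from the two-decimal rounding of the printed values). A quick Python check, using only the numbers printed above:

```python
# Ratings as printed by the programme: first list, second list, and the scale.
list1 = [405.67, 96.81, 81.08, 36.04, 21.49, 21.49,
         -36.09, -50.64, -65.37, -95.68, -144.54, -270.25]
list2 = [442.55, 105.61, 88.45, 39.32, 23.45, 23.45,
         -39.38, -55.25, -71.32, -104.38, -157.68, -294.82]
scale = 1.0909

# Largest disagreement between scale * list1 and the printed second list:
max_diff = max(abs(scale * a - b) for a, b in zip(list1, list2))
```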
Just for comparison, here are the lists computed with BayesElo (using its EloSTAT algorithm) and with Ordo:
Code: Select all
version 0057.2, Copyright (C) 1997-2010 Remi Coulom.
compiled Apr 5 2012 17:26:01.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under the terms and conditions of the GNU General Public License.
See http://www.gnu.org/copyleft/gpl.html for details.
ResultSet>readpgn 172643.pgn
132 game(s) loaded, 0 game(s) with unknown result ignored.
ResultSet>elo
ResultSet-EloRating>elostat
9 iterations
00:00:00,00
ResultSet-EloRating>ratings
Rank Name Elo Diff + - Games Score Oppo. Draws Win W-L-D
1 Octochess revision 4741 409.05 0.00 145.73 138.75 22 93.18% -37.19 13.64% 86.36% 19-0-3
2 Ghost 2.01 97.62 -311.43 175.51 130.75 22 65.91% -8.87 13.64% 59.09% 13-6-3
3 Quark v2.35 81.75 -15.87 162.18 127.75 22 63.64% -7.43 18.18% 54.55% 12-6-4
4 Sorgenkind 0.4 36.34 -45.41 132.53 119.54 22 56.82% -3.30 31.82% 40.91% 9-6-7
5 PatzerUCI/FRC 21.67 -14.67 134.66 125.40 22 54.55% -1.97 27.27% 40.91% 9-7-6
6 Olithink 5.32 JA 21.67 -0.00 134.66 125.40 22 54.55% -1.97 27.27% 40.91% 9-7-6
7 Bringer 1.9 -36.39 -58.07 141.39 153.44 22 45.45% 3.31 9.09% 40.91% 9-11-2
8 EnginMax 5.24 x64 -51.06 -14.67 127.76 142.82 22 43.18% 4.64 22.73% 31.82% 7-10-5
9 Resp 0.19 JA -65.92 -14.86 122.22 141.15 22 40.91% 5.99 27.27% 27.27% 6-10-6
10 iCE 0.3 v2750 -96.48 -30.56 119.80 149.29 22 36.36% 8.77 27.27% 22.73% 5-11-6
11 Feuerstein 0.4.6.1 -145.75 -49.27 129.95 193.33 22 29.55% 13.25 13.64% 22.73% 5-14-3
12 Amy 0.87b DC -272.50 -126.75 134.56 310.76 22 15.91% 24.77 13.64% 9.09% 2-17-3
ResultSet-EloRating>x
ResultSet>
Code: Select all
[...]\ordo-windows-v0.6>ordo-win32 -a 0 -p 172643.pgn -o Ordo.txt
Loading data (2000 games x dot):
|
Total games 132
White wins 55
Draws 27
Black wins 50
No result 0
Unique head to head 0.00%
Reference rating 0.0 (average of the pool)
Convergence rating calculation
phase iteration deviation resolution
0 3 82.64464 123.84743
1 26 0.00000 0.00000
done
Post-Convergence rating estimation
# ENGINE : RATING POINTS PLAYED (%)
1 Octochess revision 4741 : 447.3 20.5 22 93.2%
2 Ghost 2.01 : 117.2 14.5 22 65.9%
3 Quark v2.35 : 98.3 14.0 22 63.6%
4 Sorgenkind 0.4 : 44.2 12.5 22 56.8%
5 PatzerUCI/FRC : 26.7 12.0 22 54.5%
6 Olithink 5.32 JA : 26.7 12.0 22 54.5%
7 Bringer 1.9 : -42.1 10.0 22 45.5%
8 EnginMax 5.24 x64 : -59.3 9.5 22 43.2%
9 Resp 0.19 JA : -76.7 9.0 22 40.9%
10 iCE 0.3 v2750 : -112.1 8.0 22 36.4%
11 Feuerstein 0.4.6.1 : -168.1 6.5 22 29.5%
12 Amy 0.87b DC : -302.0 3.5 22 15.9%
Doing some numbers:
Code: Select all
Max.(EloSTAT) - min.(EloSTAT) ~ 409.05 - (-272.5) = 681.55
Max.(mine) - min.(mine) ~ 405.67 - (-270.25) = 675.92
681.55/675.92 ~ 1.0083; 675.92/681.55 ~ 0.9917
------------------------
Max.(Ordo) - min.(Ordo) ~ 447.3 - (-302) = 749.3
Max.(mine) - min.(mine) ~ 442.55 - (-294.82) = 737.37
749.3/737.37 ~ 1.0162; 737.37/749.3 ~ 0.9841
As you see, I get results similar to both EloSTAT and Ordo, although it could be good luck; logically, I would get even closer results if the tournament were more balanced (in this case, the winner scored 20.5/22 while the last engine scored only 3.5/22).
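The spread comparison above is just this arithmetic, redone here in a few lines of Python for anyone who wants to reproduce it:

```python
def spread(ratings):
    """Difference between the best and worst rating in a list."""
    return max(ratings) - min(ratings)

# Extreme ratings taken from the lists above.
elostat = spread([409.05, -272.50])   # 681.55
mine_1  = spread([405.67, -270.25])   # 675.92
ordo    = spread([447.30, -302.00])   # 749.30
mine_2  = spread([442.55, -294.82])   # 737.37

ratio_elostat = elostat / mine_1      # ~1.0083
ratio_ordo    = ordo / mine_2         # ~1.0162
```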
I am quite happy with these results, knowing that fewer than thirty lines are dedicated to calculating the ratings! Indeed, my programme is very minimal, and of course I know that BayesElo, EloSTAT and Ordo (listed in alphabetical order) are far superior to my clumsy programme, which I wrote just for fun.
Regards from Spain.
Ajedrecista.