H4 or S5 !?
Moderator: Ras
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: H4 or S5 !?
michiguel wrote:
There is only one correct answer, and that is that SF5 should be #1 (by a very tiny margin, though). Why? This is a round robin, so everybody played each other under the same conditions, and the program that scores the most points overall should be #1. This is one of the cases in which there is no doubt about the relative order. As a reference, in the output of Ordo you can see the actual points (the others give %). Whatever program you use, the relative order should follow the number of points. Basically, SF won this gigantic RR tournament, and should be #1.
1 Stockfish 5 : 3115.1 2473.0 3300 74.9%
2 Houdini 4 : 3111.0 2458.5 3300 74.5%
Miguel
Laskos wrote:
Miguel, I am a bit tired and can't reason clearly. Can you prove that in the case:
Direct matches in RR
A>B
B>C
C>A
and total points in RR are A>B>C, then Elo ratings are always also A>B>C?
michiguel wrote:
It can be demonstrated as long as certain assumptions are respected, like two draws equal one win + loss. But I am not sure I can do it elegantly or understandably by quickly typing here.
For instance, let's try a reductio ad absurdum. Let's assume that the Elo order is EloB > EloA > EloC. In that case, A faced a stronger schedule than B. Both faced C, but the head-to-head match was tougher for A (because EloB > EloA). Consequently, A faced a tougher schedule and still got more points, which means it should have a higher Elo. But that contradicts the initial assumption EloB > EloA > EloC, disproving it. If you keep doing this analysis, you will see that the only reasonable scenario is EloA > EloB > EloC.
Miguel

Still seems to assume transitivity in matches (assumed by Elo scale transitivity). Maybe I will try to disprove your proof with 4 entities, when I get time to build a suitable PGN file for Ordo (I recommend Ordo to every tester).
-
- Posts: 5694
- Joined: Tue Feb 28, 2012 11:56 pm
Re: H4 or S5 !?
To be able to prove anything we must first define what we are talking about.
What is "Elo rating"?
Option 1: A number calculated using some formula (e.g. Elostat, BayesElo, Ordo) based on a bunch of game results.
In this case the answer to the question follows from the formula. Use a different formula and the answer may be different.
I would say that a "good" formula should have the property that, if the set of input games is full RR, then Elo(A) > Elo(B) iff A > B according to the RR (in points).
Clearly BayesElo does not have this property.
Option 2: Elo is a measure of a player's "true chess strength". All numbers that we calculate are estimations of this measure.
In this case it is clearly always possible that Elo(A) > Elo(B) even though in a particular RR B scores more points than A.
We're probably not talking about option 2, but about option 1. So it all depends on the formula used to calculate the Elo rating from game results.
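To make the round-robin property above concrete, here is a minimal Python sketch. It is a simplified illustrative fit, not the actual algorithm of Ordo, EloStat or BayesElo: it assumes a plain logistic curve, no white advantage, draws counted as half a point, and the same number of games between every pair. The names `expected` and `fit_ratings` and the example scores are hypothetical. The example uses the cyclic case discussed in the thread: A beats B 6-4, B beats C 6-4, C beats A 5.5-4.5, so the totals are A 10.5, B 10.0, C 9.5.

```python
def expected(ra, rb):
    """Logistic expected score of a player rated ra against one rated rb."""
    return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))


def fit_ratings(points, games_per_pair, iters=5000, step=10.0):
    """Find ratings whose expected totals match the actual totals.

    points: dict name -> total points scored in a full round robin
    games_per_pair: number of games between each pair (assumed equal)
    Simple iterative adjustment; an assumption of this sketch,
    not the method used by any particular rating tool.
    """
    names = list(points)
    r = {n: 0.0 for n in names}
    for _ in range(iters):
        for n in names:
            exp_total = sum(games_per_pair * expected(r[n], r[m])
                            for m in names if m != n)
            # Raise the rating if the player scored more than expected.
            r[n] += step * (points[n] - exp_total) / games_per_pair
        # Only differences matter, so keep the average at zero.
        mean = sum(r.values()) / len(r)
        for n in names:
            r[n] -= mean
    return r


# Cyclic head-to-head results, but totals are A > B > C.
totals = {"A": 10.5, "B": 10.0, "C": 9.5}
ratings = fit_ratings(totals, games_per_pair=10)
for name in sorted(ratings, key=ratings.get, reverse=True):
    print(name, round(ratings[name], 1))
```

Note that only the totals enter this fit, so under these assumptions the head-to-head cycle cannot reorder the ratings. A formula with other ingredients (a draw model, a prior, a white advantage term) may behave differently.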
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: H4 or S5 !?
syzygy wrote: To be able to prove anything we must first define what we are talking about. [...]
Yes Ron, along the same lines: we have to define what we want. But intuitively a 99:1 result should have less weight than 52:48, as the error margins on the logistic curve are larger in the first case, and statistical weight is 1/(error margin)^2. So not all points are equal if we want the smallest error margins, and that's the goal, I think, of all rating calculations (besides obeying the logistic, which BayesElo does not).
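The weighting argument can be checked with a few lines of Python. This is only a sketch under stated assumptions: 100 games per match, no draws (plain binomial win/loss), and a first-order (delta-method) approximation for the error of the Elo difference. The function names are hypothetical.

```python
import math


def elo_diff(p):
    """Elo difference implied by a score fraction p on the logistic curve."""
    return 400.0 * math.log10(p / (1.0 - p))


def elo_std_error(p, n):
    """Approximate standard error of elo_diff(p) after n win/loss games."""
    se_p = math.sqrt(p * (1.0 - p) / n)               # binomial error of p
    slope = 400.0 / (math.log(10) * p * (1.0 - p))    # d(elo_diff)/dp
    return slope * se_p


for wins, losses in [(99, 1), (52, 48)]:
    n = wins + losses
    p = wins / n
    se = elo_std_error(p, n)
    print(f"{wins}:{losses}  diff={elo_diff(p):.1f}  "
          f"error={se:.1f}  weight={1.0 / se ** 2:.6f}")
```

Under these assumptions the 99:1 match has an error of about 175 Elo against about 35 Elo for the 52:48 match, so its statistical weight is roughly 25 times smaller.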
-
- Posts: 3703
- Joined: Thu Jun 07, 2012 11:02 pm
Re: H4 or S5 !?
Take one of Graham's amateur tourneys where he provides the PGN. Run it through BayesElo or your other tool of choice, and see if the ratings follow the rankings in the tournament. I'm certain I've done this in the past, and the Elo ratings do not always follow the tournament rankings.
-
- Posts: 6227
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: H4 or S5 !?
Modern Times wrote: Take one of Graham's amateur tourneys where he provides the pgn. [...]
It's quite clear that BayesElo does not have the property of ordering the ratings the same as results in a RR, while both EloStat (which has worse problems) and Ordo will always do this. Anyone who wants ratings to be in the same order as RR scores should switch to Ordo. It also has the nice property that rating differences in a simple match will always be the same as the Elo system dictates. With BayesElo, you always have to specify the parameters when talking about a rating difference between two engines.
-
- Posts: 3703
- Joined: Thu Jun 07, 2012 11:02 pm
Re: H4 or S5 !?
lkaufman wrote: It's quite clear that BayesElo does not have the property of ordering the ratings the same as results in a RR [...]
No, I would have been using EloStat back then.
And no, I don't want ratings to be in the same order as RR scores because I think that is a false premise. True in some cases, not in others.
-
- Posts: 2284
- Joined: Sat Jun 02, 2012 2:13 am
Re: H4 or S5 !?
IWB wrote:
IWB wrote: ..., as I am curious I will run the S5 match again with 4pc Syzygy bases....
Test is running. The original setup had this:
Code:
Stockfish 5  3080.0 (2297.0 : 783.0)
  220.0 (127.5 : 92.5)  Houdini 4            3111
  220.0 (121.5 : 98.5)  Komodo 7a            3088
  220.0 (134.0 : 86.0)  Gull 3               3057
  220.0 (150.5 : 69.5)  Critter 1.4a         2980
  220.0 (149.0 : 71.0)  Equinox 2.02         2975
  220.0 (159.5 : 60.5)  Deep Rybka 4.1       2959
  220.0 (176.0 : 44.0)  Deep Fritz 14        2894
  220.0 (170.5 : 49.5)  Chiron 2             2889
  220.0 (181.0 : 39.0)  Protector 1.6.0      2870
  220.0 (168.0 : 52.0)  Hannibal 1.4b        2870
  220.0 (183.0 : 37.0)  Naum 4.2             2838
  220.0 (187.0 : 33.0)  Texel 1.04           2838
  220.0 (187.0 : 33.0)  Senpai 1.0           2838
  220.0 (188.0 : 32.0)  HIARCS 14 WCSC 32b   2812
  220.0 (190.5 : 29.5)  Jonny 6.00           2798
So 74.58% have to be beaten!
Bye
Ingo

Thanks for the re-run, Ingo

CL
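For reference, the 74.58% target follows directly from the overall score quoted for Stockfish 5 (2297.0 : 783.0). A tiny sketch, with a hypothetical helper name:

```python
def score_percent(points_for, points_against):
    """Score as a percentage of all points played."""
    total = points_for + points_against
    return 100.0 * points_for / total


# Overall Stockfish 5 score from the table: 2297.0 : 783.0
print(f"{score_percent(2297.0, 783.0):.2f}%")  # 74.58%
```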
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: H4 or S5 !?
Laskos wrote: Still seems to assume transitivity in matches (assumed by Elo scale transitivity). Maybe I will try to disprove your proof with 4 entities, when I will get time to build a suitable pgn file for Ordo (I recommend Ordo to every tester).
With a concocted PGN file using Ordo, I got
Code:
   # PLAYER  :  RATING  ERROR  POINTS  PLAYED    (%)
   1 4       :  2361.7  100.0     8.0      15  53.3%
   2 3       :  2314.6   99.5     7.5      15  50.0%
   3 2       :  2303.0  102.2     8.0      15  53.3%
   4 1       :  2220.7   96.7     6.5      15  43.3%
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: H4 or S5 !?
The command line in Ordo v0.8 was:
Code:
ordo -p order.pgn -o ratings.txt -s1000 -W
The artificial PGN has the following properties:
Code:
Games       : 30 (finished)
White Wins  : 16 (53.3 %)
Black Wins  : 13 (43.3 %)
Draws       : 1 ( 3.3 %)
Unfinished  : 0
White Perf. : 55.0 %
Black Perf. : 45.0 %
and is listed here:
Code:
[White "1"]
[Black "2"]
[Result "1-0"]
1-0
[White "1"]
[Black "2"]
[Result "1-0"]
1-0
[White "1"]
[Black "2"]
[Result "1-0"]
1-0
[White "1"]
[Black "2"]
[Result "0-1"]
0-1
[White "1"]
[Black "2"]
[Result "0-1"]
0-1
[White "1"]
[Black "3"]
[Result "1-0"]
1-0
[White "1"]
[Black "3"]
[Result "1-0"]
1-0
[White "1"]
[Black "3"]
[Result "1-0"]
1-0
[White "1"]
[Black "3"]
[Result "1/2-1/2"]
1/2-1/2
[White "1"]
[Black "3"]
[Result "0-1"]
0-1
[White "2"]
[Black "3"]
[Result "1-0"]
1-0
[White "2"]
[Black "3"]
[Result "1-0"]
1-0
[White "2"]
[Black "3"]
[Result "1-0"]
1-0
[White "2"]
[Black "3"]
[Result "0-1"]
0-1
[White "2"]
[Black "3"]
[Result "0-1"]
0-1
[White "2"]
[Black "4"]
[Result "1-0"]
1-0
[White "2"]
[Black "4"]
[Result "1-0"]
1-0
[White "2"]
[Black "4"]
[Result "1-0"]
1-0
[White "2"]
[Black "4"]
[Result "0-1"]
0-1
[White "2"]
[Black "4"]
[Result "0-1"]
0-1
[White "3"]
[Black "4"]
[Result "1-0"]
1-0
[White "3"]
[Black "4"]
[Result "1-0"]
1-0
[White "3"]
[Black "4"]
[Result "1-0"]
1-0
[White "3"]
[Black "4"]
[Result "1-0"]
1-0
[White "3"]
[Black "4"]
[Result "0-1"]
0-1
[White "1"]
[Black "4"]
[Result "0-1"]
0-1
[White "1"]
[Black "4"]
[Result "0-1"]
0-1
[White "1"]
[Black "4"]
[Result "0-1"]
0-1
[White "1"]
[Black "4"]
[Result "0-1"]
0-1
[White "1"]
[Black "4"]
[Result "0-1"]
0-1
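The points column of the Ordo output can be cross-checked against a PGN file like the one above with a short Python script. This is only a sketch: it reads just the White, Black and Result tags, assumes every game carries all three, and skips unfinished games. The function name `tally` is hypothetical. It also counts games played with White, since the colours are not balanced in this file.

```python
import re
from collections import defaultdict

# Matches the three tags this sketch cares about, e.g. [White "1"].
TAG = re.compile(r'\[(White|Black|Result)\s+"([^"]*)"\]')
SCORES = {"1-0": (1.0, 0.0), "0-1": (0.0, 1.0), "1/2-1/2": (0.5, 0.5)}


def tally(pgn_text):
    """Return (points, white_games) dicts keyed by player name."""
    points = defaultdict(float)
    white_games = defaultdict(int)
    game = {}
    for name, value in TAG.findall(pgn_text):
        game[name] = value
        if len(game) == 3:  # White, Black and Result seen: one full game
            score = SCORES.get(game["Result"])
            if score is not None:  # skip unfinished games ("*")
                points[game["White"]] += score[0]
                points[game["Black"]] += score[1]
                white_games[game["White"]] += 1
                white_games[game["Black"]] += 0  # make sure the key exists
            game = {}
    return dict(points), dict(white_games)


# Two-game sample in the same layout as the listing above.
sample = '''[White "1"]
[Black "2"]
[Result "1-0"]
1-0
[White "1"]
[Black "3"]
[Result "1/2-1/2"]
1/2-1/2
'''
points, white_games = tally(sample)
print(points)
print(white_games)
```

To check the full file, read it from disk and pass its text to `tally`, for example `tally(open("order.pgn").read())`.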
-
- Posts: 5287
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: H4 or S5 !?
IWB wrote: Test is running. The original setup had this: [...] So 74.58% have to be beaten!
I think 4pc Syzygy will change nothing in the SF strength. So for me, it's a second test with SF at the exact same level, which is interesting too.
BTW, it would be interesting to find one (yes, only one!) game where the 4pc Syzygy changes the final outcome.