lkaufman wrote: A simple way to check this out would be to calculate a rating list including both Houdini Cont. 0 and Houdini Cont. 1 on the same list, and run it with both Ordo and BayesElo.
IWB wrote: What engine do you like to have fixed with what rating?
Bye
Ingo

Well, it doesn't really matter as I'm interested in differences, but to make the difference obvious you could set Houdini 3 contempt 0 to 3000 for both lists. Then we only have to compare the last two digits of the rating of default Houdini 3 on the Ordo and Bayeselo list. I'll predict that the difference will fall by something like 30-40%, but I could be totally wrong.
Komodo run - Ingo list revisited
-
- Posts: 5960
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: Komodo run - Ingo list revisited.
-
- Posts: 1539
- Joined: Thu Mar 09, 2006 2:02 pm
Re: Komodo run - Ingo list revisited.
Here is the result with Bayes and ORDO, I hope this is what you wanted to have. If not please tell me.
For the mathematically impaired user this looks just slightly different. "I" usually consider everything within 10 Elo as equal, as 10 Elo for a human means nothing (and testing beyond it is a waste of energy) - we simply can't feel it!
Bayes mm 0 1:
Code: Select all
1 Houdini 3 STD 3020 10 10 3000 78% 2791 27%
2 K113300 3013 10 9 3150 77% 2801 32%
3 Houdini 3 Con 0 3000 10 10 3000 77% 2791 35%
4 Stockfish 4 2966 9 9 3150 72% 2803 38%
5 Gull 2.2 2929 9 9 3150 67% 2805 40%
6 Critter 1.4a 2928 9 9 3150 66% 2805 41%
7 Deep Rybka 4.1 2901 9 9 3150 63% 2806 42%
8 Hannibal 1.4a 2819 9 9 3150 51% 2810 43%
9 Chiron 1.5 2799 9 9 3150 48% 2811 40%
10 Protector 1.5.0 2792 9 9 3150 47% 2812 44%
11 Naum 4.2 2788 9 9 3150 47% 2812 40%
12 HIARCS 14 WCSC 32b 2769 9 9 3150 44% 2813 40%
13 Deep Shredder 12 2755 9 9 3150 42% 2813 38%
14 Jonny 6.00 2752 9 9 3150 42% 2814 37%
15 Deep Sjeng c't 2010 32b 2735 9 9 3150 40% 2814 39%
16 Spike 1.4 32b 2730 9 9 3150 39% 2815 41%
17 spark-1.0 2718 9 9 3150 37% 2815 38%
18 Deep Junior 13.3 2697 9 9 3150 35% 2816 32%
19 Booot 5.2.0 2695 9 9 3150 34% 2816 37%
20 Quazar 0.4 2687 9 9 3150 33% 2817 35%
21 Zappa Mexico II 2676 9 9 3150 32% 2817 35%
22 Toga II 3.0 32b 2666 9 9 3150 30% 2818 35%
Ordo:
Code: Select all
1 Houdini 3 STD : 3012.2 2349.0 3000 78.3%
2 K113300 : 3010.4 2421.0 3150 76.9%
3 Houdini 3 Con 0 : 3000.0 2315.5 3000 77.2%
4 Stockfish 4 : 2961.1 2257.5 3150 71.7%
5 Gull 2.2 : 2917.1 2097.5 3150 66.6%
6 Critter 1.4a : 2916.2 2094.0 3150 66.5%
7 Deep Rybka 4.1 : 2888.4 1987.0 3150 63.1%
8 Hannibal 1.4a : 2795.9 1609.0 3150 51.1%
9 Chiron 1.5 : 2774.6 1519.5 3150 48.2%
10 Protector 1.5.0 : 2767.6 1490.0 3150 47.3%
11 Naum 4.2 : 2765.2 1480.0 3150 47.0%
12 HIARCS 14 WCSC 32b : 2744.2 1392.0 3150 44.2%
13 Deep Shredder 12 : 2730.8 1336.0 3150 42.4%
14 Jonny 6.00 : 2726.9 1320.0 3150 41.9%
15 Deep Sjeng c't 2010 32b : 2708.7 1245.0 3150 39.5%
16 Spike 1.4 32b : 2703.3 1223.0 3150 38.8%
17 spark-1.0 : 2691.6 1175.5 3150 37.3%
18 Deep Junior 13.3 : 2672.0 1097.5 3150 34.8%
19 Booot 5.2.0 : 2666.0 1074.0 3150 34.1%
20 Quazar 0.4 : 2660.9 1054.0 3150 33.5%
21 Zappa Mexico II : 2647.6 1003.0 3150 31.8%
22 Toga II 3.0 32b : 2636.1 960.0 3150 30.5%
Bye
Ingo
-
- Posts: 5960
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: Komodo run - Ingo list revisited.
IWB wrote: Here is the result with Bayes and ORDO, I hope this is what you wanted to have. If not please tell me.

So the result was a 39% drop in the rating difference between the two contempt values by using ORDO (20 Elo on the BayesElo list versus 12.2 on the ORDO list), within the 30 to 40% I predicted. Note that this 8 Elo drop in the difference has nothing to do with error margins; it reflects a difference between the two rating systems on the same data. I also note that rating differences are in general larger on the ORDO run, so this means the 39% drop is even more significant.
It appears to me that using ORDO would substantially reduce the effect of using contempt, and would produce ratings that are much closer to what they would be if all pairings were within a hundred elo points. But of course we would need more examples than just this one to prove the point.
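As a quick check of the arithmetic behind the 39% figure, here is a minimal Python sketch using only ratings copied from the two lists posted earlier in the thread. The second number divides each gap by the full top-to-bottom spread of its own list; that normalisation is just one assumed way of accounting for the ORDO list being more spread out, not something either tool reports.

```python
# Ratings copied from the BayesElo and Ordo lists posted in this thread.
# Only the two Houdini entries and the bottom engine are needed here.
bayes = {"Houdini 3 STD": 3020.0, "Houdini 3 Con 0": 3000.0, "Toga II 3.0 32b": 2666.0}
ordo = {"Houdini 3 STD": 3012.2, "Houdini 3 Con 0": 3000.0, "Toga II 3.0 32b": 2636.1}


def contempt_gap(ratings):
    """Rating gap between default Houdini 3 and Houdini 3 with contempt 0."""
    return ratings["Houdini 3 STD"] - ratings["Houdini 3 Con 0"]


def spread(ratings):
    """Distance between the highest and lowest rating in the list."""
    return max(ratings.values()) - min(ratings.values())


def relative_drop(before, after):
    """Fraction by which `after` is smaller than `before`."""
    return 1.0 - after / before


# Raw drop: compare the two gaps directly.
raw_drop = relative_drop(contempt_gap(bayes), contempt_gap(ordo))

# Assumed normalisation: express each gap as a share of its list's spread first.
normalised_drop = relative_drop(
    contempt_gap(bayes) / spread(bayes),
    contempt_gap(ordo) / spread(ordo),
)

print(f"raw drop: {raw_drop:.1%}")
print(f"spread-normalised drop: {normalised_drop:.1%}")
```

The raw figure reproduces the 39%, and the normalised figure comes out somewhat larger, which is consistent with the remark that the wider ORDO spread makes the drop more significant.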
So I'm asking the mathematicians reading this if I am correct in saying that BayesElo puts more weight on mismatches than ORDO does, and I'm also asking if any programmers might want to run simulations to test the hypothesis. For example, if we start with a real data set (could be IPON or some other engine rating list), and add a mismatch with a surprising result (for example a 2800 engine scoring 40 out of 100 against a 3000 engine), I would expect the BayesElo rating to be more strongly affected by this than the ORDO rating (i.e. in the given example to rise by more).
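To put a number on how surprising the 40 out of 100 example is, here is a small Python sketch based on the plain logistic Elo curve. This is an assumption made for illustration only: BayesElo and ORDO each fit their own model to the whole data set, and neither fitting procedure is reproduced here, so the sketch says nothing about which tool weights the mismatch more.

```python
import math


def expected_score(rating_diff):
    """Expected score for a player rated `rating_diff` Elo above the opponent,
    using the standard logistic Elo curve (an assumption, see above)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def performance_diff(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


# A 2800 engine meeting a 3000 engine, as in the example above.
expected = expected_score(2800 - 3000)

# Rating the 2800 engine "performed at" by scoring 40 out of 100 against 3000.
implied = 3000 + performance_diff(40 / 100)

print(f"expected score: {expected:.3f}")
print(f"performance rating for 40/100: {implied:.1f}")
```

Under this curve the 2800 engine would be expected to score roughly 24 out of 100, so 40 out of 100 corresponds to a performance well over 100 Elo above its rating. A real simulation would feed such a result into both tools and compare how far each one moves the engine.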
-
- Posts: 3553
- Joined: Thu Jun 07, 2012 11:02 pm
Re: Komodo run - Ingo list revisited.
It also depends on which parameters you use for BayesELO.
-
- Posts: 5960
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: Komodo run - Ingo list revisited.
Modern Times wrote: It also depends on which parameters you use for BayesELO.

I know that this affects the spread of the ratings and the effect of draws, but I don't think it affects the fundamental tendency of Bayeselo to give more weight to mismatches than does ORDO. But not having a math degree, I could be wrong about this.
-
- Posts: 3553
- Joined: Thu Jun 07, 2012 11:02 pm
Re: Komodo run - Ingo list revisited.
lkaufman wrote: But not having a math degree, I could be wrong about this.

Neither do I... but I wonder what
mm 1 1
scale 1
and just
mm 1 1
would produce.
-
- Posts: 5960
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: Komodo run - Ingo list revisited.
lkaufman wrote: But not having a math degree, I could be wrong about this.
Modern Times wrote: Neither do I... but I wonder what
mm 1 1
scale 1
and just
mm 1 1
would produce.

If scale means what it sounds like, then the only effect of setting that value should be to expand or contract the range of ratings by some percentage. This would have nothing to do with the issue being discussed here.
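The "expand or contract by some percentage" reading can be sketched in a few lines of Python. Everything here is an assumption for illustration: the thread does not establish that BayesElo's scale setting is a pure multiplicative stretch, and the `rescale` helper and the factor 1.1 are hypothetical. The ratings are taken from the BayesElo list posted earlier.

```python
def rescale(ratings, factor, anchor):
    """Stretch or shrink every rating's distance from `anchor` by `factor`.
    Hypothetical helper, not BayesElo's actual implementation."""
    return {name: anchor + factor * (r - anchor) for name, r in ratings.items()}


def gap_share(ratings, a, b):
    """Gap between engines a and b as a fraction of the list's total spread."""
    spread = max(ratings.values()) - min(ratings.values())
    return (ratings[a] - ratings[b]) / spread


bayes = {"Houdini 3 STD": 3020.0, "Houdini 3 Con 0": 3000.0, "Toga II 3.0 32b": 2666.0}

# Stretch the list by an arbitrary 10% around the fixed 3000 anchor.
stretched = rescale(bayes, factor=1.1, anchor=3000.0)

before = gap_share(bayes, "Houdini 3 STD", "Houdini 3 Con 0")
after = gap_share(stretched, "Houdini 3 STD", "Houdini 3 Con 0")

print(f"gap as share of spread before: {before:.4f}")
print(f"gap as share of spread after:  {after:.4f}")
```

If scale really works like this, the contempt gap keeps the same share of the list's spread and the ranking order is untouched, which is the sense in which it would not bear on the mismatch question. Running the two settings on the real data would still be the only way to confirm that.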
-
- Posts: 3553
- Joined: Thu Jun 07, 2012 11:02 pm
Re: Komodo run - Ingo list revisited.
Maybe not, but I'd like to see it.