How do you know this without having the played games?
I want to check it out myself too, but it's not possible to download any games from the site.
The IPON BayesElo mystery solved.
Moderators: hgm, Rebel, chrisw, Ras
-
- Posts: 987
- Joined: Mon Jan 05, 2009 7:40 pm
- Location: Germany
- Full name: Engin Üstün
-
- Posts: 1539
- Joined: Thu Mar 09, 2006 2:02 pm
Re: The IPON BayesElo mystery solved.
Engin wrote:
How do you know this without having the played games?
I want to check it out myself too, but it's not possible to download any games from the site.
Hi Engin,
We met in Thuringia, so you know me, and when I included Tornado you asked about the games. I explained in a personal mail why I do not deliver them ...
Nonetheless, for all statistical purposes you will find a result.pgn in the individual.7z file. What is done here is done with exactly that file, as all the statistical information you need is in there.
Bye and have a nice weekend
Ingo
-
- Posts: 60
- Joined: Thu Nov 05, 2009 9:53 pm
Re: The IPON BayesElo mystery solved.
Ingo Bauer wrote:
Code: Select all
Default:
4 Komodo 4 SSE42  2975  2500.0 (1892.5 : 607.5)
Games  Score          Opponent                  Rating  Perf.
100.0 ( 51.5 : 48.5)  Houdini 2.0 STD           3016    3026
100.0 ( 45.0 : 55.0)  Critter 1.4 SSE42         2977    2942
100.0 ( 51.5 : 48.5)  Deep Rybka 4.1 SSE42      2956    2966
100.0 ( 53.5 : 46.5)  Critter 1.2               2952    2976
100.0 ( 52.5 : 47.5)  Stockfish 2.1.1 JA        2941    2958
100.0 ( 65.5 : 34.5)  Chiron 1.1a               2833    2944
100.0 ( 69.5 : 30.5)  Naum 4.2                  2827    2970
100.0 ( 70.0 : 30.0)  Fritz 13 32b              2819    2966
100.0 ( 68.0 : 32.0)  Deep Shredder 12          2800    2930
100.0 ( 75.0 : 25.0)  Gull 1.2                  2795    2985
100.0 ( 79.0 : 21.0)  Deep Sjeng c't 2010 32b   2788    3018
100.0 ( 77.0 : 23.0)  Spike 1.4 32b             2785    2994
100.0 ( 78.5 : 21.5)  Protector 1.4.0           2759    2983
100.0 ( 80.0 : 20.0)  Hannibal 1.1              2758    2998
100.0 ( 85.0 : 15.0)  spark-1.0 SSE42           2755    3056
100.0 ( 87.5 : 12.5)  HIARCS 13.2 MP 32b        2748    3086
100.0 ( 83.5 : 16.5)  Deep Junior 12.5          2731    3012
100.0 ( 88.5 : 11.5)  Zappa Mexico II           2716    3070
100.0 ( 90.5 :  9.5)  Deep Onno 1-2-70          2684    3075
100.0 ( 90.5 :  9.5)  Strelka 2.0 B             2671    3062
100.0 ( 87.5 : 12.5)  Umko 1.2 SSE42            2664    3002
100.0 ( 88.0 : 12.0)  Loop 2007                 2621    2967
100.0 ( 89.5 : 10.5)  Jonny 4.00 32b            2614    2986
100.0 ( 93.0 :  7.0)  Tornado 4.80              2608    3057
100.0 ( 92.5 :  7.5)  Crafty 23.3 JA            2598    3034
Aver.                                                   3003

DrawElo:
4 Komodo 4 SSE42 mm01  2982  2500.0 (1892.5 : 607.5)
Games  Score          Opponent                  Rating  Perf.
100.0 ( 51.5 : 48.5)  Houdini 2.0 STD           3023    3033
100.0 ( 45.0 : 55.0)  Critter 1.4 SSE42         2984    2949
100.0 ( 51.5 : 48.5)  Deep Rybka 4.1 SSE42      2962    2972
100.0 ( 53.5 : 46.5)  Critter 1.2               2958    2982
100.0 ( 52.5 : 47.5)  Stockfish 2.1.1 JA        2947    2964
100.0 ( 65.5 : 34.5)  Chiron 1.1a               2834    2945
100.0 ( 69.5 : 30.5)  Naum 4.2                  2828    2971
100.0 ( 70.0 : 30.0)  Fritz 13 32b              2820    2967
100.0 ( 68.0 : 32.0)  Deep Shredder 12          2800    2930
100.0 ( 75.0 : 25.0)  Gull 1.2                  2794    2984
100.0 ( 79.0 : 21.0)  Deep Sjeng c't 2010 32b   2787    3017
100.0 ( 77.0 : 23.0)  Spike 1.4 32b             2783    2992
100.0 ( 78.5 : 21.5)  Protector 1.4.0           2756    2980
100.0 ( 80.0 : 20.0)  Hannibal 1.1              2755    2995
100.0 ( 85.0 : 15.0)  spark-1.0 SSE42           2752    3053
100.0 ( 87.5 : 12.5)  HIARCS 13.2 MP 32b        2744    3082
100.0 ( 83.5 : 16.5)  Deep Junior 12.5          2726    3007
100.0 ( 88.5 : 11.5)  Zappa Mexico II           2711    3065
100.0 ( 90.5 :  9.5)  Deep Onno 1-2-70          2677    3068
100.0 ( 90.5 :  9.5)  Strelka 2.0 B             2663    3054
100.0 ( 87.5 : 12.5)  Umko 1.2 SSE42            2655    2993
100.0 ( 88.0 : 12.0)  Loop 2007                 2610    2956
100.0 ( 89.5 : 10.5)  Jonny 4.00 32b            2603    2975
100.0 ( 93.0 :  7.0)  Tornado 4.80              2597    3046
100.0 ( 92.5 :  7.5)  Crafty 23.3 JA            2587    3023
Aver.                                                   3000

I was curious what my idea of a weighted average would say about this data. After some thinking, I decided that in this case a simple product of points (points won times points lost in each match) would be a good weight, so Houdini would weigh nearly four times as much as Crafty. After a quick computation in a spreadsheet, I got a weighted average of 2990.233 for the 'default' table and 2989.976 for 'drawelo'. These averages lie roughly between the BayesElo values and the simple averages.
So I went ahead and used the square of the product of points as a new weight. For the 'default' table I got a weighted average of 2981.444, and for 'drawelo' it was 2982.835. This works well for the 'drawelo' table, and not so well for the 'default' table.
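For anyone who wants to redo the spreadsheet step, here is a minimal Python sketch of both weightings on the 'default' table. The function name `weighted_performance` and the `power` parameter are made up for illustration; the rows are the (points, performance) pairs from the quoted table, assuming 100 games per match.

```python
# (points scored by Komodo in a 100-game match, performance rating),
# one pair per opponent, copied from the quoted 'default' table.
DEFAULT_TABLE = [
    (51.5, 3026), (45.0, 2942), (51.5, 2966), (53.5, 2976), (52.5, 2958),
    (65.5, 2944), (69.5, 2970), (70.0, 2966), (68.0, 2930), (75.0, 2985),
    (79.0, 3018), (77.0, 2994), (78.5, 2983), (80.0, 2998), (85.0, 3056),
    (87.5, 3086), (83.5, 3012), (88.5, 3070), (90.5, 3075), (90.5, 3062),
    (87.5, 3002), (88.0, 2967), (89.5, 2986), (93.0, 3057), (92.5, 3034),
]


def weighted_performance(rows, games=100.0, power=1):
    """Average the per-opponent performances with weight
    (points won * points lost) ** power.

    power=0 is the plain average, power=1 the simple product,
    power=2 the squared product.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for points, perf in rows:
        weight = (points * (games - points)) ** power
        total_weight += weight
        weighted_sum += weight * perf
    return weighted_sum / total_weight


if __name__ == "__main__":
    for power in (0, 1, 2):
        avg = weighted_performance(DEFAULT_TABLE, power=power)
        print(f"power={power}: {avg:.3f}")
```

With `power=1` this gives about 2990.233 and with `power=2` about 2981.444, matching the spreadsheet numbers; `power=0` gives the plain average (3002.52, shown rounded as 3003 in the table).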
It would be nice to have a post-hoc explanation of why the squared product is a better weight. The simple product would correspond to sigma (of the performance, from the binomial distribution) being proportional to 1/sqrt(n*p*(1-p)). But what is the reason for the additional weighting? All I came up with is the hypothesis that the rating difference between the (future) average and an opponent is less precise when there is a gap. So not only does the Crafty match carry four times less information about the performance, but the rating of Crafty is also four times less precise than the rating of Houdini when compared to the 3000-ish future average.
But I am not convinced. It is possible that draws really play a significant role and the square weighted average works just by coincidence, for this data.
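A short sketch of where the 1/sqrt(n*p*(1-p)) claim for the simple product comes from, assuming the plain logistic Elo curve (no draw model) and first-order error propagation. The symbols d (rating difference), c and w (weight) are introduced here only for illustration:

```latex
\begin{align*}
p &= \frac{1}{1 + 10^{-d/400}}, &
\frac{\partial p}{\partial d} &= c\,p(1-p), \quad c = \frac{\ln 10}{400} \\
\sigma_p &= \sqrt{\frac{p(1-p)}{n}}, &
\sigma_d &\approx \frac{\sigma_p}{\partial p / \partial d}
          = \frac{1}{c\,\sqrt{n\,p(1-p)}} \\
w &\propto \frac{1}{\sigma_d^{2}} \propto n\,p(1-p)
\end{align*}
```

With n = 100 in every match, the product of points (np)(n(1-p)) is proportional to this inverse-variance weight. The squared product has no such direct justification in this sketch.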
-
- Posts: 60
- Joined: Thu Nov 05, 2009 9:53 pm
Re: The IPON BayesElo mystery solved.
H.G.Muller wrote:
For one win and one loss the likelihood becomes
F(x-h) * (1 - F(x+h)) = (F - hF') * (1 - F - hF') + O(h^2)
= F * (1-F) - hF' * (1-F) - hF' * F + O(h^2)
= F * (1-F) - hF' + O(h^2)
(all F and F' taken in x unless specified otherwise).
Now since F * (1-F) is proportional to F' and of O(1), this means that the shape of the likelihood distribution is F' up to an error of O(h^2). And with Bayes only the shape counts, not the normalization factor (which is indeed 1+O(h), but also O(h) for the draws).
Why stop at small values of h? Defining E = 10^(1/400) = e^(ln(10)/400) (hopefully ok), we have F(x) = 1/(1+E^-x) = 1-F(-x). So for BayesElo, with drawelo = h and the opponent being x behind, the likelihood of a win is F(x-h) and the likelihood of a loss is F(-x-h), and we have:
Code: Select all
draw likelihood = 1 - F(-x-h) - F(x-h)
= 1 - F(-x-h) - F(x-h) + F(-x-h) * F(x-h) - F(-x-h) * F(x-h) // add and subtract the same term
= (1-F(-x-h)) * (1-F(x-h)) - F(-x-h) * F(x-h)
= F(x+h) * F(-x+h) - F(-x-h) * F(x-h)
= 1 / ((1 + E^-x * E^-h) * (1 + E^x * E^-h)) - 1 / ((1 + E^x * E^h) * (1 + E^-x * E^h))
= 1 / (1 + E^-x * E^-h + E^x * E^-h + E^-2h) - 1 / (1 + E^x * E^h + E^-x * E^h + E^2h)
= E^h / (E^h + E^-x + E^x +E^-h) - E^-h / (E^-h + E^x + E^-x + E^h)
= (E^h - E^-h) / (E^h + E^-x + E^x + E^-h)
= (1 - E^-2h) * F(x+h) * F(-x+h) = (E^2h - 1) * F(-x-h) * F(x-h)
So, 'draw = win + loss' holds only when the (computed) rating difference is way above drawelo. Conversely, when drawelo is way above the rating difference, draws do not count at all (as their likelihood is not sensitive to such a difference).
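The algebra in the code block can be checked numerically. The sketch below assumes E = 10^(1/400), so that F is the usual Elo logistic curve; the function names are made up here, and h = 97.3 is just an illustrative drawelo value.

```python
import math

E = 10 ** (1 / 400)  # assumed base, so F(400) = 10/11


def F(x):
    """Logistic curve F(x) = 1 / (1 + E^-x)."""
    return 1.0 / (1.0 + E ** (-x))


def draw_direct(x, h):
    """Draw likelihood as 1 - P(loss) - P(win)."""
    return 1.0 - F(-x - h) - F(x - h)


def draw_closed_form(x, h):
    """(E^h - E^-h) / (E^h + E^-x + E^x + E^-h)."""
    return (E**h - E**-h) / (E**h + E**-x + E**x + E**-h)


def draw_as_win_times_loss(x, h):
    """(E^2h - 1) * F(-x-h) * F(x-h), i.e. a constant times win * loss."""
    return (E ** (2 * h) - 1.0) * F(-x - h) * F(x - h)


if __name__ == "__main__":
    h = 97.3  # illustrative drawelo value
    for x in (-300.0, 0.0, 50.0, 400.0):
        a = draw_direct(x, h)
        b = draw_closed_form(x, h)
        c = draw_as_win_times_loss(x, h)
        print(f"x={x:7.1f}  direct={a:.6f}  closed={b:.6f}  win*loss form={c:.6f}")
        assert math.isclose(a, b, rel_tol=1e-9)
        assert math.isclose(a, c, rel_tol=1e-9)
```

The three expressions agree to floating-point precision for every x and h tried, which is what the last line of the derivation says.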