lkaufman wrote:
Daniel Shawul wrote:
lkaufman wrote:
I'd like to propose a related question. Let's say that we want to know whether A or B is stronger, i.e. which would win in a direct match. Version A scores 55% against a foreign gauntlet, B scores 56%, so B is 7 elo stronger according to normal Elo calculations and to Ordo. But let's say that A ends up rated 7 elo higher according to Bayeselo (which I believe can and does happen sometimes, due to differing draw rates and to which programs each scored better or worse against). Should you bet your money on A or on B in a direct match? Aside from just expressing opinions, does anyone have any data that would help answer this question?
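The "7 elo" figure follows from the usual logistic Elo curve, where a score p corresponds to a rating difference of -400·log10(1/p - 1). The sketch below assumes each gauntlet score can be treated as a result against a single opponent of fixed average strength, which is a simplification; the function name `elo_from_score` is hypothetical and not taken from Ordo or Bayeselo.

```python
import math

def elo_from_score(p: float) -> float:
    """Elo difference implied by an expected score p on the logistic Elo curve."""
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)

elo_a = elo_from_score(0.55)  # version A against the gauntlet
elo_b = elo_from_score(0.56)  # version B against the same gauntlet
print(f"A: {elo_a:+.1f}  B: {elo_b:+.1f}  B - A: {elo_b - elo_a:.1f}")
```

A 55% score maps to roughly +35 and 56% to roughly +42, so the one-point gap in score is worth about 7 Elo near the middle of the curve.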
The problem with your question is that you hope Ordo may bring an improvement when it uses inferior algorithms. It simply can't. Someone should first do an analysis of what improvements, if any, Ordo brings. Remi did such a comparison against the state of the art (EloStat at the time) when he first introduced bayeselo:
http://remi.coulom.free.fr/Bayesian-Elo/ . The improvements of bayeselo are there for everyone to see. Nothing like that from the Ordo guys, aside from spreading 'misconceptions' (now admitted by Adam) about bayeselo to look good. They know it is inferior, so their only chance is FUD (thanks Michel).
And don't use scale=1.
EloStat was no good because it made the unsound assumption that you can average the ratings of opposing engines and get a meaningful number. Ordo (I believe) corrects that flaw.
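Why averaging opponents' ratings is unsound can be shown with two hypothetical opponents: because the Elo curve is nonlinear, the expected score against the average rating is not the average of the expected scores. This is a minimal illustration with made-up ratings, not a reproduction of EloStat's actual code.

```python
def expected_score(r_self: float, r_opp: float) -> float:
    """Logistic Elo expected score of a player rated r_self against r_opp."""
    return 1.0 / (1.0 + 10.0 ** ((r_opp - r_self) / 400.0))

# Hypothetical field: a player rated 0 meets one opponent at 0 and one at 800.
opponents = [0.0, 800.0]
me = 0.0

# Sound: average the expected scores against each opponent individually.
per_game = sum(expected_score(me, r) for r in opponents) / len(opponents)

# Unsound shortcut: collapse the opponents into their average rating first.
avg_rating = sum(opponents) / len(opponents)
shortcut = expected_score(me, avg_rating)

print(f"per-opponent average:    {per_game:.3f}")
print(f"average-rating shortcut: {shortcut:.3f}")
```

The per-opponent average is about 0.255, while the shortcut predicts only about 0.091, so a rating fitted through the shortcut would be badly off for this field.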
Ordo did not correct that error; bayeselo did, as clearly pointed out on its homepage. So give credit where credit is due! Yes, Ordo has reinvented the wheel yet again, but it did not give you any improvements. Why wouldn't the author give clear information about what improvements, if any, there are?
So does Bayeselo. The fact that both are clearly superior to EloStat does not give us any information on which of the two is superior. Bayeselo treats two draws very differently from a win and a loss; Ordo (I believe) does not. The question comes down to whether this different treatment, which is justified by a theoretical model, is actually justified with real-world data. It cannot be answered by abstract arguments, only by actually doing comparisons with real data.
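The difference in how the two approaches treat two draws versus a win and a loss can be made concrete. The sketch assumes the Bayeselo-style model P(win) = f(Δ − drawelo), P(loss) = f(−Δ − drawelo), P(draw) = 1 − P(win) − P(loss) with f(x) = 1/(1 + 10^(−x/400)), white advantage set to zero and no prior; the score-based model counts a draw as half a point and is meant only as a generic stand-in, not Ordo's exact code. The value `DRAWELO = 100` is hypothetical. For each model it prints how much less likely the results become when the rating gap moves away from zero. This shows only that the models differ in principle; it says nothing about which fits real data better.

```python
import math

def f(x: float) -> float:
    """Logistic Elo curve: 1 / (1 + 10^(-x/400))."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))

def bayeselo_loglik(results, delta, drawelo):
    """Log-likelihood of results ('W', 'D', 'L' for the first player) under a
    Bayeselo-style draw model with no white advantage and no prior."""
    p_win = f(delta - drawelo)
    p_loss = f(-delta - drawelo)
    probs = {"W": p_win, "D": 1.0 - p_win - p_loss, "L": p_loss}
    return sum(math.log(probs[r]) for r in results)

def score_loglik(results, delta):
    """Score-based model: a game worth s points adds s*ln(E) + (1-s)*ln(1-E)."""
    e = f(delta)
    points = {"W": 1.0, "D": 0.5, "L": 0.0}
    return sum(points[r] * math.log(e) + (1.0 - points[r]) * math.log(1.0 - e)
               for r in results)

DRAWELO = 100.0  # hypothetical draw parameter, for illustration only

def bayes(results, delta):
    """Bayeselo-style log-likelihood with the illustrative DRAWELO fixed."""
    return bayeselo_loglik(results, delta, DRAWELO)

def drop(loglik, results, delta):
    """Change in log-likelihood when the rating gap moves from 0 to delta."""
    return loglik(results, delta) - loglik(results, 0.0)

for delta in (50.0, 100.0, 200.0):
    print(f"delta={delta:5.0f}  "
          f"Bayeselo W+L {drop(bayes, ['W', 'L'], delta):+.4f}  "
          f"D+D {drop(bayes, ['D', 'D'], delta):+.4f}  |  "
          f"score W+L {drop(score_loglik, ['W', 'L'], delta):+.4f}  "
          f"D+D {drop(score_loglik, ['D', 'D'], delta):+.4f}")
```

In the score-based model the W+L and D+D columns are identical, so the two outcomes carry the same information. In the Bayeselo-style model the D+D likelihood falls off faster (roughly twice as fast at this drawelo), i.e. two draws are taken as stronger evidence that the players are equal than one win and one loss.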
You are making way too many uninformed assumptions and conclusions, and probably consider yourself an expert now. News flash: you are not, and please take no offence. You say that the assumption that 1 win and 1 loss is not the same as 2 draws has an effect, but I have tested that and another model, and it barely has an effect. CCRL blitz, CCRL 40/40 and CEGT data were used.
I'm asking whether anyone has attempted to do so. My hunch is that the Bayeselo assumption is less correct than the standard (Ordo) one, because I've seen strange results (both in real data and in simulations) for Bayeselo that seem wrong to me intuitively. But I'm perfectly willing to admit that I'm wrong if there is data to prove so.
Yes, you are wrong. You claim way too many things you don't understand. Your problem is staring you in the face: scale = 1. If you don't like the draw ratio, you can get rid of it by using mm 0 0... So what now? It should then be the same as Ordo and EloStat, which don't have it.
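Why scale=1 matters for comparisons with Ordo can be sketched under one assumption: that Bayeselo's default scale rescales raw model differences so that, near equality, a rating gap predicts the same change in expected score as on the plain Elo curve. In the draw model the expected score is E(Δ) = [f(Δ − d) + f(Δ + d)]/2, whose slope at Δ = 0 is f′(d), so the ratio to the plain slope f′(0) is 4·f(d)·(1 − f(d)). The code checks that closed form numerically; `slope_scale` and `DRAWELO = 100` are hypothetical names and values, not Bayeselo's own.

```python
def f(x: float) -> float:
    """Logistic Elo curve: 1 / (1 + 10^(-x/400))."""
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))

def expected_score(delta: float, drawelo: float) -> float:
    """Expected points per game in the draw model: P(win) + P(draw)/2."""
    return 0.5 * (f(delta - drawelo) + f(delta + drawelo))

def slope_scale(drawelo: float) -> float:
    """Closed-form ratio of the draw model's slope at 0 to the plain Elo slope."""
    fd = f(drawelo)
    return 4.0 * fd * (1.0 - fd)

DRAWELO = 100.0  # hypothetical draw parameter
h = 1e-3         # step for central finite differences

s = slope_scale(DRAWELO)
numeric = (expected_score(h, DRAWELO) - expected_score(-h, DRAWELO)) / (2 * h)
plain = (f(h) - f(-h)) / (2 * h)
print(f"closed form: {s:.5f}  numeric: {numeric / plain:.5f}")
```

Since the factor is below 1, raw (scale=1) rating differences come out larger than score-based Elo differences for the same results, so comparing unscaled Bayeselo output with Ordo is not like for like.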
I would also like to add that, based on HGM's explanation, if Bayeselo is superior to Ordo, it implies that the scoring system used in chess is wrong. I'm just not sure what scoring system for wins, draws, and losses would be consistent (or most nearly so) with Bayeselo. Does anyone know?
Too many questions. First make sure you understand that you shouldn't use scale=1. I don't know what HGM explanation you are talking about, but I am sure he said nothing close to "the bayeselo scoring system is wrong" or anything to that effect.