CCRL LOS stats

Post by **CRoberson** » Tue Oct 18, 2011 6:39 am

I've often thought some of the numbers were a bit off. Tonight I saw a case of what seems to be a LOS value that is too high.

I scanned the CCRL pages for an explanation of their equations and couldn't find anything, other than that they might be getting the values from bayeselo. A review of the bayeselo pages didn't explain the calculations either.

Here is what I noticed:

Code:

Rank  Program                      Elo    +     −     Score   Draws   Games   LOS
3     Critter 1.2 64-bit 4CPU      3268   +29   −29   61.2%   47.7%   369
                                                                              85.0%
4     Stockfish 2.1.1 64-bit 4CPU  3247   +29   −29   53.7%   46.6%   352

An 85% LOS seemed high to me. So, I did my own calculations based on fundamental classic probability. All of this is based on the premise that a rating of r +/- m means that there is equal chance that the true rating is anywhere between r-m and r+m. Here are my equations:

Code:

      Critter Range (58 pts)
                -29                            +29
          |--------------------|------------|-----------|
        3239                  3268         3276        3297
    
    
       
       Stockfish Range (58 pts)
                 -29                            +29
          |------------|--------|-----------------------|
         3218        3239      3247                    3276
     
     
     
     Total Range (79)
                                   Overlap
             |-----21-----|----------37--------------|--21----|
             |------------|---------|-------|--------|--------|
            3218         3239      3247    3268    3276     3297
        

         
  P(Critter in 3239-3276(Overlap) Range)   = (3276-3239)/58 = 37/58 = 63.8%
  P(StockFish in 3239-3276(Overlap) Range) = (3276-3239)/58 = 37/58 = 63.8%
  P(both in Overlap Range) = 0.638*0.638 * 100 = 40.7%
  P(Either in extreme case) = 1 - P(both in Overlap) = 100-40.7 = 59.3%  --> Part of Critter is stronger
     
  P(equal strength) = P(both in Overlap) * (P matching scores) = 40.7% * 1/37 = 1.1%
  P(unequal strengths with both in overlap) = 40.7% - 1.1% = 39.6%
     
  Given that random chance would give equal chances to either program being stronger than the other once both are in the overlap range,
   P(Stockfish stronger (Given both in Overlap)) = 39.6%/2 = 19.8%
   P(Critter stronger (Given both in Overlap)) = 39.6%/2 = 19.8%

   All of this sets up two equations to calculate the probability of Critter being stronger than Stockfish.
      
   P(Critter Stronger) = 1 - P(Stockfish Stronger) - P(equal strength) 
                               = 100 - 19.8 - 1.1 = 79.1%
   P(Critter Stronger) = P(Either in Extreme case) + P(Critter Stronger(Given both in Overlap)) 
                               = 59.3% + 19.8% = 79.1%

Both equations give the same value of 79.1%, which does not match CCRL's 85%. So, where is the mistake? Is it in my math, theirs, or both? Or maybe a slight coding error?
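The uniform-premise arithmetic above can be cross-checked numerically. The sketch below is only an illustration under that same premise (each true rating uniformly distributed over r-m to r+m, and the two ratings independent); the function name and the integration step count are made up for this example. It integrates the probability that Stockfish's rating falls below each possible Critter rating.

```python
def p_first_stronger_uniform(r1, m1, r2, m2, steps=100_000):
    """P(X > Y) for independent X ~ U[r1-m1, r1+m1] and Y ~ U[r2-m2, r2+m2].

    Midpoint-rule integration over X of the uniform CDF of Y.
    The uniform premise is the poster's assumption, not CCRL's model.
    """
    lo1, hi1 = r1 - m1, r1 + m1
    lo2, hi2 = r2 - m2, r2 + m2
    dx = (hi1 - lo1) / steps
    total = 0.0
    for i in range(steps):
        x = lo1 + (i + 0.5) * dx
        # P(Y < x) for a uniform Y, clipped to [0, 1]
        cdf_y = min(1.0, max(0.0, (x - lo2) / (hi2 - lo2)))
        total += cdf_y * dx
    return total / (hi1 - lo1)


p_uniform = p_first_stronger_uniform(3268, 29, 3247, 29)
print(f"P(Critter stronger | uniform premise) = {p_uniform:.1%}")
```

Under these assumptions the result is 1 - 37²/(2·58²), about 79.7%. That is slightly above the 79.1% derived by hand, because with continuous ratings the probability of exactly equal strength is zero rather than 1.1%. Either way, the uniform premise lands well below 85%.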

Post by **Kirill Kryukov** » Tue Oct 18, 2011 7:14 am

Hi Charles,

CRoberson wrote:I scanned the CCRL pages for an explanation of their equations and couldn't find anything other than they might be getting them from bayeselo.

Yes, our LOS numbers are computed by bayeselo.

CRoberson wrote:So, I did my own calculations based on fundamental classic probability. All of this is based on the premise that a rating of r +/- m means that there is equal chance that the true rating is anywhere between r-m and r+m.

r +/- m does not mean this. Instead, these are the boundaries of a 95% confidence interval. So, "3268 +29 −29" means that there is a 95% estimated probability that the true rating is within the [3268-29, 3268+29] interval. The true rating is not uniformly distributed over the interval, but has a normal, bell-shaped distribution.
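To see how much the normal interpretation changes the result, here is a rough sketch. It assumes that each ±29 is a 95% interval of a normal distribution (so sigma = 29 / 1.96) and that the two rating estimates are independent. Both are simplifying assumptions made for this illustration; bayeselo's actual computation may differ, for instance in how it handles correlation between the two estimates. The function name `los_normal` is hypothetical.

```python
import math


def los_normal(r1, m1, r2, m2, z95=1.96):
    """Approximate P(rating1 > rating2) under independent normal errors.

    m1 and m2 are treated as 95% half-widths, so sigma = m / 1.96.
    This is an illustrative approximation, not bayeselo's method.
    """
    s1 = m1 / z95
    s2 = m2 / z95
    sd_diff = math.hypot(s1, s2)   # std. dev. of the rating difference
    z = (r1 - r2) / sd_diff
    # Standard normal CDF evaluated at z
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


los = los_normal(3268, 29, 3247, 29)
print(f"Approximate LOS: {los:.1%}")
```

With the numbers from the table this gives roughly 84%, much closer to the reported 85.0% than the uniform-premise figure.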

Post by **Rein Halbersma** » Tue Oct 18, 2011 8:18 am

Kirill Kryukov wrote:Hi Charles,

CRoberson wrote:I scanned the CCRL pages for an explanation of their equations and couldn't find anything other than they might be getting them from bayeselo.
Yes, our LOS numbers are computed by bayeselo.

CRoberson wrote:So, I did my own calculations based on fundamental classic probability. All of this is based on the premise that a rating of r +/- m means that there is equal chance that the true rating is anywhere between r-m and r+m.
r +/- m does not mean this. Instead, these are the boundaries of a 95% confidence interval. So, "3268 +29 −29" means that there is a 95% estimated probability that the true rating is within the [3268-29, 3268+29] interval. The true rating is not uniformly distributed over the interval, but has a normal, bell-shaped distribution.

Actually it means that there is less than 5% probability that you would have observed these outcomes if the true rating lies outside the [3268 - 29, 3268 + 29] confidence interval.

This is a statement of the form P(data|rating), whereas your statement was of the form P(rating|data). That statement requires prior unconditional distributions P(rating) and P(data), as well as the application of Bayes' theorem: P(rating|data) = P(data|rating) * P(rating) / P(data).
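As a toy illustration of going from P(data|rating) to P(rating|data), the sketch below applies Bayes' theorem on a grid of Elo differences. Everything in it is an assumption made for the example: the win/loss counts are hypothetical (they are not CCRL data), draws are ignored, the prior is flat over -400 to +400 Elo, and the likelihood uses the standard logistic Elo curve. It is not a description of what bayeselo does.

```python
import math


def posterior_elo_diff(wins, losses, lo=-400, hi=400, step=1):
    """Grid posterior over the Elo difference d, using a flat prior.

    P(d | data) = P(data | d) * P(d) / P(data), where P(data) is the
    normalising sum over the grid. Draws are ignored (an assumption).
    """
    grid = list(range(lo, hi + 1, step))
    weights = []
    for d in grid:
        p_win = 1.0 / (1.0 + 10.0 ** (-d / 400.0))   # logistic Elo curve
        log_like = wins * math.log(p_win) + losses * math.log(1.0 - p_win)
        weights.append(math.exp(log_like))            # flat prior: P(d) constant
    evidence = sum(weights)                           # plays the role of P(data)
    return grid, [w / evidence for w in weights]


# Hypothetical head-to-head result, chosen only for illustration
grid, post = posterior_elo_diff(wins=60, losses=40)
los_bayes = sum(w for d, w in zip(grid, post) if d > 0)
print(f"P(d > 0 | data) = {los_bayes:.3f}")
```

A different prior P(rating) would give a different posterior from the same data, which is why the two kinds of statement are not interchangeable.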
