I scanned the CCRL pages for an explanation of their equations and couldn't find anything, other than that they might be getting them from bayeselo. A review of the bayeselo pages didn't explain the calculations either.
Here is what I noticed:
Rank  Program                      Elo   +    -    Score  Draws  Games  LOS
3     Critter 1.2 64-bit 4CPU      3268  +29  −29  61.2%  47.7%  369
                                                                        85.0%
4     Stockfish 2.1.1 64-bit 4CPU  3247  +29  −29  53.7%  46.6%  352
Critter Range (58 pts)
             -29                          +29
|----------------------------|-------|--------------------|
3239                         3268    3276                 3297

Stockfish Range (58 pts)
             -29                          +29
|--------------------|-------|----------------------------|
3218                 3239    3247                         3276

Total Range (79 pts)
                                    Overlap
|---------21---------|-----------------37-----------------|---------21---------|
|--------------------|-------|--------------------|-------|--------------------|
3218                 3239    3247                 3268    3276                 3297

P(Critter in 3239-3276 (Overlap) Range) = (3276-3239)/58 = 37/58 = 63.8%
P(Stockfish in 3239-3276 (Overlap) Range) = (3276-3239)/58 = 37/58 = 63.8%
P(both in Overlap Range) = 0.638 * 0.638 * 100 = 40.7%
P(Either in extreme case) = 1 - P(both in Overlap) = 100 - 40.7 = 59.3% --> Critter is stronger in every one of these cases
P(equal strength) = P(both in Overlap) * P(matching scores) = 40.7% * 1/37 = 1.1%
P(unequal strengths with both in Overlap) = 40.7% - 1.1% = 39.6%
Given that random chance would give equal chances to either program being stronger than the other once both are in the overlap range,
P(Stockfish stronger (Given both in Overlap)) = 39.6%/2 = 19.8%
P(Critter stronger (Given both in Overlap)) = 39.6%/2 = 19.8%
All of this sets up two equations to calculate the probability of Critter being stronger than Stockfish.
P(Critter Stronger) = 1 - P(Stockfish Stronger) - P(equal strength)
= 100 - 19.8 - 1.1 = 79.1%
P(Critter Stronger) = P(Either in Extreme case) + P(Critter Stronger(Given both in Overlap))
= 59.3% + 19.8% = 79.1%
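
To make the steps above easy to re-check, here is a minimal Python sketch of the same hand calculation. The function name `los_uniform_overlap` and its parameters are my own labels, not anything taken from CCRL or bayeselo, and it keeps the assumptions used above: each rating is treated as equally likely anywhere inside its +/- range, both programs have the same margin, and the first program is the higher rated one.

```python
def los_uniform_overlap(elo_a, elo_b, margin):
    """Return both versions of P(A stronger), in percent, using the
    overlap argument from the post (uniform ratings, equal margins,
    A rated above B, ranges partially overlapping)."""
    lo_a, hi_a = elo_a - margin, elo_a + margin
    lo_b, hi_b = elo_b - margin, elo_b + margin
    width = 2 * margin                             # 58 pts
    overlap = min(hi_a, hi_b) - max(lo_a, lo_b)    # 3276 - 3239 = 37 pts

    p_in = overlap / width             # chance one program is in the overlap
    p_both = p_in * p_in               # both in the overlap
    p_extreme = 1 - p_both             # at least one outside the overlap
    p_equal = p_both / overlap         # matching scores: 1 in 37
    p_half = (p_both - p_equal) / 2    # each side's share of the unequal cases

    first = 1 - p_half - p_equal       # 1 - P(B stronger) - P(equal)
    second = p_extreme + p_half        # P(extreme) + P(A stronger in overlap)
    return 100 * first, 100 * second


first, second = los_uniform_overlap(3268, 3247, 29)
print(round(first, 1), round(second, 1))
```

Both returned values are algebraically the same quantity, so their agreement only confirms the arithmetic; it says nothing about whether the uniform-range assumption itself is right.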
We see that both equations produce the same value of 79.1%, which does not equal CCRL's 85%. So, where is the mistake? Is it in my math, in theirs, or in both? Or maybe a slight coding error?
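
One comparison that might be worth making is a sketch that drops the uniform-range idea and instead treats each rating as normally distributed. This is only a guess at the kind of calculation involved, not a confirmed description of what CCRL or bayeselo do. It assumes that the +/-29 figures are 95% intervals (so sigma = 29/1.96) and that the two ratings are independent; neither assumption is stated on the pages I read. The function name `los_normal` is hypothetical.

```python
import math


def los_normal(elo_a, elo_b, margin_a, margin_b, z=1.96):
    """P(A stronger than B) if both ratings are independent normals.

    Assumption (not confirmed by CCRL or bayeselo): each +/- margin is a
    95% interval, i.e. margin = z * sigma with z = 1.96.
    """
    sigma_a = margin_a / z
    sigma_b = margin_b / z
    # Standard deviation of the difference of two independent normals.
    sigma_diff = math.hypot(sigma_a, sigma_b)
    z_score = (elo_a - elo_b) / sigma_diff
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z_score / math.sqrt(2.0)))


print(f"{100 * los_normal(3268, 3247, 29, 29):.1f}%")
```

Under those assumptions the result lands in the mid-80s percent, which is much closer to the 85% in the table than to my 79.1%.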