Question to math expert may be Ajedrecista
Moderator: Ras
-
Jouni
- Posts: 3821
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Question to math expert may be Ajedrecista
An engine is X Elo stronger than its opponent. What is the probability that it wins a match consisting of Y game pairs (same opening)?
Thanks
Jouni
-
Ajedrecista
- Posts: 2196
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Question to math expert, maybe Ajedrecista.
Hello Jouni:
I feel I am not that expert. However, I can give it a try:
Code: Select all
W: probability of the stronger engine to win a single game.
D: probability of the stronger engine to draw a single game.
L: probability of the stronger engine to lose a single game.
X = 400*log10{ [ 1/2 + (W - L)/2 ] / [ 1/2 - (W - L)/2 ] }
[...]
W - L = [ 10^(X/400) - 1 ] / [ 10^(X/400) + 1 ]
------------
D is limited between 0 and a maximum value D_max, where L = 0:
W + D + L = 1
W + D_max = 1
D_max = 1 - W
W - L = [ 10^(X/400) - 1 ] / [ 10^(X/400) + 1 ]
W - 0 = [ 10^(X/400) - 1 ] / [ 10^(X/400) + 1 ]
W = [ 10^(X/400) - 1 ] / [ 10^(X/400) + 1 ]
D_max = 1 - [ 10^(X/400) - 1 ] / [ 10^(X/400) + 1 ]
D_max = 2 / [ 1 + 10^(X/400) ]
0 =< D =< D_max
With that, we have bounded how easy it is for the stronger engine to win or draw a single game, without taking into account the white advantage. I also assume that the games are non-deterministic, that is, that both engines do not play the same moves over and over, so that we get a variety of games.
Then, a single game pair can end in three ways: a win for the stronger engine, a tie or a loss for the stronger engine.
Code: Select all
a*b means the probability of finishing the first game of a game pair as "a" {win, draw or lose} and the second game as "b" {win, draw or lose}.
W*W + W*D + D*W = pW = Prob.(stronger engine wins a game pair).
W*L + D*D + L*W = pT = Prob.(stronger engine ties a game pair).
D*L + L*D + L*L = pL = Prob.(stronger engine loses a game pair).
pW > pL because X > 0 → W > L: pW - pL = (W - L)*(W + L + 2*D) = (W - L)*(1 + D) > 0
From the stronger engine's POV, I would go with a trinomial distribution where the stronger engine wins more game pairs than it loses:
Code: Select all
w: number of won game pairs for the stronger engine.
t: number of tied game pairs for the stronger engine.
l: number of lost game pairs for the stronger engine.
w + t + l = Y
t = Y - w - l
Prob.(w,t,l) = { Y! / [ w! * (Y - w - l)! * l! ] } * [ (pW)^w ] * [ (1 - pW - pL)^(Y - w - l) ] * [ (pL)^l ]
We search the cases where w > l:
P = SUM[Prob.(w,t,l) | w > l]
In other words, supposing that my math is right, I find the answer laborious because we must list all the cases where w > l, compute each of them individually and finally add them all up. I would say that the most annoying thing is to list all the cases (or all the cases where w > l), as can be seen below in my numerical example: 10 outcomes with only 3 game pairs. Imagine 100 or 1000 game pairs, for instance. This is another exercise: count or calculate how many possible outcomes there are after Y game pairs.
I think we cannot discard the tied game pairs and go for the simpler binomial distribution because we would be evaluating Y - t game pairs instead of Y.
Larger Y benefits the stronger engine, of course. A game pair amplifies the advantage of the stronger engine because it has more chances (more games) to prove that it is stronger.
------------
The problem has an additional degree of freedom: the draw ratio of a single game (0 =< D =< D_max, as calculated before). Fixing D yields a {W,D,L} triplet to start with.
I bring a numerical example to test my math:
Code: Select all
W = 0.3 = 30%
D = 0.5 = 50%
L = 0.2 = 20%
X ~ 34.86 Elo
pW = 0.3*0.3 + 0.3*0.5 + 0.5*0.3 = 0.39 = 39%
pT = 0.3*0.2 + 0.5*0.5 + 0.2*0.3 = 0.37 = 37%
pL = 0.5*0.2 + 0.2*0.5 + 0.2*0.2 = 0.24 = 24%
Check: pW + pT + pL = 39% + 37% + 24% = 100% (correct).
Y = 3 game pairs (few game pairs for simplicity of the example).
====================
w  t  l  Prob.(w,t,l)
====================
3  0  0   5.9319%
2  1  0  16.8831%
2  0  1  10.9512%
1  2  0  16.0173%
--------------------
1  1  1  20.7792%
0  3  0   5.0653%
--------------------
1  0  2   6.7392%
0  2  1   9.8568%
0  1  2   6.3936%
0  0  3   1.3824%
Check: 5.9319% + 16.8831% + 10.9512% + 16.0173% + 20.7792% + 5.0653% + 6.7392% + 9.8568% + 6.3936% + 1.3824% = 100% (correct).
There is a probability of 49.7835% of the stronger engine to win this match regarding game pairs.
There is a probability of 25.8445% of the stronger engine to tie this match regarding game pairs.
There is a probability of 24.3720% of the stronger engine to lose this match regarding game pairs.
Any insights are welcome.
------------
I did some math in the past with game pairs, although not what you asked:
SPCC: Testrun of Stockfish 16.1 finished
Ajedrecista.
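The trinomial sum described in the post above can be sketched in Python. This is only a minimal illustration, assuming independent games and a fixed {W,D,L} triplet; the function names are hypothetical and were chosen for this example, and the inputs are the values of the numerical example (W = 0.3, D = 0.5, L = 0.2, Y = 3).

```python
from math import comb, log10


def elo_from_win_loss(w, l):
    """Elo difference X implied by the single-game win and loss probabilities."""
    score = 0.5 + (w - l) / 2
    return 400 * log10(score / (1 - score))


def pair_probabilities(w, d, l):
    """Return (pW, pT, pL) for one game pair made of two independent games."""
    p_win = w * w + 2 * w * d
    p_tie = 2 * w * l + d * d
    p_loss = 2 * d * l + l * l
    return p_win, p_tie, p_loss


def match_probabilities(p_win, p_tie, p_loss, pairs):
    """Sum the trinomial terms into P(w > l), P(w == l) and P(w < l)."""
    win = tie = loss = 0.0
    for won in range(pairs + 1):
        for lost in range(pairs - won + 1):
            tied = pairs - won - lost
            # Multinomial coefficient Y! / (w! * t! * l!), kept as an exact integer.
            coeff = comb(pairs, won) * comb(pairs - won, lost)
            prob = coeff * p_win**won * p_tie**tied * p_loss**lost
            if won > lost:
                win += prob
            elif won == lost:
                tie += prob
            else:
                loss += prob
    return win, tie, loss


p_win, p_tie, p_loss = pair_probabilities(0.3, 0.5, 0.2)
win, tie, loss = match_probabilities(p_win, p_tie, p_loss, pairs=3)
print(f"X ~ {elo_from_win_loss(0.3, 0.2):.2f} Elo")
print(f"win {win:.4%}, tie {tie:.4%}, lose {loss:.4%}")
```

With these inputs the three sums should agree with the totals of the table in the post. The double loop visits (Y + 1)*(Y + 2)/2 outcomes, so it stays cheap even for a few thousand game pairs, although very large Y may need logarithms to avoid floating-point underflow.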
-
Jouni
- Posts: 3821
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: Question to math expert may be Ajedrecista
Hmm, this is much more difficult than I thought. Better to use Cute Chess, which displays LOS automatically.
Jouni
-
jefk
- Posts: 1085
- Joined: Sun Jul 25, 2010 10:07 pm
- Location: the Netherlands
- Full name: Jef Kaan
-
Ajedrecista
- Posts: 2196
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Question to math expert, maybe Ajedrecista.
Hello:
Ajedrecista wrote: ↑Tue Jan 27, 2026 8:58 pm
[...]
[...] I would say that the most annoying thing is to list all the cases (or all the cases where w > l), as can be seen below in my numerical example: 10 outcomes with only 3 game pairs. Imagine 100 or 1000 game pairs, for instance. This is another exercise: count or calculate how many possible outcomes there are after Y game pairs.
[...]
Exercise solved. For Y game pairs, there are (Y + 1)*(Y + 2)/2 possible outcomes, distributed like this:
Code: Select all
o_t := Possible outcomes where the stronger engine wins as many game pairs as it loses = floor(Y/2) + 1
o_w := Possible outcomes where the stronger engine wins more game pairs than it loses = { (Y + 1)*(Y + 2)/2 - [ floor(Y/2) + 1 ] }/2
o_l := Possible outcomes where the stronger engine wins fewer game pairs than it loses = { (Y + 1)*(Y + 2)/2 - [ floor(Y/2) + 1 ] }/2
o := Possible outcomes.
o = o_w + o_t + o_l
o_w = o_l = ( o - o_t ) / 2
=========================
Y   o_w   o_t   o_l    o
=========================
1    1     1     1     3
2    2     2     2     6
3    4     2     4    10
4    6     3     6    15
5    9     3     9    21
The number of possible outcomes tends to Y²/2 for big Y.
------------
The typical LOS value that most software computes is for single games (trinomial distribution), which can be easily approximated with normal distributions; but game pairs require a different calculation (pentanomial distribution). Michel did some nice math on the matter, within a concept called 'normalized Elo'. The pentanomial model is briefly explained in chapter 4 here:
https://web.archive.org/web/20180713110 ... ed_elo.pdf
Mixing his notation with mine:
Code: Select all
p_0 = L*L
p_1/2 = D*L + L*D = 2*D*L
p_1 = W*L + D*D + L*W = 2*W*L + D*D
p_3/2 = W*D + D*W = 2*W*D
p_2 = W*W
s2 = p_0 * 0 + p_1/2 * 1/2 + p_1 * 1 + p_3/2 * 3/2 + p_2 * 2 = 2s
s2 = 0 + D*L + (2*W*L + D*D) + 3*W*D + 2*W*W = 2s
And so on. With the numbers of my numerical example (computations without Excel, I hope no typos):
Code: Select all
s2 = 0 + 0.1 + 0.37 + 0.45 + 0.18 = 1.1 = 2s
SUM(p_i * i²) = 0.04 * 0 + 0.2 * 0.25 + 0.37 * 1 + 0.3 * 2.25 + 0.09 * 4 = 1.455
N = 6 ; Y = N/2 = 3
sigma(s2) = [1/sqrt(6/2)] * sqrt( 1.455 - 1.1² ) ~ 0.2858
// From here, do not confuse t of Michel's paper with t of my former post.
t = ( s2 - 1 ) / [ sigma(s2) ] = 0.1 / [ sigma(s2) ] ~ 0.3499
t0 = t/sqrt(N) = t/sqrt(6) ~ 0.1429
t and t0 values should be related to LOS, as seen in chapter 2: LOS = phi(t) = phi[t0*sqrt(N)]. I get LOS = phi(t) ~ phi(0.3499) ~ 63.68%. Do my numbers in Michel's equations make sense? I do not know. In my other post, I got:
Code: Select all
[...]
There is a probability of 49.7835% of the stronger engine to win this match regarding game pairs.
There is a probability of 25.8445% of the stronger engine to tie this match regarding game pairs.
There is a probability of 24.3720% of the stronger engine to lose this match regarding game pairs.
Though we are probably measuring different things. I do not know if cutechess implements the pentanomial model, but I hope so.
Regards from Spain.
Ajedrecista.
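The pentanomial arithmetic of the post above can be checked with a short Python sketch. It follows the post's own reading that LOS = phi(t), with phi the standard normal CDF, and assumes two independent games per pair; the function names are hypothetical and only serve this illustration.

```python
import math


def pentanomial(w, d, l):
    """Map each game-pair score (0, 1/2, 1, 3/2, 2) to its probability."""
    return {
        0.0: l * l,
        0.5: 2 * d * l,
        1.0: 2 * w * l + d * d,
        1.5: 2 * w * d,
        2.0: w * w,
    }


def pair_statistics(w, d, l, pairs):
    """Return (s2, sigma(s2), t, LOS) for a match of the given number of game pairs."""
    probs = pentanomial(w, d, l)
    mean = sum(p * score for score, p in probs.items())
    second_moment = sum(p * score**2 for score, p in probs.items())
    # Standard deviation of the mean pair score over `pairs` game pairs.
    sigma = math.sqrt((second_moment - mean**2) / pairs)
    t = (mean - 1) / sigma
    # Standard normal CDF written with the error function.
    los = 0.5 * (1 + math.erf(t / math.sqrt(2)))
    return mean, sigma, t, los


mean, sigma, t, los = pair_statistics(0.3, 0.5, 0.2, pairs=3)
print(f"s2 = {mean:.4f}, sigma = {sigma:.4f}, t = {t:.4f}, LOS = {los:.2%}")
```

For W = 0.3, D = 0.5, L = 0.2 and Y = 3 this should land on the rounded values quoted in the post.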
-
towforce
- Posts: 12816
- Joined: Thu Mar 09, 2006 12:57 am
- Location: Birmingham UK
- Full name: Graham Laight
Re: Question to math expert may be Ajedrecista
There is an Elo calculator here - link - but all it seems to do is to tell you how your rating changes as you win/draw/lose games in a competition.
There's an "opportunity" for someone to write a better Elo calculator!
Human chess is partly about tactics and strategy, but mostly about memory
-
towforce
- Posts: 12816
- Joined: Thu Mar 09, 2006 12:57 am
- Location: Birmingham UK
- Full name: Graham Laight
Re: Question to math expert may be Ajedrecista
Elo Calculations Only: (Elo calculations only give expected score)
Code: Select all
import math

def calculate_expected_score_X(Rx, Ry):
    """
    Calculates the expected score for Player X against Player Y
    based on their Elo ratings.
    Args:
        Rx (float): Elo rating of Player X.
        Ry (float): Elo rating of Player Y.
    Returns:
        float: Expected score for Player X (range 0 to 1).
    """
    # Rating difference capped at 400 points for FIDE purposes,
    # but for a general function we use the actual difference.
    rating_difference = Ry - Rx
    expected_score_X = 1 / (1 + math.pow(10, rating_difference / 400))
    return expected_score_X
To get separate probabilities for win, draw, and loss, a more complex model like the Davidson model is necessary, which introduces a 'draw parameter' to reflect draw frequency (which varies by skill level and time control). A simplified approach (known as the Glicko/Lichess method) uses the two-way win probability to estimate the three outcomes:
Code: Select all
import math

def calculate_win_draw_loss_probabilities(Rx, Ry, draw_parameter=0.5):
    """
    Calculates probabilities of win, draw, and loss for Player X using
    a simplified model built on the expected score.
    Args:
        Rx (float): Elo rating of Player X.
        Ry (float): Elo rating of Player Y.
        draw_parameter (float): Intended to represent expected draw frequency
            (typically around 0.5 for classic games). Note: the simplified
            formulas below do not use it.
    Returns:
        dict: Probabilities for Player X winning, drawing, and losing.
    """
    # Calculate the expected score (g) using the standard Elo formula.
    rating_difference = Rx - Ry
    g = 1 / (1 + math.pow(10, -rating_difference / 400))
    # A general Davidson model needs an externally calibrated draw parameter,
    # so this function uses the simplest three-outcome model instead:
    #   Prob(Win)  = g^2
    #   Prob(Loss) = (1-g)^2
    #   Prob(Draw) = 2 * g * (1-g)
    # Check: Prob(Win) + Prob(Draw)/2 = g, the expected score.
    # Note: this implies a 50% draw rate for equal players.
    prob_win = g**2
    prob_loss = (1 - g)**2
    prob_draw = 1 - prob_win - prob_loss  # or 2 * g * (1-g)
    return {
        "Probability X Wins": prob_win,
        "Probability X Draws": prob_draw,
        "Probability X Loses": prob_loss
    }
Once you have probabilities for win/draw/loss, just multiply these by the number of games to get the expected number of wins, draws and losses.
Human chess is partly about tactics and strategy, but mostly about memory
-
towforce
- Posts: 12816
- Joined: Thu Mar 09, 2006 12:57 am
- Location: Birmingham UK
- Full name: Graham Laight
Re: Question to math expert may be Ajedrecista
Sorry - I didn't answer the OP question! 
Human chess is partly about tactics and strategy, but mostly about memory
-
towforce
- Posts: 12816
- Joined: Thu Mar 09, 2006 12:57 am
- Location: Birmingham UK
- Full name: Graham Laight
Re: Question to math expert may be Ajedrecista
To calculate the probability of a player winning a match (obtaining a higher total score) over multiple games, we must account for individual game outcomes: win 1 point, draw 0.5 points, and loss 0 points. Because the standard Elo formula only provides an expected score rather than specific outcome probabilities, we must use a model for draws (such as a fixed draw rate or the Davidson model) to distribute that expectation into discrete win/draw/loss probabilities - so the work in my previous post is not wasted!
Code: Select all
import math

def calculate_match_win_probability(Rx, Ry, n, draw_rate=0.35):
    """
    Calculates the probability of Player X scoring more points than Player Y
    over n games.
    """
    # 1. Calculate Expected Score (Ex) for a single game
    Ex = 1 / (1 + 10**((Ry - Rx) / 400))
    # 2. Derive individual probabilities (Win, Draw, Loss)
    # Standard heuristic: Win prob = Expected Score - (Draw Prob / 2)
    pw = max(0, Ex - (draw_rate / 2))
    pl = max(0, (1 - Ex) - (draw_rate / 2))
    pd = 1 - pw - pl
    match_win_prob = 0
    # 3. Sum probabilities of all scenarios where X wins the match
    # A match is won if Player X's total wins (w) > total losses (l)
    for w in range(n + 1):
        for d in range(n - w + 1):
            l = n - w - d
            if w > l:
                # Multinomial coefficient: n! / (w! * d! * l!)
                # Integer division keeps the coefficient exact.
                coeff = math.factorial(n) // (
                    math.factorial(w) * math.factorial(d) * math.factorial(l)
                )
                # Probability of this specific w, d, l combination
                match_win_prob += coeff * (pw**w) * (pd**d) * (pl**l)
    return match_win_prob
You also need to assume that the Elo ratings remain constant throughout the match and that the games are independent (which playing the same opening over and over, per the OP, would actually preclude).
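For longer matches, the same probability can be accumulated game by game instead of enumerating every (w, d, l) combination. The following is a minimal sketch under the same independence assumption; the function name and the dictionary-based bookkeeping are choices of this illustration, not part of the code above.

```python
def match_win_probability_dp(pw, pd, pl, n):
    """Probability of ending n independent games with more wins than losses."""
    # dist maps (wins - losses) to its probability after the games played so far.
    dist = {0: 1.0}
    for _ in range(n):
        updated = {}
        for diff, prob in dist.items():
            for step, p_step in ((1, pw), (0, pd), (-1, pl)):
                updated[diff + step] = updated.get(diff + step, 0.0) + prob * p_step
        dist = updated
    return sum(prob for diff, prob in dist.items() if diff > 0)


# Usage with game-pair probabilities (pW, pT, pL) instead of single-game ones:
print(match_win_probability_dp(0.39, 0.37, 0.24, 3))
```

Because each step only tracks the running difference between wins and losses, no factorials are needed. Feeding it game-pair probabilities rather than single-game ones gives the game-pair metric that the OP asked about.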
Human chess is partly about tactics and strategy, but mostly about memory
-
Ajedrecista
- Posts: 2196
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Question to math expert, maybe Ajedrecista.
Hello Graham:
This is the exact way if the metric is single games: simply more wins than losses. But the OP asked about winning more game pairs, which introduces a subtle difference: an engine can win more game pairs while losing more games than it wins. How can this be possible? To be fair, it is very unlikely, but it is mathematically possible: please imagine that engine A is only able to win game pairs 1.5-0.5 against engine B, while engine B only wins game pairs 2-0 against engine A, with A winning 1.5-0.5 more frequently than B wins 2-0, but not far more:
Code: Select all
A wins 1.5-0.5 40 times (A wins 40 game pairs).
B wins 2.0-0.0 30 times (B wins 30 game pairs).
A beats B in game pairs metric: A (40-30) B
A's wins/draws/losses: 40W+40D+60L
B's wins/draws/losses: 60W+40D+40L
B beats A in games metric: B (80-60) A
With this behaviour, A would be more suitable for KO tournaments while B would perform better in double round-robin tournaments.
Regards from Spain.
Ajedrecista.
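The example above can be tallied mechanically. This is a small Python sketch; the data layout (A's wins, draws and losses inside one game pair, mapped to how often that pair result occurred) and the function name are assumptions made only for this illustration.

```python
def compare_metrics(pair_counts):
    """Return ((pairs won by A, pairs won by B), (points of A, points of B))."""
    pairs_a = pairs_b = 0
    points_a = points_b = 0.0
    for (wins, draws, losses), count in pair_counts.items():
        score_a = wins + draws / 2    # A's points in this game pair
        score_b = losses + draws / 2  # B's points in this game pair
        points_a += count * score_a
        points_b += count * score_b
        if score_a > score_b:
            pairs_a += count
        elif score_b > score_a:
            pairs_b += count
    return (pairs_a, pairs_b), (points_a, points_b)


# A wins 40 pairs by 1.5-0.5 (one win, one draw); B wins 30 pairs by 2-0.
example = {(1, 1, 0): 40, (0, 0, 2): 30}
pairs, points = compare_metrics(example)
print("game pairs A-B:", pairs)
print("points A-B:", points)
```

The two tallies point in opposite directions, which is the discrepancy between the game-pair metric and the games metric described in the post.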