Question to math expert may be Ajedrecista

Jouni · Post by **Jouni** » Tue Jan 27, 2026 5:41 pm

Engine is X elo stronger than opponent. What is the probability it wins match containing Y game pairs (same opening)?

thanks

Ajedrecista · Post by **Ajedrecista** » Tue Jan 27, 2026 8:58 pm

Hello Jouni:

Jouni wrote: ↑Tue Jan 27, 2026 5:41 pm Engine is X elo stronger than opponent. What is the probability it wins match containing Y game pairs (same opening)?

thanks

I do not feel I am that much of an expert. However, I can give it a try:

Code: Select all

W: probability of the stronger engine to win  a single game.
D: probability of the stronger engine to draw a single game.
L: probability of the stronger engine to lose a single game.

X = 400*log10{ [ 1/2 + (W - L)/2 ] / [ 1/2 - (W - L)/2 ] }
[...]
W - L = [ 10^(X/400) - 1 ] / [ 10^(X/400) + 1 ]

------------

D is limited between 0 and a maximum value D_max, where L = 0:

W + D + L = 1
W + D_max = 1
D_max = 1 - W

W - L = [ 10^(X/400) - 1 ] / [ 10^(X/400) + 1 ]
W - 0 = [ 10^(X/400) - 1 ] / [ 10^(X/400) + 1 ]
W = [ 10^(X/400) - 1 ] / [ 10^(X/400) + 1 ]

D_max = 1 - [ 10^(X/400) - 1 ] / [ 10^(X/400) + 1 ]

D_max = 2 / [ 1 + 10^(X/400) ]

0 =< D =< D_max

With that, we have bounded how easy it is for the stronger engine to win or draw a single game, without taking the white advantage into account. I also assume that the games are non-deterministic, that is, that both engines do not play the same moves over and over, so that we get a variety of games.
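The relations above can be sketched in Python as follows. This is only a minimal sketch under the same logistic Elo model; the function names are hypothetical (they are not from any library), and the draw ratio D has to be supplied by the user because X alone does not fix it.

```python
import math

def win_minus_loss(elo_diff):
    """W - L implied by an Elo difference X: (10^(X/400) - 1) / (10^(X/400) + 1)."""
    q = 10 ** (elo_diff / 400)
    return (q - 1) / (q + 1)

def max_draw_ratio(elo_diff):
    """D_max = 2 / (1 + 10^(X/400)), reached when L = 0."""
    return 2 / (1 + 10 ** (elo_diff / 400))

def wdl_from_elo(elo_diff, draw_ratio):
    """Return (W, D, L) for a given X and a chosen D with 0 <= D <= D_max."""
    if not 0 <= draw_ratio <= max_draw_ratio(elo_diff):
        raise ValueError("draw_ratio must lie in [0, D_max]")
    diff = win_minus_loss(elo_diff)
    # Solve W + D + L = 1 together with W - L = diff.
    win = (1 - draw_ratio + diff) / 2
    loss = (1 - draw_ratio - diff) / 2
    return win, draw_ratio, loss

def elo_from_wdl(win, loss):
    """Inverse relation: X = 400*log10{[1/2 + (W - L)/2] / [1/2 - (W - L)/2]}."""
    diff = win - loss
    return 400 * math.log10((0.5 + diff / 2) / (0.5 - diff / 2))
```

For example, `elo_from_wdl(0.3, 0.2)` gives roughly 34.86 Elo, and `wdl_from_elo(34.86, 0.5)` gives back approximately (0.3, 0.5, 0.2).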

Then, a single game pair can end in three ways: a win for the stronger engine, a tie or a loss for the stronger engine.

Code: Select all

a*b means the probability of finishing the first game of a game pair as "a" {win, draw or lose} and the second game as "b" {win, draw or lose}.

W*W + W*D + D*W = pW = Prob.(stronger engine wins  a game pair)
W*L + D*D + L*W = pT = Prob.(stronger engine ties  a game pair).
D*L + L*D + L*L = pL = Prob.(stronger engine loses a game pair).

pW > pL by definition (X > 0 → W > L)
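As a small illustration, the three game pair probabilities can be computed directly from {W, D, L}. This is a sketch assuming the two games of a pair are independent and share the same W, D, L (no colour advantage), as in the post; the function name is made up for the example.

```python
def game_pair_probabilities(win, draw, loss):
    """Return (pW, pT, pL) for one game pair from single-game W, D, L."""
    p_win = win * win + 2 * win * draw      # W*W + W*D + D*W
    p_tie = 2 * win * loss + draw * draw    # W*L + D*D + L*W
    p_loss = 2 * draw * loss + loss * loss  # D*L + L*D + L*L
    return p_win, p_tie, p_loss
```

With W = 0.3, D = 0.5 and L = 0.2 this returns approximately (0.39, 0.37, 0.24), which sums to 1.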

From the stronger engine's POV, I would go with a trinomial distribution where the stronger engine wins more game pairs than it loses:

Code: Select all

w: number of won  game pairs for the stronger engine.
t: number of tied game pairs for the stronger engine.
l: number of lost game pairs for the stronger engine.

w + t + l = Y
t = Y - w - l

Prob.(w,t,l) = { Y! / [  w! * (Y - w - l)! * l! ] } * [ (pW)^w ] * [ (1 - pW - pL)^(Y - w - l) ] * [ (pL)^l ]

We search the cases where w > l:

P = SUM[Prob.(w,t,l) | w > l]

In other words, supposing that my math is right, I find the answer laborious because we must list all the cases where w > l, compute each one individually, and finally add them all up. I would say that the most annoying part is listing all the cases (or all the cases where w > l), as can be seen below in my numerical example: 10 outcomes with only 3 game pairs. Imagine 100 or 1000 game pairs, for instance. This is another exercise: count or calculate how many possible outcomes there are after Y game pairs.
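Listing the cases by hand is tedious, but a short loop can do it. The following is a sketch of the trinomial sum under the assumptions of this post (independent game pairs with fixed pW and pL); the function name and its return layout are my own choices for illustration.

```python
from math import comb

def match_outcome_probabilities(p_win, p_loss, pairs):
    """Return (P(w > l), P(w == l), P(w < l)) after `pairs` game pairs."""
    p_tie = 1 - p_win - p_loss
    win_match = tie_match = lose_match = 0.0
    for w in range(pairs + 1):
        for l in range(pairs - w + 1):
            t = pairs - w - l
            # Trinomial term: Y! / (w! * t! * l!) * pW^w * pT^t * pL^l
            coeff = comb(pairs, w) * comb(pairs - w, l)
            prob = coeff * p_win**w * p_tie**t * p_loss**l
            if w > l:
                win_match += prob
            elif w == l:
                tie_match += prob
            else:
                lose_match += prob
    return win_match, tie_match, lose_match
```

For pW = 0.39, pL = 0.24 and Y = 3 this gives approximately (0.497835, 0.258445, 0.243720). The double loop visits about Y²/2 cases, so it stays cheap even for a few thousand game pairs, although for very large Y the individual powers may underflow and a normal approximation could be preferable.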

I think we cannot discard the tied game pairs and go for the simpler binomial distribution because we would be evaluating Y - t game pairs instead of Y.

Larger Y benefits the stronger engine, of course. A game pair amplifies the advantage of the stronger engine because it has more chances (more games) to prove that it is stronger.

------------

The problem has an additional degree of freedom: the draw ratio of a single game (0 =< D =< D_max), calculated before. Fixing D gives a {W,D,L} triplet to start from.

I bring a numerical example to test my math:

Code: Select all

W = 0.3 = 30%
D = 0.5 = 50%
L = 0.2 = 20%

X ~ 34.86 Elo

pW = 0.3*0.3 + 0.3*0.5 + 0.5*0.3 = 0.39 = 39%
pT = 0.3*0.2 + 0.5*0.5 + 0.2*0.3 = 0.37 = 37%
pL = 0.5*0.2 + 0.2*0.5 + 0.2*0.2 = 0.24 = 24%

Check: pW + pT + pL = 39% + 37% + 24% = 100% (correct).

Y = 3 game pairs (few game pairs for simplicity of the example).

====================
w t l   Prob.(w,t,l)
====================
3 0 0      5.9319%
2 1 0     16.8831%
2 0 1     10.9512%
1 2 0     16.0173%
--------------------
1 1 1     20.7792%
0 3 0      5.0653%
--------------------
1 0 2      6.7392%
0 2 1      9.8568%
0 1 2      6.3936%
0 0 3      1.3824%

Check: 5.9319% + 16.8831% + 10.9512% + 16.0173% + 20.7792% + 5.0653% + 6.7392% + 9.8568% + 6.3936% + 1.3824% = 100% (correct).

There is a probability of 49.7835% of the stronger engine to win  this match regarding game pairs.
There is a probability of 25.8445% of the stronger engine to tie  this match regarding game pairs.
There is a probability of 24.3720% of the stronger engine to lose this match regarding game pairs.

Any insights are welcome.

------------

I did some math in the past with game pairs, although not what you asked:

SPCC: Testrun of Stockfish 16.1 finished

Ajedrecista.

Jouni · Post by **Jouni** » Wed Jan 28, 2026 1:50 pm

Hmm, this is much more difficult than I thought. Better to use Cute Chess, which displays LOS automatically.

jefk · Post by **jefk** » Wed Jan 28, 2026 4:27 pm

https://en.wikipedia.org/wiki/Elo_rating_system#Theory

Ajedrecista · Post by **Ajedrecista** » Wed Jan 28, 2026 8:29 pm

Hello:

Ajedrecista wrote: ↑Tue Jan 27, 2026 8:58 pm[...]

[...] I would say that the most annoying thing is to list all the cases (or all the cases where w > l), just like can be seen below in my numerical example: 10 outcomes with only 3 game pairs. Imagine 100 or 1000 game pairs, to say something. This is another exercise: count or calculate how many possible outcomes there are after Y game pairs.

[...]

Exercise solved. For Y game pairs, there are (Y + 1)*(Y + 2)/2 possible outcomes, distributed like this:

Code: Select all

o_t := Possible outcomes where the stronger engine wins as many game pairs as it loses = floor(Y/2) + 1
o_w := Possible outcomes where the stronger engine wins  more game pairs than it loses = { (Y + 1)*(Y + 2)/2 - [ floor(Y/2) + 1 ] }/2
o_l := Possible outcomes where the stronger engine wins fewer game pairs than it loses = { (Y + 1)*(Y + 2)/2 - [ floor(Y/2) + 1 ] }/2
o   := Total number of possible outcomes = (Y + 1)*(Y + 2)/2

o = o_w + o_t + o_l
o_w = o_l = ( o - o_t ) / 2

=========================
Y    o_w   o_t   o_l    o
=========================
1     1     1     1     3
2     2     2     2     6 
3     4     2     4    10
4     6     3     6    15
5     9     3     9    21

The number of possible outcomes tends to Y²/2 for large Y.
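The closed forms can be cross-checked by brute force. Here is a small sketch that enumerates every (w, t, l) with w + t + l = Y and compares the counts against the formulas; both function names are hypothetical.

```python
def count_outcomes(pairs):
    """Enumerate all (w, l) with w + l <= Y and return (o_w, o_t, o_l)."""
    more = same = fewer = 0
    for w in range(pairs + 1):
        for l in range(pairs - w + 1):  # t = Y - w - l is then fixed
            if w > l:
                more += 1
            elif w == l:
                same += 1
            else:
                fewer += 1
    return more, same, fewer

def closed_form(pairs):
    """Same counts from the formulas: o = (Y+1)(Y+2)/2, o_t = floor(Y/2) + 1."""
    total = (pairs + 1) * (pairs + 2) // 2
    same = pairs // 2 + 1
    more = (total - same) // 2
    return more, same, more
```

For Y = 3 both functions return (4, 2, 4), that is, 10 outcomes in total.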

------------

Jouni wrote: ↑Wed Jan 28, 2026 1:50 pm Hmm this is much more difficult than I thought. Better use Cute Chess which displays LOS automatically.

The typical LOS value that most software computes is for single games (trinomial distribution), which can be easily approximated with normal distributions; but game pairs require a different calculation (pentanomial distribution). Michel did some nice math on the matter, within a concept called 'normalized Elo'. The pentanomial model is briefly explained in chapter 4 here:

https://web.archive.org/web/20180713110 ... ed_elo.pdf

Mixing his notation with mine:

Code: Select all

p_0   = L*L
p_1/2 = D*L + L*D = 2*D*L
p_1   = W*L + D*D + L*W = 2*W*L + D*D
p_3/2 = W*D + D*W = 2*W*D
p_2   = W*W

s2 = p_0 * 0 + p_1/2 * 1/2 + p_1 * 1 + p_3/2 * 3/2 + p_2 * 2 = 2s
s2 = 0 + D*L + (2*W*L + D*D) + 3*W*D + 2*W*W = 2s

And so on. With the numbers of my numerical example (computations without Excel, I hope no typos):

Code: Select all

s2 = 0 + 0.1 + 0.37 + 0.45 + 0.18 = 1.1 = 2s

SUM(p_i * i²) = 0.04 * 0 + 0.2 * 0.25 + 0.37 * 1 + 0.3 * 2.25 + 0.09 * 4 = 1.455

N = 6 ; Y = N/2 = 3
sigma(s2) = [1/sqrt(6/2)] * sqrt( 1.455 - 1.1² ) ~ 0.2858

// From here, do not confuse t of Michel's paper with t of my former post.
t = ( s2 - 1 ) / [ sigma(s2) ] = 0.1 / [ sigma(s2) ] ~ 0.3499

t0 = t/sqrt(N) = t/sqrt(6) ~ 0.1429

t and t0 values should be related to LOS, as seen in chapter 2: LOS = phi(t) = phi[t0*sqrt(N)]. I get LOS = phi(t) ~ phi(0.3499) ~ 63.68%. Do my numbers in Michel's equations make sense? I do not know. In my other post, I got:

Code: Select all

[...]

There is a probability of 49.7835% of the stronger engine to win  this match regarding game pairs.
There is a probability of 25.8445% of the stronger engine to tie  this match regarding game pairs.
There is a probability of 24.3720% of the stronger engine to lose this match regarding game pairs.

Though we are probably measuring different things. I do not know if cutechess implements the pentanomial model, but I hope so.
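The pentanomial numbers of this post can be reproduced with a short sketch. It assumes independent games with fixed {W, D, L}, takes the pair score 1/2 as one draw plus one loss (as the s2 expansion implies), and evaluates phi with the standard normal CDF via `math.erf`. The function name is hypothetical, and this is my reading of the formulas used here, not a reference implementation of Michel's paper.

```python
import math

def pentanomial_los(win, draw, loss, games):
    """Return (s2, sigma(s2), t, LOS) for N = `games` single games (N/2 pairs)."""
    # Probability of each pair score of the stronger engine: 0, 1/2, 1, 3/2, 2
    probs = {
        0.0: loss * loss,
        0.5: 2 * draw * loss,
        1.0: 2 * win * loss + draw * draw,
        1.5: 2 * win * draw,
        2.0: win * win,
    }
    s2 = sum(score * p for score, p in probs.items())
    second_moment = sum(score**2 * p for score, p in probs.items())
    pairs = games / 2
    # Standard deviation of the mean pair score over N/2 pairs
    sigma = math.sqrt((second_moment - s2**2) / pairs)
    t = (s2 - 1) / sigma
    # LOS = phi(t), the standard normal CDF
    los = 0.5 * (1 + math.erf(t / math.sqrt(2)))
    return s2, sigma, t, los
```

With W = 0.3, D = 0.5, L = 0.2 and N = 6 this gives s2 = 1.1, sigma(s2) of about 0.2858, t of about 0.3499 and LOS of about 63.68%.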

Regards from Spain.

Ajedrecista.

towforce · Post by **towforce** » Wed Jan 28, 2026 10:01 pm

There is an Elo calculator here - link - but all it seems to do is to tell you how your rating changes as you win/draw/lose games in a competition.

There's an "opportunity" for someone to write a better Elo calculator!

towforce · Post by **towforce** » Thu Jan 29, 2026 6:40 pm

Elo Calculations Only: (Elo calculations only give expected score)

Code: Select all

import math

def calculate_expected_score_X(Rx, Ry):
    """
    Calculates the expected score for Player X against Player Y 
    based on their Elo ratings.

    Args:
        Rx (float): Elo rating of Player X.
        Ry (float): Elo rating of Player Y.

    Returns:
        float: Expected score for Player X (range 0 to 1).
    """
    # Rating difference capped at 400 points for FIDE purposes, 
    # but for a general function we use the actual difference.
    rating_difference = Ry - Rx
    expected_score_X = 1 / (1 + math.pow(10, rating_difference / 400))
    return expected_score_X

To get separate probabilities for win, draw, and loss, a more complex model like the Davidson model is necessary, which introduces a 'draw parameter' to reflect draw frequency (which varies by skill level and time control). A simplified approach (known as the Glicko/Lichess method) uses the two-way win probability to estimate the three outcomes:

Code: Select all

import math

def calculate_win_draw_loss_probabilities(Rx, Ry, draw_parameter=0.5):
    """
    Calculates probabilities of win, draw, and loss for Player X using 
    a model that incorporates a draw parameter (e.g., Lichess method).

    Args:
        Rx (float): Elo rating of Player X.
        Ry (float): Elo rating of Player Y.
        draw_parameter (float): Intended to represent expected draw frequency
                                (typically around 0.5 for classic games).
                                NOTE: unused by the simplified model below.

    Returns:
        dict: Probabilities for Player X winning, drawing, and losing.
    """
    # Calculate the expected score (g) using the standard Elo formula.
    rating_difference = Rx - Ry
    g = 1 / (1 + math.pow(10, -rating_difference / 400))

    # A general Davidson model needs a calibrated draw parameter, which is
    # more complex. This function uses the simplest three-outcome model instead:
    #   Prob(Win)  = g^2
    #   Prob(Loss) = (1 - g)^2
    #   Prob(Draw) = 2 * g * (1 - g)
    # It preserves the expected score (win + draw/2 = g) and implies a 50%
    # draw rate for equal players, whatever draw_parameter is passed.

    prob_win = g**2
    prob_loss = (1 - g)**2
    prob_draw = 1 - prob_win - prob_loss  # equals 2 * g * (1 - g)

    return {
        "Probability X Wins": prob_win,
        "Probability X Draws": prob_draw,
        "Probability X Loses": prob_loss
    }

Once you have probabilities for win/draw/loss, just multiply these by the number of games to get the expected number of wins, draws, and losses.

towforce · Post by **towforce** » Thu Jan 29, 2026 6:44 pm

Sorry - I didn't answer the OP question!

towforce · Post by **towforce** » Thu Jan 29, 2026 6:56 pm

Jouni wrote: ↑Tue Jan 27, 2026 5:41 pm Engine is X elo stronger than opponent. What is the probability it wins match containing Y game pairs (same opening)?

thanks

To calculate the probability of a player winning a match (obtaining a higher total score) over multiple games, we must account for individual game outcomes: win 1 point, draw 0.5 points, and loss 0 points. Because the standard Elo formula only provides an expected score rather than specific outcome probabilities, we must use a model for draws (such as a fixed draw rate or the Davidson model) to distribute that expectation into discrete win/draw/loss probabilities - so the work in my previous post is not wasted!

Code: Select all

import math

def calculate_match_win_probability(Rx, Ry, n, draw_rate=0.35):
    """
    Calculates the probability of Player X scoring more points than Player Y 
    over n games.
    """
    # 1. Calculate Expected Score (Ex) for a single game
    Ex = 1 / (1 + 10**((Ry - Rx) / 400))
    
    # 2. Derive individual probabilities (Win, Draw, Loss)
    # Standard heuristic: Win prob = Expected Score - (Draw Prob / 2)
    pw = max(0, Ex - (draw_rate / 2))
    pl = max(0, (1 - Ex) - (draw_rate / 2))
    pd = 1 - pw - pl
    
    match_win_prob = 0
    
    # 3. Sum probabilities of all scenarios where X wins the match
    # A match is won if Player X's total wins (w) > total losses (l)
    for w in range(n + 1):
        for d in range(n - w + 1):
            l = n - w - d
            if w > l:
                # Multinomial coefficient n! / (w! * d! * l!), computed with
                # exact integers so that large n does not overflow a float
                coeff = math.comb(n, w) * math.comb(n - w, d)
                # Probability of this specific w, d, l combination
                match_win_prob += coeff * (pw**w) * (pd**d) * (pl**l)
                
    return match_win_prob

You would, of course, need to calibrate for the fact that the stronger the players, the higher the draw rate.

You also need to assume that the Elo ratings remain constant throughout the match and that the games are independent (which playing the same opening over and over, per the OP, would actually preclude).

Ajedrecista · Post by **Ajedrecista** » Thu Jan 29, 2026 9:05 pm

Hello Graham:

towforce wrote: ↑Thu Jan 29, 2026 6:56 pmTo calculate the probability of a player winning a match (obtaining a higher total score) over multiple games, we must account for individual game outcomes: win 1 point, draw 0.5 points, and loss 0 points. [...]

This is the exact way if the metric is single games: simply more wins than losses. But the OP asked about winning more game pairs, which introduces a subtle difference: an engine can win more game pairs while losing more games than it wins. How can this be possible? To be fair, it is very unlikely, but it is mathematically possible: please imagine that engine A is only able to win game pairs 1.5-0.5 against engine B, while engine B only wins game pairs 2-0 against engine A, with A winning 1.5-0.5 more frequently than B wins 2-0, but not far more frequently:

Code: Select all

A wins 1.5-0.5 40 times (A wins 40 game pairs).
B wins 2.0-0.0 30 times (B wins 30 game pairs).
A beats B in the game pairs metric: A (40-30) B

A's wins/draws/losses: 40W+40D+60L
B's wins/draws/losses: 60W+40D+40L
B beats A in the games metric (points): B (80-60) A

With this behaviour, A would be more suitable for KO tournaments while B would perform better in double round-robin tournaments.
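The two metrics in the example can be tallied with a few lines. This is just a sketch; the function name and the input layout (A's score in a game pair mapped to how often that result occurred) are choices made for this illustration.

```python
def metric_totals(pair_counts):
    """pair_counts maps A's score in a game pair (0, 0.5, 1, 1.5 or 2) to its count.

    Returns ((pairs won by A, pairs won by B), (points of A, points of B)).
    """
    a_pairs = sum(n for score, n in pair_counts.items() if score > 1)
    b_pairs = sum(n for score, n in pair_counts.items() if score < 1)
    a_points = sum(score * n for score, n in pair_counts.items())
    b_points = sum((2 - score) * n for score, n in pair_counts.items())
    return (a_pairs, b_pairs), (a_points, b_points)

# A wins 1.5-0.5 forty times; B wins 2-0 thirty times.
print(metric_totals({1.5: 40, 0.0: 30}))
```

Here A leads 40-30 in game pairs while B leads 80-60 in points, so the winner depends on which metric the match uses.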

Regards from Spain.

Ajedrecista.
