SPCC: Testrun of Stockfish 16.1 finished

pohl4711 · Post by **pohl4711** » Thu Feb 29, 2024 6:20 am

My UHO-Top15 Ratinglist is the world's first engine-ratinglist using UHO-openings, and the world's first ratinglist that additionally offers Gamepair-statistics.

Ratinglist-testrun of Stockfish 16.1 finished (avx2-binary from the official Stockfish website)

https://www.sp-cc.de

Also take a look at the EAS-Ratinglist, the world's first engine-ratinglist that measures not the strength of engines but their style of play:
https://www.sp-cc.de/eas-ratinglist.htm

(You may have to clear your browser cache (press Ctrl+Shift+Del) or reload the website.)

pohl4711 · Post by **pohl4711** » Thu Feb 29, 2024 12:52 pm

+12 Elo over Stockfish 16. Not very much, but the gamepairs are far more impressive:
Stockfish 16.1 won more than twice as many gamepairs against Stockfish 16 as it lost: 500 (+164 =265 -71)
And Stockfish 16.1 lost only 113 gamepairs overall (out of 7500 gamepairs), while Stockfish 16 lost 233 gamepairs overall.
Without the games against Stockfish, Stockfish 16.1 lost only 42 gamepairs (out of 7000 gamepairs), Stockfish 16 lost 69 gamepairs.

And Stockfish 16.1 plays measurably more aggressively than Stockfish 16:
https://www.sp-cc.de/eas-ratinglist.htm

This shows especially in the sacrifices: Stockfish 16 played a sacrifice in 20.79% of its won games, Stockfish 16.1 in 23.73%. This is even more impressive when looking at absolute numbers:
For example, say Stockfish wins 500 games in a testrun. Stockfish 16 would then play a sacrifice in 104 of these 500 won games (20.79% of 500, rounded), and Stockfish 16.1 in 119 of them (23.73% of 500, rounded).
An increase from 104 to 119 sac-games means +14% here (119 is 114% of 104). This is definitely impressive progress! And in these days of superhuman engines, the playing style becomes more and more important!
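A quick check of this arithmetic, as a minimal Python sketch (the 500 won games are the hypothetical count from the example, and the helper name is my own). It also shows that the relative increase does not depend on the assumed number of won games: it is just the ratio of the two rates, about +14.1%, while the rounded counts 104 and 119 give +14.4%. Both round to the +14% above.

```python
SF16_SAC_RATE = 20.79   # % of won games with a sacrifice (EAS list, SF 16)
SF161_SAC_RATE = 23.73  # % of won games with a sacrifice (EAS list, SF 16.1)


def sac_games(rate_percent: float, won_games: int) -> int:
    """Hypothetical helper: won games containing a sacrifice, rounded to an integer."""
    return round(won_games * rate_percent / 100)


won = 500  # assumed number of won games, as in the example
sf16 = sac_games(SF16_SAC_RATE, won)
sf161 = sac_games(SF161_SAC_RATE, won)

print(sf16, sf161)                                   # 104 119
print(f"{sf161 / sf16 - 1:+.1%}")                    # +14.4% (rounded counts)
print(f"{SF161_SAC_RATE / SF16_SAC_RATE - 1:+.1%}")  # +14.1% (independent of 'won')
```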

Jouni · Post by **Jouni** » Thu Feb 29, 2024 10:41 pm

The only change from the previous test was "Update the WDL model". That patch only affects the UCI-reported cp and wdl values, and the result was inside the error bars.

ernest · Post by **ernest** » Fri Mar 01, 2024 3:00 am

pohl4711 wrote: ↑Thu Feb 29, 2024 6:20 am
... offering additionally Gamepair-statistics.

Hi Stefan,

Still not sure I understand...

Can you explain what the Gamepair notion is ?
TIA !

pohl4711 · Post by **pohl4711** » Fri Mar 01, 2024 6:16 am

ernest wrote: ↑Fri Mar 01, 2024 3:00 am
[...]
Can you explain what the Gamepair notion is ?

Pretty easy: Because my UHO-openings give white a measurable advantage, 2 engines (in a head-to-head) play the same opening twice: in the first game engine A plays white and engine B plays black, and in the second game A plays black and B plays white.
These 2 games are evaluated as one gamepair. Engine A (or B) needs at least 1.5 points out of 2 to win a gamepair. A 1-1 (two draws, or 2 wins for the same color, mostly white of course, meaning one win for A with white and one win for B with white) is a drawn gamepair.
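The gamepair rule fits in a few lines. This is a minimal Python sketch of the rule as described here, not the code of the SPCC tool; the function names and the convention of scoring each game from white's point of view (1, 0.5, 0) are my own assumptions.

```python
def pair_points_for_a(a_white_result: float, b_white_result: float) -> float:
    """Points engine A scores in one gamepair.

    Each result is from white's point of view (1 = white wins, 0.5 = draw,
    0 = black wins). A is white in the first game and black in the second.
    """
    return a_white_result + (1.0 - b_white_result)


def pair_outcome_for_a(a_white_result: float, b_white_result: float) -> str:
    """Gamepair outcome for A: at least 1.5 of 2 points wins the pair."""
    points = pair_points_for_a(a_white_result, b_white_result)
    if points >= 1.5:
        return "win"
    if points <= 0.5:
        return "loss"
    return "draw"  # exactly 1-1: two draws, or each engine won with white


print(pair_outcome_for_a(1.0, 1.0))  # draw (both engines win with white)
print(pair_outcome_for_a(1.0, 0.5))  # win  (1.5-0.5 for A)
print(pair_outcome_for_a(0.5, 1.0))  # loss (0.5-1.5 for A)
```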

This is done in engine-tournaments, too (TCEC Superfinal, engine-tournaments on chess.com (CCC)), and in engine-development (see the Stockfish website about SF 16.1: "Stockfish 16.1 shows a notable improvement in performance ... winning over 2 times more game pairs than it loses.").

ernest · Post by **ernest** » Fri Mar 01, 2024 8:22 pm

Great to see that our Forum seems to be reborn !
And a big thank you to all who contributed to that !

OK Stefan, thanks ! But my view is that ALL "reputable" tests and tournaments (TCEC...) are indeed done with preset openings, played in pairs. Or else ?
So that is not new to me.
But what is gamepair statistics ?
Do you simply eliminate the 1-1 result pairs (be it 1-0 1-0 or 1/2-1/2 1/2-1/2) ?

Ajedrecista · Post by **Ajedrecista** » Fri Mar 01, 2024 10:03 pm

Hello Ernest:

ernest wrote: ↑Fri Mar 01, 2024 8:22 pm[...]
But what is gamepair statistics ?
Do you simply eliminate the 1-1 result pairs (be it 1-0 1-0 or 1/2-1/2 1/2-1/2) ?

UHO 2024 openings can be downloaded from Stefan's site, as well as the so-called GamePairs rescoring tool. I have understood the following from the source code of the tool, but please Stefan correct me if I am wrong:

My notation for each pair of games of a given opening line is (A vs B) and (B vs A). For example, if I write (1-0 and 1-0), I mean that A won the first game being white and B won the second game being white.

Each pair of games of a given opening line is 'compressed' or converted (rescored) into one game.

No pair of games is discarded.

For a given opening line, when the outcome of a game is repeated [(1-0 and 1-0) or (½-½ and ½-½) or (0-1 and 0-1)], the pair is rescored as a draw for both engines because both engines finished 1-1.

For a given opening line, when the outcome of a game is not repeated [(1-0 and ½-½) or (1-0 and 0-1) or (½-½ and 1-0) or (½-½ and 0-1) or (0-1 and 1-0) or (0-1 and ½-½)], the pair is rescored as a win for the winning engine and as a loss for the losing engine; 1.5-0.5 and 2-0 (reciprocally 0.5-1.5 and 0-2) are treated exactly the same, as a win or a loss, regardless of the winning/losing margin of the two games.

Then, get Ordo ratings with the rescored outcomes.

Please imagine the following mini-match consisting of two pairs of games with the following results:

Code: Select all

A-B   B-A
1-0   1-0
1-0   ½-½

The standard match result is 2.5-1.5 in favour of A, then get ratings from there.

The rescored match result is 1.5-0.5 in favour of A, because the first pair of games was drawn (½ point for each engine) and the second pair of games was won by A (1 point for A) and lost by B (0 points for B). Then get ratings from there.
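The mini-match can be recomputed with a short Python sketch (my own illustration, not Stefan's tool; the Ordo rating step is not modelled, only the two match scores). Results are again taken from white's point of view, and the pair list is the two-pair example from the table.

```python
# Each pair: (result of the A-B game, result of the B-A game), from white's point of view.
pairs = [(1.0, 1.0), (1.0, 0.5)]  # the mini-match from the table


def standard_score_a(pairs):
    """A's total points over all individual games."""
    return sum(ab + (1.0 - ba) for ab, ba in pairs)


def rescored_score_a(pairs):
    """A's total after each pair is compressed into one game (win/draw/loss)."""
    total = 0.0
    for ab, ba in pairs:
        points = ab + (1.0 - ba)
        if points > 1.0:
            total += 1.0   # 1.5-0.5 or 2-0: a pair win
        elif points == 1.0:
            total += 0.5   # 1-1: a drawn pair
        # fewer than 1 point: pair lost, nothing added
    return total


print(standard_score_a(pairs), "of", 2 * len(pairs))  # 2.5 of 4
print(rescored_score_a(pairs), "of", len(pairs))      # 1.5 of 2
```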

An interesting exercise would be to compare standard and rescored results, to find possible relations between them.

Regards from Spain.

Ajedrecista.

Ajedrecista · Post by **Ajedrecista** » Fri Mar 01, 2024 10:56 pm

Hello:

Ajedrecista wrote: ↑Fri Mar 01, 2024 10:03 pm[...]

An interesting exercise would be to compare standard and rescored results, to find possible relations between them.

[...]

I did some math:

Code: Select all

            Normal   Rescored    Number      (1/n)
A-B   B-A    A-B       A-B       pairs    Probability
-----------------------------------------------------
1-0   1-0   1 -1       ½-½        nAB         pAB
1-0   ½-½   1½- ½      1-0        nAd         pAd
1-0   0-1   2 -0       1-0        nAA         pAA
½-½   1-0    ½-1½      0-1        ndB         pdB
½-½   ½-½   1 -1       ½-½        ndd         pdd
½-½   0-1   1½- ½      1-0        ndA         pdA
0-1   1-0   0 -2       0-1        nBB         pBB
0-1   ½-½    ½-1½      0-1        nBd         pBd
0-1   0-1   1 -1       ½-½        nBA         pBA
-----------------------------------------------------
                                   n           1

Code: Select all

From A's POV:
Normal   score: µA = 1*pAA + 0.75*(pAd + pdA) + 0.5*(pAB + pdd + pBA) + 0.25*(pdB + pBd) + 0*pBB
Rescored score: mA = 1*(pAd + pAA + pdA) + 0.5*(pAB + pdd + pBA) + 0*(pdB + pBB + pBd)

mA - µA = 0.25*(pAd + pdA) - 0.25*(pdB + pBd)
mA - µA = (pAd - pdB + pdA - pBd)/4

------------

From B's POV:
Normal   score: µB = 1*pBB + 0.75*(pdB + pBd) + 0.5*(pAB + pdd + pBA) + 0.25*(pAd + pdA) + 0*pAA
Rescored score: mB = 1*(pdB + pBB + pBd) + 0.5*(pAB + pdd + pBA) + 0*(pAd + pAA + pdA)

mB - µB = 0.25*(pdB + pBd) - 0.25*(pAd + pdA)
mB - µB = (- pAd + pdB - pdA + pBd)/4

------------

mB - µB = -(mA - µA)

I hope there are no typos. The difference between normal and rescored scores (and ratings) comes only from the pairs of games in which exactly one game is drawn, as expected, but now it is quantified.
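The identity for mA - µA can also be checked numerically: draw random pair distributions, compute both scores from the formulas above with exact fractions, and compare with (pAd - pdB + pdA - pBd)/4. A minimal Python sketch; the dictionary keys simply mirror the pAB ... pBA names of the table, and the random weights are arbitrary test data, not match results.

```python
import random
from fractions import Fraction

# Keys follow the table: first letter = winner of the A-B game, second = winner of the B-A game
KEYS = ["AB", "Ad", "AA", "dB", "dd", "dA", "BB", "Bd", "BA"]


def random_distribution(seed: int) -> dict:
    """Arbitrary probabilities over the 9 pair types, summing exactly to 1."""
    rng = random.Random(seed)
    weights = {k: Fraction(rng.randint(1, 100)) for k in KEYS}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def normal_score_a(p: dict) -> Fraction:
    """µA: A's share of the points over individual games."""
    return (p["AA"]
            + Fraction(3, 4) * (p["Ad"] + p["dA"])
            + Fraction(1, 2) * (p["AB"] + p["dd"] + p["BA"])
            + Fraction(1, 4) * (p["dB"] + p["Bd"]))


def rescored_score_a(p: dict) -> Fraction:
    """mA: A's share of the points after rescoring each pair as one game."""
    return (p["Ad"] + p["AA"] + p["dA"]) + Fraction(1, 2) * (p["AB"] + p["dd"] + p["BA"])


for seed in range(10):
    p = random_distribution(seed)
    assert sum(p.values()) == 1
    expected = (p["Ad"] - p["dB"] + p["dA"] - p["Bd"]) / 4
    assert rescored_score_a(p) - normal_score_a(p) == expected
print("mA - µA = (pAd - pdB + pdA - pBd)/4 holds for all samples")
```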

Regards from Spain.

Ajedrecista.

pohl4711 · Post by **pohl4711** » Sat Mar 02, 2024 7:19 am

Ajedrecista wrote: ↑Fri Mar 01, 2024 10:03 pm[...]

Correct.

"An interesting exercise would be to compare standard and rescored results, to find possible relations between them."
On my main-site both ratinglists are shown: first the normal ratinglist, followed by the gamepair-ratinglist. There you can compare all results, and of course you can also compare the single results of each engine and of each engine head-to-head:
https://www.sp-cc.de/files/programs.dat
https://www.sp-cc.de/files/uho_top15_gamepair.txt

Ajedrecista · Post by **Ajedrecista** » Sat Mar 02, 2024 1:24 pm

Hello Stefan:

pohl4711 wrote: ↑Sat Mar 02, 2024 7:19 am Correct.
[...]

Thank you for the confirmation and the links. I have taken the extreme result of SF 16.1 vs. Rebel EAS. I hope there are no typos:

Code: Select all

Normal results:
https://www.sp-cc.de/files/programs.dat

1 Stockfish 16.1 240224
Rebel EAS avx2           : 1000 (+557,=440,-  3), 77.7 %

------------

Rescored results:
https://www.sp-cc.de/files/uho_top15_gamepair.txt

 1) Stockfish 16.1 240224
    vs.                         :  games (    +,   =,   -),   (%) :   Diff,   SD, CFS (%)
    Rebel EAS avx2              :    500 (  491,   9,   0),  99.1 :   +728,    7,  100.0

My math from yesterday seems to be correct, because I was able to reconstruct the pair statistics from it:

Code: Select all

S = SF ; d = draw ; R = Rebel
n = 500

mS = 0.991 = (491 + 0.5*9)/500
µS = 0.777 = (557 + 0.5*440)/1000

mS - µS = 0.214 = (pSd - pdR + pdS - pRd)/4  // (Eq. 1)
pSd - pdR + pdS - pRd = 0.856

Rebel did not win any pair:
pdR + pRR + pRd = 0  // Each value is 0 because each is limited to the closed interval [0, 1].
// 3 values out of 9 are known at this point.

Rebel drew 9 pairs out of 500:
pSR + pdd + pRS = 9/500 = 0.018

Rebel won 3 games out of 1000 games (500 pairs):
pSR + pRS + (pdR + pRd) + 2*pRR = 3/500 = 0.006  // We know that pdR = pRd = pRR = 0
pSR + pRS = 0.006  // 2 values are correlated at this point.
pdd = 0.018 - 0.006 = 0.012  // 4 values out of 9 are known at this point.

(Eq. 1) knowing that pdR = 0 and pRd = 0
pSd + pdS = 0.856
// 4 values are correlated (2 and 2) at this point.

The sum of all p is 1. We can compute the last value:
pSS = 1 - 0.856 - 0.018 = 0.126

Code: Select all

SUMMARY:

pSR + pRS = 0.006 → Draw (Pair: 1 win for SF and 1 win for Rebel).
pSd + pdS = 0.856 → SF wins (Pair: 1 win for SF and one draw).
pSS       = 0.126 → SF wins (Pair: 2 wins for SF).
pdR + pRd = 0     → Rebel wins (Pair: 1 draw and 1 win for Rebel).
pdd       = 0.012 → Draw (Pair: 2 draws).
pRR       = 0     → Rebel wins (Pair: 2 wins for Rebel).

SUM       = 1

Code: Select all

Multiplying by n = 500 to get the pairs:

SF won  2.0-0.0:  63 pairs = n*pSS                 = 500*0.126
SF won  1.5-0.5: 428 pairs = n*(pSd + pdS)         = 500*0.856
SF drew 1.0-1.0:   9 pairs = n*[(pSR + pRS) + pdd] = 500*(0.006 + 0.012)
SF lost 0.5-1.5:   0 pairs = n*(pdR + pRd)         = 500*0
SF lost 0.0-2.0:   0 pairs = n*pRR                 = 500*0

Code: Select all

Verification with the game counts:

SF won  557 games = n*[   2*pSS    + (pSd + pdS) + (pSR + pRS)] = 500*(0.252 + 0.856 + 0.006)
SF drew 440 games = n*[(pSd + pdS) +    2*pdd    + (pdR + pRd)] = 500*(0.856 + 0.024 + 0    )
SF lost   3 games = n*[(pSR + pRS) + (pdR + pRd) +    2*pRR   ] = 500*(0.006 + 0     + 0    )

I rearranged the last data in a kind of a symmetric matrix for better visualization. Further rearrangements are possible if we want to group certain results into certain patterns.
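The same reconstruction can be scripted. This is a minimal Python sketch that follows the route above (Eq. 1 for the pairs with exactly one draw, then the remaining unknowns), using exact fractions; the variable names are my own, and the shortcut is only valid because Rebel lost-pair count is 0 here, which the script asserts.

```python
from fractions import Fraction

# SF 16.1 vs Rebel EAS, from SF's point of view, as quoted from the two SPCC files
W, D, L = 557, 440, 3      # individual games (1000)
PW, PD, PL = 491, 9, 0     # gamepairs (500)
n = PW + PD + PL

mS = (PW + Fraction(PD, 2)) / n        # rescored score: 991/1000
muS = (W + Fraction(D, 2)) / (2 * n)   # normal score:   777/1000

# Rebel won no pair, so n_dR = n_Rd = n_RR = 0 and every Rebel win sits in a split pair.
assert PL == 0
n_split = L                        # n_SR + n_RS (1 win each, drawn pair)
n_dd = PD - n_split                # drawn pairs made of two draws
n_one_draw = 4 * n * (mS - muS)    # n_Sd + n_dS, from Eq. 1
n_SS = n - n_one_draw - PD         # 2-0 pairs for SF

# Cross-check against the individual game counts
assert 2 * n_SS + n_one_draw + n_split == W
assert n_one_draw + 2 * n_dd == D

print(f"2-0: {n_SS}, 1.5-0.5: {n_one_draw}, 1-1: {n_split + n_dd}")
# 2-0: 63, 1.5-0.5: 428, 1-1: 9
```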

Regards from Spain.

Ajedrecista.
