A new way for calculating playing strength with 0:1 games + move average!

Frank Quisinsky · Post by **Frank Quisinsky** » Sat Feb 06, 2021 8:13 am

Hi there,

First, have a look here:
FCP Tourney-2021 results after 400 out of 2000 games (41 engines).

   # Player                           :      Elo  Games  Score%   won  draw  lost  Points  Draw%   Error   OppAvg   OppE  MoveAvg
  -------------------------------------------------------------------------------------------------------------------------------
  01. Stockfish 110121 NN BMI2 x64    :  3489.68    400    88.6   309    91     0   354.5   22.8   36.58  3096.91  23.39     78.1
  02. Dragon by Komodo NN AVX2 x64    :  3445.46    400    85.9   288   111     1   343.5   27.8   34.58  3098.02  23.44     81.6

Look at the number of lost games for Stockfish & Dragon by Komodo!
Dragon by Komodo's single lost game was against Stockfish (no surprise)!

This tells us:
Two super engines are available ... we all know that, but what can we do with it?

That's the interesting point for all the other engines!
Can we calculate playing strength from the move average of lost games in combination with the number of drawn games?!

It seems logical that the higher the move average vs. Komodo and Stockfish (of course ... with games played UNTIL MATE), the higher the playing strength.

For a "rating list of engines" the important values are the number of drawn games and the move average.

What I am missing is a mathematical formula!

Example:
100 games, 50 games vs. Stockfish and 50 games vs. Dragon = 100 games

List of engines:

01. Engine X               18x draw, move average = 88,5 for 0:1 games
02. Engine Y               16x draw, move average = 87,1 for 0:1 games
25. Engine A               02x draw, move average = 61,3 for 0:1 games

And I am quite sure that the placement of engine A, engine B and so on would be the same as in a computer rating list, with the difference that fewer games are necessary. With this idea, even testing engines 1000 Elo weaker than Stockfish against Stockfish makes sense! By my quick calculation, ~500 games vs. Stockfish and Komodo are enough for a very exact rating.
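
A minimal sketch of how such a list could be produced, assuming each game is available as a (result, moves) pair from the tested engine's point of view. The record format, the sample numbers, and the ordering (more draws first, then the longer move average of lost games) are assumptions for illustration; the post itself leaves the exact formula open.

```python
from statistics import mean

# Hypothetical input: per engine, a list of (result, moves) pairs from games
# against the reference engines; result is "win", "draw" or "loss".
games = {
    "Engine A": [("loss", 60), ("loss", 62), ("loss", 61)],
    "Engine X": [("draw", 120), ("loss", 90), ("loss", 87)],
    "Engine Y": [("draw", 101), ("loss", 85), ("loss", 89)],
}

def summarize(records):
    """Return (number of draws, average length of the lost games)."""
    draws = sum(1 for result, _ in records if result == "draw")
    loss_lengths = [moves for result, moves in records if result == "loss"]
    # An engine without losses has no loss average; 0.0 is only a placeholder.
    avg_loss = mean(loss_lengths) if loss_lengths else 0.0
    return draws, avg_loss

# Assumed ordering: more draws first, ties broken by the longer loss average.
ranking = sorted(games, key=lambda name: summarize(games[name]), reverse=True)

for place, name in enumerate(ranking, start=1):
    draws, avg_loss = summarize(games[name])
    print(f"{place:02d}. {name:<12} {draws:02d}x draw, move average = {avg_loss:.1f} for 0:1 games")
```

Sorting on a tuple is only one possible choice; a weighted formula would combine both values into a single number instead.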

It could be interesting to do this for white and black games separately, or for white and black games together (as in the small example I added).

This idea is a typical result of a dream I had in the night.

Absolutely clear:
I should dream about other things, maybe I need a chess break.

Best
Frank

Frank Quisinsky · Post by **Frank Quisinsky** » Sat Feb 06, 2021 8:36 am

The questions are:

1. Does Quisinsky badly need a chess break?

or ...

2. A strong idea?

Best
Frank

PS: I forgot ...
This can be done with 500 positions, each played with black and white.
From each ECO code, take the best position. It can be found with the FEOBOS rating system, which Klaus Wlotzka developed over some years; each ECO position has its own rating in the database. Klaus Wlotzka ... the person who "dances with wolves", or better ... "dances with Excel".

Frank Quisinsky · Post by **Frank Quisinsky** » Sat Feb 06, 2021 9:09 am

With a big database of such a work ...
Stockfish & Dragon by Komodo vs. all the others ...

We can ...

1. Find interesting mate-in-x positions faster!
2. Make Stockfish and Dragon by Komodo stronger, which should be possible if we create the right statistics about the drawn games!
3. Give engine programmers more material for improving their own engines (more blunders in a short testing time).
4. Analyze game phases! For example, which engines hold the middlegame in most cases.

And so on ...
With a big database (many engines included), a lot of interesting statistics are conceivable!

I believe we would like to do such things in some years (when Stockfish is 200 Elo stronger than today).
But with the Elo strength Stockfish and Dragon by Komodo already have, we can start doing it in 2021.

Enough ...
Material for thinking a bit on the weekend!

Best
and ... have a nice weekend!
Frank

Only a dream, and I am quite sure I would have more fun testing Wasp vs. all the others!
But the idea may be interesting!

Ferdy · Post by **Ferdy** » Sun Feb 07, 2021 1:11 am

Frank Quisinsky wrote: ↑Sat Feb 06, 2021 8:13 am Hi there,

at first have a look here:
FCP Tourney-2021 results after 400 out of 2000 games (41 engines).
   # Player                           :      Elo  Games  Score%   won  draw  lost  Points  Draw%   Error   OppAvg   OppE  MoveAvg
  -------------------------------------------------------------------------------------------------------------------------------
  01. Stockfish 110121 NN BMI2 x64    :  3489.68    400    88.6   309    91     0   354.5   22.8   36.58  3096.91  23.39     78.1
  02. Dragon by Komodo NN AVX2 x64    :  3445.46    400    85.9   288   111     1   343.5   27.8   34.58  3098.02  23.44     81.6
Have a look on the number of lost games from Stockfish & Dragon by Komodo!
The lost game from Dragon by Komodo vs. Stockfish (not a wonder)!

Will give us the information:
2 super engines are available ... we all know that but what can we do with it?

That's an interesting point for all the others!
We can calculate playing strength with move average from lost games in combination with quantity of draw games?!

Should be logical that the higher the move average vs. Komodo and Stockfish (of course ... UNTIL MATE) that higher is the playing strength.

For a "rating list of engines" important = quantity of draw games and move average.

What I missed is a mathematic formula!

I took the fcp_qualify_2021_v1.08.01 pgn and ran a multiple linear regression with sklearn to approximate the formula, using Lc0 0.26.3 x64 as the top engine. The other engines' winrate, lossrate, etc. are measured against Lc0.

Metrics
mse: 3036.981235018092
mae: 39.52275478335605
r2_score: 0.7531608817101382

intercept: 2665
coefficients: [ 59.76899893 -201.23641212 141.46741318 3.07146156]

Independent variables
winrate weight: 59.76900
lossrate weight: -201.23641
drawrate weight: 141.46741
avemoveloss weight: 3.07146

winrate = wins/games vs Lc0
lossrate = loss/games vs Lc0
drawrate = draw/games vs Lc0
avemoveloss = average number of moves from a losing game vs Lc0
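
As a rough illustration of these definitions, the four variables could be computed like this. The (result, moves) record format and the sample games are made up for the example and are not taken from the pgn.

```python
def features(records):
    """Compute the four independent variables for one engine.

    records: list of (result, moves) pairs for games against the top engine,
    seen from the tested engine's side; result is "win", "draw" or "loss".
    This record format is an assumption made for this sketch.
    """
    n = len(records)
    wins = sum(1 for result, _ in records if result == "win")
    draws = sum(1 for result, _ in records if result == "draw")
    losses = sum(1 for result, _ in records if result == "loss")
    loss_moves = [moves for result, moves in records if result == "loss"]
    return {
        "winrate": wins / n,
        "lossrate": losses / n,
        "drawrate": draws / n,
        # Average length of the lost games; 0.0 if there were no losses.
        "avemoveloss": sum(loss_moves) / len(loss_moves) if loss_moves else 0.0,
    }

# Made-up sample: 8 games with 1 win, 2 draws and 5 losses.
sample = [("win", 70), ("draw", 110), ("draw", 95),
          ("loss", 80), ("loss", 90), ("loss", 86), ("loss", 84), ("loss", 90)]
stats = features(sample)
print(stats)
```

Note that winrate + lossrate + drawrate is always 1, so the three rates are not independent of each other once an intercept is in the model. That is probably why the three rate weights listed above add up to almost exactly zero: only their differences matter for the prediction.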

Formula:

model_rating = 2665 + winrate * 59.77 + lossrate * -201.24 + drawrate * 141.47 + avemoveloss * 3.07146
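
Written as a small function, the formula can be checked against rows of the table further down. This is just the formula above with the rounded weights as given; the function name mirrors the table column, and small differences to the table can come from rounding.

```python
def model_rating(winrate, lossrate, drawrate, avemoveloss):
    """Estimated rating from the fitted formula (valid for games vs. Lc0 only)."""
    return (2665
            + winrate * 59.77
            + lossrate * -201.24
            + drawrate * 141.47
            + avemoveloss * 3.07146)

# Inputs taken from the table: Amoeba 3.2 x64 and Protector 1.9.0 x64.
amoeba = model_rating(0, 0.5, 0.5, 78)          # table shows 2874
protector = model_rating(0, 0.375, 0.625, 91)   # table shows 2957
print(round(amoeba, 1), round(protector, 1))
```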

The table below shows the rating estimated by the formula (model_rating). The rating column is the overall rating indicated in the ordo file.

+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| name                         |   winrate |   lossrate |   drawrate |   avemoveloss |   rating |   model_rating |
+==============================+===========+============+============+===============+==========+================+
| Amoeba 3.2 x64               |     0     |      0.5   |      0.5   |            78 |     2877 |           2874 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Asymptote 0.8 Broadwell x64  |     0     |      0.75  |      0.25  |            98 |     2771 |           2850 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Atlas 3.91 POPCNT x64        |     0     |      0.875 |      0.125 |            87 |     2748 |           2773 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Bagatur 2.2 JAVA x64         |     0.125 |      0.625 |      0.25  |            86 |     2874 |           2846 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Cheese 2.2 POPCNT x64        |     0     |      0.875 |      0.125 |            80 |     2675 |           2752 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Cheng 4.40 dev AVX2 x64      |     0     |      0.5   |      0.5   |            84 |     2859 |           2893 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Chess22k 1.14 JAVA x64       |     0     |      0.75  |      0.25  |           114 |     2992 |           2899 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| ChessBrainVB 3.74 TCEC w32   |     0.125 |      0.625 |      0.25  |            90 |     2942 |           2858 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Crafty 25.6 x64              |     0.125 |      0.5   |      0.375 |            76 |     2786 |           2858 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Dirty Cucumber x64           |     0     |      0.875 |      0.125 |            96 |     2763 |           2801 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| FabChess 1.16 BMI2 x64       |     0     |      0.625 |      0.375 |            99 |     2865 |           2896 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Francesca 0.29a x64          |     0     |      0.875 |      0.125 |            87 |     2781 |           2773 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Gaviota 1.0 AVX x64          |     0     |      0.875 |      0.125 |            73 |     2743 |           2730 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Godel 7.0 SSE42 x64          |     0     |      1     |      0     |            92 |     2840 |           2746 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Gogobello 2.2 BMI2 x64       |     0     |      0.75  |      0.25  |            83 |     2763 |           2804 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Hakkapeliitta TCEC v2 x64    |     0.125 |      0.75  |      0.125 |           101 |     2846 |           2849 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Halogen 8.1 PEXT-AVX2 x64    |     0     |      0.75  |      0.25  |            96 |     2880 |           2844 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Hiarcs 14 WCSC w32           |     0.125 |      0.625 |      0.25  |            84 |     2804 |           2839 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Invictus r305 PEXT x64       |     0     |      0.75  |      0.25  |            78 |     2707 |           2788 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Junior 13.3.00 x64           |     0     |      0.75  |      0.25  |            89 |     2781 |           2822 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Koivisto 4.0 POPCNT AVX x64  |     0     |      0.625 |      0.375 |            96 |     2963 |           2887 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Marvin 4.0.1 POPCNT x64      |     0     |      0.75  |      0.25  |            87 |     2928 |           2816 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Monolith 2.01 PEXT x64       |     0     |      0.625 |      0.375 |            94 |     2857 |           2880 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Mr Bob 0.9.0 POPCNT x64      |     0     |      1     |      0     |            83 |     2677 |           2718 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Naum 4.6 x64                 |     0     |      0.5   |      0.5   |            81 |     2871 |           2883 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Nirvanachess 2.4 POPCNT x64  |     0     |      0.625 |      0.375 |            77 |     2962 |           2828 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Orion 0.8 NN POP AVX FMA x64 |     0     |      0.375 |      0.625 |            86 |     2982 |           2941 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Pirarucu 3.3.5 JAVA x64      |     0     |      0.375 |      0.625 |            87 |     2916 |           2945 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Protector 1.9.0 x64          |     0     |      0.375 |      0.625 |            91 |     2958 |           2957 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Quazar 0.4 x64               |     0     |      0.875 |      0.125 |            90 |     2738 |           2782 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Rodent IV 0.30 BMI2 x64      |     0     |      0.75  |      0.25  |            83 |     2851 |           2804 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Seer 1.2.1 NNUE Skylake x64  |     0     |      0.375 |      0.625 |           107 |     2952 |           3006 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| SmarThink 1.98 AVX2 x64      |     0     |      0.875 |      0.125 |           101 |     2874 |           2816 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Spike 1.4 Leiden w32         |     0     |      0.875 |      0.125 |            90 |     2746 |           2782 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Stash 24.0 BMI2 x64          |     0     |      0.875 |      0.125 |            87 |     2736 |           2773 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| TheBaron 3.44.1 x64          |     0     |      1     |      0     |            83 |     2691 |           2718 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Topple 0.7.5 Skylake x64     |     0     |      0.625 |      0.375 |            86 |     2886 |           2856 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Tucano 9.00 x64              |     0     |      0.625 |      0.375 |           100 |     2851 |           2899 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Wasp 4.08 dev Modern x64     |     0     |      0.5   |      0.5   |            91 |     3050 |           2914 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+
| Weiss 1.2 PEXT x64           |     0     |      0.625 |      0.375 |           111 |     2847 |           2933 |
+------------------------------+-----------+------------+------------+---------------+----------+----------------+

References:
mse : mean squared error
mae : mean absolute error
r2_score : coefficient of determination
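
For readers who want to recompute these three metrics by hand, here is a dependency-free sketch using the usual textbook definitions. It is not the script used for the numbers above, and the function name is only illustrative.

```python
def regression_metrics(actual, predicted):
    """Return (mse, mae, r2) for two equally long lists of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n        # mean squared error
    mae = sum(abs(e) for e in errors) / n       # mean absolute error
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)         # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                    # coefficient of determination
    return mse, mae, r2

# First three rows of the table: rating vs. model_rating.
rating = [2877, 2771, 2748]
model = [2874, 2850, 2773]
mse, mae, r2 = regression_metrics(rating, model)
print(mse, mae, r2)
```

Feeding in the full rating and model_rating columns should give values close to the metrics listed above, up to the rounding of the table entries.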

Now you can run other engines of similar strength against Lc0, collect their stats, and get an estimated rating from the formula above. The fit could possibly be improved with more games against Lc0; in this particular case each engine has only 8 games against Lc0.

Frank Quisinsky · Post by **Frank Quisinsky** » Sun Feb 07, 2021 7:40 am

Good morning Ferdinand,

First of all ...
Thank you, this looks interesting!

With Klaus I will try out your formula with the games of Stockfish and Dragon vs. the others from FCP Tourney-2021.

Later ...

Best
Frank

PS:
Collecting the games in a new database and using your tool PNGS will give us everything we need for it.

Frank Quisinsky · Post by **Frank Quisinsky** » Sun Feb 07, 2021 8:52 am

Hi Ferdinand,

For FCP Tourney-2021 I do not have enough results yet.
After round 10 of 50, each engine has played only 20 games vs. Stockfish 110121 and Dragon by Komodo!

Principle:
Engines that are strong in fast won games / king attacks are also strong in king safety with many pieces on the board.
This does not apply to speculative king-attacking engines like Fizbo!

Tendency 1:
With more games, SlowChess, Pedone, Critter and Chiron should move higher (strong in king attacks)!
Fizbo is strong in king attacks, but its style is speculative and can also produce many fast lost games.

Tendency 2:
Stronger engines are in better ranking positions after the maximum possible 20 games.
But Texel in 2nd place ???

I think the idea is interesting.
After 100 games the list should be much clearer, and your formula can give a good ranking!

I will try out the formula with FCP Tourney-2020 results today!

Best
Frank

Here is the list of 0:1 games (without draw games), sorted by move average!
Higher = better ... Booot leads with an average of 86 moves in its 0:1 games (until mate) vs. Stockfish / Dragon by Komodo.

MOVE AVERAGE AND RESULTS:

 1. Booot 6.4 POPCNT x64           (0+,   0=,   12-)  0.0%, all loses         86       12       12     100.0%
 2. Texel 1.08a13 BMI2 x64         (0+,   0=,   18-)  0.0%, all loses         85       18       18     100.0%
 3. Ethereal 12.75 PEXT x64        (0+,   0=,    9-)  0.0%, all loses         82        9        9     100.0%
 4. Xiphos 0.6 BMI2 x64            (0+,   0=,   13-)  0.0%, all loses         82       13       13     100.0%
 5. RubiChess 1.9 NN BMI2 x64      (0+,   0=,   14-)  0.0%, all loses         81       14       14     100.0%
 6. Schooner 2.2 XB SSE x64        (0+,   0=,   13-)  0.0%, all loses         79       13       13     100.0%
 7. Andscacs 0.95.123 x64          (0+,   0=,   16-)  0.0%, all loses         79       16       16     100.0%
 8. chess22k 1.14 JAVA x64         (0+,   0=,   16-)  0.0%, all loses         78       16       16     100.0%
 9. Wasp 4.50 Modern x64           (0+,   0=,   14-)  0.0%, all loses         78       14       14     100.0%
 9. Seer 1.2.1 NN Skylake x64      (0+,   0=,   14-)  0.0%, all loses         78       14       14     100.0%
11. Igel 2.9.0 NN BMI2 x64         (0+,   0=,   13-)  0.0%, all loses         76       13       13     100.0%
12. GullChess 3.0 Sy BMI2 x64      (0+,   0=,   15-)  0.0%, all loses         76       15       15     100.0%
13. SlowChess 2.5 NN AVX2 x64      (0+,   0=,   12-)  0.0%, all loses         75       12       12     100.0%
13. Nemorino 6.04 NN PEXT x64      (0+,   0=,   12-)  0.0%, all loses         75       12       12     100.0%
15. rofChade 2.3 BMI2 x64          (0+,   0=,   14-)  0.0%, all loses         75       14       14     100.0%
15. Lc0 0.26.3 NN CPU x64          (0+,   0=,   14-)  0.0%, all loses         75       14       14     100.0%
17. Pedone 3.0 NN BMI2 x64         (0+,   0=,   15-)  0.0%, all loses         75       15       15     100.0%
17. Fritz 17 (Ginkgo) x64          (0+,   0=,   15-)  0.0%, all loses         75       15       15     100.0%
19. Laser 1.7 BMI2 x64             (0+,   0=,   16-)  0.0%, all loses         75       16       16     100.0%
20. Arasan 22.2 BMI2 x64           (0+,   0=,   17-)  0.0%, all loses         75       17       17     100.0%
21. Marvin 5.0.0 NN AVX2 x64       (0+,   0=,   19-)  0.0%, all loses         75       19       19     100.0%
22. Defenchess 2.3 dev BMI2 x64    (0+,   0=,   12-)  0.0%, all loses         74       12       12     100.0%
23. iCE 4.0 v853 Modern x64        (0+,   0=,   16-)  0.0%, all loses         74       16       16     100.0%
24. Fizbo 2.0 BMI2 x64             (0+,   0=,   18-)  0.0%, all loses         74       18       18     100.0%
25. Demolito 2020-12-24 PEXT x64   (0+,   0=,   16-)  0.0%, all loses         73       16       16     100.0%
26. Nirvanachess 2.5 POPCNT x64    (0+,   0=,   19-)  0.0%, all loses         73       19       19     100.0%
27. pirarucu 3.3.5 JAVA x64        (0+,   0=,   16-)  0.0%, all loses         72       16       16     100.0%
28. Combusken 1.4.0 AMD x64        (0+,   0=,   18-)  0.0%, all loses         72       18       18     100.0%
29. Critter 1.6a x64               (0+,   0=,   20-)  0.0%, all loses         72       20       20     100.0%
30. Chiron 4 x64                   (0+,   0=,   17-)  0.0%, all loses         71       17       17     100.0%
31. Shredder 13 POPCNT x64         (0+,   0=,   12-)  0.0%, all loses         70       12       12     100.0%
32. Halogen 9 NN PEXT x64          (0+,   0=,   13-)  0.0%, all loses         69       13       13     100.0%
33. Protector 1.9.0 x64            (0+,   0=,   18-)  0.0%, all loses         69       18       18     100.0%
34. Hannibal 1.7 x64               (0+,   0=,   19-)  0.0%, all loses         68       19       19     100.0%
35. Koivisto 4.19 AVX x64          (0+,   0=,   17-)  0.0%, all loses         67       17       17     100.0%
36. Topple 0.8.0 Modern x64        (0+,   0=,   18-)  0.0%, all loses         67       18       18     100.0%
37. Vajolet2 2.8 BMI2 x64          (0+,   0=,   16-)  0.0%, all loses         65       16       16     100.0%
38. Winter 0.9 NN BMI2 x64         (0+,   0=,   16-)  0.0%, all loses         64       16       16     100.0%
39. Orion 0.8 NN FMA x64           (0+,   0=,   14-)  0.0%, all loses         63       14       14     100.0%

Ferdy · Post by **Ferdy** » Sun Feb 07, 2021 12:17 pm

Frank Quisinsky wrote: ↑Sun Feb 07, 2021 7:40 am Good morning Ferdinand,

at first ...
Thank you, looks interesting!

With Klaus I will try out your formula with the games from Stockfish and Dragon vs. the others from FCP Tourney-2021.

That formula is intended only for Lc0.

The formula could be different when using Stockfish and Dragon data. That data has to be generated from the pgn, and then a formula has to be built from it. Also, it is desirable that the opening of engineA vs Stockfish is the same as that of engineB vs Stockfish, and likewise for the other engines vs Stockfish, so that the average number of moves can be compared on equal terms: the longer an engine takes to lose, the better its resistance and its accuracy in calculating continuation lines. For example, in round 1 all engines vs Stockfish use ope1, in round 2 they use ope2, and so on.

Other independent variables can be added and experimented with, such as number of moves to loss, number of moves to win, and number of moves to draw, provided the opening is the same.
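
To give an idea of what "building a formula" from new data involves, here is a dependency-free least-squares sketch. It is not the sklearn script used for the Lc0 formula. It solves the normal equations directly and therefore leaves out winrate, because the three rates always add up to 1 and this simple solver cannot handle perfectly collinear columns. The six sample rows are copied from the Lc0 table, only to have something to run on.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: list of feature lists; an intercept column of 1s is added here.
    Returns [intercept, weight_1, ..., weight_k].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# (lossrate, drawrate, avemoveloss) and overall rating for six table rows:
# Amoeba, Asymptote, Bagatur, Crafty, Hakkapeliitta, Godel.
rows = [(0.5, 0.5, 78), (0.75, 0.25, 98), (0.625, 0.25, 86),
        (0.5, 0.375, 76), (0.75, 0.125, 101), (1.0, 0.0, 92)]
ratings = [2877, 2771, 2874, 2786, 2846, 2840]

intercept, w_loss, w_draw, w_moves = fit_linear(rows, ratings)
print(intercept, w_loss, w_draw, w_moves)
```

With so few rows the weights are not meaningful. The same call would be made on the full Stockfish and Dragon data, and further columns such as number of moves to win or to draw could be appended to each row.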
