Experimental testrun of Revenge 1.0, testing my EAS-tool

pohl4711 · Post by **pohl4711** » Thu Feb 15, 2024 12:46 pm

Experimental testrun of Revenge 1.0 for my UHO-Top15 ratinglist, to check whether my EAS-tool works as I predicted.

The author of the Willow 4.0 engine said this about my EAS-tool on talkchess:
"Also, the fact that Stockfish and Torch are at the top by a country mile suggests that a large part of what EAS is measuring is engines taking advantage of tactical mistakes by other engines rather than actively seeking out an aggressive play style."

So here is the proof that this is completely wrong and that my EAS-tool works as I always predicted:

https://www.sp-cc.de/experiments.htm

QED...

chessica · Post by **chessica** » Thu Feb 15, 2024 12:57 pm

pohl4711 wrote: ↑Thu Feb 15, 2024 12:46 pm
https://www.sp-cc.de/experiments.htm

But, but no one reads this...

Can't you give a short summary of what's important?

pohl4711 · Post by **pohl4711** » Thu Feb 15, 2024 1:15 pm

I did a testrun of Revenge 1.0 (the strongest really aggressive engine besides Stockfish and Torch, but of course light-years weaker than both):
15000 games versus the Top15 engines of my UHO-Top15 ratinglist. Of course, Revenge 1.0 is far too weak compared to these top engines. So Revenge 1.0 scored only 18.3% (181 Elo weaker than the weakest engine in my UHO-Top15 ratinglist, RofChade 3.1), and it won only 465 games out of 15000 (!!!).


     Program                    Elo    +    -  Games    Score   Av.Op. Draws

   1 Stockfish 16 230630      : 3821    4    4 15000    73.8%   3628   45.8%
   2 Torch 1 popavx2          : 3783    4    4 15000    69.3%   3631   46.3%
   3 KomodoDragon 3.3 avx2    : 3749    4    4 15000    65.0%   3633   47.0%
   4 Berserk 12 avx2          : 3725    4    4 15000    61.7%   3635   47.1%
   5 RubiChess 240112 avx2    : 3667    4    4 15000    53.7%   3639   48.4%
   6 Ethereal 14.25 nnue      : 3666    4    4 15000    53.5%   3639   49.2%
   7 Caissa 1.16 avx2         : 3665    4    4 15000    53.4%   3639   49.1%
   8 Obsidian 10.0 avx2       : 3653    4    4 15000    51.6%   3640   49.1%
   9 Seer 2.8.0 avx2          : 3621    4    4 15000    47.1%   3642   49.1%
  10 CSTal 2.0 avx2           : 3604    4    4 15000    44.7%   3643   49.5%
  11 Clover 6.1 avx2          : 3596    4    4 15000    43.6%   3643   49.7%
  12 Koivisto 9.2 avx2        : 3589    4    4 15000    42.6%   3644   48.3%
  13 Alexandria 6.0 avx2      : 3584    4    4 15000    41.9%   3644   48.2%
  14 Rebel EAS avx2           : 3573    4    4 15000    40.4%   3645   48.5%
  15 RofChade 3.1 avx2        : 3566    4    4 15000    39.4%   3645   47.1%
  16 Revenge 1.0 avx2         : 3385    5    5 15000    18.3%   3657   30.3%


Games        : 120000 (finished)

White Wins   : 57796 (48.2 %)
Black Wins   : 5741 (4.8 %)
Draws        : 56463 (47.1 %)
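As a rough cross-check of the list above, the 18.3% score of Revenge 1.0 can be converted into an Elo gap with the standard logistic Elo formula. This is only a sketch: the function name is made up for illustration, and the ratinglist itself comes from a rating program that fits all engines at once, so a small deviation from the listed 3385 is to be expected.

```python
import math

def elo_diff_from_score(score):
    """Elo difference implied by a score fraction (0 < score < 1).

    Uses the standard logistic Elo model; this is an illustrative helper,
    not the method of the rating program behind the list.
    """
    return -400.0 * math.log10(1.0 / score - 1.0)

revenge_score = 0.183   # "Score" column for Revenge 1.0
avg_opponent = 3657     # "Av.Op." column for Revenge 1.0

diff = elo_diff_from_score(revenge_score)
estimate = avg_opponent + diff
print(f"Elo gap to average opponent: {diff:.0f}, estimated rating: {estimate:.0f}")
```

The gap comes out at roughly 260 Elo below the average opponent, which puts the estimate within about a dozen points of the rating shown in the list.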

But now look at the EAS-ratinglist, calculated from these 120000 ratinglist games:


                                 bad  avg.win 
Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    197919  31.18%  29.46%  17.09%   71   Revenge 1.0 avx2  
   2    184362  20.06%  23.61%  09.13%   71   Stockfish 16 230630  
   3    146678  15.17%  27.19%  14.12%   69   Torch 1 popavx2  
   4    122333  15.14%  21.14%  14.53%   72   KomodoDragon 3.3 avx2  
   5    101137  14.39%  17.85%  16.51%   74   RubiChess 240112 avx2  
   6     88201  12.09%  09.84%  16.04%   80   Obsidian 10.0 avx2  
   7     82332  15.98%  10.17%  19.46%   83   Rebel EAS avx2  
   8     81081  10.20%  12.17%  17.87%   80   CSTal 2.0 avx2  
   9     75262  09.37%  12.82%  19.57%   78   Clover 6.1 avx2  
  10     72552  13.23%  08.90%  17.29%   85   Ethereal 14.25 nnue  
  11     69024  10.48%  12.81%  21.57%   76   Caissa 1.16 avx2  
  12     68697  10.94%  09.81%  19.23%   81   Alexandria 6.0 avx2  
  13     66430  09.19%  09.78%  18.59%   80   Berserk 12 avx2  
  14     63224  08.24%  14.94%  23.39%   75   Seer 2.8.0 avx2  
  15     51774  08.79%  13.71%  24.52%   77   RofChade 3.1 avx2  
  16     50559  06.28%  08.08%  21.43%   84   Koivisto 9.2 avx2  
-------------------------------------------------------------------
*** Average length of all won games:     76 moves

So the clearly (very clearly!) weakest engine is at rank 1 in the EAS-ratinglist! How awesome is that?
Additionally, I added the Revenge 1.0 games to my full UHO-Top15 ratinglist, so you can download them as part of the gamebase of the full UHO-Top15 ratinglist.

pohl4711 · Post by **pohl4711** » Thu Feb 15, 2024 1:17 pm


A: Most high-value sacrifices (3+ pawnunits): [1]:05.38% Revenge 1.0 avx2   
                                              [2]:03.61% Stockfish 16 230630   
                                              [3]:02.31% Rebel EAS avx2   
                                              [4]:02.25% Torch 1 popavx2   
                                              [5]:01.78% Obsidian 10.0 avx2 
 
B: Most sacrifices overall                  : [1]:31.18% Revenge 1.0 avx2   
                                              [2]:20.06% Stockfish 16 230630   
                                              [3]:15.98% Rebel EAS avx2   
                                              [4]:15.17% Torch 1 popavx2   
                                              [5]:15.14% KomodoDragon 3.3 avx2  

C: Very short wins (45 moves or less)       : [1]:04.73% Revenge 1.0 avx2   
                                              [2]:02.85% Stockfish 16 230630   
                                              [3]:01.95% Torch 1 popavx2   
                                              [4]:01.87% KomodoDragon 3.3 avx2   
                                              [5]:01.15% Rebel EAS avx2  

D: Most short wins overall                  : [1]:29.46% Revenge 1.0 avx2   
                                              [2]:27.19% Torch 1 popavx2   
                                              [3]:23.61% Stockfish 16 230630   
                                              [4]:21.14% KomodoDragon 3.3 avx2   
                                              [5]:17.85% RubiChess 240112 avx2  

E: Average length of all won games          : [1]:069 Torch 1 popavx2   
                                              [2]:071 Revenge 1.0 avx2   
                                              [3]:071 Stockfish 16 230630   
                                              [4]:072 KomodoDragon 3.3 avx2   
                                              [5]:074 RubiChess 240112 avx2

chessica · Post by **chessica** » Thu Feb 15, 2024 1:28 pm

Oh, thank you very much for the explanations. I wouldn't have expected the results like that. What are the reasons for this?

pohl4711 · Post by **pohl4711** » Thu Feb 15, 2024 1:39 pm

chessica wrote: ↑Thu Feb 15, 2024 1:28 pm Oh, thank you very much for the explanations. I wouldn't have expected the results like that. What are the reasons for this?

The reason is that the EAS-tool recognizes aggressive play, no matter whether an engine scores well and has a high Elo or is much weaker than its opponents.
And this is exactly what it should do: show us the "character"/playing style of engines. Because in these days of superstrong engines, playing style becomes more and more important and interesting, and Elo progress becomes less important.

The only bad news is that the EAS-tool needs at least 400-500 won games per engine (recommended), because 4/5 of the EAS-tool points come from engine wins.
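Whether an engine reaches that number of wins can be estimated from the ratinglist columns alone, since score = wins + draws/2. The sketch below uses the rounded percentages for Revenge 1.0, so it only approximates the 465 wins counted from the actual games. The function and constant names are hypothetical, and the threshold is simply the lower end of the 400-500 recommendation.

```python
def estimated_wins(games, score_pct, draw_pct):
    """Approximate number of wins from score % and draw % (score = wins + draws/2)."""
    return games * (score_pct - draw_pct / 2.0) / 100.0

# Hypothetical constant: lower end of the recommended 400-500 won games.
MIN_RECOMMENDED_WINS = 400

# Revenge 1.0: 15000 games, 18.3% score, 30.3% draws (rounded list values).
revenge_wins = estimated_wins(15000, 18.3, 30.3)
enough_wins = revenge_wins >= MIN_RECOMMENDED_WINS

print(f"about {revenge_wins:.0f} wins, enough for EAS: {enough_wins}")
```

With the rounded inputs the estimate lands a few games above the real count, which is within the rounding error of the two percentages.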

Witek · Post by **Witek** » Thu Feb 15, 2024 3:28 pm

Do you count sacrifices only from won games? Because a spectacular sac could be a spectacular blunder

mclane · Post by **mclane** » Thu Feb 15, 2024 4:11 pm

But as far as I understand the history, Revenge 1.0 is an old engine and has been replaced by Revenge 3, which no longer plays as aggressively.

RubiChess · Post by **RubiChess** » Thu Feb 15, 2024 4:12 pm

Witek wrote: ↑Thu Feb 15, 2024 3:28 pm Do you count sacrifices only from won games? Because a spectacular sac could be a spectacular blunder

The very first part of the EAS rules page https://www.sp-cc.de/files/eas_scoring_explanation.txt gives the answer:
1) Sacrifices: (percent*100) of the percent-values of the sacrifices (1-5+ pawnunits) calculated out
of the won games by the engine, only.

pohl4711 · Post by **pohl4711** » Thu Feb 15, 2024 4:13 pm

Witek wrote: ↑Thu Feb 15, 2024 3:28 pm Do you count sacrifices only from won games? Because a spectacular sac could be a spectacular blunder

Yes, won games only. Using pgn-extract, you cannot distinguish blunders from sacs in a lost game.
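The won-games-only counting can be illustrated with a minimal sketch. This is not the EAS-tool itself: the `Game` record, the `winner_sacrificed` flag and the function names are assumptions made for the example, and how a sacrifice is actually detected in the PGN is left out entirely. The 45-move limit is the "very short wins" limit from the statistics above.

```python
from dataclasses import dataclass

@dataclass
class Game:
    white: str
    black: str
    result: str               # "1-0", "0-1" or "1/2-1/2"
    moves: int                # game length in moves
    winner_sacrificed: bool   # hypothetical flag: the winner sacrificed material

def winner(game):
    """Return the name of the winning engine, or None for a draw."""
    if game.result == "1-0":
        return game.white
    if game.result == "0-1":
        return game.black
    return None

def win_stats(games, engine, very_short_limit=45):
    """Statistics over the games won by `engine`; losses and draws are ignored."""
    wins = [g for g in games if winner(g) == engine]
    if not wins:
        return None
    n = len(wins)
    return {
        "wins": n,
        "sac_pct": 100.0 * sum(g.winner_sacrificed for g in wins) / n,
        "very_short_pct": 100.0 * sum(g.moves <= very_short_limit for g in wins) / n,
        "avg_win_moves": sum(g.moves for g in wins) / n,
    }

# Made-up example games, only to show which games are counted.
games = [
    Game("A", "B", "1-0", 40, True),        # win for A with a sacrifice
    Game("B", "A", "0-1", 80, False),       # win for A without a sacrifice
    Game("A", "B", "0-1", 60, True),        # loss for A: not counted for A
    Game("A", "B", "1/2-1/2", 100, False),  # draw: not counted
]
stats = win_stats(games, "A")
print(stats)
```

Because only the winner's games enter the statistics, a "sacrifice" that was really a blunder in a lost game never inflates the engine's numbers.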
