Experimental testrun of Revenge 1.0, testing my EAS-tool

pohl4711
Posts: 2808
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Post by pohl4711 »

An experimental test run of Revenge 1.0 for my UHO-Top15 rating list, to check whether my EAS-tool works as I predicted.

The author of the Willow 4.0 engine said this about my EAS-tool on talkchess:
"Also, the fact that Stockfish and Torch are at the top by a country mile suggests that a large part of what EAS is measuring is engines taking advantage of tactical mistakes by other engines rather than actively seeking out an aggressive play style."

So, here is the proof that this is completely wrong and that my EAS-tool works as I always predicted:

https://www.sp-cc.de/experiments.htm

QED...
chessica
Posts: 962
Joined: Thu Aug 11, 2022 11:30 pm
Full name: Esmeralda Pinto

Post by chessica »

pohl4711 wrote: Thu Feb 15, 2024 12:46 pm
https://www.sp-cc.de/experiments.htm
But no one reads this...
Can't you give a short summary of what's important?

Post by pohl4711 »

I did a test run of Revenge 1.0 (the strongest really aggressive engine besides Stockfish and Torch, but of course light-years weaker than both):
15000 games versus the Top15 engines of my UHO-Top15 rating list. Of course, Revenge 1.0 is far too weak for this field: it scored only 18.3% (181 Elo below RofChade 3.1, the weakest engine in my UHO-Top15 rating list) and won only 465 of its 15000 games (!!!).

Code: Select all

     Program                    Elo    +    -  Games    Score   Av.Op. Draws

   1 Stockfish 16 230630      : 3821    4    4 15000    73.8%   3628   45.8%
   2 Torch 1 popavx2          : 3783    4    4 15000    69.3%   3631   46.3%
   3 KomodoDragon 3.3 avx2    : 3749    4    4 15000    65.0%   3633   47.0%
   4 Berserk 12 avx2          : 3725    4    4 15000    61.7%   3635   47.1%
   5 RubiChess 240112 avx2    : 3667    4    4 15000    53.7%   3639   48.4%
   6 Ethereal 14.25 nnue      : 3666    4    4 15000    53.5%   3639   49.2%
   7 Caissa 1.16 avx2         : 3665    4    4 15000    53.4%   3639   49.1%
   8 Obsidian 10.0 avx2       : 3653    4    4 15000    51.6%   3640   49.1%
   9 Seer 2.8.0 avx2          : 3621    4    4 15000    47.1%   3642   49.1%
  10 CSTal 2.0 avx2           : 3604    4    4 15000    44.7%   3643   49.5%
  11 Clover 6.1 avx2          : 3596    4    4 15000    43.6%   3643   49.7%
  12 Koivisto 9.2 avx2        : 3589    4    4 15000    42.6%   3644   48.3%
  13 Alexandria 6.0 avx2      : 3584    4    4 15000    41.9%   3644   48.2%
  14 Rebel EAS avx2           : 3573    4    4 15000    40.4%   3645   48.5%
  15 RofChade 3.1 avx2        : 3566    4    4 15000    39.4%   3645   47.1%
  16 Revenge 1.0 avx2         : 3385    5    5 15000    18.3%   3657   30.3%


Games        : 120000 (finished)

White Wins   : 57796 (48.2 %)
Black Wins   : 5741 (4.8 %)
Draws        : 56463 (47.1 %)
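As a rough cross-check of the figures in the table above (a sketch only, not part of the original post): since a draw counts as half a point, the share of won games follows from the score and the draw rate. The function name `estimated_wins` is made up for illustration, and the inputs are the rounded percentages from the table, so the result is only approximate.

```python
def estimated_wins(games: int, score_pct: float, draw_pct: float) -> float:
    """Estimate the number of won games from score and draw percentages.

    score = wins + draws / 2, therefore wins = score - draws / 2.
    """
    win_share = score_pct / 100 - draw_pct / 200
    return games * win_share


# Values for Revenge 1.0 and RofChade 3.1, copied from the table above.
revenge_wins = estimated_wins(games=15000, score_pct=18.3, draw_pct=30.3)
elo_gap = 3566 - 3385  # RofChade 3.1 rating minus Revenge 1.0 rating

print(f"estimated wins for Revenge 1.0: {revenge_wins:.0f}")
print(f"Elo gap to RofChade 3.1: {elo_gap}")
```

The estimate comes out a little above 470 wins. Because both percentages are rounded to one decimal, that is consistent with the 465 wins quoted in the post.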
But now look at the EAS rating list, calculated from these same 120000 games:

Code: Select all

                                 bad  avg.win 
Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    197919  31.18%  29.46%  17.09%   71   Revenge 1.0 avx2  
   2    184362  20.06%  23.61%  09.13%   71   Stockfish 16 230630  
   3    146678  15.17%  27.19%  14.12%   69   Torch 1 popavx2  
   4    122333  15.14%  21.14%  14.53%   72   KomodoDragon 3.3 avx2  
   5    101137  14.39%  17.85%  16.51%   74   RubiChess 240112 avx2  
   6     88201  12.09%  09.84%  16.04%   80   Obsidian 10.0 avx2  
   7     82332  15.98%  10.17%  19.46%   83   Rebel EAS avx2  
   8     81081  10.20%  12.17%  17.87%   80   CSTal 2.0 avx2  
   9     75262  09.37%  12.82%  19.57%   78   Clover 6.1 avx2  
  10     72552  13.23%  08.90%  17.29%   85   Ethereal 14.25 nnue  
  11     69024  10.48%  12.81%  21.57%   76   Caissa 1.16 avx2  
  12     68697  10.94%  09.81%  19.23%   81   Alexandria 6.0 avx2  
  13     66430  09.19%  09.78%  18.59%   80   Berserk 12 avx2  
  14     63224  08.24%  14.94%  23.39%   75   Seer 2.8.0 avx2  
  15     51774  08.79%  13.71%  24.52%   77   RofChade 3.1 avx2  
  16     50559  06.28%  08.08%  21.43%   84   Koivisto 9.2 avx2  
-------------------------------------------------------------------
*** Average length of all won games:     76 moves
So the clearly (very clearly!) weakest engine is ranked first in the EAS rating list! How awesome is that?
Additionally, I added the Revenge 1.0 games to my full UHO-Top15 rating list, so you can download them as part of its game database.
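To make the gap between the two rankings concrete, here is a small sketch (not from the original post) that compares each engine's position in the Elo list with its position in the EAS list. The engine names are shortened by hand (build suffixes such as "avx2" dropped), and the variable names are chosen for illustration only.

```python
# Engine names in rank order, copied from the two lists above
# (build suffixes such as "avx2" are dropped for readability).
elo_order = [
    "Stockfish 16", "Torch 1", "KomodoDragon 3.3", "Berserk 12",
    "RubiChess 240112", "Ethereal 14.25", "Caissa 1.16", "Obsidian 10.0",
    "Seer 2.8.0", "CSTal 2.0", "Clover 6.1", "Koivisto 9.2",
    "Alexandria 6.0", "Rebel EAS", "RofChade 3.1", "Revenge 1.0",
]
eas_order = [
    "Revenge 1.0", "Stockfish 16", "Torch 1", "KomodoDragon 3.3",
    "RubiChess 240112", "Obsidian 10.0", "Rebel EAS", "CSTal 2.0",
    "Clover 6.1", "Ethereal 14.25", "Caissa 1.16", "Alexandria 6.0",
    "Berserk 12", "Seer 2.8.0", "RofChade 3.1", "Koivisto 9.2",
]

elo_rank = {name: i for i, name in enumerate(elo_order, start=1)}
eas_rank = {name: i for i, name in enumerate(eas_order, start=1)}

# Positive shift: the engine is ranked higher by EAS than by Elo.
shifts = {name: elo_rank[name] - eas_rank[name] for name in elo_order}

# Print the engines with the largest upward shift first.
for name, shift in sorted(shifts.items(), key=lambda kv: -kv[1]):
    print(f"{name:18s} Elo #{elo_rank[name]:2d}  EAS #{eas_rank[name]:2d}  shift {shift:+d}")
```

Revenge 1.0 moves up 15 places and Berserk 12 drops 9, so the EAS ranking is clearly not just a copy of the Elo ranking.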

Post by pohl4711 »

Code: Select all

A: Most high-value sacrifices (3+ pawnunits): [1]:05.38% Revenge 1.0 avx2   
                                              [2]:03.61% Stockfish 16 230630   
                                              [3]:02.31% Rebel EAS avx2   
                                              [4]:02.25% Torch 1 popavx2   
                                              [5]:01.78% Obsidian 10.0 avx2 
 
B: Most sacrifices overall                  : [1]:31.18% Revenge 1.0 avx2   
                                              [2]:20.06% Stockfish 16 230630   
                                              [3]:15.98% Rebel EAS avx2   
                                              [4]:15.17% Torch 1 popavx2   
                                              [5]:15.14% KomodoDragon 3.3 avx2  

C: Very short wins (45 moves or less)       : [1]:04.73% Revenge 1.0 avx2   
                                              [2]:02.85% Stockfish 16 230630   
                                              [3]:01.95% Torch 1 popavx2   
                                              [4]:01.87% KomodoDragon 3.3 avx2   
                                              [5]:01.15% Rebel EAS avx2  

D: Most short wins overall                  : [1]:29.46% Revenge 1.0 avx2   
                                              [2]:27.19% Torch 1 popavx2   
                                              [3]:23.61% Stockfish 16 230630   
                                              [4]:21.14% KomodoDragon 3.3 avx2   
                                              [5]:17.85% RubiChess 240112 avx2  

E: Average length of all won games          : [1]:069 Torch 1 popavx2   
                                              [2]:071 Revenge 1.0 avx2   
                                              [3]:071 Stockfish 16 230630   
                                              [4]:072 KomodoDragon 3.3 avx2   
                                              [5]:074 RubiChess 240112 avx2  

Post by chessica »

Oh, thank you very much for the explanations. I wouldn't have expected the results like that. What are the reasons for this?

Post by pohl4711 »

chessica wrote: Thu Feb 15, 2024 1:28 pm Oh, thank you very much for the explanations. I wouldn't have expected the results like that. What are the reasons for this?
The reason is that the EAS-tool recognizes aggressive play regardless of whether an engine scores well (high Elo) or is much weaker than its opponents.
And this is exactly what it should do: show us the "character", the playing style, of engines. In these days of superstrong engines, playing style becomes more and more important and interesting, and Elo progress becomes less important.

The only bad news is that the EAS-tool needs at least 400-500 won games per engine (recommended), because 4/5 of the EAS-tool points come from engine wins.
Witek
Posts: 87
Joined: Thu Oct 07, 2021 12:48 am
Location: Warsaw, Poland
Full name: Michal Witanowski

Post by Witek »

Do you count sacrifices only from won games? Because a spectacular sac could be a spectacular blunder :)
Author of Caissa Chess Engine: https://github.com/Witek902/Caissa
mclane
Posts: 18911
Joined: Thu Mar 09, 2006 6:40 pm
Location: US of Europe, germany
Full name: Thorsten Czub

Post by mclane »

But as far as I understand the history, Revenge 1.0 is an old engine and has been replaced by Revenge 3, which no longer plays as aggressively.
What seems like a fairy tale today may be reality tomorrow.
Here we have a fairy tale of the day after tomorrow....
RubiChess
Posts: 643
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Post by RubiChess »

Witek wrote: Thu Feb 15, 2024 3:28 pm Do you count sacrifices only from won games? Because a spectacular sac could be a spectacular blunder :)
The very first part of the EAS rules page https://www.sp-cc.de/files/eas_scoring_explanation.txt gives the answer:
1) Sacrifices: (percent*100) of the percent-values of the sacrifices (1-5+ pawnunits) calculated out
of the won games by the engine, only.

Post by pohl4711 »

Witek wrote: Thu Feb 15, 2024 3:28 pm Do you count sacrifices only from won games? Because a spectacular sac could be a spectacular blunder :)
Yes, won games only. Using pgn-extract, you cannot distinguish blunders from sacs in a lost game.
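The "won games only" filter can be sketched in a few lines. This is not how the EAS-tool works internally (per the post, it builds on pgn-extract); it is only a minimal illustration that reads the standard PGN `White`, `Black` and `Result` tags and keeps the games a given engine won. The helper names and the toy PGN with "Engine A" and "Engine B" are invented for the example, and sacrifice detection is not attempted.

```python
import re

# Matches one PGN tag pair line, e.g. [Result "1-0"].
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def iter_headers(pgn_text: str):
    """Yield one dict of tag pairs per game in a PGN string."""
    tags = {}
    in_tags = False
    for line in pgn_text.splitlines():
        match = TAG_RE.match(line.strip())
        if match:
            # A tag line after movetext starts a new game.
            if not in_tags and tags:
                yield tags
                tags = {}
            in_tags = True
            tags[match.group(1)] = match.group(2)
        elif line.strip():
            in_tags = False  # non-empty, non-tag line: movetext
    if tags:
        yield tags


def won_games(pgn_text: str, engine: str):
    """Return the headers of all games that `engine` won."""
    wins = []
    for tags in iter_headers(pgn_text):
        result = tags.get("Result")
        won_as_white = tags.get("White") == engine and result == "1-0"
        won_as_black = tags.get("Black") == engine and result == "0-1"
        if won_as_white or won_as_black:
            wins.append(tags)
    return wins


# Made-up toy PGN (not real games), just to exercise the filter.
sample = '''[White "Engine A"]
[Black "Engine B"]
[Result "1-0"]

1. e4 e5 1-0

[White "Engine A"]
[Black "Engine B"]
[Result "0-1"]

1. d4 d5 0-1

[White "Engine B"]
[Black "Engine A"]
[Result "0-1"]

1. c4 e5 0-1
'''

print(len(won_games(sample, "Engine A")))  # → 2
```

Only the games returned by a filter like this would then be searched for sacrifices, so a failed sacrifice in a lost game never enters the statistics.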