SPCC: Testrun of Berserk 10 finished

pohl4711
Posts: 2732
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

SPCC: Testrun of Berserk 10 finished

Post by pohl4711 »

Ratinglist-testrun of Berserk 10 finished.


https://www.sp-cc.de

Also take a look at the EAS-Ratinglist, the world's first engine ratinglist that measures not the strength of engines but their style of play:
https://www.sp-cc.de/eas-ratinglist.htm

(You may have to clear your browser cache (press <CTRL>+<SHIFT>+<DEL>) or reload the website.)
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: SPCC: Testrun of Berserk 10 finished

Post by dkappe »

At 3657 up from Berserk 9 at 3644. Seems there is a bug in 10, however.
jhonnold
Posts: 122
Joined: Wed Feb 17, 2021 3:16 pm
Full name: Jay Honnold

Re: SPCC: Testrun of Berserk 10 finished

Post by jhonnold »

With regards to your EAS rating list, have you checked that the EAS-Score doesn't vary heavily when playing engines against different pools? Berserk is always at the bottom of your EAS list and I'm curious if it has to do with the engine or the pool.

Code: Select all

6 Berserk 10 avx2        : 3657 9000 (+775,=6221,-2004), 43.2 %

Stockfish 221004 avx2    : 1000 (+  2,=566,-432), 28.5 %
Stockfish 220927 avx2    : 1000 (+  1,=582,-417), 29.2 %
KomodoDragon 3.1 MCTS    : 1000 (+ 40,=779,-181), 43.0 %
KomodoDragon 3.1 avx2    : 1000 (+  7,=682,-311), 34.8 %
Revenge 3.0 avx2         : 1000 (+147,=777,- 76), 53.5 %
Ethereal 13.75 nnue      : 1000 (+178,=774,- 48), 56.5 %
Koivisto 8.13 avx2       : 1000 (+120,=814,- 66), 52.7 %
Slow Chess 2.9 avx2      : 1000 (+278,=683,- 39), 62.0 %
Stockfish 15 220418      : 1000 (+  2,=564,-434), 28.4 %
Looking at this pool, Berserk has a pretty poor score of 43%, mostly due to 4000 games being against engines 100+ Elo stronger than it. If you replaced the two Stockfish dev versions with SF HCE and Rubichess, would Berserk's EAS-Score rise? If you played Berserk only vs a pool of engines where it is favored, would the EAS-Score rise?

If the EAS-Score varies heavily based on pools, then there is a flaw in your list in my opinion.
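
As a rough cross-check of the numbers in the listing above, here is a minimal Python sketch that recomputes a score percentage from a win/draw/loss record and converts it into an implied Elo difference. It assumes the standard logistic Elo model on the 400-point scale; the function names are illustrative and not taken from any rating tool.

```python
import math

def score_percent(wins, draws, losses):
    """Score in percent: a win counts 1 point, a draw 0.5 points."""
    games = wins + draws + losses
    return 100.0 * (wins + 0.5 * draws) / games

def implied_elo_diff(score_pct):
    """Elo difference implied by a score, assuming the logistic Elo model."""
    s = score_pct / 100.0
    return -400.0 * math.log10(1.0 / s - 1.0)

# Berserk 10's records, copied from the listing above
total = score_percent(775, 6221, 2004)
vs_sf = score_percent(2, 566, 432)

print(f"overall: {total:.1f} %")
print(f"vs Stockfish 221004: {vs_sf:.1f} % "
      f"(implied Elo diff {implied_elo_diff(vs_sf):+.0f})")
```

A negative implied difference means the opponent is the stronger side, which is the "100+ Elo stronger" situation described above.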
jhonnold
Posts: 122
Joined: Wed Feb 17, 2021 3:16 pm
Full name: Jay Honnold

Re: SPCC: Testrun of Berserk 10 finished

Post by jhonnold »

That being said, I do think Berserk is a pretty simple-minded engine and deserves its low ranking on your list, but I still believe you should check that your EAS-Score isn't impacted by unforeseen factors.
pohl4711
Posts: 2732
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SPCC: Testrun of Berserk 10 finished

Post by pohl4711 »

jhonnold wrote: Mon Oct 17, 2022 7:06 pm With regards to your EAS rating list, have you checked that the EAS-Score doesn't vary heavily when playing engines against different pools? Berserk is always at the bottom of your EAS list and I'm curious if it has to do with the engine or the pool.

Code: Select all

6 Berserk 10 avx2        : 3657 9000 (+775,=6221,-2004), 43.2 %

Stockfish 221004 avx2    : 1000 (+  2,=566,-432), 28.5 %
Stockfish 220927 avx2    : 1000 (+  1,=582,-417), 29.2 %
KomodoDragon 3.1 MCTS    : 1000 (+ 40,=779,-181), 43.0 %
KomodoDragon 3.1 avx2    : 1000 (+  7,=682,-311), 34.8 %
Revenge 3.0 avx2         : 1000 (+147,=777,- 76), 53.5 %
Ethereal 13.75 nnue      : 1000 (+178,=774,- 48), 56.5 %
Koivisto 8.13 avx2       : 1000 (+120,=814,- 66), 52.7 %
Slow Chess 2.9 avx2      : 1000 (+278,=683,- 39), 62.0 %
Stockfish 15 220418      : 1000 (+  2,=564,-434), 28.4 %
Looking at this pool, Berserk has a pretty poor score of 43%, mostly due to 4000 games being against engines 100+ Elo stronger than it. If you replaced the two Stockfish dev versions with SF HCE and Rubichess, would Berserk's EAS-Score rise? If you played Berserk only vs a pool of engines where it is favored, would the EAS-Score rise?

If the EAS-Score varies heavily based on pools, then there is a flaw in your list in my opinion.
Just look at Slow Chess 2.9:
Its score is 43.1%, like Berserk 10, yet its EAS-Score is 95348, nearly 3x that of Berserk 10. That is rank 4 in the EAS ratinglist...

The EAS calculations are all done with percentage values, because it should not matter how strong the engine plays or how high its score is!
From my website:
"Because a weaker player can play aggressively, too, the EAS-Score (= Engine Aggressiveness Score, see explanation below) and all other statistics are built on percentages of the won games of an engine/player. So, if an engine has won more games, it must win more short games or win more games with sacrifices. A weaker engine, which has won fewer games, needs fewer short wins or wins with sacrifices."
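
To illustrate why statistics expressed as a percentage of won games do not depend on how many games an engine wins, here is a minimal sketch. It does not reproduce the actual EAS formula, which is not given in this thread; the `Win` record, the 60-move cutoff for a "short" win, and the sample data are all hypothetical.

```python
from dataclasses import dataclass

SHORT_WIN_MOVES = 60  # hypothetical cutoff, not the EAS tool's real threshold

@dataclass
class Win:
    moves: int        # length of the won game in moves
    sacrifice: bool   # whether the winner sacrificed material (detection not shown)

def win_percentages(wins):
    """Return (sacrifice %, short-win %) relative to the number of won games."""
    n = len(wins)
    if n == 0:
        raise ValueError("engine has no wins; percentages are undefined")
    sacs = sum(w.sacrifice for w in wins)
    shorts = sum(w.moves <= SHORT_WIN_MOVES for w in wins)
    return 100.0 * sacs / n, 100.0 * shorts / n

# Made-up data: a strong engine with 200 wins and a weak one with 20 wins,
# each with the same share of sacrifice wins and short wins.
strong = [Win(50, True)] * 40 + [Win(90, False)] * 160
weak = [Win(50, True)] * 4 + [Win(90, False)] * 16

print(win_percentages(strong))
print(win_percentages(weak))
```

Both engines get the same percentages even though one has ten times as many wins. The sketch also shows the one hard limit of this approach: an engine without any wins has no defined percentages.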


Or look at the full ratinglist, where all played games of the engines are stored and no Stockfish dev versions are included (the full EAS ratinglist follows below the full ratinglist):
https://www.sp-cc.de/files/spcc_full_list.txt

Here you have Berserk 9 with 13000 played games, no SF devs as opponents and a score of 49.2% (nearly 50%):
17 Berserk 9 avx2 : 3647 5 5 13000 49.2% 3653 70.2%
And the EAS-score stays as bad as always (rank 158 of 166 entries!!!):
158 35269 10.09% 05.89% 26.71% Berserk 9 avx2

And Slow Chess 2.9 has 17000 games here, with a score of only 42.5%:
26 Slow Chess 2.9 avx2 : 3585 4 4 17000 42.5% 3641 66.7%
And the EAS-score stays high (Rank 17 of 166):
17 92177 23.99% 23.54% 17.84% Slow Chess 2.9 avx2

Q.E.D.
Last edited by pohl4711 on Mon Oct 17, 2022 7:54 pm, edited 1 time in total.
pohl4711
Posts: 2732
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SPCC: Testrun of Berserk 10 finished

Post by pohl4711 »

Or look at my tests of the old King engine, from when I built settings for my King chess computer:

Code: Select all

     Program                     Elo    +    -   Games   Score   Av.Op.  Draws

   1 Rebel 13                  : 2567    5    5  7500    63.5 %   2466   22.4 %
   2 Delfi 5.4                 : 2533    6    6  7500    59.1 %   2466   23.6 %
   3 TheKing Razorback         : 2525    8    8  3500    52.1 %   2511   29.1 %
   4 TheKing Researcher        : 2525    8    8  3500    52.0 %   2511   28.1 %
   5 K2 0.95                   : 2523    5    5  7500    57.7 %   2466   21.1 %
   6 TheKing TrS Normal        : 2521    8    8  3500    51.5 %   2511   28.1 %
   7 RedQueen 1.1.98           : 2519    5    5  7500    57.2 %   2466   19.6 %
   8 Gandalf 7                 : 2508    6    6  7500    55.7 %   2466   24.6 %
   9 TheKing TrS Solid         : 2502    8    8  3500    48.8 %   2511   37.1 %
  10 TheKing Normal            : 2500    8    8  3500    48.5 %   2511   28.3 %
  11 TheKing SPCC Normal       : 2499    8    8  3500    48.4 %   2511   27.5 %
  12 TheKing SPCC Solid        : 2498    8    8  3500    48.3 %   2511   38.3 %
  13 TheKing Solid             : 2481    8    8  3500    45.9 %   2511   31.5 %
  14 TheKing TrS Active        : 2478    8    8  3500    45.4 %   2511   21.1 %
  15 TheKing Active            : 2474    8    8  3500    44.8 %   2511   20.8 %
  16 Ruffian Leiden            : 2470    6    6  7500    50.4 %   2466   21.7 %
  17 TheKing SPCC Active       : 2461    8    8  3500    43.1 %   2511   18.3 %
  18 Orion 0.6                 : 2456    6    6  7500    48.4 %   2466   27.0 %
  19 TheKing TrS AkAg          : 2449    9    9  3500    41.3 %   2511   13.1 %
  20 TheKing TrS Aggressive    : 2395    8    8  3500    34.2 %   2511    7.1 %
  21 TheKing Aggressive        : 2355    9    9  3500    29.4 %   2511    6.8 %
  22 Open Tal 1.2              : 2329    9    9  3500    26.4 %   2511    7.5 %
  
Open Tal 1.2 has a score of only 26.4% and is in last place in the list...
And now look at the EAS-calculation:

Code: Select all

                                 bad 
Rank  EAS-Score  sacs   shorts  draws    Engine/player 
------------------------------------------------------------- 
   1    688980  66.67%  85.23%  10.31%  Open Tal 1.2 
   2    459251  46.75%  81.19%  15.61%  TheKing Aggressive 
   3    449557  40.39%  82.28%  26.91%  TheKing TrS Aggressive 
   4    373875  34.92%  76.25%  26.64%  TheKing TrS AkAg 
   5    312244  31.50%  71.99%  16.19%  TheKing Active 
   6    284089  22.95%  66.72%  19.76%  TheKing TrS Active 
   7    251190  14.13%  67.70%  31.77%  TheKing SPCC Active 
   8    231634  17.23%  64.86%  16.67%  TheKing Researcher 
   9    225428  15.72%  64.97%  19.88%  TheKing Normal 
  10    224918  20.24%  66.00%  15.74%  TheKing TrS Normal 
  11    209408  14.84%  62.40%  17.19%  TheKing Razorback 
  12    202066  11.35%  63.55%  23.05%  Gandalf 7 
  13    201970  11.08%  67.80%  24.83%  K2 0.95 
  14    191813  08.07%  67.76%  27.23%  Delfi 5.4 
  15    188982  15.26%  58.77%  18.53%  TheKing Solid 
  16    185300  08.18%  70.63%  22.49%  RedQueen 1.1.98 
  17    182529  05.81%  67.10%  30.53%  Rebel 13 
  18    180321  08.57%  62.00%  22.97%  TheKing SPCC Normal 
  19    174379  07.98%  61.34%  32.68%  Orion 0.6 
  20    158870  04.75%  62.31%  34.48%  Ruffian Leiden 
  21    157007  12.65%  52.79%  17.71%  TheKing TrS Solid 
  22    155722  07.26%  56.43%  21.92%  TheKing SPCC Solid 
 
Q.E.D. Part 2
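
For readers who want a single number for how the two tables above relate, here is a small sketch that computes the Spearman rank correlation between the Elo ranking and the EAS ranking. This is not part of the EAS tool; it is only an illustration using the engine orderings copied from the two tables, with list position used as the rank (so the two engines listed at 2525 Elo are not treated as tied).

```python
# Engines in Elo-list order (rank 1 first), copied from the first table above
elo_order = [
    "Rebel 13", "Delfi 5.4", "TheKing Razorback", "TheKing Researcher",
    "K2 0.95", "TheKing TrS Normal", "RedQueen 1.1.98", "Gandalf 7",
    "TheKing TrS Solid", "TheKing Normal", "TheKing SPCC Normal",
    "TheKing SPCC Solid", "TheKing Solid", "TheKing TrS Active",
    "TheKing Active", "Ruffian Leiden", "TheKing SPCC Active", "Orion 0.6",
    "TheKing TrS AkAg", "TheKing TrS Aggressive", "TheKing Aggressive",
    "Open Tal 1.2",
]

# Engines in EAS-list order (rank 1 first), copied from the second table above
eas_order = [
    "Open Tal 1.2", "TheKing Aggressive", "TheKing TrS Aggressive",
    "TheKing TrS AkAg", "TheKing Active", "TheKing TrS Active",
    "TheKing SPCC Active", "TheKing Researcher", "TheKing Normal",
    "TheKing TrS Normal", "TheKing Razorback", "Gandalf 7", "K2 0.95",
    "Delfi 5.4", "TheKing Solid", "RedQueen 1.1.98", "Rebel 13",
    "TheKing SPCC Normal", "Orion 0.6", "Ruffian Leiden",
    "TheKing TrS Solid", "TheKing SPCC Solid",
]

def spearman_rho(order_a, order_b):
    """Spearman rank correlation for two rankings of the same items (no ties)."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both rankings must contain the same items")
    n = len(order_a)
    rank_b = {name: i for i, name in enumerate(order_b)}
    d_squared = sum((i - rank_b[name]) ** 2 for i, name in enumerate(order_a))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

rho = spearman_rho(elo_order, eas_order)
print(f"Spearman rho between Elo rank and EAS rank: {rho:.2f}")
```

A value near +1 would mean the EAS ranking simply mirrors the Elo ranking, and a value near -1 would mean it is the exact reverse. Note that a correlation within one pool only describes that pool; it does not by itself show how the EAS-Score behaves against a different set of opponents.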

The EAS-tool should measure the aggressiveness of an engine, no matter how strong or weak the engine is and no matter how strong or weak the opponents are (of course the opponents should not be so strong that the engine cannot win any games...). And that's exactly what the EAS-tool does! And that's why I am really proud of this tool, because such an aggressiveness ratinglist / scoring system never existed before in computer chess.
And beyond 3200 Elo the strength of an engine gets more and more insignificant, but an interesting, aggressive playing style gets more and more important. IMO.