SPCC: Testrun of Uralochka 3.38c finished

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

pohl4711
Posts: 2900
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

SPCC: Testrun of Uralochka 3.38c finished

Post by pohl4711 »

Ratinglist-testrun of Uralochka 3.38c finished.


https://www.sp-cc.de

Also take a look at the EAS-Ratinglist, the world's first engine ratinglist that measures not the strength of engines but their style of play:
https://www.sp-cc.de/eas-ratinglist.htm

(Uralochka 3.38c is a complete disappointment in my EAS-Ratinglist: Uralochka 3.37c is ranked 1st, but Uralochka 3.38c only 17th (Uralochka 3.38c has 70685 EAS points, Uralochka 3.37c has 132484). So, with this new version 3.38c, Uralochka lost its spectacular, aggressive playing style - very bad news!)

(Perhaps you have to clear your browser cache or reload the website.)
FreemanZlat
Posts: 17
Joined: Mon May 30, 2022 10:50 am
Full name: Ivan Maklyakov

Re: SPCC: Testrun of Uralochka 3.38c finished

Post by FreemanZlat »

Hi Stefan!
Thanks for testing Uralochka!

It is strange that the aggressiveness of the engine has decreased. For the new version, I took a neural network from version 3.37c and retrained it on a dataset that contains about 30% of new data. I didn't think it would affect the playstyle so much.

I have versions 38a and 38b (between 37c and 38c) that were trained on a slightly different dataset. They are also based on 3.37c. If you wish, you can test the aggressiveness of these versions. It would be interesting to know at what stage the style changed so much.
pohl4711
Posts: 2900
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SPCC: Testrun of Uralochka 3.38c finished

Post by pohl4711 »

FreemanZlat wrote: Tue Sep 06, 2022 1:38 pm Hi Stefan!
Thanks for testing Uralochka!

It is strange that the aggressiveness of the engine has decreased. For the new version, I took a neural network from version 3.37c and retrained it on a dataset that contains about 30% of new data. I didn't think it would affect the playstyle so much.

I have versions 38a and 38b (between 37c and 38c) that were trained on a slightly different dataset. They are also based on 3.37c. If you wish, you can test the aggressiveness of these versions. It would be interesting to know at what stage the style changed so much.
I am very busy building my Chess 324 opening sets, so I can't do this, sorry.

But you can easily use the EAS-tool by yourself:
https://www.sp-cc.de/files/engines_aggr ... cs_tool.7z

You could play a bullet round-robin with all your Uralochka versions and then run the EAS-tool on the resulting PGN file. Make sure to play enough games (at least 2000-3000 games per engine are strongly recommended for a "stable" EAS-scoring result).
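A round-robin like that could be set up with cutechess-cli, which is widely used for this kind of testing. A hypothetical sketch (engine paths, engine names, the time control, and the opening-book filename are placeholders, not the actual setup):

```shell
# Hypothetical cutechess-cli round-robin: adjust paths, names, and tc to taste.
cutechess-cli \
  -tournament round-robin \
  -engine cmd=./uralochka3.37c name=Uralochka3-37c \
  -engine cmd=./uralochka3.38a name=Uralochka3-38a \
  -engine cmd=./uralochka3.38b name=Uralochka3-38b \
  -engine cmd=./uralochka3.38c name=Uralochka3-38c \
  -each proto=uci tc=60+1 \
  -openings file=HERT_500.pgn format=pgn order=sequential \
  -games 2 -rounds 500 -repeat \
  -concurrency 4 \
  -pgnout uralochka_rr.pgn
```

With four engines, 2 games per encounter and 500 rounds, each engine should end up with about 3000 games in uralochka_rr.pgn, within the recommended range; `-repeat` makes both engines play each opening from both sides, which matches the same-openings requirement.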
FreemanZlat
Posts: 17
Joined: Mon May 30, 2022 10:50 am
Full name: Ivan Maklyakov

Re: SPCC: Testrun of Uralochka 3.38c finished

Post by FreemanZlat »

Thanks!

I calculated EAS on test games of the latest versions of the engine (games between versions of Uralochka and some other engines).

Code: Select all

***************************************************************************** 
*** Evaluated file: test007.pgn *** 
***************************************************************************** 
                                 bad 
Rank  EAS-Score  sacs   shorts  draws    Engine/player 
------------------------------------------------------------- 
   1    164113  15.26%  57.68%  21.74%  "Uralochka3-38c" 
   2    151973  16.02%  57.87%  22.16%  "Uralochka3-37c" 
   3    140593  12.60%  50.30%  25.87%  "arasan_23.3" 
   4    132480  12.47%  50.63%  26.24%  "arasan_23.4" 
   5    125390  12.06%  54.25%  23.99%  "Uralochka3-38a" 
   6    110542  13.07%  51.92%  22.07%  "Uralochka3-38b" 
   7     63027  07.54%  26.45%  24.36%  "igel-3_0_5" 
   8     61878  04.44%  34.07%  28.46%  "Clover.3.1-avx2" 
   9     52704  07.35%  25.16%  25.38%  "igel-3_1_0" 
  10     36903  01.82%  32.73%  33.82%  "seer_v2.5" 
Result of bayeselo.exe for the same pgn:

Code: Select all

Rank Name              Elo    +    - games score oppo. draws 
   1 seer_v2.5          50   25   25   436   56%    14   62% 
   2 Uralochka3-38c     28    4    4 13812   57%   -11   65% 
   3 Uralochka3-38b     10    6    6  7727   52%     2   73% 
   4 arasan_23.4         4    7    7  5087   50%     3   63% 
   5 Uralochka3-38a      3    3    3 35122   51%    -5   72% 
   6 Uralochka3-37c     -5    2    2 45152   49%    -1   70% 
   7 igel-3_1_0         -7    6    6  7086   48%     3   66% 
   8 arasan_23.3       -18    6    6  7086   46%     3   59% 
   9 igel-3_0_5        -19    6    6  7090   46%     3   67% 
  10 Clover.3.1-avx2   -47    7    7  6652   41%     4   54% 
Is it correct to calculate the EAS for such a PGN, in which each engine has a different number of games?
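One quick sanity check before running the EAS-tool on such a mixed PGN is to count how many games each engine actually has in the file. A minimal stdlib-Python sketch (the header parsing is simplified and assumes standard PGN White/Black tags; the inline PGN is a made-up two-game example):

```python
import re
from collections import Counter

def games_per_engine(pgn_text: str) -> Counter:
    """Count games per engine from the PGN White/Black header tags."""
    counts = Counter()
    for color in ("White", "Black"):
        for name in re.findall(r'\[%s "([^"]+)"\]' % color, pgn_text):
            counts[name] += 1
    return counts

# Tiny inline example with two games:
pgn = '''[White "Uralochka3-38c"]
[Black "arasan_23.4"]
[Result "1-0"]

1. e4 e5 1-0

[White "arasan_23.4"]
[Black "Uralochka3-38c"]
[Result "1/2-1/2"]

1. d4 d5 1/2-1/2
'''
print(games_per_engine(pgn))  # each engine appears in 2 games
```

If the per-engine counts differ a lot, the EAS percentages for the engines with few games will be much noisier than for the others.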

I also noticed that in your ratinglist, 3.37 and 3.38 have different Score and Av.Op. values (51.1% and 3466 versus 47.0% and 3533). Could this be one of the reasons for the strong reduction in aggressiveness? After all, it is easier to play aggressively against weak opponents than against strong ones.
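The standard logistic-Elo performance formula makes that comparison concrete. A small Python sketch, plugging in the Score/Av.Op. numbers quoted above (the formula is the usual one, not anything specific to this ratinglist):

```python
import math

def performance_rating(avg_opp: float, score: float) -> float:
    """Logistic-Elo performance: average opponent rating plus
    400 * log10(s / (1 - s)) for a score fraction s."""
    return avg_opp + 400 * math.log10(score / (1 - score))

# Numbers quoted above: 3.37 scored 51.1% vs Av.Op. 3466,
# 3.38 scored 47.0% vs Av.Op. 3533.
print(round(performance_rating(3466, 0.511)))  # 3474
print(round(performance_rating(3533, 0.470)))  # 3512
```

So 3.38's lower raw score reflects a stronger opposing field rather than lower strength, which is exactly why a lower score percentage alone cannot be compared across the two runs.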
pohl4711
Posts: 2900
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SPCC: Testrun of Uralochka 3.38c finished

Post by pohl4711 »

FreemanZlat wrote: Tue Sep 06, 2022 6:16 pm

Is it correct to calculate the EAS for such a PGN, in which each engine has a different number of games?

I also noticed that in your rating list 3.37 and 3.38 have different Score and Av.Op values. (51.1% and 3466 versus 47.0% and 3533). Could this be one of the reasons for the strong reduction in aggressiveness? After all, it is easier to play aggressively against weak opponents than against strong ones.

Of course, neither is a perfect solution, but in ratinglist testing there is no other way to do it. The perfect solution would be a huge round-robin, with all engines playing against all opponents, with the same number of games (and the same openings)...

But with a huge number of games (and always using the same openings - important!), the EAS results are good IMO. Using the same openings is a very important point! In my ratinglist, all 1000-game head-to-head tests are always played with the 500 HERT openings.
Take a look at the results of my testing of the old "TheKing" engine, where the weak but very aggressive OpenTal engine played too. There you can see that OpenTal and the aggressive King settings are relatively weak (so, because of their own weakness, they play against stronger opponents than the other settings do), yet they lead the EAS-Ratinglist, even though their overall score in the ratinglist is only around 30% (and all of those head-to-head tests were played with exactly the same openings, too!).

Code: Select all

     Program                     Elo    +    -   Games   Score   Av.Op.  Draws

   1 Rebel 13                  : 2567    5    5  7500    63.5 %   2466   22.4 %
   2 Delfi 5.4                 : 2533    6    6  7500    59.1 %   2466   23.6 %
   3 TheKing Razorback         : 2525    8    8  3500    52.1 %   2511   29.1 %
   4 TheKing Researcher        : 2525    8    8  3500    52.0 %   2511   28.1 %
   5 K2 0.95                   : 2523    5    5  7500    57.7 %   2466   21.1 %
   6 TheKing TrS Normal        : 2521    8    8  3500    51.5 %   2511   28.1 %
   7 RedQueen 1.1.98           : 2519    5    5  7500    57.2 %   2466   19.6 %
   8 Gandalf 7                 : 2508    6    6  7500    55.7 %   2466   24.6 %
   9 TheKing TrS Solid         : 2502    8    8  3500    48.8 %   2511   37.1 %
  10 TheKing Normal            : 2500    8    8  3500    48.5 %   2511   28.3 %
  11 TheKing SPCC Normal       : 2499    8    8  3500    48.4 %   2511   27.5 %
  12 TheKing SPCC Solid        : 2498    8    8  3500    48.3 %   2511   38.3 %
  13 TheKing Solid             : 2481    8    8  3500    45.9 %   2511   31.5 %
  14 TheKing TrS Active        : 2478    8    8  3500    45.4 %   2511   21.1 %
  15 TheKing Active            : 2474    8    8  3500    44.8 %   2511   20.8 %
  16 Ruffian Leiden            : 2470    6    6  7500    50.4 %   2466   21.7 %
  17 TheKing SPCC Active       : 2461    8    8  3500    43.1 %   2511   18.3 %
  18 Orion 0.6                 : 2456    6    6  7500    48.4 %   2466   27.0 %
  19 TheKing TrS AkAg          : 2449    9    9  3500    41.3 %   2511   13.1 %
  20 TheKing TrS Aggressive    : 2395    8    8  3500    34.2 %   2511    7.1 %
  21 TheKing Aggressive        : 2355    9    9  3500    29.4 %   2511    6.8 %
  22 Open Tal 1.2              : 2329    9    9  3500    26.4 %   2511    7.5 %

Code: Select all

                                 bad 
Rank  EAS-Score  sacs   shorts  draws    Engine/player 
------------------------------------------------------------- 
   1    688980  66.67%  85.23%  10.31%  Open Tal 1.2 
   2    459251  46.75%  81.19%  15.61%  TheKing Aggressive 
   3    449557  40.39%  82.28%  26.91%  TheKing TrS Aggressive 
   4    373875  34.92%  76.25%  26.64%  TheKing TrS AkAg 
   5    312244  31.50%  71.99%  16.19%  TheKing Active 
   6    284089  22.95%  66.72%  19.76%  TheKing TrS Active 
   7    251190  14.13%  67.70%  31.77%  TheKing SPCC Active 
   8    231634  17.23%  64.86%  16.67%  TheKing Researcher 
   9    225428  15.72%  64.97%  19.88%  TheKing Normal 
  10    224918  20.24%  66.00%  15.74%  TheKing TrS Normal 
  11    209408  14.84%  62.40%  17.19%  TheKing Razorback 
  12    202066  11.35%  63.55%  23.05%  Gandalf 7 
  13    201970  11.08%  67.80%  24.83%  K2 0.95 
  14    191813  08.07%  67.76%  27.23%  Delfi 5.4 
  15    188982  15.26%  58.77%  18.53%  TheKing Solid 
  16    185300  08.18%  70.63%  22.49%  RedQueen 1.1.98 
  17    182529  05.81%  67.10%  30.53%  Rebel 13 
  18    180321  08.57%  62.00%  22.97%  TheKing SPCC Normal 
  19    174379  07.98%  61.34%  32.68%  Orion 0.6 
  20    158870  04.75%  62.31%  34.48%  Ruffian Leiden 
  21    157007  12.65%  52.79%  17.71%  TheKing TrS Solid 
  22    155722  07.26%  56.43%  21.92%  TheKing SPCC Solid