SPCC: Testrun of Uralochka 3.38c finished
pohl4711 (Stefan Pohl, Berlin, Germany)
Ratinglist-testrun of Uralochka 3.38c finished.
https://www.sp-cc.de
Also take a look at the EAS-Ratinglist, the world's first engine-ratinglist that measures not the strength of engines but their style of play:
https://www.sp-cc.de/eas-ratinglist.htm
(Uralochka 3.38c is a complete disappointment in my EAS-Ratinglist: Uralochka 3.37c is on rank 1, Uralochka 3.38c only on rank 17 (Uralochka 3.38c has 70685 EAS-points, Uralochka 3.37c has 132484). So, with this new version 3.38c, Uralochka lost its spectacular, aggressive playing style. Very bad news!)
(Perhaps you have to clear your browser cache or reload the website.)
FreemanZlat (Ivan Maklyakov)
Re: SPCC: Testrun of Uralochka 3.38c finished
Hi Stefan!
Thanks for testing Uralochka!
It is strange that the aggressiveness of the engine has decreased. For the new version, I took the neural network from version 3.37c and retrained it on a dataset containing about 30% new data. I didn't expect this to affect the playing style so much.
I have versions 3.38a and 3.38b (between 3.37c and 3.38c) that were trained on a slightly different dataset. They are also based on 3.37c. If you wish, you can test the aggressiveness of these versions. It would be interesting to know at which stage the style changed so much.
pohl4711 (Stefan Pohl, Berlin, Germany)
Re: SPCC: Testrun of Uralochka 3.38c finished
FreemanZlat wrote: ↑Tue Sep 06, 2022 1:38 pm
I have versions 38a and 38b (between 37c and 38c) that were trained on a slightly different dataset. They are also based on 3.37c. If you wish, you can test the aggressiveness of these versions. It would be interesting to know at what stage the style changed so much.
I am very busy building my Chess 324 opening-sets... so I can't do this, sorry.
But you can easily use the EAS-tool yourself:
https://www.sp-cc.de/files/engines_aggr ... cs_tool.7z
You could play a bullet round-robin with all your Uralochka versions and then run the EAS-tool on the resulting PGN file. Make sure to play enough games: at least 2000-3000 games per engine are strongly recommended for a "stable" EAS result.
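Before running the EAS-tool, it can help to check how many games each engine actually has in the PGN file. A minimal sketch in Python, assuming standard PGN tag pairs (`[White "..."]`, `[Black "..."]`) and using the 2000-game lower bound mentioned above as the default threshold; the function names are hypothetical and not part of the EAS-tool:

```python
import re
from collections import Counter

# Matches PGN tag pairs such as [White "Uralochka3-38c"] at the start of a line.
# "\s+" after the tag name keeps [WhiteElo "..."] from matching.
TAG_RE = re.compile(r'^\[(White|Black)\s+"([^"]*)"\]', re.MULTILINE)


def games_per_engine(pgn_text: str) -> Counter:
    """Count the games each engine played, as White or as Black."""
    counts = Counter()
    for _side, name in TAG_RE.findall(pgn_text):
        counts[name] += 1
    return counts


def too_few_games(counts: Counter, minimum: int = 2000) -> list:
    """Return the (sorted) engines with fewer than `minimum` games."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

For example, `games_per_engine(open("test007.pgn", encoding="utf-8").read())` would give the per-engine totals, and `too_few_games(...)` lists the engines whose EAS result is likely to be less stable.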
FreemanZlat (Ivan Maklyakov)
Re: SPCC: Testrun of Uralochka 3.38c finished
Thanks!
I calculated EAS on test games of the latest versions of the engine (games between versions of Uralochka and some other engines):
*****************************************************************************
*** Evaluated file: test007.pgn ***
*****************************************************************************
Rank  EAS-Score  sacs    shorts  bad draws  Engine/player
-------------------------------------------------------------
 1    164113     15.26%  57.68%  21.74%     "Uralochka3-38c"
 2    151973     16.02%  57.87%  22.16%     "Uralochka3-37c"
 3    140593     12.60%  50.30%  25.87%     "arasan_23.3"
 4    132480     12.47%  50.63%  26.24%     "arasan_23.4"
 5    125390     12.06%  54.25%  23.99%     "Uralochka3-38a"
 6    110542     13.07%  51.92%  22.07%     "Uralochka3-38b"
 7    63027      07.54%  26.45%  24.36%     "igel-3_0_5"
 8    61878      04.44%  34.07%  28.46%     "Clover.3.1-avx2"
 9    52704      07.35%  25.16%  25.38%     "igel-3_1_0"
10    36903      01.82%  32.73%  33.82%     "seer_v2.5"
Result of bayeselo.exe for the same pgn:
Rank  Name             Elo   +   -  games  score  oppo.  draws
 1    seer_v2.5         50  25  25    436    56%     14    62%
 2    Uralochka3-38c    28   4   4  13812    57%    -11    65%
 3    Uralochka3-38b    10   6   6   7727    52%      2    73%
 4    arasan_23.4        4   7   7   5087    50%      3    63%
 5    Uralochka3-38a     3   3   3  35122    51%     -5    72%
 6    Uralochka3-37c    -5   2   2  45152    49%     -1    70%
 7    igel-3_1_0        -7   6   6   7086    48%      3    66%
 8    arasan_23.3      -18   6   6   7086    46%      3    59%
 9    igel-3_0_5       -19   6   6   7090    46%      3    67%
10    Clover.3.1-avx2  -47   7   7   6652    41%      4    54%
Is it correct to calculate the EAS for such a pgn, in which each engine has a different number of games?
I also noticed that in your rating list 3.37 and 3.38 have different Score and Av.Op. values (51.1% and 3466 versus 47.0% and 3533). Could this be one of the reasons for the strong reduction in aggressiveness? After all, it is easier to play aggressively against weak opponents than against strong ones.
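One way to put the two Score/Av.Op. pairs on a single scale is the performance rating of the simple logistic Elo model, Av.Op. + 400 * log10(s / (1 - s)). A back-of-the-envelope sketch in Python, assuming that model; the ratinglist itself is computed from all games jointly, so its Elo values need not match these numbers exactly:

```python
import math


def performance(score: float, avg_opp: float) -> float:
    """Performance rating under the logistic Elo model (assumption):
    avg_opp + 400 * log10(score / (1 - score)), score strictly in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return avg_opp + 400.0 * math.log10(score / (1.0 - score))


# Score and Av.Op. values as quoted for the SPCC ratinglist
for name, score, avg_opp in [("Uralochka 3.37c", 0.511, 3466),
                             ("Uralochka 3.38c", 0.470, 3533)]:
    print(f"{name}: {performance(score, avg_opp):.0f}")
```

This prints about 3474 for 3.37c and 3512 for 3.38c: under this model, the lower score of 3.38c was made against opponents about 67 Elo stronger on average, so the two versions were tested against quite different opposition.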
pohl4711 (Stefan Pohl, Berlin, Germany)
Re: SPCC: Testrun of Uralochka 3.38c finished
FreemanZlat wrote: ↑Tue Sep 06, 2022 6:16 pm
Is it correct to calculate the EAS for such a pgn, in which each engine has a different number of games?
I also noticed that in your rating list 3.37 and 3.38 have different Score and Av.Op. values (51.1% and 3466 versus 47.0% and 3533). Could this be one of the reasons for the strong reduction in aggressiveness? After all, it is easier to play aggressively against weak opponents than against strong ones.
Of course, neither is a perfect solution, but in ratinglist testing there is no other way to do it. The perfect solution would be a huge round-robin, with every engine playing against all opponents, with the same number of games (and openings)...
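The "perfect" setup described here, every engine against every opponent with the same number of games and the same openings, can be sketched as a schedule generator. This Python sketch assumes each opening is played twice per pairing with colors reversed (so 500 openings give 1000 games per head-to-head match); the engine list and the opening identifiers are placeholders, not the actual HERT file format:

```python
from collections import Counter
from itertools import combinations


def round_robin_schedule(engines, openings):
    """Return (white, black, opening) tuples: every pair of engines plays
    every opening twice, once with each color."""
    schedule = []
    for a, b in combinations(engines, 2):
        for opening in openings:
            schedule.append((a, b, opening))  # a has White
            schedule.append((b, a, opening))  # same opening, colors reversed
    return schedule


def count_games(schedule):
    """Number of scheduled games per engine (White and Black combined)."""
    counts = Counter()
    for white, black, _opening in schedule:
        counts[white] += 1
        counts[black] += 1
    return counts


engines = ["Uralochka3-37c", "Uralochka3-38a", "Uralochka3-38b", "Uralochka3-38c"]
openings = [f"opening_{i:03d}" for i in range(1, 501)]  # placeholders for 500 openings
schedule = round_robin_schedule(engines, openings)
print(len(schedule))          # 6000 games in total
print(count_games(schedule))  # 3000 games for each engine
```

Because every engine meets every opponent with the same openings and both colors, each engine ends up with the same number of games, which removes the unequal-games issue raised in the question.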
But with a huge number of games, and always using the same openings, the EAS-results are good IMO. Using the same openings is a very important point! In my ratinglist, all 1000-game head-to-head tests are always played with the 500 HERT-openings.
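Why a large number of games matters can be illustrated with the sampling error of a percentage. The sketch below assumes that a statistic such as the "sacs" rate behaves like a simple proportion over n independent games; the EAS-tool's actual formula is not described in this thread, so this only shows how the noise shrinks as n grows:

```python
import math


def margin_95(p: float, n: int) -> float:
    """Half-width of a normal-approximation 95% confidence interval
    for a proportion p observed over n independent games."""
    return 1.96 * math.sqrt(p * (1.0 - p) / n)


# 15% is an illustrative rate, similar to the "sacs" values in the tables
for n in (436, 2000, 3000, 45152):
    print(f"{n:6d} games: 15.00% +/- {100 * margin_95(0.15, n):.2f} points")
```

At 436 games (seer_v2.5's count in the bayeselo table) the margin is about 3.35 percentage points; at 45152 games (Uralochka3-37c) it is about 0.33, so rankings of engines with few games can shift noticeably from noise alone.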
Take a look at the results of my tests of the old "TheKing" engine, in which the weak but very aggressive Open Tal engine played, too. Here you can see that Open Tal and the aggressive TheKing settings are relatively weak (so, relative to their own strength, they face stronger opponents than the other settings do), but in the EAS-Ratinglist they are in the lead, even though their overall score in the ratinglist is only around 30%. And all head-to-head tests were played with exactly the same openings, too!
   Program                   Elo   +   -  Games   Score  Av.Op.   Draws
 1 Rebel 13               : 2567   5   5   7500  63.5 %    2466  22.4 %
 2 Delfi 5.4              : 2533   6   6   7500  59.1 %    2466  23.6 %
 3 TheKing Razorback      : 2525   8   8   3500  52.1 %    2511  29.1 %
 4 TheKing Researcher     : 2525   8   8   3500  52.0 %    2511  28.1 %
 5 K2 0.95                : 2523   5   5   7500  57.7 %    2466  21.1 %
 6 TheKing TrS Normal     : 2521   8   8   3500  51.5 %    2511  28.1 %
 7 RedQueen 1.1.98        : 2519   5   5   7500  57.2 %    2466  19.6 %
 8 Gandalf 7              : 2508   6   6   7500  55.7 %    2466  24.6 %
 9 TheKing TrS Solid      : 2502   8   8   3500  48.8 %    2511  37.1 %
10 TheKing Normal         : 2500   8   8   3500  48.5 %    2511  28.3 %
11 TheKing SPCC Normal    : 2499   8   8   3500  48.4 %    2511  27.5 %
12 TheKing SPCC Solid     : 2498   8   8   3500  48.3 %    2511  38.3 %
13 TheKing Solid          : 2481   8   8   3500  45.9 %    2511  31.5 %
14 TheKing TrS Active     : 2478   8   8   3500  45.4 %    2511  21.1 %
15 TheKing Active         : 2474   8   8   3500  44.8 %    2511  20.8 %
16 Ruffian Leiden         : 2470   6   6   7500  50.4 %    2466  21.7 %
17 TheKing SPCC Active    : 2461   8   8   3500  43.1 %    2511  18.3 %
18 Orion 0.6              : 2456   6   6   7500  48.4 %    2466  27.0 %
19 TheKing TrS AkAg       : 2449   9   9   3500  41.3 %    2511  13.1 %
20 TheKing TrS Aggressive : 2395   8   8   3500  34.2 %    2511   7.1 %
21 TheKing Aggressive     : 2355   9   9   3500  29.4 %    2511   6.8 %
22 Open Tal 1.2           : 2329   9   9   3500  26.4 %    2511   7.5 %
Rank  EAS-Score  sacs    shorts  bad draws  Engine/player
-------------------------------------------------------------
 1    688980     66.67%  85.23%  10.31%     Open Tal 1.2
 2    459251     46.75%  81.19%  15.61%     TheKing Aggressive
 3    449557     40.39%  82.28%  26.91%     TheKing TrS Aggressive
 4    373875     34.92%  76.25%  26.64%     TheKing TrS AkAg
 5    312244     31.50%  71.99%  16.19%     TheKing Active
 6    284089     22.95%  66.72%  19.76%     TheKing TrS Active
 7    251190     14.13%  67.70%  31.77%     TheKing SPCC Active
 8    231634     17.23%  64.86%  16.67%     TheKing Researcher
 9    225428     15.72%  64.97%  19.88%     TheKing Normal
10    224918     20.24%  66.00%  15.74%     TheKing TrS Normal
11    209408     14.84%  62.40%  17.19%     TheKing Razorback
12    202066     11.35%  63.55%  23.05%     Gandalf 7
13    201970     11.08%  67.80%  24.83%     K2 0.95
14    191813     08.07%  67.76%  27.23%     Delfi 5.4
15    188982     15.26%  58.77%  18.53%     TheKing Solid
16    185300     08.18%  70.63%  22.49%     RedQueen 1.1.98
17    182529     05.81%  67.10%  30.53%     Rebel 13
18    180321     08.57%  62.00%  22.97%     TheKing SPCC Normal
19    174379     07.98%  61.34%  32.68%     Orion 0.6
20    158870     04.75%  62.31%  34.48%     Ruffian Leiden
21    157007     12.65%  52.79%  17.71%     TheKing TrS Solid
22    155722     07.26%  56.43%  21.92%     TheKing SPCC Solid