STS re-re-re-re-re-visited

Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: STS re-re-re-re-re-visited

Post by Dann Corbit »

The average age of an STS analysis file is about ten years.
So we can figure on a 2^10 increase in hardware strength and a 2^10 increase in software strength.
In short, the current hardware/software combinations are some constant times one million (2^10 x 2^10 = 2^20) times stronger than the stuff used when the test set was analyzed.
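The arithmetic behind the "one million" figure can be checked in a few lines. This is only a sketch of the estimate above; it assumes one doubling per year for hardware and for software over ten years, which is the reading implied by the 2^10 factors, not a measured value.

```python
# Assumption: strength doubles once per year, separately for hardware and software.
years = 10

hardware_factor = 2 ** years          # 1024
software_factor = 2 ** years          # 1024
combined_factor = hardware_factor * software_factor

print(hardware_factor, software_factor, combined_factor)  # 1024 1024 1048576
```

The combined factor is 2^20 = 1,048,576, i.e. roughly one million.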
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
peter
Posts: 3385
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: STS re-re-re-re-re-visited

Post by peter »

Ferdy wrote: Mon Oct 24, 2022 11:11 am
Rebel wrote: Mon Oct 24, 2022 6:22 am
Ferdy wrote: Mon Oct 24, 2022 4:43 am There are STS positions analyzed by sf15 at 60s with multipv 10 on a single core using an i7-2600K CPU (3.4 GHz).
Did a quick run with the 6 pre-installed engines.

Code:

    EPD  : epd\sts-sf15-ferdy.epd
    Time : 100ms
                                                Max   Total   Time   Hash          
    Engine           Score   Found  Pos   ELO  Score   Rate    ms     Mb  Cpu      
 1  Berserk 9        13452   1025  1500  4041  14973  89.8%    100    64    1
 2  Koivisto 8.0     13436   1037  1500  4036  14973  89.7%    100    64    1
 3  Rebel 15.1       13237   1001  1500  3978  14973  88.4%    100    64    1
 4  Seer 2.5.0       12985    957  1500  3901  14973  86.7%    100    64    1
 5  Toga 4           11575    753  1500  3478  14973  77.3%    100    64    1
 6  Fruit 2.1        11017    678  1500  3312  14973  73.6%    100    64    1

                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca
My move bonus point calculation is a bit different from yours. The first move is always 10; for the remaining moves the score difference with the first move is divided by 5. Moves with a score difference > 50 are not listed.
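A minimal sketch of one plausible reading of this rule follows. It assumes that the divided score difference is subtracted from the 10 points of the first move, that scores are in centipawns, and that the division is truncated to an integer; none of these details is stated in the post, and the function name and parameters are hypothetical.

```python
def move_points(best_cp, move_cp, max_points=10, divisor=5, cutoff=50):
    """Bonus points for a move, given the engine scores in centipawns.

    Assumed reading of the rule: points = max_points - diff // divisor,
    where diff is the score gap to the first (best) move.
    Returns None for moves that would not be listed at all.
    """
    diff = max(0, best_cp - move_cp)   # the best move itself has diff 0
    if diff > cutoff:
        return None                    # gap larger than the cutoff: not listed
    return max_points - diff // divisor


# Hypothetical scores, for illustration only.
print(move_points(100, 100))  # 10
print(move_points(100, 75))   # 5
print(move_points(100, 40))   # None
```

Under this reading a move exactly 50 centipawns behind the best one is still listed, with 0 points.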

For @Srdja, the Ferdy EPD also contains long moves, g1f3 instead of Nf3.
I have added STS1-STS15_LAN_v5.epd. Its maximum is 100 points instead of 10. The weakness of the max 10-point system that I see in STS1-STS15_LAN_v4.epd is that the range is too narrow and it has difficulty differentiating engines that are close in strength.

I describe the interpolation that I use in the repository.
Very well, Ferdy, but one minute of single-threaded SF with MultiPV=10 still seems too little hardware time to me. The same with 30 threads would already be better, but I'd say there should also be some forward-backward analysis of the lines of interest, to get the hash filled before the definitive evaluation.
I started a thread in the CSS forum to ask for co-workers to agree on a more reliable scheme of evaluation:

https://forum.computerschach.de/cgi-bin ... #pid158725

If I succeed, I'll come back to this thread with the results too.

Edit:
I ran yesterday's version of your suite here with SF 231022 and with SF 110922, which I had already run with sts-SF15.epd too. Hardware and TC were 30 threads of a 16x3.5GHz CPU and 100 msec per position. Here are the results next to each other; of course the points of the old and the new .epd have to be seen separately, the comparison is only about the differences in relation:

Code:

    EPD  : epd\FM.epd
    Time : 100ms
                                                Max   Total   Time   Hash          
    Engine           Score   Found  Pos   ELO  Score   Rate    ms     Mb  Cpu      
 1  Stockfish 110922  14291   1179  1500  4293  14973  95.4%    100     8   30
 2  Stockfish 231022  14267   1190  1500  4288  14973  95.3%    100     8   30

 6  Stockfish 110922  13873   1209  1500  4162  15000  92.5%    100     8   30
 7  Stockfish 231022  13871   1212  1500  4162  15000  92.5%    100     8   30


                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca
First two lines are with your .epd from yesterday from the link you gave, second two are with sts-SF15.epd.

The weak point remains about the same: the percentage scores don't discriminate much better either, as is to be expected with the same positions and the same scheme of evaluation. Regards
Last edited by peter on Tue Oct 25, 2022 8:03 am, edited 8 times in total.
Peter.
Rebel
Posts: 7299
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: STS re-re-re-re-re-visited

Post by Rebel »

BY THE WAY

Small patch.

http://rebel13.nl/dl/mrl.zip

Replace mrl.exe in the SF folder.

It fixes the text file output.

Code:

    EPD  : epd\sts-sf15.epd
    Time : 100ms
                                                Max   Total   Time   Hash          
    Engine           Score   Found  Pos   ELO  Score   Rate    ms     Mb  Cpu      
 1  Koivisto 8.0     12248   1031  1500  3676  15000  81.7%    100    64    1
Solving time out -> Elo rating in.
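A side note on the new ELO column: in the tables posted in this thread, the ELO value equals 45 times the Total Rate percentage, truncated to an integer (for example 89.8% gives 4041 and 89.7% gives 4036). The sketch below only encodes that observation. The factor 45 and the truncation are inferred from the posted numbers and are an assumption, not a documented formula of mrl; the function name is hypothetical.

```python
from decimal import Decimal


def elo_from_rate(rate_percent):
    """Map a Total Rate percentage (given as text, e.g. "89.8") to the ELO column.

    Assumption inferred from the posted tables: ELO = 45 * rate, truncated.
    Decimal is used so that values like 89.7 * 45 = 4036.5 truncate exactly.
    """
    return int(Decimal(rate_percent) * 45)


print(elo_from_rate("89.8"))  # 4041
print(elo_from_rate("89.7"))  # 4036
```

If this reading is right, the ELO column is a linear rescaling of the score rate and carries no more information than the percentage itself.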
90% of coding is debugging, the other 10% is writing bugs.
peter
Posts: 3385
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: STS re-re-re-re-re-visited

Post by peter »

peter wrote: Tue Oct 25, 2022 7:33 am Point of weakness, that percentual performances aren't of much better discrimination neither, remains about the same yet too, as to be expected with same positions and same scheme of evaluation, regards
Edit time ran out. I guess I should have taken later versions of yours, Ferdy, because of the new max. points sum. Here's Lan4 again together with today's Lan6. Pity, there are only 1188 positions instead of 1500 now, which I didn't see until the runs were finished, so I gave Lan5 a try too, including the full 1500. The first two lines (newly added) are Lan5, the second two are Lan6, lines 5+6 are Lan4, and lines 10+11 are sts-SF15.epd:

Code:

    EPD  : epd\FM.epd
    Time : 100ms
                                                Max   Total   Time   Hash          
    Engine           Score   Found  Pos   ELO  Score   Rate    ms     Mb  Cpu      
 1  Stockfish 110922  143696   1187  1500  4320  149703  96.0%    100     8   30
 2  Stockfish 231022  143474   1186  1500  4311  149703  95.8%    100     8   30

 3  Stockfish 231022  115272   1033  1188  4365  118800  97.0%    100     8   30
 4  Stockfish 110922  114476   1023  1188  4338  118800  96.4%    100     8   30

 5  Stockfish 110922  14291    1179  1500  4293  14973   95.4%    100     8   30
 6  Stockfish 231022  14267    1190  1500  4288  14973   95.3%    100     8   30

10  Stockfish 110922  13873    1209  1500  4162  15000   92.5%    100     8   30
11  Stockfish 231022  13871    1212  1500  4162  15000   92.5%    100     8   30



                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca
Peter.
chessica
Posts: 915
Joined: Thu Aug 11, 2022 11:30 pm
Full name: Esmeralda Pinto

Re: STS re-re-re-re-re-visited

Post by chessica »

Congratulations Ferdinand, and thank you for the very good tool. :)
A few engines I have tested twice to check if the results are reliable.



EPD : epd\sts-sf15.epd
Time : 10+100+1000ms
                                                      Solving     Max    Total  Time  Hash
    Engine            Score  Used Time   Found  Pos   Time        Score  Rate   ms    Mb    Cpu  CCRL
 1  ShashChess25      13322  00:04:10.0  1133   1500  00:00:00.0  15000  88.8%  1000  64    1    0
 2  Stockfish 14.1    13135  00:04:10.0  1117   1500  00:00:00.0  15000  87.6%  1000  64    1    0
 3  berserk-10        13009  00:04:10.0  1098   1500  00:00:00.0  15000  86.7%  1000  64    1    1
 4  Koivisto 8.10     12835  00:04:10.0  1082   1500  00:00:00.0  15000  85.6%  1000  64    1    0
 5  Fire 8.11         12581  00:04:10.0  1072   1500  00:00:00.0  15000  83.9%  1000  64    1    0
 6  komodo-13.02      12243  00:04:10.0  1029   1500  00:00:00.0  15000  81.6%  1000  64    1    0
 7  Houdini 15a       11999  00:04:10.0  1005   1500  00:00:00.0  15000  80.0%  1000  64    1    0
 8  velvet-v4.1.0     11992  00:04:10.0  1002   1500  00:00:00.0  15000  79.9%  1000  64    1    0
 9  ShashChess25      11927  00:04:10.0  999    1500  00:00:00.0  15000  79.5%  100   64    1    0
10  Stockfish 14.1    11601  00:04:10.0  944    1500  00:00:00.0  15000  77.3%  100   64    1    0
11  berserk-10        11510  00:04:10.0  955    1500  00:00:00.0  15000  76.7%  100   64    1    1
12  Koivisto 8.10     11282  00:04:10.0  928    1500  00:00:00.0  15000  75.2%  100   64    1    0
13  Fire 8.11         11062  00:04:10.0  921    1500  00:00:00.0  15000  73.7%  100   64    1    0
14  arasanx-64        11033  00:04:10.0  920    1500  00:00:00.0  15000  73.6%  1000  64    1    1
15  sting-sf-28       10953  00:04:10.0  903    1500  00:00:00.0  15000  73.0%  1000  64    1    0
16  komodo-13.02      10718  00:04:10.0  880    1500  00:00:00.0  15000  71.5%  100   64    1    0
17  Toga 4            10473  00:04:10.0  850    1500  00:00:00.0  15000  69.8%  1000  64    1    1
18  velvet-v4.1.0     10407  00:04:10.0  830    1500  00:00:00.0  15000  69.4%  100   64    1    0
19  Houdini 15a       10346  00:04:10.0  848    1500  00:00:00.0  15000  69.0%  100   64    1    0
20  Colossus2022a     9933   00:04:10.0  799    1500  00:00:00.0  15000  66.2%  1000  64    1    0
21  ShashChess25      9829   00:04:10.0  783    1500  00:00:00.0  15000  65.5%  10    64    1    0
22  Fire 8.11         9726   00:04:10.0  763    1500  00:00:00.0  15000  64.8%  10    64    1    1
23  sting-sf-28       9619   00:04:10.0  766    1500  00:00:00.0  15000  64.1%  100   64    1    0
24  arasanx-64        9565   00:04:10.0  769    1500  00:00:00.0  15000  63.8%  100   64    1    1
25  Stockfish 14.1    9262   00:04:10.0  709    1500  00:00:00.0  15000  61.7%  10    64    1    0
26  Stockfish 14.1    9248   00:04:10.0  706    1500  00:00:00.0  15000  61.7%  10    64    1    0
27  Toga 4            9017   00:04:10.0  701    1500  00:00:00.0  15000  60.1%  100   64    1    1
28  Colossus2022a     8692   00:04:10.0  671    1500  00:00:00.0  15000  57.9%  100   64    1    0
29  Koivisto 8.10     8563   00:04:10.0  645    1500  00:00:00.0  15000  57.1%  10    64    1    0
30  komodo-13.02      8539   00:04:10.0  650    1500  00:00:00.0  15000  56.9%  10    64    1    0
31  sting-sf-28       8346   00:04:10.0  647    1500  00:00:00.0  15000  55.6%  10    64    1    0
32  velvet-v4.1.0     8330   00:04:10.0  643    1500  00:00:00.0  15000  55.5%  10    64    1    0
33  berserk-10        8290   00:04:10.0  640    1500  00:00:00.0  15000  55.3%  10    64    1    1
34  berserk-10        8134   00:04:10.0  639    1500  00:00:00.0  15000  54.2%  10    64    1    1
35  Toga 4            7598   00:04:10.0  587    1500  00:00:00.0  15000  50.7%  10    64    1    0
36  Toga 4            7588   00:04:10.0  587    1500  00:00:00.0  15000  50.6%  10    64    1    64
37  Colossus2022a     7456   00:04:10.0  569    1500  00:00:00.0  15000  49.7%  10    64    1    0
38  Colossus2022a     7427   00:04:10.0  567    1500  00:00:00.0  15000  49.5%  10    64    1    0
39  Houdini 15a       7217   00:04:10.0  544    1500  00:00:00.0  15000  48.1%  10    64    1    0
40  arasanx-64        7170   00:04:10.0  539    1500  00:00:00.0  15000  47.8%  10    64    1    1
41  arasanx-64        7098   00:04:10.0  529    1500  00:00:00.0  15000  47.3%  10    64    1    1
42  Slow64            4548   00:04:10.0  335    1500  00:00:00.0  15000  30.3%  1000  64    1    64
43  Slow64            4381   00:04:10.0  316    1500  00:00:00.0  15000  29.2%  100   64    1    64
44  Slow64            4128   00:04:10.0  300    1500  00:00:00.0  15000  27.5%  10    64    1    1

Created with MEA
by
Ferdinand
Mosca
chessica
Posts: 915
Joined: Thu Aug 11, 2022 11:30 pm
Full name: Esmeralda Pinto

Re: STS re-re-re-re-re-visited

Post by chessica »

ccc
chessica
Posts: 915
Joined: Thu Aug 11, 2022 11:30 pm
Full name: Esmeralda Pinto

Re: STS re-re-re-re-re-visited

Post by chessica »

EPD : epd\sts-sf15.epd
Time : 1+10+100+1000ms

    Engine          Rating  Top1  MaxTop1  Top1Rate  Score  MaxScore  ScoreRate  MoveTime(ms)  Hash(MB)  Threads
 1  ShashChess25    2500    1133  1500     755       13322  15000     888        1000          64        1
 2  Stockfish 14.1  2500    1117  1500     745       13135  15000     876        1000          64        1
 3  berserk-10      2500    1098  1500     732       13009  15000     867        1000          64        1
 4  Koivisto_8.10   2500    1082  1500     721       12835  15000     856        1000          64        1
 5  Fire_8.11       2500    1072  1500     715       12581  15000     839        1000          64        1
 6  komodo-13.02    2500    1029  1500     686       12243  15000     816        1000          64        1
 7  Houdini_15a     2500    1005  1500     670       11999  15000     800        1000          64        1
 8  velvet-v4.1.0   2500    1002  1500     668       11992  15000     799        1000          64        1
 9  ShashChess25    2500    999   1500     666       11927  15000     795        100           64        1
10  berserk-10      2500    955   1500     637       11510  15000     767        100           64        1
11  Stockfish 14.1  2500    944   1500     629       11601  15000     773        100           64        1
12  Koivisto_8.10   2500    928   1500     619       11282  15000     752        100           64        1
13  Fire_8.11       2500    921   1500     614       11062  15000     737        100           64        1
14  arasanx-64      2500    920   1500     613       11033  15000     736        1000          64        1
15  sting-sf-28     2500    903   1500     602       10953  15000     730        1000          64        1
16  komodo-13.02    2500    880   1500     587       10718  15000     715        100           64        1
17  Toga_4          2500    850   1500     567       10473  15000     698        1000          64        1
18  Houdini_15a     2500    848   1500     565       10346  15000     690        100           64        1
19  velvet-v4.1.0   2500    830   1500     553       10407  15000     694        100           64        1
20  Colossus2022a   2500    799   1500     533       9933   15000     662        1000          64        1
21  ShashChess25    2500    783   1500     522       9829   15000     655        10            64        1
22  arasanx-64      2500    769   1500     513       9565   15000     638        100           64        1
23  sting-sf-28     2500    766   1500     511       9619   15000     641        100           64        1
24  Fire_8.11       2500    763   1500     509       9726   15000     648        10            64        1
25  Stockfish 14.1  2500    709   1500     473       9262   15000     617        10            64        1
26  Stockfish 14.1  2500    706   1500     471       9248   15000     617        10            64        1
27  Toga_4          2500    701   1500     467       9017   15000     601        100           64        1
28  Fire_8.11       2500    701   1500     467       8926   15000     595        1             64        1
29  Fire_8.11       2500    700   1500     467       8921   15000     595        1             64        1
30  Colossus2022a   2500    671   1500     447       8692   15000     579        100           64        1
31  berserk-10      2500    668   1500     445       8509   15000     567        1             64        1
32  berserk-10      2500    651   1500     434       8380   15000     559        1             64        1
33  komodo-13.02    2500    650   1500     433       8539   15000     569        10            64        1
34  sting-sf-28     2500    647   1500     431       8346   15000     556        10            64        1
35  Koivisto_8.10   2500    645   1500     430       8563   15000     571        10            64        1
36  velvet-v4.1.0   2500    643   1500     429       8330   15000     555        10            64        1
37  velvet-v4.1.0   2500    643   1500     429       8330   15000     555        1             64        1
38  velvet-v4.1.0   2500    641   1500     427       8310   15000     554        1             64        1
39  berserk-10      2500    640   1500     427       8290   15000     553        10            64        1
40  berserk-10      2500    639   1500     426       8134   15000     542        10            64        1
41  sting-sf-28     2500    625   1500     417       8162   15000     544        1             64        1
42  sting-sf-28     2500    625   1500     417       8155   15000     544        1             64        1
43  ShashChess25    2500    596   1500     397       7707   15000     514        1             64        1
44  Toga_4          2500    587   1500     391       7598   15000     507        10            64        1
45  Toga_4          2500    587   1500     391       7588   15000     506        10            64        1
46  Colossus2022a   2500    569   1500     379       7456   15000     497        10            64        1
47  Colossus2022a   2500    567   1500     378       7427   15000     495        10            64        1
48  Stockfish 14.1  2500    548   1500     365       7271   15000     485        1             64        1
49  Houdini_15a     2500    544   1500     363       7217   15000     481        10            64        1
50  ShashChess25    2500    540   1500     360       7215   15000     481        1             64        1
51  arasanx-64      2500    539   1500     359       7170   15000     478        10            64        1
52  Stockfish 14.1  2500    535   1500     357       7121   15000     475        1             64        1
53  arasanx-64      2500    529   1500     353       7098   15000     473        10            64        1
54  Koivisto_8.10   2500    512   1500     341       6999   15000     467        1             64        1
55  Koivisto_8.10   2500    495   1500     330       6796   15000     453        1             64        1
56  Toga_4          2500    492   1500     328       6575   15000     438        1             64        1
57  Toga_4          2500    488   1500     325       6583   15000     439        1             64        1
58  komodo-13.02    2500    477   1500     318       6378   15000     425        1             64        1
59  komodo-13.02    2500    475   1500     317       6475   15000     432        1             64        1
60  Colossus2022a   2500    467   1500     311       6372   15000     425        1             64        1
61  Colossus2022a   2500    465   1500     310       6353   15000     424        1             64        1
62  Houdini_15a     2500    413   1500     275       5857   15000     390        1             64        1
63  Houdini_15a     2500    413   1500     275       5857   15000     390        1             64        1
64  arasanx-64      2500    340   1500     227       4957   15000     330        1             64        1
65  arasanx-64      2500    337   1500     225       4938   15000     329        1             64        1
66  Slow64          2500    335   1500     223       4548   15000     303        1000          64        1
67  Slow64          2500    316   1500     211       4381   15000     292        100           64        1
68  Slow64          2500    300   1500     200       4128   15000     275        10            64        1
69  Slow64          2500    276   1500     184       3855   15000     257        1             64        1
70  Slow64          2500    275   1500     183       3832   15000     255        1             64        1
peter
Posts: 3385
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: STS re-re-re-re-re-visited

Post by peter »

Ferdy wrote: Mon Oct 24, 2022 11:11 am I have added STS1-STS15_LAN_v5.epd. Its maximum is 100 points instead of 10. The weakness of the max 10-point system that I see in STS1-STS15_LAN_v4.epd is that the range is too narrow and it has difficulty differentiating engines that are close in strength.

I describe the interpolation that I use in the repository.
That already works quite well for "self-play" measurements between versions or nets.

Code:

   
    EPD  : epd\FM5.epd
    Time : 300ms
                                                Max     Total    Time    Hash          
    Engine           Score   Found  Pos   ELO   Score   Rate     ms      Mb   Cpu      
 1  lc0-805874       140168   1113  1500  4212  149703  93.6%    300     8    2
 2  lc0-784968       139735   1105  1500  4198  149703  93.3%    300     8    2
 3  lc0-785564       139475   1095  1500  4194  149703  93.2%    300     8    2

                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca
Pre-release version of LC0 (0.29 rc0) on a 3070 Ti GPU, with the 3 different nets given in the names. A recent match of 805874 against 784968 was described here

https://forum.computerschach.de/cgi-bin ... #pid158751

with a result of 17 Elo for the 80 net, but with an error bar of 39 in the 95% interval (only 100 games). So the 14 Elo in STS (4212 vs. 4198) is not that much more or less of a difference than any other single result on its own, is it?
:)
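For readers who want to see where an error bar of that size comes from, here is a minimal sketch of the usual calculation: convert the match score to an Elo difference with the logistic formula and put a normal-approximation 95% interval around the score. The win/draw/loss counts below are hypothetical, since the linked match result is not broken down here, and the function names are made up for illustration.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_interval(wins, draws, losses, z=1.96):
    """Return (elo, low, high) using a normal approximation on the score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - z * se),
            elo_from_score(score + z * se))


# Hypothetical 100-game match, for illustration only.
elo, low, high = elo_with_interval(wins=30, draws=45, losses=25)
print(round(elo, 1), round(low, 1), round(high, 1))
```

With only 100 games the interval spans many tens of Elo, so a difference of 14 or 17 Elo lies well inside the noise either way.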
Peter.
criko
Posts: 144
Joined: Wed Feb 28, 2018 2:50 pm

Re: STS re-re-re-re-re-visited

Post by criko »

Hi all,

will it be possible to set other engine parameters within the batch file?
f.e. GoldDigger for ShashChess.

cu Chris
peter
Posts: 3385
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: STS re-re-re-re-re-visited

Post by peter »

criko wrote: Wed Oct 26, 2022 11:56 am will it be possible to set other engine parameters within the batch file?
f.e. GoldDigger for ShashChess.
Hi!
As written in my (second) mail, with this edit of the batch file it did work somehow:
set MT=100
set HASH=8
set THREADS=30
set PROTOCOL=uci
set EPD=epd\FM.epd
set MRL=mea_results

set EXE=engines\ShashChess25.2-x86-64-bmi2.exe
set NAME="ShashChess 25.2 GD"
set OPTION NAME=GoldDigger value true
mea.exe --engine %EXE% --name %NAME% --hash %HASH% --threads %THREADS% --protocol %PROTOCOL% --epd %EPD% --movetime %MT%
mrl %MRL% %MRL% %EPD% %MT%
setoption name GoldDigger value true
would be the console command. Pasting it directly into sts.bat failed: the engine ran, but wasn't rated normally, and at the end there was an error message from the MEA tool in the console.
Adapting the syntax to
set OPTION NAME=GoldDigger value true
was successful, at least in getting a normal output at the end at all.
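For reference, in the UCI protocol an option is set by sending the text `setoption name <id> value <x>` to the engine's standard input, after `uci` and before the search starts. In a batch file, by contrast, a line starting with `set` only defines an environment variable (here one literally named "OPTION NAME"), and the mea.exe command line above does not reference it, so whether the engine ever receives the option this way is not confirmed. The sketch below only builds the UCI command strings; the helper names are hypothetical and it does not use or assume any MEA flag.

```python
def uci_setoption(name, value=None):
    """Build one UCI 'setoption' command line (button options have no value)."""
    cmd = "setoption name " + name
    if value is not None:
        if isinstance(value, bool):
            value = "true" if value else "false"   # UCI check options use true/false
        cmd += " value " + str(value)
    return cmd


def uci_setup_commands(options):
    """Command sequence a GUI or test tool would send before searching."""
    commands = ["uci"]
    commands += [uci_setoption(name, value) for name, value in options.items()]
    commands.append("isready")
    return commands


# Values taken from the batch file above, plus the GoldDigger option.
for line in uci_setup_commands({"Hash": 8, "Threads": 30, "GoldDigger": True}):
    print(line)
```

Comparing the rated output of a run with and without the option at a longer time control would be one way to check that the setting really reaches the engine.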

This one result, with ShashChess ranked twice (default and with GoldDigger checked, ShashChess 25.2GD), all engines using 30 threads of a 16x3.5GHz CPU:

Code:

    
    EPD  : epd\FM.epd
    Time : 100ms
                                                 Max     Total   Time    Hash          
    Engine            Score   Found  Pos   ELO   Score   Rate     ms      Mb  Cpu      
 1  Blue Marlin 15.3  144121   1195  1500  4333  149703  96.3%    100     8   30
 2  EMAN 8.40         144095   1191  1500  4333  149703  96.3%    100     8   30
 3  ShashChess 25.2   143945   1193  1500  4329  149703  96.2%    100     8   30
 4  CorChess 231022   143943   1191  1500  4329  149703  96.2%    100     8   30
 5  Stockfish 110922  143696   1187  1500  4320  149703  96.0%    100     8   30
 6  Stockfish 231022  143474   1186  1500  4311  149703  95.8%    100     8   30
 7  ShashChess 25.2GD 143440   1179  1500  4311  149703  95.8%    100     8   30

                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca
This brings up as many questions as it answers, maybe, especially whether a TC of only 100 msec isn't a disadvantage for less pruning; that would explain, e.g., ShashChess in GoldDigger mode scoring worse than default. One wouldn't check this mode for very short TC in game playing either. With more hardware time free I'll maybe give 300 msec a try again, as I did with LC0 (counting some more initialization time for this one engine), or even 1 sec, but of course I'd have to repeat all the other runs too, and that's quite some more hardware time to spend all in all.

Just my two cents so far, regards
Peter.