STS re-re-re-re-re-visited

Rebel
Posts: 7040
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

STS re-re-re-re-re-visited

Post by Rebel »

For STS lovers -

STS stands for Strategic Test Suite. From the CPW we read -

Strategic Test Suite, (STS) a series of themed test suites by Dann Corbit and Swaminathan Natarajan, designed to evaluate chess engine's long term understanding of strategical and positional concepts. More recently, the positions were revised and converted by Ferdinand Mosca to be used by an analysis tool, MEA.

......

We made an effort to give STS a major update by re-analyzing the 1500 positions with today's strongest engines, Stockfish 15 and Lc0. With MEA you can also produce your own alternative rating list. See [ SF15 ] [ LC0 ] or as text file [ SF15 ] [ LC0 ].

......

http://rebel13.nl/misc/sts.html
90% of coding is debugging, the other 10% is writing bugs.
smatovic
Posts: 2797
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: STS re-re-re-re-re-visited

Post by smatovic »

Great, thanks! Any chance of an .epd download in CAN (pure coordinate algebraic notation, as used in XBoard) instead of SAN?

--
Srdja

Re: STS re-re-re-re-re-visited

Post by Rebel »

smatovic wrote: Sat Oct 22, 2022 6:19 pm
Great, thanks! Any chance of an .epd download in CAN (pure coordinate algebraic notation, as used in XBoard) instead of SAN?
Do you mean g1f3 instead of Nf3?

There is no such option in MEA.
peter
Posts: 3206
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: STS re-re-re-re-re-visited

Post by peter »

Rebel wrote: Sat Oct 22, 2022 12:35 pm
......
http://rebel13.nl/misc/sts.html
Great, Ed!
:)
I haven't had a closer look at the newly evaluated positions yet, just at the two examples given on the download site. The second one from there,

1r6/3n4/p2b3p/2nN1kpP/1BP1p3/3rP3/3NKP2/1R1R4 w - - bm Bxc5; c0 "Bc3=10, Bxc5=8, Rdc1=0, Re1=0";

promises much fun: bm is Bxc5, but Bc3 gets more points. I guess (if it's a feature, not a bug :)) that's for keeping more material on the board? Yet it's not quite clear why Bc3 isn't then given as the best move as well. Probably because engines evaluate Bxc5 higher?
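A mismatch like this between bm and the c0 points can be checked mechanically. The following is only a minimal sketch, assuming the EPD layout seen in this thread (four position fields, an optional bm operation, and a c0 string of move=points pairs). The function name parse_sts_epd is made up for illustration and is not part of MEA.

```python
import re

def parse_sts_epd(line):
    """Split an STS-style EPD line into (position, best_moves, points)."""
    # The first four whitespace-separated fields form the position.
    fields = line.split(None, 4)
    position = " ".join(fields[:4])
    operations = fields[4] if len(fields) > 4 else ""

    # Optional "bm <moves>;" operation and the quoted c0 comment.
    bm = re.search(r"\bbm\s+([^;]+);", operations)
    c0 = re.search(r'\bc0\s+"([^"]*)"', operations)

    best_moves = bm.group(1).split() if bm else []
    points = {}
    if c0:
        for item in c0.group(1).split(","):
            move, _, value = item.strip().partition("=")
            points[move] = int(value)
    return position, best_moves, points

line = ('1r6/3n4/p2b3p/2nN1kpP/1BP1p3/3rP3/3NKP2/1R1R4 w - - '
        'bm Bxc5; c0 "Bc3=10, Bxc5=8, Rdc1=0, Re1=0";')
position, best_moves, points = parse_sts_epd(line)
top_move = max(points, key=points.get)

# Report whether the highest-scoring move is also listed as bm.
print(best_moves, top_move, top_move in best_moves)
```

For the line quoted above this reports that the top-scoring move (Bc3) is not the bm move (Bxc5), which is exactly the inconsistency in question.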


1r6/3n4/p2b3p/2nN1kpP/1BP1p3/3rP3/3NKP2/1R1R4 w - - 0 1

Analysis by Stockfish 110922:

Code:

1. +/= (0.47): 1.Lxc5 Sxc5 2.Txb8 Lxb8 3.Tb1 Le5 4.Se7+ Ke6 5.Sg8 Td8 6.Sxh6 Lc7 7.Sb3 Sxb3 8.Txb3 Th8 9.Tb7 Kd7 10.Sf5 Txh5 11.Se7 Kxe7 12.Txc7+ Kf6 13.Tc6+ Kf5 14.Txa6 Th1 15.Ta5+ Kf6 16.Td5 Tc1 17.Td6+ Kf5 18.Tc6 Ta1 19.Tc7 Kg4 20.c5 Ta2+ 21.Kf1 Ta1+ 22.Kg2 Tc1 23.c6 Tc2 24.Tc8 Tc5 25.c7 Tc6 26.Te8 Txc7 27.Txe4+ Kh5 28.Ta4 Tc5 29.Ta8 Tc1 30.Ta5 Tc4 31.Kg3 Kg6 32.Ta6+ Kf7 33.Ta8 

2. = (0.21): 1.Lc3 Txb1 2.Txb1 Sa4 3.Ld4 Sdc5 4.Tf1 Le5 5.f4 Lxd4 6.fxg5+ Kxg5 7.Sxe4+ Sxe4 8.Tg1+ Kf5 9.Se7+ Ke6 10.Kxd3 Lf6 11.Sg8 Sg5 12.Sxh6 Sb2+ 13.Kc2 Sxc4 14.Sg4 Kf5 15.h6 Se5 16.Sf2 Kg6 17.Th1 Kh7 18.Th5 Ld8 19.Sd1 Sgf3 20.Th3 La5 21.Sf2 Sg5 22.Th1 Lb6 23.Kd2 a5 24.Ke2 Ld8 25.Td1 Lc7 26.Tf1 Ld6 27.Sd1 Se4 28.Tf4 Sc5 

3. = (0.00): 1.Tdc1 Se5 2.Sc3 Sa4 3.Sdxe4 Txc3 4.Lxc3 Txb1 5.Txb1 Kxe4 6.La1 Kf5 7.Ld4 Sxc4 8.Tc1 Se5 9.f3 Ke6 10.Tc8 Kd5 11.Th8 Sf7 12.Th7 Ke6 13.Tg7 Sc5 14.Tg6+ Ke7 15.Lxc5 Lxc5 16.Txa6 Ld6 17.Ta4 Ke6 18.Ta8 Kf6 19.Tg8 Se5 20.Th8 Sf7 

4. = (0.00): 1.Te1 Se5 2.Ted1 g4 3.Lxc5 Txb1 4.Txb1 Lxc5 5.Sf1 Sf3 6.Sg3+ Ke5 7.Sxe4 Txd5 8.cxd5 Kxe4 9.Tb8 Se5 10.Th8 g3 11.fxg3 Lxe3 12.d6 Lg5 13.Th7 Kf5 14.Ta7 Ke6 15.Txa6 Sd7 16.Kf3 Sc5 17.Ta7 Sd7 18.g4 Se5+ 19.Kg3 Sd7 20.Ta1 Se5 21.Ta5 Kxd6 22.Tb5 Sd7 23.Kg2 Se5 24.Tb4 Sd7 25.Ta4 Kd5 26.Kf3 Se5+ 27.Kg3 Ke6 28.Ta7 Kd5 29.Ta8 Sd7 30.Ta5+ Kd6 
White is slightly better


30 threads of 16x3.5GHz CPU, 8G hash, depth 51/67, MultiPV=4

Sorry for the German piece abbreviations (L = bishop, S = knight, T = rook). The language was set to English, as the text at the end of the output shows; it just didn't work correctly with Fritz here.
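For anyone who wants to read the lines above with English piece letters, here is a small sketch of a converter. It assumes the usual German letters (D = queen, T = rook, L = bishop, S = knight, K unchanged) and should only be fed move text, since it replaces those capital letters wherever they occur. The function name is made up for illustration.

```python
# German piece letters mapped to their English equivalents; K stays K.
GERMAN_TO_ENGLISH = str.maketrans({"D": "Q", "T": "R", "L": "B", "S": "N"})

def german_to_english_san(moves):
    """Replace German piece letters in a string of SAN moves."""
    return moves.translate(GERMAN_TO_ENGLISH)

print(german_to_english_san("1.Lxc5 Sxc5 2.Txb8 Lxb8 3.Tb1 Le5 4.Se7+ Ke6"))
```

File letters are lower case in SAN, so they are not touched by the translation.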

Thanks for the big effort (I know rather well how big it really is, just to go through all the positions and re-evaluate them)
:!:
Peter.

Re: STS re-re-re-re-re-visited

Post by smatovic »

Rebel wrote: Sat Oct 22, 2022 6:49 pm
smatovic wrote: Sat Oct 22, 2022 6:19 pm
Great, thanks! Any chance of an .epd download in CAN (pure coordinate algebraic notation, as used in XBoard) instead of SAN?
Do you mean g1f3 instead of Nf3?
Yes.
Rebel wrote: Sat Oct 22, 2022 6:49 pm
There is no such option in MEA.
I use .epd files with CAN in my own test-suite environment. Or is there a tool that can convert this easily?

--
Srdja
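As a rough illustration of what such a conversion involves, here is a sketch that turns a SAN move into coordinate notation using only the piece placement from the FEN. It is an assumption-laden toy, not an existing tool: it checks move geometry and blocked paths only, ignores pins, checks and en passant legality, and raises an error when the source square is not unique. All names (san_to_can and the helpers) are hypothetical. A library with a full move generator, such as python-chess, would be the safer choice for converting a whole test suite.

```python
import re

FILES = "abcdefgh"
SAN_RE = re.compile(r"^([KQRBN])?([a-h])?([1-8])?(x)?([a-h][1-8])(?:=?([QRBN]))?$")

def parse_placement(fen):
    """Return {square: piece letter} from the first FEN/EPD field."""
    board = {}
    for row_index, row in enumerate(fen.split()[0].split("/")):
        file_index = 0
        for ch in row:
            if ch.isdigit():
                file_index += int(ch)
            else:
                board[FILES[file_index] + str(8 - row_index)] = ch
                file_index += 1
    return board

def reaches(board, piece, src, dst, white, capture):
    """Geometry-only test: can this piece move from src to dst?"""
    dx = FILES.index(dst[0]) - FILES.index(src[0])
    dy = int(dst[1]) - int(src[1])
    if piece == "N":
        return sorted((abs(dx), abs(dy))) == [1, 2]
    if piece == "K":
        return max(abs(dx), abs(dy)) == 1
    if piece == "P":
        step = 1 if white else -1
        if capture:
            return abs(dx) == 1 and dy == step
        if dx != 0 or dst in board:
            return False
        if dy == step:
            return True
        middle = src[0] + str(int(src[1]) + step)
        return dy == 2 * step and src[1] == ("2" if white else "7") and middle not in board
    straight = (dx == 0) != (dy == 0)
    diagonal = dx != 0 and abs(dx) == abs(dy)
    if not ((piece in "RQ" and straight) or (piece in "BQ" and diagonal)):
        return False
    sx, sy = (dx > 0) - (dx < 0), (dy > 0) - (dy < 0)
    for i in range(1, max(abs(dx), abs(dy))):  # squares between src and dst
        if FILES[FILES.index(src[0]) + i * sx] + str(int(src[1]) + i * sy) in board:
            return False
    return True

def san_to_can(fen, san):
    """Convert one SAN move to coordinate notation, e.g. Nf3 -> g1f3."""
    white = fen.split()[1] == "w"
    san = san.rstrip("+#!?")
    if san in ("O-O", "O-O-O"):
        rank = "1" if white else "8"
        return "e" + rank + ("g" if san == "O-O" else "c") + rank
    match = SAN_RE.match(san)
    if not match:
        raise ValueError("cannot parse SAN move: " + san)
    piece, src_file, src_rank, capture, dst, promo = match.groups()
    piece = piece or "P"
    board = parse_placement(fen)
    wanted = piece if white else piece.lower()
    sources = [sq for sq, p in board.items()
               if p == wanted
               and src_file in (None, sq[0]) and src_rank in (None, sq[1])
               and reaches(board, piece, sq, dst, white, bool(capture))]
    if len(sources) != 1:
        raise ValueError("source square not unique for " + san)
    return sources[0] + dst + (promo.lower() if promo else "")

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(san_to_can(start, "Nf3"))
```

Applied to the STS position discussed in this thread, the same function maps Bxc5 to b4c5 and Rdc1 to d1c1.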
Ferdy
Posts: 4840
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: STS re-re-re-re-re-visited

Post by Ferdy »

Rebel wrote: Sat Oct 22, 2022 12:35 pm
......
http://rebel13.nl/misc/sts.html
There was a bug in MEA that has already been fixed. The issue was with MultiPV. The latest exe file is v1.0.

I am not sure if this affects the sts-mea tool.

Re: STS re-re-re-re-re-visited

Post by Rebel »

Ferdy wrote: Sat Oct 22, 2022 7:51 pm
There was a bug in MEA that has already been fixed. The issue was with MultiPV. The latest exe file is v1.0.

I am not sure if this affects the sts-mea tool.
rq4kb/5p1p/3p1Pp1/p2Pp1P1/3pP1bP/1P6/P1Q2R2/4BBK1 w - - c0 "Qc6=10, a4=13, Qd3=14, Qc4=23";

This is the last EPD (no. 1500) of the Lc0 analysis.

Reading the bug report, I wonder whether odd MultiPV=4 cases like this one could be part of the bug?

Image

The Arena screen confirms that the order in the EPD is unsorted, whereas the Arena analysis itself is sorted, see:

Code:

  6/15	00:20	      23.470	1.319	+8,40	Qc2-c4 Qb8-d8 Qc4-c6 Ra8-c8 Qc6-a6 Rc8-a8 Qa6-b7 Ra8-c8 Be1-d2 h7-h6
  6/15	00:20	      23.470	1.319	+8,24	Qc2-d3 Bg4-d7 a2-a4 h7-h5 Rf2-c2 Qb8-d8 Bf1-h3 Bd7xh3 Qd3xh3
  6/15	00:20	      23.470	1.319	+7,96	a2-a4 Bg4-d7 Qc2-d3 h7-h5 Rf2-c2 Qb8-d8 Kg1-h2 Bh8xf6 g5xf6
  6/15	00:20	      23.470	1.319	+7,70	Qc2-c6 Qb8-d8 Qc6-b7 Ra8-c8 Be1-d2 h7-h6 Bf1-d3 Bg4-d7 Qb7-a6 Bd7-h3 Kg1-h2
So the question is what Lc0 tells MEA: sorted or unsorted? I guess the latter, but...
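Whatever order the engine reports, a consumer of UCI output can sort the MultiPV lines itself. The sketch below shows one way to do that, assuming standard UCI info lines with multipv, score and pv fields. It is not how MEA works internally (that is not known from this thread), and the sample lines are invented to mirror the four scores in the Arena output above.

```python
def parse_multipv_info(line):
    """Return (multipv_index, score_kind, score_value, pv_moves) or None."""
    tokens = line.split()
    if tokens[:1] != ["info"] or not {"multipv", "score", "pv"} <= set(tokens):
        return None
    index = int(tokens[tokens.index("multipv") + 1])
    score_at = tokens.index("score")
    kind, value = tokens[score_at + 1], int(tokens[score_at + 2])
    pv = tokens[tokens.index("pv") + 1:]
    return index, kind, value, pv

def sort_key(entry):
    """Order: winning mates, then centipawn scores, then losing mates."""
    _, kind, value, _ = entry
    if kind == "mate":
        return (1, -value) if value > 0 else (-1, -value)
    return (0, value)

def ranked_lines(info_lines):
    """Keep the latest line per multipv index and sort best-first by score."""
    latest = {}
    for line in info_lines:
        parsed = parse_multipv_info(line)
        if parsed:
            latest[parsed[0]] = parsed  # later output overwrites earlier output
    return sorted(latest.values(), key=sort_key, reverse=True)

# Hypothetical engine output, deliberately listed worst-first.
sample = [
    "info depth 6 seldepth 15 multipv 1 score cp 770 pv c2c6 b8d8 c6b7",
    "info depth 6 seldepth 15 multipv 2 score cp 796 pv a2a4 g4d7 c2d3",
    "info depth 6 seldepth 15 multipv 3 score cp 824 pv c2d3 g4d7 a2a4",
    "info depth 6 seldepth 15 multipv 4 score cp 840 pv c2c4 b8d8 c4c6",
]
for index, kind, value, pv in ranked_lines(sample):
    print(pv[0], kind, value)
```

Sorting by score rather than by the multipv index makes the result independent of the order in which the engine happens to emit its lines.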

Re: STS re-re-re-re-re-visited

Post by Rebel »

peter wrote: Sat Oct 22, 2022 6:51 pm
......
1r6/3n4/p2b3p/2nN1kpP/1BP1p3/3rP3/3NKP2/1R1R4 w - - bm Bxc5; c0 "Bc3=10, Bxc5=8, Rdc1=0, Re1=0";

promises much fun: bm is Bxc5, but Bc3 gets more points. I guess (if it's a feature, not a bug :)) that's for keeping more material on the board? Yet it's not quite clear why Bc3 isn't then given as the best move as well. Probably because engines evaluate Bxc5 higher?
......
Well, as you can see in the Lc0 example in my reply to Ferdy, MultiPV analysis isn't an exact science :wink:

Re: STS re-re-re-re-re-visited

Post by peter »

Rebel wrote: Sat Oct 22, 2022 9:17 pm
Well, as you can see in the Lc0 example in my reply to Ferdy, MultiPV analysis isn't an exact science :wink:
:)
Yet the question of whether Bxc5 or Bc3 is the best move must be answered one way or the other. If it is Bxc5, as given in the .epd string, then one would expect it to get the maximum number of points too.

Edit: in the meantime I compared the FEN given on the download site

1r6/3n4/p2b3p/2nN1kpP/1BP1p3/3rP3/3NKP2/1R1R4 w - - bm Bxc5; c0 "Bc3=10, Bxc5=8, Rdc1=0, Re1=0";

with position no. 24 in sts-SF15.epd, where it's defined like this:

1r6/3n4/p2b3p/2nN1kpP/1BP1p3/3rP3/3NKP2/1R1R4 w - - c0 "Bxc5=10, Bc3=4, Rdc1=2, Rf1=1";

but in sts-LC0.epd it reads:

1r6/3n4/p2b3p/2nN1kpP/1BP1p3/3rP3/3NKP2/1R1R4 w - - c0 "Bc3=10, Bxc5=9, Rg1=8, Rf1=6";

Bug or feature is still the question to me, and why two different .epd sets at all?
I know Lc0 gives different evaluations than SF does, but shouldn't one of the two be more or less the correct one, at least when it comes to choosing which move is best? And shouldn't human evaluation then at least award the points accordingly, deciding which move is the better one and giving that move the maximum number of points?
Peter.

Re: STS re-re-re-re-re-visited

Post by Rebel »

peter wrote: Sat Oct 22, 2022 10:10 pm
Rebel wrote: Sat Oct 22, 2022 9:17 pm
Well, as you can see in the Lc0 example in my reply to Ferdy, MultiPV analysis isn't an exact science :wink:
:)
Yet the question of whether Bxc5 or Bc3 is the best move must be answered one way or the other. If it is Bxc5, as given in the .epd string, then one would expect it to get the maximum number of points too.

Edit: in the meantime I compared the FEN given on the download site

1r6/3n4/p2b3p/2nN1kpP/1BP1p3/3rP3/3NKP2/1R1R4 w - - bm Bxc5; c0 "Bc3=10, Bxc5=8, Rdc1=0, Re1=0";

with position no. 24 in sts-SF15.epd, where it's defined like this:

1r6/3n4/p2b3p/2nN1kpP/1BP1p3/3rP3/3NKP2/1R1R4 w - - c0 "Bxc5=10, Bc3=4, Rdc1=2, Rf1=1";

but in sts-LC0.epd it reads:

1r6/3n4/p2b3p/2nN1kpP/1BP1p3/3rP3/3NKP2/1R1R4 w - - c0 "Bc3=10, Bxc5=9, Rg1=8, Rf1=6";
The EPD I colored red (the one with the "bm" field) is from the file STS-OLD.EPD from 2008.

The newly calculated STS EPDs have no "bm" field, nor do they contain moves with a "0" score.
Bug or feature is still the question to me, and why two different .epd sets at all?
I know Lc0 gives different evaluations than SF does, but shouldn't one of the two be more or less the correct one, at least when it comes to choosing which move is best? And shouldn't human evaluation then at least award the points accordingly, deciding which move is the better one and giving that move the maximum number of points?
I want to give people a choice of which engine they trust most. I am definitely in favor of Lc0.

You can of course lump SF15 and Lc0 together in one list and then you get something like this.

Code:

                                                Max   Total   Time   Hash          
    Engine           Score   Found  Pos   ELO  Score   Rate    ms     Mb  Cpu      
 1  Berserk 9-LC0    13314   1107  1500  3552  15000  88.8%    100    64    1
 2  Koivisto 8.0-LC0 13238   1089  1500  3532  15000  88.3%    100    64    1
 3  Rebel 15.1-LC0   13003   1082  1500  3468  15000  86.7%    100    64    1
 4  Seer 2.5.0-LC0   12690   1033  1500  3384  15000  84.6%    100    64    1
 5  Berserk 9        12350   1032  1500  3292  15000  82.3%    100    64    1
 6  Koivisto 8.0     12260   1018  1500  3268  15000  81.7%    100    64    1
 7  Rebel 15.1       12048   1004  1500  3212  15000  80.3%    100    64    1
 8  Seer 2.5.0       11788    979  1500  3144  15000  78.6%    100    64    1
 9  Toga 4-LC0       10448    792  1500  2788  15000  69.7%    100    64    1
10  Fruit 2.1-LC0     9850    714  1500  2628  15000  65.7%    100    64    1
11  Toga 4            9643    754  1500  2572  15000  64.3%    100    64    1
12  Fruit 2.1         8722    671  1500  2324  15000  58.1%    100    64    1
Not so nice :D
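The Rate and ELO columns in the list above can be reproduced from Score and Max Score with simple arithmetic: Rate is Score divided by Max Score, shown as a percentage with one decimal, and ELO matches 40 times that rounded percentage in every row. This relation is only read off the numbers in the table. It is an assumption and not taken from the MEA documentation, and the function names are made up.

```python
def rate_tenths(score, max_score):
    """Score as a share of max_score, in tenths of a percent, rounded half up."""
    return (1000 * score + max_score // 2) // max_score

def apparent_elo(score, max_score):
    """ELO as it appears in the table: 40 x the rounded percentage."""
    return 4 * rate_tenths(score, max_score)

# Three rows taken from the list above.
rows = {"Berserk 9-LC0": 13314, "Rebel 15.1": 12048, "Fruit 2.1": 8722}
for name, score in rows.items():
    tenths = rate_tenths(score, 15000)
    print(f"{name:15} {tenths / 10:.1f}%  {apparent_elo(score, 15000)}")
```

For example, 13314 / 15000 = 0.8876, which rounds to 88.8%, and 88.8 x 40 = 3552.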