Hard-Talkchess-2020 set, final release

AlexChess · Post by **AlexChess** » Fri Sep 02, 2022 7:39 am

amchess wrote: ↑Thu Sep 01, 2022 10:03 pm I created a new Hard Positions 2022 to test engines.
Every position is unsolved by at least a top engine and they are classified based on Shashin theory.
Every suggestion/advice is welcome.
https://github.com/amchess/ShashChess/b ... ns2022.epd
https://github.com/amchess/ShashChess/b ... s2022.xlsx

Thank you Andrea!

I will test ProteusSF-Piranha with them and report results here.

peter · Post by **peter** » Fri Sep 02, 2022 9:36 am

amchess wrote: ↑Thu Sep 01, 2022 10:03 pm I created a new Hard Positions 2022 to test engines.
Every position is unsolved by at least a top engine and they are classified based on Shashin theory.
Every suggestion/advice is welcome.
https://github.com/amchess/ShashChess/b ... ns2022.epd
https://github.com/amchess/ShashChess/b ... s2022.xlsx

Thanks for the suite, Andrea, but the positions in it seem to differ a lot in difficulty for modern engines.
E.g. the first one:

[fen]1k1rr2b/2q2p2/p4p1P/1p3p2/PR2bQ2/3B4/1PP5/1K1RN3 w - -[/fen]

1.Qc1 is given as the best move; I'd give it a !? or ?! compared to other moves of about the same value in a position that is already a sure win for White.

It isn't a good fit for a tactical single-best-move suite. It comes from the old Strategic Test Suite (STS) by Swaminathan and Corbit, which was meant to be solved at short TC even by the engines of that time, because the principle of that suite was to give positions to be solved from "static eval" more than from search. It was a positional test, not a tactical one.
Therefore positions with more than one solution were evaluated with a special point system rewarding the candidate moves chosen by the engine. There was STS-Stats, a program by Philippe Gailhac, to count the points from an Arena solution file.

https://sites.google.com/site/strategic ... e/sts-stat

For the given position I have these comments in STS:
Qd2=10, Qf2=6, Qc1=10, Qf1=7
meaning these Queen moves are rewarded with 6 to 10 points, depending on which one the engine chooses in a short thinking time.
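
A minimal sketch of how such a point comment could be turned into a score for one engine answer. It assumes the comment is a comma-separated list of `move=points` pairs exactly as quoted above, and that a move not listed scores 0; both the function names and the zero default are assumptions for illustration, not the behaviour of STS-Stats itself.

```python
def parse_points(comment: str) -> dict[str, int]:
    """Turn 'Qd2=10, Qf2=6' into {'Qd2': 10, 'Qf2': 6}."""
    points = {}
    for item in comment.split(","):
        # rpartition splits at the LAST '=', so a promotion such as
        # 'e8=Q=10' is still read as move 'e8=Q' with 10 points.
        move, _, value = item.strip().rpartition("=")
        if move and value:
            points[move] = int(value)
    return points


def score_move(comment: str, engine_move: str) -> int:
    """Points for the engine's move; unlisted moves score 0 (assumption)."""
    return parse_points(comment).get(engine_move, 0)


sts_comment = "Qd2=10, Qf2=6, Qc1=10, Qf1=7"
print(score_move(sts_comment, "Qf2"))  # 6
print(score_move(sts_comment, "Qe3"))  # 0 (not listed)
```

Summing these scores over all positions of a suite gives a total that rewards second-best choices partially instead of counting them as plain failures.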

At least these 4 Queen moves also get about the same eval in MultiPV mode of e.g. SF dev. So without the point system (which doesn't look correct to me for this position either, but that comes from the engines' evaluations of that time long ago) to evaluate the engine's answer, the position can't be used as a tactical single-best-move test at all.
Qc1, which is the solution in your .epd, isn't such a clear single game changer, and Qc1 isn't even the best move, at least not by a clear enough margin over e.g. the other Queen moves given in STS. E.g. Qf2 is as clearly won for White as Qc1 is; SF dev. gives about +4.50 for Qf2 and about +4.00 for Qf1 and Qc1.

Best regards

peter · Post by **peter** » Fri Sep 02, 2022 10:17 am

peter wrote: ↑Fri Sep 02, 2022 9:36 am Best regards

Just to tell, I'd have edited my posting above much better and would have made it somewhat shorter if I hadn't had time-out errors for about half an hour right after first sending it, once again at that very moment.

All last week it was completely impossible to log in at all with my Austrian IP, but once again, just to tell.

amchess · Post by **amchess** » Fri Sep 02, 2022 10:49 am

Hi, Peter.
Thanks for your observation.
My idea is to test not only tactical positions (in Shashin's terms, Tal or Petrosian), but also strategic ones (Capablanca).
For this reason, I classified them.
So, in the case of your example, I simply have to correct it to
bm Qd2 Qc1 (the moves with the max score)
Can you help me find all strategic positions of this type and make this kind of correction?
Thanks a lot,
Andrea
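
A small sketch of the correction described in this post: given STS-style points for a position, keep only the moves with the maximum score and write them as an EPD `bm` operation. The function names are hypothetical, and the moves are kept in the order in which they appear in the point list (an assumption; no particular ordering is stated in the thread).

```python
def best_moves(points: dict[str, int]) -> list[str]:
    """Return every move that shares the highest score."""
    top = max(points.values())
    return [move for move, value in points.items() if value == top]


def bm_opcode(points: dict[str, int]) -> str:
    """Format the top-scoring moves as an EPD 'bm' operation."""
    return "bm " + " ".join(best_moves(points)) + ";"


position_1 = {"Qd2": 10, "Qf2": 6, "Qc1": 10, "Qf1": 7}
print(bm_opcode(position_1))  # bm Qd2 Qc1;
```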


peter · Post by **peter** » Fri Sep 02, 2022 11:46 am

amchess wrote: ↑Fri Sep 02, 2022 10:49 am My idea is to test not only tactical positions (in Shashin's terms, Tal or Petrosian), but also strategic ones (Capablanca).
For this reason, I classified them.
So, in the case of your example, I simply have to correct it to
bm Qd2 Qc1 (the moves with the max score)
Can you help me find all strategic positions of this type and make this kind of correction?

You're welcome, Andrea.
As a matter of fact, I'm already working on a revised version of good old STS myself too, but the problem is that even the points given in the comments there no longer fit modern engines' evals for very many of the positions.

As for the first position in question, I'd use it in .cbh format only, giving the three other Queen moves together with Qf2, but commenting these moves as sidelines with = as a prefix (not a trailing = as a move comment, which has a different meaning in that GUI) for each of these roughly equivalent moves (in the Fritz GUI via "RR - equivalent is"), which would make them count as "solved" in an automatic suite run too.

Yet 3 is the maximum number of moves to be counted at all in my personal trials with "STS-revised" so far, and the positions must not be too easy to solve; more equal candidates make positions even easier to solve in most cases. I hope to get a suite of about 1000 positions out of the 1500 of old STS in .cbh format, to be used finally as a modern, purely positional test suite again.
That will still take much hardware time and manpower; I've only just started, and maybe I'll make it by the end of this year, working on it on my own so far.

As a tactical single-best-move test, the position in question is simply too easy for the hardware time that difficult tactical positions need, I'd say.
I'm planning to run a positional suite (to be solved more from the engines' "static eval" than from search, the way STS was planned back then too) at only about 1-3 seconds per position SMP, and tactical suites with at least 15", so I don't think the two would work together in one suite in a statistically meaningful way.

But with Frank Schubert's great program EloStatTS you can have several runs of different suites with different hardware TC evaluated together in one ranking and rating list. It's a pity that the suites have to be run in .cbh format; other GUIs don't work with EloStatTS that way.
You can have a look at such a list here, e.g.:

forum3/viewtopic.php?p=932939#p932939

I'll come back here when I've finished a first look through your suite, with some more results of my own, in the next days.
We can of course also exchange results and positions by email, if you like.

Edit: I've already had the first 3 runs of your suite at 30"/position, 30 threads of a 16x3.5GHz CPU, 4G hash and 6-men Syzygy, MultiPV=4 for all of the 3 SF branches that I already had in the list at the link given above, same settings for ShashChess (GoldDigger and MCTS single thread on, all other options default, Persisted learning off too, of course).

    Program                                    Elo   +/-  Matches  Score   Av.Op.   S.Pos.   MST1    MST2   RIndex

  1 CorChess3300522-Tactical-MV4             : 3508   15    364    51.7 %   3496   159/258    5.0s   14.6s   0.69
  2 ShashChess24-GoldDigger-MV4              : 3503   15    376    50.6 %   3499   162/258    6.6s   15.3s   0.66
  3 BlueMarlin15.3-avx2-MV4                  : 3489   16    368    47.7 %   3505   148/258    6.1s   16.3s   0.67



MST1  : Mean solution time (solved positions only)
MST2  : Mean solution time (solved and unsolved positions)
RIndex: Score according to solution time ranking for each position
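
A sketch of the two mean solution times from the legend. The legend does not say how unsolved positions enter MST2; the assumption here is that each unsolved position counts with the full time limit (30 s in this run). That assumption reproduces the MST2 column from the solved counts and MST1 values in the table, which suggests, but does not prove, that this is how the figure is computed. The function name is made up for illustration.

```python
def mean_solution_times(times, time_limit):
    """times: solution time in seconds per position, None if unsolved.

    Returns (MST1, MST2). Unsolved positions are charged the full
    time limit in MST2 (assumption, see lead-in).
    """
    solved = [t for t in times if t is not None]
    unsolved = len(times) - len(solved)
    mst1 = sum(solved) / len(solved) if solved else float("nan")
    mst2 = (sum(solved) + time_limit * unsolved) / len(times)
    return mst1, mst2


# Stand-in data, NOT the real per-position times: 159 solved positions
# that all take the reported mean of 5.0 s, plus 99 unsolved ones.
times = [5.0] * 159 + [None] * 99
mst1, mst2 = mean_solution_times(times, time_limit=30.0)
print(round(mst1, 1), round(mst2, 1))  # 5.0 14.6
```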

Best regards

AlexChess · Post by **AlexChess** » Fri Sep 02, 2022 4:19 pm

amchess wrote: ↑Fri Sep 02, 2022 10:49 am Hi, Peter.
Thanks for your observation.
My idea is to test not only tactical positions (in Shashin's terms, Tal or Petrosian), but also strategic ones (Capablanca).
For this reason, I classified them.
So, in the case of your example, I simply have to correct it to
bm Qd2 Qc1 (the moves with the max score)
Can you help me find all strategic positions of this type and make this kind of correction?
Thanks a lot,
Andrea

Same here for position #1.
Is there a time limit for each position? Maybe we could tie it to the kN/s. Example: 15 secs on a PC that calculates 40 Mn/s, so that any hardware can be used while always keeping this ratio, to get reliable results.
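
The proposed ratio can be sketched as a simple proportion: keep the number of nodes per position constant, so that a slower machine gets proportionally more time. The 15 s and 40 Mn/s reference values come from the example above; the function name and the 20 Mn/s test machine are made up for illustration.

```python
def scaled_time(reference_seconds: float,
                reference_nps: float,
                measured_nps: float) -> float:
    """Seconds per position that give the same node budget as the
    reference machine (reference_seconds * reference_nps nodes)."""
    if measured_nps <= 0:
        raise ValueError("measured_nps must be positive")
    return reference_seconds * reference_nps / measured_nps


# Reference: 15 s at 40 Mn/s. A machine reaching 20 Mn/s gets twice the time.
print(scaled_time(15, 40e6, 20e6))  # 30.0
```

Node speed differs between engines on the same hardware, so the measurement would probably have to be taken with one agreed engine and position.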

Kind regards, Alex

peter · Post by **peter** » Fri Sep 02, 2022 5:30 pm

peter wrote: ↑Fri Sep 02, 2022 11:46 am I'll come back here when I've finished a first look through your suite, with some more results of my own, in the next days.

Here are the first 5 positions as I would comment them in .pgn format for use as test positions in Fritz GUIs, with the prefix =, which there means the move is equal as a solution and is counted as such in an automatic suite run.
The first one, having more than 3 equal candidate moves, I'd not use in a test suite. If a number is given as the player's name for White, that's the number of equal solutions; the rest of the header just gives hints to the sources I found in my databases. "Event" gives the number the position has in Andrea's suite so far, to make it easy to correlate the first version with later ones that have new numbers.

[pgn][Event "1"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "4"]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "1k1rr2b/2q2p2/p4p1P/1p3p2/PR2bQ2/3B4/1PP5/1K1RN3 w - - 0 1"]
[PlyCount "1"]

{'CorChess 3 300522-Tactical-MV4' >30s BEST, 'ShashChess 24-GoldDigger-MV4'
>30s, 'Blue Marlin 15.3-avx2-MV4' >30s} 1. Qf2 (1. Qc1 $144) (1. Qf1 $144) (1.
Qd2 $144) *[/pgn]

[pgn][Event "2"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Eret, 37."]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "1k6/bPN2pp1/Pp2p3/p1p5/2pn4/3P4/PPR5/1K6 w - - 0 1"]
[PlyCount "1"]

{'CorChess 3 300522-Tactical-MV4' 2.33s / 20, 'ShashChess 24-GoldDigger-MV4'
1.17s / 22 BEST, 'Blue Marlin 15.3-avx2-MV4' >30s} 1. Na8 *
[/pgn]

[pgn][Event "3"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "2, TacticalInsanity"]
[Black "Corbit, Dann"]
[Result "*"]
[SetUp "1"]
[FEN "1q6/5ppk/4b2p/3pP2N/1p1p1P1B/rPbP1Q1P/6PK/2R5 w - - 0 1"]
[PlyCount "1"]

{'CorChess 3 300522-Tactical-MV4' >30s, 'ShashChess 24-GoldDigger-MV4' 2.31s
/ 23 BEST, 'Blue Marlin 15.3-avx2-MV4' >30s} 1. Bf6 (1. Rf1 $144) *
[/pgn]

[pgn][Event "4"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "ACT, HTC."]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "1r1rb1k1/5ppp/4p3/1p1p3P/1q2P2Q/pN3P2/PPP4P/1K1R2R1 w - - 0 1"]
[PlyCount "1"]

{'CorChess 3 300522-Tactical-MV4' 2.72s / 17, 'ShashChess 24-GoldDigger-MV4'
3.27s / 23, 'Blue Marlin 15.3-avx2-MV4' 0.39s / 14 BEST} 1. Rxg7+ *
[/pgn]

[pgn][Event "5"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "2, HiarcsBook"]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "1r3r1k/2q1b1np/3p1pp1/pp2p1P1/5P1P/PB2B3/1PPQ1R2/1K1R4 w - - 0 1"]
[PlyCount "1"]

{'CorChess 3 300522-Tactical-MV4' 7.09s / 21 BEST, 'ShashChess
24-GoldDigger-MV4' 9.64s / 26, 'Blue Marlin 15.3-avx2-MV4' 12.25s / 25} 1. f5
(1. h5 $144) *
[/pgn]

I won't go on like this in the forum, it takes too much space here. Regards
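
For readers without a ChessBase GUI, a rough sketch of how the accepted solutions could be read back out of movetext like the above: the main-line move plus every one-move variation marked with `$144`, the NAG that carries the "equivalent" prefix in these games. It only handles the simple shape shown here (one ply, no nested variations, no `}` inside comments); the function name is hypothetical.

```python
import re


def solution_moves(movetext: str) -> list[str]:
    """Main-line first move plus all '(1. Move $144)' sidelines."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)   # drop {...} comments
    text = " ".join(text.split())                # rejoin wrapped lines
    equivalents = re.findall(r"\(\s*1\.\s*(\S+)\s+\$144\s*\)", text)
    main_text = re.sub(r"\([^)]*\)", " ", text)  # drop the variations
    main = re.search(r"1\.\s*(\S+)", main_text)
    return ([main.group(1)] if main else []) + equivalents


movetext = """{'CorChess 3 300522-Tactical-MV4' >30s BEST} 1. Qf2 (1. Qc1 $144) (1. Qf1 $144) (1.
Qd2 $144) *"""
print(solution_moves(movetext))  # ['Qf2', 'Qc1', 'Qf1', 'Qd2']
```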

Ajedrecista · Post by **Ajedrecista** » Fri Sep 02, 2022 8:03 pm

Hello Peter:

peter wrote: ↑Fri Sep 02, 2022 5:30 pm[...]

I won't go on like this in the forum, it takes too much space here. Regards

I am sure that you know this, but a friendly reminder: several games can be put in a single PGN tag, and each game can then be selected from a drop-down list placed above the chessboard. For example:

[pgn][Event "1"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "4"]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "1k1rr2b/2q2p2/p4p1P/1p3p2/PR2bQ2/3B4/1PP5/1K1RN3 w - - 0 1"]
[PlyCount "1"]

{'CorChess 3 300522-Tactical-MV4' >30s BEST, 'ShashChess 24-GoldDigger-MV4'
>30s, 'Blue Marlin 15.3-avx2-MV4' >30s} 1. Qf2 (1. Qc1 $144) (1. Qf1 $144) (1.
Qd2 $144) *

[Event "2"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Eret, 37."]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "1k6/bPN2pp1/Pp2p3/p1p5/2pn4/3P4/PPR5/1K6 w - - 0 1"]
[PlyCount "1"]

{'CorChess 3 300522-Tactical-MV4' 2.33s / 20, 'ShashChess 24-GoldDigger-MV4'
1.17s / 22 BEST, 'Blue Marlin 15.3-avx2-MV4' >30s} 1. Na8 *

[Event "3"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "2, TacticalInsanity"]
[Black "Corbit, Dann"]
[Result "*"]
[SetUp "1"]
[FEN "1q6/5ppk/4b2p/3pP2N/1p1p1P1B/rPbP1Q1P/6PK/2R5 w - - 0 1"]
[PlyCount "1"]

{'CorChess 3 300522-Tactical-MV4' >30s, 'ShashChess 24-GoldDigger-MV4' 2.31s
/ 23 BEST, 'Blue Marlin 15.3-avx2-MV4' >30s} 1. Bf6 (1. Rf1 $144) *

[Event "4"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "ACT, HTC."]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "1r1rb1k1/5ppp/4p3/1p1p3P/1q2P2Q/pN3P2/PPP4P/1K1R2R1 w - - 0 1"]
[PlyCount "1"]

{'CorChess 3 300522-Tactical-MV4' 2.72s / 17, 'ShashChess 24-GoldDigger-MV4'
3.27s / 23, 'Blue Marlin 15.3-avx2-MV4' 0.39s / 14 BEST} 1. Rxg7+ *

[Event "5"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "2, HiarcsBook"]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "1r3r1k/2q1b1np/3p1pp1/pp2p1P1/5P1P/PB2B3/1PPQ1R2/1K1R4 w - - 0 1"]
[PlyCount "1"]

{'CorChess 3 300522-Tactical-MV4' 7.09s / 21 BEST, 'ShashChess
24-GoldDigger-MV4' 9.64s / 26, 'Blue Marlin 15.3-avx2-MV4' 12.25s / 25} 1. f5
(1. h5 $144) *[/pgn]

In case of doubt, quoting this post will show how: just remove the intermediate PGN tags at the end and start of each game.
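
The same step can be sketched in a few lines for anyone who prepares such posts with a script: collect the contents of every `[pgn]...[/pgn]` pair and wrap them in one pair, with a blank line between games. This is only an illustration of the manual edit described above; the function name is made up.

```python
import re


def merge_pgn_tags(post: str) -> str:
    """Join all [pgn]...[/pgn] blocks of a post into a single block."""
    games = re.findall(r"\[pgn\](.*?)\[/pgn\]", post, flags=re.DOTALL)
    body = "\n\n".join(game.strip() for game in games)
    return f"[pgn]{body}[/pgn]"


post = (
    '[pgn][Event "1"]\n\n1. Qf2 *[/pgn]\n\n'
    '[pgn][Event "2"]\n\n1. Na8 *\n[/pgn]'
)
print(merge_pgn_tags(post))
```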

Regards from Spain.

Ajedrecista.

peter · Post by **peter** » Fri Sep 02, 2022 8:59 pm

Ajedrecista wrote: ↑Fri Sep 02, 2022 8:03 pm I am sure that you know this, but a friendly reminder: several games can be put in a single PGN tag, and each game can then be selected from a drop-down list placed above the chessboard.

Hello Jesus!

Don't be too sure about that, as a matter of fact, I didn't know it.

The problem remains that even in a single pgn tag without quoting (and even when showing the raw data by quoting), .pgn (not only in the reader here) isn't quite the same as .cbh, where I take the games from, because only in the Fritz (and cb) GUIs is the prefix = (in front of the move) shown as it should be. A = put after the move means the position is equal; the cb prefix = means the move is equivalent to the main-line move.
If you copy and paste the raw .pgn by quoting, the presentation will be correct in cb GUIs. Hiarcs CE interprets it correctly too, but by verbally commenting the move as equivalent, while e.g. Shredder simply cuts off the prefix, and the .pgn reader here shows it, but incorrectly behind the move instead of in front of it.

So, in order not to cause more confusion than necessary, I'd rather give a download link to a .cbv file (archive of .cbh) next time, but for that to be worthwhile, there should be some more positions (games) stored already.

Thanks for the tip anyhow, regards

criko · Post by **criko** » Sat Sep 03, 2022 9:03 am

Can someone recommend a good (or the "best") net for lc0 for test suites?
Please with a download link.
Thx in advance.
