A question about "win at chess (new)" test suite

chesskobra · Post by **chesskobra** » Fri Sep 13, 2024 12:02 pm

I ran this test suite (obtained from the Arasan repository) with four engines: Stockfish 17, Arasan 24.2.2, Komodo 14.1 (bmi2) and Laser 1.8 beta, on two machines, with the following results.

300 positions, 5 seconds per position 
Stockfish 281, Arasan 287, Komodo 298, Laser 294 on Dell Latitude 3450 4 GB RAM.
Stockfish 286, Arasan 293, Komodo 298, Laser 298 on Dell XPS 8960.

On all other tests I ran, SF scored significantly better than these other engines on both machines. Why does it score low on WAC (new)? Are there multiple or incorrect solutions for some tests? Or is there some other reason?

Jouni · Post by **Jouni** » Mon Sep 16, 2024 2:50 pm

The WAC suite has been outdated for almost 30 years already. The positions are too easy, and a lot of alternative moves are missing. In position 2 there are over 10 winning moves.
[d]8/7p/5k2/5p2/p1p2P2/Pr1pPK2/1P1R3P/8 b - - 0 1

peter · Post by **peter** » Mon Sep 16, 2024 6:11 pm

Jouni wrote: ↑Mon Sep 16, 2024 2:50 pm The WAC suite has been outdated for almost 30 years already. The positions are too easy, and a lot of alternative moves are missing. In position 2 there are over 10 winning moves.
[d]8/7p/5k2/5p2/p1p2P2/Pr1pPK2/1P1R3P/8 b - - 0 1

That doesn't mean that one of them isn't better than all the others. With MultiPV=2:

Engine: Stockfish 170 (32768 MB)
von the Stockfish developers (see AUTHORS f
Found 510 WDL and 510 DTZ tablebase files (up to 6-man).
Available processors: 0-31
Using 30 threads
...
39 2:55 -199.43 1...Txb2 2.Txb2 c3 3.Tb6+ Ke7 4.Tb7+ Kd6 5.Tb6+ Kc7 6.Tb1 c2 7.Tc1 Kb6 8.Kf2 d2 9.Txc2 d1D 10.Tc8 Dd2+ 11.Kf3 Dd5+ 12.Kg3 Dd3 13.Tf8 Dxa3 14.Tf6+ Kc7 (4.144.577.885) 23678 TB:6.430.237
38 2:55 -7.58 1...Tb8 2.h4 h5 3.Tg2 Tb7 4.Tf2 Tg7 5.e4 fxe4+ 6.Kxe4 Te7+ 7.Kf3 Te1 8.Tg2 Tc1 9.f5 Th1 10.Tg6+ Kxf5 11.Tg5+ Ke6 12.Tg6+ Ke5 13.Ke3 Th3+ 14.Kd2 Th2+ (4.144.577.885) 23678 TB:6.430.237

Positions no. 223 and 230 have far too little uniqueness of solution. If you replace them with e.g. Eret no. 1 and no. 3, you still have 300 positions that modern engines solve easily, usable at VSTC on not too strong hardware, e.g. single thread and 1 s/pos. Regards

peter · Post by **peter** » Tue Sep 17, 2024 11:00 am

Edit:
For these positions I added to the given solutions further moves that give the side to move about as good chances as the original ones:

1r1rb1k1/2p3pp/p2q1p2/3PpP1Q/Pp1bP2N/1B5R/1P4PP/2B4K w - - bm Qxh7+ Ng6; id "WAC.185";
8/8/8/1p5r/p1p1k1pN/P2pBpP1/1P1K1P2/8 b - - bm Rxh4 b4; id "WAC.229";
5r1k/1p4pp/3q4/3Pp1R1/8/8/PP4PP/4Q1K1 b - - bm Qc5+ Qb6+; id "WAC.248";
5r1k/3b2p1/p6p/1pRpR3/1P1P2q1/P4pP1/5QnP/1B4K1 w - - bm h3 Rc3; id "WAC.291";

Positions no. 89 and 150 had better be replaced as well, just like 223 and 230, e.g. with Eret no. 4 and no. 5 (in addition to no. 1 and no. 3).
To get more discrimination out of the closely spaced, high solution counts that stronger engines reach even at VSTC, one approach I find interesting (besides e.g. MEA, which awards different numbers of points per solution according to its "difficulty") is simply to divide the number of solutions by the total number of seconds used for all correct solutions together, which the Shredder GUI reports as "SolTime". That gives solutions per second, which differ significantly more than the solution counts alone do. Regards
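
A minimal sketch of that solutions-per-second figure, in Python. The engine names and numbers below are hypothetical placeholders, not measurements from this thread; `sol_time_s` is assumed to be the total time spent on the correctly solved positions, as "SolTime" is described above.

```python
from dataclasses import dataclass


@dataclass
class SuiteResult:
    engine: str
    solved: int        # number of positions solved
    sol_time_s: float  # total seconds used for all correct solutions ("SolTime")


def solutions_per_second(result: SuiteResult) -> float:
    """Solved positions divided by the time spent solving them."""
    if result.sol_time_s <= 0:
        raise ValueError("SolTime must be positive")
    return result.solved / result.sol_time_s


# Hypothetical numbers, for illustration only.
results = [
    SuiteResult("Engine A", solved=298, sol_time_s=40.0),
    SuiteResult("Engine B", solved=294, sol_time_s=80.0),
]

for r in sorted(results, key=solutions_per_second, reverse=True):
    print(f"{r.engine}: {r.solved} solved, {solutions_per_second(r):.2f} solutions/s")
```

With these made-up inputs the solution counts differ by about 1%, while the solutions-per-second values differ by roughly a factor of two.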

chesskobra · Post by **chesskobra** » Tue Sep 17, 2024 11:15 am

Thanks for these comments. Does Arena support epds with multiple best moves? Also, what are some other test suites with unique best moves and not just elementary puzzles? I have found github repositories with epd files, but it is unclear which ones have been updated more recently using stronger engines and longer time limits.

peter · Post by **peter** » Tue Sep 17, 2024 11:30 am

chesskobra wrote: ↑Tue Sep 17, 2024 11:15 am Does Arena support epds with multiple best moves?

Yes, as does any other GUI that reads .epd strings in their regular syntax. For .pgn, alternative solutions imported from such .epd strings become uncommented variations instead, and GUIs that adjudicate .pgn (Fritz .cbh) suites automatically also treat them as equal solutions.
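
For illustration, here is a rough sketch of how a script could read several `bm` moves from one regular .epd line and accept any of them as a solution. It is not how Arena or any other GUI is implemented; the helper names are made up, and the engine's move is assumed to arrive as a SAN string spelled exactly as in the .epd (including `+`).

```python
def split_operations(text: str) -> list[str]:
    """Split on ';' but ignore semicolons inside double-quoted operands."""
    ops, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            ops.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    ops.append("".join(current).strip())
    return [op for op in ops if op]


def parse_epd(line: str) -> tuple[str, dict[str, str]]:
    """Return the 4-field position and a dict mapping opcode -> operand."""
    fields = line.strip().split(None, 4)
    if len(fields) < 4:
        raise ValueError("an EPD line needs four position fields")
    position = " ".join(fields[:4])
    ops = {}
    for op in split_operations(fields[4] if len(fields) > 4 else ""):
        opcode, _, operand = op.partition(" ")
        ops[opcode] = operand.strip().strip('"')
    return position, ops


def best_moves(ops: dict[str, str]) -> list[str]:
    """All moves listed under 'bm', in order."""
    return ops.get("bm", "").split()


def is_solved(ops: dict[str, str], engine_move_san: str) -> bool:
    """A position counts as solved if the engine plays any listed bm move."""
    return engine_move_san in best_moves(ops)


line = ('1r1rb1k1/2p3pp/p2q1p2/3PpP1Q/Pp1bP2N/1B5R/1P4PP/2B4K w - - '
        'bm Qxh7+ Ng6; id "WAC.185";')
position, ops = parse_epd(line)
print(best_moves(ops))         # ['Qxh7+', 'Ng6']
print(is_solved(ops, "Ng6"))   # True
```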

Also, what are some other test suites with unique best moves and not just elementary puzzles? I have found github repositories with epd files, but it is unclear which ones have been updated more recently using stronger engines and longer time limits.

The classics are still Eret and Arasan, even though both now have to be run at short to very short time controls on current hardware, and even suites such as HTC and ACT need much less time per position now than they did some years ago when they were collected. So tools that get more discrimination out of high solution counts, like EloStatTS and MEA, help a lot with the error bars of the results. The latter (MEA) also makes suites of positions with multiple solutions usable. Regards

chesskobra · Post by **chesskobra** » Tue Sep 17, 2024 11:52 am

peter wrote: ↑Tue Sep 17, 2024 11:30 am The classics are still Eret and Arasan, even though both now have to be run at short to very short time controls on current hardware, and even suites such as HTC and ACT need much less time per position now than they did some years ago when they were collected. So tools that get more discrimination out of high solution counts, like EloStatTS and MEA, help a lot with the error bars of the results. The latter (MEA) also makes suites of positions with multiple solutions usable. Regards

By MEA do you mean the program by Ferdy from https://github.com/fsmosca/Multiple-move-Epd-Analyzer? I tried using it once, but got the following error:

python3 mea.py -e /usr/local/games/sf17 -n "Stockfish 17" -i eret.epd -m 512 -a 10000 -t 2 -p uci 

Problem reading c0 field in epd: r1bqk1r1/1p1p1n2/p1n2pN1/2p1b2Q/2P1Pp2/1PN5/PB4PP/R4RK1 w q - - bm Rxf4; id "ERET 001 - Relief";
This position is not included.

But I would like a script like that, and plan to test it more.

peter · Post by **peter** » Tue Sep 17, 2024 12:43 pm

chesskobra wrote: ↑Tue Sep 17, 2024 11:52 am By MEA do you mean the program by Ferdy from https://github.com/fsmosca/Multiple-move-Epd-Analyzer? I tried using it once, but got the following error:
python3 mea.py -e /usr/local/games/sf17 -n "Stockfish 17" -i eret.epd -m 512 -a 10000 -t 2 -p uci 

Problem reading c0 field in epd: r1bqk1r1/1p1p1n2/p1n2pN1/2p1b2Q/2P1Pp2/1PN5/PB4PP/R4RK1 w q - - bm Rxf4; id "ERET 001 - Relief";
This position is not included.
  
But I would like a script like that, and plan to test it more.

MEA needs a special .epd syntax; your position should look like this, for example:

r1bqk1r1/1p1p1n2/p1n2pN1/2p1b2Q/2P1Pp2/1PN5/PB4PP/R4RK1 w q - bm Rxf4; c0 "Rf4=75"; id "ERET1";

I chose the reward of 75 points for the correct solution empirically, judging the position to be somewhat harder to "find" (in terms of hardware time) than the positions of e.g. STS (Strategic Test Suite), which came in MEA syntax with the download of the MEA tool at Ed's site way back then. That is especially so because the STS positions normally have multiple solutions, so any single one of them is more likely to be chosen than a single best tactical move would be. That's why it is a "strategic" suite and not a "tactical" one, as most of the other commonly used suites are.
Here's an example from STS in MEA syntax, taken from earlier versions by Ed Schröder and later modified by Ferdinand Mosca:

1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - - bm f5; id "STS(v1.0) Undermine.001"; c0 "f5=100, Bf2=68, fxg5=46, b3=39, Bg7=32, Bg4=22, Kh1=11, Be3=8, Bxd5=6, h3=5"; c7 "f5 Bf2 fxg5 b3 Bg7 Bg4 Kh1 Be3 Bxd5 h3"; c8 "100 68 46 39 32 22 11 8 6 5"; c9 "f4f5 d4f2 f4g5 b2b3 d4g7 f3g4 g1h1 d4e3 f3d5 h2h3";

Note that the points here are meant for a much shorter hardware TC than I normally use for tactical single-best-move positions. MEA STS (up to 1500 positions) is run at e.g. 100 ms/pos. by Schröder and Mosca.
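
As a rough illustration of how such a `c0` operand can be turned into points, here is a small Python sketch. It is not taken from MEA's source; `parse_c0` and `score_move` are made-up helper names, and the engine's move is assumed to be compared as an exact SAN string.

```python
def parse_c0(c0: str) -> dict[str, int]:
    """Turn 'f5=100, Bf2=68, ...' into {'f5': 100, 'Bf2': 68, ...}."""
    points = {}
    for item in c0.split(","):
        move, sep, value = item.strip().partition("=")
        if not sep:
            raise ValueError(f"expected move=points, got {item!r}")
        points[move.strip()] = int(value)
    return points


def score_move(points: dict[str, int], engine_move_san: str) -> int:
    """Points for the engine's move; moves not listed in c0 earn nothing."""
    return points.get(engine_move_san, 0)


c0 = "f5=100, Bf2=68, fxg5=46, b3=39, Bg7=32, Bg4=22, Kh1=11, Be3=8, Bxd5=6, h3=5"
points = parse_c0(c0)
print(score_move(points, "f5"))   # 100
print(score_move(points, "Bf2"))  # 68
print(score_move(points, "Qe1"))  # 0 (not listed)
```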

peter · Post by **peter** » Tue Sep 17, 2024 1:46 pm

peter wrote: ↑Tue Sep 17, 2024 12:43 pm MEA needs a special .epd syntax; your position should look like this, for example:

r1bqk1r1/1p1p1n2/p1n2pN1/2p1b2Q/2P1Pp2/1PN5/PB4PP/R4RK1 w q - bm Rxf4; c0 "Rf4=75"; id "ERET1";

A pity I didn't notice, while editing was still possible, that I had left out the x in "Rxf4=75" in the reward points. At bm it's OK, but Rf4=75 as mistyped doesn't work correctly: you won't get any error message, but the points for the correct solution simply won't be counted for this move.
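
Since a mismatch like this produces no error message, a small check over the .epd file before running it can catch it. The following Python sketch is a hypothetical helper, not part of MEA; it lists every `bm` move that has no identically spelled entry in `c0`.

```python
import re


def missing_rewards(epd_line: str) -> list[str]:
    """Return the bm moves that have no matching key in the c0 operand."""
    bm = re.search(r'\bbm\s+([^;]+);', epd_line)
    c0 = re.search(r'\bc0\s+"([^"]*)"', epd_line)
    bm_moves = bm.group(1).split() if bm else []
    c0_moves = set()
    if c0:
        c0_moves = {item.split("=")[0].strip() for item in c0.group(1).split(",")}
    return [move for move in bm_moves if move not in c0_moves]


mistyped = ('r1bqk1r1/1p1p1n2/p1n2pN1/2p1b2Q/2P1Pp2/1PN5/PB4PP/R4RK1 w q - '
            'bm Rxf4; c0 "Rf4=75"; id "ERET1";')
corrected = mistyped.replace('"Rf4=75"', '"Rxf4=75"')

print(missing_rewards(mistyped))   # ['Rxf4']
print(missing_rewards(corrected))  # []
```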

chesskobra · Post by **chesskobra** » Wed Sep 18, 2024 12:19 am

peter wrote: ↑Tue Sep 17, 2024 12:43 pm
1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - - bm f5; id "STS(v1.0) Undermine.001"; c0 "f5=100, Bf2=68, fxg5=46, b3=39, Bg7=32, Bg4=22, Kh1=11, Be3=8, Bxd5=6, h3=5"; c7 "f5 Bf2 fxg5 b3 Bg7 Bg4 Kh1 Be3 Bxd5 h3"; c8 "100 68 46 39 32 22 11 8 6 5"; c9 "f4f5 d4f2 f4g5 b2b3 d4g7 f3g4 g1h1 d4e3 f3d5 h2h3";

Note that the points here are meant for a much shorter hardware TC than I normally use for tactical single-best-move positions. MEA STS (up to 1500 positions) is run at e.g. 100 ms/pos. by Schröder and Mosca.

Thank you for explaining. How are the numbers corresponding to different moves obtained? Is it by normalizing the evaluation of the top move to 100 and adjusting the evaluations of other moves in proportion?
