A question about "win at chess (new)" test suite

chesskobra
Posts: 358
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Post by chesskobra »

I ran this test suite (obtained from the Arasan repository) with four engines (Stockfish 17, Arasan 24.2.2, Komodo 14.1 (bmi2), and Laser 1.8 beta) on two machines, with the following results.

300 positions, 5 seconds per position 
Stockfish 281, Arasan 287, Komodo 298, Laser 294 on Dell Latitude 3450 4 GB RAM.
Stockfish 286, Arasan 293, Komodo 298, Laser 298 on Dell XPS 8960.
On all other tests I ran, SF scored significantly better than these other engines on both machines. Why does it score low on WAC (new)? Are there multiple or incorrect solutions for some tests? Or is there some other reason?
Jouni
Posts: 3735
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Post by Jouni »

The WAC suite has been outdated for almost 30 years already. The positions are too easy, and a lot of alternative moves are missing. In position 2 there are over 10 winning moves :).
[d]8/7p/5k2/5p2/p1p2P2/Pr1pPK2/1P1R3P/8 b - - 0 1
Jouni
peter
Posts: 3449
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Post by peter »

Jouni wrote: Mon Sep 16, 2024 2:50 pm The WAC suite has been outdated for almost 30 years already. The positions are too easy, and a lot of alternative moves are missing. In position 2 there are over 10 winning moves :).
[d]8/7p/5k2/5p2/p1p2P2/Pr1pPK2/1P1R3P/8 b - - 0 1
Which doesn't mean that one of them isn't better than all the others. With MultiPV=2 (the GUI prints German piece letters, T = rook and D = queen):

Engine: Stockfish 170 (32768 MB)
by the Stockfish developers (see AUTHORS f
Found 510 WDL and 510 DTZ tablebase files (up to 6-man).
Available processors: 0-31
Using 30 threads
...
39 2:55 -199.43 1...Txb2 2.Txb2 c3 3.Tb6+ Ke7 4.Tb7+ Kd6 5.Tb6+ Kc7 6.Tb1 c2 7.Tc1 Kb6 8.Kf2 d2 9.Txc2 d1D 10.Tc8 Dd2+ 11.Kf3 Dd5+ 12.Kg3 Dd3 13.Tf8 Dxa3 14.Tf6+ Kc7 (4.144.577.885) 23678 TB:6.430.237
38 2:55 -7.58 1...Tb8 2.h4 h5 3.Tg2 Tb7 4.Tf2 Tg7 5.e4 fxe4+ 6.Kxe4 Te7+ 7.Kf3 Te1 8.Tg2 Tc1 9.f5 Th1 10.Tg6+ Kxf5 11.Tg5+ Ke6 12.Tg6+ Ke5 13.Ke3 Th3+ 14.Kd2 Th2+ (4.144.577.885) 23678 TB:6.430.237

Positions no. 223 and 230 have far too little uniqueness of solution. If you replace those with, e.g., Eret no. 1 and no. 3, you still have 300 positions that modern engines solve easily, usable for VSTC (very short time control) and not-too-strong hardware, e.g. single thread and 1 s/pos. Regards,
Peter.

Post by peter »

Edit:
For these positions I added alternative moves to the given solutions; they give the side to move about as good chances as the original ones:

1r1rb1k1/2p3pp/p2q1p2/3PpP1Q/Pp1bP2N/1B5R/1P4PP/2B4K w - - bm Qxh7+ Ng6; id "WAC.185";
8/8/8/1p5r/p1p1k1pN/P2pBpP1/1P1K1P2/8 b - - bm Rxh4 b4; id "WAC.229";
5r1k/1p4pp/3q4/3Pp1R1/8/8/PP4PP/4Q1K1 b - - bm Qc5+ Qb6+; id "WAC.248";
5r1k/3b2p1/p6p/1pRpR3/1P1P2q1/P4pP1/5QnP/1B4K1 w - - bm h3 Rc3; id "WAC.291";

Nos. 89 and 150 had better be replaced too, just like 223 and 230, e.g. with Eret no. 4 and no. 5 (in addition to no. 1 and no. 3).
To get more discrimination out of the high, closely spaced solution counts that stronger engines reach even at VSTC, one approach I find interesting (besides, e.g., MEA with different points per solution according to "difficulty") is simply to divide the number of solutions by the total number of seconds used for all correct solutions together, which the Shredder GUI reports as "SolTime". This gives solutions per second, which differ significantly more than the solution counts alone. Regards,
Peter.
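
The solutions-per-second idea above can be sketched in a few lines of Python. This is only an illustration: the engine names and the numbers are made up, and `solutions_per_second` is a hypothetical helper, not something the Shredder GUI provides. The only input assumed is the solution count and the total "SolTime" in seconds.

```python
def solutions_per_second(solved: int, sol_time_seconds: float) -> float:
    """Return solved positions divided by the total time spent on them."""
    if sol_time_seconds <= 0:
        raise ValueError("sol_time_seconds must be positive")
    return solved / sol_time_seconds


# Hypothetical results: (solved positions, total SolTime in seconds).
# These numbers are invented for illustration, not measured.
results = {
    "Engine A": (298, 60.0),
    "Engine B": (294, 120.0),
}

for name, (solved, sol_time) in results.items():
    rate = solutions_per_second(solved, sol_time)
    print(f"{name}: {solved} solved, {rate:.2f} solutions/s")
```

With these invented numbers the solution counts differ by about 1%, while the rates differ by roughly a factor of two, which is the extra discrimination described above.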

Post by chesskobra »

Thanks for these comments. Does Arena support epds with multiple best moves? Also, what are some other test suites with unique best moves and not just elementary puzzles? I have found github repositories with epd files, but it is unclear which ones have been updated more recently using stronger engines and longer time limits.

Post by peter »

chesskobra wrote: Tue Sep 17, 2024 11:15 am Does Arena support epds with multiple best moves?
Yes, as does any other GUI that reads .epd strings in their regular syntax. As for .pgn, alternative solutions imported from such .epd strings become uncommented variations instead, and GUIs that adjudicate .pgn (Fritz .cbh) suites automatically also treat them as equal solutions.
Also, what are some other test suites with unique best moves and not just elementary puzzles? I have found github repositories with epd files, but it is unclear which ones have been updated more recently using stronger engines and longer time limits.
The classics are still Eret and Arasan, even if both have to be used with short to very short hardware TC nowadays, and even, e.g., HTC and ACT need much less time per position now than some years ago when they were collected. So tools that get more discrimination out of high solution counts, like EloStatTS and MEA, help a lot with the error bars of results; the latter (MEA) also makes suites of positions with multiple solutions usable. Regards,
Peter.
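
For reference, here is a minimal sketch of how several best moves sit in one regular .epd string: the `bm` opcode simply takes a space-separated list of moves. `parse_epd` is a hypothetical helper written for this post, not code from Arena or any other GUI, and it assumes that no quoted operand contains a semicolon.

```python
def parse_epd(line: str) -> dict:
    """Split an EPD line into its four position fields and its operations."""
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])  # board, side to move, castling, en passant
    rest = fields[4] if len(fields) > 4 else ""
    ops = {}
    # Assumption: operations are separated by ';' and no operand contains ';'.
    for op in rest.split(";"):
        op = op.strip()
        if not op:
            continue
        opcode, _, operand = op.partition(" ")
        ops[opcode] = operand.strip().strip('"')
    return {"position": position, "ops": ops}


# WAC.185 with the alternative solution added earlier in this thread.
WAC_185 = ('1r1rb1k1/2p3pp/p2q1p2/3PpP1Q/Pp1bP2N/1B5R/1P4PP/2B4K w - - '
           'bm Qxh7+ Ng6; id "WAC.185";')

parsed = parse_epd(WAC_185)
best_moves = parsed["ops"]["bm"].split()
print(parsed["ops"]["id"], best_moves)
```

A test runner would then count the position as solved if the engine's move, converted to SAN, is any one of `best_moves`.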

Post by chesskobra »

peter wrote: Tue Sep 17, 2024 11:30 am The classics are still Eret and Arasan, even if both have to be used with short to very short hardware TC nowadays, and even, e.g., HTC and ACT need much less time per position now than some years ago when they were collected. So tools that get more discrimination out of high solution counts, like EloStatTS and MEA, help a lot with the error bars of results; the latter (MEA) also makes suites of positions with multiple solutions usable. Regards,
By MEA do you mean the program by Ferdy from https://github.com/fsmosca/Multiple-move-Epd-Analyzer? I tried using it once, but got the following error:

python3 mea.py -e /usr/local/games/sf17 -n "Stockfish 17" -i eret.epd -m 512 -a 10000 -t 2 -p uci 

Problem reading c0 field in epd: r1bqk1r1/1p1p1n2/p1n2pN1/2p1b2Q/2P1Pp2/1PN5/PB4PP/R4RK1 w q - - bm Rxf4; id "ERET 001 - Relief";
This position is not included.
  
But I would like a script like that, and plan to test it more.

Post by peter »

chesskobra wrote: Tue Sep 17, 2024 11:52 am By MEA do you mean the program by Ferdy from https://github.com/fsmosca/Multiple-move-Epd-Analyzer? I tried using it once, but got the following error:

python3 mea.py -e /usr/local/games/sf17 -n "Stockfish 17" -i eret.epd -m 512 -a 10000 -t 2 -p uci 

Problem reading c0 field in epd: r1bqk1r1/1p1p1n2/p1n2pN1/2p1b2Q/2P1Pp2/1PN5/PB4PP/R4RK1 w q - - bm Rxf4; id "ERET 001 - Relief";
This position is not included.
  
But I would like a script like that, and plan to test it more.
For MEA you need a special .epd syntax; your position should look like this, for example:

r1bqk1r1/1p1p1n2/p1n2pN1/2p1b2Q/2P1Pp2/1PN5/PB4PP/R4RK1 w q - bm Rxf4; c0 "Rf4=75"; id "ERET1";

I chose the reward of 75 points for the correct solution empirically, judging that it is "found" (in terms of hardware time) somewhat less easily than the positions of, e.g., STS (Strategic Test Suite), which came in MEA syntax with the download of the MEA tool at Ed's site back then. That is especially so because the STS positions normally have multiple solutions, so any one of them is more likely to be chosen than a single tactical best move. That is why it is a "strategic" suite rather than a "tactical" one, as most of the other commonly used suites are.
Here's an example from STS in MEA syntax, taken from Ed Schröder's earlier versions and later modified by Ferdinand Mosca:

1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - - bm f5; id "STS(v1.0) Undermine.001"; c0 "f5=100, Bf2=68, fxg5=46, b3=39, Bg7=32, Bg4=22, Kh1=11, Be3=8, Bxd5=6, h3=5"; c7 "f5 Bf2 fxg5 b3 Bg7 Bg4 Kh1 Be3 Bxd5 h3"; c8 "100 68 46 39 32 22 11 8 6 5"; c9 "f4f5 d4f2 f4g5 b2b3 d4g7 f3g4 g1h1 d4e3 f3d5 h2h3";

Note that the points here are meant for a much shorter hardware TC than I normally use for tactical single-best-move positions. MEA-STS (up to 1500 positions) is run at, e.g., 100 ms/pos. by Schröder and Mosca.
Peter.
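
To make the point scheme concrete, here is a rough Python sketch of how a `c0` string like the one above can be turned into a move-to-points table and used to score an engine's choice. `parse_c0` and `score_move` are hypothetical names, and this is not MEA's own code: it assumes the engine move is already in SAN and is compared by exact string match, whereas MEA may well match on the UCI moves in `c9` instead.

```python
def parse_c0(c0: str) -> dict:
    """Turn 'f5=100, Bf2=68, ...' into {'f5': 100, 'Bf2': 68, ...}."""
    points = {}
    for item in c0.split(","):
        move, _, value = item.strip().partition("=")
        if move and value:
            points[move] = int(value)
    return points


def score_move(points: dict, engine_move: str) -> int:
    """Return the reward for engine_move, or 0 if it is not listed."""
    return points.get(engine_move, 0)


# c0 field of the STS "Undermine.001" position quoted above.
C0 = "f5=100, Bf2=68, fxg5=46, b3=39, Bg7=32, Bg4=22, Kh1=11, Be3=8, Bxd5=6, h3=5"

points = parse_c0(C0)
for move in ("f5", "Bf2", "a3"):  # "a3" is not in the list
    print(move, score_move(points, move))
```

Summing these rewards over all positions of a suite gives a total that separates engines more finely than a plain solved/unsolved count.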

Post by peter »

peter wrote: Tue Sep 17, 2024 12:43 pm For MEA you need a special .epd syntax; your position should look like this, for example:

r1bqk1r1/1p1p1n2/p1n2pN1/2p1b2Q/2P1Pp2/1PN5/PB4PP/R4RK1 w q - bm Rxf4; c0 "Rf4=75"; id "ERET1";
Pity I didn't notice within the edit window that I had left out the x in "Rxf4=75" for the reward points. At bm it's OK, but Rf4=75 as mistyped doesn't work correctly: you won't get any error message, the points for the correct solution just won't be counted for this one move.
Peter.
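
Since a mismatch like this gives no error message, a small pre-check of the .epd file can catch it before a run. The sketch below is an assumption-laden illustration, not part of MEA: `moves_without_points` is a hypothetical helper that lists every `bm` move with no identically spelled entry in `c0`.

```python
def moves_without_points(bm: str, c0: str) -> list:
    """Return the bm moves that have no identically spelled entry in c0."""
    rewarded = {item.strip().partition("=")[0] for item in c0.split(",")}
    return [move for move in bm.split() if move not in rewarded]


# The mistyped line from above: bm says Rxf4, c0 says Rf4.
print(moves_without_points("Rxf4", "Rf4=75"))   # lists Rxf4 as unrewarded
# The corrected c0 field.
print(moves_without_points("Rxf4", "Rxf4=75"))  # nothing left over
```

Running such a check over every line of a suite would flag positions whose correct solution could never score.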

Post by chesskobra »

peter wrote: Tue Sep 17, 2024 12:43 pm
1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - - bm f5; id "STS(v1.0) Undermine.001"; c0 "f5=100, Bf2=68, fxg5=46, b3=39, Bg7=32, Bg4=22, Kh1=11, Be3=8, Bxd5=6, h3=5"; c7 "f5 Bf2 fxg5 b3 Bg7 Bg4 Kh1 Be3 Bxd5 h3"; c8 "100 68 46 39 32 22 11 8 6 5"; c9 "f4f5 d4f2 f4g5 b2b3 d4g7 f3g4 g1h1 d4e3 f3d5 h2h3";

Note that the points here are meant for a much shorter hardware TC than I normally use for tactical single-best-move positions. MEA-STS (up to 1500 positions) is run at, e.g., 100 ms/pos. by Schröder and Mosca.
Thank you for explaining. How are the numbers corresponding to different moves obtained? Is it by normalizing the evaluation of the top move to 100 and adjusting the evaluations of other moves in proportion?