EN-Test 2022 - new testsuite

Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

EN-Test 2022 - new testsuite

Post by Eduard »

I created a new test suite for engines. Neither the ERET test nor the Stockfish 2021 test suite satisfied me.

The Stockfish 2021 test suite contains many nonsensical positions. What is the use of a position where the best move scores +10 and the second-best move scores +7 (tested with Stockfish)? It doesn't really matter whether the engine wins with +10 or only with +7. The ERET test contains positions that are irrelevant in practice, as well as positions with a secondary solution.

Examples:


[fen]8/7p/5P1k/1p5P/5p2/2p1p3/P1P1P1P1/1K3Nb1 w - - 0 1[/fen]


This position is even solved by some engines (including mine), but I still think it's useless in practice.


[fen]1k6/bPN2pp1/Pp2p3/p1p5/2pn4/3P4/PPR5/1K6 w - - 0 1[/fen]


This position is also pointless.


[fen]2b1r3/r2ppN2/8/1p1p1k2/pP1P4/2P3R1/PP3PP1/2K5 w - - 0 2[/fen]


Or this.

Some positions have good secondary solutions:


[fen]4r1k1/1r1np3/1pqp1ppB/p7/2b1P1PQ/2P2P2/P3B2R/3R2K1 w - - 0 28[/fen]


Here Bg5 is just as good as Bg7 (ERET).
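The two objections above (a runner-up move that also wins easily, and a runner-up that is just as good) can be expressed as a simple filter. This is only a sketch: the function name and both thresholds are assumptions chosen for illustration, not values used to build EN-Test 2022.

```python
def is_useful_test_position(best_cp, second_cp, win_cp=300, min_gap_cp=100):
    """Return True when the best move clearly stands out from the runner-up.

    Scores are centipawns from the side to move's point of view.
    win_cp and min_gap_cp are hypothetical thresholds, not from the thread.
    """
    # Both moves win comfortably (e.g. +10 vs +7): the choice hardly matters.
    if best_cp >= win_cp and second_cp >= win_cp:
        return False
    # The runner-up is almost as good: a secondary solution exists.
    if best_cp - second_cp < min_gap_cp:
        return False
    return True


print(is_useful_test_position(1000, 700))  # False: +10 vs +7
print(is_useful_test_position(150, 140))   # False: secondary solution
print(is_useful_test_position(250, 0))     # True: one clearly best move
```

The scores would come from a two-line (MultiPV) analysis of each candidate position; mate scores are not handled here.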

I wanted a test in which every position can be solved and reflects normal practice. So I collected the best positions from various test suites and added some interesting positions of my own that I have seen in games on the server. The result is a test suite with 120 positions.

Every one of these positions was solved on my PC by at least one engine! The only question is: how much time do I give the engine? The test is intended to provide a rough estimate of playing strength. That's why I won't test an engine with special settings like "Gold Drigger", not even in MV mode. It is about a rough assessment of practical playing strength. I myself will test with 30s and 60s per position.

Download EN-Test 2022 (CBH and PGN format)

https://filehorst.de/d/eefonGnl

and on my home page.

I myself use the CBH format; if you prefer EPD, you have to convert the PGN to EPD.
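For anyone who wants EPD, a minimal sketch of the conversion follows. It assumes each position is stored in the PGN as a `[FEN "..."]` header tag; the function name and the `id_prefix` parameter are made up for this example. It copies only the position, so the solution moves would still have to be added as `bm` operations by hand or with a chess library.

```python
import re

# Matches a PGN tag pair such as: [FEN "8/8/... w - - 0 1"]
FEN_TAG = re.compile(r'^\[FEN\s+"([^"]+)"\]\s*$', re.MULTILINE)


def pgn_to_epd_lines(pgn_text, id_prefix="pos"):
    """Turn every FEN header found in a PGN text into one EPD line."""
    lines = []
    for number, match in enumerate(FEN_TAG.finditer(pgn_text), start=1):
        # EPD keeps only the first four FEN fields: piece placement,
        # side to move, castling rights, en passant square.
        fields = match.group(1).split()[:4]
        lines.append(f'{" ".join(fields)} id "{id_prefix}.{number:03d}";')
    return lines


sample = '''[Event "?"]
[SetUp "1"]
[FEN "8/7p/5P1k/1p5P/5p2/2p1p3/P1P1P1P1/1K3Nb1 w - - 0 1"]

*
'''
for line in pgn_to_epd_lines(sample):
    print(line)
```

Reading the downloaded PGN file into `pgn_text` and writing the returned lines to a `.epd` file is all that is left to do.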

Eduard Nemeth
CornfedForever
Posts: 641
Joined: Mon Jun 20, 2022 4:08 am
Full name: Brian D. Smith

Re: EN-Test 2022 - new testsuite

Post by CornfedForever »

Eduard wrote: Wed Oct 19, 2022 5:05 pm The test is intended to provide a rough estimate of the playing strength. That's why I won't test an engine with special settings like "Gold Drigger", not even in MV mode. It is about a rough assessment of the practical playing strength. I myself will test with 30s and 60s per position.
Just as someone who does not fiddle with such things, I wonder why, if looking for "a rough assessment of the practical playing strength" you are doing this with these 'oddities'?

I mean, "Playing strength" should, it would seem, be about the play of a game - play, development of an edge/exploiting the exploitable, etc - not solving such things...or am I missing something?
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

No, I feel the same way. However, that requires playing a lot of games. I do that too! When I test, it is only on the server, with a few hundred games in nonstop auto play.

Still, it doesn't hurt to run the engine on these 120 positions for a quick test and compare the results with other engines before going to the server to play. :wink:
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

Results AMD Ryzen 3900X, 20 Threads, 4 GB hash, all 3456men Syzygy, 30s:

Stockfish 161022, Result: 97 out of 120 = 80.8%.
Stockfish.txt (ZIP):
https://filehorst.de/d/efBggtqx
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

AMD Ryzen 3900X, 20 Threads, 4 GB hash, all 3456men Syzygy, 30s:
Solista Attack v2 (default), Result: 107 out of 120 = 89.2%.

Solista Attack v2.txt (ZIP):
https://filehorst.de/d/ehpqyEGC
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

AMD Ryzen 3900X, 20 Threads, 4 GB hash, all 3456men Syzygy, 30s:
Shashchess 25 (default), Result: 98 out of 120 = 81.7%.

Shashchess 25.txt (ZIP):
https://filehorst.de/d/eJbztnhJ
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

AMD Ryzen 3900X, 20 Threads, 4 GB hash, all 3456men Syzygy, 30s:
Blue Marlin 15.3a, Result: 105 out of 120 = 87.5%.

BlueMarlin 15.3a.txt (ZIP):
https://filehorst.de/d/efqJsvIt
Spliffjiffer
Posts: 420
Joined: Thu Aug 02, 2012 7:48 pm
Location: Germany

Re: EN-Test 2022 - new testsuite

Post by Spliffjiffer »

thx Eduard for the suite!
some questions from my side:
Eduard wrote: Wed Oct 19, 2022 5:05 pm I created a new test suite for Engines. Neither the ERET test nor the Stockfish 2021 test suite satisfied me.

Test suite Stockfish-2021 contains many nonsensical positions. What should be useful in a position where the best move is +10 and the second best move is +7 (tested with Stockfish)? It doesn't really matter whether the engine wins with +10 or only with +7. There are positions in the ERET test that are irrelevant in practice. The test also contains positions with a secondary solution.

Examples:



This position is even solved by some engines (including mine), but I still think it's useless in practice.


[fen]1k6/bPN2pp1/Pp2p3/p1p5/2pn4/3P4/PPR5/1K6 w - - 0 1[/fen]
Don't you like it because 1.Rf2 (instead of 1.Na8) is completely winning as well?

Eduard wrote: Wed Oct 19, 2022 5:05 pm
This position is also pointless.

[fen]2b1r3/r2ppN2/8/1p1p1k2/pP1P4/2P3R1/PP3PP1/2K5 w - - 0 2[/fen]
Don't you like it because you believe that, say, 2.Nh6 is drawing as well?
Truths are illusions of which we have forgotten that they are illusions.
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

AMD Ryzen 3900X, 20 Threads, 4 GB hash, all 3456men Syzygy, 30s:
Dark Sister 1.9a, Result: 99 out of 120 = 82.5%.

Dark Sister 1.9a.txt (ZIP):
https://filehorst.de/d/eoepnbwp
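
The 30s results posted so far in this thread can be put side by side with a few lines of Python. This is just a convenience sketch; the solved counts are copied from the posts above, the percentages are rounded to one decimal place, and the function name is made up.

```python
TOTAL = 120  # number of positions in EN-Test 2022

# Solved counts as posted in this thread (30s per position).
solved = {
    "Stockfish 161022": 97,
    "Solista Attack v2": 107,
    "Shashchess 25": 98,
    "Blue Marlin 15.3a": 105,
    "Dark Sister 1.9a": 99,
}


def ranking(results, total=TOTAL):
    """Return (engine, solved, percent) rows, best result first."""
    rows = [(name, n, round(100 * n / total, 1)) for name, n in results.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, n, pct in ranking(solved):
    print(f"{name:<20} {n:>3}/{TOTAL}  {pct:.1f}%")
```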
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

Alternative download EN-Test 2022 Testsuite (60 Days)
https://pixeldrain.com/u/cEPxDG84