I created a new test suite for engines. Neither the ERET test nor the Stockfish 2021 test suite satisfied me.
The Stockfish-2021 test suite contains many nonsensical positions. What is the point of a position where the best move scores +10 and the second-best move +7 (tested with Stockfish)? It doesn't really matter whether the engine wins with +10 or only with +7. The ERET test contains positions that are irrelevant in practice, as well as positions with a secondary solution.
Examples:
[fen]8/7p/5P1k/1p5P/5p2/2p1p3/P1P1P1P1/1K3Nb1 w - - 0 1[/fen]
This position is even solved by some engines (including mine), but I still think it's useless in practice.
[fen]1k6/bPN2pp1/Pp2p3/p1p5/2pn4/3P4/PPR5/1K6 w - - 0 1[/fen]
This position is also pointless.
[fen]2b1r3/r2ppN2/8/1p1p1k2/pP1P4/2P3R1/PP3PP1/2K5 w - - 0 2[/fen]
Or this.
Some positions have good secondary solutions:
[fen]4r1k1/1r1np3/1pqp1ppB/p7/2b1P1PQ/2P2P2/P3B2R/3R2K1 w - - 0 28[/fen]
Here Bg5 is just as good as Bg7 (ERET).
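The selection criterion argued above (a position only tests something if the key move reaches a better practical outcome than every alternative) can be sketched as a filter over MultiPV output. This is a minimal sketch, not how EN-Test 2022 was actually assembled: the function names, the scores in pawns from the side to move's view, and the 2.0-pawn "won" threshold are all assumptions.

```python
from typing import Sequence, Tuple

# Assumed threshold (in pawns): at or above this, the position counts as won.
WIN_THRESHOLD = 2.0


def outcome(score: float) -> int:
    """Classify a side-to-move score as win (1), roughly level (0) or loss (-1)."""
    if score >= WIN_THRESHOLD:
        return 1
    if score <= -WIN_THRESHOLD:
        return -1
    return 0


def is_useful_test_position(multipv: Sequence[Tuple[str, float]]) -> bool:
    """True if only the best move reaches the best outcome class.

    `multipv` holds (move, score) pairs, e.g. from an engine's MultiPV lines.
    """
    if len(multipv) < 2:
        return False  # a single legal move is no test at all
    ranked = sorted(multipv, key=lambda pair: pair[1], reverse=True)
    best, second = ranked[0][1], ranked[1][1]
    # +10 vs +7 are both "won", so such a position is rejected;
    # two equally good moves (a secondary solution) are rejected too.
    return outcome(best) > outcome(second)
```

With this rule, a +10 / +7 position and a position where two moves score the same are both filtered out, while a position where only one move wins (or only one move holds the draw) is kept.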
I wanted a test in which every position is solvable and relevant to normal practical play. So I collected the best positions from various test suites and added some interesting positions of my own that I had seen in games on the server. The result is a test suite of 120 positions.
Every one of these positions was solved on my PC by at least one engine! The only question is how much time to give the engine. The test is intended to provide a rough estimate of practical playing strength. That's why I won't test an engine with special settings like "Gold Drigger", not even in MV mode. I myself will test with 30s and 60s per position.
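Because every position gets the same fixed think time, a run is easy to drive over the UCI protocol. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and it only builds the standard UCI command strings (`ucinewgame`, `isready`, `position fen`, `go movetime`) and estimates the run length; actually talking to an engine process and comparing its `bestmove` reply with the solution is left out.

```python
from typing import List


def uci_commands_for_position(fen: str, seconds_per_position: int) -> List[str]:
    """UCI commands giving an engine a fixed think time on one test position.

    After sending "isready", a driver should wait for the engine's "readyok"
    before sending the remaining commands.
    """
    return [
        "ucinewgame",                                  # next position is unrelated
        "isready",                                     # sync point; wait for "readyok"
        f"position fen {fen}",                         # load the test position
        f"go movetime {seconds_per_position * 1000}",  # movetime is in milliseconds
    ]


def total_run_minutes(num_positions: int, seconds_per_position: int) -> float:
    """Minimum wall-clock time for one pass over the suite (engine overhead ignored)."""
    return num_positions * seconds_per_position / 60


if __name__ == "__main__":
    for secs in (30, 60):
        print(f"{secs}s per position: {total_run_minutes(120, secs):.0f} minutes")
```

For 120 positions this comes to 60 minutes at 30s and 120 minutes at 60s per position, before any engine start-up overhead.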
Download EN-Test 2022 (CBH and PGN format)
https://filehorst.de/d/eefonGnl
and on my home page.
I use the CBH format myself; if you prefer EPD, you will have to convert the PGN file to EPD.
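For readers who want EPD, the conversion mostly amounts to keeping the first four FEN fields and attaching the solution as a `bm` operation. Below is a minimal stdlib-only sketch, assuming each game in the PGN stores its test position in a `FEN` tag and its solution as the first main-line move; that layout, using the `Event` tag as the EPD `id`, and the `pgn_to_epd` name are assumptions rather than anything the download is documented to follow. It cannot read CBH files, and a chess library such as python-chess will handle unusual PGN more robustly.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]$')
RESULT_TOKENS = {"1-0", "0-1", "1/2-1/2", "*"}


@dataclass
class Game:
    tags: Dict[str, str] = field(default_factory=dict)
    moves: List[str] = field(default_factory=list)


def first_san_move(movetext: str) -> Optional[str]:
    """First main-line move in SAN, skipping comments, variations, numbers and NAGs."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)        # drop {comments}
    while re.search(r"\([^()]*\)", text):              # drop (variations), innermost first
        text = re.sub(r"\([^()]*\)", " ", text)
    for token in text.split():
        token = re.sub(r"^\d*\.+", "", token)         # strip "1." / "1..." / "..."
        if not token or token.startswith("$") or token in RESULT_TOKENS:
            continue
        return token.rstrip("!?")                      # drop annotation glyphs like "!"
    return None


def pgn_to_epd(pgn_text: str) -> List[str]:
    """One EPD line per game that has a FEN tag and at least one move."""
    games: List[Game] = [Game()]
    for raw in pgn_text.splitlines():
        line = raw.strip()
        match = TAG_RE.match(line)
        if match:
            if games[-1].moves:                        # tag after movetext: new game
                games.append(Game())
            games[-1].tags[match.group(1)] = match.group(2)
        elif line and not line.startswith("%"):        # "%" lines are PGN escapes
            games[-1].moves.append(line)

    epd_lines = []
    for game in games:
        fen = game.tags.get("FEN")
        best = first_san_move(" ".join(game.moves))
        if not fen or not best:
            continue                                   # not in the assumed layout
        board = " ".join(fen.split()[:4])              # EPD omits the two move counters
        ops = f"bm {best};"
        event = game.tags.get("Event")
        if event:
            ops += f' id "{event}";'
        epd_lines.append(f"{board} {ops}")
    return epd_lines
```

Writing the result is then just `"\n".join(pgn_to_epd(text))` into a `.epd` file.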
Eduard Nemeth
EN-Test 2022 - new testsuite
-
- Posts: 1439
- Joined: Sat Oct 27, 2018 12:58 am
- Location: Germany
- Full name: N.N.
-
- Posts: 646
- Joined: Mon Jun 20, 2022 4:08 am
- Full name: Brian D. Smith
Re: EN-Test 2022 - new testsuite
Eduard wrote (Wed Oct 19, 2022 5:05 pm):
The test is intended to provide a rough estimate of the playing strength. That's why I won't test an engine with special settings like "Gold Drigger", not even in MV mode. It is about a rough assessment of the practical playing strength. I myself will test with 30s and 60s per position.
Just as someone who does not fiddle with such things, I wonder why, if you are looking for "a rough assessment of the practical playing strength", you are doing this with these 'oddities'?
I mean, "playing strength" should, it would seem, be about playing a game (developing an edge, exploiting the exploitable, etc.), not about solving such things... or am I missing something?
Re: EN-Test 2022 - new testsuite
No, I feel the same way. You have to play a lot of games, and I do too! When I test, it is only on the server, with a few hundred games in nonstop autoplay.
Still, it doesn't hurt to run the engine on these 120 positions for a quick test and compare the results with other engines before going to the server to play.

Re: EN-Test 2022 - new testsuite
Results (AMD Ryzen 3900X, 20 threads, 4 GB hash, all 3- to 6-man Syzygy tablebases, 30s per position):
Stockfish 161022, Result: 97 out of 120 = 80.8%.
Stockfish.txt (ZIP):
https://filehorst.de/d/efBggtqx
Re: EN-Test 2022 - new testsuite
AMD Ryzen 3900X, 20 threads, 4 GB hash, all 3- to 6-man Syzygy tablebases, 30s per position:
Solista Attack v2 (default), Result: 107 out of 120 = 89.1%.
Solista Attack v2.txt (ZIP):
https://filehorst.de/d/ehpqyEGC
Re: EN-Test 2022 - new testsuite
AMD Ryzen 3900X, 20 threads, 4 GB hash, all 3- to 6-man Syzygy tablebases, 30s per position:
Shashchess 25 (default), Result: 98 out of 120 = 81.6%.
Shashchess 25.txt (ZIP):
https://filehorst.de/d/eJbztnhJ
Re: EN-Test 2022 - new testsuite
AMD Ryzen 3900X, 20 threads, 4 GB hash, all 3- to 6-man Syzygy tablebases, 30s per position:
Blue Marlin 15.3a, Result: 105 out of 120 = 87.5%.
BlueMarlin 15.3a.txt (ZIP):
https://filehorst.de/d/efqJsvIt
-
- Posts: 436
- Joined: Thu Aug 02, 2012 7:48 pm
- Location: Germany
Re: EN-Test 2022 - new testsuite
Thanks, Eduard, for the suite!
Some questions from my side:
Eduard wrote (Wed Oct 19, 2022 5:05 pm):
I created a new test suite for Engines. Neither the ERET test nor the Stockfish 2021 test suite satisfied me.
Test suite Stockfish-2021 contains many nonsensical positions. What should be useful in a position where the best move is +10 and the second best move is +7 (tested with Stockfish)? It doesn't really matter whether the engine wins with +10 or only with +7. There are positions in the ERET test that are irrelevant in practice. The test also contains positions with a secondary solution.
Examples:
[fen]8/7p/5P1k/1p5P/5p2/2p1p3/P1P1P1P1/1K3Nb1 w - - 0 1[/fen]
This position is even solved by some engines (including mine), but I still think it's useless in practice.
[fen]1k6/bPN2pp1/Pp2p3/p1p5/2pn4/3P4/PPR5/1K6 w - - 0 1[/fen]
Don't you like it because 1.Rf2 (instead of 1.Na8) is completely winning as well?
Don't you like it because you believe that, say, 2.Nh6 is drawing as well?
Truths are illusions, but we have forgotten that they are illusions.
Re: EN-Test 2022 - new testsuite
AMD Ryzen 3900X, 20 threads, 4 GB hash, all 3- to 6-man Syzygy tablebases, 30s per position:
Dark Sister 1.9a, Result: 99 out of 120 = 82.5%.
Dark Sister 1.9a.txt (ZIP):
https://filehorst.de/d/eoepnbwp
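The 30s results posted in this thread can be tabulated in a few lines. This sketch assumes the posted percentages were truncated rather than rounded to one decimal place, which reproduces all five figures (for example 107/120 = 89.17%, posted as 89.1%); the helper names are illustrative only.

```python
from typing import Dict, List, Tuple

TOTAL_POSITIONS = 120

# 30s-per-position results as posted in this thread (solved positions out of 120).
RESULTS_30S: Dict[str, int] = {
    "Stockfish 161022": 97,
    "Solista Attack v2": 107,
    "Shashchess 25": 98,
    "Blue Marlin 15.3a": 105,
    "Dark Sister 1.9a": 99,
}


def score_percent(solved: int, total: int = TOTAL_POSITIONS) -> float:
    """Percentage solved, truncated (not rounded) to one decimal place."""
    return (solved * 1000 // total) / 10


def ranking(results: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """(engine, solved, percent) rows, best score first."""
    rows = [(name, solved, score_percent(solved)) for name, solved in results.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, solved, pct in ranking(RESULTS_30S):
        print(f"{name:<18} {solved:>3}/{TOTAL_POSITIONS} = {pct:.1f}%")
```

Sorted by solved positions, the order is Solista Attack v2 (107), Blue Marlin 15.3a (105), Dark Sister 1.9a (99), Shashchess 25 (98) and Stockfish 161022 (97).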
Re: EN-Test 2022 - new testsuite
Alternative download of the EN-Test 2022 test suite (available for 60 days):
https://pixeldrain.com/u/cEPxDG84