EN-Test 2022 - new testsuite


DrEinstein
Posts: 75
Joined: Wed Sep 15, 2021 8:50 pm
Full name: Albert Einstein

Re: EN-Test 2022 - new testsuite

Post by DrEinstein »

Shouldn't these tests be run with only one core and a cleared hash to get a deterministic result?
gordonr
Posts: 232
Joined: Thu Aug 06, 2009 8:04 pm
Location: UK

Re: EN-Test 2022 - new testsuite

Post by gordonr »

DrEinstein wrote: Sun Nov 20, 2022 8:42 pm Shouldn't these tests be run with only one core and a cleared hash to get a deterministic result?
I think the engine process needs to be restarted to clear all data and not just the hash tables. However, testing with one core is probably not useful unless that is what is typically used for analysis. If we want to see what is better for analysis with many cores, we need to test with that number of cores.
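
A minimal sketch of the restart-per-position idea, assuming a UCI engine. The helper below only builds the command script a test harness would send to a freshly started engine process; it does not launch anything itself. The function name and the defaults are illustrative, and `Threads` and `Hash` are conventional option names that most, but not all, engines expose.

```python
def uci_script(fen: str, threads: int = 1, hash_mb: int = 4096,
               movetime_ms: int = 60_000) -> list[str]:
    """Commands for ONE position, intended for a freshly started engine process.

    A real harness has to wait for "uciok" after "uci" and for "readyok"
    after "isready" before sending the next command.
    """
    return [
        "uci",
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mb}",
        "ucinewgame",
        "isready",
        f"position fen {fen}",
        f"go movetime {movetime_ms}",
    ]


# Example with the second position discussed in this thread.
for line in uci_script("4K1k1/8/1p5p/1Pp3b1/8/1P3P2/P1B2P2/8 w - - 0 1"):
    print(line)
```

Even with one thread and a fresh process, a time-based limit can still give different results from run to run, because the depth reached depends on machine load. A fixed `go depth` or `go nodes` limit is usually more reproducible.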

I used a test set to see how analysis differed with various numbers of threads, hash sizes, hyperthreading on and off, etc., on some new hardware. The results often vary wildly for the same setup, so a large enough sample is needed to gain some accuracy. I also observed, as others have pointed out in the past, that it isn't always reliable to stop as soon as the key move is found. Sometimes many extra plies, or a meaningful enough evaluation, are required to make sure the move is indeed being chosen for the right reason and doesn't get dropped.
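
The "don't stop as soon as the key move is found" point could be sketched like this. It is only one possible rule, not how any particular test tool works: the key move counts as solved only if it is still the best move at the last iteration and has been held for a chosen number of extra plies. The function name, the `extra_plies` parameter and the sample search log are made up for illustration.

```python
def solved_at(iterations, key_move, extra_plies=3):
    """Return the depth at which key_move became the stable choice, or None.

    iterations: list of (depth, best_move) pairs in increasing depth order.
    The move counts as solved only if it is best at the final iteration and
    has been held, without interruption, for at least `extra_plies` plies.
    """
    run_start = None
    for depth, move in iterations:
        if move == key_move:
            if run_start is None:
                run_start = depth   # start of a new uninterrupted run
        else:
            run_start = None        # move was dropped, forget the old run
    if run_start is None:
        return None
    last_depth = iterations[-1][0]
    return run_start if last_depth - run_start >= extra_plies else None


# Hypothetical search log: f4 appears at depth 10, is dropped, then comes back.
log = [(10, "f4"), (11, "Bd1"), (12, "f4"), (13, "f4"), (14, "f4"), (15, "f4")]
print(solved_at(log, "f4"))                 # 12
print(solved_at(log, "f4", extra_plies=4))  # None
```

An evaluation threshold could be added in the same way, by carrying the score in each tuple and requiring it to stay above a chosen value.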
Jouni
Posts: 3652
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: EN-Test 2022 - new testsuite

Post by Jouni »

I don't know of any clone that is stronger than SF dev. AFAIK there has never been one! If a clone is better in test positions, it plays weaker because of lower depth. Example: Crystal:

Score of SFdev vs Crystal 5: 17 - 4 - 179 [0.532]
... SFdev playing White: 12 - 1 - 87 [0.555] 100
... SFdev playing Black: 5 - 3 - 92 [0.510] 100
... White vs Black: 15 - 6 - 179 [0.522] 200
Elo difference: 22.6 +/- 15.3, LOS: 99.8 %, DrawRatio: 89.5 %
200 of 200 games finished.
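
For anyone who wants to check the numbers: the Elo difference and draw ratio above follow from the win/loss/draw counts via the usual logistic Elo formula. A small sketch (the function name is mine, and the error margin and LOS are not reproduced here):

```python
import math


def match_stats(wins, losses, draws):
    """Score fraction, Elo difference and draw ratio from a match result."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Logistic Elo model; undefined for a score of exactly 0 or 1.
    elo = -400 * math.log10(1 / score - 1)
    return score, elo, draws / games


# SFdev vs Crystal 5: 17 - 4 - 179
score, elo, draw_ratio = match_stats(17, 4, 179)
print(f"score={score:.4f} elo={elo:+.1f} draws={draw_ratio:.1%}")
# score=0.5325 elo=+22.6 draws=89.5%
```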
Jouni
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

There are other reasons for me to build my own engine. I'm a book expert. My engine can use multiple Polyglot books. My engine can also learn. The advantage of a learning file is this: On a slow computer, the engine will play better if it can use a learning file that has been trained with 64 cores! I have two slower PCs. Also some other players on the server play very successfully with the use of the learning file. Even old computers with only 2 cores are difficult to defeat. This is even more fun than playing with super fast computers. :wink:
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

I am currently working on a new test set. Unfortunately, there are always positions where it seems that a special move wins very nicely, but the engine proves the opposite. What should I do with these two positions?

Position 1:

[fen]7k/8/7p/2p1p1pP/1pPpPpP1/1P1P1P2/4K3/2N5 w - - 0 1[/fen]


I found this position in a test suite. The solution move is Kd2! But then I analyzed further, and an engine now tells me that Kf1! also wins.


Position 2:

[fen]4K1k1/8/1p5p/1Pp3b1/8/1P3P2/P1B2P2/8 w - - 0 1[/fen]


In this position, the beautiful move f4 wins! Until recently, only special engines could find this move. But on a fast computer, new normal engines find this move in less than 60 seconds! I then analyzed a bit, and an engine said that Bd1 wins here too. After about 6 minutes the eval increased to +9 for Bd1! So both f4 and Bd1 win.

As beautiful as the positions are, when there are two solutions, especially in the endgame, I can't expect an engine to look for a specific move. :cry:
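
One option, if such a position is kept anyway, is to accept every known winning move. The EPD format allows several moves after the `bm` opcode. Below is a minimal sketch, with made-up helper names and position ids, that turns a FEN into an EPD record and checks an engine's move against the accepted list. It assumes the engine move has already been converted to the same SAN notation used in the record.

```python
def fen_to_epd(fen: str, best_moves: list[str], pos_id: str) -> str:
    """Build an EPD record that accepts any of the given best moves."""
    # EPD keeps only the first four FEN fields:
    # piece placement, side to move, castling rights, en passant square.
    fields = " ".join(fen.split()[:4])
    moves = " ".join(best_moves)
    return f'{fields} bm {moves}; id "{pos_id}";'


def is_solved(epd: str, engine_move: str) -> bool:
    """True if engine_move is one of the moves listed after 'bm'."""
    ops = epd.split(" bm ", 1)[1]
    accepted = ops.split(";", 1)[0].split()
    return engine_move in accepted


epd = fen_to_epd("4K1k1/8/1p5p/1Pp3b1/8/1P3P2/P1B2P2/8 w - - 0 1",
                 ["f4", "Bd1"], "position 2")
print(epd)
# 4K1k1/8/1p5p/1Pp3b1/8/1P3P2/P1B2P2/8 w - - bm f4 Bd1; id "position 2";
print(is_solved(epd, "Bd1"))  # True
print(is_solved(epd, "Kd7"))  # False
```

This only helps when the winning moves are few. If nearly every move wins, the position tells you little about the engine, whichever moves are listed.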

Eduard
Spliffjiffer
Posts: 436
Joined: Thu Aug 02, 2012 7:48 pm
Location: Germany

Re: EN-Test 2022 - new testsuite

Post by Spliffjiffer »

As for the 1st position: I looked at it for 2 seconds (without an engine), and I'm pretty sure there isn't a single move that does NOT win!?

As for the 2nd position, I totally agree.
Truths are illusions of which we have forgotten that they are illusions.
Jouni
Posts: 3652
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: EN-Test 2022 - new testsuite

Post by Jouni »

Just one note: Crystal's result of 113 with an average time of 4 s is much better than Leptir's 115 with an average time of 5 s. Just calculate the total time used.
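
The total depends on what "average time" means, which the result lists do not say. A quick sketch of both readings, assuming the 120 positions and the 60 s limit from the test description: either the average covers all positions, or it covers only the solved ones and every unsolved position costs the full 60 s.

```python
LIMIT_S = 60      # thinking time per position in EN-Test 2022
POSITIONS = 120   # number of positions in the suite


def total_time_all(avg_s):
    """Reading 1: the average is taken over all positions."""
    return POSITIONS * avg_s


def total_time_solved_only(solved, avg_s):
    """Reading 2: the average covers solved positions only;
    each unsolved position uses the full time limit."""
    return solved * avg_s + (POSITIONS - solved) * LIMIT_S


for name, solved, avg in [("Crystal", 113, 4), ("Leptir", 115, 5)]:
    print(name, total_time_all(avg), total_time_solved_only(solved, avg))
# Crystal 480 872
# Leptir 600 875
```

Under the first reading Crystal is clearly faster overall (480 s vs 600 s). Under the second the two totals are almost identical (872 s vs 875 s).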
Jouni
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

Info:
I am working on a new test suite that I will present in January 2023. I am collecting good test positions from previous test sets, but I have also included some positions that have never been published before! Currently there are 46 positions altogether. I have a large database with many interesting computer games. From it I have selected the best games, about 2500. So far I have checked only 560 of them. It's fun to look for new positions.
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

EN-Test 2022 - Starts on 06 Nov 2022:
I have now started testing with a thinking time of 60 seconds. Only the top ten engines are included in this list, and only one engine per author. Computer: AMD Ryzen 3900X, 20 threads, 4 GB hash, all 3-4-5-6-men Syzygy tablebases.

New in the list: Brainlearn 20.1 vulkan (download on my homepage under Solista News, or here in the Shashchess thread).

Top 10:

1) Leptir 4, Result: 115 out of 120 = 95.8%. Leptir 4.txt (ZIP)
2) Crystal 5 KWK, Result: 113 out of 120 = 94.2%. Crystal 5 KWK.txt (ZIP)
3-4) Brainlearn 20.1 vulkan (AM/EN), Result: 110 out of 120 = 91.7%. Brainlearn 20.1 vulkan.txt (ZIP)
3-4) Corchess 3 201122, Result: 110 out of 120 = 91.7%. Corchess 3 201122.txt (ZIP)
5) Blue Marlin 15.4, Result: 107 out of 120 = 89.2%. Blue Marlin 15.4.txt (ZIP)
6-8) Dark Sister 1.9a, Result: 106 out of 120 = 88.3%. Dark Sister 1.9a.txt (ZIP)
6-8) Kayra 1.7, Result: 106 out of 120 = 88.3%. Kayra 1.7.txt (ZIP)
6-8) ProteusSF-Piranha 220904, Result: 106 out of 120 = 88.3%. ProteusSF-Piranha 220904.txt (ZIP)
9-10) Stockfish dev 231122, Result: 105 out of 120 = 87.5%. Stockfish dev 231122.txt (ZIP)
9-10) Shashchess 25.3 GoldDi, Result: 105 out of 120 = 87.5%. ShashChess 25.3GoldDigger.txt (ZIP)
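
For reference, the shared places ("3-4", "6-8", "9-10") and the percentages can be reproduced from the raw solved counts. This is only a sketch of the bookkeeping, not the tool actually used for the list. Percentages are rounded to one decimal.

```python
TOTAL = 120

results = [
    ("Leptir 4", 115),
    ("Crystal 5 KWK", 113),
    ("Brainlearn 20.1 vulkan (AM/EN)", 110),
    ("Corchess 3 201122", 110),
    ("Blue Marlin 15.4", 107),
    ("Dark Sister 1.9a", 106),
    ("Kayra 1.7", 106),
    ("ProteusSF-Piranha 220904", 106),
    ("Stockfish dev 231122", 105),
    ("Shashchess 25.3 GoldDi", 105),
]


def rank_labels(results):
    """Return (label, name, solved) rows; tied engines share a range label."""
    ordered = sorted(results, key=lambda r: -r[1])  # stable: ties keep input order
    rows = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to the last engine with the same solved count.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        label = str(i + 1) if i == j else f"{i + 1}-{j + 1}"
        rows.extend((label, name, solved) for name, solved in ordered[i:j + 1])
        i = j + 1
    return rows


for label, name, solved in rank_labels(results):
    print(f"{label}) {name}, Result: {solved} out of {TOTAL} = {solved / TOTAL:.1%}")
```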

Text files on my homepage:
https://solistachess.jimdosite.com/testing/
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

Vulkan 021222: The engine is based on Brainlearn vulkan, but with a more aggressive playing style. The other changes are tuned for blitz of at least 5 minutes, and work even better at longer time controls. Those who love bullet time controls can forget this engine. In my EN-Test 2022, Vulkan 021222 is impressive: it solves 114 of 120 positions at 60 s and is now in second place!

Download (64-Bit Windows):

Filehorst.de:
https://filehorst.de/d/eqlbtmgI
Pixeldrain:
https://pixeldrain.com/u/GBgYWMg8

And on my homepage:
https://solistachess.jimdosite.com/solista-news/

New in the list: Vulkan 021222:

Starts on 06 Nov 2022
I have now started testing with a thinking time of 60 seconds. Only the top ten engines are included in this list, and only one engine per author. Computer: AMD Ryzen 3900X, 20 threads, 4 GB hash, all 3-4-5-6-men Syzygy tablebases. Current TOP 10 LIST:

1) Leptir 4, Result: 115 out of 120 = 95.8%. Leptir 4.txt (ZIP)
2) Vulkan 021222 (EN/AM), Result: 114 out of 120 = 95.0%. Vulkan 021222.txt (ZIP)
3) Crystal 5 KWK, Result: 113 out of 120 = 94.2%. Crystal 5 KWK.txt (ZIP)
4) Corchess 3 201122, Result: 110 out of 120 = 91.7%. Corchess 3 201122.txt (ZIP)
5) Blue Marlin 15.4, Result: 107 out of 120 = 89.2%. Blue Marlin 15.4.txt (ZIP)
6-8) Dark Sister 1.9a, Result: 106 out of 120 = 88.3%. Dark Sister 1.9a.txt (ZIP)
6-8) Kayra 1.7, Result: 106 out of 120 = 88.3%. Kayra 1.7.txt (ZIP)
6-8) ProteusSF-Piranha 220904, Result: 106 out of 120 = 88.3%. ProteusSF-Piranha 220904.txt (ZIP)
9-10) Stockfish dev 231122, Result: 105 out of 120 = 87.5%. Stockfish dev 231122.txt (ZIP)
9-10) Shashchess 25.3 GoldDi, Result: 105 out of 120 = 87.5%. ShashChess 25.3GoldDigger.txt (ZIP)

Text files on my homepage
https://solistachess.jimdosite.com/testing/