re-validate EPD test suite bm's

tissatussa · Post by **tissatussa** » Sun Nov 05, 2023 4:44 am

somewhere i found a file called "IQ.epd" with 182 positions .. the third line got my attention :

2q1r2k/5R1p/pp1B2pN/2p1P3/1n1b4/3P2Q1/1P4K1/8 w - - bm Qh4; id "5.IQ.931";

i let several strong engines evaluate this position for a "long" time, with and without NNUE or MultiPV .. those modern engines prefer bm Qf3, not Qh4 ..

it's a difficult tricky position.
should we adjust a list like IQ.epd by re-validating all their bm's, "judged" by older engines ?
or is this position just an exception ?
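
For anyone who wants to try such a re-validation, here is a minimal sketch of the bookkeeping side only. It assumes the usual EPD layout (four position fields, then `opcode operands;` pairs) and that you have already collected each engine's preferred move in SAN by some other means. The names `parse_epd`, `disagreements` and the `engine_choice` mapping are made up for illustration; nothing here runs an engine.

```python
import re

# One EPD operation: an opcode, optional operands (possibly quoted), then ';'.
OP_RE = re.compile(r'\s*([A-Za-z_]\w*)\s*((?:"[^"]*"|[^;"])*);')

def parse_epd(line):
    """Split one EPD record into its position string and a dict of operations."""
    fields = line.strip().split(None, 4)
    if len(fields) < 4:
        raise ValueError("an EPD record needs four position fields")
    position = " ".join(fields[:4])
    rest = fields[4] if len(fields) > 4 else ""
    ops = {}
    for m in OP_RE.finditer(rest):
        # Drop surrounding quotes so that id "5.IQ.931" becomes 5.IQ.931.
        ops[m.group(1)] = m.group(2).strip().strip('"')
    return position, ops

def disagreements(epd_lines, engine_choice):
    """Yield (id, listed bm moves, engine move) where the engine's move is not a listed bm.

    engine_choice is a hypothetical dict: position id -> engine's preferred SAN move.
    """
    for line in epd_lines:
        if not line.strip():
            continue
        _, ops = parse_epd(line)
        pos_id = ops.get("id")
        listed = ops.get("bm", "").split()
        move = engine_choice.get(pos_id)
        if move is not None and listed and move not in listed:
            yield pos_id, listed, move

# The record quoted above, with the move the modern engines preferred.
record = '2q1r2k/5R1p/pp1B2pN/2p1P3/1n1b4/3P2Q1/1P4K1/8 w - - bm Qh4; id "5.IQ.931";'
flagged = list(disagreements([record], {"5.IQ.931": "Qf3"}))
```

A flagged record is only a candidate for review: the comparison is a plain string match on SAN, so it says nothing about whether the listed bm or the engine is right.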

Vinvin · Post by **Vinvin** » Sun Nov 05, 2023 4:07 pm

Qh4 seems not especially good :

6 best moves with Stockfish_23103006_x64_avx2 :

 38/43	03:55	 4.046.369k	17.149k	 0,00	1.Qf4 Nd5 2.Qf3 Qe6 3.Rf8+ Kg7 4.Rxe8 Qxe8 5.Qxd5 Kxh6 6.e6 Bf6 7.b3 Kg7 8.Qb7+ Kh8 9.e7 Kg7 10.Qe4 Bxe7 11.Qxe7+ Qxe7 12.Bxe7 Kf7 13.Bd8 b5 14.Bb6 Ke6 15.Bxc5 a5 16.Kf3 Kd5 17.Be3 h5 18.Kf4 a4 19.bxa4 bxa4 20.Bc1 Kd4 21.Bb2+ Kxd3
 38/43	03:55	 4.046.369k	17.149k	 0,00	1.Rf3 Nd5 2.Qh4 Kg7 3.Rf7+ Kh8
 38/45	03:55	 4.046.369k	17.149k	 0,00	1.Qh4 Nd5 2.Nf5 Nf4+ 3.Qxf4 gxf5 4.Qh6 Qc6+ 5.Kg3 Rg8+ 6.Kf4 Be3+ 7.Kxe3 Rg3+ 8.Kd2 Qg2+ 9.Kc3 Rxd3+ 10.Kxd3 Qe4+ 11.Kd2 Qg2+
 38/59	03:55	 4.046.369k	17.149k	+0,12	1.Be7 Qb7+ 2.Kh2 Qxe7 3.Rxe7 Rxe7 4.e6 Rxe6 5.Qb8+ Kg7 6.Qg8+ Kxh6 7.Qxe6 Bxb2 8.Qe3+ Kg7 9.Qe7+ Kg8 10.Qd8+ Kg7 11.Qxb6 Bd4 12.Qa7+ Kf6 13.Kg3 h5 14.Kf3 Nd5 15.Qa8 Ne7 16.Qf8+ Ke6 17.Ke4 Nf5 18.Qa8 Kf7 19.Qb7+ Kf6 20.Qxa6+ Kg7 21.Kf4 Kf7 22.Qb7+ Kf6 23.Qc6+ Kf7 24.Qc7+ Kf6 25.Ke4 Ne3 26.Qc6+ Kg7 27.Qd7+ Kf6 28.Kf3 Nf5 29.Qd8+ Kf7 30.Qc7+ Ke6
 38/59	03:55	 4.046.369k	17.149k	+0,41	1.Qg5 Nd5 2.Nf5 Qxf5 3.Rxf5 Ne3+ 4.Kh3 Nxf5 5.b3 a5 6.Qf6+ Kg8 7.Bc7 Ng7 8.Qxb6 Bxe5 9.Qxc5 Bxc7 10.Qxc7 Re3+ 11.Kg2 Nf5 12.Kf2 Kf8 13.Qc5+ Kg7 14.Qa7+ Kf6 15.Qb6+ Kf7 16.d4 Re6 17.Qxa5 Rd6 18.b4 Nxd4 19.Qe5 Nf5 20.b5 Rd7 21.Qf4 Ke7 22.Qe4+ Kf7 23.Qf3 Ke7 24.Qh1 Kf6 25.Qh3 Ke6 26.Qb3+ Kf6 27.b6 Nd6 28.Qb2+ Ke6 29.Qa2+ Kf6 30.Qa6
 38/62	03:55	 4.046.369k	17.149k	+4,75	1.Qf3 Qc6 2.Ng4 Re6 3.Qxc6 Nxc6 4.Nf6 Rxf6 5.exf6 Bxf6 6.Rxf6 Kg7 7.Rf1 h5 8.Kf3 c4 9.Ke4 cxd3 10.Kxd3 Kh6 11.Rc1 Nd8 12.Bc7 Ne6 13.Rc6 Nxc7 14.Rxc7 Kg5 15.Ke3 h4 16.Rb7 h3 17.Kf2 h2

tissatussa · Post by **tissatussa** » Sun Nov 05, 2023 5:26 pm

this is my SF 16 NNUE data :

Dann Corbit · Post by **Dann Corbit** » Tue Nov 07, 2023 11:01 pm

Until a position has been driven to checkmate and all other moves have been proven draws or losses, it is always possible that there are better or equal moves. And test sets that were once challenging like WAC become trivial over time.

There is a great deal of effort spent in this forum to refine test sets. Vincent Lejeune and Jon Dart's test sets are constantly refined.

A test set I worked on years ago (STS) has been greatly improved recently through the efforts of many. When I first started working on STS in 2008, I was using 32-bit engines like Rybka (and two others for verification) at one hour per position. Today's best engines would get vastly superior analysis after one second on modern hardware. The hardware and the software are each roughly 32000 times stronger, so today's analysis is about a billion times better.

Fifteen years from today, the current hardware and software will be one billion times better than today's as well, making our current analysis moot, other than things that are literally proven (like Chest319 output without fancy pruning in play).

ImNotStockfish · Post by **ImNotStockfish** » Wed Nov 08, 2023 10:43 am

The problem that I've seen most often with these issues is just that they wanted to write "am" and instead used "bm".
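
To make the difference concrete: in EPD, `bm` lists the best move(s) and `am` lists move(s) to avoid, so the same engine answer can fail one record and pass the other. The sketch below is only an illustration; `solves` and the `ops` dict are hypothetical names, and whether IQ.epd actually meant `am` for this position is not established here.

```python
def solves(ops, engine_move):
    """Return True if engine_move satisfies the record's bm/am operations.

    ops is a hypothetical dict of EPD operations, e.g. {"bm": "Qh4"}.
    """
    best = ops.get("bm", "").split()
    avoid = ops.get("am", "").split()
    if best and engine_move not in best:
        return False  # a bm list exists and the move is not in it
    if avoid and engine_move in avoid:
        return False  # the move is one the record says to avoid
    return True

# The same engine answer, scored against the record as published and
# against the record as it would read if "am" had been intended.
as_published = solves({"bm": "Qh4"}, "Qf3")
if_am_meant = solves({"am": "Qh4"}, "Qf3")
```

With the record as published, an engine answering Qf3 is scored as a failure; with `am Qh4` it would be scored as a success.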

tissatussa · Post by **tissatussa** » Wed Nov 08, 2023 12:06 pm

@ImNotStockfish hard to believe .. please be serious, or i'll report you.

tissatussa · Post by **tissatussa** » Wed Nov 08, 2023 12:56 pm

@Dann Corbit : thanks for your reaction, this gives relevant background info !

Eelco de Groot · Post by **Eelco de Groot** » Wed Nov 08, 2023 5:11 pm

Qh4 looks like the kind of threat against Black's King that a poor player like me might make; the solution move Qf3, not so much. So I liked the explanation (from Sam Davis), especially if this test set is meant for humans. Maybe the problem was copied from some chess book for humans and later misinterpreted for the test set (making it a bm), but I agree that this does seem very careless.

Qh4 is clearly not good, says Crystal, in its best four moves:

[d]2q1r2k/5R1p/pp1B2pN/2p1P3/1n1b4/3P2Q1/1P4K1/8 w - -
Engine: Crystal 6 PMT (512 MB)
by the Stockfish developers (see AUTHORS f

51 693:09 +4.44 1.Df3 Dc6 2.Pg4 Dxf3+ 3.Kxf3 Pd5
4.Ke4 h5 5.Kxd5 hxg4 6.e6 Kg8 7.Tf4 Kg7
8.Txg4 Lxb2 9.e7 Lf6 10.Ke6 g5
11.Tg1 Kg6 12.Kd7 Lxe7 13.Lxe7 Th8
14.Txg5+ (34.061.997.091) 819

50 693:09 +0.24 1.Dg5 Pd5 2.Pf5 Dxf5 3.Txf5 Pe3+
4.Kh3 Pxf5 5.b3 Kg7 6.Df6+ Kg8 7.Lc7 Pg7
8.Dc6 Te6 9.Dd5 Kh8 10.b4 a5 11.bxa5 bxa5
12.Ld6 h6 13.Lxc5 Lxe5 14.d4 (34.061.997.091) 819

50 693:09 0.00 1.Le7 Db7+ 2.Kh2 Dxe7 3.Txe7 Txe7
4.e6 Txe6 5.Db8+ Kg7 6.Dg8+ Kxh6
7.Dxe6 Lxb2 8.Dxb6 Ld4 9.Dd8 Pxd3
10.Dh4+ Kg7 11.De7+ Kg8 12.Dd8+ Kg7 (34.061.997.091) 819

50 693:09 0.00 1.Dh4 Pd5 2.Pf5 Pf4+ 3.Dxf4 gxf5
4.Dh6 Dc6+ 5.Kg3 Tg8+ 6.Kf4 Le3+
7.Kxe3 Tg3+ 8.Kd2 Dg2+ 9.Kc3 Txd3+
10.Kxd3 De4+ 11.Kc3 Db4+ 12.Kc2 De4+ (34.061.997.091) 819

tissatussa · Post by **tissatussa** » Mon Nov 13, 2023 1:47 pm

@Eelco de Groot who is Sam Davis ?

Eelco de Groot · Post by **Eelco de Groot** » Mon Nov 13, 2023 2:04 pm

Hello, seriously, I do not really know who he is; I just know he is not Stockfish. Look at the 'Full name' field at right.
