re-validate EPD test suite bm's

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

tissatussa
Posts: 33
Joined: Sat Sep 24, 2016 4:13 am
Location: Netherlands
Full name: Roelof Berkepeis

re-validate EPD test suite bm's

Post by tissatussa »

somewhere i found a file called "IQ.epd" with 182 positions .. the third line got my attention :

2q1r2k/5R1p/pp1B2pN/2p1P3/1n1b4/3P2Q1/1P4K1/8 w - - bm Qh4; id "5.IQ.931";

i let several strong engines evaluate this position for a "long" time, with and without NNUE or MultiPV .. those modern engines prefer bm Qf3, not Qh4 ..

it's a difficult, tricky position.
should we adjust a list like IQ.epd by re-validating all its bm's, which were "judged" by older engines ?
or is this position just an exception ?
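
A re-validation pass like the one asked about could be sketched roughly as below. This is only a minimal sketch: `parse_epd` and `find_disagreements` are made-up helper names, the `engine_choice` table is a hypothetical stand-in for results collected from your own engine runs, and splitting on `;` assumes no quoted operand contains a semicolon.

```python
def parse_epd(line):
    """Split one EPD record into (position, operations)."""
    fields = line.strip().split(None, 4)
    # The first four fields are board, side to move, castling, en passant.
    position = " ".join(fields[:4])
    rest = fields[4] if len(fields) > 4 else ""
    ops = {}
    for chunk in rest.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        opcode, _, operand = chunk.partition(" ")
        ops[opcode] = operand.strip().strip('"')
    return position, ops


def find_disagreements(epd_lines, engine_choice):
    """Yield (id, listed bm moves, engine move) when the engine's move is not a listed bm.

    engine_choice is a hypothetical dict: position -> SAN move found by an engine.
    """
    for line in epd_lines:
        position, ops = parse_epd(line)
        listed = ops.get("bm", "").split()
        found = engine_choice.get(position)
        if found is not None and found not in listed:
            yield ops.get("id", "?"), listed, found


line = '2q1r2k/5R1p/pp1B2pN/2p1P3/1n1b4/3P2Q1/1P4K1/8 w - - bm Qh4; id "5.IQ.931";'
position, ops = parse_epd(line)
engine_choice = {position: "Qf3"}  # the move the modern engines in this thread prefer
disagreements = list(find_disagreements([line], engine_choice))
print(disagreements)
```

Every record that shows up in `disagreements` is only a candidate for a closer look, not proof that the listed bm is wrong.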
-simple is not always best but best is always simple-
Vinvin
Posts: 5298
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: re-validate EPD test suite bm's

Post by Vinvin »

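Qh4 doesn't seem especially good:

The 6 best moves according to Stockfish_23103006_x64_avx2 (columns: depth/seldepth, time, nodes, nodes per second, score, main line; sorted from worst to best):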

 38/43	03:55	 4.046.369k	17.149k	 0,00	1.Qf4 Nd5 2.Qf3 Qe6 3.Rf8+ Kg7 4.Rxe8 Qxe8 5.Qxd5 Kxh6 6.e6 Bf6 7.b3 Kg7 8.Qb7+ Kh8 9.e7 Kg7 10.Qe4 Bxe7 11.Qxe7+ Qxe7 12.Bxe7 Kf7 13.Bd8 b5 14.Bb6 Ke6 15.Bxc5 a5 16.Kf3 Kd5 17.Be3 h5 18.Kf4 a4 19.bxa4 bxa4 20.Bc1 Kd4 21.Bb2+ Kxd3
 38/43	03:55	 4.046.369k	17.149k	 0,00	1.Rf3 Nd5 2.Qh4 Kg7 3.Rf7+ Kh8
 38/45	03:55	 4.046.369k	17.149k	 0,00	1.Qh4 Nd5 2.Nf5 Nf4+ 3.Qxf4 gxf5 4.Qh6 Qc6+ 5.Kg3 Rg8+ 6.Kf4 Be3+ 7.Kxe3 Rg3+ 8.Kd2 Qg2+ 9.Kc3 Rxd3+ 10.Kxd3 Qe4+ 11.Kd2 Qg2+
 38/59	03:55	 4.046.369k	17.149k	+0,12	1.Be7 Qb7+ 2.Kh2 Qxe7 3.Rxe7 Rxe7 4.e6 Rxe6 5.Qb8+ Kg7 6.Qg8+ Kxh6 7.Qxe6 Bxb2 8.Qe3+ Kg7 9.Qe7+ Kg8 10.Qd8+ Kg7 11.Qxb6 Bd4 12.Qa7+ Kf6 13.Kg3 h5 14.Kf3 Nd5 15.Qa8 Ne7 16.Qf8+ Ke6 17.Ke4 Nf5 18.Qa8 Kf7 19.Qb7+ Kf6 20.Qxa6+ Kg7 21.Kf4 Kf7 22.Qb7+ Kf6 23.Qc6+ Kf7 24.Qc7+ Kf6 25.Ke4 Ne3 26.Qc6+ Kg7 27.Qd7+ Kf6 28.Kf3 Nf5 29.Qd8+ Kf7 30.Qc7+ Ke6
 38/59	03:55	 4.046.369k	17.149k	+0,41	1.Qg5 Nd5 2.Nf5 Qxf5 3.Rxf5 Ne3+ 4.Kh3 Nxf5 5.b3 a5 6.Qf6+ Kg8 7.Bc7 Ng7 8.Qxb6 Bxe5 9.Qxc5 Bxc7 10.Qxc7 Re3+ 11.Kg2 Nf5 12.Kf2 Kf8 13.Qc5+ Kg7 14.Qa7+ Kf6 15.Qb6+ Kf7 16.d4 Re6 17.Qxa5 Rd6 18.b4 Nxd4 19.Qe5 Nf5 20.b5 Rd7 21.Qf4 Ke7 22.Qe4+ Kf7 23.Qf3 Ke7 24.Qh1 Kf6 25.Qh3 Ke6 26.Qb3+ Kf6 27.b6 Nd6 28.Qb2+ Ke6 29.Qa2+ Kf6 30.Qa6
 38/62	03:55	 4.046.369k	17.149k	+4,75	1.Qf3 Qc6 2.Ng4 Re6 3.Qxc6 Nxc6 4.Nf6 Rxf6 5.exf6 Bxf6 6.Rxf6 Kg7 7.Rf1 h5 8.Kf3 c4 9.Ke4 cxd3 10.Kxd3 Kh6 11.Rc1 Nd8 12.Bc7 Ne6 13.Rc6 Nxc7 14.Rxc7 Kg5 15.Ke3 h4 16.Rb7 h3 17.Kf2 h2
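
To compare such a MultiPV dump against a listed bm, the lines can be reduced to (first move, score) pairs. The sketch below assumes the column layout seen above (depth/seldepth, time, nodes, speed, score with a decimal comma, then the line); `parse_multipv_line` is a made-up helper name, and the sample rows are shortened copies of three of the lines above.

```python
def parse_multipv_line(line):
    """Return (first move, score in pawns) from one line of the GUI output."""
    depth, clock, nodes, nps, score, pv = line.split(None, 5)
    first_move = pv.split()[0].split(".", 1)[1]   # "1.Qf3" -> "Qf3"
    return first_move, float(score.replace(",", "."))


rows = [
    " 38/45\t03:55\t 4.046.369k\t17.149k\t 0,00\t1.Qh4 Nd5 2.Nf5 Nf4+",
    " 38/59\t03:55\t 4.046.369k\t17.149k\t+0,41\t1.Qg5 Nd5 2.Nf5 Qxf5",
    " 38/62\t03:55\t 4.046.369k\t17.149k\t+4,75\t1.Qf3 Qc6 2.Ng4 Re6",
]

# Best move first.
ranked = sorted((parse_multipv_line(r) for r in rows),
                key=lambda item: item[1], reverse=True)
gap = round(ranked[0][1] - ranked[1][1], 2)   # margin between best and second best
print(ranked, gap)
```

A large gap between the first and second entry is the kind of signal that would justify changing a bm; a gap near zero suggests several moves are about equal.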
tissatussa
Posts: 33
Joined: Sat Sep 24, 2016 4:13 am
Location: Netherlands
Full name: Roelof Berkepeis

Re: re-validate EPD test suite bm's

Post by tissatussa »

this is my SF 16 NNUE data :

[image not preserved]
Dann Corbit
Posts: 12792
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: re-validate EPD test suite bm's

Post by Dann Corbit »

Until a position has been driven to checkmate and all other moves have been proven draws or losses, it is always possible that there are better or equal moves. And test sets that were once challenging, like WAC, become trivial over time.

There is a great deal of effort spent in this forum to refine test sets. Vincent Lejeune's and Jon Dart's test sets are constantly refined.

A test set I worked on years ago (STS) has been greatly improved recently through the efforts of many. When I first started working on STS in 2008, I was using 32-bit engines like Rybka (and two others for verification) at one hour per position. Today's best engines would get vastly superior analysis after one second on modern hardware. Both the hardware and the software are roughly 32,000 times stronger, so today's analysis is about a billion times better.

Fifteen years from today, the then-current hardware and software will be one billion times better than today's as well, making our current analysis moot, other than things that are literally proven (like Chest319 output without fancy pruning in play).
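
The "billion" follows from multiplying the two factors. A quick check of the arithmetic is below; reading 32,000 as roughly one doubling per year over fifteen years is an assumption made here, not something stated in the post.

```python
hardware_gain = 32_000   # factor quoted in the post
software_gain = 32_000   # factor quoted in the post

combined = hardware_gain * software_gain
print(f"{combined:.3e}")   # 1.024e+09, i.e. about a billion

# Assumption: one doubling per year for fifteen years gives a similar factor.
doubling_per_year = 2 ** 15
print(doubling_per_year)   # 32768
```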
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
ImNotStockfish
Posts: 56
Joined: Tue Sep 14, 2021 12:29 am
Full name: .

Re: re-validate EPD test suite bm's

Post by ImNotStockfish »

The problem that I've seen most often with these issues is just that they wanted to write "am" (avoid move) and instead used "bm" (best move).
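
The difference matters for scoring. In EPD, `bm` lists the move(s) to find and `am` lists the move(s) to avoid. A minimal sketch of a pass/fail check is below; `is_solved` is a made-up helper, and whether record 5.IQ.931 was really meant as `am Qh4` is only a guess, not something confirmed in this thread.

```python
def is_solved(ops, engine_move):
    """Pass if the move is among the bm moves (when given) and not among the am moves."""
    best = ops.get("bm", "").split()
    avoid = ops.get("am", "").split()
    if best and engine_move not in best:
        return False
    if engine_move in avoid:
        return False
    return True


as_listed = {"bm": "Qh4"}   # the record as it appears in IQ.epd
as_avoid = {"am": "Qh4"}    # hypothetical reading of the same record

print(is_solved(as_listed, "Qf3"))  # False: Qf3 is not the listed bm
print(is_solved(as_avoid, "Qf3"))   # True: Qf3 avoids Qh4
print(is_solved(as_avoid, "Qh4"))   # False: Qh4 is the move to avoid
```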
tissatussa
Posts: 33
Joined: Sat Sep 24, 2016 4:13 am
Location: Netherlands
Full name: Roelof Berkepeis

Re: re-validate EPD test suite bm's

Post by tissatussa »

@ImNotStockfish hard to believe .. please be serious, or i'll report you.
tissatussa
Posts: 33
Joined: Sat Sep 24, 2016 4:13 am
Location: Netherlands
Full name: Roelof Berkepeis

Re: re-validate EPD test suite bm's

Post by tissatussa »

@Dann Corbit : thanks for your reaction, this gives relevant background info !
Eelco de Groot
Posts: 4673
Joined: Sun Mar 12, 2006 2:40 am
Full name:   Eelco de Groot

Re: re-validate EPD test suite bm's

Post by Eelco de Groot »

Qh4 looks like the kind of threat to Black's king that a poor player like me might make; the solution move Qf3, not so much. So I liked the explanation (from Sam Davis), especially if this test set is meant for humans: maybe the problem was copied from a chess book for humans and later misinterpreted for the test set (making it a bm). But I agree that it does seem very careless.

Crystal's four best moves say Qh4 is clearly not good (the output uses Dutch piece letters: D = queen, T = rook, L = bishop, P = knight):

[d]2q1r2k/5R1p/pp1B2pN/2p1P3/1n1b4/3P2Q1/1P4K1/8 w - -
Engine: Crystal 6 PMT (512 MB)
made by the Stockfish developers (see AUTHORS f

51 693:09 +4.44 1.Df3 Dc6 2.Pg4 Dxf3+ 3.Kxf3 Pd5
4.Ke4 h5 5.Kxd5 hxg4 6.e6 Kg8 7.Tf4 Kg7
8.Txg4 Lxb2 9.e7 Lf6 10.Ke6 g5
11.Tg1 Kg6 12.Kd7 Lxe7 13.Lxe7 Th8
14.Txg5+ (34.061.997.091) 819

50 693:09 +0.24 1.Dg5 Pd5 2.Pf5 Dxf5 3.Txf5 Pe3+
4.Kh3 Pxf5 5.b3 Kg7 6.Df6+ Kg8 7.Lc7 Pg7
8.Dc6 Te6 9.Dd5 Kh8 10.b4 a5 11.bxa5 bxa5
12.Ld6 h6 13.Lxc5 Lxe5 14.d4 (34.061.997.091) 819

50 693:09 0.00 1.Le7 Db7+ 2.Kh2 Dxe7 3.Txe7 Txe7
4.e6 Txe6 5.Db8+ Kg7 6.Dg8+ Kxh6
7.Dxe6 Lxb2 8.Dxb6 Ld4 9.Dd8 Pxd3
10.Dh4+ Kg7 11.De7+ Kg8 12.Dd8+ Kg7 (34.061.997.091) 819

50 693:09 0.00 1.Dh4 Pd5 2.Pf5 Pf4+ 3.Dxf4 gxf5
4.Dh6 Dc6+ 5.Kg3 Tg8+ 6.Kf4 Le3+
7.Kxe3 Tg3+ 8.Kd2 Dg2+ 9.Kc3 Txd3+
10.Kxd3 De4+ 11.Kc3 Db4+ 12.Kc2 De4+ (34.061.997.091) 819
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
tissatussa
Posts: 33
Joined: Sat Sep 24, 2016 4:13 am
Location: Netherlands
Full name: Roelof Berkepeis

Re: re-validate EPD test suite bm's

Post by tissatussa »

@Eelco de Groot who is Sam Davis ?
Eelco de Groot
Posts: 4673
Joined: Sun Mar 12, 2006 2:40 am
Full name:   Eelco de Groot

Re: re-validate EPD test suite bm's

Post by Eelco de Groot »

Hello, seriously, I don't really know who he is; I just know he is not Stockfish. Look at the 'Full name' field at the right.