Old S80 Test Suite revisited


Post by Marek Soszynski »

Years ago I compiled a test suite - "The Soszynski 80" or S80 for short - but abandoned it after several of the solutions were doubted and I hadn't the time to defend them analytically where that was possible.

http://sccs.muldermedia.de/service/down ... d&file=s80

How does S80 fare today, after eight years of advances in hardware and software? Most of the positions can be solved almost immediately. But the remainder are interesting if only because some that were questioned now seem correct, while one or two that weren't challenged now seem questionable.

Here's a position from the latter category:

1.e4 e5 2.Nc3 Nc6 3.Bc4 Na5
[d]r1bqkbnr/pppp1ppp/8/n3p3/2B1P3/2N5/PPPP1PPP/R1BQK1NR w KQkq - 0 4

What does your engine play?
Marek Soszynski

Post by Eelco de Groot »

It seems that Black almost has enough compensation if he plays 5... g6.

I ran two engines in parallel on an Athlon at 2009 MHz, so they can't look very deep. First I let both Togas analyze the position after 3...Na5, and both engines quickly chose 4.Bxf7+. Then I went a few moves forward, to the position after 5.Qh5+. I stopped the analysis at 18 plies for Toga Mara Beta 4, when the score seemed to level off at about half a pawn in the red.


1. e4 e5 2. Nc3 Nc6 3. Bc4 Na5 4. Bxf7+ Kxf7 5. Qh5+ *

[d]r1bq1bnr/pppp1kpp/8/n3p2Q/4P3/2N5/PPPP1PPP/R1B1K1NR b KQ -

Engine: Toga Mara Beta 4 (64 MB)
by Fabien Letouzey, Thomas Gaksch and Tam

14/40 1:43 -1.91 5...g6 6.Qxe5 Nc6 7.Qxh8 h6 8.Qh7+ Bg7
9.e5 d6 10.Ne4 dxe5 11.f3 Nf6
12.Nxf6 Qxf6 13.Ne2 Nb4 14.O-O Nxc2
15.Rb1 Qb6+ 16.Rf2 (31.278.985) 301

15/40 2:18 -1.44 5...g6 6.Qxe5 Nc6 7.Qxh8 h6 8.Qh7+ Bg7
9.e5 d6 10.Ne4 dxe5 11.f3 Qh4+
12.Kd1 Qxe4 13.fxe4 Nf6 14.Qxg7+ Kxg7
15.d3 Bg4+ 16.Nf3 Bxf3+ 17.gxf3 Nd4
18.Rf1 (43.134.945) 311

16/53 4:28 -0.78 5...g6 6.Qxe5 Nc6 7.Qxh8 h6 8.Qh7+ Bg7
9.Nd5 Nb4 10.Nf3 d6 11.Nf4 Nxc2+
12.Kd1 Nxa1 13.Qxg6+ Kf8 14.Nh5 Qe7
15.b3 Nf6 16.Nxg7 Qxg7 (86.601.237) 321

17/53 9:58 -0.59 5...g6 6.Qxe5 Nc6 7.Qxh8 h6 8.Qh7+ Bg7
9.Nd5 Nb4 10.Nf3 d6 11.Nf4 Nxc2+
12.Kd1 Nxa1 13.e5 Ne7 14.e6+ Kf6
15.b4 Qh8 16.Bb2+ Kf5 17.Qxg7 Qxg7
18.Bxg7 Kxf4 (198.346.770) 331

18/55 24:27 -0.59 5...g6 6.Qxe5 Nc6 7.Qxh8 h6 8.Qh7+ Bg7
9.Nd5 Nb4 10.Nf3 d6 11.Nf4 Nxc2+
12.Kd1 Nxa1 13.e5 Ne7 14.e6+ Kf6
15.b4 Qh8 16.Bb2+ Kf5 17.Qxg7 Qxg7
18.Bxg7 Kxf4 (492.955.925) 335

best move: g7-g6 time: 26:06.750 min n/s: 335.807 CPU 100.0% n/s(1CPU): 335.807 nodes: 526.120.000

In parallel I ran a 3.1.2-type Toga; I have not checked which settings or exactly which build this was. There is a huge list with at least 30 Toga Checkovs, and that is just 30 different settings, not counting different builds, in a single copy of the Shredder interface. Then there are the new Toga Maras (4 versions and 30 builds already) and the Toga Blueberries. One loses track after a while; it is impossible to keep them all apart. :( This one found 5...g6 a bit faster, and already in the first position, but Toga Mara was faster there to play Bxf7+.

r1bq1bnr/pppp1kpp/8/n3p2Q/4P3/2N5/PPPP1PPP/R1B1K1NR b KQ -

Engine: Toga II 3.1.2 Checkov 1 (64 MB)
by Thomas Gaksch and Fabien Letouzey


14/47 0:10 -1.49 5...g6 6.Qxe5 Nf6 7.Qxa5 d5 8.e5 d4
9.exf6 dxc3 10.Nf3 Qxf6 11.Ne5+ Kg8
12.bxc3 Bg7 13.d4 Qb6 14.Qxb6 axb6
15.Bg5 Be6 (3.726.273) 360

15/47 0:26 -1.67 5...g6 6.Qxe5 Nf6 7.Qxa5 d5 8.e5 d4
9.exf6 dxc3 10.Nf3 Qxf6 11.Qxc7+ Qe7+
12.Qxe7+ Bxe7 13.dxc3 Bf5 14.Nd4 Bg4
15.Bf4 g5 16.h3 Bd7 17.Be5 (9.555.119) 363

16/54 1:10 -1.69 5...g6 6.Qxe5 Nf6 7.Qxa5 d5 8.e5 d4
9.exf6 dxc3 10.Nf3 Qxf6 11.Qxc7+ Qe7+
12.Qxe7+ Bxe7 13.dxc3 Bf5 14.Nd4 Bg4
15.Bf4 Rhe8 16.O-O g5 17.Be5 Rad8
18.Rfe1 (25.761.035) 367

17/58 3:44 -1.22++ 5...g6 6.Qxe5 Nc6 7.Qxh8 h6 8.Qh7+ Bg7
9.Nd5 Nb4 10.Nf3 d6 11.Nf4 Qf6
12.O-O Qxf4 13.d3 Qf6 14.Re1 Nxc2
15.e5 (83.755.511) 372

18/58 4:53 -1.51 5...g6 6.Qxe5 Nc6 7.Qxh8 h6 8.Qh7+ Bg7
9.Nd5 Nb4 10.Nf3 d6 11.Nf4 Qf6
12.O-O Qxf4 13.d3 Qf6 14.Re1 Nxc2
15.e5 Qe7 (110.152.837) 375

19/63 14:25 -0.74++ 5...g6 6.Qxe5 Nc6 7.Qxh8 h6 8.Qh7+ Bg7
9.Nd5 Nb4 10.Nf3 d6 11.Nf4 Nxc2+
12.Ke2 Nxa1 13.e5 dxe5 14.Nxe5+ Kf6
15.Qxg6+ Kxe5 16.d4+ Qxd4 17.Qxg7+ Nf6
18.Qxc7+ Kf5 (332.322.415) 384

20/63 18:23 -0.53 5...g6 6.Qxe5 Nc6 7.Qxh8 h6 8.Qh7+ Bg7
9.Nd5 Nb4 10.Nf3 d6 11.Nf4 Qf6 12.d4 Nxc2+
13.Ke2 Nxa1 14.Be3 Nc2 15.Rc1 Nxe3
16.Rxc7+ Ne7 17.fxe3 Bg4 18.Rxb7 (423.806.542) 384

best move: g7-g6 time: 26:06.765 min n/s: 384.391 CPU 100.0% n/s(1CPU): 384.391 nodes: 602.250.000

Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan

Post by Marek Soszynski »

Eelco,

How long is it before the different Togas avoid 4.Bxf7+?! (if they ever do)? The position seems to be a good test for over-tactical engines.
Marek Soszynski

Post by Eelco de Groot »

Hello Marek,

I haven't checked that, but White seems to have at least a draw and enough compensation for a full piece, for the moment. Going by Rybka 2.2n2's analysis, the other moves all seem to lead to a drawish score as well. Toga does not have Multi-PV in this edition, or I would try that. I think chances are that Toga would just keep playing the bishop sac! If it leads to a loss, it would only find out later!

Rybka, showing its four best moves, switches to another move at seventeen plies:


[d]r1bqkbnr/pppp1ppp/8/n3p3/2B1P3/2N5/PPPP1PPP/R1BQK1NR w KQkq -

Engine: Rybka 2.2n2 mp 32-bit PVtips5menbases (128 MB)
by Vasik Rajlich

15 1:52 +0.67 4.Bxf7+ Kxf7 5.Qh5+ Ke6 6.Qf5+ Kd6
7.d4 Kc6 8.Qxe5 d5 9.exd5+ Kb6 10.d6 cxd6
11.Qb5+ Kc7 12.Qxa5+ b6 13.Qb5 Nf6
14.Nf3 Bd7 15.Qc4+ (11.582.355) 105

15 2:08 +0.15 4.Be2 Bc5 5.Nf3 d6 6.Na4 Nc6 7.O-O Nf6
8.Nxe5 Nxe5 9.d4 Nxe4 10.Nxc5 Nxc5
11.dxc5 dxc5 12.Be3 (13.245.696) 105

15 3:13 +0.14 4.Qe2 Nxc4 5.Qxc4 c6 6.Nf3 b5 7.Qb3 Qc7
8.O-O Nf6 9.d3 Be7 10.Bg5 O-O 11.a4 bxa4 (19.346.484) 102

15 2:26 +0.12 4.Bb3 Nf6 5.Nf3 Bb4 6.O-O d6 7.d4 O-O
8.dxe5 dxe5 9.Nxe5 Nxb3 10.axb3 Qxd1
11.Rxd1 Bxc3 12.bxc3 Nxe4 13.Rd3 f6
14.Nd7 Bxd7 (14.753.296) 102

_____________________________________________________________


16 4:17 +0.67 4.Bxf7+ Kxf7 5.Qh5+ Ke6 6.Qf5+ Kd6
7.d4 Kc6 8.Qxe5 d5 9.exd5+ Kb6 10.d6 cxd6
11.Qb5+ Kc7 12.Qxa5+ b6 13.Qb5 Nf6
14.Nf3 Bd7 15.Qc4+ (26.718.368) 106

16 4:42 +0.24 4.Be2 Bc5 5.Nf3 d6 6.Na4 Nc6 7.O-O Nf6
8.Nxe5 Nxe5 9.d4 Nxe4 10.Nxc5 Nxc5
11.dxe5 dxe5 12.Qxd8+ Kxd8 13.Rd1+ Bd7
14.Rd5 (28.868.651) 104

16 5:06 +0.15 4.Qe2 Nxc4 5.Qxc4 c6 6.Nf3 b5 7.Qe2 Qc7
8.O-O b4 9.Nd1 Nf6 10.Qc4 a5 (30.862.038) 103

16 6:15 +0.14 4.Bb5 Nf6 5.Nf3 c6 6.Bd3 d5 7.exd5 Bd6
8.b4 Bxb4 9.Bb2 Bg4 10.O-O cxd5
11.h3 (36.756.926) 100


_____________________________________________________________


17 10:09 +0.16 4.Be2 Bc5 5.Nf3 d6 6.Na4 Nc6 7.O-O Nf6
8.Nxe5 dxe5 9.Nxc5 Qe7 10.Na4 Nxe4
11.d3 (62.760.814) 105

17 11:58 +0.14 4.Bb5 Nf6 5.Nf3 Bb4 6.O-O O-O 7.a3 Bxc3
8.dxc3 d6 9.b4 Nc6 10.Qd3 Qe7 11.h3 (72.231.413) 102

17 10:52 +0.11 4.Qe2 Nxc4 5.Qxc4 c6 6.Nf3 b5 7.Qe2 Qa5
8.Nxe5 b4 9.Nc4 Qg5 10.f4 Qxf4 11.d4 Qh4+
12.g3 (66.456.640) 104

17 13:25 +0.10 4.Bb3 Bc5 5.Nf3 d6 6.O-O Nf6 7.d4 exd4
8.Nxd4 O-O 9.Bf4 Nc6 10.Nxc6 bxc6
11.Qd3 Ng4 (80.158.765) 101

17 9:19 +0.05 4.Bxf7+ Kxf7 5.Qh5+ g6 6.Qxe5 Nc6
7.Qxh8 h6 8.Nd5 Bg7 9.Qh7 Nb4 10.Nf3 d6
11.Nf4 Nxc2+ 12.Kd1 Nxa1 13.Qxg6+ (57.565.385) 105

I also tried the Soszynski 80 test at a very fast time control. It seems a good tactical test to see whether something is really wrong with a new program build, and it takes just six minutes 8-) For impatient programmers!

I used 5 seconds per position to compensate for the faster hardware. This was just on the Athlon, but on a quad some programs should still solve well over sixty, I think:

Toga Mara Beta 4

 Right until now: 46 of 80  ;  3:41m
I/O Fault!

         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
 -------------------------------------------------------------------------------------
   0 |   0   0   0   0   -   -   1   0   0   -   0   0   -   -   0   0   -   0   0   2
  20 |   -   -   -   0   -   3   0   -   -   -   3   -   0   -   -   -   1   4   -   -
  40 |   0   -   0   0   -   2   1   0   0   0   -   -   1   3   0   0   0   -   -   -
  60 |   -   0   -   1   -   -   3   1   0   1   0   0   3   0   -   -   0   -   -   3

   1 sec ->  30/80
   2 sec ->  37/80
   3 sec ->  39/80
   4 sec ->  45/80
   5 sec ->  46/80
  n/s: 663.386  
  TotTime: 6:47m    SolTime: 3:41m
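
The "N sec" lines look like cumulative counts over the grid. A minimal Python sketch to check that reading, assuming each grid entry is the whole number of seconds elapsed when the position was solved and "-" means unsolved within the limit (the names `GRID` and `cumulative_solved` are made up for this illustration and are not part of any test tool):

```python
# Solve-time grid for Toga Mara Beta 4, copied from the listing above.
# Assumption: each entry is the whole number of seconds elapsed when the
# position was solved; "-" means not solved within the 5-second limit.
GRID = """
0 0 0 0 - - 1 0 0 - 0 0 - - 0 0 - 0 0 2
- - - 0 - 3 0 - - - 3 - 0 - - - 1 4 - -
0 - 0 0 - 2 1 0 0 0 - - 1 3 0 0 0 - - -
- 0 - 1 - - 3 1 0 1 0 0 3 0 - - 0 - - 3
"""


def cumulative_solved(grid, limit=5):
    """Return ({seconds: positions solved in under that many seconds}, total)."""
    entries = grid.split()
    times = [int(e) for e in entries if e != "-"]
    counts = {n: sum(1 for t in times if t < n) for n in range(1, limit + 1)}
    return counts, len(entries)


counts, total = cumulative_solved(GRID)
for seconds, solved in counts.items():
    print(f"{seconds} sec -> {solved}/{total}")
```

Under that assumption the sketch reproduces the 30, 37, 39, 45 and 46 of 80 reported in the listing.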
 
Toga Mara Beta 2

Right until now: 51 of 80  ;  3:26m
I/O Fault!

         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
 -------------------------------------------------------------------------------------
   0 |   0   0   0   0   4   0   3   0   0   -   0   0   1   -   0   1   -   0   0   1
  20 |   -   1   4   2   -   0   0   -   -   0   -   -   0   -   -   -   1   4   -   -
  40 |   0   -   0   0   -   2   1   0   0   -   4   -   3   0   -   0   0   -   1   -
  60 |   -   0   -   4   -   -   1   0   0   0   0   1   2   0   -   -   0   -   -   2

   1 sec ->  31/80
   2 sec ->  40/80
   3 sec ->  44/80
   4 sec ->  46/80
   5 sec ->  51/80
  n/s: 664.987  
  TotTime: 6:47m    SolTime: 3:26m
 
Rybka 2.2n2

Right until now: 50 of 80  ;  3:15m
I/O Fault!

         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
 -------------------------------------------------------------------------------------
   0 |   0   0   -   -   4   -   0   0   0   1   0   4   0   -   0   2   -   0   3   0
  20 |   -   2   -   0   -   0   1   1   -   3   1   -   0   -   -   -   3   0   -   3
  40 |   0   -   0   0   0   4   1   0   -   4   4   -   -   -   1   0   0   -   -   -
  60 |   -   1   -   -   -   -   0   2   0   2   -   0   1   0   -   0   0   1   -   0

   1 sec ->  28/80
   2 sec ->  37/80
   3 sec ->  41/80
   4 sec ->  45/80
   5 sec ->  50/80
  n/s: 126.123  
  TotTime: 5:37m    SolTime: 3:15m
 


Soszynski 80 Ranking

                 Solved   SolTime (s)   CCRL rating   CPU           Hash
   Toga Mara 2   51       206           ----          Athlon 2GHz   128 MB
   Rybka 2.2n2   50       195           2990          Athlon 2GHz   128 MB
   Toga Mara 4   46       221           ----          Athlon 2GHz   128 MB
   
 

Post by Marek Soszynski »

Eelco,

The test suite needs some attention before it can be recommended. In the Vienna Game position the best move can be changed to an avoid move; maybe that can be usefully done with other doubtful solutions too. I will look into it.
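
For illustration only, here is a rough Python sketch of what such a change could look like as an EPD record, using the standard `bm` (best move) and `am` (avoid move) opcodes. It assumes the original record gave 4.Bxf7+ as the best move, as the thread suggests. The helper `fen_to_epd` and the `id` text are hypothetical, not taken from the S80 file:

```python
# Hypothetical helper: build an EPD record from a FEN string plus EPD operations.
def fen_to_epd(fen, **operations):
    # EPD keeps only the first four FEN fields:
    # piece placement, side to move, castling rights, en passant square.
    fields = fen.split()[:4]
    ops = "".join(f" {opcode} {operand};" for opcode, operand in operations.items())
    return " ".join(fields) + ops


# Vienna Game position after 1.e4 e5 2.Nc3 Nc6 3.Bc4 Na5 (FEN from the first post).
VIENNA_FEN = "r1bqkbnr/pppp1ppp/8/n3p3/2B1P3/2N5/PPPP1PPP/R1BQK1NR w KQkq - 0 4"

# The id string is a placeholder; the real S80 position id is not given in the thread.
as_best_move = fen_to_epd(VIENNA_FEN, bm="Bxf7+", id='"S80 Vienna (hypothetical id)"')
as_avoid_move = fen_to_epd(VIENNA_FEN, am="Bxf7+", id='"S80 Vienna (hypothetical id)"')

print(as_best_move)
print(as_avoid_move)
```

With an `am` record, a test tool that supports the opcode counts any move other than Bxf7+ as a pass, which fits a position where several quiet moves score about the same.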
Marek Soszynski