stockfish and test positions

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Uri Blass
Posts: 10309
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

stockfish and test positions

Post by Uri Blass »

I noticed that Stockfish is relatively slow in some test positions, and it seems that it prunes too much.

For example, this position from Arasan's test suite:

Old Glaurung can solve it in 22 seconds at depth 12.



[D]2q2r2/3n1p2/p2p2k1/3PpRp1/P1n1P3/2P2QB1/1rB1R1P1/6K1 w - - 0 1

New game, 5'/40+5'/40+5'/40
2q2r2/3n1p2/p2p2k1/3PpRp1/P1n1P3/2P2QB1/1rB1R1P1/6K1 w - - 0 1

Analysis by Glaurung 1.2.1:

1.Qf3-g4 Kg6-g7 2.Qg4xg5+ Kg7-h8
+- (1.56) Depth: 2 00:00:00
1.Bc2-d3 Rb2xe2 2.Bd3xe2
+- (3.32) Depth: 2 00:00:00
1.Bc2-d3 Kg6-h6 2.Re2xb2 Nc4xb2
+- (2.45) Depth: 3 00:00:00
1.Qf3-g4 f7-f6 2.Qg4-h3
+- (4.00) Depth: 3 00:00:00
1.Qf3-g4 f7-f6 2.Bc2-d3 Nd7-b6 3.Re2xb2 Nc4xb2
+- (3.15) Depth: 4 00:00:00
1.Qf3-g4 f7-f6 2.Bc2-d3 Rb2xe2 3.Bd3xe2 Qc8-c5+ 4.Rf5-f2
+- (3.12) Depth: 5 00:00:00
1.Qf3-g4 f7-f6 2.Bc2-d3 Rb2xe2 3.Qg4xe2 Qc8-c5+ 4.Rf5-f2
+- (2.35) Depth: 6 00:00:00 15kN
1.Qf3-g4 f7-f6 2.Bc2-d3 Rb2xe2 3.Qg4xe2 Nd7-b6 4.Qe2-d1
+- (2.29) Depth: 7 00:00:00 61kN
1.Qf3-g4 f7-f6 2.Bc2-d3 Rb2xe2 3.Qg4xe2 Nd7-b6 4.a4-a5 Nc4xa5 5.Bd3xa6 Qc8xc3
+- (2.23) Depth: 8 00:00:00 139kN
1.Qf3-g4 f7-f6 2.Bc2-d3 Nd7-c5 3.Bd3xc4 Nc5xe4 4.Bc4-d3 Qc8xf5 5.Bd3xe4
+- (2.15) Depth: 9 00:00:02 314kN
1.Qf3-g4 f7-f6 2.Bc2-d3 Nd7-c5 3.Bd3xc4 Nc5xe4 4.Qg4xe4 Rb2xe2 5.Rf5xg5+ Kg6xg5 6.Bg3-h4+ Kg5-h6 7.Bc4xe2 Qc8xc3 8.Be2xa6
+- (1.43) Depth: 10 00:00:03 808kN
1.Bc2-d3 f7-f6 2.Bg3-e1 Qc8-c5+ 3.Be1-f2 Qc5-c8 4.Re2xb2 Nc4xb2 5.Bd3-f1 Kg6-g7 6.Qf3-h5 Nb2xa4
+- (1.65) Depth: 10 00:00:04 1139kN
1.Bc2-d3 f7-f6 2.Qf3-g4 Rb2xe2 3.Qg4xe2 Nc4-a3 4.Bd3xa6 Qc8xc3 5.Rf5-f3 Qc3-d4+ 6.Bg3-f2 Qd4xa4
+- (1.75) Depth: 11 00:00:08 2379kN
1.Bc2-d3 f7-f6 2.Qf3-g4 Rb2xe2 3.Qg4xe2 Nc4-b6 4.Bg3-f2 Nd7-c5 5.Bd3xa6 Qc8xa6 6.Qe2xa6 Nc5xa6 7.Bf2xb6
+- (1.48) Depth: 12 00:00:17 4729kN
1.Rf5xg5+ Kg6xg5 2.Qf3-f5+ Kg5-h6 3.Bg3-f4+ e5xf4 4.g2-g4 Qc8-c5+ 5.Kg1-h1 Rb2xc2 6.Re2xc2 Nc4-d2 7.Rc2xd2 Qc5xd5 8.e4xd5 Nd7-e5 9.Qf5-f6+ Kh6-h7 10.Qf6xd6
+- (8.57) Depth: 12 00:00:20 5977kN
1.Rf5xg5+ Kg6xg5 2.Qf3-f5+ Kg5-h6 3.Bg3-f4+ e5xf4 4.g2-g4 Qc8-c5+ 5.Kg1-h1 Rb2xc2 6.Re2xc2 Qc5-g1+ 7.Kh1xg1 Nc4-e5 8.Rc2-h2+ Kh6-g7 9.Qf5-h7+ Kg7-f6 10.Qh7-h4+ Kf6-g7 11.Qh4-h6+ Kg7-g8 12.Qh6-h7#
+- (#12) Depth: 13 00:00:22 6779kN
1.Rf5xg5+ Kg6xg5 2.Qf3-f5+ Kg5-h6 3.Bg3-f4+ e5xf4 4.g2-g4 Qc8-c5+ 5.Kg1-h1 Rb2xc2 6.Re2xc2 Qc5-g1+ 7.Kh1xg1 Nc4-e5 8.Rc2-h2+ Kh6-g7 9.Qf5-h7+ Kg7-f6 10.Qh7-h4+ Kf6-g7 11.Qh4-h6+ Kg7-g8 12.Qh6-h7#
+- (#12) Depth: 14 00:00:26 7779kN

(, 31.05.2011)

New Stockfish (1 CPU) cannot find it in a short time and needs depth 25.

New game, 5'/40+5'/40+5'/40
2q2r2/3n1p2/p2p2k1/3PpRp1/P1n1P3/2P2QB1/1rB1R1P1/6K1 w - - 0 1

Analysis by Stockfish 2.1.1 JA:

1.Qf3-g4 Qc8-c5+ 2.Kg1-h1
+- (4.88) Depth: 1/1 00:00:00
1.Bc2-d3 Kg6-h6 2.Rf5xf7 Qc8-c5+ 3.Kg1-h2 Rf8xf7 4.Qf3xf7
+- (2.86) Depth: 2/3 00:00:00
1.Bc2-d3 f7-f6 2.Re2xb2 Nc4xb2
+- (2.78) Depth: 3/3 00:00:00
1.Bc2-d3 Rb2xe2 2.Bd3xe2 Kg6-g7 3.Rf5xg5+ Kg7-h8
+- (2.38) Depth: 4/5 00:00:00
1.Bc2-d3 f7-f6 2.Qf3-f2 Rb2-b3 3.Re2-c2 Rb3-b1+ 4.Bd3-f1
+- (1.81) Depth: 5/6 00:00:00
1.Re2-f2 Rb2xc2 2.Rf2xc2 f7-f6 3.Rc2-f2 Kg6-g7
+- (2.26) Depth: 6/8 00:00:01 14kN
1.Re2-f2 Rb2xc2 2.Rf2xc2 f7-f6 3.Rc2-f2 Kg6-h6 4.Qf3-e2 Kh6-g7
+- (2.50) Depth: 7/11 00:00:01 18kN
1.Re2-f2 f7-f6 2.Bc2-d3 Rb2xf2 3.Bg3xf2 Kg6-g7 4.Qf3-e2 Nd7-b6 5.Qe2-h5 Nb6xa4
± (1.05) Depth: 8/12 00:00:01 47kN
1.Bc2-d3 f7-f6 2.Qf3-g4 Rb2xe2 3.Qg4xe2 Nd7-b6 4.Bg3-f2 Nb6xa4 5.Bd3xc4 Na4xc3 6.Qe2-d3 Nc3xe4
± (1.29) Depth: 9/13 00:00:01 80kN
1.Bc2-d3 f7-f6 2.Bg3-e1 Rb2xe2 3.Qf3xe2 Nd7-b6 4.a4-a5 Nc4xa5 5.Bd3xa6 Qc8-c7 6.Be1-f2 Nb6xd5 7.c3-c4
± (0.96) Depth: 10/17 00:00:01 152kN
1.Bc2-d3 f7-f6 2.Bg3-e1 Rb2xe2 3.Qf3xe2 Nd7-b6 4.a4-a5 Nc4xa5 5.Bd3xa6 Qc8-c7 6.Rf5-f2 Kg6-g7 7.Be1-d2
± (0.88) Depth: 11/19 00:00:02 195kN
1.Bc2-d3 f7-f6 2.Bg3-e1 Rb2xe2 3.Qf3xe2 Nd7-b6 4.a4-a5 Nc4xa5 5.Bd3xa6 Qc8-c7 6.Rf5-f2 Kg6-g7 7.Ba6-d3 Rf8-h8 8.Qe2-f3
± (1.05) Depth: 12/19 00:00:02 288kN
1.Bc2-d3 f7-f6 2.Bg3-e1 Rb2xe2 3.Qf3xe2 Nd7-b6 4.Rf5-f3 Kg6-g7 5.Be1-f2 a6-a5 6.Rf3-h3 g5-g4 7.Rh3-g3 Rf8-h8 8.Rg3xg4+ Kg7-f7
± (1.25) Depth: 13/22 00:00:02 443kN
1.Bc2-d3 f7-f6 2.Bg3-e1 Rb2xe2 3.Qf3xe2 Nd7-b6 4.a4-a5 Nc4xa5 5.Bd3xa6 Qc8-c7 6.Rf5-f3 Kg6-g7 7.Qe2-f2 Nb6-c4 8.Rf3-h3 Qc7-b6 9.Qf2xb6 Nc4xb6 10.Ba6-b5
± (1.13) Depth: 14/24 00:00:06 851kN
1.Bg3-h4 f7-f6 2.Qf3-g4 Qc8-d8 3.Bc2-d3 Rb2xe2 4.Qg4xe2 Nc4-b2 5.Qe2xb2 g5xh4 6.Qb2-e2 Nd7-c5 7.Qe2-h5+ Kg6-g7 8.Qh5-g4+ Kg7-f7 9.Bd3-e2 Rf8-g8 10.Qg4xh4
± (0.92) Depth: 15/26 00:00:11 2998kN
1.Bg3-h4 f7-f6 2.Qf3-g4 Qc8-d8 3.Bc2-d3 Rb2xe2 4.Bd3xe2 Nc4-e3 5.Qg4-h5+ Kg6-g7 6.Rf5xg5+ f6xg5 7.Bh4xg5 Nd7-f6 8.Qh5-h6+ Kg7-g8 9.Qh6-g6+ Kg8-h8 10.Bg5xe3 Qd8-d7 11.Be2xa6 Qd7-g7 12.Qg6-h6+ Qg7xh6 13.Be3xh6 Rf8-b8
± (1.37) Depth: 16/30 00:00:16 5358kN
1.Bg3-h4 f7-f6 2.Qf3-g4 Qc8-d8 3.Bc2-d3 Rb2xe2 4.Bd3xe2 Nc4-e3 5.Qg4-h5+ Kg6-g7 6.Rf5xg5+ f6xg5 7.Bh4xg5 Nd7-f6 8.Qh5-h6+ Kg7-g8 9.Qh6-g6+ Kg8-h8 10.Bg5xe3 Qd8-d7 11.Be2xa6 Qd7-g7 12.Qg6-f5 Nf6-g4 13.Qf5-h5+ Kh8-g8 14.Be3-b6
± (1.25) Depth: 17/34 00:00:19 6023kN
1.Bg3-h4 f7-f6 2.Qf3-g4 Qc8-d8 3.Bc2-d3 Rb2xe2 4.Bd3xe2 Rf8-h8 5.Bh4-f2 Nd7-b6 6.Rf5-f3 Kg6-g7 7.a4-a5 Nc4xa5 8.Be2xa6 Na5-b3 9.Ba6-e2 Qd8-c8 10.Qg4-e6 Qc8xe6 11.d5xe6 Nb6-a4 12.Be2-b5 Na4-c5
² (0.68) Depth: 18/34 00:00:30 12034kN
1.Bg3-h4 f7-f6 2.Qf3-g4 Qc8-d8 3.Bc2-d3 Rb2xe2 4.Bd3xe2 Rf8-h8 5.Bh4-f2 Nd7-b6 6.Rf5-f3 Kg6-g7 7.Rf3-h3 Qd8-c8 8.Rh3xh8 Qc8xg4 9.Be2xg4 Kg7xh8 10.Bg4-e2 a6-a5 11.Kg1-h2 Kh8-g7 12.Kh2-g3 Kg7-g6
± (0.72) Depth: 19/34 00:00:37 15333kN
1.Bg3-h4 f7-f6 2.Qf3-g4 Qc8-d8 3.Bc2-d3 Rb2xe2 4.Bd3xe2 Rf8-h8 5.Bh4-f2 Nd7-b6 6.a4-a5 Nc4xa5 7.Be2xa6 Kg6-g7 8.Qg4-e2 Qd8-c7 9.Qe2-f3 Nb6-d7 10.Ba6-b5 Na5-c4 11.Qf3-e2 Nd7-b6 12.Rf5-f3 Kg7-f7 13.Kg1-f1 Kf7-g7
± (0.72) Depth: 20/34 00:00:46 19281kN
1.Bg3-h4 f7-f6 2.Qf3-g4 Qc8-d8 3.Bc2-d3 Rb2xe2 4.Bd3xe2 Rf8-h8 5.Bh4-f2 Nd7-b6 6.a4-a5 Nc4xa5 7.Be2xa6 Kg6-g7 8.Qg4-e2 Qd8-c7 9.Qe2-f3 Nb6-d7 10.Ba6-b5 Na5-c4 11.Qf3-e2 Nd7-b6 12.Rf5-f3 Kg7-f7 13.Kg1-f1 Kf7-g7
± (0.72) Depth: 21/34 00:00:55 23548kN
1.Bg3-h4 f7-f6 2.Bh4xg5 f6xg5 3.Re2-f2 Rb2xc2 4.Rf2xc2 Nd7-f6 5.Rc2-f2 Qc8-d8 6.g2-g4 a6-a5 7.Kg1-f1 Nc4-b6 8.Rf2-h2 Kg6-f7 9.Qf3-h3 Nb6xa4 10.Qh3-h7+ Kf7-e8 11.Qh7-g6+ Ke8-d7 12.Rh2-f2 Na4xc3 13.Rf5xf6 Rf8xf6 14.Rf2xf6 Nc3xe4 15.Rf6-f7+ Kd7-e8
± (0.96) Depth: 22/35 00:01:50 47475kN
1.Bg3-h4 f7-f6 2.Bh4xg5 f6xg5 3.Re2-f2 Rb2xc2 4.Rf2xc2 Nd7-f6 5.Rc2-f2 Qc8-d8 6.Qf3-e2 Qd8-c8 7.Qe2-d3 Qc8-c7 8.g2-g4 Qc7-c8 9.Qd3-f3 Qc8-d8 10.Kg1-g2 Nc4-b6 11.a4-a5 Nb6-c4 12.Qf3-d3 Qd8-c8 13.Qd3-h3 Kg6-g7
² (0.68) Depth: 23/39 00:03:13 83531kN
1.Bg3-h4 f7-f6 2.Bh4xg5 f6xg5 3.Re2-f2 Rb2xc2 4.Rf2xc2 Nd7-f6 5.Rc2-f2 Qc8-d8 6.Qf3-e2 Qd8-c8 7.Qe2-d3 Qc8-c7 8.g2-g4 a6-a5 9.Qd3-e2 Qc7-c8 10.Qe2-f3 Qc8-d8 11.Qf3-h3 Qd8-c7 12.Kg1-h1 Qc7-e7 13.Rf2-h2 Qe7-g7 14.Rf5xf6+ Kg6xf6 15.Qh3-f1+ Kf6-g6 16.Qf1xc4 Qg7-a7
± (0.76) Depth: 24/39 00:04:39 116812kN
1.Rf5xg5+ Kg6xg5 2.Qf3-f5+ Kg5-h6 3.Bg3-f4+ e5xf4 4.g2-g4 Qc8-c5+ 5.Kg1-h1 Rb2xc2 6.Re2xc2 Kh6-g7 7.Rc2-h2 Rf8-h8 8.Qf5-g5+
+- (105.64) Depth: 25/39 00:08:38 217951kN

(, 31.05.2011)


The reason is that Stockfish prunes too much, and I wonder whether the developers have tried to improve Stockfish by adding rules to do less pruning in relevant positions.

I do not suggest accepting a change just because it does better in test suites, only that this is a direction for improvement.

The idea is to start by testing a change that says: do not prune if you sacrifice and have compensation in the form of a strong attack (where the change improves Stockfish's results in test suites). If the change does not work in games, then try to make the definition of a strong attack as minimalistic as possible while it still helps to solve the test positions (so it has a relatively small effect in positions with no good sacrifices).
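
To make this concrete, here is a minimal sketch of such a rule, written against a tiny assumed interface rather than real Stockfish code (Position, see_sign and king_attackers_after are placeholders for whatever SEE and king-safety machinery an engine already has):

Code:
// Hypothetical sketch of the proposed rule: never prune a move that
// sacrifices material but creates a "strong attack". Written against an
// assumed interface, not actual Stockfish 2.1.1 code.
struct Position;                                          // engine position (opaque here)
int see_sign(const Position& pos, int move);              // assumed: < 0 means the move loses material
int king_attackers_after(const Position& pos, int move);  // assumed: our attackers near the enemy king

// The threshold below is the "minimalistic definition" of a strong attack
// that would be tightened until game results are no longer hurt.
bool keep_despite_pruning(const Position& pos, int move) {
    const int AttackThreshold = 3;                        // illustrative value
    return see_sign(pos, move) < 0
        && king_attackers_after(pos, move) >= AttackThreshold;
}

// Usage inside the move loop of the search (pseudocode):
//   if (prunable(move) && !keep_despite_pruning(pos, move))
//       continue;                                        // prune as before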
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: stockfish and test positions

Post by Don »

Uri Blass wrote:I noticed that Stockfish is relatively slow in some test positions, and it seems that it prunes too much.

For example, this position from Arasan's test suite:

Old Glaurung can solve it in 22 seconds at depth 12.

.... stuff

New Stockfish (1 CPU) cannot find it in a short time and needs depth 25.

... stuff

The reason is that Stockfish prunes too much, and I wonder whether the developers have tried to improve Stockfish by adding rules to do less pruning in relevant positions.
This concept is the holy grail of computer chess.

I do not suggest accepting a change just because it does better in test suites, only that this is a direction for improvement.

The idea is to start by testing a change that says: do not prune if you sacrifice and have compensation in the form of a strong attack (where the change improves Stockfish's results in test suites). If the change does not work in games, then try to make the definition of a strong attack as minimalistic as possible while it still helps to solve the test positions (so it has a relatively small effect in positions with no good sacrifices).
In general it's difficult to tune by using tactical positions as a guide. Computers rarely lose games in positions where (from our point of view) there is a clear tactical shot, but they still make dubious moves that make it more and more difficult to hold the position.

An experiment we could do is to run the new test suite, or any other good tactical set, against several versions of the same program and see how the tactical performance changes with each version. You have Stockfish going back to Glaurung, for example, and you have several versions of Komodo going back to Doch, and across all of these versions you have Elo improvements spanning several hundred points. How strong is the correlation in general between the programs' suite results and their game-tested strength?
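
As a sketch of how one could score that experiment, the snippet below computes the Pearson correlation between test-suite solve counts and rating-list Elo across versions; the numbers are made-up placeholders, not real results for Glaurung/Stockfish or Doch/Komodo:

Code:
// Self-contained sketch: correlate suite solve counts with measured Elo
// across successive versions of one engine family. Data below is invented
// purely for illustration.
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    double n = x.size(), sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx += x[i]; sy += y[i];
        sxx += x[i] * x[i]; syy += y[i] * y[i]; sxy += x[i] * y[i];
    }
    return (n * sxy - sx * sy)
         / std::sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
}

int main() {
    // Placeholder data: positions solved (out of 200) vs rating-list Elo.
    std::vector<double> solved = {112, 118, 121, 119, 127, 133};
    std::vector<double> elo    = {2650, 2710, 2760, 2800, 2850, 2900};
    std::printf("correlation = %.3f\n", pearson(solved, elo));
    return 0;
}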
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: stockfish and test positions

Post by Dann Corbit »

Don wrote:
Uri Blass wrote:I noticed that stockfish is relatively very slow in some test positions
and it seems that it does too much pruning

for example
This position from Arasan's test suite

old glaurung can solve it in 22 seconds and depth 12

.... stuff

new stockfish 1 cpu cannot find it in a small time and needs depth 25

... stuff

The reason is that stockfish prunes too much and I wonder if the developers try to improve stockfish by having rules to do less pruning
in relevant positions.
This concept is the holy grail of computer chess.

I do not suggest to accept a change if it does better in test suites but only that it is a direction for improvement.

The idea is to start with testing a change that say do not prune
if you sacrifice and have a compensation in the term of strong attack(when the change improve stockfish's results in test suites)
and if the change does not work in games then to try to change the definition of strong attack to be as minimalistic as possible but still help to solve the test positions(so it is going to have a relatively small effect in positions with no good sacrifices)
In general it's difficult to tune by using tactical positions as a guide. Computers are rarely losing games in positions where (from our point of view) there is a clear tactical shot, but they still make dubious moves that make it more and more difficult to hold the position.

An experiment we could do is to run the new test suite, or any other good tactical set against several versions of the same program and see how the tactical performance changes with each version. You have stockfish going back to glaurung for example, and you have several version of Komodo going back to Doch and in all of these versions you have ELO improvements spanning several hundred points. How strong is the correlation in general between the programs and their game tested strength?
I think that the quiet positions are the area that computers need to focus on. They all do very well in tactical positions, but in a quiet position with no captures in the next ten ply or so, what move do they choose?
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: stockfish and test positions

Post by Don »

Dann Corbit wrote:
Don wrote:
Uri Blass wrote:I noticed that Stockfish is relatively slow in some test positions, and it seems that it prunes too much.

For example, this position from Arasan's test suite:

Old Glaurung can solve it in 22 seconds at depth 12.

.... stuff

New Stockfish (1 CPU) cannot find it in a short time and needs depth 25.

... stuff

The reason is that Stockfish prunes too much, and I wonder whether the developers have tried to improve Stockfish by adding rules to do less pruning in relevant positions.
This concept is the holy grail of computer chess.

I do not suggest accepting a change just because it does better in test suites, only that this is a direction for improvement.

The idea is to start by testing a change that says: do not prune if you sacrifice and have compensation in the form of a strong attack (where the change improves Stockfish's results in test suites). If the change does not work in games, then try to make the definition of a strong attack as minimalistic as possible while it still helps to solve the test positions (so it has a relatively small effect in positions with no good sacrifices).
In general it's difficult to tune by using tactical positions as a guide. Computers rarely lose games in positions where (from our point of view) there is a clear tactical shot, but they still make dubious moves that make it more and more difficult to hold the position.

An experiment we could do is to run the new test suite, or any other good tactical set, against several versions of the same program and see how the tactical performance changes with each version. You have Stockfish going back to Glaurung, for example, and you have several versions of Komodo going back to Doch, and across all of these versions you have Elo improvements spanning several hundred points. How strong is the correlation in general between the programs' suite results and their game-tested strength?
I think that the quiet positions are the area that computers need to focus on. They all do very well in tactical positions, but in a quiet position with no captures in the next ten ply or so, what move do they choose?
I agree. The move a program chooses is very much bound to its evaluation function in the absence of unambiguous tactics. That's why Komodo is so focused on evaluation and less on search. A mistake in evaluation can take many plies of searching to "correct", and some may require 20 ply or more beyond what the program would normally search.

Also, I believe that because modern programs are so aggressive about pruning and reducing we have to be especially good in evaluation.
Uri Blass
Posts: 10309
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: stockfish and test positions

Post by Uri Blass »

Dann Corbit wrote:
Don wrote:
Uri Blass wrote:I noticed that Stockfish is relatively slow in some test positions, and it seems that it prunes too much.

For example, this position from Arasan's test suite:

Old Glaurung can solve it in 22 seconds at depth 12.

.... stuff

New Stockfish (1 CPU) cannot find it in a short time and needs depth 25.

... stuff

The reason is that Stockfish prunes too much, and I wonder whether the developers have tried to improve Stockfish by adding rules to do less pruning in relevant positions.
This concept is the holy grail of computer chess.

I do not suggest accepting a change just because it does better in test suites, only that this is a direction for improvement.

The idea is to start by testing a change that says: do not prune if you sacrifice and have compensation in the form of a strong attack (where the change improves Stockfish's results in test suites). If the change does not work in games, then try to make the definition of a strong attack as minimalistic as possible while it still helps to solve the test positions (so it has a relatively small effect in positions with no good sacrifices).
In general it's difficult to tune by using tactical positions as a guide. Computers rarely lose games in positions where (from our point of view) there is a clear tactical shot, but they still make dubious moves that make it more and more difficult to hold the position.

An experiment we could do is to run the new test suite, or any other good tactical set, against several versions of the same program and see how the tactical performance changes with each version. You have Stockfish going back to Glaurung, for example, and you have several versions of Komodo going back to Doch, and across all of these versions you have Elo improvements spanning several hundred points. How strong is the correlation in general between the programs' suite results and their game-tested strength?
I think that the quiet positions are the area that computers need to focus on. They all do very well in tactical positions, but in a quiet position with no captures in the next ten ply or so, what move do they choose?
I think that computers do well in tactical positions only relative to humans.
The importance of solving test positions is not only in the root position, because a similar idea can appear later in the tree, and a program that is slower at solving a test suite can in practice make positional mistakes because it does not see the win early enough.

It is logical to try to improve chess programs in positions where they are relatively weak compared to other programs, and there are tactical positions where Stockfish is relatively weak compared to other programs.

Of course I do not suggest tuning it for test suites, but trying to find ideas that help both in test suites and in games.

It is clearly one direction for improvement, and it is a fact that Stockfish has not gotten a big improvement lately by other methods.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: stockfish and test positions

Post by Don »

Uri Blass wrote:
Dann Corbit wrote:
Don wrote:
Uri Blass wrote:I noticed that Stockfish is relatively slow in some test positions, and it seems that it prunes too much.

For example, this position from Arasan's test suite:

Old Glaurung can solve it in 22 seconds at depth 12.

.... stuff

New Stockfish (1 CPU) cannot find it in a short time and needs depth 25.

... stuff

The reason is that Stockfish prunes too much, and I wonder whether the developers have tried to improve Stockfish by adding rules to do less pruning in relevant positions.
This concept is the holy grail of computer chess.

I do not suggest accepting a change just because it does better in test suites, only that this is a direction for improvement.

The idea is to start by testing a change that says: do not prune if you sacrifice and have compensation in the form of a strong attack (where the change improves Stockfish's results in test suites). If the change does not work in games, then try to make the definition of a strong attack as minimalistic as possible while it still helps to solve the test positions (so it has a relatively small effect in positions with no good sacrifices).
In general it's difficult to tune by using tactical positions as a guide. Computers rarely lose games in positions where (from our point of view) there is a clear tactical shot, but they still make dubious moves that make it more and more difficult to hold the position.

An experiment we could do is to run the new test suite, or any other good tactical set, against several versions of the same program and see how the tactical performance changes with each version. You have Stockfish going back to Glaurung, for example, and you have several versions of Komodo going back to Doch, and across all of these versions you have Elo improvements spanning several hundred points. How strong is the correlation in general between the programs' suite results and their game-tested strength?
I think that the quiet positions are the area that computers need to focus on. They all do very well in tactical positions, but in a quiet position with no captures in the next ten ply or so, what move do they choose?
I think that computers do well in tactical positions only relative to humans.
The importance of solving test positions is not only in the root position, because a similar idea can appear later in the tree, and a program that is slower at solving a test suite can in practice make positional mistakes because it does not see the win early enough.

It is logical to try to improve chess programs in positions where they are relatively weak compared to other programs, and there are tactical positions where Stockfish is relatively weak compared to other programs.

Of course I do not suggest tuning it for test suites, but trying to find ideas that help both in test suites and in games.

It is clearly one direction for improvement, and it is a fact that Stockfish has not gotten a big improvement lately by other methods.
There is no question that (all other things being equal) better tactics will also help the program play a better positional game. But the concepts of tactical play and positional play are really just terminology that we use (as humans) to distinguish between winning big and winning small. If a program had nothing but material as an evaluation function, it would be quite capable of winning big (winning a piece or a pawn), but it would be constantly losing small. Most importantly, it would not be laying the groundwork for winning big.

The only way to help the evaluation appreciate winning big is to tell it the values of the pieces. That is only 5 terms, and if you count the "bishop pair" you could say there are 6 features, not counting checkmate or forced draws, which I am assuming are not static evaluation function terms.
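
To illustrate how small that feature set is, a material-only evaluation might look like the sketch below; the centipawn values and bishop-pair bonus are conventional illustrative numbers, not Komodo's or Stockfish's:

Code:
// Minimal "material only" evaluation: five piece values plus a bishop-pair
// bonus, and nothing else. Piece counts are passed explicitly so the
// snippet stays self-contained.
struct MaterialCount {
    int pawns, knights, bishops, rooks, queens;   // for one side
};

int material_eval(const MaterialCount& us, const MaterialCount& them) {
    const int P = 100, N = 325, B = 335, R = 500, Q = 975;  // illustrative values
    const int BishopPair = 40;                              // illustrative bonus
    auto score = [&](const MaterialCount& m) {
        return m.pawns * P + m.knights * N + m.bishops * B
             + m.rooks * R + m.queens * Q
             + (m.bishops >= 2 ? BishopPair : 0);
    };
    return score(us) - score(them);   // centipawns from "us" point of view
}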

However, there are hundreds of potential issues that speak to the long-term potential of the position, such as mobility and many pawn-structure terms. If your program understands these, it will play correctly, and it will get into positions where a "big" win is relatively simple.

Here is a simple thought experiment. Imagine that you have 2 programs which are more or less identical except that one program has the material removed from the evaluation function. The other program has ONLY material and no other evaluation. If you played a 1-ply game you would see the "positional" program making normal moves. The "tactical" program is just as likely to play 1. a4 on the first move as any other move. The positional program will build up its position. BOTH programs will be somewhat capable of tactical as well as positional play, and with increasing depth both will improve in all areas of the game. Which program will play the best game? I think the positional program will.

Given (let's say) a 10-ply search, the positional program will find tactics. If it can remove the opponent's queen, that will be a very good thing because it will allow the computer to increase all the small wins without the opponent's queen frustrating its plans. And it won't let the opponent win its pawns or pieces, because that would hurt its mobility and pawn structure.

The tactical program however will also play positional chess but only because bad positions lead to the eventual loss of material.

I have not backed up the assertion I am about to make with any tests, but I believe that if you played these two hypothetical programs against each other, the tactical program would win at very trivial depths, but at the depths modern programs are capable of, the positional program would crush it without any chance.

Larry and I have discovered positions where programs like Houdini, Robbo and Komodo require a 5-ply search to see trivial tactics that much weaker programs can see with a 2-ply search, which means the modern programs need more nodes and time to see them. This is because modern programs "cheat" a lot, trading risk for depth. They play the odds and favor missing big tactics over missing small tactics. The mechanism is usually in the form of various margins designed to avoid doing what a "good and proper" program should be doing.
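
One common form of such a margin is futility pruning near the leaves; a minimal sketch, with illustrative margin values not taken from Houdini, Robbo or Komodo:

Code:
// Futility pruning sketch: near the leaves, skip a quiet move when the
// static eval plus a depth-based margin still cannot reach alpha.
constexpr int FutilityMargin[4] = {0, 120, 240, 360};   // centipawns, by depth (illustrative)

bool futile(int depth, int staticEval, int alpha, bool inCheck, bool isCapture) {
    // Larger margins (or disabling the rule) trade speed for safety: the
    // engine misses fewer "big" tactics but searches fewer plies overall.
    return depth < 4 && !inCheck && !isCapture
        && staticEval + FutilityMargin[depth] <= alpha;
}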

The conclusion to me is inescapable: you have to make a lot of assumptions about what is likely to happen tactically in order to improve the strength of a chess program, so positional play is more important (in some difficult-to-define sense) than tactics.

The trick, as you point out, is to find ways to get the best of both worlds. I try to find ways for Komodo to not miss tactics in its eagerness to plow ahead to the next iteration.

Having said all of that, I think you will find that the top programs are pretty strong tactically. Everything is relative. You cannot play at grandmaster-plus level and completely suck at tactics.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: stockfish and test positions

Post by mcostalba »

Uri Blass wrote: The reason is that Stockfish prunes too much
The interesting part of this post is to observe how, starting from a somewhat arbitrary axiom (like the one you wrote above), it is possible to build up a complex and lengthy discussion out of it. This is what I call "having fun with words" :-)

Test positions are solved or missed by engines in most cases by pure luck IMHO; an infinitesimal change in an evaluation term or in a parameter is enough to steer the search in one direction instead of another and so "discover" a tactical solution sooner or later, with many plies of difference. It is a kind of butterfly effect applied to chess engines :-)
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: stockfish and test positions

Post by Don »

mcostalba wrote:
Uri Blass wrote: The reason is that Stockfish prunes too much
The interesting part of this post is to observe how, starting from a somewhat arbitrary axiom (like the one you wrote above), it is possible to build up a complex and lengthy discussion out of it. This is what I call "having fun with words" :-)
Yes, see some of the disputes I had with Bob Hyatt where we were arguing about some poorly defined concept and each of us was talking about something different.

In this case I have always hated the terms "positional" and "tactical" play because they each have different meanings to different people, and they are rather abstract concepts anyway. There is no clear and unambiguous definition for these terms.

Test positions are solved or missed by engines in most cases by pure luck IMHO; an infinitesimal change in an evaluation term or in a parameter is enough to steer the search in one direction instead of another and so "discover" a tactical solution sooner or later, with many plies of difference. It is a kind of butterfly effect applied to chess engines :-)
There is a mate position I often test after each program change, and there is no pattern as to whether the program will solve it more quickly or less quickly; the depth will vary for no obvious reasons. I like the position because 20 years ago or more I used the same position, believing there was probably a forced mate but not being able to prove it with a computer.
Uri Blass
Posts: 10309
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: stockfish and test positions

Post by Uri Blass »

mcostalba wrote:
Uri Blass wrote: The reason is that Stockfish prunes too much
The interesting part of this post is to observe how, starting from a somewhat arbitrary axiom (like the one you wrote above), it is possible to build up a complex and lengthy discussion out of it. This is what I call "having fun with words" :-)

Test positions are solved or missed by engines in most cases by pure luck IMHO; an infinitesimal change in an evaluation term or in a parameter is enough to steer the search in one direction instead of another and so "discover" a tactical solution sooner or later, with many plies of difference. It is a kind of butterfly effect applied to chess engines :-)
It is clear that if Stockfish cannot find some tactics that are simple for other programs, then it prunes too much in the relevant position.

Houdini can find it in 2 seconds at depth 13, and 2,641,769 nodes are enough for it.
Many weaker programs are faster than Stockfish at finding the right move.
I do not buy the luck story.
It is possible that a small change may fix the problem in this specific position, so I do not suggest using a single position for testing but many positions, where the target is to make changes that improve Stockfish's results in test positions and also improve Stockfish's results in games.

If Stockfish is 100 times slower than Houdini in some test position, then I believe it is a weakness of Stockfish, and fixing the weakness in a correct way may improve Stockfish's results in games.

The target of chess programmers should be to improve the level of the program in all positions, and test positions are part of the positions you want to improve, so I see no reason not to test in this direction by adding rules about when not to prune that improve the results in test suites.

It is obvious that Stockfish outsearches Houdini, yet I believe that Houdini has a superior search relative to Stockfish, so I guess that Houdini has better rules about when not to prune.
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: stockfish and test positions

Post by Dann Corbit »

Uri Blass wrote:
mcostalba wrote:
Uri Blass wrote: The reason is that Stockfish prunes too much
The interesting part of this post is to observe how, starting from a somewhat arbitrary axiom (like the one you wrote above), it is possible to build up a complex and lengthy discussion out of it. This is what I call "having fun with words" :-)

Test positions are solved or missed by engines in most cases by pure luck IMHO; an infinitesimal change in an evaluation term or in a parameter is enough to steer the search in one direction instead of another and so "discover" a tactical solution sooner or later, with many plies of difference. It is a kind of butterfly effect applied to chess engines :-)
It is clear that if Stockfish cannot find some tactics that are simple for other programs, then it prunes too much in the relevant position.

Houdini can find it in 2 seconds at depth 13, and 2,641,769 nodes are enough for it.
Many weaker programs are faster than Stockfish at finding the right move.
I do not buy the luck story.
It is possible that a small change may fix the problem in this specific position, so I do not suggest using a single position for testing but many positions, where the target is to make changes that improve Stockfish's results in test positions and also improve Stockfish's results in games.

If Stockfish is 100 times slower than Houdini in some test position, then I believe it is a weakness of Stockfish, and fixing the weakness in a correct way may improve Stockfish's results in games.

The target of chess programmers should be to improve the level of the program in all positions, and test positions are part of the positions you want to improve, so I see no reason not to test in this direction by adding rules about when not to prune that improve the results in test suites.

It is obvious that Stockfish outsearches Houdini, yet I believe that Houdini has a superior search relative to Stockfish, so I guess that Houdini has better rules about when not to prune.
I am not sure that the observation about missing some tactics holds.

In the case of Stockfish, it is a very, very deep searcher (easily the deepest of all the strong engines, in fact). It seems to me that the very high strength of Stockfish may be *mainly* due to search.

As Christophe Theron once said, "Search is also knowledge".

It could be that evaluation changes are what is needed to increase the strength of Stockfish rather than search changes (or, more conservatively, that evaluation changes could produce benefit faster than search changes).

Often, tactical positions involve some sort of strange decision, like tossing a piece to gain an advantage:
[d]1r3rk1/3bbppp/1qn2P2/p2pP1P1/3P4/2PB1N2/6K1/qNBQ1R2 w - -
which in general is a bad idea, but once in a while pans out. Houdini solves the above position quickly, and Stockfish takes much longer. But correcting that might make Stockfish play worse and not better.

Since Stockfish is open source, we can easily test our theories if we have enough interest in it.