Claims about Stockfish14.1 with no basis

Moderator: Ras

Uri Blass
Posts: 11139
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Claims about Stockfish14.1 with no basis

Post by Uri Blass »

They claim that Stockfish can crush humans even more efficiently, but there is no proof of that, and Stockfish's testing is not designed to measure it.


https://www.neowin.net/news/stockfish-c ... 41-update/

"it can be fun to see how many moves you can hold out. As it analyses positions deeply, it’s ruthlessly efficient at taking your pieces and landing a checkmate."

I doubt that Stockfish delivers checkmate faster than other engines do.
It would be interesting to test whether Stockfish mates weak engines faster than other engines do, but unfortunately the Stockfish team does not test this way. They test only Elo, and mating faster does not matter for Elo.

I remember that a Stockfish development version was trolling in queen-handicap games, making moves with no purpose instead of making progress, and I doubt that this is fixed in Stockfish 14.1.
Uri Blass
Posts: 11139
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Claims about Stockfish14.1 with no basis

Post by Uri Blass »

Here is a possible line that shows Stockfish 14.1 playing stupidly.

Black seems to have no plan after 1.e4 e5 2.Qh5 Nc6 3.Qxf7+
I would not be surprised if some engine could get at least a draw, or maybe even win, with White against Stockfish in this opening. I think the right way to beat Stockfish is to play stupid moves that the stupid NN is not prepared to play against.

A normal engine shows an improving score as the depth increases, but not Stockfish.
I stopped the game because Stockfish seemed foolish.
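
One way to check the "score does not improve" impression is to pull the engine's reported scores out of the PGN comments and see how far they move over the game. The following is a minimal sketch, assuming the comments end in a `score/depth` pair such as `+11.30/29`, as in the game posted below; the function names are made up for illustration, and mate scores like `+M17/9` are simply skipped.

```python
import re

# Matches "<score>/<depth>" annotations such as "+11.30/29" inside PGN comments.
SCORE_RE = re.compile(r"([+-]\d+\.\d+)/(\d+)")

def extract_scores(pgn_text):
    """Return a list of (score_in_pawns, depth) pairs in game order."""
    return [(float(s), int(d)) for s, d in SCORE_RE.findall(pgn_text)]

def score_drift(scores):
    """Difference between the last and the first reported score, in pawns."""
    if not scores:
        return 0.0
    return scores[-1][0] - scores[0][0]

# Three annotations taken from the game below, with the PVs shortened to "(...)".
sample = (
    "3. Qxf7+ Kxf7 {(...) +11.11/24 0} "
    "4. c3 Nf6 {(...) +11.30/29 18} "
    "27. g4 Qb6 {(...) +10.92/28 4}"
)
scores = extract_scores(sample)
print(scores)
print(round(score_drift(scores), 2))
```

A drift close to zero over more than twenty moves would match the impression that the engine is not making progress, although it says nothing about why.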

I remember that some Stockfish development version could lose with a queen handicap. If they do not care about queen-handicap games, maybe they will care if I show that some engine can beat Stockfish with a stupid sacrifice when it has no idea how to play afterwards.

I may later try different engines against Stockfish from the position after 1.e4 e5 2.Qh5 Nc6 3.Qxf7+ to see whether some engine, at some time control and maybe with the right book, can beat Stockfish.


[pgn][Event "Computer chess game"]
[Site "DESKTOP-7QE6S12"]
[Date "2021.10.29"]
[Round "?"]
[White "Uri"]
[Black "Stockfish_14.1_win_x64_avx2"]
[Result "*"]
[BlackElo "2000"]
[ECO "C20"]
[Opening "Open Game"]
[Time "15:02:56"]
[Variation "Patzer-Parnham Opening, 1.e4 e5 2.Qh5"]
[WhiteElo "2400"]
[TimeControl "120+12"]
[Termination "unterminated"]
[PlyCount "54"]
[WhiteType "human"]
[BlackType "program"]

1. e4 e5 {(e7-e5 Ng1-f3 Nb8-c6 Bf1-b5 Ng8-f6 O-O Nf6xe4 d2-d4 Ne4-d6 Bb5xc6
d7xc6 d4xe5 Nd6-f5 Qd1xd8+ Ke8xd8 Nb1-c3 Kd8-e8 h2-h3 a7-a5 a2-a3 h7-h6
Rf1-e1 Bf8-e7 Bc1-d2 Bc8-e6 Ra1-d1 Ra8-d8 g2-g4 Nf5-d4 Nf3xd4 Rd8xd4 f2-f4
g7-g6 Nc3-e4 h6-h5 Ne4-f6+ Be7xf6 e5xf6 h5xg4 h3xg4) -0.29/37 41} 2. Qh5
Nc6 {(Nb8-c6 Bf1-c4 g7-g6 Qh5-d1 Nc6-a5 d2-d3 Bf8-g7 Nb1-c3 d7-d6 a2-a4
Na5xc4 d3xc4 Bc8-e6 b2-b3 Qd8-d7 Ng1-f3 h7-h6 Qd1-d3 Ng8-e7 Bc1-a3 O-O
O-O-O f7-f5 Nf3-d2 b7-b6 h2-h4 Ne7-c6 Kc1-b1) +0.54/28 5} 3. Qxf7+ Kxf7
{(Ke8xf7 Bf1-c4+ Kf7-f6 Ng1-f3 h7-h6 O-O g7-g5 h2-h3 Ng8-e7 Nb1-c3 Ne7-g6
d2-d3 Kf6-g7 Nc3-d5 Nc6-a5 Nd5-e3 Na5xc4 Ne3xc4 d7-d6 Nc4-e3 Bc8-e6)
+11.11/24 0} 4. c3 Nf6 {(Ng8-f6 d2-d3 Qd8-e8 Bf1-e2 h7-h6 Ng1-f3 g7-g5
Nb1-d2 d7-d6 O-O Kf7-g7 Nd2-c4 Bc8-e6 Nf3-d2 Qe8-g6 Nc4-e3 Nc6-e7 f2-f3
h6-h5 h2-h3 Be6-d7 d3-d4 e5xd4 c3xd4 b7-b5 Rf1-e1 a7-a5 d4-d5 a5-a4 a2-a3)
+11.30/29 18} 5. d3 d6 {(d7-d6 Ng1-f3 h7-h6 Bf1-e2 Kf7-g8 Nb1-d2 g7-g5
h2-h3 Ra8-b8 O-O a7-a6 d3-d4 Qd8-e8 Rf1-e1 b7-b5 Be2-d3 Rh8-h7 a2-a4 Bc8-d7
a4xb5 a6xb5 Re1-e3 Kg8-g7 Nd2-b3 g5-g4 h3xg4 Nf6xg4 Re3-e2 b5-b4 d4xe5
d6xe5) +11.30/30 6} 6. Nf3 h6 {(h7-h6 Bf1-e2 g7-g5 O-O Qd8-e8 Nb1-d2 Nc6-e7
g2-g3 Kf7-g8 Rf1-e1 Ne7-g6 Nd2-c4 Bc8-e6 Be2-d1 Qe8-f7 h2-h4 g5-g4 Nf3-d2
c7-c6 Bd1-b3 h6-h5 Nc4-e3 d6-d5 c3-c4 d5-d4 Ne3-f5 Ra8-d8 Bb3-a4 a7-a5
Ba4-b3 Bf8-c5) +11.30/33 7} 7. Be2 g5 {(g7-g5 O-O Qd8-e8 Rf1-e1 Nc6-e7
d3-d4 Ne7-g6 h2-h3 Kf7-g7 Nb1-d2 b7-b6 Be2-d1 Bc8-a6 d4xe5 d6xe5 Nd2-f1
Ba6-b7 Bd1-c2 Ra8-d8 Nf1-e3 Bf8-c5 a2-a4 a7-a5 Ne3-c4 Kg7-h7 Bc1-d2 Ng6-f4)
+11.22/32 16} 8. O-O Kg7 {(Kf7-g7 Nb1-d2 Nc6-e7 d3-d4 Ne7-g6 h2-h3 c7-c6
Rf1-e1 Qd8-c7 a2-a4 Bc8-e6 d4xe5 d6xe5 Be2-c4 Be6-d7 Nd2-f1 b7-b6 Nf1-g3
Kg7-h7 a4-a5 b6xa5 Nf3-d2 Ng6-f4 Nd2-b3 Ra8-d8 Ra1xa5 g5-g4 h3xg4 Nf6xg4
Ng3-f5) +11.15/31 24} 9. Nbd2 Nh5 {(Nf6-h5 g2-g3 Nh5-f6 Nd2-c4 Qd8-e8
Nc4-e3 Nc6-e7 Nf3-d2 Bc8-h3 Rf1-e1 Ra8-d8 f2-f3 Bh3-c8 Ne3-g4 Nf6-d7 Nd2-b3
h6-h5 Ng4-f2 Qe8-g6 Bc1-e3 Ne7-c6 d3-d4 a7-a6 d4xe5 Nc6xe5 Nf2-d3 Ne5-c4
Be3-d4+ Nd7-f6 e4-e5 Nc4xe5 Nd3xe5 d6xe5) +11.15/31 16} 10. g3 Nf6 {(Nh5-f6
Nd2-c4 Bc8-d7 Nc4-e3 Nc6-e7 Rf1-e1 c7-c5 a2-a4 Qd8-e8 h2-h4 g5-g4 Nf3-d2
Ra8-d8 Nd2-c4 Bd7-e6 a4-a5 h6-h5 Be2-d1 Qe8-g6 Bd1-b3 Be6-c8 Bb3-a4 Qg6-f7
Ra1-a3 Rh8-h7 b2-b4 c5xb4 c3xb4 Ne7-c6 Ba4xc6 b7xc6) +11.15/30 7} 11. Nc4
Bd7 {(Bc8-d7 Nf3-d2 Bd7-h3 Rf1-e1 Kg7-g8 Nc4-e3 Nc6-e7 f2-f3 c7-c6 Nd2-b3
b7-b5 d3-d4 a7-a5 d4xe5 d6xe5 Nb3-c5 Qd8-d6 Nc5-d3 Bh3-e6 Ne3-g4 Nf6xg4
f3xg4 b5-b4 Bc1-e3 b4xc3 b2xc3) +11.15/29 10} 12. Nfd2 Bh3 {(Bd7-h3 Rf1-e1
Nc6-e7 Nc4-e3 Ra8-b8 Nd2-b3 Bh3-d7 f2-f3 b7-b6 d3-d4 c7-c6 Be2-d3 Ne7-g6
Nb3-d2 b6-b5 Ne3-f5+ Kg7-g8 a2-a4 a7-a5 Bd3-c2 b5-b4 Bc2-b3+ d6-d5 d4xe5
Ng6xe5 e4xd5 b4xc3 Re1xe5 Qd8-b6+ Nf5-e3 c3xd2) +11.07/27 14} 13. Re1 Be6
{(Bh3-e6) +11.15/32 29} 14. Ne3 a5 {(a7-a5 Ne3-f5+ Kg7-g8 Nd2-f1 Qd8-d7
Nf1-e3 Qd7-f7 f2-f3 d6-d5 h2-h4 a5-a4 a2-a3 g5xh4 Nf5xh4 d5xe4 d3xe4 Nc6-a5
Be2-d1 Na5-b3 Bd1xb3 Be6xb3 Re1-e2 Ra8-e8 Nh4-f5 Qf7-h5 g3-g4 Qh5-h3 Re2-f2
Bf8-c5 Bc1-d2 Re8-d8 Ra1-b1 Rh8-h7 c3-c4 Bc5xe3 Nf5xe3 Qh3-g3+ Rf2-g2)
+11.07/34 7} 15. Ndf1 Ne7 {(Nc6-e7 Nf1-d2 Ne7-c6 f2-f3 Kg7-g8 f3-f4 g5xf4
g3xf4 Kg8-h7 Re1-f1 b7-b5 Rf1-f2 Ra8-b8 f4-f5 Be6-f7 Ne3-g4 Nf6-d7 Nd2-f3
a5-a4 Rf2-g2 b5-b4 Bc1-e3) +11.14/29 18} 16. Nd2 Nc6 {(Ne7-c6 Ne3-f5+
Kg7-g8 Nd2-f1 Qd8-d7 Nf1-e3 Qd7-f7 a2-a3 a5-a4 f2-f3 Nc6-a5 Be2-d1 b7-b5
h2-h4 g5xh4 Nf5xh4 Na5-b3 Bd1xb3 Be6xb3 Nh4-f5 Ra8-e8 d3-d4 c7-c6 Re1-e2
Rh8-h7 d4xe5 d6xe5 Kg1-g2 Re8-b8 Re2-f2 Bf8-c5 Rf2-d2 Bc5xe3 Nf5xe3)
+11.15/33 16} 17. Nf5+ Kg8 {(Kg7-g8) +11.12/34 56} 18. Nf1 Qd7 {(Qd8-d7
Nf1-e3 Qd7-f7 a2-a3 a5-a4 f2-f3 Qf7-g6 Ne3-g4 Be6xf5 e4xf5 Qg6xf5 Ng4xf6+
Qf5xf6 Bc1-e3 Rh8-h7 Ra1-d1 Rh7-e7 Be3-f2 Ra8-a5 d3-d4 Re7-d7 Be2-c4+ d6-d5
Bc4-e2 Bf8-d6 Rd1-d3 Rd7-f7 Bf2-e3 h6-h5 f3-f4 g5xf4 Be2xh5 f4xe3 Bh5xf7+
Qf6xf7) +11.15/32 3} 19. N1e3 Qf7 {(Qd7-f7 a2-a3 a5-a4 f2-f3 Qf7-g6 d3-d4
Nc6-a5 Bc1-d2 h6-h5 Ra1-d1 Rh8-h7 Be2-d3 Ra8-b8 d4xe5 d6xe5 Bd3-b5 Be6-b3
c3-c4 b7-b6 Bd2-c3 Bb3xd1 Re1xd1 g5-g4 Bc3xe5 g4xf3 Bb5xa4) +11.03/31 7}
20. a3 a4 {(a5-a4 f2-f3 Rh8-h7 h2-h4 g5xh4 Nf5xh4 Nc6-a5 Be2-d1 Na5-b3
Bd1xb3 Be6xb3 Bc1-d2 Nf6-d7 d3-d4 Qf7-e6 Nh4-f5 h6-h5 Re1-e2 c7-c6 Ra1-f1
Qe6-f6 d4xe5 d6xe5 Bd2-e1 Ra8-e8 c3-c4 Bf8-c5 Kg1-g2 Bc5xe3 Nf5xe3)
+11.03/30 4} 21. f3 Rh7 {(Rh8-h7 Bc1-d2 Nc6-a5 Be2-d1 Qf7-d7 h2-h4 Qd7-b5
h4xg5 h6xg5 Bd1-c2 Qb5-b6 Kg1-g2 Na5-b3 Bc2xb3 Qb6xb3 Ra1-b1 Ra8-e8 Re1-d1
c7-c6 Bd2-e1 Qb3-b5 Rd1-d2 d6-d5 c3-c4 d5xc4 Ne3xc4 Be6xc4 d3xc4 Qb5xc4)
+11.07/30 24} 22. Bd2 Na5 {(Nc6-a5 Be2-d1 Qf7-d7 d3-d4 Qd7-b5 Bd2-c1 Na5-b3
Bd1xb3 Be6xb3 h2-h4 g5xh4 Nf5xh4 Bb3-f7 Nh4-f5 Ra8-e8 Kg1-g2 Qb5-d3 Kg2-g1
h6-h5 Re1-d1 Qd3-e2 Rd1-f1 c7-c6 Rf1-f2 Qe2-a6 Ra1-b1) +11.07/29 3} 23. Bd1
Qd7 {(Qf7-d7 Bd1-c2 Rh7-f7 d3-d4 Qd7-b5 Ra1-d1 Qb5-b6 Rd1-b1 Na5-c6 d4-d5
Be6xf5 d5xc6 Bf5-e6 c6xb7 Qb6xb7 Bc2-d3 Qb7-c6 h2-h3 Ra8-e8 Kg1-g2 h6-h5
Re1-e2 Qc6-d7 Rb1-h1 g5-g4 f3xg4 h5xg4 h3xg4) +11.07/28 14} 24. Bc2 Rf7
{(Rh7-f7 h2-h4 Qd7-b5 Bd2-c1 Na5-b3 Bc2xb3 Qb5xb3 Re1-f1 Ra8-e8 h4xg5 h6xg5
d3-d4 c7-c6 g3-g4 Kg8-h7 Rf1-d1 Kh7-g6 Kg1-g2 Qb3-b5 c3-c4 Qb5-b6 d4xe5
d6xe5 Nf5-d6 Bf8xd6 Rd1xd6 Qb6-c7 Rd6xe6 Re8xe6 Ra1-a2 Qc7-d7 Ne3-f5 b7-b5
c4xb5) +11.15/30 2} 25. h4 Qb5 {(Qd7-b5 h4xg5 h6xg5 g3-g4 Qb5-b6 Kg1-g2
Na5-b3 Bc2xb3 Be6xb3 d3-d4 Ra8-e8 d4xe5 d6xe5 c3-c4 c7-c6 Re1-e2 Qb6-a6
Bd2-b4 c6-c5 Bb4-d2 Kg8-h7 Ra1-e1 Kh7-g6 Bd2-c3 b7-b6 Re1-h1 Bb3xc4 Ne3xc4
Qa6xc4 Re2-e1 Qc4-b3) +10.92/28 16} 26. hxg5 hxg5 {(h6xg5 g3-g4 Qb5-b6
Kg1-g2 Na5-b3 Bc2xb3 Qb6xb3 Ra1-b1 c7-c5 d3-d4 Rf7-d7 Kg2-g3 Nf6-h7 Re1-h1
e5xd4 c3xd4 Ra8-c8 d4xc5 d6xc5 Bd2-c3 b7-b5 Bc3-e5 b5-b4 Rh1-d1 b4xa3
Rd1xd7) +10.76/27 9} 27. g4 Qb6 {(Qb5-b6 d3-d4 Na5-b3 Bc2xb3 Qb6xb3 Ra1-b1
Ra8-e8 Ne3-f1 Nf6-h7 Nf1-e3 Rf7-d7 Kg1-g2 c7-c5 Kg2-g3 Be6-f7 Re1-d1 Nh7-f6
Rd1-e1 b7-b6 d4xe5 d6xe5 Ne3-d5 Nf6xd5 e4xd5 Bf7-g6 Bd2xg5 Qb3xd5 Bg5-e3
b6-b5 Kg3-g2 e5-e4) +10.92/28 4} *
[/pgn]
Brunetti
Posts: 424
Joined: Tue Dec 08, 2009 1:37 pm
Location: Milan, Italy
Full name: Alex Brunetti

Re: Claims about Stockfish14.1 with no basis

Post by Brunetti »

Uri Blass wrote: Fri Oct 29, 2021 2:16 pm I stopped the game because stockfish seemed foolish
Seemed, that's the point. It's still +1000 or so. Play on :)

Alex
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Claims about Stockfish14.1 with no basis

Post by dkappe »

Uri Blass wrote: Fri Oct 29, 2021 1:54 pm They claim that Stockfish can crush humans even more efficiently, but there is no proof of that, and Stockfish's testing is not designed to measure it.


https://www.neowin.net/news/stockfish-c ... 41-update/

"it can be fun to see how many moves you can hold out. As it analyses positions deeply, it’s ruthlessly efficient at taking your pieces and landing a checkmate."

I doubt that Stockfish delivers checkmate faster than other engines do.
It would be interesting to test whether Stockfish mates weak engines faster than other engines do, but unfortunately the Stockfish team does not test this way. They test only Elo, and mating faster does not matter for Elo.

I remember that a Stockfish development version was trolling in queen-handicap games, making moves with no purpose instead of making progress, and I doubt that this is fixed in Stockfish 14.1.
Playing with a handicap isn’t so easy for engines. I’m sure if the stockfish team turned their attention to this they would improve things. Right now SF, like most other engines, allows exchanges too easily when down material odds.

There’s only one engine that I’m aware of that does a reasonable job at odds. :)
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
ChickenLogic
Posts: 154
Joined: Sun Jan 20, 2019 11:23 am
Full name: kek w

Re: Claims about Stockfish14.1 with no basis

Post by ChickenLogic »

dkappe wrote: Fri Oct 29, 2021 7:54 pm
Uri Blass wrote: Fri Oct 29, 2021 1:54 pm They claim that Stockfish can crush humans even more efficiently, but there is no proof of that, and Stockfish's testing is not designed to measure it.


https://www.neowin.net/news/stockfish-c ... 41-update/

"it can be fun to see how many moves you can hold out. As it analyses positions deeply, it’s ruthlessly efficient at taking your pieces and landing a checkmate."

I doubt that Stockfish delivers checkmate faster than other engines do.
It would be interesting to test whether Stockfish mates weak engines faster than other engines do, but unfortunately the Stockfish team does not test this way. They test only Elo, and mating faster does not matter for Elo.

I remember that a Stockfish development version was trolling in queen-handicap games, making moves with no purpose instead of making progress, and I doubt that this is fixed in Stockfish 14.1.
Playing with a handicap isn’t so easy for engines. I’m sure if the stockfish team turned their attention to this they would improve things. Right now SF, like most other engines, allows exchanges too easily when down material odds.

There’s only one engine that I’m aware of that does a reasonable job at odds. :)
Yes, it's called Leela. Big news.
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Claims about Stockfish14.1 with no basis

Post by dkappe »

FriedChickenPylon wrote: Fri Oct 29, 2021 8:15 pm
dkappe wrote: Fri Oct 29, 2021 7:54 pm
There’s only one engine that I’m aware of that does a reasonable job at odds. :)
Yes, it's called Leela. Big news.
Really? Evidence for that is pretty thin. The last Leela odds game I could find was a pawn odds game from 2018 where it “crushed” (not my words) pre-NNUE Stockfish. That’s probably more of a commentary on how bad SF was/is at odds than how good Leela is.

It would be interesting to see Leela in action at various pawn and knight odds. And I think you’re just the man to do it. :)
Eelco de Groot
Posts: 4694
Joined: Sun Mar 12, 2006 2:40 am
Full name:   Eelco de Groot

Re: Claims about Stockfish14.1 with no basis

Post by Eelco de Groot »

I thought the point of Uri's game, also discussed in the Pro Deo Forum a while back, was that recent development versions of Stockfish, and now also Stockfish 14.1, have big trouble winning when a queen up. So it is the opposite problem: it does not try hard enough to exchange pieces and may get into a position that is too closed, maybe because with a queen up it tries to build a king attack. It even lost against another engine in one of Uri's games.

I have no explanation. I would like to try something, but I have no compiler installed at the moment, and I'm not sure I could compile recent Stockfish even if I had one, after all the changes and the introduction of the NNUE nets. I hope to get my old computer repaired so that I at least have a working compiler for the old Stockfish. I was thinking of opening an issue on the official Stockfish site, but I can't contribute much. I think Uri could post one if he has a GitHub account.

If it isn't king safety and isn't the net, it could be the odd material imbalance of being a queen up, which is possibly not represented in the material imbalance table; I'm not sure.
Of course the programmer of Protej may be right: it may not matter, if Stockfish will still try to convert after running into the danger of draw claims. Still, it is weird.
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
carldaman
Posts: 2287
Joined: Sat Jun 02, 2012 2:13 am

Re: Claims about Stockfish14.1 with no basis

Post by carldaman »

ChickenLogic wrote: Fri Oct 29, 2021 8:15 pm
dkappe wrote: Fri Oct 29, 2021 7:54 pm
Uri Blass wrote: Fri Oct 29, 2021 1:54 pm They claim that Stockfish can crush humans even more efficiently, but there is no proof of that, and Stockfish's testing is not designed to measure it.


https://www.neowin.net/news/stockfish-c ... 41-update/

"it can be fun to see how many moves you can hold out. As it analyses positions deeply, it’s ruthlessly efficient at taking your pieces and landing a checkmate."

I doubt that Stockfish delivers checkmate faster than other engines do.
It would be interesting to test whether Stockfish mates weak engines faster than other engines do, but unfortunately the Stockfish team does not test this way. They test only Elo, and mating faster does not matter for Elo.

I remember that a Stockfish development version was trolling in queen-handicap games, making moves with no purpose instead of making progress, and I doubt that this is fixed in Stockfish 14.1.
Playing with a handicap isn’t so easy for engines. I’m sure if the stockfish team turned their attention to this they would improve things. Right now SF, like most other engines, allows exchanges too easily when down material odds.

There’s only one engine that I’m aware of that does a reasonable job at odds. :)
Yes, it's called Leela. Big news.
I thought Komodo was the engine that best handled material handicaps, due to its smart contempt.

Still, I find what's happening with Stockfish 14.1 totally bizarre - does anyone know when this problem first surfaced, so its cause can be traced to some actual code changes?
Brunetti
Posts: 424
Joined: Tue Dec 08, 2009 1:37 pm
Location: Milan, Italy
Full name: Alex Brunetti

Re: Claims about Stockfish14.1 with no basis

Post by Brunetti »

I ran a gauntlet at 1'+1" from Uri's opening with all the different engines I have on my test machine. SF mated 589 of them BUT two, Belka and Maverick, which mated it instead, both in a similar way. Nothing special; it's a sort of helpmate played in the opening. So, excellent find, Uri!

Then I replayed a 2'+2" gauntlet with those two engines and the six that lasted more than 100 moves. This time SF mated 8 times out of 8 games.

I ran another 2'+2" match with the same engines, but with the opening line extended to 1. e4 e5 2. Qh5 Nc6 3. Qxf7+ Kxf7 4. Bc4+ Kg6 5. Nf3 d6 6. d3 h6 7. Be3 Nf6 8. Nc3, as played in the Maverick game. Black's advantage is still +1000 there, but without any king-safety considerations it risks being mated. This time SF won 7/8, losing only to Critter, in the same manner.
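
For anyone who wants to repeat this kind of gauntlet, the bookkeeping (who beat SF, who lasted more than 100 moves) can be done from the PGN tags alone. This is only a rough sketch, not the tool used for the matches above: it assumes every game starts with an `[Event ...]` tag and carries `Result` and `PlyCount` tags like the games posted below, and the function names and the second sample game ("ExampleEngine 1.0") are invented for illustration.

```python
import re

# A PGN tag pair looks like: [White "Belka 2.0.0"]
TAG_RE = re.compile(r'\[(\w+) "([^"]*)"\]')

def parse_headers(pgn_text):
    """Split a multi-game PGN into one tag dictionary per game."""
    games, current = [], {}
    for name, value in TAG_RE.findall(pgn_text):
        # Assumption: "Event" is the first tag of every game.
        if name == "Event" and current:
            games.append(current)
            current = {}
        current[name] = value
    if current:
        games.append(current)
    return games

def summarize(games, engine="Stockfish"):
    """Return (opponents that beat `engine`, opponents that lasted over 100 moves)."""
    beat, long_games = [], []
    for g in games:
        white, black = g.get("White", ""), g.get("Black", "")
        as_black = engine in black
        opponent = white if as_black else black
        if g.get("Result") == ("1-0" if as_black else "0-1"):
            beat.append(opponent)
        if int(g.get("PlyCount", "0")) > 200:  # more than 100 full moves
            long_games.append(opponent)
    return beat, long_games

sample = '''
[Event "SF14 cracktest"]
[White "Belka 2.0.0"]
[Black "Stockfish 14.1 64-bit"]
[Result "1-0"]
[PlyCount "33"]

[Event "SF14 cracktest"]
[White "ExampleEngine 1.0"]
[Black "Stockfish 14.1 64-bit"]
[Result "0-1"]
[PlyCount "230"]
'''
beat, long_games = summarize(parse_headers(sample))
print(beat)
print(long_games)
```

The engines in the second list would be the natural candidates for a replay at a longer time control.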

Looking at the games, yes, it seems that SF has no clue what to do after getting this huge advantage. Obviously an engine never knows what to do; it simply plays the move found by search, driven by eval, and NN eval fails in this situation of huge advantage and short time. We know that NNs evaluate the "normal" positions that may occur in practical games much better than classical eval does, because they were trained on "normal" positions. The opening we're analyzing has nothing to do with those.

There is another serious problem: in Maverick's game, after 9. Nh4+ SF (still +1000 for it) played ...Kh5 (depth 25), allowing White to force a mate in 10, and vs. Belka, after 12. Ne2, SF was at +300 but replied with ...Nf4 (depth 18), allowing a mate in 5. So in some way the search failed miserably. I can't explain this.
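
The mate distances can be checked against the game scores below. A small sketch, assuming that Maverick's `+M19` annotation counts plies (it drops by 2 with each White move: +M19, +M17, +M15, ...); both helper names are made up for illustration.

```python
def plies_to_moves(plies):
    """Convert a mate distance in plies (half-moves) to full moves for the mating side."""
    return (plies + 1) // 2

def mate_in(first_move_number, mating_move_number):
    """Moves the mating side needs, counting the first move of the sequence."""
    return mating_move_number - first_move_number + 1

print(plies_to_moves(19))  # Maverick: 10. Bf7+ is annotated +M19 -> 10
print(mate_in(10, 19))     # Maverick: Bf7+ on move 10, Rg3# on move 19 -> 10
print(mate_in(13, 17))     # Belka: Nxf4 on move 13, fxg3# on move 17 -> 5
```

Both ways of counting agree for the Maverick game, and the Belka game gives the mate in 5.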

Anyway, a trivial way to avoid this behavior, IMHO, for any NN engine programmer, is to enable the NN only within a certain range of scores, say ±5, and rely on classical eval in the other cases, which is more than adequate for scores of that order.
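
In code, the idea could look roughly like this. It is a sketch of the suggestion only, not how Stockfish or any other engine is actually written: `hybrid_eval`, `classical_eval`, `nn_eval` and `NN_WINDOW` are hypothetical names, scores are in pawns, and the classical score is assumed to be cheap enough to compute first.

```python
from typing import Any, Callable

NN_WINDOW = 5.0  # pawns; the "±5" range suggested above (an assumption, not a tuned value)

def hybrid_eval(position: Any,
                classical_eval: Callable[[Any], float],
                nn_eval: Callable[[Any], float],
                window: float = NN_WINDOW) -> float:
    """Use the NN only when the classical score says the game is still close."""
    classical = classical_eval(position)
    if abs(classical) <= window:
        return nn_eval(position)   # "normal" position: trust the network
    return classical               # lopsided position: fall back to classical eval

# Stub evaluators standing in for real ones, for illustration only.
queen_up = hybrid_eval("queen-up position", lambda p: 11.0, lambda p: 3.0)
balanced = hybrid_eval("balanced position", lambda p: 0.5, lambda p: 0.8)
print(queen_up, balanced)
```

The window would need tuning in real testing; whether a hard switch like this costs Elo in normal games is exactly the kind of thing that would have to be measured.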

(The PGN file is too large to be attached here.)

Alex

[pgn]
[Event "SF14 cracktest"]
[Site "Intel i7-3770@3.40GHz"]
[Date "2021.10.29"]
[Round "1"]
[White "Belka 2.0.0"]
[Black "Stockfish 14.1 64-bit"]
[Result "1-0"]
[ECO "C20"]
[PlyCount "33"]
[EventDate "2021.??.??"]
[TimeControl "60+1"]

1. e4 {book} e5 {book} 2. Qh5 {book} Nc6 {book} 3. Qxf7+ {book} Kxf7 {
+11.30/19 0.59s} 4. Bc4+ {-8.59/12 1.6s} Kg6 {+11.15/20 1.6s} 5. Nf3 {
-8.87/13 4.3s} d6 {+11.07/22 1.3s} 6. d3 {-8.65/12 3.7s} Na5 {+11.15/22 1.4s}
7. Bb3 {-8.57/12 1.5s} h6 {+11.19/23 2.3s} 8. Nc3 {-8.34/13 3.1s} Nf6 {
+11.19/25 1.6s} 9. Nh4+ {-8.05/13 4.8s} Kh5 {+11.23/25 1.7s} 10. Bf7+ {
-2.75/14 2.9s} Kxh4 {+6.10/22 4.7s} 11. h3 {-2.85/14 4.2s} Nh5 {+5.95/22 3.1s}
12. Ne2 {-2.52/13 2.3s} Nf4 {+10.17/18 1.5s} 13. Nxf4 {+M17/9 0.087s} exf4 {
-M8/239 4.2s} 14. Bxf4 {+M11/4 0.002s} g6 {-M6/245 0.12s} 15. Bxg6 {
+M7/4 0.003s} Qg5 {-M4/245 0.019s} 16. g3+ {+M3/1 0s} Qxg3 {-M2/245 0.014s} 17.
fxg3# {+M1/1 0s, White mates} 1-0
[/pgn]
[pgn][Event "SF14 cracktest"]
[Site "Intel i7-3770@3.40GHz"]
[Date "2021.10.30"]
[Round "1"]
[White "Maverick 1.5 64-bit"]
[Black "Stockfish 14.1 64-bit"]
[Result "1-0"]
[ECO "C20"]
[PlyCount "37"]
[EventDate "2021.??.??"]
[TimeControl "60+1"]

1. e4 {book} e5 {book} 2. Qh5 {book} Nc6 {book} 3. Qxf7+ {book} Kxf7 {
+11.19/19 0.56s} 4. Bc4+ {-9.03/15 2.7s} Kg6 {+11.15/21 1.1s} 5. Nf3 {
-8.88/14 3.1s} d6 {+11.07/21 1.6s} 6. d3 {-8.45/13 1.8s} h6 {+11.15/24 3.8s} 7.
Be3 {-8.16/14 2.7s} Nf6 {+11.15/22 1.3s} 8. Nc3 {-8.18/14 2.3s} Na5 {
+11.30/25 1.4s} 9. Nh4+ {-8.18/15 2.2s} Kh5 {+11.42/25 2.4s} 10. Bf7+ {
+M19/19 2.7s} Kxh4 {-M24/21 4.9s} 11. h3 {+M17/17 1.1s} g6 {-M16/52 1.5s} 12.
Bxg6 {+M15/15 0.14s} Nh5 {-M14/57 1.5s} 13. g3+ {+M13/13 0.021s} Nxg3 {
-M12/58 0.54s} 14. fxg3+ {+M11/11 0.010s} Kxg3 {-M10/65 0.52s} 15. Bf2+ {
+M9/9 0.017s} Kf4 {-M8/109 1.5s} 16. Rg1 {+M7/7 0.018s} Qh4 {-M6/245 0.28s} 17.
Nd5+ {+M5/5 0.017s} Kf3 {-M4/245 0.021s} 18. Bxh4 {+M3/3 0.008s} h5 {
-M2/245 0.016s} 19. Rg3# {+M1/1 0.013s, White mates} 1-0
[/pgn]
[pgn][Event "SF14 cracktest 2+2"]
[Site "Intel i7-3770@3.40GHz"]
[Date "2021.10.30"]
[Round "1"]
[White "Critter 1.6a 64-bit"]
[Black "Stockfish 14.1 64-bit"]
[Result "1-0"]
[ECO "C20"]
[PlyCount "37"]
[EventDate "2021.??.??"]
[TimeControl "120+2"]

1. e4 {book} e5 {book} 2. Qh5 {book} Nc6 {book} 3. Qxf7+ {book} Kxf7 {book} 4.
Bc4+ {book} Kg6 {book} 5. Nf3 {book} d6 {book} 6. d3 {book} h6 {book} 7. Be3 {
book} Nf6 {book} 8. Nc3 {book} Na5 {+11.23/26 7.3s} 9. Nh4+ {-8.29/17 15s} Kh5
{+11.40/26 7.0s} 10. Bf7+ {+M19/15 6.1s} Kxh4 {-M18/55 5.8s} 11. h3 {
+M17/20 1.3s} g6 {-M16/58 3.7s} 12. Bxg6 {+M15/22 1.5s} Nh5 {-M14/62 3.1s} 13.
g3+ {+M13/24 2.7s} Nxg3 {-M12/64 0.67s} 14. fxg3+ {+M11/26 2.5s} Kxg3 {
-M10/66 0.58s} 15. Bf2+ {+M9/27 21s} Kf4 {-M8/157 3.0s} 16. Nd5+ {+M7/36 3.7s}
Kf3 {-M6/245 0.21s} 17. Rg1 {+M5/38 17s} Qg5 {-M4/245 0.021s} 18. Bh5+ {
+M3/40 14s} Bg4 {-M2/245 0.015s} 19. Rg3# {+M1/64 1.3s, White mates} 1-0
[/pgn]
Eelco de Groot
Posts: 4694
Joined: Sun Mar 12, 2006 2:40 am
Full name:   Eelco de Groot

Re: Claims about Stockfish14.1 with no basis

Post by Eelco de Groot »
