Stockfish 120324 a Disaster

smatovic · Post by **smatovic** » Fri Mar 15, 2024 4:04 pm

Dann Corbit wrote: ↑Fri Mar 15, 2024 3:28 pm I guess that engine writers will train for imbalanced openings more and more.
That way, they can do well in the current contests.

Yes, computer chess is already on extra time.

--
Srdja

Graham Banks · Post by **Graham Banks** » Sat Mar 16, 2024 4:11 am

Uri Blass wrote: ↑Fri Mar 15, 2024 2:46 pm
Graham Banks wrote: ↑Thu Mar 14, 2024 10:51 pm
Uri Blass wrote: ↑Thu Mar 14, 2024 1:15 pm I remember that even CCRL started to use a slightly biased book.
No.
I remembered something like that.

Maybe my mistake comes from reading the following thread, in which Larry Kaufman claimed that the few decisive games come from openings that are no longer considered balanced.

viewtopic.php?p=950339&sid=6250f0b1dba5 ... 91#p950339

Maybe I am wrong, but I seem to have read somewhere that you decided to allow book exits of more than 0.5 but not book exits of more than 0.7, or something like that.

Yes - in the books I use, the evaluation for either engine must dip below 0.70 for one move of the first 10 out of book.

bmp1974 · Post by **bmp1974** » Sat Mar 16, 2024 4:08 pm

Draude wrote: ↑Wed Mar 13, 2024 12:32 pm Engine is a total DISASTER!!

https://www.abrok.eu/stockfish/
Author: Disservin
Date: Tue Mar 12 19:09:50 2024 +0100
Timestamp: 1710266990

This engine will never become my analysis engine. In the following position, Stockfish 120324 needs 92s on 12 cores to find Qxf6+. This is the worst value of all Stockfish versions in a year. It is even worse than Stockfish 070324, which took 48s on 8 cores. Sad. Stockfish 15 only takes 20s.

Qxf6+!
[d]1B1r4/rp2npkp/2b1pbp1/1qp5/nPN1R3/1P1P1QP1/2P2PBP/5R1K w - - 0 1

Analysis by Stockfish 120324-avx2:

1. Qxf6+ Kxf6 2. Be5+ Kg5 3. Bg7 Bxe4 4. f4+ Kh5 5. Bxe4 g5 6. Ne5 Qc6 7. g4+ Kh4 8. Bf6 h6 9. fxg5 Ng8 10. Bxd8 Qd6 11. gxh6+ Qxd8 12. Rg1 Kg5 13. Nxf7+ Kh4 14. h7 Nf6 15. Nxd8 Nxh7 16. Bxh7 Nc3 17. Nxe6

Depth: 42/65 00:01:32 973MN, tb=16065

Needed 92s. Crazy!

Analysis by SF-Cor 1Mar24-3072-avx2:

1.Qxf6+ Kxf6 2.Be5+ Kg5 3.Bg7 Bxe4 4.f4+ Kh5 5.Bxe4 g5 6.Ne5 Qc6 7.Bf6 h6 8.Bxe7 Qd5 9.Bxd8 Qxe5 10.fxe5 Nc3 11.bxc5 Nxe4 12.dxe4 Ra2 13.c6 bxc6 14.Rf2 g4 15.Kg2 Kg6 16.Bb6 h5 17.h3 Ra1 18.hxg4 hxg4 19.Rf6+ Kg7 20.Rf1 Ra2 21.Rf2 Kg6 22.b4 Ra1 23.Re2 c5 24.b5 Ra3
Depth: 25/52 00:00:05 53MN

SF-Cor 1Mar24-3072 finds the best move in 5s. Stockfish 120324 is a Disaster!

Also my NNUE 3072 (with corchess code and newest 3072-parameters) beat Stockfish 16.1 (NNUE 2560)!

Clearly, SF development is Regressing...

What are you talking about? On my system the latest SF dev version finds Qxf6+ within 10 seconds. Please update your hardware, or else you are just playing the blame game.

Hai · Post by **Hai** » Sun Mar 17, 2024 9:38 am

Here you can see a lot of disasters:
Top Chess Engines Testsuite 2024 v2

https://www.mediafire.com/file/cypaz2t0 ... 2.pgn/file

ImNotStockfish · Post by **ImNotStockfish** » Sun Mar 17, 2024 12:42 pm

Hai wrote: ↑Sun Mar 17, 2024 9:38 am Here you can see a lot of disasters:
Top Chess Engines Testsuite 2024 v2
https://www.mediafire.com/file/cypaz2t0 ... 2.pgn/file

"2024 v2", and it still has mistakes and nonsensical positions.

The intended solution is a cursed win (a win that cannot be converted within the 50-move rule), and engines apply the 50-move rule by default.
https://glarean-magazin.ch/2021/09/20/c ... s-puzzles/
[pgn]
[FEN "2b1k3/8/6R1/2n5/8/B1r1N3/1pB5/6K1 w - - 0 1"]
[SetUp "1"]

1.Bxb2 Rxe3 2.Rg8+ Kd7 3.Bf5+ Ne6 4.Rxc8 Kxc8 5.Kf2 *[/pgn]

And these four positions prove nothing about how strong an engine is. Huntsman beats any engine at mate finding, but that doesn't mean it is strong overall.

4k3/pp6/8/8/8/8/PPPPPPPP/RNBQKBNR w KQ - 0 1
4k3/pp5p/8/8/8/8/PPPPPPPP/RNBQKBNR w KQ - 0 1
4k3/3ppp2/8/8/8/8/PPPPPPPP/RNBQKBNR w KQ - 0 1
4k3/2p2pp1/8/8/8/8/PPPPPPPP/RNBQKBNR w KQ - 0 1

AndrewGrant · Post by **AndrewGrant** » Tue Mar 19, 2024 10:38 am

Uri Blass wrote: ↑Thu Mar 14, 2024 1:15 pm I think it is dependent on the conditions.
Eduard's post is about playing with a normal book, not about TCEC conditions.

Do you have any evidence that Stockfish's lead is actually growing with a normal book?
I remember that even CCRL started to use a slightly biased book.

If I look at the SSDF rating list I see that Lc0 is leading
https://ssdf.bosjo.net/list.htm

The lead grows under all reasonable conditions. CCRL is not evidence against this claim, as CCRL fails to detect strength differences between a myriad of engines at the top due to the testing methodology employed. That is not an attack on the list, but simply a statement of fact. If you are looking at the progression of the top 4 or 5 engines on CCRL to gauge their progress, you'll reach the incorrect belief that there has been no progress by any engine for a lengthy period of time.

Unbalanced books are normal books in 2024. For a group of people who so heavily believe they know better than the Stockfish developers, meaningful data (and meaningful arguments) are entirely missing from the windy and witless ravings. I'm no fan of the argument from authority, and I detest any form of credentialism, but when a super-massive majority of highly involved (and sometimes respected) individuals disagrees with you, the proper recourse is to bring overwhelming evidence of your claims, or to sit back and examine what you believe to be true.

It's a shame that so many on this forum, at least of those who remain, are so readily captivated by the endless cycle of testing virtually verbatim, near-equal-strength versions of engines, only to claim the error bars as proof of distinction rather than as the margins of error they are intended to be. You might try the following exercise at your desk: flip a coin 100 times with each of various flipping methods. Take note of the method that returns the most heads, and attempt to convince yourself that something in particular about that method is giving you the edge. If you succeed in convincing yourself, seek out your local community college for a refresher course on basic probability and statistics.
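To make the exercise concrete, here is a minimal Python sketch. It assumes that every "flipping method" is really the same fair coin; the function names, the choice of 10 methods, and the seed are made up for illustration.

```python
import random

def heads_per_method(num_methods=10, flips=100, rng=None):
    """Flip a fair coin `flips` times for each method; return the heads counts."""
    rng = rng or random.Random()
    return [sum(rng.random() < 0.5 for _ in range(flips))
            for _ in range(num_methods)]

def average_best(trials=2000, num_methods=10, flips=100, seed=1):
    """Average heads count of the luckiest method over many repetitions."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += max(heads_per_method(num_methods, flips, rng))
    return total / trials

# One run of the desk exercise: ten identical "methods", 100 flips each.
counts = heads_per_method(rng=random.Random(1))
print("heads per method:", counts)

# Repeat the whole exercise many times and average the winner's score.
best_avg = average_best()
print("average heads for the best of 10 methods:", best_avg)
```

Every method here has the same 50% chance of heads, yet the winner of each round lands well above 50 heads on average, simply because it was picked for being the luckiest.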

The following thoughts, if ever had by an individual here, are signs of a need to contact your local college:
- Wow, [Sugar/Corchess/Eman/Shashchess] did better than Stockfish in [Event]. I wonder what special stuff the author did.
- This version of Stockfish with a [slightly different neural network] beat Stockfish master. I wonder why the maintainers have no clue about this.
- Stockfish 16.1 did worse than Stockfish 16 in CCRL. That must mean the developers allowed regressions to slip in under CCRL conditions.
- This new version of Stockfish can't solve this hand-picked position as quickly as the previous version. This must be indicative of a general loss in problem solving power of the engine.
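The point about error bars can be sketched numerically. The snippet below uses the common normal approximation for an Elo estimate from a win/draw/loss record. It is not claimed to be the method CCRL or the Stockfish developers use, and the 30/150/20 result is an invented example, not data from any list.

```python
import math

def score_to_elo(score):
    """Convert an expected score in (0, 1) to an Elo difference."""
    score = min(max(score, 1e-9), 1 - 1e-9)  # avoid division by zero / log(0)
    return -400 * math.log10(1 / score - 1)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high) using a normal approximation (z=1.96 ~ 95%)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result around the mean score.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (score_to_elo(score),
            score_to_elo(score - z * se),
            score_to_elo(score + z * se))

# Hypothetical 200-game match: +30 =150 -20 for the "new" version.
elo, low, high = elo_with_margin(30, 150, 20)
print(f"Elo {elo:+.1f}, interval [{low:+.1f}, {high:+.1f}]")
```

With these invented numbers the point estimate is positive but the interval includes zero, so the match alone does not show that either version is stronger.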

Peter Berger · Post by **Peter Berger** » Tue Mar 19, 2024 12:14 pm

There actually is a reasonable argument that has been brought up by various people and has not been addressed.

Recent Stockfish versions have clearly become worse at beating weaker engines from equal opening positions at longer time controls. Just try Stockfish-Crafty for a few hundred games - Crafty will get way more draws than is to be expected.

People don't do tests against weaker engines as they are not the competitors. To measure something meaningful, you use unbalanced openings, which is all logical. But will you realize when your engine becomes weaker at beating weaker engines if no one tests for this?

Draude · Post by **Draude** » Tue Mar 19, 2024 1:00 pm

People don't do tests against weaker engines as they are not the competitors

Well said! Indeed they are not!

But will you realize when your engine becomes weaker at beating weaker engines

Perhaps not! But as you said, they are not the competitors.

Just try Stockfish-Crafty for a few hundred games - Crafty will get way more draws than is to be expected

Yes, your few hundred games tests are statistically sound, and they bring great insight to chess engine programmers! Perhaps Crafty is stronger than you expected?

Uri Blass · Post by **Uri Blass** » Tue Mar 19, 2024 1:07 pm

There is one test without unbalanced openings, namely FRC, but even in the CCRL FRC rating list I do not see tests against very weak opponents that would show whether Stockfish does better against them than other engines do.

It would be interesting to see whether engines that score 1% or 2% against Stockfish in FRC get better results against other top engines.

It seems that engines that get slightly more than 10% against Stockfish in FRC do better against other engines.

See for example the following:

https://computerchess.org.uk/ccrl/404FR ... has_11_0_0

Peter Berger · Post by **Peter Berger** » Tue Mar 19, 2024 1:08 pm

Draude wrote: ↑Tue Mar 19, 2024 1:00 pm Yes, your few hundred games tests are statistically sound, and they bring great insight to chess engine programmers! Perhaps Crafty is stronger than you expected?

Actually, ten games are probably enough, as in my experience Crafty will most likely get one draw. Think about what this means statistically if I am right: a little statistics exercise for the reader.

To your other remark: I am no chess engine programmer, so it is not my responsibility to offer great insight. Read and think about what I write (or ignore it), just as you see fit.
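For readers who want to work through that exercise, here is a rough Python sketch. It assumes games are independent with a fixed per-game draw probability and that the stronger engine never loses; the draw rates tried are hypothetical and not measured from any Stockfish-Crafty match.

```python
import math

def prob_at_least_one_draw(draw_rate, games=10):
    """Chance of seeing one or more draws in `games` independent games."""
    return 1 - (1 - draw_rate) ** games

def implied_elo_gap(draw_rate):
    """Elo gap implied by the score 1 - draw_rate/2 (assumes no losses)."""
    score = 1 - draw_rate / 2
    return -400 * math.log10(1 / score - 1)

# Hypothetical per-game draw rates for the weaker engine.
for rate in (0.01, 0.05, 0.10):
    p = prob_at_least_one_draw(rate)
    print(f"draw rate {rate:.0%}: P(>=1 draw in 10 games) = {p:.1%}, "
          f"implied gap = {implied_elo_gap(rate):.0f} Elo")
```

Under these assumptions, a draw turning up in most ten-game runs points to a per-game draw rate of several percent or more, which in turn implies a smaller effective Elo gap than a draw rate near 1% would.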
