I'm disappointed with Stockfish dev.

Uri Blass · Post by **Uri Blass** » Sun Feb 19, 2023 9:37 pm

Sopel wrote: ↑Sun Feb 19, 2023 8:37 pm
Uri Blass wrote: ↑Sun Feb 19, 2023 7:28 pm
syzygy wrote: ↑Sun Feb 19, 2023 5:49 pm
CornfedForever wrote: ↑Sun Feb 19, 2023 12:41 am Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
So you have no idea how SF development progresses, testing one patch at a time?

Your doctor will never know for sure whether what he gave you was the reason you got better. Doctors work far more with gut feeling than SF developers. This is why SF got so strong.
It is clear that the stockfish team does not test one patch at a time because they often accept some patches nearly at the same time.
Can you provide some examples of this?

I mean that the stockfish team does not test a patch against the previous version.
Maybe the words I used are not correct, but the idea is this: a change that was tested and worked against some old version is applied to a newer version without being retested against that newer version, and that may be wrong. (It is possible that patch A works for some old version but does not work for a new version.)

Simply look at the latest patches, for example:

https://abrok.eu/stockfish/

Author: Dubslow
Date: Sat Feb 18 14:01:08 2023 +0100
Timestamp: 1676725268

Remove one `reduction` call

test at long time control
https://tests.stockfishchess.org/tests/ ... 23fef375ed
time of the test
start time 2023-02-15 19:37:43
last updated 2023-02-16 01:01:13

Author: Dubslow
Date: Sat Feb 18 13:34:40 2023 +0100
Timestamp: 1676723680

Simplify late countermove bonus condition

test at long time control
https://tests.stockfishchess.org/tests/ ... 29a5565991
time of the test
start time 2023-01-30 17:08:54
last updated 2023-02-12 06:52:55

Author: mstembera
Date: Sat Feb 18 13:30:48 2023 +0100
Timestamp: 1676723448

Simplify nnueComplexity calculation.

test at long time control
https://tests.stockfishchess.org/tests/ ... d71f77002a
time of the test
start time 2023-02-08 19:50:42
last updated 2023-02-11 09:13:38

It is clear that there were no games between the different versions of 18.2.2023, because all the tests ran before 18.2.2023, so the versions that immediately preceded each commit of 18.2.2023 never played. Some of the tests are even against versions earlier than 9.2.2023, and one test even started in January 2023, although the Stockfish team also accepted patches on 9.2.2023 and on 2.2.2023 or 3.2.2023.
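To make the timing argument concrete, here is a minimal Python sketch that uses only the dates quoted above. It computes how many days passed between the start of each test and the commit, and how many of the other listed commits landed in between. The function name `stale_report` is made up for this illustration, and time zones are ignored (the commit dates are +0100, the test times are taken as printed), which does not matter for gaps of several days.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Values copied from the listing above: (patch title, test start, commit date).
PATCHES = [
    ("Remove one `reduction` call",
     "2023-02-15 19:37:43", "2023-02-18 14:01:08"),
    ("Simplify late countermove bonus condition",
     "2023-01-30 17:08:54", "2023-02-18 13:34:40"),
    ("Simplify nnueComplexity calculation",
     "2023-02-08 19:50:42", "2023-02-18 13:30:48"),
]


def parse(timestamp):
    """Parse a timestamp as a naive datetime (time zones ignored)."""
    return datetime.strptime(timestamp, FMT)


def stale_report(patches):
    """Return (title, days from test start to commit, listed commits in between)."""
    commit_times = [parse(commit) for _, _, commit in patches]
    rows = []
    for title, start, commit in patches:
        started, committed = parse(start), parse(commit)
        # Listed commits that landed after this test started
        # and before this patch itself was committed.
        in_between = sum(1 for other in commit_times if started < other < committed)
        rows.append((title, (committed - started).days, in_between))
    return rows


for title, days, in_between in stale_report(PATCHES):
    print(f"{title}: test started {days} days before commit, "
          f"{in_between} listed commit(s) in between")
```

This only counts the three commits listed here; commits from 9.2.2023 or earlier in February would add to the "in between" numbers.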

DrEinstein · Post by **DrEinstein** » Sun Feb 19, 2023 9:39 pm

Sopel wrote: ↑Sun Feb 19, 2023 8:37 pm
Uri Blass wrote: ↑Sun Feb 19, 2023 7:28 pm
syzygy wrote: ↑Sun Feb 19, 2023 5:49 pm
CornfedForever wrote: ↑Sun Feb 19, 2023 12:41 am Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
So you have no idea how SF development progresses, testing one patch at a time?

Your doctor will never know for sure whether what he gave you was the reason you got better. Doctors work far more with gut feeling than SF developers. This is why SF got so strong.
It is clear that the stockfish team does not test one patch at a time because they often accept some patches nearly at the same time.
Can you provide some examples of this?

The last time it happened was yesterday, when vondele applied 6 patches to master at the same time. Each of these patches was tested against the (same) old master. No problem if all six are independent of each other....

syzygy · Post by **syzygy** » Sun Feb 19, 2023 9:48 pm

Uri Blass wrote: ↑Sun Feb 19, 2023 7:28 pm It is clear that the stockfish team does not test one patch at a time because they often accept some patches nearly at the same time.

Each patch is tested individually.

Unlike a doctor's patient, you can run multiple instances of SF at the same time, so you can individually test multiple patches in parallel.

As for why SF is so strong, I think the main reason is that, for some reason, people give Stockfish more computer time than other engines.

The point is that fishtest allows the developers to know with sufficient certainty which patches work and which do not.

I would prefer that, after patch A passes, they stop testing patches B and C, update the development version, and only then go back to testing patches B and C against the new development version.

But you keep ignoring the fact that resources are finite, and that the SF developers very rightly seek to optimise the development process.

Uri Blass · Post by **Uri Blass** » Mon Feb 20, 2023 12:50 am

syzygy wrote: ↑Sun Feb 19, 2023 9:48 pm
Uri Blass wrote: ↑Sun Feb 19, 2023 7:28 pm It is clear that the stockfish team does not test one patch at a time because they often accept some patches nearly at the same time.
Each patch is tested individually.

Unlike a doctor's patient, you can run multiple instances of SF at the same time, so you can individually test multiple patches in parallel.

As for why SF is so strong, I think the main reason is that, for some reason, people give Stockfish more computer time than other engines.
The point is that fishtest allows the developers to know with sufficient certainty which patches work and which do not.

I would prefer that, after patch A passes, they stop testing patches B and C, update the development version, and only then go back to testing patches B and C against the new development version.
But you keep ignoring the fact that resources are finite, and that the SF developers very rightly seek to optimise the development process.

I know you can run multiple instances of SF at the same time, but the point is that there is no test to show that X+1 is not a regression relative to X.

I will describe what they do in time order.

1) They have development version X1.
2) They test X1+a against X1.
   The test is not finished, but they have already updated X1, so the development version is now X2.
3) The test finishes and X1+a passes against X1, so they decide to accept a.
4) They update X2 to X2+a.

They tested that X1+a is better than X1 but they never tested that X2+a is better than X2.
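A toy model of this worry, as a short Python sketch. All Elo numbers here are hypothetical and chosen only for illustration: patch b stands for whatever turned X1 into X2, and the negative interaction term assumes that a and b partly do the same job. Nothing here is measured from Stockfish.

```python
# Hypothetical Elo contributions, chosen only for illustration.
BASE = 0.0
GAIN = {"a": 2.0, "b": 3.0}

# Assumed interaction: a and b overlap, so applying both loses 2.5 Elo
# compared with simply adding their individual gains.
INTERACTION = {frozenset({"a", "b"}): -2.5}


def strength(patches):
    """Toy strength of a version: base + individual gains + interactions."""
    patches = frozenset(patches)
    total = BASE + sum(GAIN[p] for p in patches)
    for pair, delta in INTERACTION.items():
        if pair <= patches:
            total += delta
    return total


x1 = set()    # development version when the test of a started
x2 = {"b"}    # development version when a was merged

gain_on_x1 = strength(x1 | {"a"}) - strength(x1)
gain_on_x2 = strength(x2 | {"a"}) - strength(x2)

print("a tested on X1:", gain_on_x1)   # 2.0, so the test passes
print("a applied to X2:", gain_on_x2)  # -0.5, a regression nobody tested
```

With the interaction term set to 0, the two gains are equal, which is the "no problem if the patches are independent" case.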

It is not the only disadvantage of their testing.
It is possible that their way causes Stockfish to improve faster than the alternative; I do not know whether that is the case. But the fact that Stockfish is leading does not prove it, and it is possible that the only reason Stockfish is number 1 is that they use more computer time than other people.

I can add that from my point of view understanding is more important and interesting than fast improvement.
It is better to get only a 5 Elo improvement and understand why you get it than to get a 6 Elo improvement without understanding why.

I would prefer that programmers, not only of Stockfish, give positions where the engine does better than the previous version when they release a new engine, rather than leave users to play games and find out in what types of positions the engine plays better moves.

syzygy · Post by **syzygy** » Mon Feb 20, 2023 3:51 am

Uri Blass wrote: ↑Mon Feb 20, 2023 12:50 am I will describe what they do in time order.

I understand what you mean, and I have already responded to it.

I can add that from my point of view understanding is more important and interesting than fast improvement.
It is better to get only a 5 Elo improvement and understand why you get it than to get a 6 Elo improvement without understanding why.

For 99.9% of the patches it is impossible to "understand" why they work.

I would prefer that programmers, not only of Stockfish, give positions where the engine does better than the previous version when they release a new engine, rather than leave users to play games and find out in what types of positions the engine plays better moves.

You are obviously free to have your preferences, but you should not expect that SF developers take your criticism seriously.

The beautiful thing is that you and all likeminded people can fork SF and improve it.
Or you can just study it and try to "understand" why it works.

I would prefer that programmers, not only of Stockfish, give positions where the engine does better than the previous version when they release a new engine, rather than leave users to play games and find out in what types of positions the engine plays better moves.

And you still don't see how misguided this is?

It is trivial to write a script that will find positions where version N+1 of an engine does better than version N, especially if the two versions are just a few Elo apart. And then you can run a script to find positions where version N+1 does worse than N, post them here, and complain.
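As a rough sketch of why such a script always finds something, here is a small Python simulation. No real engine is run: `solves` is a stand-in that "finds the best move" with a fixed probability, and the probabilities 0.50 and 0.52 are assumptions picked to mimic two versions a few Elo apart.

```python
import random


def solves(p_solve, rng):
    """Stand-in for 'the engine finds the best move'; no real engine is run."""
    return rng.random() < p_solve


def compare(n_positions=1000, p_old=0.50, p_new=0.52, seed=1):
    """Return the positions only the new version solves, and only the old one."""
    rng = random.Random(seed)
    new_only, old_only = [], []
    for position in range(n_positions):
        old_ok = solves(p_old, rng)
        new_ok = solves(p_new, rng)
        if new_ok and not old_ok:
            new_only.append(position)
        elif old_ok and not new_ok:
            old_only.append(position)
    return new_only, old_only


new_only, old_only = compare()
print("positions where N+1 does better:", len(new_only))
print("positions where N does better:  ", len(old_only))
```

Both lists come out with hundreds of entries, so either side of the argument can post a long list of "proof" positions.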

syzygy · Post by **syzygy** » Mon Feb 20, 2023 4:08 am

Uri Blass wrote: ↑Mon Feb 20, 2023 12:50 am I would prefer that programmers, not only of Stockfish, give positions where the engine does better than the previous version when they release a new engine, rather than leave users to play games and find out in what types of positions the engine plays better moves.

On the one hand you are pointing out that a patch that improves version N might not improve version N+1.
You also regularly point out that if version N+2 is stronger than version N+1 and version N+1 is stronger than version N, it is possible that version N is stronger than N+2.

Yet, you seem to believe that if version N+1 is better than version N at one hand-picked position, then that shows that version N+1 is better than version N.
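The non-transitivity mentioned above is easy to show with an analogy that has nothing to do with engines: three dice where each one beats the next more often than not. This is only an illustration that "A beats B and B beats C" does not force "A beats C"; the dice faces are a standard textbook-style example, not data from any engine test.

```python
from fractions import Fraction
from itertools import product

# Three six-sided dice with unusual faces (analogy only, not engine data).
A = (2, 2, 4, 4, 9, 9)
B = (1, 1, 6, 6, 8, 8)
C = (3, 3, 5, 5, 7, 7)


def p_beats(x, y):
    """Exact probability that one roll of die x is higher than one roll of die y."""
    wins = sum(1 for a, b in product(x, y) if a > b)
    return Fraction(wins, len(x) * len(y))


print("A beats B:", p_beats(A, B))  # 5/9
print("B beats C:", p_beats(B, C))  # 5/9
print("C beats A:", p_beats(C, A))  # 5/9
```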

Whiskers · Post by **Whiskers** » Mon Feb 20, 2023 4:26 am

The most common variant/clone of Stockfish that I've heard of is Crystal, which is Stockfish but designed to be better at solving fortresses, difficult tactical problems, and generally the type of anti-engine problems that people are always trying to fool top engines with. And it is undoubtedly better in that regard.

However, when it comes to actual playing strength, Crystal is weaker, because it prunes less and considers moves that 99.99% of the time are an absolute waste of nodes, and thus doesn't search as deep. Stockfish isn't built for the edge cases, it's built to be the best at the middle of the road normal positions, the ones that come up in our games that we analyze and the games that it plays against top engines.

Engines think Black is better in this position even though in reality White wins easily.

[fen]rrrrkrrr/pppppppp/8/8/8/8/PPPPPPPP/BBBQKBBB b kq - 0 1[/fen]

Does that mean that bishops are worth more than rooks? Of course not! Some positions are simply the exceptions to the rule, but catering to them would make the engine handle the rule worse. It's a bit of a dilemma, and I personally think Stockfish does the best it can.
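For reference, a small Python sketch that counts the material in the position above using the conventional 1/3/3/5/9 piece values. Those values are an assumption of this example (a real engine's evaluation is far more involved), but they show why a naive count favours Black here.

```python
# Conventional piece values (an assumption; not how a real engine evaluates).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

FEN = "rrrrkrrr/pppppppp/8/8/8/8/PPPPPPPP/BBBQKBBB b kq - 0 1"


def material(fen):
    """Return (white, black) material totals from the piece-placement field."""
    placement = fen.split()[0]
    white = black = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits (empty squares) and '/' (rank separators)
        if ch.isupper():
            white += value
        else:
            black += value
    return white, black


white, black = material(FEN)
print("White:", white, "Black:", black)  # White: 35 Black: 43
```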

Lazy_Frank · Post by **Lazy_Frank** » Mon Feb 20, 2023 6:56 am

...
It is possible that their way causes Stockfish to improve faster than the alternative; I do not know whether that is the case. But the fact that Stockfish is leading does not prove it, and it is possible that the only reason Stockfish is number 1 is that they use more computer time than other people.

I can add that from my point of view understanding is more important and interesting than fast improvement.
It is better to get only a 5 Elo improvement and understand why you get it than to get a 6 Elo improvement without understanding why.

I would prefer that programmers, not only of Stockfish, give positions where the engine does better than the previous version when they release a new engine, rather than leave users to play games and find out in what types of positions the engine plays better moves.

Uri, as I understand it, not every SF developer sees the Stockfish project as building the strongest freely available chess engine.
For some of them it is a platform to show off programming skills, for others a way to be in first place on some list, etc.
After all, SF devs are humans too.

Deal with that; it will make your life easier.

RubiChess · Post by **RubiChess** » Mon Feb 20, 2023 7:33 am

Whiskers wrote: ↑Mon Feb 20, 2023 4:26 am [fen]rrrrkrrr/pppppppp/8/8/8/8/PPPPPPPP/BBBQKBBB b kq - 0 1[/fen]
Does that mean that bishops are worth more than rooks? Of course not! Some positions are simply the exceptions to the rule, but catering to them would make the engine handle the rule worse. It's a bit of a dilemma, and I personally think Stockfish does the best it can.

This position has nothing to do with chess. This is why chess engines don't handle it well.

Lazy_Frank · Post by **Lazy_Frank** » Mon Feb 20, 2023 8:01 am

Whiskers wrote: ↑Mon Feb 20, 2023 4:26 am The most common variant/clone of Stockfish that I've heard of is Crystal, which is Stockfish but designed to be better at solving fortresses, difficult tactical problems, and generally the type of anti-engine problems that people are always trying to fool top engines with. And it is undoubtedly better in that regard.

However, when it comes to actual playing strength, Crystal is weaker, because it prunes less and considers moves that 99.99% of the time are an absolute waste of nodes, and thus doesn't search as deep. Stockfish isn't built for the edge cases, it's built to be the best at the middle of the road normal positions, the ones that come up in our games that we analyze and the games that it plays against top engines.

Engines think Black is better in this position even though in reality White wins easily.

[fen]rrrrkrrr/pppppppp/8/8/8/8/PPPPPPPP/BBBQKBBB b kq - 0 1[/fen]

Does that mean that bishops are worth more than rooks? Of course not! Some positions are simply the exceptions to the rule, but catering to them would make the engine handle the rule worse. It's a bit of a dilemma, and I personally think Stockfish does the best it can.

No exception to the rule for me. A pair of archers/bishops (one on white squares and one on black squares) is a very strong two-piece combination, while rooks are at their worst behind their own pawn, and even more so behind their own pawn chain.
As an experiment, take off two black pawns ...
