I'm disappointed with Stockfish dev.

CornfedForever · Post by **CornfedForever** » Sun Feb 19, 2023 12:41 am

syzygy wrote: ↑Sat Feb 18, 2023 11:34 pm
CornfedForever wrote: ↑Sat Feb 18, 2023 11:33 pm You mention that "New nets only gain like 2-3 elo". Okay, but a new net can also lose elo, right? So when you couple a 'new net' and a 'tweaked engine' (other than the net), how do you know if it's really the net that has lost the elo or the tweak in the engine?
Just like any other patch.

I guess that's what I was getting at with my earlier question. As so many new development versions come with new nets...it seems like it would be difficult to tell which (or if both) resulted in the lost/gained elo.
Just like any other patch.

Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.

Sopel · Post by **Sopel** » Sun Feb 19, 2023 2:08 am

CornfedForever wrote: ↑Sun Feb 19, 2023 12:41 am
syzygy wrote: ↑Sat Feb 18, 2023 11:34 pm
CornfedForever wrote: ↑Sat Feb 18, 2023 11:33 pm You mention that "New nets only gain like 2-3 elo". Okay, but a new net can also lose elo, right? So when you couple a 'new net' and a 'tweaked engine' (other than the net), how do you know if it's really the net that has lost the elo or the tweak in the engine?
Just like any other patch.

I guess that's what I was getting at with my earlier question. As so many new development versions come with new nets...it seems like it would be difficult to tell which (or if both) resulted in the lost/gained elo.
Just like any other patch.

Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.

statistics IS THE PROPER CONTROLS

and of course testing 1 thing at a time is better, that's why 99% of tests test 1 thing at a time. FFS.

RubiChess · Post by **RubiChess** » Sun Feb 19, 2023 8:56 am

CornfedForever wrote: ↑Sat Feb 18, 2023 7:51 pm Picking a BIG, DIVERSE, FIXED set of middle game 'control' positions of various kinds (how you vet this set, I'm not sure...) to test an engine's evaluation...

Send me this fixed set of positions and I will write the perfect engine for you. Well, it will be 'perfect' in your definition...

Uri Blass · Post by **Uri Blass** » Sun Feb 19, 2023 10:59 am

RubiChess wrote: ↑Sun Feb 19, 2023 8:56 am
CornfedForever wrote: ↑Sat Feb 18, 2023 7:51 pm Picking a BIG, DIVERSE, FIXED set of middle game 'control' positions of various kinds (how you vet this set, I'm not sure...) to test an engine's evaluation...
Send me this fixed set of positions and I will write the perfect engine for you. Well, it will be 'perfect' in your definition...

If you do it simply by memorizing the solutions, then it is not good.
If you try to do it with an evaluation that is not just a set of if-then rules for many specific positions, then I suspect you will have trouble getting a perfect engine by his definition.

abgursu · Post by **abgursu** » Sun Feb 19, 2023 11:49 am

CornfedForever wrote: ↑Sun Feb 19, 2023 12:41 am
syzygy wrote: ↑Sat Feb 18, 2023 11:34 pm
CornfedForever wrote: ↑Sat Feb 18, 2023 11:33 pm You mention that "New nets only gain like 2-3 elo". Okay, but a new net can also lose elo, right? So when you couple a 'new net' and a 'tweaked engine' (other than the net), how do you know if it's really the net that has lost the elo or the tweak in the engine?
Just like any other patch.

I guess that's what I was getting at with my earlier question. As so many new development versions come with new nets...it seems like it would be difficult to tell which (or if both) resulted in the lost/gained elo.
Just like any other patch.

Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.

I get what you're saying. I would personally agree if SF developers weren't human and could sit at their computers all day to merge each green test into master immediately. Maybe that could be automated, but some green tests are dubious, special cases, or based on ideas that are not finalized, and automation can't decide which to merge and which not to.

There are also a huge number of failed tests, so merging just one test at a time and rerunning the others again and again against every new base would slow things down enormously. We would have to retest all of them, because reds may turn into greens with a new base too; there is no simple math that tells you which would turn green and which red without testing. That would take months just to finish the current tests, even if no new ideas came in. The SF team covers this with regression tests, which I believe is as good as it gets.

If you believe a net is better than the current net, you can always test it with fishtest or by yourself, and if you still believe yours is better even after the test fails, you can patch it yourself on your own computer.

syzygy · Post by **syzygy** » Sun Feb 19, 2023 5:49 pm

CornfedForever wrote: ↑Sun Feb 19, 2023 12:41 am Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.

So you have no idea how SF development progresses, testing one patch at a time?

Your doctor will never know for sure whether what he gave you was the reason you got better. Doctors work far more with gut feeling than SF developers. This is why SF got so strong.

Uri Blass · Post by **Uri Blass** » Sun Feb 19, 2023 7:28 pm

syzygy wrote: ↑Sun Feb 19, 2023 5:49 pm
CornfedForever wrote: ↑Sun Feb 19, 2023 12:41 am Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
So you have no idea how SF development progresses, testing one patch at a time?

Your doctor will never know for sure whether what he gave you was the reason you got better. Doctors work far more with gut feeling than SF developers. This is why SF got so strong.

It is clear that the Stockfish team does not test one patch at a time, because they often accept several patches at nearly the same time.

As for why SF is so strong, I think the main reason is that people, for some reason, give Stockfish more computer time than other engines.

I would prefer that, after patch A passes, they stop testing patches B and C, update the development version, and only then go back to testing patches B and C against the new development version.

Maybe this would make Stockfish's progress slower, I do not know, but I think the information would be more interesting in that case.

There are many development versions, and it might be interesting if other people tested every development version against the previous one with 100,000 games from a normal book at 60+0.6 with 8 cores, to see whether there are any regressions.
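
For a sense of what a 100,000-game check like this could resolve, here is a rough sketch of the arithmetic. It assumes the usual logistic Elo model and a normal approximation for the error bar. The function names and the win/draw/loss counts are made up for illustration; they are not a real fishtest result or anyone's actual tooling.

```python
import math


def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, lower, upper) for a match result.

    Uses a normal approximation on the mean score; z=1.96 gives
    roughly a 95% interval.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (game outcomes are 1, 0.5 or 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    lower = elo_from_score(score - z * stderr)
    upper = elo_from_score(score + z * stderr)
    return elo_from_score(score), lower, upper


# Hypothetical, heavily drawn 100,000-game match (assumed numbers).
elo, lower, upper = elo_with_margin(wins=15300, draws=70000, losses=14700)
print(f"{elo:+.2f} Elo, approx. 95% interval [{lower:+.2f}, {upper:+.2f}]")
```

With a draw rate like the assumed one, the interval at 100,000 games is on the order of one Elo either side, which is why a gain or regression of 2-3 Elo needs that many games to show up at all.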

mclane · Post by **mclane** » Sun Feb 19, 2023 7:43 pm

Stockfish loses more and more ground.
Only a matter of time until other programs take over.
Maybe they should close sources.

CornfedForever · Post by **CornfedForever** » Sun Feb 19, 2023 7:48 pm

Uri Blass wrote: ↑Sun Feb 19, 2023 7:28 pm
syzygy wrote: ↑Sun Feb 19, 2023 5:49 pm
CornfedForever wrote: ↑Sun Feb 19, 2023 12:41 am Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
So you have no idea how SF development progresses, testing one patch at a time?

Your doctor will never know for sure whether what he gave you was the reason you got better. Doctors work far more with gut feeling than SF developers. This is why SF got so strong.

It is clear that the Stockfish team does not test one patch at a time, because they often accept several patches at nearly the same time.

As for why SF is so strong, I think the main reason is that people, for some reason, give Stockfish more computer time than other engines.

I would prefer that, after patch A passes, they stop testing patches B and C, update the development version, and only then go back to testing patches B and C against the new development version.

Maybe this would make Stockfish's progress slower, I do not know, but I think the information would be more interesting in that case.

There are many development versions, and it might be interesting if other people tested every development version against the previous one with 100,000 games from a normal book at 60+0.6 with 8 cores, to see whether there are any regressions.

That (which I underline) is indeed what I am saying. That gives you better controls, so you are less likely to throw babies out with the bathwater. Also, as for the argument that it would 'take longer'... I am not sure of the merit there, given HOW SLOW the elo progress really has been since the introduction of NNUE. It might take 'longer' (still not sure how it would, given that thousands of full games are run currently...), but ultimately you are more likely to keep the babies and therefore more likely to move forward on more solid ground.

Just my thoughts...people will have different thoughts.

Sopel · Post by **Sopel** » Sun Feb 19, 2023 8:37 pm

Uri Blass wrote: ↑Sun Feb 19, 2023 7:28 pm
syzygy wrote: ↑Sun Feb 19, 2023 5:49 pm
CornfedForever wrote: ↑Sun Feb 19, 2023 12:41 am Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
So you have no idea how SF development progresses, testing one patch at a time?

Your doctor will never know for sure whether what he gave you was the reason you got better. Doctors work far more with gut feeling than SF developers. This is why SF got so strong.
It is clear that the Stockfish team does not test one patch at a time, because they often accept several patches at nearly the same time.

Can you provide some examples of this?
