I'm disappointed with Stockfish dev.

Moderator: Ras

CornfedForever
Posts: 648
Joined: Mon Jun 20, 2022 4:08 am
Full name: Brian D. Smith

Re: I'm disappointed with Stockfish dev.

Post by CornfedForever »

syzygy wrote: Sat Feb 18, 2023 11:34 pm
CornfedForever wrote: Sat Feb 18, 2023 11:33 pm You mention that "New nets only gain like 2-3 elo". Okay, but a new net can also lose elo, right? So when you couple a 'new net' and a 'tweaked engine' (other than the net), how do you know if it's really the net that has lost the elo or the tweak in the engine?
Just like any other patch.
I guess that's what I was getting at with my earlier question. As so many new development versions come with new nets...it seems like it would be difficult to tell which (or if both) resulted in the lost/gained elo.
Just like any other patch.
Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: I'm disappointed with Stockfish dev.

Post by Sopel »

CornfedForever wrote: Sun Feb 19, 2023 12:41 am
syzygy wrote: Sat Feb 18, 2023 11:34 pm
CornfedForever wrote: Sat Feb 18, 2023 11:33 pm You mention that "New nets only gain like 2-3 elo". Okay, but a new net can also lose elo, right? So when you couple a 'new net' and a 'tweaked engine' (other than the net), how do you know if it's really the net that has lost the elo or the tweak in the engine?
Just like any other patch.
I guess that's what I was getting at with my earlier question. As so many new development versions come with new nets...it seems like it would be difficult to tell which (or if both) resulted in the lost/gained elo.
Just like any other patch.
Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
Statistics ARE the proper controls.

And of course testing 1 thing at a time is better; that's why 99% of tests test 1 thing at a time. FFS.
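
To make "statistics as the control" concrete, here is a minimal sketch of a sequential test on one patch's win/draw/loss counts. The thread does not name a specific test; the SPRT with a normal approximation, the function names, the Elo bounds (0 and 2) and the 5% error rates are all assumptions chosen for illustration, not a description of what fishtest actually runs.

```python
import math

def elo_to_score(elo: float) -> float:
    """Expected score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Approximate log-likelihood ratio of H1 (elo1) versus H0 (elo0).

    Uses a normal approximation with the sample variance of game results.
    """
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    if var == 0.0:
        return 0.0  # all games had the same result; no variance estimate yet
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)

def sprt_decision(wins: int, draws: int, losses: int,
                  elo0: float = 0.0, elo1: float = 2.0,
                  alpha: float = 0.05, beta: float = 0.05) -> str:
    """Return 'pass', 'fail' or 'continue' using Wald's stopping bounds."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    if llr >= upper:
        return "pass"
    if llr <= lower:
        return "fail"
    return "continue"
```

The point of the sketch is only that each patch is judged against the current base by a stopping rule with bounded error rates, so "which change gained the Elo" is answered per test rather than by inspection.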
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
RubiChess
Posts: 640
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Re: I'm disappointed with Stockfish dev.

Post by RubiChess »

CornfedForever wrote: Sat Feb 18, 2023 7:51 pm Picking a BIG, DIVERSE, FIXED set of middle game 'control' positions of various kinds (how you vet this set, I'm not sure...) to test an engine's evaluation...
Send me this fixed set of positions and I will write the perfect engine for you. Well, it will be 'perfect' in your definition...
Uri Blass
Posts: 10790
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: I'm disappointed with Stockfish dev.

Post by Uri Blass »

RubiChess wrote: Sun Feb 19, 2023 8:56 am
CornfedForever wrote: Sat Feb 18, 2023 7:51 pm Picking a BIG, DIVERSE, FIXED set of middle game 'control' positions of various kinds (how you vet this set, I'm not sure...) to test an engine's evaluation...
Send me this fixed set of positions and I will write the perfect engine for you. Well, it will be 'perfect' in your definition...
If you do it simply by memorizing the solutions, then it is not good.
If you try to do it with an evaluation that is not just a set of if-then rules for many specific positions, then I suspect you will have trouble getting a perfect engine by his definition.
abgursu
Posts: 92
Joined: Thu May 14, 2020 3:34 pm
Full name: A. B. Gursu

Re: I'm disappointed with Stockfish dev.

Post by abgursu »

CornfedForever wrote: Sun Feb 19, 2023 12:41 am
syzygy wrote: Sat Feb 18, 2023 11:34 pm
CornfedForever wrote: Sat Feb 18, 2023 11:33 pm You mention that "New nets only gain like 2-3 elo". Okay, but a new net can also lose elo, right? So when you couple a 'new net' and a 'tweaked engine' (other than the net), how do you know if it's really the net that has lost the elo or the tweak in the engine?
Just like any other patch.
I guess that's what I was getting at with my earlier question. As so many new development versions come with new nets...it seems like it would be difficult to tell which (or if both) resulted in the lost/gained elo.
Just like any other patch.
Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
I get what you're saying. I would personally agree if SF developers weren't human and could sit at their computers all day to merge a green test into master immediately. That could maybe be automated; however, some green tests are dubious, special, or not yet finalized as ideas, and automation can't decide which to merge and which not to.

Also, there are an incredible number of failed tests, so merging just one test at a time and rerunning the others again and again against every new base would slow things down enormously. We would have to test all of those again, because reds may turn green with a new base too; there is no simple math that tells you which would turn green and which would turn red without testing. That would take months just to finish the current tests, even if no new ideas came in. The SF team covers that with regression tests, which I believe is as good as it gets.

If you believe a net is better than the current net, you can always test it with fishtest or by yourself, and if you still believe yours is better even after the test fails, you can patch it yourself on your own computer.
syzygy
Posts: 5694
Joined: Tue Feb 28, 2012 11:56 pm

Re: I'm disappointed with Stockfish dev.

Post by syzygy »

CornfedForever wrote: Sun Feb 19, 2023 12:41 am Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
So you have no idea how SF development progresses, testing one patch at a time?

Your doctor will never know for sure whether what he gave you was the reason you got better. Doctors work far more with gut feeling than SF developers. This is why SF got so strong.
Uri Blass
Posts: 10790
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: I'm disappointed with Stockfish dev.

Post by Uri Blass »

syzygy wrote: Sun Feb 19, 2023 5:49 pm
CornfedForever wrote: Sun Feb 19, 2023 12:41 am Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
So you have no idea how SF development progresses, testing one patch at a time?

Your doctor will never know for sure whether what he gave you was the reason you got better. Doctors work far more with gut feeling than SF developers. This is why SF got so strong.

It is clear that the stockfish team does not test one patch at a time because they often accept some patches nearly at the same time.

As for why SF is so strong, I think the main reason is that, for some reason, people give Stockfish more computer time than other engines.

I would prefer that, after patch A passes, they stop testing patch B and patch C, update the development version, and only then go back to testing patch B and patch C against the new development version.

Maybe that would make Stockfish's progress slower, I do not know, but I think the information would be more interesting in that case.

There are many development versions, and it might be interesting if other people tested every development version against the previous one with 100,000 games from a normal book at 60+0.6 with 8 cores to see whether there are any regressions.
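
As a rough guide to what a 100,000-game match can resolve, here is a small sketch that turns win/draw/loss counts into an Elo estimate with an approximate 95% interval. The function names and the counts in the usage lines are made up for illustration; the model treats every game as independent, which is a simplifying assumption (paired openings violate it), and it says nothing about the time control or core count.

```python
import math

def score_to_elo(score: float) -> float:
    """Elo difference implied by an average score in (0, 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) from match counts.

    The interval is z standard errors of the mean score, mapped to Elo.
    Assumes independent games and a score strictly between 0 and 1.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Sample variance of a single game result (1, 0.5 or 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)
    # Clamp so the logarithm stays defined for very lopsided results.
    eps = 1e-9
    lo_score = min(max(score - z * se, eps), 1.0 - eps)
    hi_score = min(max(score + z * se, eps), 1.0 - eps)
    return score_to_elo(score), score_to_elo(lo_score), score_to_elo(hi_score)

# Hypothetical result of one 100,000-game regression match.
elo, low, high = elo_with_interval(wins=16000, draws=69000, losses=15000)
print(f"Elo {elo:+.2f}, approx. 95% interval [{low:+.2f}, {high:+.2f}]")
```

With draw rates like the hypothetical one above, the interval at 100,000 games is on the order of one Elo point either way, which is the scale needed to notice the 2-3 Elo changes discussed earlier in the thread.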
mclane
Posts: 18891
Joined: Thu Mar 09, 2006 6:40 pm
Location: US of Europe, germany
Full name: Thorsten Czub

Re: I'm disappointed with Stockfish dev.

Post by mclane »

Stockfish loses more and more ground.
Only a matter of time until other programs take over.
Maybe they should close sources.
What seems like a fairy tale today may be reality tomorrow.
Here we have a fairy tale of the day after tomorrow....
CornfedForever
Posts: 648
Joined: Mon Jun 20, 2022 4:08 am
Full name: Brian D. Smith

Re: I'm disappointed with Stockfish dev.

Post by CornfedForever »

Uri Blass wrote: Sun Feb 19, 2023 7:28 pm
syzygy wrote: Sun Feb 19, 2023 5:49 pm
CornfedForever wrote: Sun Feb 19, 2023 12:41 am Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
So you have no idea how SF development progresses, testing one patch at a time?

Your doctor will never know for sure whether what he gave you was the reason you got better. Doctors work far more with gut feeling than SF developers. This is why SF got so strong.

It is clear that the stockfish team does not test one patch at a time because they often accept some patches nearly at the same time.

As for why SF is so strong, I think the main reason is that, for some reason, people give Stockfish more computer time than other engines.

I would prefer that, after patch A passes, they stop testing patch B and patch C, update the development version, and only then go back to testing patch B and patch C against the new development version.

Maybe that would make Stockfish's progress slower, I do not know, but I think the information would be more interesting in that case.

There are many development versions, and it might be interesting if other people tested every development version against the previous one with 100,000 games from a normal book at 60+0.6 with 8 cores to see whether there are any regressions.
That (the part I underlined) is indeed what I am saying. It gives you better controls, so you are less likely to throw babies out with the bathwater. As for the argument that it would 'take longer', I am not sure of the merit there, given HOW SLOW the Elo progress really has been since the introduction of NNUE. It might take 'longer' (I am still not sure how it would, given that thousands of full games are run currently...), but ultimately you are more likely to keep the babies and therefore more likely to move forward on more solid ground.

Just my thoughts...people will have different thoughts.
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: I'm disappointed with Stockfish dev.

Post by Sopel »

Uri Blass wrote: Sun Feb 19, 2023 7:28 pm
syzygy wrote: Sun Feb 19, 2023 5:49 pm
CornfedForever wrote: Sun Feb 19, 2023 12:41 am Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
So you have no idea how SF development progresses, testing one patch at a time?

Your doctor will never know for sure whether what he gave you was the reason you got better. Doctors work far more with gut feeling than SF developers. This is why SF got so strong.
It is clear that the stockfish team does not test one patch at a time because they often accept some patches nearly at the same time.
Can you provide some examples of this?