I'm disappointed with Stockfish dev.
Moderator: Ras
-
- Posts: 648
- Joined: Mon Jun 20, 2022 4:08 am
- Full name: Brian D. Smith
Re: I'm disappointed with Stockfish dev.
syzygy wrote: ↑Sat Feb 18, 2023 11:34 pm
Just like any other patch.
CornfedForever wrote: ↑Sat Feb 18, 2023 11:33 pm
You mention that "New nets only gain like 2-3 elo". Okay, but a new net can also lose elo, right? So when you couple a 'new net' and a 'tweaked engine' (other than the net), how do you know if it's really the net that has lost the elo or the tweak in the engine?
I guess that's what I was getting at with my earlier question. As so many new development versions come with new nets...it seems like it would be difficult to tell which (or if both) resulted in the lost/gained elo.

Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
-
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: I'm disappointed with Stockfish dev.
CornfedForever wrote: ↑Sun Feb 19, 2023 12:41 am
Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
syzygy wrote: ↑Sat Feb 18, 2023 11:34 pm
Just like any other patch.
CornfedForever wrote: ↑Sat Feb 18, 2023 11:33 pm
You mention that "New nets only gain like 2-3 elo". Okay, but a new net can also lose elo, right? So when you couple a 'new net' and a 'tweaked engine' (other than the net), how do you know if it's really the net that has lost the elo or the tweak in the engine?
I guess that's what I was getting at with my earlier question. As so many new development versions come with new nets...it seems like it would be difficult to tell which (or if both) resulted in the lost/gained elo.

Statistics IS THE PROPER CONTROL.
And of course testing 1 thing at a time is better, that's why 99% of tests test 1 thing at a time. FFS.
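To put rough numbers on the "statistics is the control" point, here is a minimal sketch that turns a win/draw/loss count into an Elo estimate with a 95% interval. It uses the standard logistic Elo formula and a plain normal approximation, which is an assumption on my part and not necessarily what fishtest does internally. The function names and the game counts are made up for illustration.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert a mean score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) using a normal approximation on the mean score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - z * stderr),
            elo_from_score(score + z * stderr))


if __name__ == "__main__":
    # Hypothetical results, not taken from any real test run.
    for w, d, l in [(260, 500, 240), (26000, 50000, 24000)]:
        elo, low, high = elo_with_margin(w, d, l)
        print(f"{w + d + l:>6} games: {elo:+.1f} Elo, "
              f"95% interval [{low:+.1f}, {high:+.1f}]")
```

With the same win/draw/loss proportions, the interval from the 1,000-game sample still includes zero, while the 100,000-game sample no longer does. That is the sense in which a small gain or loss only becomes visible with a lot of games.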
dangi12012 wrote:
No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.
Maybe you copied your stockfish commits from someone else too?
I will look into that.
-
- Posts: 640
- Joined: Fri Mar 30, 2018 7:20 am
- Full name: Andreas Matthies
Re: I'm disappointed with Stockfish dev.
CornfedForever wrote: ↑Sat Feb 18, 2023 7:51 pm
Picking a BIG, DIVERSE, FIXED set of middle game 'control' positions of various kinds (how you vet this set, I'm not sure...) to test an engine's evaluation...

Send me this fixed set of positions and I will write the perfect engine for you. Well, it will be 'perfect' by your definition...
-
- Posts: 10790
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: I'm disappointed with Stockfish dev.
RubiChess wrote: ↑Sun Feb 19, 2023 8:56 am
Send me this fixed set of positions and I will write the perfect engine for you. Well, it will be 'perfect' by your definition...
CornfedForever wrote: ↑Sat Feb 18, 2023 7:51 pm
Picking a BIG, DIVERSE, FIXED set of middle game 'control' positions of various kinds (how you vet this set, I'm not sure...) to test an engine's evaluation...

If you do it simply by memorizing the solutions, then it is not good.
If you try to do it with an evaluation that is not just if-then rules for many specific positions, then I suspect you will have a problem getting a perfect engine by his definition.
-
- Posts: 92
- Joined: Thu May 14, 2020 3:34 pm
- Full name: A. B. Gursu
Re: I'm disappointed with Stockfish dev.
CornfedForever wrote: ↑Sun Feb 19, 2023 12:41 am
Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.
syzygy wrote: ↑Sat Feb 18, 2023 11:34 pm
Just like any other patch.
CornfedForever wrote: ↑Sat Feb 18, 2023 11:33 pm
You mention that "New nets only gain like 2-3 elo". Okay, but a new net can also lose elo, right? So when you couple a 'new net' and a 'tweaked engine' (other than the net), how do you know if it's really the net that has lost the elo or the tweak in the engine?
I guess that's what I was getting at with my earlier question. As so many new development versions come with new nets...it seems like it would be difficult to tell which (or if both) resulted in the lost/gained elo.

I get what you're saying. I would personally agree with that if SF developers weren't human and could wait at their computers all day to merge a green test into master immediately. Maybe that could be automated; however, some green tests are dubious, special, or not yet finalized as ideas, and automation can't decide which to patch and which not to. Also, there are an incredible number of failed tests, so patching just one test at a time and repeating all the others again and again with every new base would slow things down enormously. We would have to test all of those again because reds may turn into greens with a new base too; there is no straightforward math that tells you which would turn green and which would turn red without testing. And that would take months just to finish the current tests, even if no new ideas came in. The SF team covers for that with regression tests, which I believe is as good as it gets. If you believe a net is better than the current net, you can always test it with fishtest or by yourself, and if you still believe yours is better even after the test fails, then you can patch it yourself on your own computer.
-
- Posts: 5694
- Joined: Tue Feb 28, 2012 11:56 pm
Re: I'm disappointed with Stockfish dev.
CornfedForever wrote: ↑Sun Feb 19, 2023 12:41 am
Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.

So you have no idea how SF development progresses, testing one patch at a time?
Your doctor will never know for sure whether what he gave you was the reason you got better. Doctors work far more with gut feeling than SF developers. This is why SF got so strong.
-
- Posts: 10790
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: I'm disappointed with Stockfish dev.
syzygy wrote: ↑Sun Feb 19, 2023 5:49 pm
So you have no idea how SF development progresses, testing one patch at a time?
Your doctor will never know for sure whether what he gave you was the reason you got better. Doctors work far more with gut feeling than SF developers. This is why SF got so strong.
CornfedForever wrote: ↑Sun Feb 19, 2023 12:41 am
Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.

It is clear that the Stockfish team does not test one patch at a time, because they often accept some patches at nearly the same time.
As for why SF is so strong, I think the main reason is that, for some reason, people give Stockfish more computer time than other engines.
I would prefer that, after patch A passes, they stop testing patch B and patch C, update the development version, and only then go back to testing patch B and patch C against the new development version.
Maybe it would make Stockfish's progress slower, I do not know, but I think that in this case the information would be more interesting.
There are many development versions, and maybe it would be interesting if other people tested every development version against the previous one with 100,000 games from a normal book at 60+0.6 with 8 cores to see if there are any regressions.
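For a feel of what a 100,000-game match could resolve, here is a back-of-the-envelope sketch. It assumes the two versions are close to equal, uses the logistic Elo curve, and takes a draw ratio of 0.6 purely as a placeholder (the real draw ratio at 60+0.6 is not given in this thread). The helper names are hypothetical.

```python
import math

# Slope of the logistic Elo curve at a 50% score: d(elo)/d(score) = 1600 / ln(10).
ELO_PER_SCORE = 1600.0 / math.log(10)


def margin_for_games(n: int, draw_ratio: float = 0.6, z: float = 1.96) -> float:
    """Approximate 95% error bar in Elo after n games between near-equal engines."""
    # Near a 50% score, the per-game variance is (1 - draw_ratio) / 4.
    sigma = math.sqrt((1.0 - draw_ratio) / 4.0)
    return z * sigma / math.sqrt(n) * ELO_PER_SCORE


def games_needed(elo_margin: float, draw_ratio: float = 0.6, z: float = 1.96) -> int:
    """Approximate number of games for the error bar to shrink to +/- elo_margin."""
    sigma = math.sqrt((1.0 - draw_ratio) / 4.0)
    score_margin = elo_margin / ELO_PER_SCORE
    return math.ceil((z * sigma / score_margin) ** 2)


if __name__ == "__main__":
    print(f"100,000 games: about +/- {margin_for_games(100_000):.2f} Elo")
    for target in (2.0, 1.0, 0.5):
        print(f"+/- {target} Elo needs roughly {games_needed(target):,} games")
```

Under these assumptions the error bar after 100,000 games comes out between 1 and 2 Elo, and halving the error bar costs about four times as many games. So a regression of a couple of Elo between two development versions would sit right at the edge of what such a match can show.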
-
- Posts: 18890
- Joined: Thu Mar 09, 2006 6:40 pm
- Location: US of Europe, germany
- Full name: Thorsten Czub
Re: I'm disappointed with Stockfish dev.
Stockfish loses more and more ground.
Only a matter of time until other programs take over.
Maybe they should close sources.
What seems like a fairy tale today may be reality tomorrow.
Here we have a fairy tale of the day after tomorrow....
-
- Posts: 648
- Joined: Mon Jun 20, 2022 4:08 am
- Full name: Brian D. Smith
Re: I'm disappointed with Stockfish dev.
Uri Blass wrote: ↑Sun Feb 19, 2023 7:28 pm
It is clear that the Stockfish team does not test one patch at a time, because they often accept some patches at nearly the same time.
As for why SF is so strong, I think the main reason is that, for some reason, people give Stockfish more computer time than other engines.
I would prefer that, after patch A passes, they stop testing patch B and patch C, update the development version, and only then go back to testing patch B and patch C against the new development version.
Maybe it would make Stockfish's progress slower, I do not know, but I think that in this case the information would be more interesting.
There are many development versions, and maybe it would be interesting if other people tested every development version against the previous one with 100,000 games from a normal book at 60+0.6 with 8 cores to see if there are any regressions.
syzygy wrote: ↑Sun Feb 19, 2023 5:49 pm
So you have no idea how SF development progresses, testing one patch at a time?
Your doctor will never know for sure whether what he gave you was the reason you got better. Doctors work far more with gut feeling than SF developers. This is why SF got so strong.
CornfedForever wrote: ↑Sun Feb 19, 2023 12:41 am
Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.

That (which I underline) is indeed what I am saying. That gives you better controls, so you are less likely to throw babies out with the bathwater. Also, the argument that it would 'take longer'...I am not sure of the merit there if you look at HOW SLOW the elo progress really has been since the introduction of NNUE. It might take 'longer' (still not sure how it would, given that thousands of full games are run currently...), but ultimately you are more likely to keep the babies and therefore more likely to move forward on more solid ground.
Just my thoughts...people will have different thoughts.
-
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: I'm disappointed with Stockfish dev.
Uri Blass wrote: ↑Sun Feb 19, 2023 7:28 pm
It is clear that the Stockfish team does not test one patch at a time, because they often accept some patches at nearly the same time.
syzygy wrote: ↑Sun Feb 19, 2023 5:49 pm
So you have no idea how SF development progresses, testing one patch at a time?
Your doctor will never know for sure whether what he gave you was the reason you got better. Doctors work far more with gut feeling than SF developers. This is why SF got so strong.
CornfedForever wrote: ↑Sun Feb 19, 2023 12:41 am
Perhaps so...but that sounds more and more like an excuse for not having 'proper controls' in place. I mean, when you are allergic to something, a good doctor will have you stop or do '1 thing at a time' until you get to the actual reason for the allergy. To do otherwise would risk throwing babies out with the bathwater.

Can you provide some examples of this?