I'm disappointed with Stockfish dev.

connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: I'm disappointed with Stockfish dev.

Post by connor_mcmonigle »

CornfedForever wrote: Sun Mar 12, 2023 6:59 pm
DrEinstein wrote: Sun Mar 12, 2023 11:12 am So we are all patiently waiting for the next big jump to the bigger hill. I believe, or want to believe, that Stockfish is not yet standing on top of the highest mountain.
I wonder how one might define the next 'big jump'. All the 'big jumps' have likely come and gone, as engine strength is closer to topping out. What is left are likely 'little jumps'. The issue I (and I think others, but I do not speak for them) see is that those are harder to find, and under the traditional testing framework it is probably harder these days to know which tweaks are actually responsible for those very little jumps, if only because they fall closer to the 'margin of error'. You get a '+' and presume you 'have it' when it is part of multiple 'patches' working together...then later we find something in the tweaks/patches being disregarded or at least changed. And some people do not seem to want to admit to seeing this 2 steps forward, 1 step back / 1 step forward, 2 steps back thing happening. But it is a viable 'blind approach' that can work over time.

I (like to think) I know a little about quantum physics. There, reality is just so 'odd' that no one currently fully understands it...you just "follow the math" into the darkness. Chess, though, is a different animal, as we know there are 'only' 10 to the 40 legal moves possible in a game, you play it on only 64 squares, and Knights do not move like Bishops...etc.

Sure, you can see VERY slow, incremental progress with the path being taken (and steps backward...). However, being at a bit of a loss for exactly what tweak 'works' means it resembles more 'wishcraft' than science - throwing things against the wall and hoping 'something' sticks (and often not knowing exactly what or why it stuck). It's almost like blindly taking herbs to combat Covid-19 until you eventually find in your testing a statistical 'hit' that seems to indicate 'something' in one of those herbs resulted in a tiny number of people not dying who otherwise might have...vs identifying 'what' specific thing in a given herb actually is responsible and using that...or looking at things differently and finding a spike protein and using it to alert the body's immune system to respond to something that looks like it...or viral vector technologies for dealing with other diseases, etc. Wishcraft vs Science. Both can work...but with one you tend to know 'why' it is working...which in theory should mean 'fewer steps back'.
That's a lot of words to say very little. Most search patches tested on fishtest are motivated by some understanding of how Stockfish's search works. The patches aren't written by a bunch of monkeys at typewriters. Parameter tweaks are usually the result of SPSA tuning. There's not much to be learned from the fact that 482 is better than 480, but it's certainly not a random/unmotivated change.

In any case, it's entirely unclear what your proposed alternative to the current testing methodology is, which makes any substantive conversation impossible. You seem to claim the current testing methodology is no longer viable (without any evidence beyond the anecdotal), but you haven't even proposed an alternative.
syzygy
Posts: 5693
Joined: Tue Feb 28, 2012 11:56 pm

Re: I'm disappointed with Stockfish dev.

Post by syzygy »

CornfedForever wrote: Sun Mar 12, 2023 6:59 pm
However, being at a bit of a loss for exactly what tweak 'works' means it resembles more 'wishcraft' than science - throwing things against the wall and hoping 'something' sticks (and often not knowing exactly what or why it stuck).
So tell us how we can establish with 100% certainty whether a +0.5 Elo patch is indeed an improvement.
Rebel
Posts: 7299
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: I'm disappointed with Stockfish dev.

Post by Rebel »

syzygy wrote: Sun Mar 12, 2023 9:52 pm
CornfedForever wrote: Sun Mar 12, 2023 6:59 pm
However, being at a bit of a loss for exactly what tweak 'works' means it resembles more 'wishcraft' than science - throwing things against the wall and hoping 'something' sticks (and often not knowing exactly what or why it stuck).
So tell us how we can establish with 100% certainty whether a +0.5 Elo patch is indeed an improvement.
http://rebel13.nl/text/example1.html

100K games looks pretty reliable.
90% of coding is debugging, the other 10% is writing bugs.
syzygy
Posts: 5693
Joined: Tue Feb 28, 2012 11:56 pm

Re: I'm disappointed with Stockfish dev.

Post by syzygy »

Rebel wrote: Mon Mar 13, 2023 12:04 am
syzygy wrote: Sun Mar 12, 2023 9:52 pm
CornfedForever wrote: Sun Mar 12, 2023 6:59 pm
However, being at a bit of a loss for exactly what tweak 'works' means it resembles more 'wishcraft' than science - throwing things against the wall and hoping 'something' sticks (and often not knowing exactly what or why it stuck).
So tell us how we can establish with 100% certainty whether a +0.5 Elo patch is indeed an improvement.
http://rebel13.nl/text/example1.html

100K games looks pretty reliable.
Not really. Still "wishcraft" according to that guy's definition.
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: I'm disappointed with Stockfish dev.

Post by Sopel »

Rebel wrote: Mon Mar 13, 2023 12:04 am
syzygy wrote: Sun Mar 12, 2023 9:52 pm
CornfedForever wrote: Sun Mar 12, 2023 6:59 pm
However, being at a bit of a loss for exactly what tweak 'works' means it resembles more 'wishcraft' than science - throwing things against the wall and hoping 'something' sticks (and often not knowing exactly what or why it stuck).
So tell us how we can establish with 100% certainty whether a +0.5 Elo patch is indeed an improvement.
http://rebel13.nl/text/example1.html

100K games looks pretty reliable.
With 100k games, +0.5 Elo will likely mean around 90% chance that it's positive Elo (depends on draw rate). That's worse than current Stockfish practice that's being debated.
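
A rough way to check how that figure moves with the draw rate is sketched below. It assumes independent games, the standard logistic Elo model, a normal approximation, and a flat prior on the true Elo; none of these assumptions come from the thread, and the function names are illustrative only.

```python
import math


def elo_to_score(elo):
    """Expected score for an Elo advantage under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_positive(measured_elo, games, draw_rate):
    """Approximate probability that the true Elo gain is above zero,
    given a measured gain over `games` independent games."""
    score = elo_to_score(measured_elo)
    # Per-game outcomes are 1 (win), 0.5 (draw), 0 (loss).
    win_rate = score - draw_rate / 2.0
    second_moment = win_rate + 0.25 * draw_rate
    variance = second_moment - score ** 2
    std_error = math.sqrt(variance / games)
    return normal_cdf((score - 0.5) / std_error)


if __name__ == "__main__":
    for draw_rate in (0.0, 0.6, 0.8, 0.9):
        p = prob_positive(0.5, 100_000, draw_rate)
        print(f"draw rate {draw_rate:.0%}: P(true Elo > 0) ~ {p:.3f}")
```

Under these assumptions the probability rises with the draw rate, which is why the "around 90%" estimate is tied to how drawish the match is.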
Rebel
Posts: 7299
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: I'm disappointed with Stockfish dev.

Post by Rebel »

Sopel wrote: Mon Mar 13, 2023 2:51 pm
Rebel wrote: Mon Mar 13, 2023 12:04 am
syzygy wrote: Sun Mar 12, 2023 9:52 pm
CornfedForever wrote: Sun Mar 12, 2023 6:59 pm
However, being at a bit of a loss for exactly what tweak 'works' means it resembles more 'wishcraft' than science - throwing things against the wall and hoping 'something' sticks (and often not knowing exactly what or why it stuck).
So tell us how we can establish with 100% certainty whether a +0.5 Elo patch is indeed an improvement.
http://rebel13.nl/text/example1.html

100K games looks pretty reliable.
With 100k games, +0.5 Elo will likely mean around 90% chance that it's positive Elo (depends on draw rate). That's worse than current Stockfish practice that's being debated.
With today's draw rates you can evaluate differently.

A hypothetical 100K match may end in 50137-49863 (50.1%), traditionally meaning less than 1 Elo of progress, but you can also look at the 137 more won games and consider the version better?
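
For reference, the "less than 1 Elo" reading of that score can be reproduced with the usual logistic conversion. This is only a sketch of the arithmetic; the function name is made up for illustration.

```python
import math


def score_to_elo(score):
    """Elo difference implied by a match score fraction (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    points_a, points_b = 50137, 49863  # the hypothetical match result above
    score = points_a / (points_a + points_b)
    print(f"score {score:.5f} -> {score_to_elo(score):+.2f} Elo")
```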
syzygy
Posts: 5693
Joined: Tue Feb 28, 2012 11:56 pm

Re: I'm disappointed with Stockfish dev.

Post by syzygy »

Rebel wrote: Mon Mar 13, 2023 5:15 pm
Sopel wrote: Mon Mar 13, 2023 2:51 pm
Rebel wrote: Mon Mar 13, 2023 12:04 am
syzygy wrote: Sun Mar 12, 2023 9:52 pm
CornfedForever wrote: Sun Mar 12, 2023 6:59 pm
However, being at a bit of a loss for exactly what tweak 'works' means it resembles more 'wishcraft' than science - throwing things against the wall and hoping 'something' sticks (and often not knowing exactly what or why it stuck).
So tell us how we can establish with 100% certainty whether a +0.5 Elo patch is indeed an improvement.
http://rebel13.nl/text/example1.html

100K games looks pretty reliable.
With 100k games, +0.5 Elo will likely mean around 90% chance that it's positive Elo (depends on draw rate). That's worse than current Stockfish practice that's being debated.
With today's draw rates you can evaluate differently.

A hypothetical 100K match may end in 50137-49863 (50.1%), traditionally meaning less than 1 Elo of progress, but you can also look at the 137 more won games and consider the version better?
You are certainly right that, with a high draw rate, a smaller W-L margin is enough to reach the same confidence that version A is better than version B than with a low draw rate.
274 wins, 99726 draws, 0 losses -> high confidence that A>B
50137 wins, 0 draws, 49863 losses -> low confidence that A>B
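
The contrast between these two results can be put into numbers with the common "likelihood of superiority" (LOS) approximation, in which draws carry no information and only wins and losses matter. The sketch below assumes independent games and a normal approximation; it is not taken from fishtest, and the function name is illustrative.

```python
import math


def los(wins, losses):
    """Likelihood of superiority: approximate probability that A is
    stronger than B, using only decisive games (normal approximation)."""
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, no evidence either way
    z = (wins - losses) / math.sqrt(decisive)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Same W-L margin of 274, very different draw rates.
    print(f"274 W, 99726 D, 0 L     -> LOS {los(274, 0):.4f}")
    print(f"50137 W, 0 D, 49863 L   -> LOS {los(50137, 49863):.4f}")
```

Both matches have the same W-L of 274, but the margin is measured against 274 decisive games in the first case and 100000 in the second, which is what drives the difference in confidence.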

But there is probably quite a bit of noise in fishtest (leading to wins and losses for both sides, so lower draw rate but W/L remaining the same -> less confidence at same number of games, more games needed). Both SPRT and fixed number of games should suffer from this.

I assume the various statistical models being used take this into account, but I'm not sure (I never broke my head on it, as we say in Dutch :D).

(Of course the point remains that absolute certainty does not exist. But let's wait for Cornfed to enlighten us on how to make the mathematically impossible possible.)
CornfedForever
Posts: 648
Joined: Mon Jun 20, 2022 4:08 am
Full name: Brian D. Smith

Re: I'm disappointed with Stockfish dev.

Post by CornfedForever »

syzygy wrote: Mon Mar 13, 2023 9:04 pm
(Of course the point remains that absolute certainty does not exist. But let's wait for Cornfed to enlighten us on how to make the mathematically impossible possible.)
Dude, nothing is 100% certain. Just stop it with the straw-man.
syzygy
Posts: 5693
Joined: Tue Feb 28, 2012 11:56 pm

Re: I'm disappointed with Stockfish dev.

Post by syzygy »

CornfedForever wrote: Mon Mar 13, 2023 10:59 pm
syzygy wrote: Mon Mar 13, 2023 9:04 pm
(Of course the point remains that absolute certainty does not exist. But let's wait for Cornfed to enlighten us on how to make the mathematically impossible possible.)
Dude, nothing is 100% certain. Just stop it with the straw-man.
Hey, you were the wishcraft guy... Did you lose it?
AlexChess
Posts: 1562
Joined: Sat Feb 06, 2021 8:06 am
Full name: Alex Morales

Re: I'm disappointed with Stockfish dev.

Post by AlexChess »

syzygy wrote: Mon Mar 13, 2023 11:30 pm
CornfedForever wrote: Mon Mar 13, 2023 10:59 pm
syzygy wrote: Mon Mar 13, 2023 9:04 pm
(Of course the point remains that absolute certainty does not exist. But let's wait for Cornfed to enlighten us on how to make the mathematically impossible possible.)
Dude, nothing is 100% certain. Just stop it with the straw-man.
Hey, you were the wishcraft guy... Did you lose it?
What about simply reintroducing a much-improved Contempt option, so that a Threadripper SF boosted to 128 threads and hitting 75 MN/s does not so often draw against a Raspberry Pi 3 SF calculating only 79 kN/s? :wink: