Recent Stockfish Development Version (Jan26)

ernest
Posts: 2053
Joined: Wed Mar 08, 2006 8:30 pm

Post by ernest »

From Stockfish Jan 26 (abrok/stockfish)
I wonder, why did they allow that STC Elo -6,39 ???
---------------------------
Author: Michael Chaly
Date: Fri Jan 26 20:55:16 2024 +0100
Timestamp: 1706298916

Do more double extensions

Parameter tweak from Black Marlin chess engine. Choose a significantly
lower value that triggers in 95% of cases, compared to the usual 84% in
standard benchmark runs.

Since the introduction by
https://github.com/official-stockfish/S ... ba36aca92e
this constant has only decreased in value over time.
2-16-17-18-21-22-25-26-52-71-75-93-140

Failed STC really fast:
https://tests.stockfishchess.org/tests/ ... 0db026df7b
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 13216 W: 3242 L: 3485 D: 6489 Elo -6.39
Ptnml(0-2): 50, 1682, 3371, 1471, 34

Was reasonable at LTC:

https://tests.stockfishchess.org/tests/ ... 0db026e210
Elo: 1.18 ± 1.5 (95%) LOS: 94.3%
Total: 50000 W: 12517 L: 12347 D: 25136 Elo +1.18
Ptnml(0-2): 31, 5598, 13579, 5754, 38
nElo: 2.45 ± 3.0 (95%) PairsRatio: 1.03
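
For anyone who wants to check the numbers in the commit message: the Elo figures follow from the W/L/D counts through the usual logistic formula, and the (-2.94, 2.94) LLR bounds are what a Wald SPRT gives with 5% error rates on each side. A minimal Python sketch; the function names are my own, and alpha = beta = 0.05 is an assumption inferred from the 2.94 bounds, not something stated in the commit:

```python
import math

def elo_from_wld(wins, losses, draws):
    """Logistic Elo difference implied by a win/loss/draw record."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # average points per game
    return -400.0 * math.log10(1.0 / score - 1.0)

def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald SPRT stopping bounds on the log-likelihood ratio.

    alpha and beta are assumed values, chosen because they reproduce
    the (-2.94, 2.94) bounds shown in the test output.
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper

# W/L/D counts copied from the two tests quoted above
stc = elo_from_wld(3242, 3485, 6489)
ltc = elo_from_wld(12517, 12347, 25136)
lower, upper = sprt_bounds()

print(f"STC Elo: {stc:+.2f}")
print(f"LTC Elo: {ltc:+.2f}")
print(f"LLR bounds: ({lower:.2f}, {upper:.2f})")
```

The STC run stopped when its LLR hit the lower bound, which is why it "failed really fast" after only 13216 games, while the LTC run was a fixed 50000 games.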
Ciekce
Posts: 197
Joined: Sun Oct 30, 2022 5:26 pm
Full name: Conor Anstey

Post by Ciekce »

ernest wrote: Wed Feb 07, 2024 3:48 am I wonder, why did they allow that STC Elo -6,39 ???
because it gained at longer time controls

that should be fairly obvious, no?
CornfedForever
Posts: 650
Joined: Mon Jun 20, 2022 4:08 am
Full name: Brian D. Smith

Post by CornfedForever »

ernest wrote: Wed Feb 07, 2024 3:48 am From Stockfish Jan 26 (abrok/stockfish)
I wonder, why did they allow that STC Elo -6,39 ???
While I am certainly not the person to answer that....perhaps because it was so beneficial for VLTC and VVLTC?

I mean, by its very nature, STC is going to be more shallow and less precise than those two.

In any case, I've always said they are engaging in 'wishcraft' more often than they would like to admit - throwing this and that at the wall, hoping a quarter of an elo sticks, taking something out, putting something (back) in.

That's not to say it's a stupid approach of course. Ponderous for sure; but it gets them from point a to b to c...sometimes with half a step back and sometimes, like a snail, they get lucky and stumble down the slopes a bit quicker.
Whiskers
Posts: 246
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Post by Whiskers »

ernest wrote: Wed Feb 07, 2024 3:48 am From Stockfish Jan 26 (abrok/stockfish)
I wonder, why did they allow that STC Elo -6,39 ???

The Stockfish devs care way more about VLTC strength than STC because even the CCRL blitz time control (120" + 1.2") is considered a "very long time control" by testing standards, and pretty much all important chess engine tournaments are played at longer time controls than that (except for CCC blitz, which is "merely" LTC length, but the huge number of threads more than makes up for it).

As for home analysis, a 120" + 1.2" time control is usually not going to correspond to more than 10 seconds spent on a particular move. How long do you let Stockfish think when you're analyzing at home? Probably closer to a minute. So home analysis is also "VLTC time control".

In general VLTC is a much more useful time control to us than STC, so it makes sense to sacrifice some STC elo in the name of VLTC elo.
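
The "not more than 10 seconds per move" estimate can be sanity-checked with simple arithmetic. A rough Python sketch, assuming time is spread evenly over the game; the game lengths of 40, 60 and 80 moves are illustrative guesses, and real engines do not allocate time this evenly:

```python
def avg_seconds_per_move(base_seconds, increment_seconds, moves):
    """Average thinking time per move if the clock is spread evenly.

    Assumes the whole base time is used over `moves` moves and the
    increment is spent on every move. This is a simplification, not
    how any engine's time manager actually works.
    """
    return base_seconds / moves + increment_seconds

# 120" + 1.2" as in the post; the game lengths are hypothetical
for moves in (40, 60, 80):
    print(moves, round(avg_seconds_per_move(120, 1.2, moves), 2))
```

Under these assumptions the average stays well below 10 seconds for every game length tried, so only an unusually heavy allocation on a single move would get near that figure.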
Ciekce
Posts: 197
Joined: Sun Oct 30, 2022 5:26 pm
Full name: Conor Anstey

Post by Ciekce »

CornfedForever wrote: Wed Feb 07, 2024 5:33 am In any case, I've always said they are engaging in 'wishcraft' more often than they would like to admit - throwing this and that at the wall, hoping a quarter of an elo sticks, taking something out, putting something (back) in.
I really do not get the need that people often seem to have, to comment on the SF development process as if they know better.
Jouni
Posts: 3741
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Post by Jouni »

The 26.1. version was a regression in the NCM, Pohl and MCERL lists. Is testing below 180 + 1,8 soon useless?
ernest
Posts: 2053
Joined: Wed Mar 08, 2006 8:30 pm

Post by ernest »

Jouni wrote: Wed Feb 07, 2024 2:36 pm 26.1. was regression in NCM, Pohl and MCERL lists. Testing below 180 + 1,8 soon useless?
Thanks, Jouni !
That was really the reason for my question... 8-)
CornfedForever
Posts: 650
Joined: Mon Jun 20, 2022 4:08 am
Full name: Brian D. Smith

Post by CornfedForever »

Ciekce wrote: Wed Feb 07, 2024 1:43 pm
CornfedForever wrote: Wed Feb 07, 2024 5:33 am In any case, I've always said they are engaging in 'wishcraft' more often than they would like to admit - throwing this and that at the wall, hoping a quarter of an elo sticks, taking something out, putting something (back) in.
I really do not get the need that people often seem to have, to comment on the SF development process as if they know better.
Perhaps you should have read the REST of my comment? The OP was asking about the loss of elo at STC. elo comes...and...it goes with these developmental versions...it's in the nature of the testing. You can't argue otherwise. What's important is that the progress continues to slope up in the long run.
Ciekce
Posts: 197
Joined: Sun Oct 30, 2022 5:26 pm
Full name: Conor Anstey

Post by Ciekce »

CornfedForever wrote: Thu Feb 08, 2024 7:05 am Perhaps you should have read the REST of my comment? The OP was asking about the loss of elo at STC. elo comes...and...it goes with these developmental versions...it's in the nature of the testing. You can't argue otherwise. What's important is that the progress continues to slope up in the long run.
I did read the rest of your comment. It wasn't relevant to the point I was making, so I didn't quote it.
Jouni wrote: Wed Feb 07, 2024 2:36 pm 26.1. was regression in NCM, Pohl and MCERL lists. Testing below 180 + 1,8 soon useless?
as you've been told an *embarrassing* number of times now, you are mistaking noise for a regression and need to learn what an error bar is, because apparently it's not within anyone's power to teach you
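
To make the error-bar point concrete, here is a hedged sketch that recomputes the "1.18 ± 1.5 (95%)" line of the LTC test in the opening post from its Ptnml counts. It assumes the five Ptnml buckets are game pairs scoring 0, 0.5, 1, 1.5 and 2 points, and it uses a plain normal approximation; the function names are mine and this is not fishtest's actual code:

```python
import math

def elo(score):
    """Logistic Elo difference for an average per-game score."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_error(ptnml, z=1.96):
    """Elo estimate and 95% half-width from game-pair counts.

    ptnml[i] is assumed to be the number of game pairs that scored
    i/2 points out of 2.
    """
    pairs = sum(ptnml)
    per_game = [i / 4.0 for i in range(5)]  # pair score rescaled to 0..1
    mean = sum(n * x for n, x in zip(ptnml, per_game)) / pairs
    var = sum(n * (x - mean) ** 2 for n, x in zip(ptnml, per_game)) / pairs
    stderr = math.sqrt(var / pairs)  # standard error of the mean score
    low, high = elo(mean - z * stderr), elo(mean + z * stderr)
    return elo(mean), (high - low) / 2.0

# Ptnml(0-2) counts from the LTC test in the opening post
estimate, half_width = elo_with_error([31, 5598, 13579, 5754, 38])
print(f"Elo: {estimate:.2f} +/- {half_width:.1f}")
```

It took 50000 games to get the interval down to about ±1.5 Elo. Since the half-width shrinks roughly with the square root of the number of games, a run a tenth the size would have a bar about three times wider, so a difference of a few Elo between two versions on a small list can easily be noise.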
Jouni
Posts: 3741
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Post by Jouni »

The 11.2. version now has a confirmed -4 Elo regression. There is a big discussion on Discord about which patch is causing it. One guess: "triple ext regresses"?