Stockfish 14.1 regression

Discussion of anything and everything relating to chess-playing software and machines.

Moderator: Ras

carldaman
Posts: 2287
Joined: Sat Jun 02, 2012 2:13 am

Stockfish 14.1 regression

Post by carldaman »

I know this possibility has been brought up since the release of SF14.1, but with CCRL and CEGT results out, it's now looking more like a certainty. It's not too surprising, either, given the odd buggy behavior where SF struggles to win while a queen up(!), and I wonder how much of this disappointing regression comes from that flaw alone.

Still, one also has to wonder about the testing procedures that the SF framework is using. It all seems to be geared towards quantity (of games) rather than quality, or else something like this wouldn't have been released so quickly.

If I were the Komodo developers, I would be all smiles. :) It seems that commercial engines aren't dead after all!
connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: Stockfish 14.1 regression

Post by connor_mcmonigle »

You seem to have misinterpreted CCRL's results. After 697 SF 14.1 games, CCRL Blitz has SF 14.1 at 3684 Elo (while SF 14 is at 3656 Elo on the CCRL Blitz list). Even with 697 games, the CI remains too wide to really say anything with confidence, though it does seem likely SF 14.1 is a fair bit stronger. CCRL 40/15 places SF 14.1 at 3503 Elo after only 330 games and SF 14 at 3505 Elo. Suggesting a -2 Elo regression when the 95% CI is at +/-30 is ridiculous.

Please try to do more research prior to posting misinformation.
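To make the confidence-interval point concrete, here is a minimal Python sketch, assuming a single head-to-head match summarized as wins/draws/losses and a plain normal approximation on the mean score. Rating lists fit many opponents jointly, so their published error bars come out wider than this. The function names are illustrative only, not taken from any rating tool.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_ci(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) using a normal approximation on the mean score.

    z = 1.96 gives an approximate 95% interval.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Sample variance of a single game's score (1, 0.5 or 0 points).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(var / games)
    low, high = score - z * stderr, score + z * stderr
    return elo_from_score(score), elo_from_score(low), elo_from_score(high)


if __name__ == "__main__":
    # Hypothetical match: 48 wins, 232 draws, 50 losses (330 games).
    elo, low, high = elo_with_ci(48, 232, 50)
    print(f"{elo:+.1f} Elo, 95% CI [{low:+.1f}, {high:+.1f}]")
```

With those hypothetical counts the point estimate is roughly -2 Elo, but the interval runs from about -23 to +18, so a -2 result over 330 drawish games says almost nothing about which engine is stronger.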
Werner
Posts: 3011
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: Stockfish 14.1 regression

Post by Werner »

CEGT results
40/20
Rank Name                         Elo   +   -  games  score  oppo.  draws
2    Stockfish 14.0NNUE x64 1CPU 3583  14  14   1921  65.4%   3468  67.0%
3    Stockfish 14.1NNUE x64 1CPU 3578  16  16   1161  60.8%   3498  76.1%
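As a rough cross-check of the 40/20 rows (assuming the last three columns are score, average opponent rating and draw rate), the standard logistic formula turns a score and an average opponent rating into a performance rating. This is only an approximation: the list's own rating software fits all games jointly and models draws its own way, so a gap of a few points from the listed figures is expected.

```python
import math


def performance_rating(score: float, avg_opponent: float) -> float:
    """Approximate performance rating from a score fraction in (0, 1)."""
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # Figures copied from the CEGT 40/20 rows above.
    sf14 = performance_rating(0.654, 3468)   # listed: 3583
    sf141 = performance_rating(0.608, 3498)  # listed: 3578
    print(f"SF 14.0 ~ {sf14:.0f}, SF 14.1 ~ {sf141:.0f}, gap ~ {sf141 - sf14:+.1f}")
```

Both estimates land a few points under the listed ratings, and the gap between them (roughly -4) is close to the listed -5.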

40/4
Stockfish 14.1 NNUE x64 1CPU = ca. Elo 3685 after 2000 games (+1 vs. 20211021 / +27 vs. 14.0)
Stockfish 20211021 NNUE x64 1CPU = ca. Elo 3684 after 2000 games
Stockfish 14.0 NNUE x64 1CPU = Elo 3658 after 3500 games
Werner
carldaman
Posts: 2287
Joined: Sat Jun 02, 2012 2:13 am

Re: Stockfish 14.1 regression

Post by carldaman »

connor_mcmonigle wrote: Sun Nov 07, 2021 10:34 pm You seem to have misinterpreted CCRL's results. After 697 SF 14.1 games, CCRL Blitz has SF 14.1 at 3684 Elo (while SF 14 is at 3656 Elo on the CCRL Blitz list). Even with 697 games, the CI remains too wide to really say anything with confidence, though it does seem likely SF 14.1 is a fair bit stronger. CCRL 40/15 places SF 14.1 at 3503 Elo after only 330 games and SF 14 at 3505 Elo. Suggesting a -2 Elo regression when the 95% CI is at +/-30 is ridiculous.

Please try to do more research prior to posting misinformation.
A regression is suggested by the numbers as well as the engine's behavior. I'm only calling it like I see it.
Of course, it could be a slight and negligible regression in the end, or maybe none at all at faster time controls.
I'm more interested in LTC, anyway, and those are the results I checked.

We are so quick-triggered these days to label things as 'misinformation'. Let's see if the claim is wrong first!
CCRL only requires 200 LTC games to make a rating 'official', and 330 is much more than that, while CEGT has even more games.
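Whether a sample counts as 'official' and how precise it is are separate questions. A minimal sketch, assuming a head-to-head match near a 50% score and a fixed draw rate (the CEGT 40/20 rows show draw rates of roughly 67% to 76%), of how the approximate 95% margin shrinks only with the square root of the number of games. The 70% draw rate used below is an illustrative assumption, not a measured figure.

```python
import math

# Slope of the logistic Elo curve at a 50% score, in Elo per unit of score.
ELO_PER_SCORE_AT_EVEN = 1600.0 / math.log(10)


def margin_95(games: int, draw_rate: float, z: float = 1.96) -> float:
    """Approximate 95% Elo margin for an even match with the given draw rate."""
    # At a 50% score the per-game score variance is (1 - draw_rate) / 4.
    stderr = math.sqrt((1.0 - draw_rate) / (4.0 * games))
    return z * ELO_PER_SCORE_AT_EVEN * stderr


def games_needed(margin: float, draw_rate: float, z: float = 1.96) -> int:
    """Smallest game count whose approximate 95% margin is at most `margin`."""
    return math.ceil((1.0 - draw_rate) * (z * ELO_PER_SCORE_AT_EVEN / (2.0 * margin)) ** 2)


if __name__ == "__main__":
    for n in (200, 330, 1161, 2000):
        print(f"{n:5d} games: +/-{margin_95(n, 0.70):.1f} Elo")
    print("games for +/-5 Elo:", games_needed(5.0, 0.70))
```

With a 70% draw rate this gives roughly +/-26 Elo at 200 games and +/-21 at 330, and it takes on the order of 5,500 games to narrow the margin to +/-5 Elo. Real rating-list margins are wider still, since each opponent's rating is itself uncertain.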
carldaman
Posts: 2287
Joined: Sat Jun 02, 2012 2:13 am

Re: Stockfish 14.1 regression

Post by carldaman »

carldaman wrote: Mon Nov 08, 2021 12:42 am
connor_mcmonigle wrote: Sun Nov 07, 2021 10:34 pm You seem to have misinterpreted CCRL's results. After 697 SF 14.1 games, CCRL Blitz has SF 14.1 at 3684 Elo (while SF 14 is at 3656 Elo on the CCRL Blitz list). Even with 697 games, the CI remains too wide to really say anything with confidence, though it does seem likely SF 14.1 is a fair bit stronger. CCRL 40/15 places SF 14.1 at 3503 Elo after only 330 games and SF 14 at 3505 Elo. Suggesting a -2 Elo regression when the 95% CI is at +/-30 is ridiculous.

Please try to do more research prior to posting misinformation.
A regression is suggested by the numbers as well as the engine's behavior. I'm only calling it like I see it.
Of course, it could be a slight and negligible regression in the end, or maybe none at all at faster time controls.
I'm more interested in LTC, anyway, and those are the results I checked. [Edit: I should have made that clear in the original post.]

We are so quick-triggered these days to label things as 'misinformation'. Let's see if the claim is wrong first!
CCRL only requires 200 LTC games to make a rating 'official', and 330 is much more than that, while CEGT has even more games.
DrCliche
Posts: 65
Joined: Sun Aug 19, 2018 10:57 pm
Full name: Nickolas Reynolds

Re: Stockfish 14.1 regression

Post by DrCliche »

carldaman wrote: Mon Nov 08, 2021 12:42 amWe are so quick-triggered these days to label things as 'misinformation'. Let's see if the claim is wrong first!

Well, you made wildly exaggerated claims that appear to have no basis in reality. That sounds like misinformation to me ¯\_(ツ)_/¯.

"Hey now, I can't prove my hyperbolic bullshit, which is objectively almost certainly wrong, but there's a vanishingly small chance I'm right, so let's just wait and see, you trigger-happy cancelbots!!!!"

:roll:
carldaman
Posts: 2287
Joined: Sat Jun 02, 2012 2:13 am

Re: Stockfish 14.1 regression

Post by carldaman »

SF14.1 appears slightly lower rated at LTC on two rating lists and is exhibiting buggy behavior on top of that.
It's not unwarranted to call that out. It doesn't sound like wild claims at all to me.
We should monitor the situation to see how things develop. Where there's smoke, there's also often fire. :)

Being oblivious to such things is ostrich-like behavior, imo. :|
Graham Banks
Posts: 45119
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Stockfish 14.1 regression

Post by Graham Banks »

What concerns me (and it could well change) is that after 196 games on the 40/15 list, SF 14.1 64-bit 4CPU is 29 Elo below SF 14 64-bit 4CPU.

Yes, the engine is set up correctly, and it's running on the 5950X.
gbanksnz at gmail.com
carldaman
Posts: 2287
Joined: Sat Jun 02, 2012 2:13 am

Re: Stockfish 14.1 regression

Post by carldaman »

Thanks for that info, Graham. :)


Also, I came across this interesting recent post from fishcooking, about regressive patches:

https://groups.google.com/g/fishcooking/c/ucZK0gAGJ68
connor_mcmonigle
Posts: 544
Joined: Sun Sep 06, 2020 4:40 am
Full name: Connor McMonigle

Re: Stockfish 14.1 regression

Post by connor_mcmonigle »

carldaman wrote: Mon Nov 08, 2021 1:13 am SF14.1 appears slightly lower rated at LTC on two rating lists and is exhibiting buggy behavior on top of that.
It's not unwarranted to call that out. It doesn't sound like wild claims at all to me.
We should monitor the situation to see how things develop. Where there's smoke, there's also often fire. :)

Being oblivious to such things is ostrich-like behavior, imo. :|
CEGT's results show -5 Elo with a +/-16 Elo CI at LTC. Therefore, we can't come close to claiming a regression given that data. Statistics doesn't care how many games CCRL/CEGT considers sufficient for assigning a rating to an engine. Therefore, the following quote constitutes misinformation in my opinion:
I know this possibility has been brought up since the release of SF14.1, but with CCRL and CEGT results out, it's now looking more like a certainty.
It's far from a certainty.
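One more step sharpens the point: the +/-16 for SF 14.1 is a single engine's error bar, while the -5 figure is a difference between two ratings. A minimal sketch, assuming the two ratings' errors are independent and combining the CEGT 40/20 bars of +/-16 and +/-14 in quadrature. They are not fully independent (both engines share opponents in the same pool), so treat this as a rough approximation; the helper name is hypothetical.

```python
import math


def rating_difference(elo_a, margin_a, elo_b, margin_b):
    """Difference elo_a - elo_b and its margin, assuming independent errors."""
    return elo_a - elo_b, math.hypot(margin_a, margin_b)


if __name__ == "__main__":
    diff, margin = rating_difference(3578, 16, 3583, 14)  # SF 14.1 vs SF 14.0
    verdict = "significant" if abs(diff) > margin else "not significant"
    print(f"{diff:+d} +/- {margin:.1f} Elo: {verdict}")
```

That comes out to about -5 +/- 21 Elo, so the observed gap sits well inside the noise.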