I know this possibility has been brought up since the release of SF14.1, but with CCRL and CEGT results out, it's now looking more like a certainty. It's not too surprising, either, given the odd buggy behavior where SF struggles to win while a queen up(!), and I wonder how much of this disappointing regression comes from that flaw alone.
Still, one also has to wonder about the testing procedures that the SF framework is using. It all seems to be geared towards quantity (of games) rather than quality, or else something like this wouldn't have been released so quickly.
If I were the Komodo developers, I would be all smiles. It seems that commercial engines aren't dead after all!
You seem to have misinterpreted CCRL's results. After 697 SF 14.1 games, CCRL Blitz has SF 14.1 at 3684 Elo (while SF 14 is at 3656 Elo on the CCRL Blitz list). Even with 697 games, the CI remains too wide to really say anything with confidence, though it does seem likely SF 14.1 is a fair bit stronger. CCRL 40/15 places SF 14.1 at 3503 Elo after only 330 games and SF 14 at 3505 Elo. Suggesting a -2 Elo regression when the 95% CI is at +/-30 is ridiculous.
Please try to do more research prior to posting misinformation.
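For a rough sense of why a few hundred games leave such a wide interval, here is a minimal sketch of the usual normal-approximation error margin for a head-to-head match. The function name, the 50% draw ratio, and the even score are assumptions for illustration, not figures from CCRL or CEGT. Rating lists derive their intervals from a whole pool of opponents, so this will only land in the same ballpark.

```python
import math

def elo_margin_95(games, draw_ratio=0.5, score=0.5):
    """Approximate 95% error margin, in Elo, for a match of `games` games.

    Assumes independent games scored win=1, draw=0.5, loss=0.
    draw_ratio and score are illustrative assumptions, not measured values.
    """
    win = score - draw_ratio / 2.0
    loss = 1.0 - draw_ratio - win
    if win < 0 or loss < 0:
        raise ValueError("score and draw_ratio are inconsistent")
    # Per-game variance of the score around its mean.
    var = (win * (1.0 - score) ** 2
           + draw_ratio * (0.5 - score) ** 2
           + loss * score ** 2)
    std_err = math.sqrt(var / games)
    # Slope of Elo(score) = -400 * log10(1/score - 1) at `score`.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return 1.96 * std_err * slope

for n in (330, 697, 2000):
    print(n, round(elo_margin_95(n), 1))
```

Under these assumptions 330 games give a margin in the mid-20s of Elo, and halving the margin takes four times as many games.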
40/4
Stockfish 14.1 NNUE x64 1CPU = ca. ELO 3685 out of 2000 games (+1 / +27)
Stockfish 20211021 NNUE x64 1CPU = ca. ELO 3684 out of 2000 games
Stockfish 14.0 NNUE x64 1CPU = ELO 3658 out of 3500 games
connor_mcmonigle wrote: ↑Sun Nov 07, 2021 10:34 pm
You seem to have misinterpreted CCRL's results. After 697 SF 14.1 games, CCRL Blitz has SF 14.1 at 3684 Elo (while SF 14 is at 3656 Elo on the CCRL Blitz list). Even with 697 games, the CI remains too wide to really say anything with confidence, though it does seem likely SF 14.1 is a fair bit stronger. CCRL 40/15 places SF 14.1 at 3503 Elo after only 330 games and SF 14 at 3505 Elo. Suggesting a -2 Elo regression when the 95% CI is at +/-30 is ridiculous.
Please try to do more research prior to posting misinformation.
A regression is suggested by the numbers as well as the engine's behavior. I'm only calling it like I see it.
Of course, it could be a slight and negligible regression in the end, or maybe none at all at faster time controls.
I'm more interested in LTC, anyway, and those are the results I checked. [Edit: I should have made that clear in the original post.]
We are so quick-triggered these days to label things as 'misinformation'. Let's see if the claim is wrong first!
CCRL only requires 200 LTC games to make a rating 'official' and 330 is much more than that, while CEGT has even more games.
carldaman wrote: ↑Mon Nov 08, 2021 12:42 am
We are so quick-triggered these days to label things as 'misinformation'. Let's see if the claim is wrong first!
Well, you made wildly exaggerated claims that appear to have no basis in reality. That sounds like misinformation to me ¯\_(ツ)_/¯ .
"Hey now, I can't prove my hyperbolic bullshit, which is objectively almost certainly wrong, but there's a vanishingly small chance I'm right, so let's just wait and see you trigger happy cancelbots!!!!"
SF14.1 appears slightly lower rated at LTC on two rating lists and is exhibiting buggy behavior on top of that.
It's not unwarranted to call that out. It doesn't sound like wild claims at all to me.
We should monitor the situation to see how things develop. Where there's smoke, there's also often fire.
Being oblivious to such things is ostrich-like behavior, imo.
carldaman wrote: ↑Mon Nov 08, 2021 1:13 am
SF14.1 appears slightly lower rated at LTC on two rating lists and is exhibiting buggy behavior on top of that.
It's not unwarranted to call that out. It doesn't sound like wild claims at all to me.
We should monitor the situation to see how things develop. Where there's smoke, there's also often fire.
Being oblivious to such things is ostrich-like behavior, imo.
CEGT's results show -5 Elo with a +/-16 Elo CI at LTC. Therefore, we can't come close to claiming a regression given that data. Statistics doesn't care how many games CCRL/CEGT considers sufficient for assigning a rating to an engine. Therefore, the following quote constitutes misinformation in my opinion:
I know this possibility has been brought up since the release of SF14.1, but with CCRL and CEGT results out, it's now looking more like a certainty.
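One hedged way to put a number on "can't come close to claiming a regression" is to turn an estimate and its margin into a rough probability that the true difference is negative. This sketch assumes the quoted margin is a 95% interval on the Elo difference itself and that the estimate is approximately normal. The function name is made up for illustration.

```python
import math

def prob_true_diff_negative(diff_elo, margin_95):
    """Rough probability that the true Elo difference is below zero.

    Assumes a normal estimate whose 95% margin is `margin_95`
    (an assumption about how the quoted +/- figures were produced).
    """
    sd = margin_95 / 1.96          # convert the 95% margin to one standard deviation
    z = -diff_elo / sd             # distance from the estimate to zero, in sd units
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

print(round(prob_true_diff_negative(-5, 16), 2))   # CEGT LTC figures quoted above
print(round(prob_true_diff_negative(-2, 30), 2))   # CCRL 40/15 figures quoted above
```

For -5 with a margin of 16 this comes out near 0.73, and for -2 with a margin of 30 near 0.55: better than a coin flip, but a long way from a certainty.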