The Stockfish ELO problem

Modern Times · Post by **Modern Times** » Sun Aug 07, 2022 2:39 pm

Rebel wrote: ↑Sun Aug 07, 2022 2:29 pm Not or, but and.

Yes, some of both.

Sopel · Post by **Sopel** » Sun Aug 07, 2022 2:49 pm

Rebel wrote: ↑Sat Aug 06, 2022 10:04 pm Some remarks
1. Komodo scales extremely well (+56,+61,+59).
2. SF15 went down from +82 to +26 (last SF run, equals 40m/26m about CCRL 40/15 | CEGT 40/20 at 2CPU).
3. SF13 went up from -100 to -19.
4. Draw rate last SF run 91.7% but SF15 never lost a game.

That's very good news. An oracle engine doesn't scale, so Stockfish is closer to a perfect engine compared to komodo.

smatovic · Post by **smatovic** » Sun Aug 07, 2022 4:03 pm

Modern Times wrote: ↑Sun Aug 07, 2022 12:44 pm Yes, there is another way of looking at it - either

- Stockfish scales badly, or
- Stockfish is incredibly good on low core counts and short time controls

Sopel wrote: ↑Sun Aug 07, 2022 2:49 pm That's very good news. An oracle engine doesn't scale, so Stockfish is closer to a perfect engine compared to komodo.

+1

--
Srdja

Werewolf · Post by **Werewolf** » Sun Aug 07, 2022 4:08 pm

Could this work as a test method:

Compile a list of very hard, but solvable, test positions.

Compare time to solve for different engines on 1,2,4,8,16,32,64 cores?
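One way to summarize such a test is to turn the measured solve times into speedup and parallel-efficiency figures per core count. The sketch below assumes the total time to solve the whole suite has already been measured for each thread count. The function name `scaling_table` and all timings are hypothetical, for illustration only, and are not measurements from any engine.

```python
def scaling_table(solve_times):
    """Return {threads: (speedup, efficiency)} relative to the smallest thread count.

    solve_times maps a thread count to the total seconds needed to solve
    the test suite at that thread count.
    """
    base_threads = min(solve_times)
    base_time = solve_times[base_threads]
    table = {}
    for threads in sorted(solve_times):
        # Speedup: how many times faster than the baseline run.
        speedup = base_time / solve_times[threads]
        # Efficiency: speedup divided by the increase in thread count.
        efficiency = speedup / (threads / base_threads)
        table[threads] = (speedup, efficiency)
    return table


# Hypothetical timings in seconds, NOT real measurements.
example = {1: 640.0, 2: 340.0, 4: 190.0, 8: 115.0}
for threads, (speedup, efficiency) in scaling_table(example).items():
    print(f"{threads:>2} threads: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Multi-threaded search is usually not deterministic, so each timing would probably need to be averaged over several runs before comparing engines.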

MonteCarlo · Post by **MonteCarlo** » Sun Aug 07, 2022 8:15 pm

Yeah, everything I've seen so far is consistent with the hypothesis that SF just starts out stronger rather than that something is amiss (as has been mentioned already, an imaginary perfect engine wouldn't scale at all).

If K really did just scale that much better than SF, then one would expect that there's some level with sufficiently high core count and slow TC that K starts outscoring SF (rather than just approaching it as draw rate increases), and I've not seen this yet.

That's not to say that SF couldn't improve in this regard; I just haven't seen compelling evidence that this is an SF problem rather than a "nature of very high level chess" problem.
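To illustrate how a rising draw rate alone compresses the rating gap, here is a minimal sketch using the standard logistic Elo formula for a head-to-head match. The function name `elo_diff` and the game counts are assumptions chosen for illustration. The 91.7% draw rate is borrowed from the run quoted earlier in the thread, which was a pool of engines rather than a two-engine match, so the output is not expected to reproduce the list's numbers.

```python
import math


def elo_diff(wins, draws, losses):
    """Elo difference implied by a match result under the logistic model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is undefined for a 0% or 100% score")
    return 400.0 * math.log10(score / (1.0 - score))


# Hypothetical 1000-game matches in which the stronger engine never loses.
for draws in (500, 800, 917):
    wins = 1000 - draws
    print(f"draw rate {draws / 10:.1f}%: {elo_diff(wins, draws, 0):+.1f} Elo")
```

In this model an engine that never loses still sees its measured advantage shrink toward zero as more of its wins turn into draws.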

Cheers!

Rebel · Post by **Rebel** » Sun Aug 07, 2022 8:21 pm

Sopel wrote: ↑Sun Aug 07, 2022 2:49 pm
Rebel wrote: ↑Sat Aug 06, 2022 10:04 pm Some remarks
1. Komodo scales extremely well (+56,+61,+59).
2. SF15 went down from +82 to +26 (last SF run, equals 40m/26m about CCRL 40/15 | CEGT 40/20 at 2CPU).
3. SF13 went up from -100 to -19.
4. Draw rate last SF run 91.7% but SF15 never lost a game.
That's very good news. An oracle engine doesn't scale, so Stockfish is closer to a perfect engine compared to komodo.

Meaning at increasing time control and more threads Komodo can catch up and overtake you? Oh wait, it already happened

Sopel · Post by **Sopel** » Sun Aug 07, 2022 9:43 pm

Rebel wrote: ↑Sun Aug 07, 2022 8:21 pm
Sopel wrote: ↑Sun Aug 07, 2022 2:49 pm
Rebel wrote: ↑Sat Aug 06, 2022 10:04 pm Some remarks
1. Komodo scales extremely well (+56,+61,+59).
2. SF15 went down from +82 to +26 (last SF run, equals 40m/26m about CCRL 40/15 | CEGT 40/20 at 2CPU).
3. SF13 went up from -100 to -19.
4. Draw rate last SF run 91.7% but SF15 never lost a game.
That's very good news. An oracle engine doesn't scale, so Stockfish is closer to a perfect engine compared to komodo.

Meaning at increasing time control and more threads Komodo can catch up and overtake you? Oh wait, it already happened

You can come up with any result given flawed enough methodology. This has the same issues as CCRL.

Rebel · Post by **Rebel** » Sun Aug 07, 2022 10:00 pm

Sopel wrote: ↑Sun Aug 07, 2022 9:43 pm
Rebel wrote: ↑Sun Aug 07, 2022 8:21 pm
Sopel wrote: ↑Sun Aug 07, 2022 2:49 pm
Rebel wrote: ↑Sat Aug 06, 2022 10:04 pm Some remarks
1. Komodo scales extremely well (+56,+61,+59).
2. SF15 went down from +82 to +26 (last SF run, equals 40m/26m about CCRL 40/15 | CEGT 40/20 at 2CPU).
3. SF13 went up from -100 to -19.
4. Draw rate last SF run 91.7% but SF15 never lost a game.
That's very good news. An oracle engine doesn't scale, so Stockfish is closer to a perfect engine compared to komodo.

Meaning at increasing time control and more threads Komodo can catch up and overtake you? Oh wait, it already happened
You can come up with any result given flawed enough methodology. This has the same issues as CCRL.

Calling something "flawed" without describing what is flawed is empty rhetoric. Instead, I think it would be wise to put some energy into testing at long time controls with many cores. You have the hardware for it.

RubiChess · Post by **RubiChess** » Sun Aug 07, 2022 10:02 pm

Sopel wrote: ↑Sun Aug 07, 2022 9:43 pm
Rebel wrote: ↑Sun Aug 07, 2022 8:21 pm
Sopel wrote: ↑Sun Aug 07, 2022 2:49 pm
Rebel wrote: ↑Sat Aug 06, 2022 10:04 pm Some remarks
1. Komodo scales extremely well (+56,+61,+59).
2. SF15 went down from +82 to +26 (last SF run, equals 40m/26m about CCRL 40/15 | CEGT 40/20 at 2CPU).
3. SF13 went up from -100 to -19.
4. Draw rate last SF run 91.7% but SF15 never lost a game.
That's very good news. An oracle engine doesn't scale, so Stockfish is closer to a perfect engine compared to komodo.

Meaning at increasing time control and more threads Komodo can catch up and overtake you? Oh wait, it already happened
You can come up with any result given flawed enough methodology. This has the same issues as CCRL.

The main issue in this rating list seems to be that SF15/4threads wasn't tested, only 14.1. At least SF15/4CPU is not mentioned in http://www.cegt.net/40_40%20Rating%20Li ... liste.html

But as this list also uses moves/time control, I want to mention this https://github.com/official-stockfish/S ... ssues/4000 again.

Regards, Andreas

Wolfgang · Post by **Wolfgang** » Sun Aug 07, 2022 11:47 pm

RubiChess wrote: ↑Sun Aug 07, 2022 10:02 pm ...
The main issue in this rating list seems to be that SF15/4threads wasn't tested, only 14.1. At least SF15/4CPU is not mentioned in http://www.cegt.net/40_40%20Rating%20Li ... liste.html
...

The reason is that our main "40/20-4CPU" tester stopped testing.
It will be done, but it will take some time.
