The Stockfish ELO problem

AndrewGrant · Post by **AndrewGrant** » Sun Aug 07, 2022 4:17 am

dkappe wrote: ↑Sun Aug 07, 2022 4:16 am
AndrewGrant wrote: ↑Sun Aug 07, 2022 2:39 am
dkappe wrote: ↑Sun Aug 07, 2022 2:38 am
AndrewGrant wrote: ↑Sun Aug 07, 2022 2:33 am I do mean statistical. If you test SF/Komodo against the pool of { Stockfish, Komodo, Houdini, Sugar, Shashchess, Fat Fritz II, Fire, Ethereal, Leela, Berserk, Koivisto }, that pool is heavily skewed towards Stockfish-based engines. That means any result could stem from a particular ability or inability to play against Stockfish, which is a competing hypothesis for the results seen.

I don't mean ethical, which is why I posted a list of engines above and a reader can make their own determination how much of the pool is Stockfish.
I’d be curious to see your tests on the similarity of SF and FF2. Any PGNs you could share?
Much easier to derive similarity from knowing it's the same code than from looking at PGNs. That can be left as an exercise for the reader as well.
You don’t have any evidence? By that logic, Ethereal would never improve as its nets improved while the engine code remained mostly the same.

Extremely daft take even for you. Muting thread.

dkappe · Post by **dkappe** » Sun Aug 07, 2022 4:22 am

Andy,

When logic deserts you, you resort to insults. Nice.

Rebel · Post by **Rebel** » Sun Aug 07, 2022 9:20 am

xr_a_y wrote: ↑Sat Aug 06, 2022 10:42 pm If I'm not mistaken, the list here: https://tcec-chess.com/bayeselo.txt
contains various 32-thread, node-limited SF entries.
SF seems to scale well with node limits (and thus with TC).

Don't know how this blends with your analysis?

CCRL and CEGT rely on normal openings; TCEC does not. Other examples using unusual openings are the lists of Stefan Pohl and my own GRL. None of the three shows the CCRL / CEGT pattern, and all have no trouble showing significant Elo progress (example). Unusual openings favor Stockfish's search.

Modern Times · Post by **Modern Times** » Sun Aug 07, 2022 9:23 am

Rebel wrote: ↑Sun Aug 07, 2022 9:20 am
CCRL and CEGT rely on normal openings; TCEC does not. Other examples using unusual openings are the lists of Stefan Pohl and my own GRL. None of the three shows the CCRL / CEGT pattern, and all have no trouble showing significant Elo progress,

Yeah, but is that real?

Rebel · Post by **Rebel** » Sun Aug 07, 2022 9:44 am

AndrewGrant wrote: ↑Sun Aug 07, 2022 12:43 am I think this analysis is bunk because I don't trust the samples from CCRL and CEGT,

There is nothing wrong with the rating lists; the test shows how badly SF scales with increasing time control and multiple cores, in contrast to Komodo.

There is more to scaling. I wrote a tool that counts how often the best move changes during the last 5 iterations.

GUI : Arena using [F4]
SF15 vs Dragon 2.5
Threads : 10
TC : 40/120
Games : 10

Changing best move during the last 5 iterations

SF15 vs Komodo Dragon 2.5


    IT  SF15 perc  Komo perc
    4   229   22%   529  52%
    3    91    9%   198  19%
    2    44    4%    57   5%
    1     2    0%     8   0%
    0     0    0%     0   0%
    Tot 366         792

Quite some difference.
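
The tool itself is not shown in the thread, so the following is only a minimal Python sketch of how such a count could be made. It assumes that for every move searched you have the engine's best move after each completed iteration, and that the IT column means "iterations remaining until the final one" (4 = earliest iteration in the window, 0 = the final iteration). Both the input format and that reading of IT are assumptions, as are the names `count_late_changes` and `tally`.

```python
from collections import Counter

WINDOW = 5  # "last 5 iterations", as in the post


def count_late_changes(best_by_iteration, window=WINDOW):
    """Count best-move changes inside the last `window` iterations of one search.

    best_by_iteration: best move after each completed iteration, oldest first
    (assumed input format). Keys of the returned Counter are the number of
    iterations remaining until the final one (assumed meaning of IT).
    """
    counts = Counter()
    n = len(best_by_iteration)
    # Start at index 1 at the earliest so there is always a previous iteration.
    for i in range(max(1, n - window), n):
        if best_by_iteration[i] != best_by_iteration[i - 1]:
            counts[n - 1 - i] += 1
    return counts


def tally(searches, window=WINDOW):
    """Sum the per-search counts over all searched moves of one engine."""
    total = Counter()
    for best_by_iteration in searches:
        total += count_late_changes(best_by_iteration, window)
    return total


# Hypothetical data: two searches of seven iterations each.
searches = [
    ["e2e4", "e2e4", "d2d4", "d2d4", "d2d4", "g1f3", "g1f3"],
    ["e2e4"] * 7,
]
late_changes = tally(searches)
```

Dividing each count by the number of searched moves would give a percentage column, though the thread does not say which denominator the original tool used.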

Rebel · Post by **Rebel** » Sun Aug 07, 2022 9:54 am

Modern Times wrote: ↑Sun Aug 07, 2022 9:23 am
Rebel wrote: ↑Sun Aug 07, 2022 9:20 am
CCRL and CEGT rely on normal openings; TCEC does not. Other examples using unusual openings are the lists of Stefan Pohl and my own GRL. None of the three shows the CCRL / CEGT pattern, and all have no trouble showing significant Elo progress,
Yeah, but is that real?

Look here and scroll to: CCRL vs GRL a comparison -> observation-1

Stockfish and Komodo massively profited from the gambit openings.

Raphexon · Post by **Raphexon** » Sun Aug 07, 2022 12:39 pm

Rebel wrote: ↑Sun Aug 07, 2022 9:44 am
AndrewGrant wrote: ↑Sun Aug 07, 2022 12:43 am I think this analysis is bunk because I don't trust the samples from CCRL and CEGT,
There is nothing wrong with the rating lists; the test shows how badly SF scales with increasing time control and multiple cores, in contrast to Komodo.

There is more to scaling. I wrote a tool that counts how often the best move changes during the last 5 iterations.

GUI : Arena using [F4]
SF15 vs Dragon 2.5
Threads : 10
TC : 40/120
Games : 10

Changing best move during the last 5 iterations

SF15 vs Komodo Dragon 2.5
    IT  SF15 perc  Komo perc
    4   229   22%   529  52%
    3    91    9%   198  19%
    2    44    4%    57   5%
    1     2    0%     8   0%
    0     0    0%     0   0%
    Tot 366         792
Quite some difference.

Can't really change best move if SF already started with the best move...

Modern Times · Post by **Modern Times** » Sun Aug 07, 2022 12:44 pm

Raphexon wrote: ↑Sun Aug 07, 2022 12:39 pm Can't really change best move if SF already started with the best move...

Yes, there is another way of looking at it - either

- Stockfish scales badly, or
- Stockfish is incredibly good on low core counts and short time controls

Damir · Post by **Damir** » Sun Aug 07, 2022 12:53 pm

It is what I have been saying all along. The SF team is busy improving SF on low core counts instead of focusing on improving it on higher core counts, say 16/32/64 cores. As Ed says, improving the scaling would play a big part and add additional Elo to SF...

If I were the SF team I would completely abandon improving SF on low core counts, but this of course is just wishful thinking...

Rebel · Post by **Rebel** » Sun Aug 07, 2022 2:29 pm

Modern Times wrote: ↑Sun Aug 07, 2022 12:44 pm
Raphexon wrote: ↑Sun Aug 07, 2022 12:39 pm Can't really change best move if SF already started with the best move...
Yes, there is another way of looking at it - either

- Stockfish scales badly, or
- Stockfish is incredibly good on low core counts and short time controls

Not or, but and.

Its main strength is its search, pruning and reductions.

Since pruning and reductions are never perfect, they may decide at a given depth to start pruning less; it might help to find better moves sooner. The reason is that pruning and reductions not only prune the tree but also prune the chess knowledge you worked so hard on, whether HCE or NNUE.
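
To make "pruning less from a given depth" concrete, here is a small Python sketch of a late-move-reduction amount with a depth threshold beyond which one ply less is reduced. It is not Stockfish's code: the log-based shape is of the kind many engines use, but the divisor, the `relax_from_depth` parameter and the function name `reduction` are all made up for illustration.

```python
import math


def reduction(depth, move_number, relax_from_depth=None):
    """Plies to reduce a late move by (hypothetical formula and constants)."""
    if depth < 2 or move_number < 2:
        return 0  # never reduce the first move or at the shallowest depth
    plies = int(math.log(depth) * math.log(move_number) / 2.0)
    # Hypothetical knob: from this depth on, reduce one ply less, so more of
    # the tree (and of the evaluation's knowledge) is actually searched.
    if relax_from_depth is not None and depth >= relax_from_depth:
        plies = max(0, plies - 1)
    return plies


baseline = reduction(20, 10)                      # no relaxation
relaxed = reduction(20, 10, relax_from_depth=16)  # relaxed from depth 16 on
```

Whether such a change gains Elo at long time controls or high thread counts can only be settled by testing; the sketch just shows where such a knob would sit.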
