The Stockfish ELO problem

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

AndrewGrant
Posts: 1963
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: The Stockfish ELO problem

Post by AndrewGrant »

dkappe wrote: Sun Aug 07, 2022 4:16 am
AndrewGrant wrote: Sun Aug 07, 2022 2:39 am
dkappe wrote: Sun Aug 07, 2022 2:38 am
AndrewGrant wrote: Sun Aug 07, 2022 2:33 am I do mean statistical. If you test SF/Komodo against the pool of { Stockfish, Komodo, Houdini, Sugar, Shashchess, Fat Fritz II, Fire, Ethereal, Leela, Berserk, Koivisto }, that pool is heavily skewed towards a Stockfish engine. Which means any result could be a result of a particular ability or inability to play against Stockfish. That is a competing hypothesis for the results seen.

I don't mean ethical, which is why I posted a list of engines above and let readers determine for themselves how much of the pool is Stockfish.
I’d be curious to see your tests on the similarity of SF and FF2. Any PGNs you could share?
Much easier to derive similarity from knowing it's the same code than from looking at PGNs. That can be left as an exercise for the reader as well.
You don’t have any evidence? By that logic, Ethereal would never have improved, since its nets improved while the engine code remained mostly the same.
Extremely daft take even for you. Muting thread.
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: The Stockfish ELO problem

Post by dkappe »

Andy,

When logic deserts you, you resort to insults. Nice.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
Rebel
Posts: 7435
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: The Stockfish ELO problem

Post by Rebel »

xr_a_y wrote: Sat Aug 06, 2022 10:42 pm If I'm not mistaken, here: https://tcec-chess.com/bayeselo.txt
there are various 32-thread, node-limited SF versions.
SF seems to scale well with node limits (and thus with TC).

I don't know how this fits with your analysis?
CCRL and CEGT rely on normal openings; TCEC does not. Other examples using unusual openings are the lists of Stefan Pohl and my own GRL. None of these 3 shows the CCRL / CEGT pattern, and they have no problem showing significant Elo progress (example). Unusual openings favor Stockfish's search.
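For readers comparing "Elo progress" across lists: a match score converts to an Elo difference under the standard logistic model, Elo = 400 * log10(s / (1 - s)). This is only a two-player sketch; rating lists fit a whole pool with tools such as BayesElo (the TCEC file above is a BayesElo output) or Ordo, so their numbers will not match this formula exactly. The helper names below are illustrative.

```python
import math


def score_from_games(wins, draws, losses):
    """Fraction of points scored: (wins + 0.5 * draws) / games."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Elo difference implied by a score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        # A 0% or 100% score implies an unbounded Elo difference.
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# Example: +30 =50 -20 over 100 games.
s = score_from_games(30, 50, 20)
print(f"score {s:.2f} -> {elo_difference(s):+.1f} Elo")
```

Note how flat the curve is near 50%: a few points of score move the estimate by tens of Elo, which is why small game counts say little about scaling.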
90% of coding is debugging, the other 10% is writing bugs.
Modern Times
Posts: 3784
Joined: Thu Jun 07, 2012 11:02 pm

Re: The Stockfish ELO problem

Post by Modern Times »

Rebel wrote: Sun Aug 07, 2022 9:20 am
CCRL and CEGT rely on normal openings; TCEC does not. Other examples using unusual openings are the lists of Stefan Pohl and my own GRL. None of these 3 shows the CCRL / CEGT pattern, and they have no problem showing significant Elo progress,
Yeah, but is that real?
Rebel
Posts: 7435
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: The Stockfish ELO problem

Post by Rebel »

AndrewGrant wrote: Sun Aug 07, 2022 12:43 am I think this analysis is bunk because I don't trust the samples from CCRL and CEGT,
There is nothing wrong with the rating lists; the test shows how badly SF scales with increasing time control and multiple cores, contrary to Komodo.

There is more to scaling. I wrote a tool that collects statistics on best-move changes during the last 5 iterations.

GUI : Arena using [F4]
SF15 vs Dragon 2.5
Threads : 10
TC : 40/120
Games : 10

Changing best move during the last 5 iterations

SF15 vs Komodo Dragon 2.5

    IT  SF15 perc  Komo perc
    4   229   22%   529  52%
    3    91    9%   198  19%
    2    44    4%    57   5%
    1     2    0%     8   0%
    0     0    0%     0   0%
    Tot 366         792
Quite a difference.
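The post does not say how the tool works or define the IT column, so the following is only a sketch of one possible reading. It assumes each search is available as standard UCI "info ... depth N ... pv ..." lines with MultiPV 1, takes the first pv move of the last such line per depth as that iteration's best move, and reads IT as the distance from the final iteration (0 = final, 4 = fourth from last) at which the best move differed from the previous iteration. Percentages are taken relative to the number of searches. All function names are hypothetical.

```python
from collections import Counter

WINDOW = 5  # the post looks at the last 5 iterations


def best_moves_from_info(lines):
    """Best move of each iteration in depth order (assumes MultiPV 1)."""
    best = {}
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2 or tokens[0] != "info" or tokens[1] == "string":
            continue
        if "depth" not in tokens or "pv" not in tokens:
            continue  # e.g. currmove lines carry no pv
        depth = int(tokens[tokens.index("depth") + 1])
        pv_at = tokens.index("pv")
        if pv_at + 1 < len(tokens):
            best[depth] = tokens[pv_at + 1]  # later lines for a depth overwrite earlier ones
    return [best[d] for d in sorted(best)]


def change_buckets(moves, window=WINDOW):
    """Distances from the final iteration at which the best move changed."""
    buckets = []
    n = len(moves)
    for dist in range(window):
        i = n - 1 - dist
        if i < 1:
            break  # no previous iteration to compare against
        if moves[i] != moves[i - 1]:
            buckets.append(dist)
    return buckets


def summarize(searches, window=WINDOW):
    """Rows of (IT, count, percent of searches), oldest bucket first, plus total."""
    counts = Counter()
    for moves in searches:
        counts.update(change_buckets(moves, window))
    n = len(searches)
    rows = []
    for dist in range(window - 1, -1, -1):
        c = counts[dist]
        rows.append((dist, c, 100.0 * c / n if n else 0.0))
    return rows, sum(counts.values())


searches = [
    ["e2e4", "e2e4", "d2d4", "d2d4", "d2d4", "d2d4"],
    ["g1f3", "g1f3", "g1f3", "g1f3", "g1f3", "g1f3"],
]
rows, total = summarize(searches)
for it, count, pct in rows:
    print(f"{it:>3} {count:>5} {pct:5.0f}%")
print("Tot", total)
```

Under this reading, Raphexon's objection below is visible directly: an engine whose first iterations already find the final move contributes nothing to any bucket, so fewer changes need not mean weaker scaling.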
Rebel
Posts: 7435
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: The Stockfish ELO problem

Post by Rebel »

Modern Times wrote: Sun Aug 07, 2022 9:23 am
Rebel wrote: Sun Aug 07, 2022 9:20 am
CCRL and CEGT rely on normal openings; TCEC does not. Other examples using unusual openings are the lists of Stefan Pohl and my own GRL. None of these 3 shows the CCRL / CEGT pattern, and they have no problem showing significant Elo progress,
Yeah, but is that real?
Look here and scroll to: CCRL vs GRL a comparison -> observation-1

Stockfish and Komodo massively profited from the gambit openings.
Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: The Stockfish ELO problem

Post by Raphexon »

Rebel wrote: Sun Aug 07, 2022 9:44 am
AndrewGrant wrote: Sun Aug 07, 2022 12:43 am I think this analysis is bunk because I don't trust the samples from CCRL and CEGT,
There is nothing wrong with the rating lists; the test shows how badly SF scales with increasing time control and multiple cores, contrary to Komodo.

There is more to scaling. I wrote a tool that collects statistics on best-move changes during the last 5 iterations.

GUI : Arena using [F4]
SF15 vs Dragon 2.5
Threads : 10
TC : 40/120
Games : 10

Changing best move during the last 5 iterations

SF15 vs Komodo Dragon 2.5

    IT  SF15 perc  Komo perc
    4   229   22%   529  52%
    3    91    9%   198  19%
    2    44    4%    57   5%
    1     2    0%     8   0%
    0     0    0%     0   0%
    Tot 366         792
Quite a difference.
SF can't really change its best move if it already started with the best move...
Modern Times
Posts: 3784
Joined: Thu Jun 07, 2012 11:02 pm

Re: The Stockfish ELO problem

Post by Modern Times »

Raphexon wrote: Sun Aug 07, 2022 12:39 pm SF can't really change its best move if it already started with the best move...
Yes, there is another way of looking at it - either

- Stockfish scales badly, or
- Stockfish is incredibly good on low core counts and short time controls
Damir
Posts: 2905
Joined: Mon Feb 11, 2008 3:53 pm
Location: Denmark
Full name: Damir Desevac

Re: The Stockfish ELO problem

Post by Damir »

It is what I have been saying all along. The SF team is busy improving SF on low core counts instead of focusing on higher core counts, say 16/32/64 cores. As Ed says, improving the scaling would play a big part and add Elo to SF...

If I were the SF team, I would completely stop improving SF on low core counts, but this is of course just wishful thinking... :) :)
Rebel
Posts: 7435
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: The Stockfish ELO problem

Post by Rebel »

Modern Times wrote: Sun Aug 07, 2022 12:44 pm
Raphexon wrote: Sun Aug 07, 2022 12:39 pm SF can't really change its best move if it already started with the best move...
Yes, there is another way of looking at it - either

- Stockfish scales badly, or
- Stockfish is incredibly good on low core counts and short time controls
Not or, but and.

Its main strength is its search: pruning and reductions.

Since pruning and reductions are never perfect, one could decide to start pruning less beyond a given depth; that might help find better moves sooner. The reason is that pruning and reductions not only prune the tree but also prune away the chess knowledge you worked so hard on, whether HCE or NNUE.
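The idea of pruning less from some depth onward can be illustrated with a toy late-move-reduction (LMR) formula. The log(depth) * log(moveNumber) shape is common in engines, but the constants and the hypothetical dampen_from_depth / dampen_factor knobs below are illustrative assumptions, not Stockfish's actual code.

```python
import math


def lmr_reduction(depth, move_number, dampen_from_depth=None, dampen_factor=0.5):
    """Toy LMR: how many plies to reduce a late, quiet move.

    dampen_from_depth is a hypothetical knob: at or beyond that depth the
    reduction is scaled by dampen_factor, i.e. the search prunes less.
    """
    if depth < 3 or move_number < 4:
        return 0  # never reduce at shallow depth or for early moves
    r = math.log(depth) * math.log(move_number) / 2.0  # illustrative constant
    if dampen_from_depth is not None and depth >= dampen_from_depth:
        r *= dampen_factor
    return max(0, min(depth - 1, int(r)))  # keep at least one ply of search


for d in (10, 20, 30):
    print(d, lmr_reduction(d, 30), lmr_reduction(d, 30, dampen_from_depth=16))
```

With the knob set, shallow searches keep their full reductions (and speed), while deep searches see more of the tree that the evaluation would otherwise never get to judge. Whether that gains Elo is exactly what would have to be tested.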