What is the strongest chess engine in the world? — A Reflection

supernova · Post by **supernova** » Thu Nov 14, 2024 9:40 pm

a_node_uncut wrote: ↑Thu Nov 14, 2024 4:26 am Over the past few days I have invested much of my time in researching engine rating lists and tournaments. I have since arrived at a disappointing conclusion:

Introduction
"The strongest chess engine in the world is Stockfish" is a statement echoed by virtually all chess players who consider themselves informed. But how valid is Stockfish's #1 spot, really? By what metric is Stockfish the best chess-playing program, and how objective is that metric? Are those players discussed really as informed as they say they are, or have they rather been deliberately misinformed?

Traditional Engine Testing
Since antiquity, chess has been played from the standard starting position. Sure, the position may seem symmetrical and boring at a glance, but it contains a vast amount of theory, knowledge, and tactics. That, combined with White's first-move advantage, creates a dynamic game with many imbalances. This starting position has endured for centuries, remaining the cornerstone of the game even after the advent of chess engines.

It was only out of necessity, and for the sake of viewer engagement, that organizers of engine-vs-engine matches switched to pre-arranged openings. Even then, the book lines were limited in length and largely reflected human opening repertoires, even at the topmost level. The early days of computer chess were a time of fierce competition, driven by dreams, motivations, and (perhaps most importantly) creativity.

Fishtest and the death of creativity
Shortly after the establishment of the Fishtest testing platform, Stockfish rose quickly through the ranks, eventually landing at the top of every rating list. The Fishtest platform is efficient and effective, but it also stifled creativity, much like the corporate culture that dominates the modern age. Even as developers crammed more Elo than ever into their engines, the understanding of how and why each heuristic works began to fall drastically. To someone like me, who learned chess programming the more traditional way, many "tweaks" and "improvements" found in modern Stockfish code are not just difficult to understand, but completely opaque. This weakening of the theoretical basis came with a disastrous consequence.

The UHO Strategy
Stockfish's Elo rating plateaued, and by the release of Stockfish 16, progress had all but stalled. The testing system, being flawed and underinformed, began to crumble. Meanwhile, rival engines like Lc0 (developed by Alexander Lyashuk et al.), Ethereal (by Andrew Grant), and Berserk (by Jay Honnold) were rapidly closing the gap.

Amid this stagnation, the Stockfish team found a glimmer of hope: UHO (Unbalanced Human Openings). UHO forces the engine to defend suboptimal opening lines. Proponents of UHO claim that it makes the viewing experience more fun and helps reduce the draw rate. However, UHO’s relevance to top-level chess is questionable. The openings it promotes are rarely seen in elite human play and are far less significant in terms of theory. More importantly, these offbeat openings place a greater emphasis on tactical sharpness than on positional understanding. Only under these testing conditions can Stockfish retain a clear advantage, and the Stockfish team, of course, exploited it.

The Stockfish team quickly optimized their engine for UHO conditions. With aggressive and unprecedented tactics, they managed to pressure and manipulate most major tournaments into adopting UHO openings. One notable exception is the CCRL, whose operators (correctly) stuck to balanced, theory-rich openings. But the Stockfish team was quick to rally behind other rating lists, such as SP-CC, that are more amenable to their strategic interests. The Stockfish team also unleashed a massive campaign to align every other engine with their testing standards. Forums and online communities, overrun by keyboard warriors stubbornly attached to their SPRT methodology, became major battlegrounds for this ideological shift. As more and more engines joined the SPRT hype, the vast compute capacity of Fishtest became increasingly advantageous.

What now?
The Stockfish mafia has largely taken over, but their power is not unlimited. For example, CCRL and CEGT show that Stockfish and Torch are neck and neck. But a much more important asset for us is Talkchess. With effective and decisive moderation policies, Talkchess remains one of the last online technical communities not yet overrun by Stockfish zealots. Recently, however, certain proponents of the Stockfish team began to push for moderator elections, threatening to completely destroy what little we have left of Talkchess. Therefore, I propose adopting the following policies to overcome these difficulties:

Charter Amendment: Intolerance of different testing methodologies (SPRT bashing) should not be allowed.

Advocacy for alternative testing methods: The Talkchess community should advocate for alternative testing methodologies, to counter the Stockfish effort.

Vetting moderator candidates by the FG (Founders' Group): People with significant biases should not be allowed to become moderators, even if Stockfish insiders would very much like them to.

Thank you for your time,
Max L

Hello,

It looks ChatGPT-generated to me:

Stockfish's Dominance Questioned: The article challenges the widespread belief that Stockfish is the strongest chess engine, suggesting that this perception may not be entirely objective. The claim is that the dominance of Stockfish is largely due to specific testing conditions and practices, rather than an inherent superiority in all aspects of chess engine performance.

Traditional vs. Modern Testing: It highlights a shift from traditional engine testing based on standard chess openings to more modern practices involving pre-arranged openings. The article implies that these changes may have impacted the way engines are evaluated, possibly favoring those optimized for such conditions.

Fishtest Platform: The Fishtest platform, used for testing and improving Stockfish, is critiqued for stifling creativity and for leading to a plateau in Stockfish's Elo rating. The article suggests that the focus on incremental improvements through tweaks has led to a lack of understanding of the underlying heuristics driving these changes.

UHO Strategy: The introduction and emphasis on UHO (Unbalanced Human Openings) is presented as a controversial strategy that Stockfish allegedly uses to maintain its competitive edge. The article argues that these openings are less relevant to elite human play and shift the focus from positional understanding to tactical sharpness.

Influence and Manipulation: There is an accusation that the Stockfish team has pressured major tournaments to adopt testing conditions favorable to their engine. The article claims that this manipulation extends to online communities and forums, where advocates for Stockfish's methodology are said to dominate discussions.

Resistance and Advocacy: The article concludes with a call to action for communities like Talkchess to resist the influence of Stockfish proponents and to advocate for diverse testing methodologies. It suggests that maintaining a variety of approaches is crucial to preserving the integrity and innovation in the field of computer chess.

Sources of Information:
The text does not explicitly cite specific sources for its claims, making it difficult to verify the accuracy of the statements. It appears to be based on observations and interpretations by the author, Max L. For a comprehensive understanding, one would need to refer to documented testing methodologies, tournament records, and discussions from established chess engine communities and forums.

Analysis:

Subjectivity and Bias: The tone suggests potential bias, as it positions the Stockfish team as a dominant force with undue influence. While it raises valid points about testing methodologies and the evolution of chess engines, the lack of concrete evidence or references weakens its arguments.

Complexity of Chess Engines: The development and testing of chess engines involve complex algorithms and significant computational resources. The article does not fully explore these technical aspects, which are essential for understanding the nuances of engine performance and ranking.

Need for Balanced View: While the article highlights the dominance of Stockfish and the associated controversies, a balanced view would also consider the contributions and innovations brought by the Stockfish team and other engines in advancing computer chess.

Uri Blass · Post by **Uri Blass** » Thu Nov 14, 2024 10:50 pm

jefk wrote: ↑Thu Nov 14, 2024 9:27 pm Uri B wrote
SSDF give every program their book and does not use UHO
but as you know with sufficient calculation time, the modern top Nnue engines can
perform without book (and perform even better than with using a lousy book)

But then also all games end in a draw, so using UHO or similar (Tcec openings) isn't so odd.

Except that for normal chess games it doesn't matter anymore which is the 'strongest '
engine (for eg. problem chess) as the top five engines all are good enough
(eg. for analyzing positions in human chess, or correspondence chess).

It is clear that no book is better than a bad book.
If the author of an engine chooses a bad book, it is their fault (they can include in the book only moves that are at least not worse than the engine's choice with no book).

I know that with sufficient calculation time games end in a draw, but I still do not see 100% draws in the SSDF results, except maybe the match between Stockfish 16 and Lc0 (result 20-20 and maybe 100% draws).

alfilete · Post by **alfilete** » Thu Nov 14, 2024 10:52 pm

I believe he is absolutely right; chess begins with the starting position—that's the first rule, knowing how to set up the pieces. Anything else isn't chess; it's something different. If we have to play 1,000 games to see a winner, then that's what we’ll have to do. Otherwise, the winner hasn't truly won a chess match.

Just my humble opinion. Best regards.

jefk · Post by **jefk** » Fri Nov 15, 2024 11:04 am

alfilete wrote

If some author of an engine chooses a bad book it's their fault

and

I still do not see 100% draws in the SSDF results

well, maybe because of the bad books ?

Also, lower ranked engines will not always draw against each other; but sooner
or later, all top ranked engines will draw. You (Manuel R) seem to be a typical human chess
player, not an ICCF correspondence player, so for you the draw problem may not be so
apparent. But it does exist, whether you like it or not (it's not a matter of 'opinion').

To reduce draws, if people would still like to distinguish between top engines
(or strong correspondence players), there are several things you can do (again,
whether you like it or not): faster time controls, other start positions,
modification of endgame rules, etc. Then it's not 'normal' chess anymore, I
agree, but that was not the question. The question was, which is the 'strongest
engine'. Well, to determine that, I would suggest simply using difficult test
sets, difficult (possibly artificial) chess problems, an area where we still (*)
have not reached a (strong) solution of chess. So currently it's not SF etc. that
is the strongest engine, but Crystal, Huntsman, etc.

(*) using the computer, chess problem composers in coming years will be able
to construct more and more difficult problems, and such efforts can continue
for many years, at least as long as some people find it an interesting exercise.

Uri Blass · Post by **Uri Blass** » Fri Nov 15, 2024 12:04 pm

jefk wrote: ↑Fri Nov 15, 2024 11:04 am alfilete wrote

If some author of an engine chooses a bad book it's their fault
and
I still do not see 100% draws in the SSDF results
well, maybe because of the bad books ?

Also, lower ranked engines will not always draw against each other; but sooner
or later, all top ranked engines will draw. You (Manuel R) seem to be a typical human chess
player, not an ICCF correspondence player, so for you the draw problem may not be so
apparent. But it does exist, whether you like it or not (it's not a matter of 'opinion').

To reduce draws, if people would still like to distinguish between top engines
(or strong correspondence players), there are several things you can do (again,
whether you like it or not): faster time controls, other start positions,
modification of endgame rules, etc. Then it's not 'normal' chess anymore, I
agree, but that was not the question. The question was, which is the 'strongest
engine'. Well, to determine that, I would suggest simply using difficult test
sets, difficult (possibly artificial) chess problems, an area where we still (*)
have not reached a (strong) solution of chess. So currently it's not SF etc. that
is the strongest engine, but Crystal, Huntsman, etc.

(*) using the computer, chess problem composers in coming years will be able
to construct more and more difficult problems, and such efforts can continue
for many years, at least as long as some people find it an interesting exercise.

If all the top engines draw against each other, then the strongest engine is the one that gets better results against weaker opponents.

Reducing draws between top engines is not the target of everybody.

jefk · Post by **jefk** » Fri Nov 15, 2024 1:23 pm

strongest engine is the engine that gets better results against weaker opponents.

this is also a reasonable statement, but then, compared to e.g. TCEC, we should have
bigger leagues, and as a result engines such as SF should maybe include (bigger) contempt
or other tricks to beat the weak engines (or beat them faster).
As it stands now, an engine such as Obsidian may do this better than SF.

PS: for such purposes the (top engine) books should be geared towards sharper play
(e.g. 1.e4, but then also having something against 1...e5, e.g. the Scotch Gambit or
Bishop's Opening, because of the Berlin for Black when playing 'normal' book moves).
But then again an engine such as Crystal may perform better than the default SF, which
isn't specifically tuned for sharp positions.

chesskobra · Post by **chesskobra** » Fri Nov 15, 2024 2:30 pm

I have also thought that engines should bring their own repertoires (at least for competitions). For engines it may not be a good strategy to play only solid lines in such competitions.

Unbalanced openings may give more decisive results, but do they really give more decisive game pairs? Even if they do, sooner or later most game pairs at top level will be drawn. Then some people will be constantly looking for new unbalanced openings that their favourite engines win from one side and defend from the other side, which would give them a temporary advantage in such openings. Then the question is: why should we trust a competition based on people's favourite unbalanced openings? I don't know how TCEC openings are selected or what goes on behind the scenes. But I don't trust it.
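
The distinction between decisive games and decisive game pairs can be checked on any set of results. Here is a minimal Python sketch, assuming each opening is played twice with colours reversed and each pair is recorded as engine A's two scores; the function name and the sample data are hypothetical and only illustrate the bookkeeping.

```python
def decisive_rates(pairs):
    """Return (share of decisive games, share of decisive game pairs).

    pairs: list of (score_as_white, score_as_black) for engine A on one
    opening, where each score is 1, 0.5 or 0.
    """
    games = [score for pair in pairs for score in pair]
    decisive_games = sum(1 for score in games if score != 0.5)
    # A pair is decisive only if the two games do not cancel out (total != 1).
    decisive_pairs = sum(1 for white, black in pairs if white + black != 1)
    return decisive_games / len(games), decisive_pairs / len(pairs)


# Made-up example: White wins most games, yet most pairs are drawn 1-1.
sample = [(1, 0), (1, 0), (1, 0.5), (0.5, 0.5)]
print(decisive_rates(sample))
```

In this made-up sample most individual games are decisive, but only one pair in four is, which is the gap the question above is asking about.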

At the moment I don't think the difference between Stockfish and other top 10 engines is like the difference between Carlsen and other top 10 players. I would like to know what we can say about the strengths of Stockfish relative to other top engines?

Michel · Post by **Michel** » Sat Nov 16, 2024 11:55 am

The only reason for using unbalanced openings is that they make testing more effective (fewer games are needed to prove that a patch makes the engine stronger). This cannot be proved theoretically (although a heuristic argument is given here https://www.cantate.be/Fishtest/draw_ra ... ed_elo.pdf ) but it can be verified experimentally using this methodology https://github.com/official-stockfish/f ... ning-books .
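
To make the "fewer games" idea concrete, here is a minimal back-of-envelope sketch in Python. It is not the methodology behind the links above, only a textbook signal-to-noise calculation on game-pair results: the number of pairs needed for the mean score to sit `z` standard errors above 50% is roughly `(z / t)**2`, where `t = (mean - 0.5) / sigma`. The function names and both sets of frequencies are invented for illustration and say nothing about any real opening book.

```python
import math

# Per-game score for the five outcomes of a game pair (same opening, colours
# reversed), from the patched engine's side: 0, 0.5, 1, 1.5 or 2 points out of 2.
PAIR_SCORES = (0.0, 0.25, 0.5, 0.75, 1.0)


def signal_to_noise(pair_probs):
    """(mean score - 0.5) divided by the standard deviation of one pair."""
    mean = sum(p * s for p, s in zip(pair_probs, PAIR_SCORES))
    variance = sum(p * (s - mean) ** 2 for p, s in zip(pair_probs, PAIR_SCORES))
    return (mean - 0.5) / math.sqrt(variance)


def pairs_needed(pair_probs, z=2.0):
    """Approximate pairs needed for the mean to be z standard errors above 0.5.

    Assumes the patch really is stronger (signal_to_noise > 0).
    """
    t = signal_to_noise(pair_probs)
    return math.ceil((z / t) ** 2)


# Invented pair-outcome frequencies, NOT measured data.
balanced_book = (0.00, 0.04, 0.90, 0.06, 0.00)
unbalanced_book = (0.00, 0.10, 0.75, 0.15, 0.00)

print(pairs_needed(balanced_book), pairs_needed(unbalanced_book))
```

With these invented inputs the second book needs fewer pairs, because the gain in mean score outweighs the extra variance. Whether a real unbalanced book behaves this way is exactly what has to be checked experimentally.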

Father · Post by **Father** » Sun Nov 17, 2024 7:50 pm

jefk wrote: ↑Thu Nov 14, 2024 9:27 pm Uri B wrote
SSDF give every program their book and does not use UHO
but as you know with sufficient calculation time, the modern top Nnue engines can
perform without book (and perform even better than with using a lousy book)

But then also all games end in a draw, so using UHO or similar (Tcec openings) isn't so odd.

Except that for normal chess games it doesn't matter anymore which is the 'strongest '
engine (for eg. problem chess) as the top five engines all are good enough
(eg. for analyzing positions in human chess, or correspondence chess).

In my own view and experience, the most powerful chess machine is "LeelaQueenOdds". My assessment is comprehensive and qualitative, much like a consolidated balance sheet in accounting.

a_node_uncut · Post by **a_node_uncut** » Sun Nov 17, 2024 8:44 pm

Brunetti wrote: ↑Thu Nov 14, 2024 8:51 am What do engine testing techniques have to do with forum moderation?!

Alex

If left unmoderated, the same Stockfish group that forced everyone to test, play, and think their way will come to dominate Talkchess. If you only take action then, it will be far too late.
