Introduction
"The strongest chess engine in the world is Stockfish" is a statement echoed by virtually all chess players who consider themselves informed. But how valid is Stockfish's #1 spot, really? By what metric is Stockfish the best chess-playing program, and how objective is that metric? Are those players really as informed as they claim to be, or have they been deliberately misinformed?
Traditional Engine Testing
Since antiquity, chess has been played from the standard starting position. Sure, the position may seem symmetrical and boring at a glance, but behind it lies a vast body of theory, knowledge, and tactics. That, combined with White's first-move advantage, creates a dynamic game with many imbalances. This starting position has endured for centuries, remaining the cornerstone of the game even after the advent of chess engines.
Only out of necessity, and for the sake of viewer engagement, did the organizers of engine-vs-engine matches switch to pre-arranged openings. Even then, the book lines were limited in length and closely reflected human opening repertoires, even at the topmost level. The early days of computer chess were a time of fierce competition, driven by dreams, motivations, and, perhaps most importantly, creativity.
Fishtest and the death of creativity
Shortly after the establishment of the Fishtest testing platform, Stockfish rose quickly through the ranks, eventually landing at the top of every rating list. The Fishtest platform is efficient and effective, but it also stifled creativity, much like the corporate culture that dominates the modern age. Although developers crammed more Elo than ever into their engines, their understanding of how and why each heuristic works began to fall drastically. As someone who learned chess programming the more traditional way, I find many "tweaks" and "improvements" in modern Stockfish code not just difficult to understand, but completely opaque and incomprehensible. This weakening of the theoretical basis came with a disastrous consequence.
The UHO Strategy
Stockfish's Elo rating plateaued, and by the release of Stockfish 16, progress had all but stalled. The testing system, being flawed and underinformed, began to crumble. Meanwhile, rival engines like Lc0 (developed by Alexander Lyashuk et al.), Ethereal (by Andrew Grant), and Berserk (by Jay Honnold) were rapidly closing the gap.
Amid this stagnation, the Stockfish team found a glimmer of hope: UHO (Unbalanced Human Openings). UHO forces the engine to defend suboptimal opening lines. Proponents of UHO claim that it makes the viewing experience more fun and helps reduce the draw rate. However, UHO’s relevance to top-level chess is questionable. The openings it promotes are rarely seen in elite human play and are far less significant in terms of theory. More importantly, these offbeat openings place greater emphasis on tactical sharpness than on positional understanding. Only under these testing conditions can Stockfish retain a clear advantage, and the Stockfish team, of course, exploited it.
The Stockfish team quickly optimized their engine for UHO conditions. With aggressive and unprecedented tactics, they managed to pressure and manipulate most major tournaments into adopting UHO openings. One notable exception is the CCRL, whose operators (correctly) stuck to balanced, theory-rich openings. But the Stockfish team was quick to rally behind other rating lists, such as SP-CC, that are more amenable to their strategic interests. The Stockfish team also unleashed a massive campaign to align every other engine with their testing standards. Forums and online communities, overrun by keyboard warriors stubbornly attached to their SPRT methodology, became major battlegrounds for this ideological shift. As more and more engines joined the SPRT hype, the vast compute capacity of Fishtest became increasingly advantageous.
What now?
The Stockfish mafia has largely taken over, but their power is not unlimited. For example, CCRL and CEGT show that Stockfish and Torch are neck and neck. But a much more important asset for us is Talkchess. With effective and decisive moderation policies, Talkchess remains one of the last online technical communities not yet overrun by Stockfish zealots. Recently, however, certain proponents of the Stockfish team began to push for moderator elections, threatening to completely destroy what little we have left of Talkchess. Therefore, I propose adopting the following policies to overcome these difficulties:
- Charter Amendment: Intolerance of different testing methodologies (SPRT bashing) should not be allowed.
- Advocacy for alternative testing methods: The Talkchess community should advocate for alternative testing methodologies, to counter the Stockfish effort.
- Vetting of moderator candidates by the FG (Founders' Group): People with significant biases should not be allowed to become moderators, even if Stockfish insiders would very much like them to.
Max L