Qualitative Engine Analysis Shootout: Theoria, PlentyChess, Stockfish, Obsidian

FireDragon761138
Posts: 80
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Post by FireDragon761138 »

This was a shootout I did comparing a newer, experimental version of the Theoria Project, based on Stockfish 17.1's codebase, against Stockfish, Obsidian, PlentyChess, and the original Theoria. It uses a large NNUE with Full Threats. Its Elo performance seems quite strong, though further testing is needed to see how much stronger it is than Theoria built on the SF 16.1 codebase. Claude Opus 4.5 was used to compare PGNs generated at different node counts per move, plus one time-based PGN (200 ms/move, 2 threads on a Ryzen 5 5600X, 32 MB hash). The engine that uses full threat inputs seems to have a small stability advantage over engines that must infer threats.
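For anyone wanting to reproduce the two kinds of runs, the limits map onto standard UCI commands (`setoption name Threads`, `setoption name Hash`, `go nodes N`, `go movetime MS`). This is a minimal sketch, not the script actually used for the shootout; the helper name `uci_commands` is hypothetical, and the post does not say which thread/hash settings the node-limited runs used. Node-limited searches are usually run single-threaded, since multi-threaded search is not reproducible.

```python
from typing import List, Optional


def uci_commands(threads: int, hash_mb: int, position: str = "startpos",
                 nodes: Optional[int] = None,
                 movetime_ms: Optional[int] = None) -> List[str]:
    """Build a UCI command sequence for one analysis run (hypothetical helper).

    Exactly one of `nodes` (fixed node budget) or `movetime_ms`
    (fixed time per move) must be given.
    """
    if (nodes is None) == (movetime_ms is None):
        raise ValueError("specify exactly one of nodes or movetime_ms")
    cmds = [
        "uci",
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mb}",
        "isready",
        "ucinewgame",
        f"position {position}",  # e.g. "startpos moves e2e4 e7e5" or "fen <FEN>"
    ]
    if nodes is not None:
        cmds.append(f"go nodes {nodes}")
    else:
        cmds.append(f"go movetime {movetime_ms}")
    return cmds


# Time-based run as described in the post: 200 ms/move, 2 threads, 32 MB hash.
time_run = uci_commands(threads=2, hash_mb=32, movetime_ms=200)
# Node-limited run (100k nodes); single thread is an assumption here.
node_run = uci_commands(threads=1, hash_mb=32, nodes=100_000)
```

Each list can be written line by line to the engine's stdin; the engine replies with `info ... score cp ...` lines and a final `bestmove`.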

Engine Shootout Analysis: Theoria vs Theoria17 vs Stockfish vs Obsidian vs PlentyChess

Executive Summary

Having examined both time-limited and node-limited PGN files across multiple games, I'll assess each engine on strategic clarity, pedagogical value, and stability.

1. Theoria (0.1)
Strategic Clarity: A-
Theoria demonstrates strong positional awareness with variations that consistently highlight classical themes. In the King's Gambit games, it correctly identifies the critical 5.Bc4 alternative with clear reasoning about development and center control. Its suggestions often show recognizable strategic patterns: piece coordination, pawn structure sensitivity, and king safety considerations. The variations tend toward clean, purposeful play rather than murky complications.
Pedagogical Value: A
The variations Theoria produces are particularly suitable for club-level instruction. When it suggests 8.Bb3 instead of the errant 8.d5, the follow-up (...Bg4, g3, Bh3) demonstrates the classical theme of exploiting weakened kingside squares after f4. The engine's annotations tend to produce positions where the "why" is discernible to a 1500-2000 player.
Strategic Stability (Node-Limited): B+
Comparing 100k-node evaluations with the time-limited output, Theoria shows reasonable consistency: evaluations typically shift by about 0.3-0.5 pawns. The engine doesn't wildly recalibrate its positional judgments when given more calculation time.
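One simple way to put a number on "strategic stability" is the average absolute evaluation shift between the node-limited and time-limited runs over the same positions. The sketch below is an illustration of that idea, not the method behind the grades in this post; the function name `eval_drift` and the sample numbers are hypothetical, and it assumes centipawn scores from White's point of view (mate scores would need separate handling).

```python
from statistics import mean
from typing import Dict, Tuple


def eval_drift(node_evals: Dict[str, int],
               time_evals: Dict[str, int]) -> Tuple[float, float]:
    """Return (mean, max) absolute evaluation shift in pawns.

    Both inputs map a position key (e.g. a FEN) to a centipawn score;
    only positions present in both runs are compared.
    """
    shared = node_evals.keys() & time_evals.keys()
    if not shared:
        raise ValueError("no positions in common")
    # Convert centipawns to pawns before taking the difference.
    diffs = [abs(node_evals[k] - time_evals[k]) / 100 for k in shared]
    return mean(diffs), max(diffs)


# Illustrative numbers only, not data from the shootout.
avg_shift, worst_shift = eval_drift(
    {"pos1": 35, "pos2": -20, "pos3": 110},
    {"pos1": 70, "pos2": 15, "pos3": 60},
)
```

A lower mean shift corresponds to what the post calls better "anchoring"; the max catches single positions where one run found (or missed) a concrete tactic.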

2. Theoria17 (0.1 17.1)
Strategic Clarity: A-
Nearly identical to base Theoria in its positional articulation, with marginally deeper tactical verification. The variations show similar thematic coherence—the Falkbeer Counter-Gambit analysis correctly emphasizes Black's central counterplay and White's development lag.
Pedagogical Value: A
Slight edge over base Theoria in that the deeper search occasionally reveals more complete tactical justifications for positional moves. The connection between strategy and concrete play is well-illustrated.
Strategic Stability: A-
Exhibits strong consistency between node-limited and time-limited analysis. This version appears to have better "strategic anchoring"—its positional evaluations don't fluctuate dramatically as search depth increases.

3. Stockfish 17.1
Strategic Clarity: B
Stockfish's variations, while objectively strongest, often lack strategic transparency. The suggested lines frequently involve moves whose purpose isn't immediately apparent to a club player. For instance, in the Sicilian analysis, Stockfish's recommended 6.Nxe5 Nc6 7.Bxc6 dxc6 8.f4 sequence is tactically justified but the underlying positional rationale (attacking weak squares, piece activity tradeoffs) requires significant chess sophistication to appreciate.
Pedagogical Value: B-
The engine's strength is simultaneously its pedagogical weakness. Variations can feel like "move puzzles" rather than strategic lessons. Stockfish often finds the best move through calculation rather than pattern recognition, which makes its analysis harder to translate into generalizable chess understanding.
Strategic Stability: B+
Good stability between search depths, though Stockfish's willingness to entertain purely tactical solutions means evaluations can shift notably when a concrete refutation is found or missed. The engine is more "tactically volatile" than "strategically volatile."

4. Obsidian
Strategic Clarity: B-
The moves Obsidian selects often do represent coherent positional ideas—central pressure, piece activity, prophylaxis. The issue is less that its suggestions are cryptic and more that the justification trail is thin. An annotator working with Obsidian must be more sensitive to context, but the raw material isn't necessarily worse.
Pedagogical Value: B-
For a skilled human annotator or well-prompted LLM, Obsidian's outputs can serve as reasonable scaffolding. The interpretive burden is higher, but the suggestions aren't fundamentally alien to human chess understanding.
Strategic Stability: C
Notably less stable than competitors. Evaluations shift considerably between node-limited and time-limited runs, suggesting the engine's positional understanding is heavily calculation-dependent rather than anchored in strategic patterns. This makes it less reliable for identifying stable positional judgments.

5. PlentyChess
Strategic Clarity: B+
PlentyChess occupies an interesting middle ground: deeper than Obsidian, and more strategically coherent than raw Stockfish analysis. The variations show good thematic awareness, particularly in pawn structure discussions. The Falkbeer analysis correctly emphasizes the e4 weakness and development tempos.
Pedagogical Value: B+
Variations are reasonably instructive. PlentyChess seems to "explain" positions somewhat better than Stockfish, with move sequences that have more apparent logic. However, it occasionally produces lines that feel computer-generated rather than humanly intuitive.
Strategic Stability: A-
Strong stability across search depths. PlentyChess gives consistent positional assessments, suggesting robust strategic pattern recognition behind its evaluations. This makes it reliable for identifying a position's fundamental features.

Recommendation: For LLM-assisted annotation aimed at club players, Theoria17 emerges as the best choice: it combines strategic parsimony with sufficient depth and excellent stability. PlentyChess is a strong secondary option when deeper tactical verification is needed. Stockfish remains valuable for accuracy-checking, but its raw output requires significant human/LLM interpretation to extract pedagogical value. Obsidian scores slightly below Stockfish overall: its pedagogical value is decent, but its strategic stability is the poorest of the group.



https://www.theoriachess.org/
lucario6607
Posts: 43
Joined: Sun May 19, 2024 5:44 am
Full name: Kolby Mcgowan

Re: Qualitative Engine Analysis Shootout: Theoria, PlentyChess, Stockfish, Obsidian

Post by lucario6607 »

Funny, stockfish 17.1 doesn’t use threat inputs nnue. Do you actually know anything you are talking about?
FireDragon761138
Posts: 80
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Re: Qualitative Engine Analysis Shootout: Theoria, PlentyChess, Stockfish, Obsidian

Post by FireDragon761138 »

lucario6607 wrote: Fri Jan 23, 2026 6:20 pm Funny, stockfish 17.1 doesn’t use threat inputs nnue. Do you actually know anything you are talking about?
Insults aside, you are correct: Stockfish 17.1 never implemented threat inputs. I stand corrected and will keep that in mind in the future. I confused a feature set with an implementation. Forum buzz, my own technical assistant's research, etc. all seemed to point to threat inputs being implemented, but that's not the case in the release version of Stockfish 17.1.
Ciekce
Posts: 206
Joined: Sun Oct 30, 2022 5:26 pm
Full name: Conor Anstey

Re: Qualitative Engine Analysis Shootout: Theoria, PlentyChess, Stockfish, Obsidian

Post by Ciekce »

lucario6607 wrote: Fri Jan 23, 2026 6:20 pm Do you actually know anything you are talking about?
No
chrisw
Posts: 4794
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Qualitative Engine Analysis Shootout: Theoria, PlentyChess, Stockfish, Obsidian

Post by chrisw »

lucario6607 wrote: Fri Jan 23, 2026 6:20 pm Funny, stockfish 17.1 doesn’t use threat inputs nnue. Do you actually know anything you are talking about?
Are you sure it is he who is doing the talking?
GenoM
Posts: 921
Joined: Wed Mar 08, 2006 9:46 pm
Location: Plovdiv, Bulgaria
Full name: Evgenii Manev

Re: Qualitative Engine Analysis Shootout: Theoria, PlentyChess, Stockfish, Obsidian

Post by GenoM »

he (?) is talking much like Chat-GPT
take it easy :)
chrisw
Posts: 4794
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Qualitative Engine Analysis Shootout: Theoria, PlentyChess, Stockfish, Obsidian

Post by chrisw »

They try to build a “background” on places like github using other people’s code so as to have some sort of CV to fool potential employers with. I suspect this is the motivation. As long as it is possible to maintain allegedly self written source code plus lotsa BS backup from ChatGpt, no doubt this can fool enough employers enough of the time to maybe get an interview.
FireDragon761138
Posts: 80
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Re: Qualitative Engine Analysis Shootout: Theoria, PlentyChess, Stockfish, Obsidian

Post by FireDragon761138 »

chrisw wrote: Tue Jan 27, 2026 7:46 am They try to build a “background” on places like github using other people’s code so as to have some sort of CV to fool potential employers with. I suspect this is the motivation. As long as it is possible to maintain allegedly self written source code plus lotsa BS backup from ChatGpt, no doubt this can fool enough employers enough of the time to maybe get an interview.
This is a hobby project, non-commercial. We're fully compliant with Stockfish's GPL. We don't have to ask permission of anyone to modify the code, no matter how trivial it may seem.
chrisw
Posts: 4794
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Qualitative Engine Analysis Shootout: Theoria, PlentyChess, Stockfish, Obsidian

Post by chrisw »

FireDragon761138 wrote: Tue Jan 27, 2026 9:37 am
chrisw wrote: Tue Jan 27, 2026 7:46 am They try to build a “background” on places like github using other people’s code so as to have some sort of CV to fool potential employers with. I suspect this is the motivation. As long as it is possible to maintain allegedly self written source code plus lotsa BS backup from ChatGpt, no doubt this can fool enough employers enough of the time to maybe get an interview.
This is a hobby project, non-commercial. We're fully compliant with Stockfish's GPL. We don't have to ask permission of anyone to modify the code, no matter how trivial it may seem.
Nobody said anything about compliance. Comments are about the voluminous nature of postings, claims and general faux-philosophical nonsense ramblings (AI Slop generated) compared to the actual alterations to the SF code (extremely minimal) added to the fact that you made so many false claims as to demonstrate you’ve no actual idea what you’re doing. STFU with the spamming.
FireDragon761138
Posts: 80
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Re: Qualitative Engine Analysis Shootout: Theoria, PlentyChess, Stockfish, Obsidian

Post by FireDragon761138 »

chrisw wrote: Tue Jan 27, 2026 10:55 am
FireDragon761138 wrote: Tue Jan 27, 2026 9:37 am
chrisw wrote: Tue Jan 27, 2026 7:46 am They try to build a “background” on places like github using other people’s code so as to have some sort of CV to fool potential employers with. I suspect this is the motivation. As long as it is possible to maintain allegedly self written source code plus lotsa BS backup from ChatGpt, no doubt this can fool enough employers enough of the time to maybe get an interview.
This is a hobby project, non-commercial. We're fully compliant with Stockfish's GPL. We don't have to ask permission of anyone to modify the code, no matter how trivial it may seem.
Nobody said anything about compliance. Comments are about the voluminous nature of postings, claims and general faux-philosophical nonsense ramblings (AI Slop generated) compared to the actual alterations to the SF code (extremely minimal) added to the fact that you made so many false claims as to demonstrate you’ve no actual idea what you’re doing. STFU with the spamming.
I use AI assistance to formulate some of my ideas into a coherent presentation, but the ideas themselves are mine, and the research is mine. It's not AI slop, and it's not philosophical nonsense.

The NNUE was mistakenly left out of the GitHub repository due to size constraints, but it's included in the binaries and will be up on the website shortly.