Breakthrough in combining MoE into a chess engine algorithm

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

syzygy
Posts: 5974
Joined: Tue Feb 28, 2012 11:56 pm

Re: Breakthrough in combining MoE into a chess engine algorithm

Post by syzygy »

Mark Tang wrote: Sat May 02, 2026 9:35 am Yes, I have to admit that writing a long text that needs clear logic is quite tiring for me. So I tell AI my idea and let it help me generate the whole passage, but the experiment and the idea behind the passage are all real and original.
Well, it shows, and it is a major put-off.

If you can't write a long text, then just write a short text.
User avatar
towforce
Posts: 13025
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Breakthrough in combining MoE into a chess engine algorithm

Post by towforce »

syzygy wrote: Sat May 02, 2026 1:28 pm
Mark Tang wrote: Sat May 02, 2026 9:35 am Yes, I have to admit that writing a long text that needs clear logic is quite tiring for me. So I tell AI my idea and let it help me generate the whole passage, but the experiment and the idea behind the passage are all real and original.
Well, it shows, and it is a major put-off.

If you can't write a long text, then just write a short text.
Too many words: "concise writing is better than long AI-generated text" would do. :wink:
Human chess is partly about tactics and strategy, but mostly about memory
berttuyt
Posts: 2
Joined: Sun Nov 22, 2020 10:34 pm
Full name: Bert Tuyt

Re: Breakthrough in combining MoE into a chess engine algorithm

Post by berttuyt »

As others have pointed out, you don't need a long text. Just limit yourself to a short text with clear logic, and try to describe what you did in a quantitative way, not just with catchy phrases.

For example, describe in detail the expert neural networks used within your MoE framework. If you use multiple networks, there is the disadvantage that incremental updates (as in NNUE) scale linearly with the number of networks. If your implementation is still faster, then most likely the topology is slightly different and/or smaller.
These are the details we are all interested in.
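The linear-scaling point can be sketched concretely. The sketch below is purely illustrative (the class, shapes, and feature encoding are invented, and the feature space is simplified to piece-on-square with no king buckets); it only shows why a make-move must touch every expert's accumulator when each expert keeps its own NNUE-style first layer:

```python
import numpy as np

HIDDEN = 256    # accumulator width per expert (illustrative)
FEATURES = 768  # simplified feature space: 12 piece types x 64 squares

class ExpertAccumulators:
    """Hypothetical sketch: one NNUE-style accumulator per expert."""

    def __init__(self, num_experts, rng):
        # One first-layer weight matrix per expert.
        self.weights = [rng.standard_normal((FEATURES, HIDDEN)) * 0.01
                        for _ in range(num_experts)]
        self.acc = [np.zeros(HIDDEN) for _ in range(num_experts)]

    def apply_move(self, added, removed):
        # Every expert's accumulator must be updated, so the cost per
        # node is O(num_experts * |delta features| * HIDDEN).
        for w, a in zip(self.weights, self.acc):
            for f in added:
                a += w[f]
            for f in removed:
                a -= w[f]

rng = np.random.default_rng(0)
single = ExpertAccumulators(1, rng)
four = ExpertAccumulators(4, rng)
# A quiet move toggles two features (from-square off, to-square on):
# the 4-expert version does 4x the accumulator work per node.
```

Whether that 4x is affordable depends entirely on the topology, which is why those details matter.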

MoE is not a radically new concept, but because there are many challenges associated with it, we don't see many implementations in current chess programs. We can only bring computer chess to a higher level if we not only provide base ideas (which we all have), but also implement them and describe them in an open and transparent way, so that they are straightforward to reproduce. Code sharing is a good example, and it could support your ideas.
In this way we build on the shoulders of giants, and that is the reason computer chess was able to progress the way it did.

Bert
ehenkes
Posts: 56
Joined: Sun Feb 14, 2021 4:58 pm
Location: Germany
Full name: Erhard Henkes

Re: Breakthrough in combining MoE into a chess engine algorithm

Post by ehenkes »

It’s just an experimental chess engine, and it will never be released to the public.
A lot of words. Where can I download and test your engine vs Stockfish 18? :D

The proposal sounds ambitious, but in its current form it reads more like promotional AI language than a serious technical argument.

The central idea is easy to understand: instead of one NNUE evaluation network, use several specialized NNUE-style evaluators and let a gating mechanism decide which ones should contribute in a given position. That part is at least conceptually reasonable. However, the text repeatedly jumps from “interesting possibility” to “major breakthrough” without providing the evidence needed to justify such claims.
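For concreteness, that gating idea could look something like the toy sketch below. Everything here is invented for illustration (the class, the gate weights, and the two "experts" that just weight material differently); it only pins down what "a gate routes each position to an evaluator" means:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class MoEEval:
    """Toy mixture-of-experts evaluation with top-1 routing."""

    def __init__(self, experts, gate_w):
        self.experts = experts  # list of callables: features -> score
        self.gate_w = gate_w    # (num_features, num_experts) gate weights

    def evaluate(self, features):
        gate_probs = softmax(features @ self.gate_w)
        top = int(np.argmax(gate_probs))  # top-1 routing: run one expert
        return self.experts[top](features)

# Toy usage: two "experts" that just scale a material-like sum.
rng = np.random.default_rng(1)
experts = [lambda f: float(f.sum()), lambda f: float(2.0 * f.sum())]
moe = MoEEval(experts, rng.standard_normal((8, 2)))
score = moe.evaluate(np.ones(8))
```

Even this trivial version makes the cost structure visible: the gate's forward pass runs on every evaluation, on top of whichever expert is selected.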

The first problem is that it assumes MoE can be transferred into a chess engine almost for free. That is not obvious at all. In a classical alpha-beta engine, evaluation must be extremely fast, cache friendly, and predictable. A routing network, multiple experts, additional memory traffic, and expert selection overhead may easily cancel out any theoretical gain in model capacity. In chess engines, elegant ideas often fail because the engineering cost per node is too high.
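A back-of-envelope calculation shows how quickly per-node overhead eats into throughput. All timings below are invented assumptions for illustration, not measurements of any real engine:

```python
# Assumed per-evaluation costs in nanoseconds (illustrative only).
base_eval_ns = 300.0    # single-network incremental NNUE evaluation
gate_ns = 120.0         # gate forward pass, run on every evaluation
extra_cache_ns = 150.0  # extra memory traffic from a second set of weights

evals_per_sec_before = 1e9 / base_eval_ns
evals_per_sec_after = 1e9 / (base_eval_ns + gate_ns + extra_cache_ns)

# Under these assumptions the engine loses nearly half its evaluation
# throughput before the larger model has bought a single point of Elo.
slowdown = 1.0 - evals_per_sec_after / evals_per_sec_before
```

The numbers are made up, but the structure of the argument is not: any fixed routing cost is paid at millions of nodes per second, so the experts' extra accuracy has to beat a large, guaranteed speed tax.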

The second problem is that the text treats specialization as automatically beneficial. It is true that opening play, tactical middlegames, positional play, and endgames differ greatly. But it does not follow that several experts will outperform one well-trained network under real search conditions. The burden of proof is not on the skeptic. It is on the proposer. One would need strong experimental evidence showing better Elo at fixed time controls, not just a plausible story.
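For reference, the standard logistic conversion from match score to Elo makes clear how modest a measurable gain looks in match terms. This is the textbook formula, not anything from the proposal:

```python
import math

def elo_diff(score):
    """Elo difference implied by a match score fraction (draws = 0.5).

    Standard logistic model: score = 1 / (1 + 10^(-diff/400)),
    solved for diff. Requires 0 < score < 1.
    """
    return -400.0 * math.log10(1.0 / score - 1.0)

# A 52% score over a match is only about +14 Elo:
print(round(elo_diff(0.52), 1))  # → 13.9
```

Distinguishing +14 Elo from noise takes thousands of fixed-time-control games, which is exactly the kind of evidence the proposal would need to present.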

Third, the claim that this approach could scale to massive parameter counts while still keeping very high NPS is speculative. In chess, raw arithmetic is only part of the story. Memory locality, cache reuse, branching behavior, and incremental update structure matter enormously. A larger sparse model may look efficient on paper while being slower in practice on CPU hardware, which is still the main battlefield for top alpha-beta engines.

Fourth, the training discussion is vague. Phrases such as “train the gate network with reinforcement learning” and “end-to-end fine-tuning” sound impressive, but they avoid the hardest questions. How is expert collapse prevented? How is balanced usage maintained? How is consistency between experts ensured so that search does not become noisy or unstable? How is the incremental update mechanism preserved across multiple experts without creating unacceptable implementation complexity?
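Expert collapse, at least, has a standard mitigation in the MoE literature: an auxiliary load-balancing loss of the kind used in the Switch Transformer, which penalizes the gate when routing concentrates on a few experts. A sketch under assumed shapes; nothing here is taken from the proposal itself:

```python
import numpy as np

def load_balancing_loss(gate_probs, top1_assignments, num_experts):
    """Switch-Transformer-style auxiliary loss: N * dot(f, p).

    gate_probs:       (batch, num_experts) softmax outputs of the gate
    top1_assignments: (batch,) index of the expert each sample routed to
    """
    batch = gate_probs.shape[0]
    # f_i: fraction of samples actually routed to expert i
    f = np.bincount(top1_assignments, minlength=num_experts) / batch
    # p_i: mean gate probability assigned to expert i
    p = gate_probs.mean(axis=0)
    # Minimized (value 1.0) when both distributions are uniform.
    return num_experts * float(np.dot(f, p))
```

Perfectly balanced routing gives the minimum value of 1.0, and collapse onto one expert pushes it toward num_experts. A serious write-up would say whether something like this is used and how it interacts with search stability.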

Fifth, the text overstates novelty. Specialized evaluation is not a magical new concept. Chess engines have long separated concerns implicitly or explicitly through handcrafted terms, phase-dependent interpolation, endgame logic, and different search heuristics for different position types. Even within neural evaluation, the mere addition of routing and multiple subnetworks does not automatically create a new generation of engine design.

So the fair conclusion is this: the idea may be worth exploring as a research direction, but the text does not demonstrate a breakthrough. It offers a speculative architecture, wraps it in grand language, and skips the hard empirical questions. Until there are rigorous benchmarks showing clear Elo gains at practical time controls and acceptable speed on real hardware, this should be treated as an interesting hypothesis, not as proof of a new era in chess engine design.