Max. possible Elo with a balanced book, error-bar ... some logical things!

Frank Quisinsky · Post by **Frank Quisinsky** » Sat Apr 04, 2026 11:13 am

Hi there,

Here is some logic about testing engines:

1.
Let's assume that Stockfish plays perfect chess.
Now let Stockfish play Engine X for 1,000, 2,000 or perhaps 15,000 games.
The error bar cannot be right, because Stockfish is playing perfect chess.

2.
This means: the closer the engines get to perfect chess, the smaller the error bar, and the more wrong the error-bar calculation becomes as the engines grow more powerful. That's pure logic! Now to the topic of how many games we need.

You need 1 game ...
But you should know that the probability of a draw is 7.5%.
OK, so you need a bit more than 1 game.

The results with 1,000, 2,000, 3,000, 15,000 or 1000000000000000000000000000000 games will be the same.
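
As a rough illustration of the error-bar point above, here is a minimal Python sketch of how the statistical error bar on a match score is commonly derived from win/draw/loss counts. The function name `score_error_bar` and the sample counts are assumptions made for illustration, not results from any real match.

```python
import math

def score_error_bar(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (mean score, half-width of the z-interval) from W/D/L counts."""
    n = wins + draws + losses
    if n == 0:
        raise ValueError("need at least one game")
    w, d, l = wins / n, draws / n, losses / n
    score = w + 0.5 * d
    # Per-game variance of the outcome (1, 0.5 or 0) around the mean score.
    variance = w * (1 - score) ** 2 + d * (0.5 - score) ** 2 + l * (0 - score) ** 2
    return score, z * math.sqrt(variance / n)

# Hypothetical 1,000-game matches, from balanced to completely one-sided.
for w, d, l in [(500, 300, 200), (850, 150, 0), (925, 75, 0), (1000, 0, 0)]:
    score, half_width = score_error_bar(w, d, l)
    print(f"W/D/L {w}/{d}/{l}: score {score:.4f} +/- {half_width:.4f}")
```

Under this model the error bar shrinks as the results become one-sided and reaches zero only when every game ends the same way. With a 7.5% draw rate it is small but not zero, which fits the remark that a bit more than one game is needed.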

3.
In recent years I have run some experiments with my FEOBOS balanced opening book. The question is:
How many points will an engine that is close to perfection (endgame, transition into endgames), like Stockfish, score against an engine 700-800 Elo weaker? If we know that, we can say:
With a balanced opening book, an engine can hold a draw against the perfect engine in 7.5% of cases if it plays perfect chess itself, meaning without any blunder. The draw probability is probably high for many ECO codes when few pieces are left on the board.

4.
Now it's really easy:
How strong can an engine be that plays near perfection in computer chess?
It should score around 92.5% against all opponents, because with a balanced opening book every other engine (not more than around 800 Elo weaker) should score at most 7.5% of the points, and only if it gets through the game without a blunder. The reason is that every engine has the advantage of starting from a balanced opening.

5.
At the moment the strongest engine scores around 83.5% of the points!
So about 9%, at most 10%, more points are possible.

If Stockfish 18 plays at 3575 Elo, around 3750-3800 Elo is the maximum possible.
Note which of the 500 ECO codes are certain to result in a draw.
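
To put the 83.5% and 92.5% figures side by side, here is a small Python sketch that converts a score fraction into a rating difference. It assumes the standard logistic Elo model, which the post does not name as the basis of the 3750-3800 estimate, and the function name `elo_diff_from_score` is only illustrative.

```python
import math

def elo_diff_from_score(score: float) -> float:
    """Rating advantage implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

current = elo_diff_from_score(0.835)   # score quoted for today's strongest engine
ceiling = elo_diff_from_score(0.925)   # score quoted for near-perfect play
print(f"advantage at 83.5%: {current:.0f} Elo")
print(f"advantage at 92.5%: {ceiling:.0f} Elo")
print(f"difference:         {ceiling - current:.0f} Elo")
```

Under these assumptions the step from 83.5% to 92.5% against the same opposition comes out at roughly 155 Elo. The figure in the post comes from a separate simulation and need not match this simple formula.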

Often I read here ...
More Elo isn't possible?
That's wrong.

At 30 Elo per year, the engines will be playing chess near perfection in around 6-8 years!
If the Stockfish developers ...
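
The 6-8 year estimate follows from simple arithmetic, sketched here in Python. It assumes a constant gain of 30 Elo per year, which is a simplification, and the helper name `years_to_reach` is made up for this example.

```python
def years_to_reach(current_elo: float, target_elo: float, gain_per_year: float) -> float:
    """Years needed to close the gap at a constant yearly Elo gain (an assumption)."""
    if gain_per_year <= 0:
        raise ValueError("gain_per_year must be positive")
    return max(0.0, (target_elo - current_elo) / gain_per_year)

# 3575 is the Stockfish 18 rating quoted above; the targets span the quoted range.
for target in (3750, 3775, 3800):
    print(f"{target} Elo: {years_to_reach(3575, target, 30):.1f} years")
```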

This was a question for a sleepless night.
I gave this answer to myself and now I sleep like a baby!
Maybe this helps others with their sleepless nights!

Best
Frank

And after all you will come to the conclusion:
Elo will no longer be suitable for measuring playing strength if the scores achieved continue to rise at the rate we see in opening theory.

hgm · Post by **hgm** » Sat Apr 04, 2026 1:01 pm

Perfect Chess might not give you the highest rating. If the pool of players used to determine the Elo contains sufficiently many fallible engines that can be tricked into losing from positions where game-theoretically they could win, a player daring to take the risk that the traps they set up will be punished could be more effective in exploiting the weaknesses of the weaker players.
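
A toy calculation can make this concrete. The Python sketch below compares a never-losing "perfect" player with a risk taker over a mixed pool. Every probability in `POOL` is invented purely for illustration, and the two-strategy model is an assumption, not a description of any real engine.

```python
def expected_score(p_win: float, p_loss: float) -> float:
    """Expected points per game; whatever is neither won nor lost is a draw."""
    p_draw = 1.0 - p_win - p_loss
    if min(p_win, p_loss, p_draw) < 0:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    return p_win + 0.5 * p_draw

# Hypothetical opponents: (share of pool, unforced blunder rate,
#                          chance to fall for a trap, chance to punish it)
POOL = [
    (0.8, 0.30, 0.70, 0.05),  # fallible engines, most of the pool
    (0.2, 0.02, 0.05, 0.60),  # near-perfect engines
]

def pool_score(risky: bool) -> float:
    total = 0.0
    for share, blunder, falls, punishes in POOL:
        if risky:
            # The risk taker wins when the trap works and loses when it is punished.
            total += share * expected_score(falls, punishes)
        else:
            # The perfect player never loses and wins only on unforced blunders.
            total += share * expected_score(blunder, 0.0)
    return total

print(f"perfect player: {pool_score(risky=False):.3f}")
print(f"risk taker:     {pool_score(risky=True):.3f}")
```

With these made-up numbers the risk taker scores higher over the pool even though it does worse against the near-perfect engines. If the pool were made up mostly of near-perfect engines, the comparison would reverse.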

Another issue is whether the players would be aware of who their opponent is, during the measurement. For engine rating lists this is usually not the case, but humans of course always are, and adapt their strategy accordingly.

Frank Quisinsky · Post by **Frank Quisinsky** » Sat Apr 04, 2026 2:49 pm

Hi HGM,

yes, well said.
Forget everything you know about Elo, let it sink like the Titanic.

We start from zero with the question ...
How strong can a chess engine that plays perfectly be, given the existing 500 ECO codes?

I do the following with SF 17.
Spark 1.0 is around 800 Elo weaker.

Why Spark:
Spark is very bad in endgames, strong in the first phase of the game.
SF plays the endgame to perfection and, in my opinion, the transition into the endgame as well.

It's no secret:
I can produce a 50:50 result in 100 games ...
Just a quick note on the topic of the error bar!

The key is the draws!

How many of the roughly 25,000 FEOBOS balanced opening positions I am using will give Spark a draw? Now we make Spark 100 Elo stronger (4x more time) and run the same experiment. Then we check the results at 600 Elo weaker than SF 17, 500 Elo weaker than SF 17, and so on.

In other words, we are looking for the number of opening positions from which an engine like Spark can draw against SF.
You'll be surprised how many other engines can also achieve this draw against Stockfish if Spark can do it, meaning with the same opening positions I selected.
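
The counting step described above could be sketched in Python as follows. The record layout (pairs of opening ID and PGN result), the 50% threshold, the names `draw_rate_by_opening` and `drawable_openings`, and the sample games are all assumptions for illustration. They are not FEOBOS data or the actual tooling used.

```python
from collections import defaultdict

DRAW = "1/2-1/2"

def draw_rate_by_opening(games):
    """games: iterable of (opening_id, result) pairs, result in PGN notation."""
    tally = defaultdict(lambda: [0, 0])  # opening_id -> [draws, games played]
    for opening_id, result in games:
        tally[opening_id][1] += 1
        if result == DRAW:
            tally[opening_id][0] += 1
    return {op: draws / played for op, (draws, played) in tally.items()}

def drawable_openings(games, min_rate=0.5):
    """Openings in which at least min_rate of the games were drawn."""
    return {op for op, rate in draw_rate_by_opening(games).items() if rate >= min_rate}

# Made-up results against the strong engine, two games per opening (colours reversed).
weak_engine_a = [("pos-0001", "1-0"), ("pos-0001", "0-1"),
                 ("pos-0002", DRAW), ("pos-0002", DRAW),
                 ("pos-0003", DRAW), ("pos-0003", "1-0")]
weak_engine_b = [("pos-0001", "1-0"), ("pos-0001", "0-1"),
                 ("pos-0002", DRAW), ("pos-0002", "0-1"),
                 ("pos-0003", "1-0"), ("pos-0003", "0-1")]

a = drawable_openings(weak_engine_a)
b = drawable_openings(weak_engine_b)
print("drawable for A:", sorted(a))
print("drawable for both A and B:", sorted(a & b))
```

Running the same tally at each time handicap (800, 700, 600 Elo weaker, and so on) would show how the set of drawable openings grows as the weaker engine gets stronger.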

This is the key to all calculations and to the question:
What is the probability of a draw across the 500 ECO codes for an engine playing perfect chess?

If an engine is 200 Elo stronger than SF 17 (I think 3775 Elo is the maximum possible with balanced opening positions),
the probability of a draw is at most 7.5% ... at 800 Elo difference.

Put more simply:
We are searching for the openings in which a blunder is very difficult to find for an engine 800 Elo weaker.

explain it better.
Thinking in reverse is indeed often hard.

...

That means:
An engine playing at 3775 Elo will score
vs. 3675 Elo with a balanced opening book max. x%
vs. 3575 Elo with a balanced opening book max. x%

The engines at 3675 / 3575 Elo will make fewer errors and fewer blunders.
Logically ... the draw rate is higher.
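
For reference, the scores that the plain logistic Elo formula would predict for these pairings can be computed as below. This is a sketch under the assumption that the standard formula applies. The x% values in the post are left open and may differ, since the post argues that Elo becomes questionable at this level. The function name `expected_score` is illustrative.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score fraction for the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

top = 3775  # ceiling estimate quoted in the post
for opponent in (3675, 3575, 2975):  # 100, 200 and 800 Elo weaker
    diff = top - opponent
    print(f"{top} vs {opponent} ({diff} Elo): {100 * expected_score(diff):.1f}%")
```

The formula says nothing about how the points are split between wins and draws, so it cannot by itself confirm or refute the draw-rate argument.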

I think we can calculate today how strong a perfectly playing engine can be, using the flawed Elo calculation we have (we have nothing better than Elo) and a very balanced opening book. According to my simulation it is 3775 Elo. And the closer an engine is to 3775 Elo, the fewer games are needed to calculate it, and the more questionable Elo becomes.

Best
Frank

Frank Quisinsky · Post by **Frank Quisinsky** » Sat Apr 04, 2026 3:13 pm

So, if we know the draw rate across all the opening ECO codes for engines 800 Elo weaker than SF 17, we can calculate what is possible in the coming years. And the final result for SF is: the maximum possible is 3775 Elo if SF 18 plays at 3575 Elo, calculated with an opening book that plays the 500 ECO codes at random.

Uri Blass · Post by **Uri Blass** » Sun Apr 05, 2026 7:49 am

I think Elo is nonsense because it depends on the players.

I remember reading that without NNUE you could gain more Elo points from hardware improvement than the latest engines do, and not because the engines are stronger.

I will try to find the relevant thread.

Uri Blass · Post by **Uri Blass** » Sun Apr 05, 2026 7:53 am

The relevant thread:

viewtopic.php?p=987556&hilit=SF11#p987556

Uri Blass · Post by **Uri Blass** » Sun Apr 05, 2026 7:58 am

a quote from that thread:

"Returning to the purpose of the experiment, what I attempted to compare was how well a particular version of Stockfish is able to exploit weaker, node-limited versions of itself. Curiously, the two tested HCE versions of SF10 and SF11 find a way to keep exploiting mistakes in weaker versions of itself, much more so than the NNUE versions can, even when starting from roughly the same baselines Elo.

And yes, in this data SF10 exploited itself to a somewhat greater extent than did SF11, leading to a higher phony-rating. But as you'd expect, in head-to-head competitions SF11 at equal node budget in those upper ranges beats SF10."

Frank Quisinsky · Post by **Frank Quisinsky** » Mon Apr 06, 2026 12:42 am

Hi Uri,

you wrote:
I think elo is nonsense because it is dependent on the players.

---

I don’t think Elo is good enough for the high strength levels that are being reached. I’ve already mentioned a few reasons. In that regard, I would fully agree.

Especially with so many talented developers, we should be able to reform different things!

- Elo system ... under a different name, which we will approve.
- ECO system ... under a different name, which we will approve.

We are not weak; we are strong, and we should assert our strength with greater confidence throughout the chess world.

Harm-Gert wrote some interesting things. I also believe it would be appropriate for an engine to analyze the opponent’s moves during the game to learn more about the opponent's strength and adjust its playing style accordingly. I would call that, for example, true NN. This could be done on a second thread.

It would also be interesting to do the following:
Before an engine match begins ... example: Wasp - Velvet
We give Wasp 1,000 Velvet games and we give Velvet 1,000 Wasp games.
Each engine has one day to make its own analysis of the opponent and to create a special NN file automatically.
After that ... the match can start.

Such things I would call NN.
I don't think we're lacking in imagination at all.

For some reason, I always feel like it's time to take on the next challenge. Thinking again and again only about playing strength is not right. Computer chess is more than just playing strength.

Best
Frank

Frank Quisinsky · Post by **Frank Quisinsky** » Mon Apr 06, 2026 1:13 am

Hi Uri,

yes, I know the thread you mentioned.
Such experiments are always interesting.

I don't think NN is necessarily the best choice for the early stages of the game. I discussed this with John Stanback years ago. The idea was to play with HCE while X pieces are on the board and then switch to the neural network. As far as I remember, John ran some experiments and the results were not good, so we concluded the idea was bad. I still believe in that idea, somehow.

Ultimately, I think we’re desperately trying to copy the Maya. Their knowledge of healing powers in connection with plants has been lost.

We face a similar situation.
So many older HCE chess programs contain great knowledge.

Unfortunately, HCE is being pushed further and further into the background, and I think that's a mistake.

Have a look at my previous message:
The match: Wasp-Velvet ... if both are HCE engines ... one day before the match begins we give both engines the information about the opponent and let the engines create an NN file automatically. We should think about that in detail.

In summary:
I will not say that we're on the wrong track!
Rather: we should now make more diverse use of the accumulated knowledge about neural networks. We should focus on more meaningful things than trying to gain another 25 Elo points. That’s when new ideas will emerge, and further Elo jumps will happen naturally.

Best
Frank
