Perfect chess engine Elo (32-men TB) can be within 200 of Stockfish in TCEC LTC conditions

mwyoung · Post by **mwyoung** » Wed Nov 11, 2020 3:08 am

JohnWoe wrote: ↑Tue Nov 10, 2020 10:09 pm [d]n2Bqk2/5p1p/Q4KP1/p7/8/8/8/8 w - - 0 1
Mayhem can't see the best move. Null move probably?

exclude: none best +tail                                          
dep	score	nodes	time	(not shown:  tbhits	knps	seldep)
 15	+0,25 	149,1M	1:07.00	Qd6+ 
 14	+0,30 	60,6M  	0:26.09	Qd6+ 
 13	+0,26 	21,3M  	0:09.36	Qd6+ 
 12	+0,32 	9,79M  	0:04.48	Qd6+ 
 11	+0,31 	4,54M  	0:02.22	Qd6+ 
 10	+0,50 	1,89M  	0:01.00	Qd6+ 
  9	+0,52 	670535	0:00.42	Qd6+ 
  8	+0,67 	357122	0:00.26	Qd6+ 
  7	+0,26 	155040	0:00.14	Qc8 
  6	+0,21 	66985  	0:00.09	Qc8 
  5	+0,27 	29203  	0:00.06	Qc8 
  4	+0,60 	14625  	0:00.05	Qc8 
  3	+1,18 	2057    	0:00.01	g7+ 
  2	+0,51 	571      	0:00.00	Qxa8 
  1	+0,92 	90        	0:00.00	Qd6+ 
  0	#

SF12 too...

42	  0.00 	116,1M	1:45.27	Qd6+ Kg8 gxh7+ Kh8 Bxa5 Qc8 Bd2 Nb6 Qxb6 Qf8 Qb3 Qg7+ Ke7 Qe5+ Kxf7 Qf5+ Ke7 Qxh7+ Qf7 Qe4+ Qe6 Qxe6+ Kxe6 Kg8 Ke5 Kf8 Bb4+ Ke8 Ke6 Kd8 Kd6 Kc8 Kc6 Kb8 Kb6 Kc8 
 41	  0.00 	105,3M	1:36.63	Qd6+ Kg8 gxh7+ Kh8 Bxa5 Qc8 Bd2 Nb6 Qxb6 Qf8 Qb3 Qg7+ Ke7 Qe5+ Kxf7 Qf5+ Ke7 Qxh7+ Qf7 Qe4+ Qe6 Qxe6+ Kxe6 Kg8 Ke5 Kf8 Bb4+ Ke8 Ke6 Kd8 Kd6 Kc8 Kc6 Kb8 Kb6 Kc8 Kc6 
 40	  0.00 	68,0M  	1:05.72	Qd6+ Kg8 gxh7+ Kh8 Bxa5 Qc8 Bd2 Nb6 Qxb6 Qf8 Bf4 Qg7+ Ke7 f6+ Ke6 Qg4+ Kf7 Qd7+ Kxf6 Qg7+ Kf5 Qg6+ Qxg6 
 39	  0.00 	56,4M  	0:55.76	Qd6+ Kg8 gxh7+ Kh8 Bxa5 Qc8 Bd2 Nb6 Qxb6 Qf8 Bf4 Qg7+ Ke7 f6+ Ke6 Qg4+ Kf7 Qd7+ Kxf6 Qg7+ Kf5 Qg6+ Qxg6 
 38	  0.00 	47,1M  	0:47.09	Qd6+ Kg8 gxh7+ Kh8 Bxa5 Qc8 Bd2 Nb6 Qxb6 Qf8 Bf4 Qg7+ Ke7 f6+ Ke6 Qg4+ Kf7 Qd7+ Kxf6 Qg7+ Kf5 Qg6+ Qxg6 
 37	  0.00 	40,7M  	0:39.78	Qd6+ Kg8 gxh7+ Kh8 Bxa5 Qc8 Bd2 Nb6 Qxb6 Qf8 Bf4 Qg7+ Ke7 f6+ Ke6 Qg4+ Kf7 Qd7+ Kxf6 Qg7+ Kf5 Qg6+ Qxg6 
 36	  0.00 	32,4M  	0:31.73	Qd6+ Kg8 gxh7+ Kh8 Bxa5 Qc8 Bd2 Nb6 Qxb6 Qf8 Bf4 Qg7+ Ke7 f6+ Ke6 Qg4+ Kf7 Qd7+ Kxf6 Qg7+ Kf5 Qg6+ Qxg6 
 35	  0.00 	28,9M  	0:28.65	Qd6+ Kg8 gxh7+ Kh8 Bxa5 Qc8 Bd2 Nb6 Qxb6 Qf8 Bf4 Qg7+ Ke7 f6+ Ke6 Qg4+ Kf7 Qd7+ Kxf6 Qg7+ Kf5 Qg6+ Qxg6 
 34	  0.00 	25,9M  	0:26.14	Qd6+ Kg8 gx

[d]5q1k/6r1/8/8/6R1/2Q5/2K5/8 w - - 0 7

exclude: none best +tail                                          
dep	score	nodes	time	(not shown:  tbhits	knps	seldep)
 12	+0,22 	177,7M	1:18.68	Rh4+ 
 11	+0,30 	40,9M  	0:15.82	Rh4+ 
 10	+0,23 	24,5M  	0:10.15	Rh4+ 
  9	+0,30 	2,32M  	0:01.20	Rh4+ 
  8	+0,33 	802763	0:00.47	Rh4+ 
  7	+0,27 	435471	0:00.29	Rh4+ 
  6	+0,17 	291404	0:00.23	Rh4+ 
  5	+0,27 	133790	0:00.13	Rh4+ 
  4	+0,14 	39837  	0:00.07	Kd1 
  3	+0,18 	5574    	0:00.02	Rg5 
  2	+0,26 	1180    	0:00.00	Rg5 
  1	+0,33 	89        	0:00.00	Qxg7+ 
  0	#

Mayhem NNUE sees Rh4.

The point is these engines are optimized for game play. Stockfish would simply avoid these kinds of endgames.

And this needs to be pointed out: Stockfish did not see Rh4. Stockfish only played Rh4, but thinks Rh4 is only equal. Mayhem has no clue how to force the mate.

Alayan · Post by **Alayan** » Wed Nov 11, 2020 3:23 am

BrendanJNorman wrote: ↑Wed Nov 11, 2020 2:42 am I feel like some people are desperately trying to keep clutch of their biases here.

This "avoid the endgames" is nonsensical for 3 reasons:

1. Endgames are the natural result of surviving the middlegame (can we assume Stockfish would survive the middlegame against a 32 man tablebase?)

2. I suppose this "avoid the endgames" logic is based also on the fact that tablebases are sometimes called "endgame tablebases" - but the thing is, this is *only* because we have so few pieces. With 32 man, it'd just be called "the oracle" or something - the all-seeing God - in ALL positions.

3. As you mentioned, we only argue from the basis of endgames because it is the only field where we can prove with verifiable data (7 man TBs) that Stockfish has no idea. Any person with a decent grasp of a) chess and b) logic can understand that as complexity is introduced and Stockfish makes the inevitable errors, the TB will remain PERFECT which means, as we have seen, INSTANT announcement of mate in x against Stockfish.

Surprised to see people I consider pretty smart arguing against this obvious point.

7-men TB don't show that Stockfish has "no idea".

7-men TB show that Stockfish with a few billion nodes is extremely good in an overwhelming proportion of positions, and a large proportion of the positions where it still fails are nonsensical (the given example with two bishops of the same color or 3 knights are clear cases of things that will NEVER be relevant for holding a game from the start position).
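
To make the "two bishops of the same color" point concrete, here is a minimal sketch that reads a FEN string and reports which square colour each side's bishops stand on. The FEN is the first failing 7-men position quoted later in the thread; the helper names `bishop_square_colours` and `count_men` are made up for this illustration and come from no chess library.

```python
# Minimal FEN inspection using only the standard library.
# Helper names are hypothetical, chosen for this illustration.

def bishop_square_colours(fen):
    """Return the square colour ('dark' or 'light') of every bishop, per side."""
    placement = fen.split()[0]          # first FEN field: piece placement
    colours = {"white": [], "black": []}
    for row, rank_text in enumerate(placement.split("/")):
        rank = 7 - row                  # FEN lists rank 8 first; rank 1 has index 0
        file = 0                        # file a has index 0
        for ch in rank_text:
            if ch.isdigit():
                file += int(ch)         # a digit means that many empty squares
                continue
            if ch in "Bb":
                side = "white" if ch == "B" else "black"
                # a1 is a dark square, so an even (file + rank) sum means dark
                colour = "dark" if (file + rank) % 2 == 0 else "light"
                colours[side].append(colour)
            file += 1
    return colours


def count_men(fen):
    """Count all pieces and pawns, kings included."""
    return sum(ch.isalpha() for ch in fen.split()[0])


fen = "1K6/5b2/8/3k4/b4B2/2B5/3r4/8 b - - 0 1"
print(count_men(fen))               # number of men on the board
print(bishop_square_colours(fen))   # square colours of each side's bishops
```

For this position the check reports seven men, with both white bishops on dark squares and both black bishops on light squares, a material configuration that can only arise after underpromotion.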

With more pieces on board, the proportion of positions where Stockfish fails will increase for a given node count. How much it will increase is one unknown. Another unknown is how these positions are distributed with regard to "imperfect eval". There is a strong correlation between SF's own eval and its likelihood of blundering a draw into a loss. The point is that the proportion of positions SF gets wrong while being completely oblivious is much lower than the proportion of TB-drawn positions where SF plays the wrong move but can detect the position is bad. 1. h4 and 1. e4 might be drawn all the same for the TB, but for Stockfish it's much less likely to err after 1. e4 than 1. h4. Luckily, SF will never play 1. h4 anyway.

The errors are not inevitable at all. They only arise in specific positions, which may or may not be forced on the board. Claiming they must happen is a leap of faith.

Also, positions that Stockfish fails to win are only relevant if it fails to understand they are bad and it can be forced to be the weak side in those positions. The TB isn't going to blunder into a lost position, and getting into a lost position SF fails to punish is irrelevant to the main point of Stockfish losing.

Chess's complexity is finite.

My challenge is still valid. If SF mistakes are so big and frequent it would be a no-brainer to accept it as it wouldn't take much effort to succeed.

BrendanJNorman · Post by **BrendanJNorman** » Wed Nov 11, 2020 3:43 am

Alayan wrote: ↑Wed Nov 11, 2020 3:23 am
My challenge is still valid. If SF mistakes are so big and frequent it would be a no-brainer to accept it as it wouldn't take much effort to succeed.

Strawman. Nobody said that "SF mistakes are so big and frequent" so your challenge makes no sense, as I have pointed out before.

The real premise is how close SF is to being able to survive against a 32 man tablebase.

Playing an engine vs engine match against you argues against your strawman, not the real question.

And the only way to know how SF will fare against a perfect 32 TB is to see how SF compares to the TBs we already have and extrapolate from that.

Extrapolating from that we make simple deductions:

1) If Stockfish still isn't flawless in 7 man positions, and mistakes (relative to the number of TB pieces) will increase with complexity, at 32 pieces Stockfish will be almost blind relative to the all-seeing TBs.

2) If Elo is being added very regularly at https://abrok.eu/stockfish/ then how on earth can you dare to say that Stockfish is anywhere NEAR perfect play? If Stockfish were even halfway to perfect play, you'd be squeezing out 1-2 Elo every year or so. Not every day. This is obvious.

Chessqueen · Post by **Chessqueen** » Wed Nov 11, 2020 3:51 am

OneTrickPony wrote: ↑Fri Nov 06, 2020 1:59 am Maybe it's 200 points away, maybe 400. We just don't know. The trick is going to be to select openings that are very close to losing - either drawn but difficult to hold or won but difficult to convert. TCEC is going into that direction and I feel it will be more than 200 ELO vs godlike engine from the future with such openings.

The starting position and the main line solid openings I think we are closer than 200 ELO though. It's hard to imagine current SF being outplayed from main line Ragozin, Berlin or Open Spanish positions even with all the time and advancements from the future.

I believe that all opening Books should be 5 moves, and it will create a more balanced match among engines.

Alayan · Post by **Alayan** » Wed Nov 11, 2020 4:03 am

BrendanJNorman wrote: ↑Wed Nov 11, 2020 2:49 am We are talking about a 32-man tablebase, not necessarily a hypothetical "perfect player" (even though, in essence, this is the same thing).

A tablebase doesn't need to "force" anything, it just reads from the database.

If Stockfish makes ONE inaccuracy, he is theoretically getting mated by force.

Playing 1. e3 is an inaccuracy. But 1. e3 doesn't lose by force.

There is the TB-pov of chess evaluation. Loss in X, draw, Win in X.

But for imperfect players, there is a gradient. That's what the imperfect eval of an engine is. It differentiates positions that are trivial to win, easy to win, hard to win, drawn but with good chances of winning, drawn with low chances of winning, drawn with high chances of losing...

If you used some enhanced scoring that differentiates the types of draws, like r-mobility is doing, and had TBs for those, you'd see that for complex positions that can't be resolved to 0.00 by engines, there is a significant correlation between the best enhanced score that can be forced and the engine eval. There is an underlying truth that having the first move is better, even though the regular WDL rules are too crude to show it objectively.

What we call an "inaccuracy" is something that moves the subjective evaluation of imperfect players but isn't supposed to change the TB evaluation (of course, we can't always know).
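
The distinction between the two viewpoints can be sketched in a few lines. This is only an illustration: the `TB` enum, the `classify_move` helper, the 0.3-pawn threshold and the eval numbers are all hypothetical, and the example treats the start position as a TB draw, which is an unproven assumption.

```python
from enum import Enum


class TB(Enum):
    """Game-theoretic value from the mover's point of view (the TB view)."""
    LOSS = -1
    DRAW = 0
    WIN = 1


def classify_move(tb_before, tb_after, eval_before, eval_after, threshold=0.3):
    """Classify a move using both the TB value and a heuristic eval (in pawns).

    All values are from the mover's point of view. The threshold is an
    arbitrary illustrative choice, not a standard.
    """
    if tb_after.value < tb_before.value:
        return "blunder"      # the game-theoretic result got worse
    if eval_before - eval_after >= threshold:
        return "inaccuracy"   # same TB result, but the imperfect eval dropped
    return "fine"


# Hypothetical numbers: a first move that keeps the (assumed) draw
# but gives up the first-move edge according to the engine eval.
print(classify_move(TB.DRAW, TB.DRAW, eval_before=0.4, eval_after=0.0))
# A move that turns a draw into a forced loss.
print(classify_move(TB.DRAW, TB.LOSS, eval_before=0.0, eval_after=-0.2))
```

Only the second kind of move matters for whether an engine can lose to a tablebase; the first kind changes the gradient that imperfect players perceive.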

BrendanJNorman wrote: ↑Wed Nov 11, 2020 2:49 am Stockfish, even now, isn't strong enough to avoid this outcome because Stockfish himself isn't close to perfect chess.

Put it this way, if we threw Magnus Carlsen into a time machine and sent him back to 1857, guys like Paul Morphy would spout about him playing "perfect chess" - but we KNOW Carlsen would be hammered 100-0 even against old versions of Stockfish.

Now let's go 150 years INTO the future where we have 32 man tablebases (at this point even 9-10 man TBs seem inconceivable)...

...do you think people will be talking about Stockfish 12?

It's not because the current best chess entity crushes a past best chess entity that a future best chess entity MUST crush the current best chess entity.

It might, or it might not. Past elo gain trends are not indicative of future elo gain trends.

Is Stockfish-dev (25 elo above SF12 at fishtest RT) able to hold the start position as white against any future opponent, provided SF has a reasonable hash and a few billion nodes per move ? I doubt it.

Is it thousands of "elo" (ill-defined) from being able to ? I don't think so either.

BrendanJNorman wrote: ↑Wed Nov 11, 2020 2:57 am The premise of this thread is silly, guys, come on.

Stockfish is within 200 Elo of a 32 man tablebase (or "perfect chess")?

So the premise really is that given SF is around 3700 CCRL...

...a 32 man TB (or flawless, solved chess) is just 3900 CCRL.

An (engine+hw+tc) chess entity that would never lose from the start position would be equal with a 32-men TB (+ any strategy to pick challenging moves) when measuring elo from the start position.

It could still make gross mistakes in plenty of positions, as long as making those mistakes don't prevent it from not losing from the start position.

Leela is clearly weaker than SF from TCEC SuFi positions yet a recent set of bookless games with big hardware and long TC ended with +0-0=100, because Leela never had to face the sort of positions where it gets outplayed by Stockfish.
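
For reference, the standard logistic Elo model shows what a 200-point gap means in score terms, and why an all-draw match measures as zero difference. The function names below are made up for this sketch.

```python
import math


def expected_score(elo_diff):
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400))


def elo_diff_from_score(score):
    """Elo difference implied by a match score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400 * math.log10(score / (1.0 - score))


# A 200 Elo gap corresponds to scoring roughly 76% of the points.
print(round(expected_score(200), 4))

# A +0 -0 =100 match is a 50% score, i.e. a measured difference of 0 Elo.
print(elo_diff_from_score(50 / 100))
```

A 100-0 result has no finite Elo equivalent in this model, which is one reason "thousands of Elo" is ill-defined when one side never loses.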

BrendanJNorman wrote: ↑Wed Nov 11, 2020 2:57 am The future will prove this to us (very soon I imagine since the NNUE revolution has just begun) and strongly imply that "perfect chess" may well be 5000 Elo or beyond.

The fact that they are still adding Elo to Stockfish via patches proves this point.

If Stockfish were ANYWHERE NEAR perfect chess, adding Elo would be like squeezing blood from a stone.

Adding elo to Stockfish has become increasingly difficult over time.

Stockfish is tested at 10s+0.1s (STC) and 60s+0.6s (LTC). At LTC, it won't have more than a few million nodes per move, often much less than a million. That's three orders of magnitude fewer than "TCEC LTC conditions". This makes mistakes much more frequent.

Tests aren't done from the start position either, they are done from a wide set of short openings that overall is rather balanced, but is still more skewed than the start position. Despite this, the draw rate at LTC is around 91%, the highest ever recorded.

At CCRL 40/15, SF12 4CPU is only 35 elo above SF11 4CPU and 23 elo above an even older SF 4CPU. Error margins, different opponents, and so on limit how much trust I'm putting into it, but this data is indicative of significant elo compression.

Alayan · Post by **Alayan** » Wed Nov 11, 2020 5:12 am

BrendanJNorman wrote: ↑Wed Nov 11, 2020 3:43 am
Alayan wrote: ↑Wed Nov 11, 2020 3:23 am
My challenge is still valid. If SF mistakes are so big and frequent it would be a no-brainer to accept it as it wouldn't take much effort to succeed.
Strawman. Nobody said that "SF mistakes are so big and frequent" so your challenge makes no sense, as I have pointed out before.

The real premise is how close SF is to being able to survive against a 32 man tablebase.

Playing an engine vs engine match against you argues against your strawman, not the real question.

Not a strawman. If an engine's start position strength is thousands of elo away from perfect exploitative play, then the mistakes must be very reliably exploitable, and the advantages of 1000:1 resources, adversarial search, auxiliary engines... should be able to get wins.

The number of different legal chess positions has been mathematically proven to be below 10^48 (actually 10^46, but I don't think that includes the 50mr signature; history for repetition draws is not important here). Most of these positions only appear after gross and clear mistakes, so the number of relevant chess positions is many orders of magnitude lower.

1000x the resources means 10^3 more positions visited, provided hash size isn't a limitation. Starting from 10^9 nodes, do this 10 times and you get 10^39 nodes, which is more than enough to brute-force weakly solve chess. Of course, on current hardware it would take forever to search that many nodes, you wouldn't have enough hash to avoid searching the same position repeatedly (wasting nodes to rediscover what was forgotten), and there would be some 50mr issues with hash. But from 10^9 to 10^12 nodes these issues are less significant.
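
The arithmetic behind the 10^39 figure can be checked with exact integers. The constants are the ones used in the post (a budget of 10^9 nodes per move, ten successive 1000x increases, and the 10^46 position bound); the variable names are just for this sketch.

```python
# Exact integer arithmetic for the node-count argument.
START_NODES = 10**9        # "a few billion nodes" per move, rounded to 10^9
FACTOR = 10**3             # one step of 1000x more resources
STEPS = 10                 # "do this 10 times"
POSITION_BOUND = 10**46    # upper bound on legal positions quoted in the post

nodes = START_NODES * FACTOR**STEPS
print(nodes == 10**39)             # True: 10^9 * (10^3)^10 = 10^39
print(POSITION_BOUND // nodes)     # gap to the full 10^46 bound: 10^7
```

The result is still 10^7 short of the bound on all legal positions, so the argument relies on the claim that the relevant positions are many orders of magnitude fewer than the legal ones.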

Meanwhile, we also know from all the tests done on this that the elo gain from each further doubling of nodes is decreasing.

If there is 1000 elo to gain through perfect play by basically solving chess, then an expectation of 100+ elo gain (10%) when doing 10% of the doublings required to basically brute force chess wouldn't be far-fetched. It's inaccurate, making some optimistic assumptions and some pessimistic simplifying assumptions, but it's not unreasonable. And the all-is-fair challenge offer allows every trick in the book to do better than just unassisted SF searching 1000x longer.
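
The "10% of the doublings" estimate can be reproduced as follows. This is a rough sketch under the post's own simplifying assumptions (1000 Elo to gain in total, 10^9 nodes today, 10^39 nodes as the brute-force point); the function name is made up.

```python
import math


def linear_elo_share(total_elo, start_nodes, end_nodes, resource_factor):
    """Share of total_elo gained by a resource_factor increase, if every
    node doubling between start_nodes and end_nodes were worth the same."""
    doublings_done = math.log2(resource_factor)
    doublings_total = math.log2(end_nodes // start_nodes)
    fraction = doublings_done / doublings_total
    return fraction, total_elo * fraction


fraction, gain = linear_elo_share(1000, 10**9, 10**39, 1000)
print(round(math.log2(1000), 2))   # about 9.97 doublings for 1000x
print(round(fraction, 3))          # about 0.1 of all doublings
print(round(gain, 1))              # about 100 Elo under the equal-share assumption
```

Since the gain per doubling is observed to decrease, the first doublings would be worth more than an equal share, which is where the "100+" in the estimate comes from.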

BrendanJNorman wrote: ↑Wed Nov 11, 2020 3:43 am And the only way to know how SF will fare against a perfect 32 TB is to see how SF compares to the TBs we already have and extrapolate from that.

Extrapolating from that we make simple deductions:

1) If Stockfish still isn't flawless in 7 man positions, and mistakes (relative to the number of TB pieces) will increase with complexity, at 32 pieces Stockfish will be almost blind relative to the all-seeing TBs.

2) If Elo is being added very regularly at https://abrok.eu/stockfish/ then how on earth can you dare to say that Stockfish is anywhere NEAR perfect play? If Stockfish were even halfway to perfect play, you'd be squeezing out 1-2 Elo every year or so. Not every day. This is obvious.

Deduction (1) assumes that the only way to not make a crippling amount of mistakes in a game starting from the start position is to be 100.00% perfect in 7-men positions. The conclusion doesn't follow from the premise, because only a very small subset of legal positions may happen when playing SF vs TB32, so you can't know if any of the positions where SF is blind will actually occur and matter (and you need SF making a losing move, failing to make a winning move isn't going to be an issue vs TB32).

Deduction (2) isn't a deduction. You fail to address the multiple points I raised (edit: sorry, I realized I wrote my previous message while you wrote yours and you published yours before mine, so you couldn't address my points of course): the record-low amount of decisive games, past gains are not indicative of future gains (though I'll grant you at fishtest conditions SF will most likely gain 30+ elo by this time next year), fishtest conditions have <0.1% of the nodes used at TCEC conditions (so SF is "perfect" in a much greater amount of positions in the latter conditions than the former), fishtest games are using a book more skewed than the start position (not much, double 1-0 are almost nonexistent, but it matters) and with non-uniform skew.

I'll add that fishtest games don't use TBs and a majority of the losses occur with the weak side of the opening, while what is discussed is how Stockfish may perform from the strong side of the opening. I downloaded a sample of 2K games from the latest fishtest green LTC (idx 0 to 9 here https://tests.stockfishchess.org/tests/ ... 2301d6a794 ) : 108 white wins, 60 black wins. We can see the white loss likelihood at 6% is much lower than the black loss likelihood at 10.8% in this sample. That's a big difference.
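
The loss rates can be recomputed from the win counts. One caveat: the quoted 6% and 10.8% are reproduced with a denominator of 1000 games, while a denominator of 2000 would give 3% and 5.4%. Which one was intended is not stated, so the denominator is left as a parameter in this sketch; the ratio between the two rates is the same either way.

```python
def loss_rates(white_wins, black_wins, games):
    """Loss rate of each colour: a win for one side is a loss for the other."""
    return {
        "white_loss": black_wins / games,
        "black_loss": white_wins / games,
    }


# Win counts from the post; the denominator of 1000 is an assumption
# chosen because it reproduces the quoted percentages.
rates = loss_rates(white_wins=108, black_wins=60, games=1000)
print(rates)
print(round(rates["black_loss"] / rates["white_loss"], 2))   # black loses 1.8x as often
```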

BrendanJNorman · Post by **BrendanJNorman** » Wed Nov 11, 2020 6:34 am

Alayan wrote: ↑Wed Nov 11, 2020 5:12 am Not a strawman. If an engine's start position strength is thousands of elo away from perfect exploitative play, then the mistakes must be very reliably exploitable

Sorry man, but this word salad contains a false assumption.

Just because I (and others) theorized that Stockfish *might be* (the real assertion was that Stockfish would lose 100-0 against 32 man TBs) thousands of points away from the play of a 32 man TB, it doesn't mean that the mistakes Stockfish makes are easily exploitable with today's resources.

Put it this way: When a scientist says "the conditions Mars would need to support human life are x, y and z" he is theorizing.

There is no way as yet, that he can send a human to Mars to test his theories.

When a critic says to him "well, if you think humans can eaaaassssily (critic inserts where needed) live on Mars, why don't you test in the desert with reduced oxygen??" it is not useful, not the same, and it does nothing to disprove his theory.

This is why governments ask him (the scientist) for his thoughts and not the critic.

The truth is this:

Neither you, nor me, nor Magnus Carlsen knows what perfect chess looks like.

A 32 man TB will not arrive for probably 100 years, at which time, today's Stockfish will look weak and ridiculous.

A few years ago most of this community was fawning over the play of AlphaZero and talking about how AI might be solving chess and so on...

Now it seems within a short time, the brilliant minds in the CC community have surpassed A0's level.

But still, ALL engines are losing games.

Still, Stockfish is gaining Elo almost daily (weird for an engine so "close" to perfect play, huh?).

Still, there are not even 8 man tablebases, let alone 32.

Right now the reality is this: You guys are saying that "perfect chess" is about 3900 CCRL (200 points above current SF).

This is absurd, as we will see when an engine reaches 4000 in a few years.

Finally, let's repeat: The real premise of this thread is the assertion that Stockfish is within 200 points of perfect play.

Since nobody knows what perfect play looks like, it is very easy to say that Stockfish is close to perfect.

Just as it was easy for Capablanca to say that chess was basically solved (in the 1930s!).

There are 290 points between Stockfish 7 and Stockfish 12 - are you really saying that Stockfish 12 is 90 Elo closer to perfection (as in, equal to a 32man TB) than Stockfish 7 is to 12?

If so, let's just agree to disagree.

Chess will not be solved for a LOOOOOOOOONG time...even by Stockfish.

Uri Blass · Post by **Uri Blass** » Wed Nov 11, 2020 7:06 am

BrendanJNorman wrote: ↑Wed Nov 11, 2020 2:49 am
Uri Blass wrote: ↑Wed Nov 11, 2020 2:39 am
mwyoung wrote: ↑Tue Nov 10, 2020 4:27 pm
Uri Blass wrote: ↑Tue Nov 10, 2020 12:37 pm
mwyoung wrote: ↑Tue Nov 10, 2020 6:54 am
Uri Blass wrote: ↑Tue Nov 10, 2020 3:07 am
mwyoung wrote: ↑Mon Nov 09, 2020 11:28 pm
mwyoung wrote: ↑Mon Nov 09, 2020 10:46 pm
Alayan wrote: ↑Mon Nov 09, 2020 9:35 pm "Very simple endgame positions" where it fails.

Meanwhile, SF from 1.5 years ago + 6-men got 99.999% of 7-men positions right with a few Gnodes.

Some sample of the very natural and common positions where it failed in Aloril's testing :
[d]1K6/5b2/8/3k4/b4B2/2B5/3r4/8 b - - 0 1

Fails to see that 2 same color bishops can't force a win, very common issue in normal games...

[d]4N1q1/6B1/p1k5/8/8/4K2B/8/8 w - - 0 1

Q+P vs 3M, very common sort of position... The suggested move is still a draw under the 50 moves rule.

[d]3B4/B5R1/b7/2k5/7K/8/4b3/8 b - - 0 1

More nonsense with both sides having bishops of the same color.

[d]3N4/6k1/8/8/K3N2p/8/8/3N2q1 w - - 0 1

Very common imbalance, 3N vs Q...

And so on.

I agree! Stockfish is an idiot chess player. Here is another big fail by Stockfish that a 32 man engine would find instantly. Stockfish play has huge holes, and would be crushed with perfect play.

[d]n2Bqk2/5p1p/Q4KP1/p7/8/8/8/8 w - - 0 1

Stockfish 081120 - C Line, Blitz 5min+1sec
n2Bqk2/5p1p/Q4KP1/p7/8/8/8/8 w - - 0 1

Analysis by Stockfish 081120:

1.Qd6+ Kg8 2.gxh7+ Kh8 3.Qd5 Qf8 4.Ke5 Kxh7 5.Qe4+ Kg8 6.Qxa8 Qc5+ 7.Kf4 Qd6+ 8.Kg4 Kg7 9.Bf6+ Qxf6
The position is equal: = (0.00) Depth: 100/19 00:06:25 11232MN, tb=404542636
(, 09.11.2020)

Even with a 100 ply type B search, Stockfish is totally clueless about this position.

I will give you a hint 1.Qc8 wins.
Stockfish will never find the win here. So I will give the solution now.
[d]n2Bqk2/5p1p/Q4KP1/p7/8/8/8/8 w - - 0 1
1.Qc8 Kg8 2.Bc7 Qxc8 3.gxf7+ Kh8 4.Be5 Qc5 5.Bb2 Nc7 6.Ba1 a4 7.Bb2 a3 8.Ba1 a2 9.Bb2 a1R 10.Bxa1 Qe5+ 11.Bxe5 Nd5+ 12.Ke6+ Nf6 13.Bxf6#
Maybe you are right for latest stockfish but people showed that some older version found the win.
Stockfish and all top programs can find easily the loss after 2.Bc7

The main problem is finding 2.Bc7 because stockfish insists on using null move pruning.
I wonder how much time we need to wait for stockfish to replace it with better pruning rules.

There are certainly positions when stockfish is blind but it does not prove that you can force stockfish to go to one of them from the opening position and if the 32 piece tablebases only try to play for the longest draw in order to draw position and does not try to take advantage of stockfish's weaknesses then it is not clear that it can win.

It may be interesting if somebody already built tablebases with the longest draw for 5-7 piece tablebases in drawn position with the distance in moves for a draw to find if this tablebase really score better than stockfish without tablebases against weak engines in random drawn tablebases endgames.
The point being shown is that you will always have blind spots or an error rate with a type B search. And this will happen in every game. And this type B search approximation is fine, but will result in crushing losses against perfect play.

These positions are no trick, but the result of a type B search. And this error rate does not go away, but becomes worse with more game complexity, as the type B search will need to prune more lines!
It is clear that there are many positions that stockfish does not find the right move but the question is not if there are positions that stockfish does not find the right move but if it is possible to get them from the initial position.

The fact that you show a lot of positions when stockfish does not find the right move does not prove that it is possible to achieve one of them from the opening position.

I guess that it is possible to achieve one of them but I doubt if the strategy of playing the move that is the longest path to draw in drawn positions is enough to do it.
I guess I will not be seeing you upgrade your engine. Since Stockfish 12 plays near perfect chess.

I guess you are under the delusion that these type B search errors only happen in the endgame. And that is not correct.

We only use these endgame examples because we can prove the errors with perfect play.

No.
I do not claim that the errors only happen in the endgame.
Errors can happen in many positions but the question is if the perfect player is going to always force one of them so that stockfish loses the match 100-0.
We are talking about a 32-man tablebase, not necessarily a hypothetical "perfect player" (even though, in essence, this is the same thing).

A tablebase doesn't need to "force" anything, it just reads from the database.

If Stockfish makes ONE inaccuracy, he is theoretically getting mated by force.

Stockfish, even now, isn't strong enough to avoid this outcome because Stockfish himself isn't close to perfect chess.

Put it this way, if we threw Magnus Carlsen into a time machine and sent him back to 1857, guys like Paul Morphy would spout about him playing "perfect chess" - but we KNOW Carlsen would be hammered 100-0 even against old versions of Stockfish.

Now let's go 150 years INTO the future where we have 32 man tablebases (at this point even 9-10 man TBs seem inconceivable)...

...do you think people will be talking about Stockfish 12?

I do not know that stockfish can beat Carlsen 100-0
There are no games to show it and I suspect that in a match of 100 games from the opening position Carlsen can get at least one draw.

The result of games of stockfish against old engines with a similar level to Carlsen proves nothing, because Carlsen is going to play for a draw while the old engines do not play for a draw.

Collingwood · Post by **Collingwood** » Wed Nov 11, 2020 9:06 am

Alayan wrote: ↑Mon Nov 09, 2020 9:35 pm Meanwhile, SF from 1.5 years ago + 6-men got 99.999% of 7-men positions right with a few Gnodes.

This has not been tested. We don't know how many positions SF gets right. Obviously we can't test every one, but it's not obvious how to choose positions to test.

BrendanJNorman · Post by **BrendanJNorman** » Wed Nov 11, 2020 9:23 am

Collingwood wrote: ↑Wed Nov 11, 2020 9:06 am
Alayan wrote: ↑Mon Nov 09, 2020 9:35 pm Meanwhile, SF from 1.5 years ago + 6-men got 99.999% of 7-men positions right with a few Gnodes.
This has not been tested. We don't know how many positions SF gets right. Obviously we can't test every one, but it's not obvious how to choose positions to test.

How is it relevant even if true?

For the test to mean anything, Stockfish should not be allowed to use the 6 man tablebases nor an opening book.

If Stockfish is indeed playing "perfect chess" it will need neither.
