Perfect chess engine Elo (32-man TB) can be within 200 of Stockfish in TCEC LTC conditions

mwyoung · Post by **mwyoung** » Sun Nov 15, 2020 7:29 pm

towforce wrote: ↑Sun Nov 15, 2020 7:05 pm
mwyoung wrote: ↑Sun Nov 15, 2020 5:13 pm
towforce wrote: ↑Sun Nov 15, 2020 11:48 am
mwyoung wrote: ↑Sun Nov 15, 2020 4:51 am
towforce wrote: ↑Sun Nov 15, 2020 1:53 am For this mini-thought experiment, please assume that chess is drawn (I know it's not proven yet):

* losses strongly correlate with blunders

* the deeper the search, the fewer the number of blunders

Unfortunately, not all engines measure depth in the same way. However maybe we can come up with a "reasonable guess" based on experience.

Another complicating factor: some positions would require a prohibitively deep search to uncover the blunder. In these cases, knowledge would be needed: the eval would need to be able to avoid blunders that search cannot reach. The good news regarding this is that, thanks to NNs, engines are also getting cleverer now, as well as just faster. Again, exactly how "smart" a NN is is difficult to say - but again, we can have a go.

So if chess is drawn (which I believe it is), then the time to perfect chess engines depends on the shape of the 3-dimensional chart that plots blunders against depth and knowledge.

Edit: here's a simplistic view of what the 3d graph might look like (X = depth, Y = knowledge, Z = blunders. Simple expression produces a plane. Drag with the mouse to rotate up/down/left/right to see clearly) - link.
This is where some do not understand the problem.

"The deeper the search, the fewer number of blunders."

The problem is the type B search. A type B search is fine playing scrub humans, and other scrub engines. It gives us a great approximation.

The issue is you are making billions of guesses as to what lines to cut to achieve the great search depths we see today. And you only need to be wrong once against perfect play.

And no amount of search in a type B search can ever achieve perfect play.
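The "you only need to be wrong once" point can be put as rough arithmetic. This is only a sketch, assuming (purely for illustration) that each pruning decision is independently wrong with the same small probability `p`. Real pruning errors are neither independent nor always game-losing, so the figures are hypothetical, and `prob_at_least_one_error` is a made-up helper name.

```python
import math

def prob_at_least_one_error(p: float, n: int) -> float:
    """Chance that at least one of n independent cuts is wrong.

    p is the assumed per-cut error probability, with 0 <= p < 1.
    """
    # 1 - (1 - p)**n, computed via log1p/expm1 so tiny p stays accurate.
    return -math.expm1(n * math.log1p(-p))

if __name__ == "__main__":
    # Hypothetical per-cut error rate of one in a billion.
    for n in (10**6, 10**9, 10**12):
        print(n, prob_at_least_one_error(1e-9, n))
```

Under these assumptions the chance of at least one bad cut climbs towards 1 as the number of cuts grows, however small the per-cut rate. Whether a given bad cut actually costs the game is a separate question that this sketch does not model.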

This is why we see the errors as shown here in this thread. And why Stockfish fails in the examples against perfect play.

You haven't addressed the issue of knowledge which I raised (see above quoted text). You appear to be saying that the 3 dimensional chart should have a long tail on the way to Z=0 (if you're willing to assume that chess is a draw without a blunder). Maybe you could come up with your own mathematical expression and redraw my chart? "A picture is worth a thousand words".

In this post Albert Silver told us that in top level correspondence chess (TLCC), wins are rare in completed games. Let's consider some candidate reasons why this might be so (my preferred choice is option 1 - that TLCC is the cutting edge, and is almost there in terms of error-free chess).

1. Chess is a draw, a win requires a blunder, and TLCC has almost eliminated blunders

2. Chess is a draw, a win requires a blunder, blunders occur in TLCC, but TLCC suffers from groupthink, and hence the players fail to find each other's blunders

3. Chess is a win, but TLCC players are not good enough to find the available wins

Which of the above 3 choices do you prefer?
"You haven't addressed the issue of knowledge which I raised"

Yes, I have, many times. And the knowledge standard you are asking for only exists in one form. As I said before, chess is a 100% tactical game...

And I will take option 4. Chess is either a win or a draw, but it does not matter, as humans are type B searchers, and the computers they are using are type B searchers, even in correspondence chess, and hence the players fail to find each other's blunders.

"Another complicating factor: some positions would require a prohibitively deep search to uncover the blunder. In these cases, knowledge would be needed: the eval would need to be able to avoid blunders that search cannot reach."

And it is the above that tells me you have no idea what you are talking about. You are just putting words together that you think make sense, but they are logically flawed. Not only do you not know the rules of chess, but you are clueless as to how a type B search works.

Suppose you had an eval that could "avoid blunders that search cannot reach."

If you had this type of evaluation, do you know what would not be needed? A search of any kind.

Here is a simple test to see if you have an evaluation that meets your standard. If your STATIC EVALUATION outputs anything other than the 3 true evaluations of chess (win, draw, loss), or is not correct 100% of the time, your evaluation is flawed.

And yes, this type of knowledge does exist in only one form, and it is called a tablebase.

You're basically right - but if a position was won, you'd want one more thing from the eval - distance to mate. If you had a choice of winning moves, your preference would be for the one that reaches mate first.
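A minimal sketch of that move choice, assuming a hypothetical tablebase probe that returns, for the position after each legal move, an outcome from the mover's point of view plus a distance to mate in plies. The `Probe` type, the `pick_move` helper and the sample moves are invented for illustration and do not match any real tablebase API.

```python
from typing import Dict, Tuple

# Hypothetical probe result for the position AFTER a move, from the mover's
# point of view: (outcome, distance_to_mate_in_plies).
# outcome is "win", "draw" or "loss"; the distance is ignored for draws.
Probe = Tuple[str, int]

def pick_move(probes: Dict[str, Probe]) -> str:
    """Prefer wins (shortest mate first), then draws, then losses (longest mate first)."""
    def key(item):
        move, (outcome, dtm) = item
        if outcome == "win":
            return (0, dtm)      # among wins, mate soonest
        if outcome == "draw":
            return (1, 0)
        return (2, -dtm)         # among losses, hold out longest
    return min(probes.items(), key=key)[0]

if __name__ == "__main__":
    sample = {"Qh5": ("win", 7), "Qg4": ("win", 3), "Kf1": ("draw", 0)}
    print(pick_move(sample))
```

With perfect three-valued outcomes and a distance to mate, one probe per legal move is enough to pick a move, which is the sense in which such an evaluation would make a search unnecessary.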

To summarise your answer as to why the draw ratio is so high in completed TLCC games: the players all use similar computers for analysis, and this is causing groupthink.

I cannot prove that you're wrong, but here's a bit of evidence against that assertion:

* TLCC having such a high draw ratio in completed games is relatively recent

* if it's caused by groupthink, the players must therefore be relying more on the computers (or the high draw ratio in completed games would have been there previously)

* therefore, one would expect the computers playing each other to also have high draw ratios

* we're not (yet) seeing such a high draw ratio in computers playing each other

If the high draw ratio in completed games in TLCC is actually a reflection of the fact that a blunder is required for a win in chess, and there aren't many blunders in TLCC these days, then the above problem doesn't arise.

* if it's caused by groupthink, the players must therefore be relying more on the computers (or the high draw ratio in completed games would have been there previously)

If I understand your question correctly, you are still comparing a type B search to a type B search. As I have said, you can still improve a type B search. But it is not a perfect search, and it can never rise to the level of a perfect search, even given unlimited search time.

Alayan · Post by **Alayan** » Sun Nov 15, 2020 8:40 pm

IF there are blunders THEN they can be exploited.

An imperfect opponent might not exploit such blunders all the time, but should be able to spot and exploit them some of the time, provided it is as strong or stronger and is not extremely similar in where it fails and where it succeeds.

Heavy pruning gives all engines a common weakness to very deep tactics that follow bad-looking moves, but different engines, engine versions, ICCF methods and search times will produce differences.

The number of mistakes decays exponentially with depth. If a lot of mistakes are made, then some should be exploited even if many more remain unknown. The assumption that new blunders are avoided from e.g. d1 to d50, then nothing from d51 to d150, then at d151 a lot of new blunders could be spotted, doesn't make any sense.
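To make the decay argument concrete, here is a toy model, assuming the mistakes still missed at depth d follow m0 * exp(-k * d). The starting count `m0`, the rate `k` and both function names are illustrative assumptions, not measured values.

```python
import math

def mistakes_remaining(depth: int, m0: float = 100.0, k: float = 0.1) -> float:
    """Toy model: mistakes still missed at a given depth (m0 and k are assumed)."""
    return m0 * math.exp(-k * depth)

def newly_avoided(d_from: int, d_to: int, m0: float = 100.0, k: float = 0.1) -> float:
    """Mistakes that stop being missed when the depth grows from d_from to d_to."""
    return mistakes_remaining(d_from, m0, k) - mistakes_remaining(d_to, m0, k)

if __name__ == "__main__":
    for lo, hi in ((1, 50), (51, 150), (151, 152)):
        print(f"d{lo} to d{hi}: {newly_avoided(lo, hi):.6f}")
```

In this model each extra ply catches fewer new mistakes than the ply before it, so a burst of newly visible blunders at d151 after a quiet stretch from d51 to d150 cannot occur.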

The assumption that you need to see as deep as a TB to not make a mistake is flawed. To give a simple example: I see much less deeply than Stockfish. I barely calculate a few nodes per move in a blitz game. If I reverse-analyse my games with Stockfish, I'll find a lot of inaccuracies, mistakes and blunders, but also plenty of moves that are deemed best or very close to best in non-book and non-obvious positions. General heuristics allow me to make some good choices at a much better than random rate even though I'm unable to calculate the long-term effects. In the same way, Stockfish will have plenty of blunders and misevaluated positions in its search tree, but it converges to good moves and will frequently avoid blunders because it sees a minor disadvantage in a suboptimal line while missing the major disadvantage in the optimal line.

Uri Blass · Post by **Uri Blass** » Sun Nov 15, 2020 11:58 pm

towforce wrote: ↑Sun Nov 15, 2020 7:05 pm
mwyoung wrote: ↑Sun Nov 15, 2020 5:13 pm
towforce wrote: ↑Sun Nov 15, 2020 11:48 am
mwyoung wrote: ↑Sun Nov 15, 2020 4:51 am
towforce wrote: ↑Sun Nov 15, 2020 1:53 am For this mini-thought experiment, please assume that chess is drawn (I know it's not proven yet):

* losses strongly correlate with blunders

* the deeper the search, the fewer the number of blunders

Unfortunately, not all engines measure depth in the same way. However maybe we can come up with a "reasonable guess" based on experience.

Another complicating factor: some positions would require a prohibitively deep search to uncover the blunder. In these cases, knowledge would be needed: the eval would need to be able to avoid blunders that search cannot reach. The good news regarding this is that, thanks to NNs, engines are also getting cleverer now, as well as just faster. Again, exactly how "smart" a NN is is difficult to say - but again, we can have a go.

So if chess is drawn (which I believe it is), then the time to perfect chess engines depends on the shape of the 3-dimensional chart that plots blunders against depth and knowledge.

Edit: here's a simplistic view of what the 3d graph might look like (X = depth, Y = knowledge, Z = blunders. Simple expression produces a plane. Drag with the mouse to rotate up/down/left/right to see clearly) - link.
This is where some do not understand the problem.

"The deeper the search, the fewer number of blunders."

The problem is the type B search. A type B search is fine playing scrub humans, and other scrub engines. It gives us a great approximation.

The issue is you are making billions of guesses as to what lines to cut to achieve the great search depths we see today. And you only need to be wrong once against perfect play.

And no amount of search in a type B search can ever achieve perfect play.

This is why we see the errors as shown here in this thread. And why Stockfish fails in the examples against perfect play.

You haven't addressed the issue of knowledge which I raised (see above quoted text). You appear to be saying that the 3 dimensional chart should have a long tail on the way to Z=0 (if you're willing to assume that chess is a draw without a blunder). Maybe you could come up with your own mathematical expression and redraw my chart? "A picture is worth a thousand words".

In this post Albert Silver told us that in top level correspondence chess (TLCC), wins are rare in completed games. Let's consider some candidate reasons why this might be so (my preferred choice is option 1 - that TLCC is the cutting edge, and is almost there in terms of error-free chess).

1. Chess is a draw, a win requires a blunder, and TLCC has almost eliminated blunders

2. Chess is a draw, a win requires a blunder, blunders occur in TLCC, but TLCC suffers from groupthink, and hence the players fail to find each other's blunders

3. Chess is a win, but TLCC players are not good enough to find the available wins

Which of the above 3 choices do you prefer?
"You haven't addressed the issue of knowledge which I raised"

Yes, I have, many times. And the knowledge standard you are asking for only exists in one form. As I said before, chess is a 100% tactical game...

And I will take option 4. Chess is either a win or a draw, but it does not matter, as humans are type B searchers, and the computers they are using are type B searchers, even in correspondence chess, and hence the players fail to find each other's blunders.

"Another complicating factor: some positions would require a prohibitively deep search to uncover the blunder. In these cases, knowledge would be needed: the eval would need to be able to avoid blunders that search cannot reach."

And it is the above that tells me you have no idea what you are talking about. You are just putting words together that you think make sense, but they are logically flawed. Not only do you not know the rules of chess, but you are clueless as to how a type B search works.

Suppose you had an eval that could "avoid blunders that search cannot reach."

If you had this type of evaluation, do you know what would not be needed? A search of any kind.

Here is a simple test to see if you have an evaluation that meets your standard. If your STATIC EVALUATION outputs anything other than the 3 true evaluations of chess (win, draw, loss), or is not correct 100% of the time, your evaluation is flawed.

And yes, this type of knowledge does exist in only one form, and it is called a tablebase.

You're basically right - but if a position was won, you'd want one more thing from the eval - distance to mate. If you had a choice of winning moves, your preference would be for the one that reaches mate first.

To summarise your answer as to why the draw ratio is so high in completed TLCC games: the players all use similar computers for analysis, and this is causing groupthink.

I cannot prove that you're wrong, but here's a bit of evidence against that assertion:

* TLCC having such a high draw ratio in completed games is relatively recent

* if it's caused by groupthink, the players must therefore be relying more on the computers (or the high draw ratio in completed games would have been there previously)

* therefore, one would expect the computers playing each other to also have high draw ratios

* we're not (yet) seeing such a high draw ratio in computers playing each other

If the high draw ratio in completed games in TLCC is actually a reflection of the fact that a blunder is required for a win in chess, and there aren't many blunders in TLCC these days, then the above problem doesn't arise.

TCEC stopped using the opening position for computer-computer games.

I suspect that we would also see at least 95% draws in a match between Stockfish and Lc0, or Stockfish and Dragon, from the opening position under TCEC conditions but without forced wrong openings (let the engines choose their own moves).

Do you have a proof that my opinion is wrong?

towforce · Post by **towforce** » Mon Nov 16, 2020 12:31 am

Uri Blass wrote: ↑Sun Nov 15, 2020 11:58 pm TCEC stopped using the opening position for computer-computer games.

I suspect that we would also see at least 95% draws in a match between Stockfish and Lc0, or Stockfish and Dragon, from the opening position under TCEC conditions but without forced wrong openings (let the engines choose their own moves).

Do you have a proof that my opinion is wrong?

Not me. I respect your opinion and consider myself to have just been educated!

In light of this new (to me) information, here's my quick reassessment of the thread title:

* we are nowhere near "perfect chess" (defined as 32 piece tablebase equivalence)

* at long time controls, we are quite close to "death by draw"

* Top level correspondence chess is even closer to death by draw

* it likely won't be long until pure computer chess reaches the current standard of top level correspondence chess

From this, I make the following speculation:

* without a blunder, chess is drawn

* the rate at which computers make blunders at long time controls is approaching zero

* hence, computers are close to being unbeatable, and top level correspondence chess is at the cutting edge of this

* we don't know when chess will be proven to be a draw, or how many people will be highly motivated to prove that chess is a draw

* my opinion is that if several people were highly motivated, and were able to spend the time, it could be done with today's technology without having to calculate the full game tree, and the task will become steadily easier as time passes

* however, when death by draw comes, unfortunately it is likely to have the effect of dramatically reducing interest in computer chess. The chess computer will be "nothing more" (ha!) than the "truth machine"

Uri Blass · Post by **Uri Blass** » Mon Nov 16, 2020 2:09 am

towforce wrote: ↑Mon Nov 16, 2020 12:31 am
Uri Blass wrote: ↑Sun Nov 15, 2020 11:58 pm TCEC stopped using the opening position for computer-computer games.

I suspect that we would also see at least 95% draws in a match between Stockfish and Lc0, or Stockfish and Dragon, from the opening position under TCEC conditions but without forced wrong openings (let the engines choose their own moves).

Do you have a proof that my opinion is wrong?

Not me. I respect your opinion and consider myself to have just been educated!

In light of this new (to me) information, here's my quick reassessment of the thread title:

* we are nowhere near "perfect chess" (defined as 32 piece tablebase equivalence)

* at long time controls, we are quite close to "death by draw"

* Top level correspondence chess is even closer to death by draw

* it likely won't be long until pure computer chess reaches the current standard of top level correspondence chess

From this, I make the following speculation:

* without a blunder, chess is drawn

* the rate at which computers make blunders at long time controls is approaching zero

* hence, computers are close to being unbeatable, and top level correspondence chess is at the cutting edge of this

* we don't know when chess will be proven to be a draw, or how many people will be highly motivated to prove that chess is a draw

* my opinion is that if several people were highly motivated, and were able to spend the time, it could be done with today's technology without having to calculate the full game tree, and the task will become steadily easier as time passes

* however, when death by draw comes, unfortunately it is likely to have the effect of dramatically reducing interest in computer chess. The chess computer will be "nothing more" (ha!) than the "truth machine"

Reading back what I wrote, I see that I was not accurate, but my main point is the same. To be more precise, I meant to say that TCEC started to use unbalanced positions in the superfinal, instead of balanced openings or no opening, because balanced openings lead to too many draws.

Personally, I would like to see good books against no book in TCEC conditions, to see if a good book can help to win games at top level against no book.

Note that I am not sure if computers are close to being unbeatable at long time control, but it is possible that it is the case. Alternatively, you may need to take advantage of their specific weaknesses to win, and a normal perfect player who does not know about those weaknesses is going to draw most games against them.

towforce · Post by **towforce** » Mon Nov 16, 2020 10:16 am

Uri Blass wrote: ↑Mon Nov 16, 2020 2:09 am Note that I am not sure if computers are close to being unbeatable at long time control, but it is possible that it is the case...

Consider two possibilities:

1. The "blunder" model is correct: chess is a draw, and a game can only be won if there's a blunder. Chess computers (and top level correspondence players) are making fewer blunders, so the draw ratio is now high

2. Any other model

Model 1 cleanly explains everything we see. Any other model would require something very unexpected to be true to explain the facts.

So the blunder model is not proven - but it fits the known information very well. Any other model would require something highly unexpected, and which has never before been seen in any other old turn-based game.

mwyoung · Post by **mwyoung** » Mon Nov 16, 2020 10:30 am

towforce wrote: ↑Mon Nov 16, 2020 10:16 am
Uri Blass wrote: ↑Mon Nov 16, 2020 2:09 am Note that I am not sure if computers are close to being unbeatable at long time control, but it is possible that it is the case...

Consider two possibilities:

1. The "blunder" model is correct: chess is a draw, and a game can only be won if there's a blunder. Chess computers (and top level correspondence players) are making fewer blunders, so the draw ratio is now high

2. Any other model

Model 1 cleanly explains everything we see. Any other model would require something very unexpected to be true to explain the facts.

So the blunder model is not proven - but it fits the known information very well. Any other model would require something highly unexpected, and which has never before been seen in any other old turn-based game.

I am sorry but model 1 explains everything you see to your understanding.

And this is because you assume too much.

I can replicate exactly what you see with fast time controls and equal engines. And have many matches showing this result. Does this mean chess engines also play near-perfect chess at very fast TC and on much weaker hardware than TCEC, or just that two strong and equal engines cannot best each other? Hmmm.

If this is the case, I do not expect you to update your engines or computers for better chess engine performance. Just like Uri... Enjoy!

towforce · Post by **towforce** » Mon Nov 16, 2020 10:48 am

mwyoung wrote: ↑Mon Nov 16, 2020 10:30 am I can replicate exactly what you see with fast time controls and equal engines. And have many matches showing this result.

Are you sure about this? Remember that TCEC deliberately imbalances openings to reduce the probability of drawn games. Are you saying that chess computers with state-of-the-art hardware and software are getting 95% draw rates at blitz time controls? If so, which matches produced this result?
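For scale, a draw rate can be tied to an Elo gap with the standard logistic Elo expectation. The sketch below assumes the stronger side wins every decisive game, which gives an upper bound on the gap for a given draw rate. The function names are illustrative.

```python
import math

def expected_score(elo_diff: float) -> float:
    """Standard logistic Elo expectation for the higher-rated side."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_diff_from_score(score: float) -> float:
    """Inverse of expected_score, valid for 0 < score < 1."""
    return 400.0 * math.log10(score / (1.0 - score))

def max_score_with_draw_rate(draw_rate: float) -> float:
    """Best possible score if every non-drawn game is won by the same side."""
    return (1.0 - draw_rate) + draw_rate / 2.0

if __name__ == "__main__":
    # Largest Elo gap compatible with a 95% draw rate under this assumption.
    print(elo_diff_from_score(max_score_with_draw_rate(0.95)))
    # Expected score that corresponds to the 200 Elo gap in the thread title.
    print(expected_score(200))
```

Under that assumption a 95% draw rate caps the gap at roughly 17 Elo, while a 200 Elo gap corresponds to an expected score of about 0.76, i.e. a draw rate of about 48% if the stronger side never loses.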

mwyoung · Post by **mwyoung** » Mon Nov 16, 2020 10:49 am

towforce wrote: ↑Mon Nov 16, 2020 10:48 am
mwyoung wrote: ↑Mon Nov 16, 2020 10:30 am I can replicate exactly what you see with fast time controls and equal engines. And have many matches showing this result.

Are you sure about this? Remember that TCEC deliberately imbalances openings to reduce the probability of drawn games. Are you saying that chess computers with state-of-the-art hardware and software are getting 95% draw rates at blitz time controls? If so, which matches produced this result?

Yes! You need to see my Lc0 vs Allie match. It was a draw fest!

"You're playing not to lose, Josh" Searching for Bobby Fischer.

towforce · Post by **towforce** » Mon Nov 16, 2020 10:53 am

mwyoung wrote: ↑Mon Nov 16, 2020 10:49 am
towforce wrote: ↑Mon Nov 16, 2020 10:48 am
mwyoung wrote: ↑Mon Nov 16, 2020 10:30 am I can replicate exactly what you see with fast time controls and equal engines. And have many matches showing this result.

Are you sure about this? Remember that TCEC deliberately imbalances openings to reduce the probability of drawn games. Are you saying that chess computers with state-of-the-art hardware and software are getting 95% draw rates at blitz time controls? If so, which matches produced this result?
Yes

In that case, an important new test is needed: top computers at blitz time control (ponder off) v top computers at long time control (ponder on).
