BrendanJNorman wrote: ↑Mon Nov 09, 2020 8:03 pm
Alayan wrote: ↑Mon Nov 09, 2020 6:41 pm
You NEED a way to evaluate which positions are more difficult to play than others. This is fundamental.
What could possibly be 'difficult' for an endgame tablebase?
But that's the point !
The tablebase has NO CLUE what is the difference between a difficult to hold draw and an easy to hold draw, because for the TB it's all the same. It's all easy.
But for the imperfect opponent you're trying to win against, there IS a huge difference.
So to beat the imperfect opponent, you need a way to model which positions are difficult for the imperfect opponent and try to create as much difficulty as possible for it.
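To make the idea concrete, here is a minimal sketch of what "opponent modeling on top of a tablebase" could look like. Everything in it is hypothetical: the `Candidate` type, the move labels, and above all the `opp_error_prob` numbers, which stand in for whatever model of the imperfect opponent you manage to build. The tablebase only supplies the win/draw/loss value.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    move: str              # placeholder label, not a real chess move
    tb_value: int          # tablebase result for the side to move: 1 win, 0 draw, -1 loss
    opp_error_prob: float  # ASSUMED estimate that the imperfect opponent goes wrong next


def pick_move(candidates):
    """Never give up tablebase value; among equal moves, maximize opponent difficulty."""
    best_value = max(c.tb_value for c in candidates)
    equal = [c for c in candidates if c.tb_value == best_value]
    return max(equal, key=lambda c: c.opp_error_prob)


# Illustrative numbers only: two moves that both hold the draw, one that loses.
candidates = [
    Candidate("easy_draw", 0, 0.01),
    Candidate("tricky_draw", 0, 0.20),
    Candidate("losing_trap", -1, 0.90),
]

chosen = pick_move(candidates)
print(chosen.move)
```

For the bare tablebase, `easy_draw` and `tricky_draw` are indistinguishable. The extra field is the only thing that separates them, and the hard part is getting a good estimate for it.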
BrendanJNorman wrote: ↑Mon Nov 09, 2020 8:03 pm
Consider the following position and hypothetical scenario.
Stockfish as black has just allowed white to enter this line where he can promote a pawn.
White has just played: a8=Q
[d]QN4n1/6r1/3k4/8/b2K4/8/8/8 b - -
Stockfish thinks he is completely okay.
But Stockfish is deluded. He simply hasn't seen the forced mate in 545 - which the TB sees instantly.
And this is only a SEVEN man position. Stockfish's failures will increase exponentially as more pieces are added, while the TB remains perfect.
Chess is played with the 50-move rule. Weird pawnless positions with ultra-long shuffling have very little practical relevance: they are virtually nonexistent in practical play, and you can't force them to happen at no cost.
In chess, if your opponent doesn't make mistakes, creating imbalances requires making your own position worse (it would score worse between imperfect opponents of similar strength). This doesn't help overcome the drawing margin.
No, it's not. Stockfish + 6-men from 1.5 years ago could play 99.999% of 7-men positions perfectly in TCEC endgame conditions. Forcing a position onto the board where it would go wrong is not a trivial task.
BrendanJNorman wrote: ↑Mon Nov 09, 2020 8:03 pm
Your argument contains false logic. There is an incalculably enormous difference between a 32 man TB and whatever "everything allowed" method you are proposing someone challenge you with.
The 32-men TB will get far fewer positions wrong than what I'm suggesting to be challenged with, sure.
But your argument about SF being very far from the TB relies on the assumption that SF is making a lot of inaccuracies/mistakes. If it makes few of them, then it's close to the TB.
If Stockfish is so inaccurate, then massively skewing the odds should create plenty of opportunities to take advantage of those.
A few years ago, a doubling of TC on big hardware/TC was still estimated to bring 30-50 elo.
With his hardware, Mwyoung could easily have a 10-doubling advantage over my SF, plus whatever search enhancements and auxiliary engines like Leela he wants to use. By the old rules of thumb, this should mean crushing my SF easily.
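For reference, here is the back-of-the-envelope arithmetic behind that rule of thumb as a short sketch. It assumes the per-doubling gain simply adds up over 10 doublings (which is exactly the assumption in question) and uses the standard logistic Elo formula for the expected score. The function and variable names are mine.

```python
def expected_score(elo_diff):
    """Standard logistic Elo expectation for the stronger side."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


doublings = 10
resource_ratio = 2 ** doublings  # 10 doublings is roughly 1000x the resources

# Old rule of thumb: 30 to 50 Elo per doubling, ASSUMED here to stack linearly.
for gain_per_doubling in (30, 50):
    elo_diff = doublings * gain_per_doubling
    score = expected_score(elo_diff)
    print(f"{gain_per_doubling} Elo/doubling -> +{elo_diff} Elo, "
          f"expected score {score:.3f} at {resource_ratio}x resources")
```

That works out to an edge of 300 to 500 Elo, i.e. an expected score of roughly 85% to 95% for the side with more resources, if the old estimates still held.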
BrendanJNorman wrote: ↑Mon Nov 09, 2020 8:03 pm
Firstly, the "everything allowed" has only engines weaker than stockfish and opening books (made from the games of engines weaker than stockfish or equal), plus 7-man tablebases (max) - how on earth is that a good test for how SF will handle a 32 man TB as an "opponent"?
You don't get it.
There is no "only engines weaker" clause.
My opponent can use Stockfish itself. He can run any version of it he wants, including the very same version I'm using, to check whether it blunders in different positions and to steer the game towards positions where it does.
Using Leela or others, even if they are weaker than SF overall, can help to find blindspots and different move suggestions.
Opening books are much more useful than you make them out to be; there is a reason ICCF players and playchess engine room users maintain their own.
BrendanJNorman wrote: ↑Mon Nov 09, 2020 8:03 pm
Your logic says: "Let's test how good a seabass is at swimming, by placing prime Phelps and Thorpe in a swimming race in perfect conditions. Thorpe can have a 10-yard headstart. If Phelps still wins, it means he is faster than a seabass".
It seems like you are arguing from a false premise.
That is: "If Stockfish has a low error rate against current bleeding edge opponents - this is proof that Stockfish could draw with a 32 man tablebase"
No. Stockfish drawing in this experiment would not be proof that it could draw against any possible opponent. I have never claimed that; do not put words in my mouth.
I said it would be an indication of how far we are from perfect play, and I stand by this claim. If 1000x the resources fails to achieve a win, then thinking current SF is thousands of elo away from perfect play is dubious: if it made that many mistakes, an opponent that is better all around should manage to take advantage. If it does win, that doesn't tell us how close we are to the ceiling, but it demonstrates convincingly that current SF in conditions similar to mine would be crushed by TB32 + opponent modeling, and that the original poster of this thread is just wrong in thinking SF might be 200 elo off the TB32.
Of course, a single game is a tiny sample, but that's a start.
BrendanJNorman wrote: ↑Mon Nov 09, 2020 8:03 pm
The thing is, computer chess has a revolution every 8-10 years (estimate) where we see Rybka arrive, then the open-source revolution that birthed Stockfish, then the AlphaZero/Leela craze and now NNUE.
At almost every revolutionary moment, we had people kind of inferring that these engines were now approaching "perfect" chess - despite the fact that we are, as yet, only at 7 man tablebases.
Truth is, we are still FAAAAAR from perfect chess, and perfect chess would be unrecognizable.
We won't see perfect chess, perhaps for another 50 or 100 years.
Try understanding the moves (even with SF's help) of the solution of that mate in 545 above.
And that's only with 7 men on the board. Slightly less than 22% of the entire set - hardly even a glimpse at what perfect chess will look like.
I agree with you that since the Rybka days, there have always been people who thought the newest engine was right next to perfect play, only to be proven wrong when new engines came out and beat the old version by dozens of elo. You're right that this thread's OP could look silly in a few years if we get an engine beating SF12 by 200 elo at TCEC conditions.
However, running these old "perfect" engines at 1:100 odds vs themselves (with the odds giver at a high TC) would give you plenty of wins, and correspondence players could reliably outmaneuver and beat engine slaves.
These engines made mistakes, and it was possible to show it even without access to engines of the future.