zullil wrote: ↑Thu Feb 06, 2020 5:13 pm
Apparently only one move wins. Good luck to all the centaurs. And to all the engines without endgame tables.
For sure. There must also be (or I hope there are) simpler examples, where "simpler" probably means some combination of fewer pieces and smaller DTM (i.e. not ~500 or ~1000).
Yes, I simply chose a "random" example. And imagine how many such positions might exist. We could redefine chess to have certain 8-man initial positions and no one would be able to give the theoretical result of the game. Forget about deciding about 1. g4.
Last edited by zullil on Thu Feb 06, 2020 5:38 pm, edited 1 time in total.
The question is still if those examples are relevant, because the winning player can just tag the position as drawn (who cares if it wins if I can't find the win?) and go for a different won position that is clear.
The job of the winning player is not to maximize the engine's eval or play the fastest way to mate, their job is to play into positions that are the easiest to win, and "only one move wins" positions would be hard to win by definition, and could be just tagged as draws and avoided.
Ovyron wrote: ↑Thu Feb 06, 2020 5:38 pm
The question is still if those examples are relevant, because the winning player can just tag the position as drawn (who cares if it wins if I can't find the win?) and go for a different won position that is clear.
The job of the winning player is not to maximize the engine's eval or play the fastest way to mate, their job is to play into positions that are the easiest to win, and "only one move wins" positions would be hard to win by definition, and could be just tagged as draws and avoided.
And how will you even recognize such positions? Say those with 11 men? Or 19?
Ovyron wrote: ↑Thu Feb 06, 2020 5:38 pm
The job of the winning player is not to maximize the engine's eval or play the fastest way to mate...
We're not talking about the fastest way to mate (which even 7-man TBs won't help with). We're talking about the one lonely move that does not throw away the win, and we don't know how many or which positions those are with >7 men.
Ovyron wrote: ↑Thu Feb 06, 2020 5:38 pm
... "only one move wins" positions ... could be just tagged as draws and avoided.
Even if you had this magic tagging power (the only known source of this magic power is... yep... tablebases!), you cannot avoid a position if it is the current position or the start position!
Last edited by jp on Thu Feb 06, 2020 5:51 pm, edited 1 time in total.
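To make the "one lonely move" idea concrete, here is a minimal sketch of how such a position could be recognized once win/draw/loss (WDL) values are available for every position reachable in one move. The function names, the -1/0/+1 convention and the sample table are assumptions made for illustration; the values are invented and do not come from any real tablebase probe.

```python
from typing import Dict, List

# Assumed convention: each value is the WDL result for the side to move
# AFTER the candidate move has been played (the opponent's point of view):
# -1 = loss, 0 = draw, +1 = win.

def winning_moves(wdl_after_move: Dict[str, int]) -> List[str]:
    """Return the moves that keep the win for the player making the move."""
    # A move wins for us exactly when the opponent, now to move, is lost.
    return sorted(m for m, wdl in wdl_after_move.items() if wdl == -1)


def is_only_move_position(wdl_after_move: Dict[str, int]) -> bool:
    """True if exactly one legal move preserves the win."""
    return len(winning_moves(wdl_after_move)) == 1


# Invented table for illustration only (hypothetical move names).
example = {"move_a": -1, "move_b": 0, "move_c": 0, "move_d": 1}
print(winning_moves(example))          # ['move_a']
print(is_only_move_position(example))  # True
```

The sketch also shows why the tagging argument is circular: the input table is exactly what a tablebase provides, and without it there is nothing to feed into the check.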
zullil wrote: ↑Thu Feb 06, 2020 4:51 pm
Right. Is there a handy example of a 7-man position that is a theoretical draw for the side to move, but for which only one or two non-obvious moves hold the draw?
I'm interested in finding such positions too, e.g. the simplest possible endgame positions that are too hard for computers alone or even centaurs. They'd probably need to be at least 5-man, I guess.
Position (FEN): 6N1/3n4/3k1b2/8/1r6/5K1Q/8/8 w - - 0 1
Apparently only one move wins. Good luck to all the centaurs. And to all the engines without endgame tables.
Stockfish (with 6-man tables) has Qf5 with eval +0.38 at depth 54. Wrong move, and eval is off by infinity!
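A small sketch of why 6-man tables cannot settle this position at the root: a table set for N men only covers positions with at most N pieces on the board, and the position above has 7. The helper names `count_men` and `root_covered` are made up for this illustration.

```python
def count_men(fen: str) -> int:
    """Count all pieces (kings included) in the placement field of a FEN."""
    placement = fen.split()[0]
    # Letters are pieces; digits are runs of empty squares; '/' separates ranks.
    return sum(1 for ch in placement if ch.isalpha())


def root_covered(fen: str, max_men: int) -> bool:
    """True if a tablebase set for up to max_men pieces covers this position."""
    return count_men(fen) <= max_men


fen = "6N1/3n4/3k1b2/8/1r6/5K1Q/8/8 w - - 0 1"
print(count_men(fen))        # 7
print(root_covered(fen, 6))  # False
print(root_covered(fen, 7))  # True
```

With 6-man tables the engine only gets exact answers in lines where a capture has already reduced the material, so its root evaluation still rests on ordinary search.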
zullil wrote: ↑Thu Feb 06, 2020 5:51 pm
Stockfish (with 6-man tables) has Qf5 with eval +0.38 at depth 54. Wrong move, and eval is off by infinity!
Let it keep running, so its humiliation will be complete.
But really (and this ties to my previous posts) the greatest humiliation is once the depths start being comparable to the DTM and it still has no clue, which is why I'd like short examples. Unfortunately, we're not going to get SF to depth 500 or 1000 at this time.
zullil wrote: ↑Thu Feb 06, 2020 5:51 pm
Stockfish (with 6-man tables) has Qf5 with eval +0.38 at depth 54. Wrong move, and eval is off by infinity!
Let it keep running, so its humiliation will be complete.
But really (and this ties to my previous posts) the greatest humiliation is once the depths start being comparable to the DTM and it still has no clue, which is why I'd like short examples. Unfortunately, we're not going to get SF to depth 500 or 1000 at this time.
Cfish has done better. It has the right move, though its current evaluation is not at all convincing. Of course, Cfish is currently using 6-man tables. I should disable those.
zullil wrote: ↑Thu Feb 06, 2020 6:09 pm
Cfish has done better. It has the right move, though its current evaluation is not at all convincing. Of course, Cfish is currently using 6-man tables. I should disable those.
+0.72 1. Kg2 ... 35. Kc2 (depth 54, 0:30:43)
Is this whole line correct, or just the first move? How many plies in the correct solution before conversion to 6 men?
zullil wrote: ↑Thu Feb 06, 2020 1:33 pm
Without 7-man endgame tables, Stockfish-dev's (static) evaluation of this position is +4.67. But it's a draw. How many similar positions are there, say with eight or nine men, that Stockfish totally misevaluates?
Well yeah, that's why I was saying we can get a probability, not a certainty. It wouldn't be "solved" but it'd be something like 99.9% sure it's losing. The chances of such positions occurring in real games are very, very low. BTW, LC0 without EGTBs evaluates it as around 0.2 right away.
I wonder if there is a larger or a smaller percentage of such positions going up to 8, 9, 10-piece EGTBs. I'm guessing smaller. We could take random 5-piece and 6-piece EGTB positions to see where programs without EGTBs misevaluate compared to EGTBs and it would probably be the same ratio as when going up to more pieces.
zullil wrote: ↑Thu Feb 06, 2020 1:33 pm
Without 7-man endgame tables, Stockfish-dev's (static) evaluation of this position is +4.67. But it's a draw.
Well yeah, that's why I was saying we can get a probability, not a certainty. It wouldn't be "solved" but it'd be something like 99.9% sure it's losing. The chances of such positions occurring in real games are very, very low. BTW, LC0 without EGTBs evaluates it as around 0.2 right away.
I disagree. The position there wasn't even remotely unusual. There must be millions like that one with a fortress.
Has Leela's endgame play improved? Otherwise any good (difficult) endgame evals by it could be claimed to be largely a fluke.
mmt wrote: ↑Thu Feb 06, 2020 7:07 pm
I wonder if there is a larger or a smaller percentage of such positions going up to 8, 9, 10-piece EGTBs.
We could take random 5-piece and 6-piece EGTB positions to see where programs without EGTBs misevaluate
This would be interesting. What's a good way to do it?
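One possible shape for the bookkeeping side of that experiment, as a rough sketch only: given pairs of (engine eval without EGTBs, true tablebase result), classify each eval as win/draw/loss with a cutoff and report the share that disagrees with the tablebase. The 150-centipawn cutoff, the function names and the sample pairs are all assumptions chosen for illustration, not measured data. In practice the pairs would have to come from an engine run with tablebases disabled plus a tablebase probe (for instance via the python-chess library's `chess.engine` and `chess.syzygy` modules); that part is left out here.

```python
import math
from typing import Iterable, Tuple


def predicted_wdl(eval_cp: int, threshold_cp: int = 150) -> int:
    """Map a centipawn eval (side to move's view) to -1/0/+1.

    threshold_cp is an assumed cutoff, not an established standard.
    """
    if eval_cp >= threshold_cp:
        return 1
    if eval_cp <= -threshold_cp:
        return -1
    return 0


def misevaluation_rate(
    samples: Iterable[Tuple[int, int]], threshold_cp: int = 150
) -> Tuple[float, float]:
    """Return (rate, standard error) of evals that contradict the true WDL."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")
    wrong = sum(
        1 for eval_cp, true_wdl in samples
        if predicted_wdl(eval_cp, threshold_cp) != true_wdl
    )
    n = len(samples)
    p = wrong / n
    # Binomial standard error; only meaningful for reasonably large n.
    return p, math.sqrt(p * (1 - p) / n)


# Invented sample: (eval in centipawns, true WDL). The first pair mimics
# the "+4.67 but drawn" kind of case discussed in this thread.
sample = [(467, 0), (20, 0), (300, 1), (-10, 0)]
rate, stderr = misevaluation_rate(sample)
print(rate)  # 0.25
```

Running the same tally separately on 5-man and 6-man samples would give two rates to compare, which is the trend one would then try to extrapolate to 8 or more men. Whether that extrapolation holds is exactly the open question.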