hgm wrote: Sun Jul 11, 2021 10:21 am
Perhaps the misunderstanding is about what you mean by 'strength of a move' expressed in Elo. I imagined you meant that the move would typically be played by a player with a certain rating. For example, in a given situation GMs would typically prefer move A, but patzers would prefer another move, because the merits of move A are beyond their grasp. My point was that this cannot be decided from the evaluation drop.
By strength of a move, I mean the rating a player is likely to have, given that they played that move. For example, 2.Qh5, going for Scholar's Mate, is likely to be played by, say, a 400 Elo player, while a GM might play 2.Nf3 instead. A move that loses a queen in a way that is very hard to see is likely to be played by a GM, but not by a 3,500-rated engine. So the "losing" move would have an Elo strength of 3,500.
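One way to make this definition concrete, offered as a sketch rather than as the poster's method, is Bayes' rule: given how often players in each rating band choose the move in a position, P(move | band), and how common each band is, P(band), the expected rating of whoever played the move is E[rating | move]. The function name and its inputs are hypothetical; real frequencies would have to come from a game database.

```python
from typing import Dict


def likely_rating(move_freq_by_band: Dict[int, float],
                  band_prior: Dict[int, float]) -> float:
    """Expected rating of a player, given that they played a particular move.

    Hypothetical helper, not from the discussion.
    move_freq_by_band: P(move | band), e.g. the fraction of games in which
        players of each rating band chose this move in this position.
    band_prior: P(band), how common each rating band is among players.
    Returns E[rating | move] using Bayes' rule.
    """
    # Unnormalised posterior weight of each band: P(move | band) * P(band).
    joint = {band: freq * band_prior.get(band, 0.0)
             for band, freq in move_freq_by_band.items()}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("move never observed in any band with nonzero prior")
    # Posterior-weighted average of the band ratings.
    return sum(band * weight for band, weight in joint.items()) / total
```

A move that only low-rated players ever choose pulls the estimate towards their band, which matches the 2.Qh5 example; a move favoured by strong players pulls it upwards.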
hgm wrote: Sun Jul 11, 2021 10:21 am
A wrong move that eventually leads to a forced loss of the Queen in a complex way might initially look good, or even best, to the engine, until it reaches the search depth that brings the loss within the horizon. Then the score will drop dramatically. So which score are you going to take to determine the Elo of that move?
You take the Elo of that move based on the search depth or time you have allotted. We don't say we cannot evaluate a position just because we can't calculate every variation to the end of the game. Engines evaluate positions, as do humans. The evaluations may be wrong because they are not "perfect", but we still make them and use them. Should we stop showing engine evaluations because they may be inaccurate due to the horizon effect and limited search breadth? Should engines not use an evaluation function because a 20-ply evaluation might be proven wrong by a 30-ply search?
Or look at it this way. Suppose an engine that looks 10 ply ahead has a rating of, say, 2,000, and one that looks 20 ply ahead has a rating of 3,000. It takes 15 ply to realise that the move you played was not the best one. If the evaluation used a 10-ply search, you would give the move a score of 2,000. If it used a 20-ply search, it would rate the move as poor, and the move's score would come from a formula factoring in the 3,000 rating and how poor the move was.
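The post leaves that formula unspecified. Below is a minimal sketch of one possible version. It assumes a linear depth-to-Elo mapping fitted to the two example figures (10 ply gives 2,000, 20 ply gives 3,000) and a hypothetical penalty of `elo_per_cp` Elo per centipawn of evaluation lost relative to the engine's best move at that depth. Neither assumption comes from the discussion, and both function names are illustrative.

```python
def engine_elo_at_depth(depth_ply: int) -> float:
    """Hypothetical linear mapping fitted to the example figures:
    10 ply -> 2,000 Elo, 20 ply -> 3,000 Elo (100 Elo per ply)."""
    return 2000.0 + (depth_ply - 10) * 100.0


def move_elo(depth_ply: int, eval_loss_cp: float,
             elo_per_cp: float = 1.0, min_elo: float = 0.0) -> float:
    """Elo attributed to a move judged by a search of `depth_ply` plies.

    eval_loss_cp: centipawns the move loses compared with the engine's best
        move at that depth (0 if the engine considers the move best).
    elo_per_cp: assumed penalty weight, not taken from the post.
    min_elo: floor so that very bad moves do not get a negative rating.
    """
    base = engine_elo_at_depth(depth_ply)
    # Moves the engine rates as equal to or better than its choice get no penalty.
    penalty = elo_per_cp * max(0.0, eval_loss_cp)
    return max(min_elo, base - penalty)
```

Under these assumptions, the trap move scores 2,000 when judged at 10 ply, where the engine still thinks it is best. At 20 ply it scores 3,000 − 900 = 2,100 if the lost Queen is counted as roughly 900 centipawns, a common convention. The answer to "which score do you take" is simply whichever depth you decided to allot.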