This is all true. Another problem worth pointing out is the counterintuitive phenomenon that a new evaluation term which is correct in most or even almost all positions can sometimes reduce the playing strength. Many times I have identified some piece of missing evaluation knowledge, implemented it as well as I could, and seen the playing strength drop by 10 or 15 Elo points. The most frustrating part is that the engine often looks like it is playing better with the new piece of knowledge, and I can easily identify games where the new knowledge helps it win. Nevertheless, when a large number of games are played, the statistics show that the program played better without the new knowledge. Sometimes tuning the weights can help, but not always.

mcostalba wrote:
Uri Blass wrote: I disagree with you, and I believe that strong players can help to improve Stockfish's evaluation based on watching Stockfish's games.

Here is my opinion regarding this point.
The implementation of any evaluation idea in a chess engine is always the combined effect of two contributions:
1 - The actual idea, for instance a new way to evaluate pawn structure or a new way to evaluate passed pawns.
2 - The tuning of the coefficients that _weight_ the idea among the other evaluation terms.
IMHO a strong player could be effective regarding point one, i.e. proposing an interesting idea. The next step is to make the idea work (because normally even a good idea does _not_ work on the first try) by finding the right coefficients bound to the idea in the evaluation code, and that is a task for testing with real games, possibly without human intervention.
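To make the split between the two contributions a bit more concrete, here is a minimal sketch. It is not actual Stockfish code; the term, the names (PassedPawnWeights, passedPawnScore) and the numbers are all hypothetical and only illustrate how an idea is usually kept separate from the coefficients that weight it.

Code:

#include <array>

// Hypothetical coefficients, in centipawns. The starting values are guesses;
// in practice they would be adjusted by playing large numbers of games
// (or by an automated tuner), not by hand.
struct PassedPawnWeights {
    std::array<int, 8> bonusByRank { 0, 5, 10, 20, 40, 70, 120, 0 };
    int supportedMultiplierPercent = 125;   // extra credit if the pawn is defended
};

// The "idea" itself: score one passed pawn given its rank and whether it is
// supported. The shape of this function rarely changes during tuning;
// only the numbers inside 'w' do.
int passedPawnScore(const PassedPawnWeights& w, int rank, bool supported) {
    int score = w.bonusByRank[rank];
    if (supported)
        score = score * w.supportedMultiplierPercent / 100;
    return score;
}

The structure of passedPawnScore() is the idea (point one); the contents of PassedPawnWeights are what gets tuned by game testing (point two).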
The wrong thing to do, IMHO, is to expect a human to be effective at both the first and the second points.
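A side note on the point made at the top about needing a large number of games: a 10 or 15 Elo difference corresponds to only a percent or two in match score, so the statistical noise of a short match completely hides it. Here is a rough sketch (the match result in it is made up) of how the Elo difference and its error margin can be estimated from wins, draws and losses.

Code:

#include <cmath>
#include <cstdio>

int main() {
    // Hypothetical match result for "new knowledge" vs. "old": wins/draws/losses.
    double wins = 430, draws = 1180, losses = 390;
    double n = wins + draws + losses;

    double score = (wins + 0.5 * draws) / n;                        // mean score per game
    double var   = (wins * 1.0 + draws * 0.25) / n - score * score; // per-game variance
    double se    = std::sqrt(var / n);                              // standard error of the mean

    // Convert an expected score into an Elo difference.
    auto toElo = [](double s) { return -400.0 * std::log10(1.0 / s - 1.0); };

    std::printf("Elo: %+.1f  (95%% interval roughly %+.1f .. %+.1f over %.0f games)\n",
                toElo(score), toElo(score - 1.96 * se), toElo(score + 1.96 * se), n);
    return 0;
}

For this made-up result the 95% interval is still roughly 20 Elo wide even after 2000 games, which is why impressions drawn from watching a handful of games are not reliable evidence that a new term helps.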