lkaufman wrote: Bob, here's a question inspired by one of your above comments. You indicated that when Crafty was playing another engine, playing a gambit like Evans was a bad idea, basically just losing a pawn. This implies that it takes more comp to offset a pawn in engine vs. engine play than in human vs. human play. However, your scores for mobility and/or piece location must be very high in Crafty, since (for example) after 1. e4 c5 the position is already evaluated as nearly winning for White. Presumably these very high scores for dynamic factors (relative to material) scored well in your testing. So how do you reconcile these apparently contradictory facts? How can it be right to score unclear gambits as sound if the engine usually loses when playing them?
I think the idea is split into two parts.
(1) Giving up a pawn. We have a lot of evaluation terms that can cause Crafty to sac a pawn, and when we get to king safety, even a piece. But any of these can blow up. Nothing is worse than seeing the eval say +0.1 when looking at just the material shows -1.00, and then watching the eval go +0.1, -0.05, -0.2, -0.4, ... and slowly settle at -1.0. Why? I believe it is because it is really difficult, in a program, to weigh different "advantages" against each other consistently. Some are fairly permanent (weak pawns, though not always), while some are transient (a knight temporarily trapped on a1, for example, which can still escape). As a human, I automatically sort these things into three baskets of advantages: (a) permanent, say a knight on e5 with no enemy pawn on the d- or f-file to chase it away, and no enemy knight or bishop that can trade itself for that knight, so it is going to be around for a while; (b) fleeting, things like an isolated pawn that can soon be traded away; and (c) unclear, things that are hard to categorize as either of the first two. I tend to look at the unclear ones more closely to try to resolve the issue so that everything ends up in basket (a) or (b).
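As a toy illustration of that eval drift (made-up numbers, nothing from Crafty's actual terms): the material deficit is permanent, while the compensation in this position turns out to be all transient, decaying as the opponent untangles.

```python
# Toy model (hypothetical numbers, not Crafty's actual terms) of how an
# eval can drift from +0.1 down to -1.0: the pawn deficit is permanent,
# while the "compensation" decays as the opponent untangles.

MATERIAL = -1.00        # a pawn down: permanent
PERMANENT_COMP = 0.00   # in this position, no basket-(a) compensation survived
TRANSIENT_COMP = 1.10   # a temporary bind: basket (b)/(c), decays over time

def eval_after(moves_played, decay=0.5):
    """Score after the opponent has had some moves to untangle."""
    transient = TRANSIENT_COMP * (decay ** moves_played)
    return MATERIAL + PERMANENT_COMP + transient

# starts near +0.1, then slides move by move toward the bare material score
print([round(eval_after(n), 2) for n in range(6)])
```

With a genuine basket-(a) advantage the score would settle somewhere above the raw material count; here it settles at -1.0 because everything that looked like compensation evaporated.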
Computer programs seem to have a tough time with this concept. I have seen major positional advantages evaporate over time. If you give up a pawn for such temporary advantages, things go bad.
(2) Not giving up a pawn, but doing something else to get a positional advantage, such as giving up the bishop pair in return for lasting pawn weaknesses. Assuming, that is, that they actually last and that you can exploit them.
From watching a lot of games, I generally feel better being material up and defending, rather than being material down and having the onus on me (my program, actually) to prove that it has sufficient compensation for the material.
Computers are bad about making a "lemon move" here or there when it becomes difficult to improve their positions further. And often, one "passing move" is enough to let the opponent begin to untangle whatever it is that got tangled when it took the pawn. And the bind unravels, and you are left with a missing pawn to deal with.
For years I followed the "Ken Thompson" mantra that said "nothing is worth a pawn." But after 15+ years of playing against GM players on ICC, I discovered that was not the smartest approach.
Nowadays we may well have gone too far with certain things, so I occasionally run "sanity tests". I take some major positional component (not a single value) such as passed pawns, king safety, or mobility, "scale" all of its values, and run cluster matches. I will try things like 90% and 80%, and then go to the other side with 110% and 120%.
What I want to see is an Elo "peak" at 100%, with it dropping off on either side. If it actually would improve at 90%, we'd then look at scaling the terms to pick up that lost Elo...
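The bookkeeping for one of these sweeps is simple; a sketch (the Elo numbers below are made up for illustration; in reality each one comes from a cluster match of many thousands of games):

```python
# Sketch of the "sanity test" sweep: scale one positional component
# (e.g. king safety) and see where measured Elo peaks.  The Elo numbers
# are made up for illustration; each would really come from a cluster
# match of many thousands of games.

def scaled_weights(weights, component, scale):
    """Return a copy of the eval weights with one component scaled."""
    w = dict(weights)
    w[component] = w[component] * scale
    return w

base = {"passed_pawns": 100, "king_safety": 100, "mobility": 100}

# hypothetical measured results for king safety at each scale factor
measured_elo = {0.8: -7.0, 0.9: -2.5, 1.0: 0.0, 1.1: -3.0, 1.2: -8.5}

best = max(measured_elo, key=measured_elo.get)
print(best)  # a peak at 1.0 means the component is sized about right
```

A peak anywhere other than 100% is the signal to go back and rescale the individual terms inside that component.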
So I _hope_ our stuff is not too big. But here's a risk. Suppose you have some term that would really be best fit by a 3rd degree polynomial. But you don't know that. And you choose to fit it with a linear equation (straight line) by adjusting the slope and constant to get the best Elo. What you end up with is a term that is best for overall Elo, but which is quite bad for some cases. And if you kept the "linear fit" for the places where it worked well, and switched to a polynomial in the other places, you would get an even bigger Elo gain. But not knowing about the polynomial, that's lost. And there is no doubt that some of my positional terms fit *exactly* that description. Tuned for best overall Elo, but could be done better with an alternate formulation. Same is true for other things.
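The risk is easy to demonstrate with a toy fit (numpy, fabricated data, nothing from Crafty): a straight line tuned for the best overall fit to cubic-shaped data is fine on average but badly wrong in parts of the range.

```python
# Toy illustration of the "wrong curve family" risk: data generated from
# a cubic, then fit with a straight line (best overall least-squares fit)
# and with a cubic.  The line minimizes total error but is badly wrong
# near the ends of the range -- the "best poor fit" situation.
import numpy as np

x = np.linspace(-2.0, 2.0, 41)
y = 0.5 * x**3 - x          # the (unknown) true shape of the term

lin = np.polyfit(x, y, 1)   # best straight-line fit
cub = np.polyfit(x, y, 3)   # the right curve family

lin_err = np.abs(np.polyval(lin, x) - y)
cub_err = np.abs(np.polyval(cub, x) - y)

print(lin_err.max())   # large worst-case error for the line
print(cub_err.max())   # essentially zero for the cubic
```

The linear fit here is the least-squares optimum, so no amount of tuning its slope and constant recovers the error at the ends; only switching curve families does.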
A good case is the LMR stuff. I looked briefly at the Stockfish approach of increasing the reduction as you search more and more moves. But here's a question. Given two sets of moves, MA(i) and MB(i), where MA and MB have the same number of moves, but in MA the first couple are good and the rest are lemons, while in MB all but the last couple are good: what is the justification for reducing the middle move in MA the same as the middle move in MB, when the middle move in MA is bad while the middle move in MB is good? No justification at all. It is just force-fitting a linear value to a polynomial data set, and finding the best linear approximation you can get, even though it is bad in many cases.
I do not like the idea of reducing (or extending) something just because it comes before or after the Nth move in the list. Surely there must be a better reason; I certainly do not do that kind of nonsense as a human. I believe we can do this stuff pretty well (fit the wrong type of curve, but at least optimize the poor fit so that we arrive at the best "poor fit" we can get).
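Schematically, the move-count scheme looks something like this (a log-based shape of this kind is common; the constants here are made up, not Stockfish's or anyone's real numbers):

```python
# Schematic move-count-based LMR reduction (the style being objected to
# above).  The constants are illustrative, not any engine's real numbers.
# Note the inputs: depth and the move's index in the list -- nothing about
# the move itself.  The middle move of MA and the middle move of MB get
# the same reduction even if one is a lemon and the other is good.
import math

def lmr_reduction(depth, move_number):
    """Reduction in plies, growing with depth and with moves searched."""
    if move_number < 4 or depth < 3:
        return 0                      # search early moves at full depth
    return int(0.5 + math.log(depth) * math.log(move_number) / 2.0)

# identical positions in the move list get identical reductions,
# regardless of which moves are actually good:
print([lmr_reduction(12, m) for m in range(1, 11)])
```

Everything the formula knows is the move's position in the list, which is exactly the complaint: it is a curve over move index, not a judgment about the move.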
So with that level of distrust, I'd think it is pretty obvious why I'd rather have "pawn in hand" and fight to nullify the compensation, as opposed to having the compensation and fighting to eventually convert it to a pawn or more in hand...
Hope that wasn't too much rambling. It's an interesting topic. Particularly when I know we are optimizing something to work best in the general case, rather than modifying the code so that we have different alternatives for different cases so that the overall fit is much better. But we are working on these issues regularly...