Dave Gomboc wrote:
michiguel wrote:
I have been tuning eval parameters in Gaviota with "apparent" success. The first ~800 fast games suggest a possible increase of ~70 Elo points over the version that played in CCT11. However, look at this position to see what Gaviota played.
[d]r4rk1/1bqnbppp/pp1ppn2/8/2PNP3/2N1B1P1/PP3PBP/2RQR1K1 w - - 5 13
First move after the setup position and it played Nd5?!?!?!? What?
I do not know where the heck that came from, and clearly, the tuning did not work in this position or... did it?
Congratulations: your program is playing real chess.

If you were playing blitz chess with a friend, you could plunk down Nd5 in a second, say "Thematic!", punch the clock, and let your friend suffer while working out the consequences!
I'd be interested to know more about your evaluation tuning (both procedure and resulting parameter values).
Dave
I have ~6 million relatively quiescent positions from comp-comp games. Of course, I know the outcome of each game, and therefore the result from the perspective of the side to move (0, 0.5, or 1). Just for this experiment, I modified Gaviota's evaluation to return the probability of winning for the side to move (from 0 to 1). Then I fit the eval parameters to minimize the difference between the predicted probability and the actual result. This was an intermediate step to test other ideas. For instance, I intend to refit the parameters to the score that Gaviota obtains after a short search with the parameters I just got (and more iterations could follow).
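To make the fitting step concrete, here is a minimal sketch of the general idea, not Gaviota's actual code. Everything in it is an assumption for illustration: the logistic mapping from centipawns to win probability, the SCALE constant, the toy two-feature linear evaluation, the synthetic positions, and the simple coordinate-wise local search used as the optimizer. Real data would also include draws (0.5), which the synthetic generator below leaves out.

```python
import random

SCALE = 400.0  # assumed: centipawns per factor-of-10 change in winning odds


def win_probability(score_cp):
    """Logistic map from a centipawn score to an expected result in (0, 1)."""
    return 1.0 / (1.0 + 10.0 ** (-score_cp / SCALE))


def evaluate(weights, features):
    """Toy linear evaluation: weighted sum of feature differences."""
    return sum(w * f for w, f in zip(weights, features))


def mean_squared_error(weights, positions):
    """Average squared gap between predicted probability and game result."""
    total = 0.0
    for features, result in positions:
        total += (win_probability(evaluate(weights, features)) - result) ** 2
    return total / len(positions)


def tune(weights, positions, steps=(64, 16, 4, 1)):
    """Coordinate-wise local search: keep a change only if the error drops."""
    best = list(weights)
    best_error = mean_squared_error(best, positions)
    for step in steps:
        improved = True
        while improved:
            improved = False
            for i in range(len(best)):
                for delta in (step, -step):
                    trial = list(best)
                    trial[i] += delta
                    error = mean_squared_error(trial, positions)
                    if error < best_error:
                        best, best_error = trial, error
                        improved = True
                        break
    return best, best_error


# Synthetic stand-in for the real position set (hypothetical features:
# pawn-count difference and bishop-pair difference, side to move minus opponent).
random.seed(1)
TRUE_WEIGHTS = [100, 50]
positions = []
for _ in range(2000):
    features = [random.randint(-3, 3), random.randint(-1, 1)]
    p = win_probability(evaluate(TRUE_WEIGHTS, features))
    result = 1.0 if random.random() < p else 0.0  # no draws in this toy data
    positions.append((features, result))

start_weights = [0, 0]
start_error = mean_squared_error(start_weights, positions)
tuned_weights, tuned_error = tune(start_weights, positions)
print(tuned_weights, round(start_error, 4), round(tuned_error, 4))
```

The local search is chosen here only because it is short and cannot make the error worse; with millions of positions and many parameters, a gradient-based optimizer would probably be the more practical choice.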
This preliminary attempt sounds crude, but it seems to have worked in this case (so far, after ~2000 fast games, it looks like Gaviota gained ~80 Elo points). This does not demonstrate that the method is great, but I guess it demonstrates how horrible my parameters were. Now that I pay attention to them, it is obvious, and I should have figured it out myself if I had invested the time to do a proper manual tuning. Still, with this method I got back some parameters that look fishy (70 cp for the bishop pair seems too high). At any rate, I think there is some potential in all this.
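For reference, an Elo gain like the one quoted above is normally derived from the match score with the standard logistic Elo formula. The sketch below shows that arithmetic together with a rough 95% interval based on a normal approximation. The win/draw/loss counts are hypothetical placeholders, not the actual results of these test games, and the function names are made up for this example.

```python
import math


def elo_difference(score_fraction):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)


def elo_interval(wins, draws, losses, z=1.96):
    """Return (low, estimate, high) Elo using a normal approximation."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    margin = z * math.sqrt(variance / games)
    return (elo_difference(score - margin),
            elo_difference(score),
            elo_difference(score + margin))


# Hypothetical 2000-game match: 1000 wins, 400 draws, 600 losses (60% score).
low, estimate, high = elo_interval(1000, 400, 600)
print(round(low), round(estimate), round(high))
```

With a couple of thousand games the interval is still more than ten Elo points wide on each side, which is one reason to treat an early estimate as provisional.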
In the past I tested other things, but they did not work. For instance, I tried to fit the parameters to the evaluation given by engines stronger than Gaviota, such as Crafty and Yace (both have a "score" command), but the parameters obtained were not good at all. Maybe the idea was flawed, maybe I did not have enough positions (~30k). I don't know.
Miguel