Extensions, anyone?

hgm
Posts: 28361
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Extensions, anyone?

Post by hgm »

Dann Corbit wrote:I think there are many reasons why these changes are hard to measure.
Another obvious one is that the positions that contain the evaluation terms only arise in rare circumstances. In your example of wrong bishop with a rook pawn, how many games will we have to examine before we will find such a game?
This is why it is far better to measure the effect of a certain characteristic of a position directly, by determining its effect on the winning rate when it occurs, rather than indirectly, by changing the weight of the evaluation term corresponding to this characteristic and hoping that this will make the characteristic occur at a different frequency.

You have to be careful, though, to choose positions where the characteristic occurs as a strategic (long-term) one. For piece values this means you should exclude tactical positions with a material imbalance, as the true imbalance there is not what you think. Only in quiet positions is the material imbalance a strategic feature.

Similarly, giving one side a corner Knight in an otherwise symmetric middle-game position would be pointless if the Knight can move out of the corner in a single move. Only if the Knight is tied to the corner, e.g. because it has to keep defending a Pawn that can only be defended in other ways with great difficulty, does the cornered Knight become a strategic feature that can be expected to affect the winning rate. (If it were not a strategic feature, it would also not affect the score or move choice much, no matter how you evaluate it, as the search would simply make the feature disappear at the expense of a tempo just before the leaves.)
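
To make the direct measurement concrete, here is a minimal sketch of how a winning rate from games started in positions that contain the feature could be turned into an Elo estimate with a rough 95% interval. The function names and the game counts are made up for illustration, and the interval uses a simple normal approximation that assumes the games are independent; it is not a description of any particular engine's test setup.

```python
import math

def elo_from_score(score):
    """Convert a score fraction (0 < score < 1) to an Elo difference
    using the standard logistic rating formula."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def feature_elo(wins, draws, losses):
    """Estimate the Elo effect of a feature from game results, counted
    from the point of view of the side that has the feature.
    Returns (estimate, lower bound, upper bound)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    # Normal approximation; only valid while the bounds stay inside (0, 1).
    low = elo_from_score(score - 1.96 * stderr)
    high = elo_from_score(score + 1.96 * stderr)
    return elo_from_score(score), low, high

# Made-up counts for the side with, say, the tied-down corner Knight.
estimate, low, high = feature_elo(wins=220, draws=360, losses=420)
print(f"{estimate:.1f} Elo (95% interval {low:.1f} to {high:.1f})")
```

The point of the interval is that the same position set can be replayed as often as needed until the bounds are tight enough, which is not possible when you wait for the feature to show up by chance in ordinary games.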
Aleks Peshkov
Posts: 902
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia
Full name: Aleks Peshkov

Re: Extensions, anyone?

Post by Aleks Peshkov »

Looking at how many commercial engines have become Rybka-like, I bet that they all tune their evaluation to make the moves Rybka made. That is much simpler than measuring how a change to the evaluation affects winning performance.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Extensions, anyone?

Post by bob »

michiguel wrote:
Dirt wrote:
bob wrote:If you make several changes over time, with a +7, a -5, a +3, a -1, a +0, a +4, a -3 and a -2 Elo change, overall you come out better. But eliminating those negative changes makes it much better. That was my point...
How confident can we be that the changes will combine in even a close to linear manner? The sum of the changes you list is only +3, but it wouldn't surprise me if the actual combined effect could be a +12. An effect like this seems the easiest way to explain Tord's ability to improve Glaurung by such large amounts.

From what I remember of your testing, you've been choosing to remove evaluation terms if testing shows the same strength with or without them. I'm wondering if that might be backwards.
It is backwards if human knowledge tells you that the evaluation term is safe and beneficial. There are many that provide maybe 1 Elo point or less; however, we know that they never hurt and once in a while they are good. For instance, knowledge about the wrong bishop with only a rook pawn left. I do not think you can measure any Elo difference, but the parameter should stay. Once you have added several of these small items, you get an engine that plays better endgames and is, who knows, maybe 20 points stronger. Most endgame knowledge is ant work.

Miguel
I don't agree, because these are very complex programs. And things that are perfectly "safe and natural" can be quite dangerous or bad when mixed. I can think of lots of redundant eval terms such as (for rooks) mobility, open files, half open files, 7th rank, behind passers, etc. Not all of those are necessarily good when combined since some things can be derived from others...

In any case, I have found a few of those things that sound perfectly reasonable and have been in my program for years, yet when tested carefully, removing them actually helped overall strength. Some things you might think are major issues have zero impact, and some things that appear to be minor can actually be major. Going completely against intuition at times.
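
The arithmetic behind the list of changes quoted above can be checked with a few lines. This is only a sketch under two loud assumptions: that the individual Elo changes combine additively (which is exactly what Dirt questions), and that each measured value is exact rather than carrying its own error bar.

```python
# Elo changes from the quoted example, in the order they were listed.
changes = [+7, -5, +3, -1, 0, +4, -3, -2]

# Keeping every change, assuming the effects simply add up.
total_all = sum(changes)

# Keeping only the changes that tested positive, under the same assumption.
total_kept = sum(c for c in changes if c > 0)

print("all changes kept:", total_all)
print("negative changes removed:", total_kept)
```

Under those assumptions the totals are +3 with everything kept and +14 with the negative changes removed. If the terms interact, neither number has to match what the combined program actually gains.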