Enhancing parameter values

Discussion of chess software programming and technical issues.

Moderators: hgm, Dann Corbit, Harvey Williamson

jsuberd

Enhancing parameter values

Post by jsuberd »

Hi

Well, I am pretty sure all of this has been tried before, but what do you think of these ideas anyway?

Consider the parameters in engines which can be tuned, including those of eval and search. I suppose a common method in practice is to consider each one independently of the others: several values are tried for that parameter, and after some testing (playing games) it is decided whether a change works or not. Some questions arise, though.

1- What if the parameter is correlated with some other ones?
2- How can one be sure that the values tried are acceptable ones (let alone the best ones)?

1- Let's suppose we can say some parameters are likely to be correlated; for example, they all deal with mobility. One can use a method like a genetic algorithm to get some initial values for the actual testing, which is playing games. A (large) set of positions can be used. Their "base eval" could come from a stronger engine, or from the same engine with more thinking time. The fitness function can then be the difference between the base eval and the current eval; the goal is to minimize this by changing the values of those parameters. This method could be quick, so a lot of carefully selected positions could be tried, and the resulting values could then be put to the actual time-consuming test, which is playing games.
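A minimal sketch of that loop, assuming a linear eval over precomputed feature counts (all names and numbers here are invented for illustration, not taken from any real engine):

```python
import random

def static_eval(weights, features):
    # Hypothetical linear eval: weighted sum of precomputed feature counts.
    return sum(w * f for w, f in zip(weights, features))

def fitness(weights, positions):
    # positions: list of (features, base_eval) pairs; lower fitness is better.
    return sum((static_eval(weights, f) - base) ** 2 for f, base in positions)

def tune(positions, n_params, generations=200, pop_size=30, sigma=5.0):
    # Simple evolutionary loop: keep the better half of the population,
    # refill with mutated copies of survivors, and shrink the mutation
    # step over time.
    pop = [[random.uniform(-50, 50) for _ in range(n_params)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, positions))
        survivors = pop[:pop_size // 2]
        children = [[w + random.gauss(0, sigma) for w in random.choice(survivors)]
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
        sigma *= 0.99  # anneal the mutation size
    return min(pop, key=lambda w: fitness(w, positions))
```

The candidate that minimizes the eval gap over the position set would then go on to the expensive game-playing tests.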

2- Another idea would be to gather some data related to the parameter from the engine and then, based on the data, answer the questions (not completely, though). For instance, if various values are tried, does engine strength as a function of the parameter resemble a known pattern? Maybe logarithmic, (partially) linear, something with local maxima, etc. If so, after some calculation, better values to try could be guessed. These data could come from actual games or from some test positions.
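As one concrete (hypothetical) reading of this: measure strength at three parameter values, fit a parabola through the points, and propose its vertex as the next value to try. The numbers below are invented:

```python
def parabola_vertex(x0, y0, x1, y1, x2, y2):
    # Exact quadratic through three (parameter value, measured strength)
    # points, via divided differences; returns the x of the extremum.
    a = ((y2 - y1) / (x2 - x1) - (y1 - y0) / (x1 - x0)) / (x2 - x0)
    b = (y1 - y0) / (x1 - x0) - a * (x0 + x1)
    return -b / (2 * a)

# E.g. a penalty tried at 10, 20 and 40, with measured scores that peak
# somewhere in between: the vertex suggests the next value to test.
```

This only helps if the strength curve really is close to quadratic near the optimum, which is itself a guess.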
MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: Enhancing parameter values

Post by MattieShoes »

Deep Thought did automated eval tuning back in the '80s, using GM games as the basis of a test suite and using the GM move as an oracle for the best move. They tried all sorts of different things to see what might produce the best results. The paper is available here. The summary would be "lots of interesting ideas, some reason to be optimistic, but limited success."

One problem is that a deeper search may see tactics that a shallower one misses. So from the perspective of the shallower search, the deeper one is *wrong*. Tuning the eval using those results is... probably not good.

KnightCap uses temporal difference learning -- something I know nothing about, but it may be of interest.
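For what it's worth, the core temporal difference idea can be sketched (KnightCap's actual method, TDLeaf, is more involved than this): nudge the eval weights so that each position's evaluation moves toward the evaluation of the next position in the game. A TD(0)-flavoured sketch for a linear eval, with invented data:

```python
def td_update(weights, features_by_pos, lr=0.01):
    # features_by_pos: feature vectors of successive game positions,
    # all scored from the same side's point of view.
    for f_now, f_next in zip(features_by_pos, features_by_pos[1:]):
        v_now = sum(w * x for w, x in zip(weights, f_now))
        v_next = sum(w * x for w, x in zip(weights, f_next))
        delta = v_next - v_now            # temporal difference error
        for i, x in enumerate(f_now):     # gradient of a linear eval = features
            weights[i] += lr * delta * x
    return weights
```

Repeated passes shrink the disagreement between successive evaluations along the game.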

I think the general consensus is that hand-tuned evals produce the best results, but some automated system may provide a reasonable starting point for hand tuning.

As for measuring which eval parameters are correlated, I don't know of anything, but it'd be interesting indeed :-) The rook-on-7th bonus and king tropism bonuses for rooks are probably a correlated pair -- they measure different things, but generally a rook on the 7th rank will have a higher tropism score than otherwise, etc.

I think a big hurdle is "engine strength". There's no easy way to measure small changes in engine strength, so if you wanted to try 10 different values for a doubled-pawn penalty and be able to measure the minuscule difference in strength, it'd require an obscene number of games.
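The order of magnitude is easy to sketch. Treating each game as a win/loss coin flip (draws reduce the variance somewhat, so this is rough), the number of games needed to separate a small Elo edge from 50% at roughly 95% confidence is:

```python
import math

def score_for_elo(elo):
    # Expected score of the stronger side under the logistic Elo model.
    return 1 / (1 + 10 ** (-elo / 400))

def games_to_detect(elo, z=1.96):
    # Need z * sqrt(s*(1-s)/n) <= s - 0.5, i.e. the edge must exceed
    # about two standard errors of the measured score.
    s = score_for_elo(elo)
    return math.ceil(s * (1 - s) * (z / (s - 0.5)) ** 2)
```

Under these assumptions a 5 Elo difference already needs on the order of twenty thousand games, which is the "obscene number" in question.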
jsuberd

Re: Enhancing parameter values

Post by jsuberd »

MattieShoes wrote:Deep Thought did automated eval tuning back in the '80s, using GM games as the basis of a test suite and using the GM move as an oracle for the best move. They tried all sorts of different things to see what might produce the best results. The paper is available here. The summary would be "lots of interesting ideas, some reason to be optimistic, but limited success."
Well, I suppose humans occasionally make excellent moves, but sometimes they fail to see obvious ones; computers, OTOH, are not that "smart", but they are more consistent. So the idea of using GM moves may work for many positions, but if the moves miss some key points in certain positions, the whole idea can be hurt. One idea would be to verify the GM moves with other methods, like letting an engine think about the position for a long time, or maybe some blunder checking. Anyway, I have my doubts about this method. In practice, the problem can arise when there is a huge gap between the engine's eval/move and the GM's.
One problem is that a deeper search may see tactics that a shallower one misses. So from the perspective of the shallower search, the deeper one is *wrong*. Tuning the eval using those results is... probably not good.
Such a conflict may arise either because the given moves/values are not (nearly) correct, (inclusive) or because the engine cannot grasp the basic points of the positions.

Some ideas could be:
- Remove positions which could potentially cause a huge gap between the two methods, like tactical ones. This can be done when composing the test suite (by guessing), or at run time, when the disagreement is spotted for certain positions.

- "Giving the engine a map of the road ahead". That is, feeding the engine some critical positions and moves [*] with their extracted values which we know would cause problems for it, due to the lack of time spent on those positions. Maybe even some promising positions/moves. In short, making the engine aware of what it needs to know.
KnightCap uses temporal difference learning -- something I know nothing about, but it may be of interest.
IIRC this is a different issue; it goes beyond mere parameter enhancement and the like.
I think the general consensus is that hand-tuned evals produce the best results, but some automated system may provide a reasonable starting point for hand tuning.
IMO anything which could save some CPU cycles can be beneficial. :)
As for measuring which eval parameters are correlated, I don't know of anything, but it'd be interesting indeed :-) The rook-on-7th bonus and king tropism bonuses for rooks are probably a correlated pair -- they measure different things, but generally a rook on the 7th rank will have a higher tropism score than otherwise, etc.
Maybe many tables for various correlated parameters are already in use ...
I think a big hurdle is "engine strength". There's no easy way to measure small changes in engine strength, so if you wanted to try 10 different values for a doubled-pawn penalty and be able to measure the minuscule difference in strength, it'd require an obscene number of games.
These are all just guesses. The important part is when those guesses are close to the mark -- but wait, that is a guess too ...

[*] As an unrelated thought, what about trying to save some already-done computation during the game? That is, when the engine has a PV, P plies deep, it plays the first move; if the opponent answers as in the PV, then instead of starting the search from scratch, how about reusing the PV at depth P-2? OK, maybe it is silly for some reason, or maybe everyone is using this since it is so obvious. Anyway, I suppose this is different from ordinary pondering, which is done on the opponent's time. This could be done with ponder off too, I suspect.
MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: Enhancing parameter values

Post by MattieShoes »

jsuberd wrote:
MattieShoes wrote:Deep Thought did automated eval tuning back in the '80s, using GM games as the basis of a test suite and using the GM move as an oracle for the best move. They tried all sorts of different things to see what might produce the best results. The paper is available here. The summary would be "lots of interesting ideas, some reason to be optimistic, but limited success."
Well, I suppose humans occasionally make excellent moves, but sometimes they fail to see obvious ones; computers, OTOH, are not that "smart", but they are more consistent. So the idea of using GM moves may work for many positions, but if the moves miss some key points in certain positions, the whole idea can be hurt. One idea would be to verify the GM moves with other methods, like letting an engine think about the position for a long time, or maybe some blunder checking. Anyway, I have my doubts about this method. In practice, the problem can arise when there is a huge gap between the engine's eval/move and the GM's.
Well, this was from the '80s, when top GMs were still much better than the best chess engines. They did have code to ignore positions where the GM's best move looked very bad to the engine eval... though at the time this was probably to prevent tactics beyond the horizon. They were using 3-ply searches, and GMs *usually* don't miss such short tactics. They may have checked positions for blunders too; I don't know.
One problem is that a deeper search may see tactics that a shallower one misses. So from the perspective of the shallower search, the deeper one is *wrong*. Tuning the eval using those results is... probably not good.
Such a conflict may arise either because the given moves/values are not (nearly) correct, (inclusive) or because the engine cannot grasp the basic points of the positions.

Some ideas could be:
- Remove positions which could potentially cause a huge gap between the two methods, like tactical ones. This can be done when composing the test suite (by guessing), or at run time, when the disagreement is spotted for certain positions.

- "Giving the engine a map of the road ahead". That is, feeding the engine some critical positions and moves [*] with their extracted values which we know would cause problems for it, due to the lack of time spent on those positions. Maybe even some promising positions/moves. In short, making the engine aware of what it needs to know.
They experimented with different ideas, like weighting positions differently based on how close the engine was. Reciprocal weighting sounded very interesting: it mostly scores based on positions where the engine is *almost* right.
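One plausible reading of "reciprocal weighting" (a guess at the shape of the idea, not the paper's exact formula) is to weight each position by the reciprocal of the engine's error there, so near-misses dominate the tuning objective:

```python
def reciprocal_weight(gap, eps=1.0):
    # gap: how far the engine's choice fell short of the oracle's best
    # move (in eval units, >= 0); eps keeps exact hits finite.
    return 1.0 / (eps + gap)
```

Positions the engine gets badly wrong then contribute little, since no small parameter tweak will fix them anyway.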

With the obscene amount of processing power lying around, one could do other things too, like taking a strong engine and, rather than recording just the best move, scoring *all* the moves. Then one can do the same in the testing -- if two moves are nearly equally best, the engine wouldn't be penalized for choosing the second. Or if the engine picks the right move but totally hoses the ordering of the rest, then you know it may be picking the best move by accident. I think it'd help avoid tuning the engine to take the test well.
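That scheme might be sketched like this (invented data; the penalty is the oracle's eval gap between the engine's choice and the oracle's best move, so near-equal alternatives cost almost nothing):

```python
def position_penalty(oracle_scores, engine_choice):
    # oracle_scores: move -> oracle eval (higher is better, e.g. centipawns).
    best = max(oracle_scores.values())
    return best - oracle_scores[engine_choice]

def suite_penalty(suite):
    # suite: list of (oracle_scores, engine_choice); lower total is better.
    return sum(position_penalty(scores, choice) for scores, choice in suite)
```

Choosing a move one centipawn behind the oracle's best costs almost nothing, while a blunder costs its full eval gap.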
I think the general consensus is that hand-tuned evals produce the best results, but some automated system may provide a reasonable starting point for hand tuning.
IMO anything which could save some CPU cycles can be beneficial. :)
I'd say the opposite. CPU cycles are cheap and plentiful nowadays, but my brain cycles are just as limited as they've always been ;-)
As an unrelated thought, what about trying to save some already-done computation during the game? That is, when the engine has a PV, P plies deep, it plays the first move; if the opponent answers as in the PV, then instead of starting the search from scratch, how about reusing the PV at depth P-2? OK, maybe it is silly for some reason, or maybe everyone is using this since it is so obvious. Anyway, I suppose this is different from ordinary pondering, which is done on the opponent's time. This could be done with ponder off too, I suspect.
All good engines store the results of previous searches in a hash table, so the work done before is not wasted, even if the opponent doesn't make the expected response. With a search tree, though, the last ply is where you spend almost all of your time anyway: the branching factor is about 2 in the better engines, so the deepest ply takes about the same amount of time as all of the previous ones combined. The real advantage is in move ordering -- the engine has the best move it found in a shallower search, so it can try that one first, which produces more cut nodes and keeps the branching factor lower.
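A toy illustration of that move-ordering benefit (the game here is a hand-built tree, not chess, and all names are invented): a negamax that remembers the best move per node in a table and tries it first on the next visit. A second search of the same position then visits fewer nodes, because the remembered move causes earlier beta cutoffs.

```python
import math

def negamax(tree, values, node, depth, alpha, beta, tt, stats):
    stats["nodes"] += 1
    moves = tree.get(node, [])
    if depth == 0 or not moves:
        # Leaf or horizon: static value from the side-to-move's view.
        return values.get(node, 0)
    # Try the move remembered from an earlier search of this node first.
    hash_move = tt.get(node)
    moves = sorted(moves, key=lambda mc: mc[0] != hash_move)
    best, best_move = -math.inf, None
    for move, child in moves:
        score = -negamax(tree, values, child, depth - 1, -beta, -alpha, tt, stats)
        if score > best:
            best, best_move = score, move
        alpha = max(alpha, score)
        if alpha >= beta:
            break              # beta cutoff: remaining moves are skipped
    tt[node] = best_move       # remember the best move for future searches
    return best

def search(tree, values, root, depth, tt):
    stats = {"nodes": 0}
    score = negamax(tree, values, root, depth, -math.inf, math.inf, tt, stats)
    return score, stats["nodes"]
```

Searching the same toy tree twice with the same table shows the second pass pruning more than the first, even though the tree hasn't changed.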

Engines also ponder. In most implementations, they move and then pretend you made the expected response immediately, starting a search with infinite time. If you make the expected response, they've got a head start and just update the time to stop searching accordingly. If not, they undo the predicted move, make the actual one, and start searching from there. The hash results from the previous searches are still around, though, so some time is saved regardless.
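That hit/miss flow can be modelled in a few lines (the Engine here is a made-up stub for illustration, not any real engine's API):

```python
class Engine:
    def __init__(self):
        self.position = []        # moves played (or predicted) so far
        self.time_limit = None    # None = searching with infinite time

    def ponder(self, my_move, predicted_reply):
        # Play our move, assume the predicted reply, search "forever".
        self.position += [my_move, predicted_reply]
        self.time_limit = None

    def on_opponent_move(self, actual_move, budget):
        if self.position[-1] == actual_move:
            # Ponder hit: keep the running search, install a real limit.
            self.time_limit = budget
        else:
            # Ponder miss: swap in the actual move and search the real
            # position; hash entries from the pondered search still help.
            self.position[-1] = actual_move
            self.time_limit = budget
```

On a hit the search simply continues with a clock attached; on a miss only the mispredicted move is replaced before searching anew.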

KnightCap temporal learning paper
I found it interesting, though I really need to read it more carefully to understand what it's actually doing.