My tuner's optimization routine finds an optimum within minutes when I select a subset of terms
with fewer than (say) 20 parameters. For example, I tune material, mobility, and a handful of individual parameters.
This semi-automated approach gives me good control over what I change and introduce, while saving time.
Problem description
I ran into a problem in the algorithm that I'm sure some of you have already solved.
Suppose I tune the material weights (the numbers below are arbitrary).
What I want to show is that the average of the MG and EG values seems OK, but the mg and eg scores themselves diverge:
Piece:      P,   N,   B,    R,    Q
Start MG:  80, 300, 320,  500,  980
Start EG: 100, 300, 320,  500,  980
End MG:     5,  24,  32,   48,   68
End EG:   192, 595, 610, 1020, 1990
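The per-piece averages (MG + EG) / 2 can be checked directly. A short sketch using the numbers from the table:

```python
# Material values from the table, in order P, N, B, R, Q.
start_mg = [80, 300, 320, 500, 980]
start_eg = [100, 300, 320, 500, 980]
end_mg = [5, 24, 32, 48, 68]
end_eg = [192, 595, 610, 1020, 1990]

def averages(mg, eg):
    """Return the per-piece average (mg + eg) / 2."""
    return [(m + e) / 2 for m, e in zip(mg, eg)]

print("start:", averages(start_mg, start_eg))  # [90.0, 300.0, 320.0, 500.0, 980.0]
print("end:  ", averages(end_mg, end_eg))      # [98.5, 309.5, 321.0, 534.0, 1029.0]
```

The averages stay in the same range as at the start, while the mg values collapse and the eg values roughly double.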
At first I looked for a bug, but there is none.
I tried to understand what is going on and came to the following conclusions.
1. The data contains a natural imbalance between middlegame (mg) and endgame (eg) positions.
By natural I mean that the positions were randomly selected from PGN games.
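The imbalance can be measured as the average phase of the data set. A minimal sketch, assuming the common phase weights N = 1, B = 1, R = 2, Q = 4 (which sum to 24 for the full starting material, matching maxphase = 24 in the phase model); the weights may differ from the tuner's own, and the three positions are made up for illustration:

```python
# Assumed phase weight per piece type (a common convention, not necessarily the tuner's).
PHASE_WEIGHT = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24

def phase_of(counts: dict) -> int:
    """Game phase from piece counts of both sides, clamped to MAX_PHASE."""
    p = sum(PHASE_WEIGHT[piece] * n for piece, n in counts.items() if piece in PHASE_WEIGHT)
    return min(p, MAX_PHASE)

def average_phase(positions) -> float:
    """Mean phase over a list of piece-count dictionaries."""
    return sum(phase_of(c) for c in positions) / len(positions)

# Hypothetical sample: piece counts for both sides together.
positions = [
    {"P": 16, "N": 4, "B": 4, "R": 4, "Q": 2},  # opening, phase 24
    {"P": 10, "N": 2, "B": 2, "R": 4, "Q": 2},  # middlegame, phase 20
    {"P": 6, "R": 2},                           # endgame, phase 4
]
print(average_phase(positions))  # 16.0
```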
2. Phase model: score = (mg * phase + eg * (maxphase - phase)) / maxphase.
In the data set, the average phase can be, for example, 13, so on average mg gets weight 13 and eg gets weight 11 out of maxphase = 24.
This drives the tuner to push the mg scores to a minimum and the eg scores to a maximum.
The averages (mg + eg) / 2 still look fine, especially when the mg error is larger than the eg error.
This is correct behavior for the tuner.
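A minimal sketch of the phase model above, assuming integer scores and the average phase of 13; `tapered` is an illustrative name, not taken from any particular engine:

```python
MAX_PHASE = 24  # maxphase in the phase model

def tapered(mg: int, eg: int, phase: int) -> float:
    """(mg * phase + eg * (maxphase - phase)) / maxphase.
    phase == MAX_PHASE gives pure mg, phase == 0 gives pure eg."""
    return (mg * phase + eg * (MAX_PHASE - phase)) / MAX_PHASE

avg_phase = 13  # average phase in the data set (mg 13, eg 11 of 24)
print(round(avg_phase / MAX_PHASE, 3), round((MAX_PHASE - avg_phase) / MAX_PHASE, 3))  # 0.542 0.458

# Pawn values from the table: start (80, 100) vs. tuned end (5, 192).
print(round(tapered(80, 100, avg_phase), 2))  # 89.17
print(round(tapered(5, 192, avg_phase), 2))   # 90.71
```

At the average phase, the tuned pawn (5, 192) blends to 90.71, close to the 89.17 of the starting pawn (80, 100), even though its two halves are far apart.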
3. Tapered evaluation
It may sound strange at this point, but I thought about what a tapered evaluation actually is.
The obvious view is that several evaluation functions are combined into one, each weighted differently.
I'd like to look at it more concretely. A score is the result of a scoring function, so a tapered evaluation contains several scoring functions.
Most chess engines have two scoring functions, "mg" and "eg", sharing one piece of logic,
which could easily be separated by duplicating the code. More generally, we can replace the two with N evaluation functions.
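The generalization to N scoring functions can be written as a weighted blend. This is a sketch under my own assumption that the weights are non-negative and sum to 1; with N = 2 and weights (phase / maxphase, 1 - phase / maxphase) it reduces to the mg/eg phase model:

```python
from typing import Sequence

def blend(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine N component scores; weights are assumed to sum to 1."""
    if len(scores) != len(weights):
        raise ValueError("need one weight per scoring function")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(s * w for s, w in zip(scores, weights))

# N = 2 reproduces the mg/eg taper at phase 13 of 24 (tuned pawn from the table).
phase, max_phase = 13, 24
print(blend([5, 192], [phase / max_phase, 1 - phase / max_phase]))

# N = 3: three hypothetical scoring functions with fixed weights.
print(blend([10, 20, 30], [0.5, 0.25, 0.25]))  # 17.5
```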
4. Two (N) evaluations and one error
We compute a single error, but we have two evaluation functions (mg, eg).
The tuner neither knows nor cares what the weights are. Consequently, the evaluation function with
the larger average error is driven down and balanced against the one with the smaller average error to reach the "optimized" sum.
That is the crux of the matter! Infinitely many (mg, eg) combinations produce the same optimal average.
The function with the larger error keeps being minimized and rebalanced for as long as the algorithm runs.
This may be blurred, or not noticeable at all, when several hundred parameters are optimized,
because the algorithm takes too long to reach this stage.
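A toy illustration of the single-error problem, assuming a plain mean squared error against a target score (real tuners usually compare a sigmoid of the score with the game result) and hypothetical samples. When every sample sits at the same phase (13 of 24), any (mg, eg) pair on the line mg * 13 + eg * 11 = const gives exactly the same error, so one error cannot tell the pairs apart:

```python
from typing import Sequence, Tuple

MAX_PHASE = 24

def tapered(mg: int, eg: int, phase: int) -> float:
    return (mg * phase + eg * (MAX_PHASE - phase)) / MAX_PHASE

def mse(mg: int, eg: int, data: Sequence[Tuple[int, float]]) -> float:
    """Mean squared error of one (mg, eg) pair over (phase, target) samples."""
    return sum((tapered(mg, eg, p) - t) ** 2 for p, t in data) / len(data)

# Hypothetical samples, all at phase 13.
same_phase = [(13, 80.0), (13, 95.0), (13, 110.0)]

# 80*13 + 100*11 == 3*13 + 191*11 == 2140, so both pairs blend identically.
print(mse(80, 100, same_phase) == mse(3, 191, same_phase))  # True

# With samples spread across phases, the two pairs no longer tie.
varied = [(24, 80.0), (0, 100.0)]
print(mse(80, 100, varied), mse(3, 191, varied))  # 0.0 7105.0
```

The tie is exact only because all samples share one phase; the more the phases in the data cluster around one value, the flatter the error becomes along such a line.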
5. One-to-one error calculation (my solution)
The only idea so far that solves the problem is to tune each evaluation function (mg, eg) separately.
This enforces a one-to-one relationship between an error and an evaluation function (mg and eg evaluations/points).
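One possible reading of "one error per evaluation function", sketched under my own assumptions: each side gets its own error in which every sample is weighted by that side's share of the taper (phase / maxphase for mg, the rest for eg), and each side is tuned against only its own error with a simple ±1 local search. `side_error`, `tune_side`, and the (phase, target) sample layout are hypothetical names for illustration, not an existing tuner's API:

```python
from typing import Sequence, Tuple

MAX_PHASE = 24
Sample = Tuple[int, float]  # (phase, target score), hypothetical layout

def side_error(value: float, data: Sequence[Sample], side: str) -> float:
    """Error of ONE evaluation function on its own, each sample weighted
    by that side's share of the taper."""
    total = weight_sum = 0.0
    for phase, target in data:
        w = phase / MAX_PHASE if side == "mg" else (MAX_PHASE - phase) / MAX_PHASE
        total += w * (value - target) ** 2
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0

def tune_side(start: float, data: Sequence[Sample], side: str,
              step: float = 1.0, max_iter: int = 10_000) -> float:
    """+/-step local search that minimizes side_error for one side only."""
    value, best = start, side_error(start, data, side)
    for _ in range(max_iter):
        for cand in (value + step, value - step):
            e = side_error(cand, data, side)
            if e < best:
                value, best = cand, e
                break
        else:
            break  # no neighbour improves: local optimum reached
    return value

# Hypothetical samples: pure mg, pure eg, and one halfway position.
data = [(24, 80.0), (0, 100.0), (12, 90.0)]
print(tune_side(80.0, data, "mg"), tune_side(100.0, data, "eg"))  # 83.0 97.0
```

Because each error depends on only one side, the mg and eg values can no longer trade off against each other through a shared sum.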
Questions:
1. Is there an error in my thinking?
2. How did you solve it?
3. Is it right that, when tuning hundreds of parameters, the problem stays invisible
because the algorithm would take too long to reach that point?
I look forward to any feedback. Thanks.