Stockfish's tuning method

Eelco de Groot · Post by **Eelco de Groot** » Sun Oct 23, 2011 6:54 pm

gaard wrote:FYI, for g/6", and g/60", I have a 4-sigma result in favor of the default. This weekend I am going to set up for testing with modifications to the ...CheckBonus coefficients. Results to follow...

I still think it is probably not good if you leave out count_1s<Max15>(undefended) like in Marco's original version I think he had this left out? The new king_defender terms all work in one direction and may even lead to negative attackUnits. The squares defended may not be the same as squares attacked and the more this is the case, the more noise is coming from the new terms. In other words, if the defenders leave "holes", and remember these squares are always attacked, because undefended is really a misnomer, then the defenders may simply have no effect at all, the new terms are pure noise all in the wrong direction. Also I thought that if I do not miscalculate, attackUnits = 0 is now a special case because it includes all the cases where the defense is stronger than the attack. Experimented in the past with a initialization function that starts below the x-axis to discourage attacks a bit. I think the case for this is now even stronger. So you could have a KingDangerTable[] array that starts with a negative term. You could take that further and also consider cases where there is just one attacker (forbidden now) and many defenders. attackUnits should lead to an KingDangerTable element in that case with a negative value, to discourage attacking and concentrating on another weaker spot in the enemy's camp. But if you want just the first term negative, I now have

Code: Select all

  // init_safety() initizes the king safety evaluation, based on UCI
  // parameters. It is called from read_weights().

  void init_safety() {

    const Value MaxSlope = Value(35);
    const Value Peak = Value(1280);
    Value t[100];

    // First setup the base table
    for (int i = 0; i < 100; i++)
    {
        t[i] = Value(int((0.5 * i * i) + (4 * i) - 4));

        if (i > 0)
            t[i] = Min(t[i], t[i - 1] + MaxSlope);

        t[i] = Min(t[i], Peak);
    }

    // Then apply the weights and get the final KingDangerTable[] array
    for (Color c = WHITE; c <= BLACK; c++)
        for (int i = 0; i < 100; i++)
            KingDangerTable[c][i] = apply_weight(make_score(t[i], 0), Weights[KingDangerUs + c]);
  }

that is supposed to do this. I had not really intended it for this, but thinking about it more I came up with this extra reasoning that KingDangerTable[0] should be negative.

Of course it is still possible that the old king safety calculation from Tord is almost entirely speculative in nature and in that case it may be best to ignore the defenders altogether

From the evidence it seems this is worth about twenty elo then. So much for introducing "more knowledge"

Anyone else got a bright idea...

Eelco

BeyondCritics · Post by **BeyondCritics** » Tue Mar 22, 2016 3:14 pm

Rémi Coulom wrote:Thanks for posting your algorithm.

...

In order to guarantee convergence of SPSA, it is necessary to decay the deltas and learning rate in time.

Mr. Coulom pointed towards the problem. As presented here, the "stockfish method" is a theoretically flawed method, that should fail to converge except for very simple cases.
Put it with a simple metaphor:
Assume you are playing golf, but unfortunately you are a very bad player, making random mistakes all the times. What is your strategy, to reach the goal in time?
If you think you are far from the goal, you can get away, hitting the ball boldly. The closer you are to the goal, the less bolder you hit your ball. If you make smaller and smaller steps, you have a decent chance to approach the goal arbritary close. Otherwise, if your hits have the same length all the time, your are expected to reach a vicintiy of your solution and then circle around for a very long time, i believe.
Of course, this is all known...

jdart · Post by **jdart** » Tue Mar 22, 2016 3:36 pm

There is also a variant of SPSA that uses RPROP (https://www.ualberta.ca/~szepesva/papers/rspsa_acg.pdf). This is often faster to converge.

However, if you are trying to minimize the number of runs you have to do, and each run is expensive, there are still other methods that may do that better. See for example http://link.springer.com/article/10.100 ... 015-9281-2.

--Jon

Rémi Coulom · Post by **Rémi Coulom** » Wed Mar 23, 2016 5:25 am

All my experiments with RSPSA gave terrible results (you can see the plots in my CLOP paper). I believe that plain SPSA is much better.

Stockfish's tuning method

Re: Stockfish's tuning method

Re: Stockfish's tuning method

Re: Stockfish's tuning method

Re: Stockfish's tuning method