Trading gradient: to trade or not to trade?

hgm · Post by **hgm** » Thu Feb 18, 2021 8:04 pm

If you are ahead enough, simplifying the position through trading away equal material is often a fast path to convert the win; in a simple position you are less likely to get surprised, and the same absolute advantage becomes a larger relative advantage. But when your absolute advantage is within the draw margin, trading just brings you closer to a certain draw. When there is not much material left, there also is not much material to gain, in order to expand your advantage.

This is a general truth for all chess variants that have draws. But it is extra important in a variant like Janggi (Korean Chess), where you need an advantage of at least two pieces to be able to checkmate a bare King (and in many cases even that wouldn't be enough). So being a Rook ahead in the middle-game is nice, but if you trade away everything else, it would still be a draw. You have to expand your advantage first, and with an advantage of a Rook you will have good prospects for grabbing something more, the more so if possible targets to grab remain on the board longer.

So in summary: when within the draw margin, you want to avoid equal trades, to keep the number of future victims high. With a winning advantage, you seek trading, to reduce the chances for surprises that might reduce your advantage.

I tried to find a simple way to build that into an evaluation function, and came up with the following:

Suppose E is the 'raw' evaluation (piece values, PST, ...), and T the threshold for winning (i.e. when -T < E < T in the late end-game it will be a draw). Furthermore, suppose a measure for the total material on the board is M, with a maximum (opening) value MAX. Then define

float x = E/T; // advantage in 'draw units'
float phi = M/MAX; // game phase (T and MAX are floats)
E *= (1+phi + x*x)/(1+(1+phi)*x*x); // corrected E

In the opening phi = 1, and the correction factor is (2+x*x)/(1+2*x*x). When x << 1, x*x is negligible, and the factor is 2/1 = 2: small advantages are doubled, in anticipation that there will be ample opportunity to make them grow. OTOH, when x >> 1, 1 (or 2) are negligible compared to x*x, and the factor becomes 0.5. Large advantages (much larger than the draw margin) are reduced. If phi=0 (late end-game) the factor is 1, and the raw evaluation is used 'as is'.

So a small (raw) advantage E achieved would decrease the score from 2*E to E when equally trading away a fully populated board: needless trading will be discouraged. A large advantage would increase from 0.5*E to E upon equal trading, so in that case trading is encouraged. When the advantage is exactly at the win threshold (x = 1), the factor is always 1, irrespective of game state. Of course all the constants 1 could be tuned to different values in an attempt to fit empirical data better.

xr_a_y · Post by **xr_a_y** » Fri Feb 19, 2021 7:32 pm

Interesting idea of course.

Some questions though :
- in E you consider both pawn material and piece material. But trading pawn or trading piece is not the same
- this doesn't look like a texel tunable thing, is it ?

hgm · Post by **hgm** » Fri Feb 19, 2021 8:59 pm

E is just the evaluation, and it must consider all pieces, as usual. The variable describing the trading is the 'game phase' phi. This normally (i.e. when used for linear tapering) would not consider Pawns.

In this context it should describe 'vulnerability': if it is large it indicates there is a large opportunity for an existing advantage to change (on average increase, but also become more uncertain); if it is small the current advantage is nearly frozen. Pawns would definitely contribute to the vulnerability, as they are the weakest pieces, and cannot easily move to safety. So it seems that Pawns should even get a large weight in the game phase that is used for this.

I think this should be Texel-tunable.

xr_a_y · Post by **xr_a_y** » Fri Feb 19, 2021 9:01 pm

xr_a_y wrote: ↑Fri Feb 19, 2021 7:32 pm Interesting idea of course.

Some questions though :
- in E you consider both pawn material and piece material. But trading pawn or trading piece is not the same
- this doesn't look like a texel tunable thing, is it ?

I tried to Texel-tune it anyway (indeed using the full eval):

Code:

    // trading (idea from http://talkchess.com/forum3/viewtopic.php?f=7&t=76629)
    const float tradingX2MG = float(score[MG]*score[MG])/EvalConfig::tradingThreashold[MG];
    const float tradingX2EG = float(score[EG]*score[EG])/EvalConfig::tradingThreashold[EG];
    const float tradingFactorMG = (1+data.gp + tradingX2MG) / (1+(1+data.gp)*tradingX2MG);
    const float tradingFactorEG = (1+data.gp + tradingX2EG) / (1+(1+data.gp)*tradingX2EG);
    score[MG] *= tradingFactorMG * EvalConfig::tradingFactor[MG] / 128;
    score[EG] *= tradingFactorEG * EvalConfig::tradingFactor[EG] / 128;

this gives

Code:

CONST_TEXEL_TUNING EvalScore   tradingThreashold       = {   4, 146};
CONST_TEXEL_TUNING EvalScore   tradingFactor           = { 150, 131};

Let me test it Elo-wise...

hgm · Post by **hgm** » Fri Feb 19, 2021 9:37 pm

This uses it a bit differently than I had in mind: my idea was to apply the formula after tapering. The formula itself in fact describes a form of non-linear tapering, which would be better suited for causing the correct trading gradient. With linear tapering you can still encourage trading by having all eval parameters increase towards the end-game; the same advantage on the board would then be evaluated higher if (equal) material disappears. And you could of course also do the opposite. But you could not do both at the same time, depending on the advantage (i.e. encourage trading when 'enough' ahead, but discourage it when the current advantage would only give you a draw).

So the parameters of the formula would be determined by how the average result for a given advantage would improve or deteriorate with game phase for large and for small advantages. That would mean the 'uncorrected eval' that goes into the formula would no longer need to have different behavior for large and small advantages, so that it can be fitted well within the framework of linear tapering.

xr_a_y · Post by **xr_a_y** » Fri Feb 19, 2021 10:05 pm

Ok, I'm trying it after tapering now. The previous try does not look good Elo-wise.

Code:

     const float tradingX2 = float(score*score)/EvalConfig::tradingThreashold;
     const float tradingFactor = (1+gp + tradingX2) / (1+(1+gp)*tradingX2);
     score *= tradingFactor * EvalConfig::tradingFactor / 128;

Texel tuned params

Code:

CONST_TEXEL_TUNING ScoreType   tradingThreashold       = 100;
CONST_TEXEL_TUNING ScoreType   tradingFactor           = 150;

Test running.

xr_a_y · Post by **xr_a_y** » Sat Feb 20, 2021 8:49 am

Does not look good either, around -50Elo.
I'll come back to this later.

hgm · Post by **hgm** » Sat Feb 20, 2021 2:09 pm

This looks a bit suspect: you use tradingThreashold to scale the square of the score, so the tuned value of 100 basically always gives a very large tradingX2. (Only for scores between -10 and +10 (centi-Pawn?) you get reasonable values.) For advantages of 100cP tradingX2 would already be 100, and the formula would just multiply by 1/(1+gp). The game phase gp runs from 0 (end-game) to 1 (opening) in your case? This means it would multiply any significant advantage by 1 in the end-game, and by 0.5 in the opening, and only do something different when the raw eval is between -10 and 10 cP. (But who cares? That should be mostly noise anyway.)

It is really amazing that this has any effect on Elo at all. The doubling of the entire evaluation when you go towards the end-game should be perfectly compensatable by halving all end-game eval terms compared to what you had without this correction.

I would have expected tradingThreashold to be something like 10000 - 20000 in Chess (so that the opposite behavior w.r.t. trading encouragement occurs at a score of 100-140 cP, the draw margin.)

xr_a_y · Post by **xr_a_y** » Sat Feb 20, 2021 3:35 pm

hgm wrote: ↑Sat Feb 20, 2021 2:09 pm This looks a bit suspect: you use tradingThreashold to scale the square of the score, so the tuned value of 100 basically always gives a very large tradingX2. (Only for scores between -10 and +10 (centi-Pawn?) you get reasonable values.) For advantages of 100cP tradingX2 would already be 100, and the formula would just multiply by 1/(1+gp). The game phase gp runs from 0 (end-game) to 1 (opening) in your case? This means it would multiply any significant advantage by 1 in the end-game, and by 0.5 in the opening, and only do something different when the raw eval is between -10 and 10 cP. (But who cares? That should be mostly noise anyway.)

It is really amazing that this has any effect on Elo at all. The doubling of the entire evaluation when you go towards the end-game should be perfectly compensatable by halving all end-game eval terms compared to what you had without this correction.

I would have expected tradingThreashold to be something like 10000 - 20000 in Chess (so that the opposite behavior w.r.t. trading encouragement occurs at a score of 100-140 cP, the draw margin.)

I may have missed some error, or maybe the Texel-tuned values are not optimal at all. I'll try things by hand soon.
