Why don't engines consider the 50-move rule?

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

syzygy
Posts: 5791
Joined: Tue Feb 28, 2012 11:56 pm

Re: Why don't engines consider the 50-move rule?

Post by syzygy »

Uri Blass wrote: Wed Nov 05, 2025 4:51 am
syzygy wrote: Wed Nov 05, 2025 2:19 am
Hamster wrote: Mon Nov 03, 2025 10:38 am Is there any engine that considers the halfmove clock in evaluation for positions with >7 pieces?
Stockfish does:
https://github.com/official-stockfish/S ... pp#L83-L84

Code: Select all

    // Damp down the evaluation linearly when shuffling
    v -= v * pos.rule50_count() / 212;
This line can also explain:
"engines will show a significant advantage even when the halfmove clock is at 80 or 90."

After v -= v*90/212, more than 0.5v remains, which is not close to 0.
I think it makes sense. It would be a bad idea to tell the engine that it should give away all its advantage just to avoid a 50-move draw. If v is big, then v*90/212 is still a significant penalty that will make the engine accept a pawn move or a capture that reduces v, but not to 0.
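For concreteness, the same arithmetic as a standalone sketch (not Stockfish code; the +5.00 evaluation is made up):

Code: Select all

    #include <cstdio>

    // Standalone illustration of the damping line quoted above:
    // v -= v * rule50_count / 212 (integer arithmetic, as in the engine).
    int main() {
        int v = 500;  // hypothetical evaluation of +5.00 pawns, in centipawns
        for (int count : {0, 50, 80, 90, 100}) {
            int damped = v - v * count / 212;
            std::printf("count=%3d  damped eval=%d cp\n", count, damped);
        }
        // At count=90: 500 - 500*90/212 = 288 cp, still more than half
        // of the original advantage, matching the point above.
    }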
chrisw
Posts: 4669
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Why don't engines consider the 50-move rule?

Post by chrisw »

syzygy wrote: Wed Nov 05, 2025 8:07 pm
Uri Blass wrote: Wed Nov 05, 2025 4:51 am
syzygy wrote: Wed Nov 05, 2025 2:19 am
Hamster wrote: Mon Nov 03, 2025 10:38 am Is there any engine that considers the halfmove clock in evaluation for positions with >7 pieces?
Stockfish does:
https://github.com/official-stockfish/S ... pp#L83-L84

Code: Select all

    // Damp down the evaluation linearly when shuffling
    v -= v * pos.rule50_count() / 212;
This line can also explain:
"engines will show a significant advantage even when the halfmove clock is at 80 or 90."

After v -= v*90/212, more than 0.5v remains, which is not close to 0.
I think it makes sense. It would be a bad idea to tell the engine that it should give away all its advantage just to avoid a 50-move draw. If v is big, then v*90/212 is still a significant penalty that will make the engine accept a pawn move or a capture that reduces v, but not to 0.
If the halfmove clock is at 80 or 90 and you allow SF a few milliseconds to search, it will go the extra plies to 100 and then (if the position really is drawn) show a 0.0 eval. People are conflating the engine with the engine’s evaluation function.
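For the search side of that, a minimal sketch of the kind of check a search typically contains (identifiers are illustrative, not Stockfish's actual code):

Code: Select all

    constexpr int DRAW_SCORE = 0;

    struct Position {                 // stand-in for a real position class
        int halfmove_clock;
        int rule50_count() const { return halfmove_clock; }
    };

    int search(Position& pos, int alpha, int beta, int depth) {
        // 100 plies without a capture or pawn move is a draw on the spot
        // (a real engine must also handle a mate delivered on the final ply).
        if (pos.rule50_count() >= 100)
            return DRAW_SCORE;
        // ... normal alpha-beta search continues here ...
        return 0;                     // placeholder
    }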
syzygy
Posts: 5791
Joined: Tue Feb 28, 2012 11:56 pm

Re: Why don't engines consider the 50-move rule?

Post by syzygy »

chrisw wrote: Wed Nov 05, 2025 10:50 pm If the halfmove clock is at 80 or 90 and you allow SF a few milliseconds to search, it will go the extra plies to 100 and then (if the position really is drawn) show a 0.0 eval. People are conflating the engine with the engine’s evaluation function.
True. If Hamster really did not see anything change when he set the half-move clock in the FEN manually to 80 or 90, then maybe he could post the position here so we can have a look. Even if an engine's evaluation completely ignores the half-move clock, the clock should start to have an effect on the search.
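For reference, the halfmove clock is the fifth field of a FEN string, so it is easy to set by hand when testing (the position below is just an arbitrary example):

Code: Select all

    // FEN fields: pieces  side  castling  en-passant  halfmove  fullmove
    // The fifth field (here 90) is the halfmove clock: the number of
    // plies since the last capture or pawn move.
    4k3/8/8/8/8/8/4P3/4K3 w - - 90 120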
jaroslav.tavgen
Posts: 21
Joined: Fri Oct 25, 2019 2:51 pm
Full name: Jaroslav Tavgen

Re: Why don't engines consider the 50-move rule?

Post by jaroslav.tavgen »

Could you elaborate please? Why would the engine "throw away all the advantage to avoid a 50-move draw"? Do you mean that if it is +5 up it will sacrifice the knight for a pawn to make it +3 and not 0? If it is so, then why is it a bad thing?
syzygy
Posts: 5791
Joined: Tue Feb 28, 2012 11:56 pm

Re: Why don't engines consider the 50-move rule?

Post by syzygy »

jaroslav.tavgen wrote: Thu Nov 06, 2025 3:11 pm Could you elaborate please? Why would the engine "throw away all the advantage to avoid a 50-move draw"? Do you mean that if it is +5 up it will sacrifice the knight for a pawn to make it +3 and not 0? If it is so, then why is it a bad thing?
Well, in the end it will do this anyway once the search starts reporting a draw by the 50-move rule. So the (my) argument that going from +5 to +2.5 is better than going from +5 to 0, because the latter means allowing the engine to throw away all its advantage, does not quite work.

Perhaps the real risk is that the engine is not actually shuffling but slowly improving its position, and prematurely damping its evaluation towards 0 would just make it move too hastily.

Or perhaps SF's linear damping-down coefficient is so small because otherwise it would have too much impact on normal play. Something quadratic or piecewise linear (say starting from 60 ply) might intuitively be better. But without testing on a million games this is just unsupported speculation.
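To make those shapes concrete, here are the three candidate curves as code: the first matches the line quoted earlier, while the quadratic and the piecewise-linear-from-60-ply versions are the hypothetical, untested shapes just described:

Code: Select all

    // Three damping schedules mapping an eval v and a halfmove count
    // in 0..100 to a damped eval.

    int damp_linear(int v, int count) {      // current SF shape: ~v/2 left at 100
        return v - v * count / 212;
    }

    int damp_quadratic(int v, int count) {   // gentle early, steep late, 0 at 100
        return v - v * count * count / (100 * 100);
    }

    int damp_piecewise(int v, int count) {   // untouched until ply 60
        if (count <= 60)
            return v;
        return v - v * (count - 60) / 40;    // then linear down to 0 at 100
    }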
syzygy
Posts: 5791
Joined: Tue Feb 28, 2012 11:56 pm

Re: Why don't engines consider the 50-move rule?

Post by syzygy »

This is the patch ("Tune a few NNUE related scaling parameters") that changed it from going linearly to 0 (as pos.rule50() goes to 100) to going linearly to about half the evaluation:
https://github.com/official-stockfish/S ... 1dc7077195

So this was the result of parameter tuning. Later tunings seem to have left this result more or less intact (i.e. damping down to about v/2). So clearly this is better (in SF) than damping down linearly to 0.

Whether a linear function from 0 to 100 ply is optimal is another question.
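In formula terms, the change was roughly from the first line to the second (the patch arrived at 207; the constant in current master is 212):

Code: Select all

    v = v * (100 - pos.rule50_count()) / 100;  // before: reaches 0 at count 100
    v -= v * pos.rule50_count() / 212;         // after: ~0.53*v left at count 100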
chrisw
Posts: 4669
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Why don't engines consider the 50-move rule?

Post by chrisw »

syzygy wrote: Thu Nov 06, 2025 7:16 pm This is the patch ("Tune a few NNUE related scaling parameters") that changed it from going linearly to 0 (as pos.rule50() goes to 100) to going linearly to about half the evaluation:
https://github.com/official-stockfish/S ... 1dc7077195

So this was the result of parameter tuning. Later tunings seem to have left this result more or less intact (i.e. damping down to about v/2). So clearly this is better (in SF) than damping down linearly to 0.

Whether a linear function from 0 to 100 ply is optimal is another question.
Well, we can see (in the limiting case) that

v = v * (207 - pos.rule50_count()) / 207;

is wrong already. It's correct for count=0, but way off for count=100. It's a HandCodedHeuristic, basically, with a tuneable parameter (the 207) requiring unbelievable numbers of games to meaningfully tune, but which is always wrong or at least suboptimal. Probably hardly worth the bother, but I guess the function should at least try to get the limits correct and then interpolate somehow.
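One hypothetical way to do that, pinning v at count=0 and 0 at count=100 while damping only weakly over most of the range, is to interpolate with a power of the count (cubic here, purely for illustration):

Code: Select all

    // Hypothetical, untested: cubic interpolation hits both limits exactly
    // (full v at count=0, exactly 0 at count=100) while damping far less
    // than a straight line for intermediate counts.
    int damp_cubic(int v, int count) {
        long long c3 = 1LL * count * count * count;  // count^3
        return v - (int)(v * c3 / (100 * 100 * 100));
    }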
chrisw
Posts: 4669
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Why don't engines consider the 50-move rule?

Post by chrisw »

chrisw wrote: Thu Nov 06, 2025 8:23 pm
syzygy wrote: Thu Nov 06, 2025 7:16 pm This is the patch ("Tune a few NNUE related scaling parameters") that changed it from going linearly to 0 (as pos.rule50() goes to 100) to going linearly to about half the evaluation:
https://github.com/official-stockfish/S ... 1dc7077195

So this was the result of parameter tuning. Later tunings seem to have left this result more or less intact (i.e. damping down to about v/2). So clearly this is better (in SF) than damping down linearly to 0.

Whether a linear function from 0 to 100 ply is optimal is another question.
Well, we can see (in the limiting case) that

v = v * (207 - pos.rule50_count()) / 207;

is wrong already. It's correct for count=0, but way off for count=100. It's a HandCodedHeuristic, basically, with a tuneable parameter (the 207) requiring unbelievable numbers of games to meaningfully tune, but which is always wrong or at least suboptimal. Probably hardly worth the bother, but I guess the function should at least try to get the limits correct and then interpolate somehow.

The former:

v = v * (100 - pos.rule50_count()) / 100;

at least gets the limit cases correct. And since tuning found 207 better than 100, i.e. a weaker decay for most intermediate values of count, that suggests the decay should increase more than linearly with count. Which is pretty much what HandCodedIntuition would say anyway.

A sigmoid function fits the idea, methinks. Shall we wait while someone tunes it for SF?!
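For illustration, one way to write such a sigmoid-shaped scaling factor (midpoint and width are made-up values; as far as I know nobody has tuned this):

Code: Select all

    #include <cmath>

    // Hypothetical sigmoid-shaped damping factor: stays near 1.0 for small
    // counts, falls fastest around the midpoint, and is rescaled so that it
    // is exactly 1 at count=0 and exactly 0 at count=100.
    double scale_sigmoid(int count, double mid = 80.0, double width = 10.0) {
        auto s = [&](double x) { return 1.0 / (1.0 + std::exp((x - mid) / width)); };
        return (s(count) - s(100.0)) / (s(0.0) - s(100.0));
    }

The evaluation would then be damped as v = (int)(v * scale_sigmoid(pos.rule50_count())).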
syzygy
Posts: 5791
Joined: Tue Feb 28, 2012 11:56 pm

Re: Why don't engines consider the 50-move rule?

Post by syzygy »

chrisw wrote: Thu Nov 06, 2025 8:28 pm Well, we can see (in the limiting case) that

v = v * (207 - pos.rule50_count()) / 207;

is wrong already. It's correct for count=0, but way off for count=100. It's a HandCodedHeuristic, basically, with a tuneable parameter (the 207) requiring unbelievable numbers of games to meaningfully tune, but which is always wrong or at least suboptimal. Probably hardly worth the bother, but I guess the function should at least try to get the limits correct and then interpolate somehow.
But 207 was the result of automated parameter tuning, and further tuning exercises over a period of several years have consistently confirmed that a linear decrease from v to about v/2 is optimal, at least among the possible linear functions. So I would not call 100 "right" and 207 "wrong".

But indeed this could be mainly about the behaviour for low values of rule50_count() and not for values near 100.
chrisw
Posts: 4669
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Why don't engines consider the 50-move rule?

Post by chrisw »

syzygy wrote: Thu Nov 06, 2025 11:21 pm
chrisw wrote: Thu Nov 06, 2025 8:28 pm Well, we can see (in the limiting case) that

v = v * (207 - pos.rule50_count()) / 207;

is wrong already. It's correct for count=0, but way off for count=100. It's a HandCodedHeuristic, basically, with a tuneable parameter (the 207) requiring unbelievable numbers of games to meaningfully tune, but which is always wrong or at least suboptimal. Probably hardly worth the bother, but I guess the function should at least try to get the limits correct and then interpolate somehow.
But 207 was the result of automated parameter tuning, and further tuning exercises over a period of several years have consistently confirmed that a linear decrease from v to about v/2 is optimal, at least among the possible linear functions. So I would not call 100 "right" and 207 "wrong".

But indeed this could be mainly about the behaviour for low values of rule50_count() and not for values near 100.
One imagines the parameter tuning was some sort of SPSA on the 100/207 parameter, but as soon as you do that you render the limit at count=100 incorrect. The tuning "worked" presumably because it had the effect of decreasing the effective decay at much lower counts. Sorry if the explanation is a bit convoluted, but you get the idea (as indicated by your closing sentence).

Needs a sigmoid, or an approximation to sigmoid. Something like

Case count below 60:
decay = 1.0 to 0.95

Case count between 60 and 90:
decay = 0.95 to 0.5

Case count above 90:
decay = 0.5 to 0.0

That would make two decay breakpoint targets (60, 90)
and two decay factor targets (0.95, 0.5),
which should be doable with a tuner.
Or better, compose a suitable sigmoid and tune that.
Maybe this has been done already?

Edit: I realise I'm describing an accelerating decay factor, whereas a sigmoid would decelerate at high counts. Dunno which would be best; it just needs to bottom out at 0.0.
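For what it's worth, the three-segment schedule above written out directly (breakpoints and factors exactly as proposed, untuned):

Code: Select all

    // The proposed schedule: breakpoints (60, 90) and factors (0.95, 0.5)
    // are the tuning targets named above, not tested values.
    double scale_piecewise(int count) {
        if (count <= 60)
            return 1.0 - 0.05 * count / 60.0;          // 1.0  -> 0.95
        if (count <= 90)
            return 0.95 - 0.45 * (count - 60) / 30.0;  // 0.95 -> 0.50
        return 0.5 - 0.5 * (count - 90) / 10.0;        // 0.50 -> 0.00
    }

As the edit notes, this curve accelerates its decay as the count rises, whereas a true sigmoid would flatten out again near 100; either way the factor must reach 0.0 there.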