Horizon effect

hgm · Post by **hgm** » Thu Apr 27, 2017 7:42 pm

Imagine the following situation (which is frequently plaguing me in Tenjiku Shoki):

The search finds a PV that gains significant material, upto depth N. At depth N+1 the same move suddenly gives a very bad score, with a completely different continuation, because the material it gained along the PV at depth N turns out to be poisoned (or there was a long-term threat that only comes into view now, and needs immediate action to save the situation). Now what it then usually does is switch to a move that sacs material, but cannot be ignored. Like a spite check, or promotion on an unsafe square. And then appends the d=N PV (or actually the d=N-1 PV) behind it, because after the sac and the recapture, there isn't enough depth to see that the large gain by it is totally illusory. This is the classical horizon effect.

It seems quite cheap to recognize this situation in the root, as the PVs at any depth are available, and you just have to test for the score drop, and then the occurrence of the PV of the previous iteration as a tail to a bad capture and its refutation. In internal PV nodes it is a bit more difficult, because you might not have the PV of the previous iteration available. (But you would have if you hash the entire PV.)

The question is what you should do about it. Most of the time the sac to delay the inevitable is completely wrong, just adding to whatever misery you were heading for. But this would correct itself only at a two-ply-deeper search. Occasionally it could be a useful enhancement of the same plan, however.

Would it be useful to prune bad captures after a collapse of the PV score? Extending them by two ply doesn't really seem an option, as this would be very expensive, and all that extra time would be used to find out that you don't want to play the move. Say your score dropped from +6 at d=N to -3 at d=N+1, then in is to be expected that a Q x protected R sac, which loses you 4 in the first 2 ply, would still report +2 at the d=N+1 (but would make you end up at -7 eventually, seen at d=N+3). You just cannot trust the score of such a sac for the next two ply. Making it the new PV is very disrupting to the search, as eventually you will have to switch to the original PV without any silly sacrifices prefixed to it. This takes a lot of time at the higher depth, searching the nonsense variation first, which furthermore will experience a drastic score change. And if you time out before you are there, you would even play the sac!

For now I am just thinking of ignoring the moves 'after the fact', based on the observation that the tail of their PV mimics that of a previous one known to collapse at higher depth. This still would require you to search them (overturning the collapsed PV, so typically an expensive search). It would be really nice if you could prune them beforehand. Then it would even save you time. But it would backfire in cases where the prefixed move offers a true solution / amelioration. This could be obvious if the move was tactically connected to the PV that later collapsed, e.g. by capturing one of the pieces that was moved in there, or moving away one of the pieces that was captured in there. Searching such a sac could not possibly get the same PV as tail to the sac. But other interference with that PV is conceivable (e.g. the sac discovered a protection of one of the pieces that was captured in the failed PV). So it would probably be very hard to find a simple test that is accurate enough to afford skipping the search of the sacs altogether.

cdani · Post by **cdani** » Thu Apr 27, 2017 8:48 pm

Maybe is possible to have an estimation of the cost of the current pv you are searching, and if it goes wrong, at the precise point you find something went wrong you can decide to spend some time proportional to the cost, like extending 2, 3 plies or whatever to have a better estimation about if the fail is true. I'm not sure if this is really applicable, is just a very raw idea.

hgm · Post by **hgm** » Thu Apr 27, 2017 9:36 pm

Well, the typical case is that the fail is real, and that the deeper you search that line, the worse it gets. Usually it runs into an unavoidable mate, and just before it it starts to sac all its material in spite checks, making the score plummet. So spending extra time there would not change anything. The problem is that the alternative it picks does seem good, although it is predictably worse.

Spending extra time to find out something is good is time well spent. But spending extra time only to learn that the move sucks is truly a waste of time.

Evert · Post by **Evert** » Thu Apr 27, 2017 10:06 pm

Maybe there's some form of singular extension that works here? Or alternatively, something to be learned from the threat (the move/line that defets the null-move)?

Ferdy · Post by **Ferdy** » Fri Apr 28, 2017 7:07 am

hgm wrote:Imagine the following situation (which is frequently plaguing me in Tenjiku Shoki):

This is big (16x16 board + 78 pcs), what typical depth you will reach from starting position after 1 minute?

hgm · Post by **hgm** » Fri Apr 28, 2017 8:23 am

About 11 ply, with material-only eval. The following is from a 3.4GHz i7, where 4 other threads were already running. It could be 1.3 times faster when it runs alone (~350knps instead of 266knps)

Code: Select all

 11	+3.46	42.6M	2&#58;41.21	<done>
 11	+3.46	42.5M	2&#58;41.11	j5j6 l13n11 k4i6 m12m11 j3k4 d12d11 k4k7 i13i6 k7i7! j13j4+
 11	0.00	16.3M	1&#58;01.16	c1b2 e11e10 d4d6 j13j4? k2l2 j14j13 g4g13? g14g13 j3k2 j13j14 k2j4
 10	0.00	8.3M	0&#58;31.68	<done>
 10	0.00	7.8M	0&#58;29.67	c1b2 e11e10 d1e2 j13j4? k2l2 j14j13 g4g13? g14g13 j3k2 j13j14
  9	0.00	3.4M	0&#58;13.65	<done>
  9	0.00	2.7M	0&#58;10.70	c1b2 j13j4? k2l2 e11e10 g4g13? g14g13 j3k2 j14j13 k2j4
  9	-4.97	1.0M	0&#58;04.35	g4g13? g12g11 e4c6 g14g12 m4m6 g12b7! m6f13+ b7g12! n3o2 g12a6!
  8	0.00	132265	0&#58;00.67	g4g13? f12f11 c1b2 j13j4? j3j4 j14j13 j4j3 g14g11
  7	0.00	44778	0&#58;00.31	g4g13? f12f11 c1b2 j13j4? j3j4 j14j13 j4j3
  6	0.00	12618	0&#58;00.09	g4g13? f12f11 g3g4 j13j4? j3j4 j14j13
  5	0.00	7885	0&#58;00.07	g4g13? f12f11 g3g4 j13j4? j3j4
  4	0.00	3591	0&#58;00.04	g4g13? f12f11 j4j13? j14j13
  3	0.00	517	0&#58;00.03	g4g13? j13j4?
  3	-14.99	377	0&#58;00.03	j4j13? i13j13 g4g13? j13j3 k2j3
  2	0.00	4	0&#58;00.01	j4j13? g13g4?
  1	0.00	2	0&#58;00.00	j4j13?

The strongest opening move is indeed j5j6, and it settles on that within 3 min.

Ferdy · Post by **Ferdy** » Fri Apr 28, 2017 1:41 pm

hgm wrote:About 11 ply, with material-only eval. The following is from a 3.4GHz i7, where 4 other threads were already running. It could be 1.3 times faster when it runs alone (~350knps instead of 266knps)

Code: Select all

 11	+3.46	42.6M	2&#58;41.21	<done>
 11	+3.46	42.5M	2&#58;41.11	j5j6 l13n11 k4i6 m12m11 j3k4 d12d11 k4k7 i13i6 k7i7! j13j4+
 11	0.00	16.3M	1&#58;01.16	c1b2 e11e10 d4d6 j13j4? k2l2 j14j13 g4g13? g14g13 j3k2 j13j14 k2j4
 10	0.00	8.3M	0&#58;31.68	<done>
 10	0.00	7.8M	0&#58;29.67	c1b2 e11e10 d1e2 j13j4? k2l2 j14j13 g4g13? g14g13 j3k2 j13j14
  9	0.00	3.4M	0&#58;13.65	<done>
  9	0.00	2.7M	0&#58;10.70	c1b2 j13j4? k2l2 e11e10 g4g13? g14g13 j3k2 j14j13 k2j4
  9	-4.97	1.0M	0&#58;04.35	g4g13? g12g11 e4c6 g14g12 m4m6 g12b7! m6f13+ b7g12! n3o2 g12a6!
  8	0.00	132265	0&#58;00.67	g4g13? f12f11 c1b2 j13j4? j3j4 j14j13 j4j3 g14g11
  7	0.00	44778	0&#58;00.31	g4g13? f12f11 c1b2 j13j4? j3j4 j14j13 j4j3
  6	0.00	12618	0&#58;00.09	g4g13? f12f11 g3g4 j13j4? j3j4 j14j13
  5	0.00	7885	0&#58;00.07	g4g13? f12f11 g3g4 j13j4? j3j4
  4	0.00	3591	0&#58;00.04	g4g13? f12f11 j4j13? j14j13
  3	0.00	517	0&#58;00.03	g4g13? j13j4?
  3	-14.99	377	0&#58;00.03	j4j13? i13j13 g4g13? j13j3 k2j3
  2	0.00	4	0&#58;00.01	j4j13? g13g4?
  1	0.00	2	0&#58;00.00	j4j13?

The strongest opening move is indeed j5j6, and it settles on that within 3 min.

Nice depth considering a high number of moves per turn. In chess at low depth you can improve its play by improving evaluation. But in here it seems like you will be busy generating and evaluating captures. The king safety seems to be very important with the presence of poweful pieces. Perhaps you have to start writing king safety evaluation and observe if that horizon effect will be affected.

hgm · Post by **hgm** » Fri Apr 28, 2017 3:39 pm

Indeed, the fact that this is still material only exacerbates the horizon effect. It will not be easy to find sufficiently good evaluation terms, though, like in Chess it is not easy to find terms for trapped pieces.

The King is behind an enormous wall of strong pieces, (5 ranks deep), but because there are sliders that can jump over almost everything, they are still badly threatened through smothered mates. To create some breating space requires only a single tempo, but a tempo in the opening is worth more than a Queen.

Still, I have the feeling something fundamental is lacking in the search here. For a human it is so evident to see that things like spite checks are just adding to the eventual loss you will incur. The larger board also makes it much more likely that events are unrelated, so that initiating a random bad trade that necessitates recapture has any effect on the original (failing) PV. And that you have sliders that can jump over everything straight into the promotion zone, where they get a huge promotion and threaten their entire surroundings, makes that you virtually always have such delaying tactics available.

Perhaps some form of 'analogy probing' would be helpful. Every bad capture in the root could set up an analogy (as a hash-key difference) between the position and the root, and the leaf nodes in the sub-tree can then do a hash probe on the analogous position where the bad capture has not been done. This would then typically be a d=2 node. You could try to graft that node on the current leaf, by adding the material lost in the initial trade to the bound found in the hash table, which would likely be a much better estimate of the score than the static evaluation.

Ferdy · Post by **Ferdy** » Fri Apr 28, 2017 4:51 pm

Have you made a list of what the program is missing? like it failed to evaluate correctly because of lack of depth that it was not able to evaluate all the threats that are still to be resolved because of too many captures? or maybe it happily gained material at lower depths but failed to see the incoming deadly king attack.

hgm · Post by **hgm** » Fri Apr 28, 2017 5:20 pm

The latter is usually the case. And it is very common: A Fire Demon of player A is lured away to the wings, where it can burn away an enormous amount of material. But that allows the Demon of player B to develop and attack the center, destroying A's jumping generals there in a sacrifice. (Nominally the generals are worth less than the Demon.) Then the jumping generals of player B get a free view on the suffocated King of A, as all remaining pieces of A are transparent to them. It takes quite some moves to perform the mate, and the plan can only be set into motion after Demon A penetrates the wings deeply enough that it isn't able to quickly move back to the center.

I guess this could be described by an eval term that increases the value of the jumping generals that can move along a file if the opponent does not have the equivalent one (which could block it), and his King doesn't have any escape squares on a different file. Or, as a precursor, when they both do have the generals, but a Demon can burn them away.

The problem is that it is a tactical matter whether the Demon can actually burn the generals. In this sense it is comparable to a 7th-rank Pawn in Chess: if they promote you have a Queen, but if they can be stopped and consumed you lose a valuable passer. It can go one way or the other, and in both cases cause a huge score swing. While it is very hard to find a heuristic that estimates whether the Pawn will make it or not.

Horizon effect

Horizon effect

Re: Horizon effect

Re: Horizon effect

Re: Horizon effect

Re: Horizon effect

Re: Horizon effect

Re: Horizon effect

Re: Horizon effect

Re: Horizon effect

Re: Horizon effect