SF misevaluating pawn endings

Spliffjiffer
Posts: 436
Joined: Thu Aug 02, 2012 7:48 pm
Location: Germany

Re: SF misevaluating pawn endings

Post by Spliffjiffer »

I could use 5-men TBs, they are installed, but I wanted to show that something is wrong with how SF evaluates some pawn endings without the help of third-party software... Komodo, for example, has no problems with the 2 positions I posted, with or without TB access, but that's another story of course.
Truths are illusions which we have forgotten are illusions.
Spliffjiffer
Posts: 436
Joined: Thu Aug 02, 2012 7:48 pm
Location: Germany

Re: SF misevaluating pawn endings

Post by Spliffjiffer »

Well, "wrong" is a harsh word... I'm sorry about that, excuse me... I should have said that there is room for improvement in how SF evaluates pawn endings... full credit to the SF team!
yanquis1972
Posts: 1766
Joined: Wed Jun 03, 2009 12:14 am

Re: SF misevaluating pawn endings

Post by yanquis1972 »

MikeB wrote:
yanquis1972 wrote:
Spliffjiffer wrote: you both tested under different conditions than I did:

jouni used multipv and too little time for pos 2... SF will change from Nxe3 to Nd5 later on, misevaluating the pawn ending

michael used TBs
Confirmed: with no TBs and single PV, the latest SF likes Kd4, very weird.
Actually I'm not seeing that with multiple threads: single PV, no TBs, it finds and sticks with f4. And yes, I do see Kd4 in single-thread mode.
So probably the wider search from multiple cores led to a 'lucky' hit, but as Joerg says the issue is still there.

Of course the big question is: is it actually an issue? Yes, 5-men Syzygy tables are cheap, but should engines be absolutely reliant on them? I'm sure some would say yes, if it means speed/Elo gains; personally, perhaps even especially when it's an engine as widely used as SF, I think it's a problem that should be rectified.
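To put the "5 men" remark in context, it can help to count the men in the positions under discussion. The sketch below is only an illustration: `countMen` and `coveredByTablebases` are hypothetical helper names, not part of any engine, and the code looks at nothing but the piece-placement field of a FEN string.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Count all men (kings, pieces and pawns) in a FEN string.
// Only the first field (piece placement) is inspected.
int countMen(const std::string& fen) {
    assert(!fen.empty());
    int men = 0;
    for (char c : fen) {
        if (c == ' ')
            break;  // end of the piece-placement field
        if (std::isalpha(static_cast<unsigned char>(c)))
            ++men;  // every letter is one man; digits and '/' are not
    }
    return men;
}

// True if a position with this many men falls inside N-men tablebases.
bool coveredByTablebases(const std::string& fen, int tbMen = 5) {
    return countMen(fen) <= tbMen;
}
```

For the two positions from the first post, `countMen` gives 8 and 12, so neither root position is inside 5-men tables. Tablebases can only contribute deeper in the search, once enough material has been traded off.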
peter
Posts: 3393
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: SF misevaluating pawn endings

Post by peter »

Hi Peer!
Spliffjiffer wrote:2 simple pawn endings SF has problems with...

Engine: asmfish 04.11.2016; CPU: amd a10 7870k (quad core); TB's: None

[d]8/1p4p1/5k2/5p2/8/3K3P/5PP1/8 w - - 0 1
1.f4...only drawing move...after a 15 min search asmfish fails to find this, scoring 1.Kd4 as best with -4.49 at depth 48.
After forcing 1.f4 asmfish fails high with 1...b5, giving a score of over +18 for black at low depths...then quickly resolves to 0.00.



[d]8/6pk/7p/8/8/2npPP1P/1RnK2P1/8 b - - 0 1
1...Nxe3...only drawing move...after a 15 min search asmfish fails to find this, scoring 1...Nd5 as best with -2.47 at depth 43.

After 1...Nxe3 2.Kxc3 Nd1 3.Kb3 [here asmfish stubbornly tries to avoid the pawn endgame with 3...Ne3?] Nxb2 4.Kxb2 Kg6 5.Kc3 Kf5 6.g3 h5! (the only move, which seems to be the one that makes asmfish misevaluate the whole line) it's a draw... With 6.g3 SF fails high, scoring the position +41.34 for White at lower depths until it evaluates the position as +0.08 after around 20 sec.

regards
Very good findings of yours, thanks!
While the first one seems to be a real half-blind spot in SF's search, one that sometimes does and sometimes doesn't show up with SMP, the second one might be a more difficult position: without the help of TBs, K10.2 and Fire5 succeeded only in single trials. SF and H5 failed at short TC.
Edit: In the meantime I had a single success with SF too. 1...Nxe3 shows up in the output now and then right away but soon gets dropped for ...Nd5. This time the right move was kept, but I couldn't reproduce it after restarting the GUI,
Peter.
peter
Posts: 3393
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: SF misevaluating pawn endings

Post by peter »

peter wrote:
...
the second one might be a more difficult position: without the help of TBs, K10.2 and Fire5 succeeded only in single trials. SF and H5 failed at short TC.
Edit: In the meantime I had a single success with SF too. 1...Nxe3 shows up in the output now and then right away but soon gets dropped for ...Nd5. This time the right move was kept, but I couldn't reproduce it after restarting the GUI,
And Peer has already spotted the reason for finding or not finding 1...Nxe3 quite well in his drawing line after 1...Nxe3: 3...Nxb2 has to be found.

1...Nxe3 2.Kxc3 Nd1+ 3.Kb3:

[d]8/6pk/7p/8/8/1K1p1P1P/1R4P1/3n4 b - - 0 1

Analysis by Brainfish 031216 64 POPCNT:
...
3...Ne3 4.Rd2 Nf5 5.Kc3 Ng3 6.Kxd3 h5 7.Kd4 Nf1 8.Rf2 Ng3 9.Ke5 Kg6 10.Rd2 Nf1 11.Rd6+ Kf7 12.Kf4 h4 13.Kg5 Ne3 14.Rd7+ Ke6 15.Rd4 Kf7 16.Rxh4 Nd5 17.Rf4+ Kg8 18.Rd4 Ne7 19.Rd7 Nc6 20.Rc7 Ne5 21.Kf5 Nd3 22.Rd7 Nc5 23.Rd5 Na4 24.Kg6 Kf8 25.Rh5 Nc3 26.Rh8+ Ke7 27.Rh4 Ke6 28.Rf4 Nb1 29.Re4+ Kd5 30.Kxg7 Kd6 31.h4 Na3 32.h5 Nb1 33.h6
+- (12.45) Depth: 39/69 00:03:39 7733MN
...
3...Ne3 4.Rd2
+- (11.87 --) Depth: 40/69 00:03:42 7815MN

And yes, TBs again help to find the simple-looking drawing move 3...Nxb2, but without them I'd still call that at least a half-blind spot too,
Peter.
Eelco de Groot
Posts: 4660
Joined: Sun Mar 12, 2006 2:40 am
Full name: Eelco de Groot

Re: SF misevaluating pawn endings

Post by Eelco de Groot »

Joerg posted on https://github.com/official-stockfish/S ... issues/760
@Stefano80 @Kingdefender But will this also be helpful in positions with more pawns like the one mentioned above? I am not sure, maybe I will give it a try tomorrow. It should be easy enough to try.
I'd rather reply here now because I think Rocky's issue is getting a bit sidetracked. This is a very different position. I don't think treating KPk bitbase evals differently will help a lot here. Your suggestion for changing LMP will, but unfortunately you lose some speed, I think. Stefano's suggestion is a more logical change because you make better use of the exact results in the KPk bitbase. I guess that in positions where you hit KPk it should even be a little speedup, because you no longer need to search a drawn position (only at the root, to get a move), but not so much that Marco will be satisfied by that. It should leave LMR intact though.

I don't know how much extra code is involved... If your point of view is that Syzygy bases should solve this, I don't think we have much chance. It depends on whether it can be done cleanly.

By the way, for Peer's position 1 I have an alternative suggestion that even lowers the bench number :) Old bench with Vondele's latest refactor patch:

bench: 5884767

New bench with a modified latest development Stockfish:

Stockfish 20161205 MOD _02
===========================
Total time (ms) : 3112
Nodes searched : 5879059
Nodes/second : 1889157
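
As a quick check of the bench figures above: the third line appears to follow from the first two as nodes times 1000 divided by elapsed milliseconds, truncated to an integer. The helper below is a minimal sketch under that assumption; `nodesPerSecond` is a hypothetical name, not a Stockfish function.

```cpp
#include <cassert>
#include <cstdint>

// Nodes per second from a node count and elapsed milliseconds.
// 64-bit arithmetic is used because 1000 * nodes exceeds the 32-bit
// range for node counts of this size.
std::uint64_t nodesPerSecond(std::uint64_t nodes, std::uint64_t elapsedMs) {
    assert(elapsedMs > 0);            // avoid division by zero
    return 1000 * nodes / elapsedMs;  // integer division truncates
}
```

With the numbers posted, `nodesPerSecond(5879059, 3112)` reproduces the reported 1889157, and the drop in the bench number is 5884767 - 5879059 = 5708 nodes.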

      // Step 13. Pruning at shallow depth
      if (   !rootNode
          && !(pos.count<ALL_PIECES>(WHITE) + pos.count<ALL_PIECES>(BLACK) <= 5)  // New condition: no shallow pruning with 5 or fewer men on the board
       /* && !(PvNode && !pos.non_pawn_material(pos.side_to_move())) */
          &&  bestValue > VALUE_MATED_IN_MAX_PLY)
      {
This will always return something like the following:


8/1p4p1/5k2/5p2/8/3K3P/5PP1/8 w - -

Engine: Stockfish 20161205 MOD _02 HT (7 threads, i7 6700, 512 MB)
by T. Romstad, M. Costalba, J. Kiiski, G. Linscott

30/55 0:01 0.00 1.f4 b5 2.Kd4 Ke6 3.g4 Kf7 4.Kc5 g5
5.Kxb5 gxf4 6.Kc4 fxg4 7.hxg4 Kf6
8.Kd3 Kg5 9.Ke2 Kxg4 10.Kf2 f3
11.Kf1 f2 12.Kxf2 Kf4 13.Ke2 Ke4
14.Kf2 (28.197.992) 19857

31/55 0:01 0.00 1.f4 b5 2.Kd4 Ke6 3.g4 Kf7 4.Kc5 g5
5.Kxb5 gxf4 6.Kc4 fxg4 7.hxg4 Kf6
8.Kd3 Kg5 9.Ke2 Kxg4 10.Kf2 f3
11.Kf1 f2 12.Kxf2 Kf4 13.Ke2 Ke4
14.Kf2 (33.953.160) 20114

32/55 0:01 0.00 1.f4 b5 2.Kd4 Ke6 3.g4 Kf7 4.Kc5 g5
5.Kxb5 gxf4 6.Kc4 fxg4 7.hxg4 Kf6
8.Kd3 Kg5 9.Ke2 Kxg4 10.Kf2 f3
11.Kf1 f2 12.Kxf2 Kf4 13.Ke2 Ke4
14.Kf2 (40.140.731) 20222

33/55 0:02 0.00 1.f4 b5 2.Kd4 Ke6 3.g4 Kf7 4.Kc5 g5
5.Kxb5 gxf4 6.Kc4 fxg4 7.hxg4 Kf6
8.Kd3 Kg5 9.Ke2 Kxg4 10.Kf2 f3
11.Kf1 f2 12.Kxf2 Kf4 13.Ke2 Ke4
14.Kf2 (57.295.204) 20706

34/55 0:03 0.00 1.f4 b5 2.Kd4 Ke6 3.g4 Kf7 4.Kc5 g5
5.Kxb5 gxf4 6.Kc4 fxg4 7.hxg4 Kf6
8.Kd3 Kg5 9.Ke2 Kxg4 10.Kf2 f3
11.Kf1 Kg3 12.Kg1 f2+ 13.Kf1 Kh2
14.Kxf2 (80.122.010) 21286

35/55 0:05 0.00 1.f4 b5 2.Kd4 Ke6 3.g4 Kf7 4.Kc5 g5
5.Kxb5 gxf4 6.Kc4 fxg4 7.hxg4 Kf6
8.Kd3 Kg5 9.Ke2 Kxg4 10.Kf2 f3
11.Kf1 Kg3 12.Kg1 f2+ 13.Kf1 Kh2
14.Kxf2 (113.809.223) 22407

36/55 0:09 0.00 1.f4 b5 2.Kd4 Ke6 3.g4 Kf7 4.Kc5 g5
5.Kxb5 gxf4 6.Kc4 fxg4 7.hxg4 Kf6
8.Kd3 Kg5 9.Ke2 Kxg4 10.Kf2 f3
11.Kf1 Kg3 12.Kg1 f2+ 13.Kf1 Kh2
14.Kxf2 (234.291.665) 24842

37/55 0:10 0.00 1.f4 b5 2.Kd4 Ke6 3.g4 Kf7 4.Kc5 g5
5.Kxb5 gxf4 6.Kc4 fxg4 7.hxg4 Kf6
8.Kd3 Kg5 9.Ke2 Kxg4 10.Kf2 f3
11.Kf1 Kg3 12.Kg1 f2+ 13.Kf1 Kh2
14.Kxf2 (253.566.325) 24964

38/55 0:12 0.00 1.f4 b5 2.Kd4 Ke6 3.g4 Kf7 4.Kc5 g5
5.Kxb5 gxf4 6.Kc4 fxg4 7.hxg4 Kf6
8.Kd3 Kg5 9.Ke2 Kxg4 10.Kf2 f3
11.Kf1 Kg3 12.Kg1 f2+ 13.Kf1 Kh2
14.Kxf2 (304.774.696) 25387

39/55 0:19 0.00 1.f4 b5 2.Kd4 Ke6 3.g4 Kf7 4.Kc5 g5
5.Kxb5 gxf4 6.Kc4 fxg4 7.hxg4 Kf6
8.Kd3 Kg5 9.Ke2 Kxg4 10.Kf2 f3
11.Kf1 Kg3 12.Kg1 f2+ 13.Kf1 Kh2
14.Kxf2 (516.614.250) 27164

40/55 1:34 0.00 1.f4 b5 2.Kd4 Ke6 3.g4 Kf7 4.Kc5 g5
5.Kxb5 gxf4 6.Kc4 fxg4 7.hxg4 Kf6
8.Kd3 Kg5 9.Ke2 Kxg4 10.Kf2 f3
11.Kf1 Kg3 12.Kg1 f2+ 13.Kf1 Kh2
14.Kxf2 (2.782.650.592) 29573

41/55 4:04 0.00 1.f4 b5 2.Kd4 Ke6 3.g4 Kf7 4.Kc5 g5
5.Kxb5 gxf4 6.Kc4 fxg4 7.hxg4 Kf6
8.Kd3 Kg5 9.Ke2 Kxg4 10.Kf2 f3
11.Kf1 Kg3 12.Kg1 f2+ 13.Kf1 Kh2
14.Kxf2 (7.373.300.745) 30140

42/55 4:09 0.00 1.f4 b5 2.Kd4 Ke6 3.g4 Kf7 4.Kc5 g5
5.Kxb5 gxf4 6.Kc4 fxg4 7.hxg4 Kf6
8.Kd3 Kg5 9.Ke2 Kxg4 10.Kf2 f3
11.Kf1 Kg3 12.Kg1 f2+ 13.Kf1 Kh2
14.Kxf2 (7.490.207.986) 30054

43/55 4:31 0.00 1.f4 b5 2.Kd4 Ke6 3.g4 Kf7 4.Kc5 g5
5.Kxb5 gxf4 6.Kc4 fxg4 7.hxg4 Kf6
8.Kd3 Kg5 9.Ke2 Kxg4 10.Kf2 f3
11.Kf1 Kg3 12.Kg1 f2+ 13.Kf1 Kh2
14.Kxf2 (8.122.271.067) 29967
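
The guard from the patch above can also be looked at in isolation. The following is a minimal sketch, not Stockfish's actual code: `MenCount`, `shallowPruningAllowed` and `kValueMatedInMaxPly` are hypothetical stand-ins for `pos.count<ALL_PIECES>(...)`, the `if` condition and `VALUE_MATED_IN_MAX_PLY`, and the numeric value of the constant is only a placeholder.

```cpp
#include <cassert>

// Hypothetical stand-ins; neither is Stockfish's real type or value.
struct MenCount { int white; int black; };
constexpr int kValueMatedInMaxPly = -31744;  // placeholder for VALUE_MATED_IN_MAX_PLY

// Mirrors the patched guard: shallow-depth pruning is skipped at the root,
// in positions with five or fewer men, and when the side is already being mated.
bool shallowPruningAllowed(bool rootNode, MenCount men, int bestValue) {
    assert(men.white >= 1 && men.black >= 1);        // each side has at least a king
    const bool fewMen = men.white + men.black <= 5;  // the new condition
    return !rootNode && !fewMen && bestValue > kValueMatedInMaxPly;
}
```

Position 1 has 4 + 4 men, so at non-root nodes near the start of the line the guard still allows pruning. It only switches pruning off once exchanges have brought the total down to five or fewer, which is where the pawn-ending details get decided.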
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan