Positions that all engines badly misevaluate

Post by **Warp** » Fri Jul 29, 2016 6:13 pm

Consider the following position:

r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - - 0 1

Anyone can quickly see that this is a complete impasse, a dead draw. It's completely trivial for White to force a draw: just move the king and that's it.

However, it seems that no engine can see this. Every single engine I have tried (about 20 of them) evaluates this as a clear win for Black, typically in the range -10 to -11. The engine that gets it least wrong seems to be Texel, at -3.23; basically all other engines evaluate it at -10 or beyond.

Just for the fun of it, I told Stockfish "You think Black can win this? Well, prove it!" and had it play against itself. Funnily, around move 48 SF even tried to dodge the 50-move rule by sacrificing a rook on e5, but playing White it didn't take the bait and just moved the king. Of course, the game ended in a draw by the 50-move rule.

Why do all engines misevaluate this so badly? Are there other similar positions?

Post by **Guenther** » Fri Jul 29, 2016 6:25 pm

Warp wrote: Consider the following position:

r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - - 0 1

There are several programs that have blocked-position recognition (at least for the 'easy', safe types). Patzer is one of them and was the first.

Deep Patzer 3.80:

exclude: none best +tail
dep  score    nodes     time  (not shown: tbhits knps seldep)
 22   0.00    17.9M  0:16.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
 21   0.00    10.9M  0:09.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
 20   0.00    6.55M  0:05.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
 19   0.00    3.94M  0:03.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
 18   0.00    2.66M  0:02.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
 17   0.00    2.05M  0:01.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
 16   0.00    1.62M  0:01.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
 15   0.00    1.25M  0:00.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
 14   0.00   955853  0:00.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
 13   0.00   651518  0:00.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
 12   0.00   454519  0:00.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
 11   0.00   246948  0:00.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
 10   0.00   156501  0:00.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
  9   0.00    60974  0:00.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
  8   0.00    35683  0:00.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
  7   0.00     9008  0:00.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
  6   0.00     4991  0:00.00  Kd2 O-O Ke1 Rb8 Kd2 Ra8
  5   0.00      792  0:00.00  Kd2 O-O Ke1 Rb8
  4   0.00      431  0:00.00  Kd2 O-O Ke1 Rb8
  3   0.00       52  0:00.00  Kd2 O-O
  2   0.00       25  0:00.00  Kd2 O-O
  1   0.00        3  0:00.00  Kd2
  0  #
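Patzer's own recognizer isn't shown in this thread, so the following is not its method. As a hypothetical first filter for the 'easy' cases, one can at least check that no pawn can move at all: every pawn is blocked head-on by an enemy pawn, and no two opposing pawns touch diagonally. A minimal sketch, assuming only pawn squares are passed in; the (file, rank) encoding and the helper names `squares` and `pawns_frozen` are illustrative assumptions.

```python
from typing import Set, Tuple

# A square is (file, rank): file 0-7 for a-h, rank 1-8 (assumed encoding).
Square = Tuple[int, int]


def squares(*names: str) -> Set[Square]:
    """Convert algebraic names like 'e4' into (file, rank) tuples."""
    return {(ord(n[0]) - ord("a"), int(n[1])) for n in names}


def pawns_frozen(white: Set[Square], black: Set[Square]) -> bool:
    """Pawn-only filter: True if no pawn can push or capture another pawn.

    Pieces are ignored entirely, so True is a necessary condition for a
    'wall' draw, not a proof of one.
    """
    for f, r in white:
        # Each white pawn must be blocked head-on by a black pawn...
        if (f, r + 1) not in black:
            return False
        # ...and must not touch a black pawn diagonally. These are exactly the
        # pairs Black would capture on too, so this rules out pawn captures
        # for both sides.
        if (f - 1, r + 1) in black or (f + 1, r + 1) in black:
            return False
    # Each black pawn must likewise be blocked head-on by a white pawn.
    return all((f, r - 1) in white for f, r in black)


white = squares("a3", "c3", "e3", "g3", "b4", "d4", "f4", "h4")
black = squares("a4", "c4", "e4", "g4", "b5", "d5", "f5", "h5")
print(pawns_frozen(white, black))  # True
```

A True result is only where a real recognizer would start: it would still have to establish that neither side's pieces can get through a gap in the wall or profitably offer themselves on a square a pawn can capture (like the rook on e5 that Stockfish tried).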

Edit:

I wonder why the castling rights seem to come back after pasting that FEN into WB (WinBoard), in both analysis and game mode.

Guenther

Post by **Dann Corbit** » Fri Jul 29, 2016 6:43 pm

Crafty with draw detection will instantly see it as a draw (the last version that could be compiled with this feature is Crafty 20.0).

Every engine will still give you a sequence of evaluations that indicates a draw:
at some point, each new ply returns exactly the same score.
This inability to make progress is another way to detect a draw in computer chess.
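The observation above can be turned into a cheap heuristic: keep the score from each completed iterative-deepening iteration and flag the position when the last few iterations return the same value. A minimal sketch, assuming scores in centipawns from the side to move; the helper name `score_has_stalled`, the window of 6 and the zero tolerance are hypothetical choices. A stalled score only hints that no progress is being made; it does not prove a draw, and by itself it says nothing about whether the stalled value is 0.

```python
from typing import Sequence


def score_has_stalled(scores: Sequence[int], window: int = 6, tolerance: int = 0) -> bool:
    """Return True if the last `window` iteration scores differ by at most `tolerance`.

    `scores` holds one score (in centipawns) per completed iterative-deepening
    iteration, oldest first. Too few iterations means "not enough evidence yet".
    """
    if window < 2 or len(scores) < window:
        return False
    recent = scores[-window:]
    return max(recent) - min(recent) <= tolerance


# Crafty's output later in this thread reports 0.01 (1 centipawn) at depths 15 through 23.
crafty_scores = [1] * 9
print(score_has_stalled(crafty_scores))  # True

# A score that keeps climbing with depth is not treated as stalled.
print(score_has_stalled([20, 35, 50, 80, 120, 200]))  # False
```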

Post by **velmarin** » Fri Jul 29, 2016 7:33 pm

Don't get bored.
Look for some other fun; I do not think this worries anyone who develops an engine.
It is very easy to add a routine that detects this, but this situation comes up in only 0.00000000000...% of games.

Post by **Dann Corbit** » Fri Jul 29, 2016 8:12 pm

velmarin wrote: Don't get bored.
Look for some other fun; I do not think this worries anyone who develops an engine.
It is very easy to add a routine that detects this, but this situation comes up in only 0.00000000000...% of games.

Unless people deliberately aim to build a wall, which is a classic anti-computer strategy. I remember seeing it in a game posted to this forum a long time ago, in which a GM successfully built a wall against a machine to get a draw.

Post by **Dann Corbit** » Fri Jul 29, 2016 8:14 pm

Warp wrote: Consider the following position:

r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - - 0 1

Crafty with draw detection enabled:

X:\users\dcorbit\arena\Engines\stockfish\troy>craftydd

Initializing multiple threads.
System is SMP, not NUMA.
EPD Kit revision date: 1996.04.21
unable to open book file [./book.bin].
book is disabled
unable to open book file [./books.bin].

Crafty v20.0 (1 cpus)

White(1): setboard r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - -
White(1): go
clearing hash tables
time surplus 0.00 time limit 30.00 (3:30)
depth   time  score  variation (1)
 15     0.28   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 15->   0.28   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 16     0.38   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 16->   0.38   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 17     0.47   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 17->   0.47   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 18     0.58   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 18->   0.60   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 19     0.74   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 19->   0.74   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 20     0.97   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 20->   0.99   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 21     1.66   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 21->   1.71   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 22    10.05   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 22->  10.16   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 23    13.80   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
 23->  13.83   0.01  1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4. Kg2
^C 24  15.02   1/5*  1. Kf2
X:\users\dcorbit\arena\Engines\stockfish\troy>

Post by **Dann Corbit** » Fri Jul 29, 2016 8:22 pm

velmarin wrote: Don't get bored.
Look for some other fun; I do not think this worries anyone who develops an engine.
It is very easy to add a routine that detects this, but this situation comes up in only 0.00000000000...% of games.

It's not expensive to find simple problems like this one.

For instance, form a bitmap of white pawns + white bishops.
Form a bitmap of black pawns + black bishops.

Step 1:
+9 bit shift ANDed with your own original non-shifted p+B bitmap
(Count collisions)

Step 2:
-9 bit shift ANDed with your own original non-shifted p+B bitmap
(Count collisions)

These two steps give you a count of pawn wall continuity.

Step 3:
+8 shift ANDed with the enemy p+B bitmap
(count collisions)

Step 3 gives you a count of rammed pawns.
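A minimal Python sketch of Steps 1-3, assuming the common a1 = 0 ... h8 = 63 square numbering (the post does not fix one) and Python ints used as 64-bit bitboards. One deliberate deviation from the steps as written: with this numbering, `(bb << 9) & bb` and `(bb >> 9) & bb` mark the two ends of the same diagonal pairs, so their counts are equal; to cover both diagonals the sketch pairs +9 with +7 instead. It also masks off the a- or h-file after each diagonal shift so edge pawns don't wrap around the board. The helper names are hypothetical; to follow the post literally, pass each side's pawns OR bishops as its bitboard.

```python
# Bitboard sketch: bit i is set when square i is occupied, with the assumed
# numbering a1 = 0, b1 = 1, ..., h1 = 7, a2 = 8, ..., h8 = 63.
FULL = (1 << 64) - 1
FILE_A = 0x0101010101010101
FILE_H = 0x8080808080808080


def square(name: str) -> int:
    """Map an algebraic square such as 'e4' to its 0..63 index."""
    return (int(name[1]) - 1) * 8 + (ord(name[0]) - ord("a"))


def bitboard(*names: str) -> int:
    """Build a bitboard from algebraic square names."""
    bb = 0
    for name in names:
        bb |= 1 << square(name)
    return bb


def popcount(bb: int) -> int:
    return bin(bb).count("1")


def shift_ne(bb: int) -> int:
    # +9: up one rank, right one file; h-file bits would wrap onto the a-file.
    return (bb << 9) & ~FILE_A & FULL


def shift_nw(bb: int) -> int:
    # +7: up one rank, left one file; a-file bits would wrap onto the h-file.
    return (bb << 7) & ~FILE_H & FULL


def shift_n(bb: int) -> int:
    # +8: up one rank; bits shifted past h8 are dropped.
    return (bb << 8) & FULL


def wall_continuity(own: int) -> int:
    """Steps 1-2: count diagonally touching pairs in one side's p+B bitboard."""
    return popcount(shift_ne(own) & own) + popcount(shift_nw(own) & own)


def rammed(white: int, black: int) -> int:
    """Step 3: count white units with a black unit directly in front (+8)."""
    return popcount(shift_n(white) & black)


# The position from the first post (no bishops, so pawns only).
white = bitboard("a3", "c3", "e3", "g3", "b4", "d4", "f4", "h4")
black = bitboard("a4", "c4", "e4", "g4", "b5", "d5", "f5", "h5")
print(wall_continuity(white), wall_continuity(black), rammed(white, black))  # 7 7 8
```

Here both walls show 7 diagonal links and all 8 white pawns are rammed; how to weight those counts, and how to penalize the holes mentioned next, is left open, as in the post.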

If there are holes in the wall (especially on open files you do not own), subtract from the score.

If there are enemy knights on the board, you may have to do some careful calculation if you really want to know how effective your wall is. Knights really kill these walls if the opponent is more than 3 pawns ahead.

I would say that if you have fewer than 6 pawns, or if the enemy has knights, then you probably don't need to bother with the calculation.

With a bit more sophistication, you can even see things like how to get started in WAC.230 (which, as Alex Szabo has shown, doesn't actually win, but is still the only move with any winning chances).

Post by **velmarin** » Fri Jul 29, 2016 8:27 pm

Well,
anyone who tries to reach this position from the initial position will get bored.
In the Ippolit code, each new game assumes there are no more than 9 queens, 10 rooks, or 8 pawns. This is useless and ridiculous.

Post by **Dann Corbit** » Fri Jul 29, 2016 8:31 pm

velmarin wrote: Well,
anyone who tries to reach this position from the initial position will get bored.
In the Ippolit code, each new game assumes there are no more than 9 queens, 10 rooks, or 8 pawns. This is useless and ridiculous.

For game playing, true.
But I also enjoy puzzle solving, and I am not alone.

Post by **jdart** » Fri Jul 29, 2016 8:34 pm

I think completely locked pawn walls are rare. You can have one that persists for a long time, but you can seldom guarantee that it will never be broken, as you can in the example given.

It is one of those things where you know intuitively that it is important, but when you get into the trillions of possible positions, scoring them all accurately becomes quite an issue.

--Jon
