Positions that all engines badly misevaluate

Warp
Posts: 9
Joined: Sun May 15, 2016 9:20 am

Positions that all engines badly misevaluate

Post by Warp »

Consider the following position:

r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - - 0 1

Any person can quickly see that this is a complete impasse, a dead draw. It's completely trivial for white to force a draw: Just move the king and that's it.

However, it seems that no engine can see this. Every single engine I have tried (about 20 of them) evaluates this as a clear victory for black (typically evaluating this in the range between -10 and -11.) The engine that gets it the least wrong seems to be Texel, evaluating it at -3.23, but basically all other engines evaluate it at -10 or more.

Just for the fun of it, I told Stockfish "you think black can win this? Well, prove it!" and I put it to play against itself. Funnily, by about move 48 SF even tried to avoid the 50-move rule by sacrificing a rook by playing it to e5, but then as white it didn't take the bait, and just moved the king. Of course the game ended in 50-move rule.

Why do all engines misevaluate this so badly? Are there other similar positions?
Guenther
Posts: 4718
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Positions that all engines badly misevaluate

Post by Guenther »

Warp wrote: Consider the following position:

r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - - 0 1

Any person can quickly see that this is a complete impasse, a dead draw. It's completely trivial for white to force a draw: Just move the king and that's it.

However, it seems that no engine can see this. Every single engine I have tried (about 20 of them) evaluates this as a clear victory for black (typically evaluating this in the range between -10 and -11.) The engine that gets it the least wrong seems to be Texel, evaluating it at -3.23, but basically all other engines evaluate it at -10 or more.

Just for the fun of it, I told Stockfish "you think black can win this? Well, prove it!" and I put it to play against itself. Funnily, by about move 48 SF even tried to avoid the 50-move rule by sacrificing a rook by playing it to e5, but then as white it didn't take the bait, and just moved the king. Of course the game ended in 50-move rule.

Why do all engines misevaluate this so badly? Are there other similar positions?
There are several programs which have blocked-position recognition (at least for the 'easy' safe types). Patzer is one of them and was the first.


Deep Patzer 3.80:

exclude: none best +tail                                          
dep	score	nodes	time	(not shown:  tbhits	knps	seldep)
 22	  0.00 	17.9M  	0:16.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
 21	  0.00 	10.9M  	0:09.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
 20	  0.00 	6.55M  	0:05.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
 19	  0.00 	3.94M  	0:03.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
 18	  0.00 	2.66M  	0:02.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
 17	  0.00 	2.05M  	0:01.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
 16	  0.00 	1.62M  	0:01.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
 15	  0.00 	1.25M  	0:00.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
 14	  0.00 	955853	0:00.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
 13	  0.00 	651518	0:00.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
 12	  0.00 	454519	0:00.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
 11	  0.00 	246948	0:00.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
 10	  0.00 	156501	0:00.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
  9	  0.00 	60974  	0:00.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
  8	  0.00 	35683  	0:00.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
  7	  0.00 	9008    	0:00.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
  6	  0.00 	4991    	0:00.00	Kd2 O-O Ke1 Rb8 Kd2 Ra8
  5	  0.00 	792      	0:00.00	Kd2 O-O Ke1 Rb8
  4	  0.00 	431      	0:00.00	Kd2 O-O Ke1 Rb8
  3	  0.00 	52        	0:00.00	Kd2 O-O
  2	  0.00 	25        	0:00.00	Kd2 O-O
  1	  0.00 	3          	0:00.00	Kd2
  0	#
Edit:

I wonder why the castling right seems to be back after pasting that FEN in WB? (In analysis and game mode)

Guenther
Dann Corbit
Posts: 12870
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Positions that all engines badly misevaluate

Post by Dann Corbit »

Crafty with draw detection will instantly see it as a draw (the last version that could be compiled with this feature is Crafty 20.0).

Every engine will give you a sequence of evaluations that indicates a draw.
At some point, each new ply will return exactly the same score.
The inability to make progress is another way to detect a computer chess draw.
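
A minimal sketch of that "same score at every new ply" idea, assuming the per-iteration scores have already been collected into a list. The function name, the window size and the tolerance are hypothetical choices for illustration, not something any engine in this thread is known to use. A flat score can also just mean a quiet position, so this is a hint rather than a proof of a draw.

```python
def looks_like_no_progress(scores, window=8, tolerance=0.0):
    """Return True if the last `window` iteration scores differ by at most `tolerance`.

    `scores` holds one evaluation per completed search depth, oldest first.
    `window` and `tolerance` are illustrative defaults (assumptions).
    """
    if len(scores) < window:
        return False  # not enough iterations to judge
    recent = scores[-window:]
    return max(recent) - min(recent) <= tolerance


# Scores reported by Crafty 20.0 for depths 15..23 later in this thread.
crafty_scores = [0.01] * 9
print(looks_like_no_progress(crafty_scores))

# A score that keeps changing from depth to depth is not flagged.
print(looks_like_no_progress([0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 1.00, 1.20]))
```

An engine could use such a flag to pull the reported score toward 0.00, but how aggressively to do that is a tuning question this sketch does not answer.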
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
velmarin
Posts: 1600
Joined: Mon Feb 21, 2011 9:48 am

Re: Positions that all engines badly misevaluate

Post by velmarin »

Don't get bored.
Look for other fun; I do not think that this worries anyone who develops an engine.
It is very easy to add a routine that detects this, but this practically never happens in real games (0.00000000000...% of them).
Dann Corbit
Posts: 12870
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Positions that all engines badly misevaluate

Post by Dann Corbit »

velmarin wrote: Don't get bored.
Look for other fun; I do not think that this worries anyone who develops an engine.
It is very easy to add a routine that detects this, but this practically never happens in real games (0.00000000000...% of them).
Unless people aim to build a wall. It is a classical anti-computer strategy. I remember seeing it in a game posted to this forum a long time ago by a GM against a machine, where he successfully built a wall for a draw.
Dann Corbit
Posts: 12870
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Positions that all engines badly misevaluate

Post by Dann Corbit »

Warp wrote: Consider the following position:

r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - - 0 1

Any person can quickly see that this is a complete impasse, a dead draw. It's completely trivial for white to force a draw: Just move the king and that's it.

However, it seems that no engine can see this. Every single engine I have tried (about 20 of them) evaluates this as a clear victory for black (typically evaluating this in the range between -10 and -11.) The engine that gets it the least wrong seems to be Texel, evaluating it at -3.23, but basically all other engines evaluate it at -10 or more.

Just for the fun of it, I told Stockfish "you think black can win this? Well, prove it!" and I put it to play against itself. Funnily, by about move 48 SF even tried to avoid the 50-move rule by sacrificing a rook by playing it to e5, but then as white it didn't take the bait, and just moved the king. Of course the game ended in 50-move rule.

Why do all engines misevaluate this so badly? Are there other similar positions?
Crafty with draw detection enabled:

X:\users\dcorbit\arena\Engines\stockfish\troy>craftydd

Initializing multiple threads.
System is SMP, not NUMA.
EPD Kit revision date: 1996.04.21
unable to open book file [./book.bin].
book is disabled
unable to open book file [./books.bin].

Crafty v20.0 (1 cpus)

White(1): setboard r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - -
White(1): go
clearing hash tables
time surplus 0.00 time limit 30.00 (3:30)
depth time score variation (1)
15 0.28 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
15-> 0.28 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
16 0.38 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
16-> 0.38 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
17 0.47 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
17-> 0.47 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
18 0.58 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
18-> 0.60 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
19 0.74 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
19-> 0.74 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
20 0.97 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
20-> 0.99 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
21 1.66 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
21-> 1.71 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
22 10.05 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
22-> 10.16 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
23 13.80 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
23-> 13.83 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
^C 24 15.02 1/5* 1. Kf2
X:\users\dcorbit\arena\Engines\stockfish\troy>
Dann Corbit
Posts: 12870
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Positions that all engines badly misevaluate

Post by Dann Corbit »

velmarin wrote: Don't get bored.
Look for other fun; I do not think that this worries anyone who develops an engine.
It is very easy to add a routine that detects this, but this practically never happens in real games (0.00000000000...% of them).
It's not expensive to find simple problems like this one.

For instance, form a bitmap of white pawns + white bishops.
Form a bitmap of black pawns + black bishops.

Step 1:
+9 bit shift ANDed with your own original non-shifted p+B bitmap
(Count collisions)

Step 2:
-9 bit shift ANDed with your own original non-shifted p+B bitmap
(Count collisions)

These two steps give you a count of pawn wall continuity.

Step 3:
+8 shift ANDed with the enemy p+B bitmap
(count collisions)

Step 3 gives you a count of rammed pawns.
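
The three steps above can be sketched roughly as follows, from White's point of view. This assumes a bit layout of a1 = bit 0 and h8 = bit 63, which the post does not specify. The file masks that stop shifted bits from wrapping around the board edge are an addition for correctness, and all function names are hypothetical.

```python
FILE_A = 0x0101010101010101
FILE_H = 0x8080808080808080
BOARD = 0xFFFFFFFFFFFFFFFF  # keep shifted values inside 64 bits


def fen_to_bitboards(fen):
    """Map each piece letter in the FEN to a bitboard (a1 = bit 0, assumed layout)."""
    boards = {}
    for row_index, row in enumerate(fen.split()[0].split("/")):
        rank = 7 - row_index  # FEN lists rank 8 first
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)  # run of empty squares
            else:
                boards[ch] = boards.get(ch, 0) | (1 << (rank * 8 + file))
                file += 1
    return boards


def popcount(bb):
    return bin(bb).count("1")


def wall_counts(own, enemy):
    """Return (step 1, step 2, step 3) collision counts for White's p+B bitmap."""
    plus9 = ((own & ~FILE_H) << 9) & BOARD  # step 1: +9 shift, no wrap off the h-file
    minus9 = (own & ~FILE_A) >> 9           # step 2: -9 shift, no wrap off the a-file
    plus8 = (own << 8) & BOARD              # step 3: +8 shift, one square ahead
    return popcount(plus9 & own), popcount(minus9 & own), popcount(plus8 & enemy)


fen = "r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - - 0 1"
bb = fen_to_bitboards(fen)
own = bb.get("P", 0) | bb.get("B", 0)
enemy = bb.get("p", 0) | bb.get("b", 0)
print(wall_counts(own, enemy))  # (4, 4, 8): all eight white pawns are rammed
```

For Black the shifts would be mirrored (for example -8 instead of +8 in step 3). How the three counts are turned into a score is left open here, as it is in the post.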

If there are holes in the wall (especially on open files you do not own), subtract from the score.

If there are enemy knights on the board, you may have to do some careful calculation if you really want to know about the effectiveness of your wall. Knights really kill these walls, if the opponent is more than 3 pawns ahead.

I would say that if you have less than 6 pawns or if the enemy has knights, then you probably don't need to bother with the calculation.
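
That cheap pre-check could look something like the sketch below, which reads the counts straight from the FEN piece-placement field. The function name and the `side` parameter are hypothetical; the thresholds (fewer than 6 pawns, any enemy knight) are the ones suggested above.

```python
def wall_check_worthwhile(fen, side="w"):
    """Return True if the pawn-wall calculation seems worth running for `side`.

    Skips the calculation when `side` has fewer than 6 pawns or the
    opponent still has at least one knight.
    """
    placement = fen.split()[0]
    own_pawn = "P" if side == "w" else "p"
    enemy_knight = "n" if side == "w" else "N"
    return placement.count(own_pawn) >= 6 and placement.count(enemy_knight) == 0


# The blocked position from this thread: 8 white pawns, no black knights.
print(wall_check_worthwhile("r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - - 0 1"))

# The initial position: Black still has both knights, so the check is skipped.
print(wall_check_worthwhile("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"))
```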

With a bit more work, you can even see things like how to get started with WAC.230 (which, as Alex Szabo has shown, doesn't actually win, but it is still the only move which has any winning chances).
velmarin
Posts: 1600
Joined: Mon Feb 21, 2011 9:48 am

Re: Positions that all engines badly misevaluate

Post by velmarin »

Well,
anyone who tries to reach this position from the initial position will get bored.
The Ippolit code checks at each new game that there are no more than 9 queens, no more than 10 rooks, and no more than 8 pawns. This is useless and ridiculous.
Dann Corbit
Posts: 12870
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Positions that all engines badly misevaluate

Post by Dann Corbit »

velmarin wrote: Well,
anyone who tries to reach this position from the initial position will get bored.
The Ippolit code checks at each new game that there are no more than 9 queens, no more than 10 rooks, and no more than 8 pawns. This is useless and ridiculous.
For game playing, true.
But I also enjoy puzzle solving, and I am not alone.
jdart
Posts: 4434
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Positions that all engines badly misevaluate

Post by jdart »

I think completely locked pawn walls are rare. You can have one that persists for a long time but seldom can you guarantee it won't ever be broken, as in the example given.

It is one of those things where you know intuitively that this is important, but when you get into the trillions of possible positions, scoring them all accurately becomes quite an issue.

--Jon