Consider the following position:
r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - - 0 1
Any person can quickly see that this is a complete impasse, a dead draw. It's completely trivial for white to force a draw: Just move the king and that's it.
However, it seems that no engine can see this. Every single engine I have tried (about 20 of them) evaluates this as a clear victory for black (typically evaluating this in the range between -10 and -11.) The engine that gets it the least wrong seems to be Texel, evaluating it at -3.23, but basically all other engines evaluate it at -10 or more.
Just for the fun of it, I told Stockfish "you think black can win this? Well, prove it!" and I put it to play against itself. Funnily, by about move 48 SF even tried to avoid the 50-move rule by sacrificing a rook by playing it to e5, but then as white it didn't take the bait, and just moved the king. Of course the game ended in 50-move rule.
Why do all engines misevaluate this so badly? Are there other similar positions?
Positions that all engines badly misevaluate
-
Guenther
- Posts: 4718
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: Positions that all engines badly misevaluate
Warp wrote:
Consider the following position:
r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - - 0 1
Any person can quickly see that this is a complete impasse, a dead draw. It's completely trivial for white to force a draw: Just move the king and that's it.
However, it seems that no engine can see this. Every single engine I have tried (about 20 of them) evaluates this as a clear victory for black (typically evaluating this in the range between -10 and -11.) The engine that gets it the least wrong seems to be Texel, evaluating it at -3.23, but basically all other engines evaluate it at -10 or more.
Just for the fun of it, I told Stockfish "you think black can win this? Well, prove it!" and I put it to play against itself. Funnily, by about move 48 SF even tried to avoid the 50-move rule by sacrificing a rook by playing it to e5, but then as white it didn't take the bait, and just moved the king. Of course the game ended in 50-move rule.
Why do all engines misevaluate this so badly? Are there other similar positions?

There are several programs which have blocked-position recognition (at least for the 'easy' safe types). Patzer is one of them and was the first.
Deep Patzer 3.80:
exclude: none best +tail
dep score nodes time (not shown: tbhits knps seldep)
22 0.00 17.9M 0:16.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
21 0.00 10.9M 0:09.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
20 0.00 6.55M 0:05.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
19 0.00 3.94M 0:03.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
18 0.00 2.66M 0:02.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
17 0.00 2.05M 0:01.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
16 0.00 1.62M 0:01.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
15 0.00 1.25M 0:00.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
14 0.00 955853 0:00.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
13 0.00 651518 0:00.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
12 0.00 454519 0:00.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
11 0.00 246948 0:00.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
10 0.00 156501 0:00.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
9 0.00 60974 0:00.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
8 0.00 35683 0:00.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
7 0.00 9008 0:00.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
6 0.00 4991 0:00.00 Kd2 O-O Ke1 Rb8 Kd2 Ra8
5 0.00 792 0:00.00 Kd2 O-O Ke1 Rb8
4 0.00 431 0:00.00 Kd2 O-O Ke1 Rb8
3 0.00 52 0:00.00 Kd2 O-O
2 0.00 25 0:00.00 Kd2 O-O
1 0.00 3 0:00.00 Kd2
0 #
I wonder why the castling right seems to be back after pasting that FEN in WB? (In analysis and game mode)
Guenther
-
Dann Corbit
- Posts: 12870
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Positions that all engines badly misevaluate
Crafty with draw detection will instantly see it as a draw (the last version that could be compiled with this feature is Crafty 20.0).
Every engine will give you an evaluation train that indicates draw.
At some point, for each new ply, there will be exactly the same score.
The inability to make progress is another way to detect a computer chess draw.
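As a rough illustration of the "same score at every new ply" idea, here is a minimal Python sketch. The function name `score_plateau` and the parameters `min_run` and `tolerance` are my own assumptions, not taken from any engine. It only checks whether the scores reported at the last few search depths are identical; a plateau suggests that no progress is being made, but it does not prove a draw.

```python
def score_plateau(scores, min_run=6, tolerance=0.0):
    """Return True if the last `min_run` per-depth scores stay within `tolerance`.

    `scores` is a list of evaluations (in pawns), one per completed
    iterative-deepening depth, oldest first. All names are hypothetical.
    """
    if len(scores) < min_run:
        return False  # not enough iterations to judge
    tail = scores[-min_run:]
    return max(tail) - min(tail) <= tolerance


# Scores that never move from one depth to the next: flagged as a plateau.
flat_scores = [0.01] * 9
print(score_plateau(flat_scores))

# Scores that still drift from depth to depth: not flagged.
drifting_scores = [-10.2, -10.4, -10.1, -10.6, -10.3, -10.8]
print(score_plateau(drifting_scores))
```

A GUI or an adjudication script could apply a check like this to the score an engine reports after each depth; how long the run has to be before calling it "no progress" is a judgment call.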
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
velmarin
- Posts: 1600
- Joined: Mon Feb 21, 2011 9:48 am
Re: Positions that all engines badly misevaluate
Don't get bored.
Look for some other fun; I do not think this worries anyone who develops an engine.
It is very easy to add a routine that detects this, but this will not happen in 0.00000000000... 100% of games.
-
Dann Corbit
- Posts: 12870
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Positions that all engines badly misevaluate
velmarin wrote:
Don't get bored.
Look for some other fun; I do not think this worries anyone who develops an engine.
It is very easy to add a routine that detects this, but this will not happen in 0.00000000000... 100% of games.

Unless people aim to build a wall. It is a classical anti-computer strategy. I remember seeing it in a game posted to this forum a long time ago by a GM against a machine, where he successfully built a wall for a draw.
-
Dann Corbit
- Posts: 12870
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Positions that all engines badly misevaluate
Warp wrote:
Consider the following position:
r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - - 0 1
Any person can quickly see that this is a complete impasse, a dead draw. It's completely trivial for white to force a draw: Just move the king and that's it.
However, it seems that no engine can see this. Every single engine I have tried (about 20 of them) evaluates this as a clear victory for black (typically evaluating this in the range between -10 and -11.) The engine that gets it the least wrong seems to be Texel, evaluating it at -3.23, but basically all other engines evaluate it at -10 or more.
Just for the fun of it, I told Stockfish "you think black can win this? Well, prove it!" and I put it to play against itself. Funnily, by about move 48 SF even tried to avoid the 50-move rule by sacrificing a rook by playing it to e5, but then as white it didn't take the bait, and just moved the king. Of course the game ended in 50-move rule.
Why do all engines misevaluate this so badly? Are there other similar positions?

Crafty with draw detection enabled:
X:\users\dcorbit\arena\Engines\stockfish\troy>craftydd
Initializing multiple threads.
System is SMP, not NUMA.
EPD Kit revision date: 1996.04.21
unable to open book file [./book.bin].
book is disabled
unable to open book file [./books.bin].
Crafty v20.0 (1 cpus)
White(1): setboard r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - -
White(1): go
clearing hash tables
time surplus 0.00 time limit 30.00 (3:30)
depth time score variation (1)
15 0.28 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
15-> 0.28 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
16 0.38 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
16-> 0.38 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
17 0.47 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
17-> 0.47 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
18 0.58 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
18-> 0.60 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
19 0.74 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
19-> 0.74 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
20 0.97 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
20-> 0.99 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
21 1.66 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
21-> 1.71 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
22 10.05 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
22-> 10.16 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
23 13.80 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
23-> 13.83 0.01 1. Kf2 Ra5 2. Kg2 Ra6 3. Kh2 Ra5 4.
Kg2
^C 24 15.02 1/5* 1. Kf2
X:\users\dcorbit\arena\Engines\stockfish\troy>
-
Dann Corbit
- Posts: 12870
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Positions that all engines badly misevaluate
velmarin wrote:
Don't get bored.
Look for some other fun; I do not think this worries anyone who develops an engine.
It is very easy to add a routine that detects this, but this will not happen in 0.00000000000... 100% of games.

It's not expensive to find simple problems like this one.
For instance, form a bitmap of white pawns + white bishops.
Form a bitmap of black pawns + black bishops.
Step 1:
+9 bit shift ANDed with your own original non-shifted p+B bitmap
(Count collisions)
Step 2:
-9 bit shift ANDed with your own original non-shifted p+B bitmap
(Count collisions)
These two steps give you a count of pawn wall continuity.
Step 3:
+8 shift ANDed with the enemy p+B bitmap
(count collisions)
Step 3 gives you a count of rammed pawns.
If there are holes in the wall (especially on open files you do not own), subtract from the score.
If there are enemy knights on the board, you may have to do some careful calculation if you really want to know about the effectiveness of your wall. Knights really kill these walls, if the opponent is more than 3 pawns ahead.
I would say that if you have less than 6 pawns or if the enemy has knights, then you probably don't need to bother with the calculation.
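The three counting steps above can be sketched in Python roughly as follows. This is only an illustration under several assumptions of mine: bit 0 is a1 and bit 63 is h8, the helper names (`bitboards_from_fen`, `wall_counts`) are hypothetical, and file masks are added so that shifts do not wrap around the board edge. Also note that a +9 and a -9 shift trace the same diagonal, so the sketch uses +9 and +7 to cover both diagonal directions; that is my reading of the idea, not necessarily what was intended. The hole and knight adjustments are left out.

```python
FILE_A = 0x0101010101010101
FILE_H = 0x8080808080808080
MASK64 = 0xFFFFFFFFFFFFFFFF


def bitboards_from_fen(fen):
    """Return (white, black) bitmaps of pawns + bishops; bit 0 = a1, bit 63 = h8."""
    white = black = 0
    for i, row in enumerate(fen.split()[0].split("/")):
        rank = 7 - i  # FEN lists rank 8 first
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)  # run of empty squares
                continue
            bit = 1 << (rank * 8 + file)
            if ch in "PB":
                white |= bit
            elif ch in "pb":
                black |= bit
            file += 1
    return white, black


def popcount(x):
    return bin(x).count("1")


def wall_counts(own, enemy, own_is_white=True):
    """Return (continuity, rammed) for the side whose p+B bitmap is `own`."""
    # Steps 1 and 2: diagonal neighbours inside our own bitmap.
    up_right = ((own & ~FILE_H) << 9) & MASK64  # a1 -> b2
    up_left = ((own & ~FILE_A) << 7) & MASK64   # b1 -> a2
    continuity = popcount(up_right & own) + popcount(up_left & own)
    # Step 3: squares directly in front of us that the enemy p+B occupy.
    forward = (own << 8) & MASK64 if own_is_white else own >> 8
    rammed = popcount(forward & enemy)
    return continuity, rammed


fen = "r3k2r/8/8/1p1p1p1p/pPpPpPpP/P1P1P1P1/8/4K3 w - - 0 1"
white, black = bitboards_from_fen(fen)
print(wall_counts(white, black))         # (7, 8)
print(wall_counts(black, white, False))  # (7, 8)
```

For the position in this thread both sides get 7 diagonal links and 8 rammed pawns, which is the maximum for an eight-pawn zigzag wall. What thresholds to apply to these counts is left open here.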
A bit more advanced, and you can even see things like how to get started with WAC.230 (which, as Alex Szabo has shown, doesn't actually win, but it is still the only move which has any winning chances).
-
velmarin
- Posts: 1600
- Joined: Mon Feb 21, 2011 9:48 am
Re: Positions that all engines badly misevaluate
Well,
anyone who tries to reach this position from the initial position will get bored.
The ippolít code checks at each new game that there are no more than 9 queens, no more than 10 rooks, and no more than 8 pawns. This is useless and ridiculous.
-
Dann Corbit
- Posts: 12870
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Positions that all engines badly misevaluate
velmarin wrote:
Well,
anyone who tries to reach this position from the initial position will get bored.
The ippolít code checks at each new game that there are no more than 9 queens, no more than 10 rooks, and no more than 8 pawns. This is useless and ridiculous.

For game playing, true.
But I also enjoy puzzle solving, and I am not alone.
-
jdart
- Posts: 4434
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Positions that all engines badly misevaluate
I think completely locked pawn walls are rare. You can have one that persists for a long time but seldom can you guarantee it won't ever be broken, as in the example given.
It is one of those things where you know intuitively that this is important, but when you get into the trillions of possible positions, scoring them all accurately becomes quite an issue.
--Jon