The largest Eval Jump I've ever seen from any program!


Eraserheads
Posts: 235
Joined: Fri Mar 10, 2006 9:19 am
Location: Quezon City, Philippines

The largest Eval Jump I've ever seen from any program!

Post by Eraserheads »

Virtual Chess 2 vs Rybka 3
Blitz 5 min. Shredder Classic GUI. I manually entered the moves.

I always love pitting old programs like CSTAL and Virtual Chess against the likes of TOGA, SHREDDER or RYBKA. In this game I played VChess 2 against Rybka (both on one PC). Play started normally, with Rybka slowly gaining the upper hand.
Around move 60, Rybka as Black evaluated the position as more than 14 points in its own favor, only to lose its advantage and eventually draw the game. Vas, did I just stumble on a big bug?

[Event "?"]
[Site "?"]
[Date "2008.08.13"]
[Round "?"]
[White "Virtual Chess 2"]
[Black "Rybka 3"]
[Result "1/2-1/2"]

1. d4 {0s} Nf6 {book 0s} 2. c4 {8s} e6 {book 0s} 3. Nf3
{3s} b6 {book 0s} 4. a3 {3s} Ba6 {book 0s} 5. Qc2 {4s} Bb7
{book 0s} 6. Nc3 {4s} c5 {book 0s} 7. e4 {4s} cxd4 {book
0s} 8. Nxd4 {24s} Nc6 {book 0s} 9. Nxc6 {5s} Bxc6 {book 0s}
10. Bf4 {4s} Nh5 {-0.05/11 6s} 11. Be3 {3:46m} Qc7
{+0.04/12 19s} 12. b4 {15s} Be7 {-0.11/10 6s} 13. Be2 {4s}
Nf4 {-0.22/11 3s} 14. Bf3 {3s} O-O {-0.33/11 13s} 15. O-O
{4s} f5 {-0.37/10 3s} 16. Rad1 {5s} Rac8 {-0.23/10 7s}
17. Rfe1 {4s} Bf6 {-0.34/9 4s} 18. Bd4 {4s} a6 {-0.31/9 2s}
19. Bxf6 {17s} Rxf6 {-0.28/11 4s} 20. Re3 {0s} fxe4
{-0.22/10 6s} 21. Nxe4 {8s} Bxe4 {-0.18/11 3s} 22. Bxe4
{2s} Qxc4 {-0.18/11 1s} 23. Bxh7+ {5s} Kf8 {-0.25/11 1s}
24. Qxc4 {1s} Rxc4 {-0.22/12 2s} 25. Bb1 {2s} Ke7 {-0.18/12
3s} 26. h4 {4s} Rc3 {-0.28/12 6s} 27. Rde1 {11s} Rxe3
{-0.26/13 2s} 28. fxe3 {7s} Nd5 {-0.22/14 2s} 29. Be4 {16s}
Nc3 {-0.34/12 1s} 30. Bd3 {3s} b5 {-0.38/13 4s} 31. g4 {3s}
e5 {-0.58/12 4s} 32. Rc1 {4s} Nd5 {-0.40/14 7s} 33. Rc5
{8s} Kd6 {-0.40/14 2s} 34. e4 {3s} Nc7 {-0.42/14 3s}
35. Kg2 {11s} Ne6 {-0.52/15 2s} 36. g5 {5s} Rf8 {-0.59/15
4s} 37. Rc2 {5s} g6 {-0.55/15 5s} 38. Kg3 {7s} Ke7
{-0.55/16 1s} 39. Rf2 {37s} Nf4 {-0.39/16 7s} 40. Bb1 {0s}
Ra8 {-0.48/15 8s} 41. Rc2 {24s} a5 {-0.55/14 10s} 42. Rc5
{6s} d6 {-0.75/12 4s} 43. Rc7+ {12s} Kd8 {-1.26/12 3s}
44. Rf7 {9s} Ra6 {-1.14/12 1s} 45. h5 {40s} gxh5 {-1.74/11
1s} 46. Rf8+ {5s} Ke7 {-2.57/10 1s} 47. Rh8 {8s} axb4
{-3.50/11 0s} 48. Rh7+ {1:07m} Kf8 {-3.50/9 0s} 49. axb4
{6s} Ra1 {-3.96/12 1s} 50. Bc2 {42s} Rg1+ {-4.62/11 2s}
51. Kf3 {7s} Rc1 {-4.62/10 0s} 52. Rh8+ {31s} Kg7 {-4.62/10
0s} 53. Rc8 {4s} Ne6 {-4.62/10 0s} 54. Kg3 {4s} Nd4
{-4.62/9 0s} 55. Rd8 {3s} Nxc2 {-5.04/11 2s} 56. Rxd6 {6s}
Nxb4 {-5.04/9 0s} 57. Kh4 {6s} Rh1+ {-5.06/8 0s} 58. Kg3
{4s} Nc2 {-5.21/9 2s} 59. Rd5 {56s} h4+ {-5.62/9 2s}
60. Kg2 {3s} Ne3+ {-14.08/8 4s} 61. Kxh1 {6s} Nxd5 {-5.11/7
0s} 62. exd5 {1s} b4 {-1.66/12 1s} 63. Kh2 {19s} b3
{-1.68/12 1s} 64. d6 {3s} b2 {-1.75/13 3s} 65. d7 {1s} b1=Q
{-1.86/14 1s} 66. d8=Q {5s} Qf5 {-2.05/15 7s} 67. Qe7+
{13s} Kg6 {-2.17/13 1s} 68. Qd6+ {3s} Kxg5 {-2.17/14 1s}
69. Qd8+ {4s} Kh5 {-2.16/13 0s} 70. Qd1+ {4s} Qg4 {-2.50/12
0s} 71. Qd3 {3s} Qg3+ {-3.37/17 1s} 72. Qxg3 {7s} hxg3+
{-3.37/30 0s} 73. Kxg3 {0s} Kg5 {-3.37/36 0s} 74. Kf3 {4s}
Kf5 {-3.37/38 1s} 75. Ke3 {6s} e4 {-3.37/38 0s} 76. Ke2
{3s} Kf4 {-3.37/41 1s} 77. Kf2 {3s} e3+ {-3.37/42 2s}
78. Ke2 {3s} Ke4 {-3.37/41 1s} 79. Ke1 {3s} Kd3 {-3.37/41
0s} 80. Kd1 {5s} Kc4 {-3.37/41 0s} 81. Ke1 {4s} Kd5
{-3.37/42 0s} 82. Ke2 {3s} Kd4 {-3.37/43 10s} 83. Ke1 {3s}
Ke4 {-3.37/42 8s} 84. Ke2 {4s} Kf4 {-3.37/42 7s} 85. Ke1
{3s} Kf5 {0.00/40 0s} 86. Kd1 {4s} Ke5 {0.00/45 0s} 1/2-1/2


The position where Rybka (black) favored its position by as high as 14 points!

[D]8/6k1/8/1p1Rp1P1/4P2p/4n3/6K1/7r w - -
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: The largest Eval Jump I've ever seen from any program!

Post by George Tsavdaris »

Nope, not a bug.

It just needed a few more seconds to see that it's a bad move.
On a quad-core or eight-core machine it would need less than 1 second to avoid it.

Analysis by Rybka 3 1-cpu 32-bit :

Virtual Chess 2 - Rybka 3, 2008
8/6k1/8/1p1Rp1P1/4P2p/8/2n3K1/7r b - - 0 1

60...Nc2-e3+
-+ (-11.97) Depth: 7 00:00:01 18kN
60...Nc2-e3+
-+ (-14.08) Depth: 8 00:00:06 251kN
60...Nc2-e3+
-+ (-14.13) Depth: 9 00:00:06 277kN
60...Nc2-e3+ 61.Kg2xh1 Ne3xd5 62.e4xd5 b5-b4 63.d5-d6 b4-b3 64.d6-d7 b3-b2 65.d7-d8Q b2-b1Q+ 66.Kh1-h2 Qb1-f5 67.Kh2-h1 Kg7-g6 68.Kh1-h2
-+ (-1.63) Depth: 10 00:00:19 1099kN

60...Rh1-a1 61.Kg2-h3 Nc2-e3 62.Rd5-d7+ Kg7-g6 63.Kh3xh4 Ra1-h1+ 64.Kh4-g3 Kg6xg5 65.Rd7-b7 Rh1-b1 66.Rb7-g7+ Kg5-f6 67.Rg7-c7
-+ (-4.60) Depth: 10 00:00:22 1196kN
60...Rh1-e1
-+ (-5.14) Depth: 10 00:00:27 1376kN
60...Rh1-e1
-+ (-5.59) Depth: 11 00:00:58 2475kN, tb=4
After his son's birth they've asked him:
"Is it a boy or girl?"
YES! He replied.....
Eraserheads
Posts: 235
Joined: Fri Mar 10, 2006 9:19 am
Location: Quezon City, Philippines

Re: The largest Eval Jump I've ever seen from any program!

Post by Eraserheads »

Thanks! I just have a dual core rig.

The position is still drawish, but Rybka still evaluates the position favorably.
Uri Blass
Posts: 10102
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: The largest Eval Jump I've ever seen from any program!

Post by Uri Blass »

I consider it a bug, because a program should never evaluate this position as +14 in the first place.

Uri
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: The largest Eval Jump I've ever seen from any program!

Post by George Tsavdaris »

Eraserheads wrote:Thanks! I just have a dual core rig.

The position is still drawish, but Rybka still evaluates the position favorably.
No. Rybka is right to evaluate the position favorably, since the position is won for Black.

For example, this position:
[D]8/6k1/8/1p1Pp1P1/7p/8/8/7K b - - 0 62
is a win for Black.

Also, this position from later in the game:
8/4Q1k1/8/4pqP1/7p/8/7K/8 b - - 0 67
is a definite win for Black.
smirobth
Posts: 2307
Joined: Wed Mar 08, 2006 8:41 pm
Location: Brownsville Texas USA

Re: The largest Eval Jump I've ever seen from any program!

Post by smirobth »

I don't know if that is a bug or not. Some other programs also briefly like 60...Ne3?!, although they reject it very quickly, in a second or less.

It is odd that Rybka plays 60...Ne3, but the really strange move by Rybka in this game comes on move 71:
[D]8/8/8/4p2k/6qp/3Q4/7K/8 b - - 0 71
Rybka played 71...Qg3??, which leads to an immediate and obvious draw, yet Rybka sees a 3 pawn advantage. Rybka relies on tablebases to avoid these endgame blunders, so people probably should not use Rybka without tablebases since Rybka is missing some very basic endgame knowledge and makes serious endgame mistakes without them. But on my system Rybka 3 still wants to play 71...Qg3, even with the 3-4-5 man tablebases present!!?? This does seem to be a bug. Any program using even just 3 man tablebases should avoid 71...Qg3 like the plague. 6 man tablebases show that Rybka still had a win with the "only" move 71...Kg5.
- Robin Smith
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: The largest Eval Jump I've ever seen from any program!

Post by George Tsavdaris »

Why do you say it's the only move?
As far as I can see, 71...Qf4 leads to a mate in 48. Counting the 3 previous moves without a capture or pawn move, that makes 51 moves, but this will obviously never be a draw by the 50-move rule, since a pawn move will occur during those 51 moves.
I also see that 71...Qg5 leads to a mate in 56. With the 3 previous moves added, Black has to make progress (a pawn move or a capture) within these 59 moves (before move 50 out of 59). I think this is again easy for him, since the pawn will move much earlier (around move 20-25 from the given position).
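
The move counting above can be written down as a small check. This is only a sketch in Python: the function name is made up for illustration, it counts whole moves the way the post does (an engine would track plies in a halfmove clock), and the "pawn move around move 25" figure is the rough estimate from the post, not tablebase output.

```python
FIFTY_MOVE_LIMIT = 50  # moves by each side without a capture or pawn move


def progress_in_time(quiet_moves_so_far, moves_until_progress):
    """True if the next capture or pawn move comes before a draw can be claimed.

    Hypothetical helper: counts whole moves, as the post does.
    moves_until_progress includes the capture or pawn move itself.
    """
    return quiet_moves_so_far + moves_until_progress <= FIFTY_MOVE_LIMIT


# 3 quiet moves already played, pawn move estimated around move 25 of the line.
pawn_moves_early = progress_in_time(3, 25)
# A line with no pawn move or capture for another 48 moves would be too slow.
no_progress_until_mate = progress_in_time(3, 48)
print(pawn_moves_early, no_progress_until_mate)
```

With the estimate from the post the first call is True (3 + 25 = 28), while a line that made no progress until a mate 48 moves away would fail the check (3 + 48 = 51).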
smirobth
Posts: 2307
Joined: Wed Mar 08, 2006 8:41 pm
Location: Brownsville Texas USA

Re: The largest Eval Jump I've ever seen from any program!

Post by smirobth »

You are correct. For some reason my ChessBase GUI was giving me flaky results, but the Wilhelm GUI confirms that 71...Qf4 and 71...Qg5 also win.
- Robin Smith
pijl

Re: The largest Eval Jump I've ever seen from any program!

Post by pijl »

Uri Blass wrote:I consider it as a bug because a program should never evaluate it as +14 in the first place.
It is not that simple. As you undoubtedly know, the evaluation of a chess program consists of many patterns, which all contribute to the evaluation score. One pattern that is present in all of the stronger chess programs is the quadrant rule in pawn endings. Some programs have this rule in a crude form (i.e. if the king is outside the quadrant, the pawn is an unstoppable passer, and if the opponent doesn't have one, add a big bonus) and rely on the search to correct it in the few cases where it is wrong. Others also try to assess other means of stopping the passer and are more hesitant to add a big bonus when the opponent also has passers, especially when those are more advanced. One of those methods is demonstrated by a key position that Rybka may have based its big bonus on:
[D]8/8/3P1kP1/4p3/1p5p/8/8/7K w - -
The pawn on b4 is the 'unstoppable' one in this case, but White can create a more advanced passer by pushing either of its own, and the black king will be outside the quadrant of one of them.
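
To make the crude form of the rule concrete, here is a minimal Python sketch applied to the position above. It is not code from Rybka, CTD or the Baron. The helper names are invented for illustration, and each pawn is judged as if the board were otherwise empty.

```python
def sq(name):
    """Convert a square name such as 'b4' to (file, rank), with a = 0 and rank 1 = 0."""
    return ord(name[0]) - ord("a"), int(name[1]) - 1


def king_catches_pawn(pawn, king, pawn_is_white, pawn_to_move):
    """Rule of the square for one passed pawn (hypothetical helper).

    Ignores every other piece, including the pawn's own king.
    """
    pf, pr = sq(pawn)
    kf, kr = sq(king)
    promo_rank = 7 if pawn_is_white else 0
    steps = abs(promo_rank - pr)
    # A pawn still on its starting rank can open with a double step.
    if pr == (1 if pawn_is_white else 6):
        steps -= 1
    # King (Chebyshev) distance to the promotion square.
    dist = max(abs(kf - pf), abs(kr - promo_rank))
    # If the defender moves first, the king gets one extra step.
    return dist <= (steps if pawn_to_move else steps + 1)


# Key position, White to move: 8/8/3P1kP1/4p3/1p5p/8/8/7K w - -
black_b4_caught = king_catches_pawn("b4", "h1", False, False)
white_d6_caught = king_catches_pawn("d6", "f6", True, True)
white_g6_caught = king_catches_pawn("g6", "f6", True, True)
print(black_b4_caught, white_d6_caught, white_g6_caught)
```

Taken one pawn at a time, only Black's b-pawn is out of reach of the enemy king, while the black king is inside the quadrant of each white pawn separately. A per-pawn test like this would therefore see a runner for Black only.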
CTD gives a static evaluation for this position of -8.554, but sees quickly that this is incorrect when the search is started.
The Baron has a more refined method of using the quadrant rule when there is more than one passer for one side. As a result, the Baron's static evaluation score for this position is just -1.64.
The method used is pretty simple:
Instead of applying the quadrant rule to just one pawn, I 'and' the quadrant bitboards of all passers of one side together and test that with the king. Additionally, if the resulting bitboard does not contain all the promotion squares, I know this side will have an 'unstoppable' passer too.
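
A rough Python sketch of how I read the description above, using plain integers as 64-bit bitboards. This is an interpretation and not the Baron's actual code: the function names, the square numbering (a1 = 0, h8 = 63) and the handling of the side to move are assumptions, and "does not contain all the promotion squares" is taken to mean the promotion squares of that side's own passers.

```python
FULL = (1 << 64) - 1


def sq(name):
    """Square index 0..63 with a1 = 0 and h8 = 63."""
    return (int(name[1]) - 1) * 8 + ord(name[0]) - ord("a")


def quadrant(pawn, white, pawn_to_move):
    """Bitboard of squares from which a lone enemy king catches this pawn."""
    f, r = pawn % 8, pawn // 8
    promo_rank = 7 if white else 0
    steps = abs(promo_rank - r)
    if r == (1 if white else 6):  # double step from the starting rank
        steps -= 1
    reach = steps if pawn_to_move else steps + 1
    bb = 0
    for s in range(64):
        if max(abs(s % 8 - f), abs(s // 8 - promo_rank)) <= reach:
            bb |= 1 << s
    return bb


def has_unstoppable(passers, white, enemy_king, pawn_to_move):
    """AND the quadrants of all passers of one side, then apply both tests."""
    common, promo = FULL, 0
    for p in passers:
        common &= quadrant(p, white, pawn_to_move)
        promo |= 1 << ((7 if white else 0) * 8 + p % 8)
    king_outside = (common & (1 << enemy_king)) == 0
    promo_uncovered = (common & promo) != promo
    return king_outside or promo_uncovered


# Key position, White to move: 8/8/3P1kP1/4p3/1p5p/8/8/7K w - -
white_runs = has_unstoppable([sq("d6"), sq("g6")], True, sq("f6"), True)
black_runs = has_unstoppable([sq("b4"), sq("e5"), sq("h4")], False, sq("h1"), False)
print(white_runs, black_runs)
```

In this position the black king on f6 sits inside the combined quadrant of d6 and g6, but that combined area covers neither d8 nor g8, so White is flagged as having a runner as well. Both sides then have one, which would fit a much smaller static score than a one-sided bonus.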

The remaining question is now why it takes so long for Rybka to see that Ne3 is not really the best move. I can only imagine that this is due to the pruning logic in Rybka.
Richard.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The largest Eval Jump I've ever seen from any program!

Post by bob »

Maybe I didn't quite follow, but something I have played with recently to address just this case was that if one side has two passed pawns, and both are stopped by the king, then I go a little further and ask "if either pawn were my king, could it stop the other one?" If the answer is no, then the pawns are unstoppable since if I capture one, the other one runs. But there are zugzwang cases that cause problems as below.

This still misses cases where the pawns are close. For example pawns on c5 and a5, enemy king on b7. If either pawn moves both are lost. But if the king tries to take either, the other can run. This is a key component of the "wild7" game on ICC, and one I gave up on trying to statically evaluate.
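
A minimal Python sketch of the "if either pawn were my king" test described above. This is one possible reading and not Crafty's code: the names are invented, only white pawns are handled to keep it short, all other pieces are ignored, and the pawn side is assumed to move right after the king has captured.

```python
def sq(name):
    """Convert 'c5' to (file, rank), with a = 0 and rank 1 = 0."""
    return ord(name[0]) - ord("a"), int(name[1]) - 1


def catches(king, pawn, pawn_to_move=True):
    """Rule of the square for a white pawn on an otherwise empty board."""
    (kf, kr), (pf, pr) = king, pawn
    steps = 7 - pr - (1 if pr == 1 else 0)  # double step from rank 2
    dist = max(abs(kf - pf), 7 - kr)        # king distance to the promotion square
    return dist <= (steps if pawn_to_move else steps + 1)


def pair_is_unstoppable(pawn_a, pawn_b, enemy_king, pawn_to_move=True):
    """Two white passers against a lone king (hypothetical helper)."""
    a, b, k = sq(pawn_a), sq(pawn_b), sq(enemy_king)
    if not catches(k, a, pawn_to_move) or not catches(k, b, pawn_to_move):
        return True  # the plain rule of the square already finds a runner
    # Pretend the king has just captured one pawn and stands on its square.
    # The other pawn then moves first.
    return not catches(a, b) and not catches(b, a)


far_apart = pair_is_unstoppable("b5", "g5", "e7")
close_pair = pair_is_unstoppable("a5", "c5", "b7")
game_pair = pair_is_unstoppable("d6", "g6", "f6")
print(far_apart, close_pair, game_pair)
```

The pawns on b5 and g5 come out as unstoppable, and so do the d6 and g6 pawns from the position discussed earlier in the thread. The a5/c5 pair against a king on b7 is not flagged, which is exactly the close-pawn case this test misses.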