Stuck

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

arjuntemurnikar
Posts: 204
Joined: Tue Oct 15, 2013 10:22 pm
Location: Singapore

Re: Stuck

Post by arjuntemurnikar »

Lyudmil Tsvetkov wrote:
arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote: Maybe it is also time to bet with Arjun that a uniform 60cps bonus for a blocked pawn on the 6th rank, excluding only h6, which already has a big bonus, will pass both short and LTC. :D I.e., a uniform bonus only for blocked pawns on the 6th, excluding the 5th rank, where term clashes are bigger and the chances of going astray increase.
Ok, I have pushed two quick tests to the framework to see if it is any improvement. 60 seems very high, so I have also pushed 40. :)
Thanks Arjun.

You are my biggest benefactor and possibly grave-digger. (or do they say undertaker?) :(

Concerning values, Stefan's 260 was even more shocking, but you have to consider that 60cps in SF values is some 30cps in standard ones, so a 30cps standard bonus for the 6th rank is not big at all. The question is whether it will still clash with something...apart from the already assigned bonus for an edge h6/a6 storming pawn, which should be negligible overall. (but excluding just the h6/a6 storming pawns from this would be even more consistent). I would be very grateful if you leave at least one of the tests for STC regardless of how the values score at 5 seconds. You know, those pawns scale differently, you have the bitter storm experience. :?

Many thanks again, Arjun!
Hi Lyudmil, first note that SF eval has been scaled down in recent times so it is now in line with other engines like Houdini and Komodo. You might have noticed this in the TCEC games of stage 3.

In any case, the evaluation score you see in the GUI is different from the internal values that are juggled around. The external score is scaled down in notation.cpp by this line:

Code: Select all

ss << "cp " << v * 100 / PawnValueEg;
where v is the final value of the internal evaluation
and PawnValueEg = 258.
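
For a concrete feel of the conversion, here is a minimal standalone sketch (the internal value 155 is made up for illustration, not taken from a real search):

Code: Select all

#include <iostream>

int main() {
    const int PawnValueEg = 258;  // internal endgame value of one pawn
    int v = 155;                  // hypothetical final internal evaluation
    // Same scaling as in notation.cpp: prints "cp 60",
    // i.e. the GUI shows 60cps for an internal value of 155.
    std::cout << "cp " << v * 100 / PawnValueEg << std::endl;
    return 0;
}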

I tried excluding the storm values from it, but in local tests, this did quite poorly. For some reason, it works better the way I have submitted it. (Fingers crossed).

We will see how this goes first. I will reschedule for 15" later.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Scaling with opposite-coloured bishops

Post by Lyudmil Tsvetkov »

Look carefully at how SF plays the Berlin:

[pgn][PlyCount "263"]
[MLNrOfMoves "131"]
[MLFlags "000100"]
[Event "SCCT"]
[Date "2014.03.20"]
[Round "51.1"]
[White "Rybka 4.1 x64 1c"]
[Black "Stockfish 120314 x64 1c"]
[Result "1/2-1/2"]
[EventDate "2014.??.??"]
[ECO "C67"]
[TimeControl "60+1"]


{Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz 2494 MHz W=18.7 plies; 111kN/s; 41. 656 TBAs; Perfect2014t.ctg B=28.3 plies; 1.494kN/s; Perfect2014t.ctg}
1. e4 {[%eval 0,0] [%emt 0:00:00]} 1... e5 {[%eval 0,0] [%emt 0:00:00]} 2. Nf3
{[%eval 0,0] [%emt 0:00:00]} 2... Nc6 {[%eval 0,0] [%emt 0:00:00]} 3. Bb5
{[%eval 0,0] [%emt 0:00:00]} 3... Nf6 {[%eval 0,0] [%emt 0:00:00]} 4. O-O
{[%eval 0,0] [%emt 0:00:00]} 4... Nxe4 {[%eval 0,0] [%emt 0:00:00]} 5. d4
{[%eval 0,0] [%emt 0:00:00]} 5... Nd6 {[%eval 0,0] [%emt 0:00:00]} 6. Bxc6
{[%eval 0,0] [%emt 0:00:00]} 6... dxc6 {[%eval 0,0] [%emt 0:00:00]} 7. dxe5
{[%eval 0,0] [%emt 0:00:00]} 7... Nf5 {[%eval 0,0] [%emt 0:00:00]} 8. Qxd8+
{[%eval 0,0] [%emt 0:00:00]} 8... Kxd8 {[%eval 0,0] [%emt 0:00:00]} 9. Rd1+
{[%eval 15,13] [%emt 0:00:03]} 9... Ke8 {[%eval 80,18] [%emt 0:00:02]} 10. Nc3
{[%eval 20,15] [%emt 0:00:03]} 10... Be6 {(h6) [%eval 72,19] [%emt 0:00:00]} 11.
Ng5 {(b3) [%eval 23,14] [%emt 0:00:06]} 11... Be7
{(Bc8) [%eval 66,20] [%emt 0:00:03]} 12. Nxe6 {[%eval 36,15] [%emt 0:00:05]}
12... fxe6 {[%eval 71,23] [%emt 0:00:00]} 13. Ne4 {[%eval 36,15] [%emt 0:00:06]}
13... Rf8 {(Rd8) [%eval 68,22] [%emt 0:00:00]} 14. f4
{(f3) [%eval 55,13] [%emt 0:00:06]} 14... Rd8 {[%eval 90,18] [%emt 0:00:04]} 15.
Rxd8+ {[%eval 49,15] [%emt 0:00:00]} 15... Kxd8 {[%eval 73,19] [%emt 0:00:02]}
16. Kf2 {[%eval 49,15] [%emt 0:00:01]} 16... h6
{(Kc8) [%eval 68,20] [%emt 0:00:03]} 17. Bd2 {(c3) [%eval 47,14] [%emt 0:00:04]}
17... c5 {(h5) [%eval 86,20] [%emt 0:00:08]} 18. g4
{[%eval 40,15] [%emt 0:00:13]} 18... Nd4 {(Nh4) [%eval 35534,23] [%emt 0:00:00]}
19. c3 {[%eval 62,16] [%emt 0:00:04]} 19... Nc6 {[%eval 91,22] [%emt 0:00:01]}
20. Kg3 {[%eval 61,16] [%emt 0:00:01]} 20... Rh8
{(Kc8) [%eval 91,22] [%emt 0:00:01]} 21. Be3 {(h3) [%eval 58,13] [%emt 0:00:04]}
21... b6 {[%eval 96,24] [%emt 0:00:02]} 22. Rd1+ {[%eval 61,14] [%emt 0:00:00]}
22... Kc8 {[%eval 96,26] [%emt 0:00:01]} 23. h4
{(b3) [%eval 60,14] [%emt 0:00:00]} 23... h5 {[%eval 93,18] [%emt 0:00:03]} 24.
Bf2 {[%eval 56,14] [%emt 0:00:00]} 24... a6
{(Rf8) [%eval 100,20] [%emt 0:00:03]} 25. Rg1
{(gxh5) [%eval 61,14] [%emt 0:00:03]} 25... hxg4 {[%eval 108,19] [%emt 0:00:03]}
26. Kxg4 {[%eval 61,15] [%emt 0:00:00]} 26... Rh6
{(Rf8) [%eval 98,20] [%emt 0:00:04]} 27. Rd1
{(Kh3) [%eval 61,15] [%emt 0:00:04]} 27... Bd8
{(Rh8) [%eval 103,20] [%emt 0:00:03]} 28. b3 {[%eval 61,14] [%emt 0:00:04]}
28... Be7 {(Rh8) [%eval 110,20] [%emt 0:00:00]} 29. Bg3
{[%eval 61,15] [%emt 0:00:04]} 29... g6 {(Rh8) [%eval 93,20] [%emt 0:00:00]} 30.
Bf2 {(Ng5) [%eval 61,15] [%emt 0:00:04]} 30... Rh8
{[%eval 86,23] [%emt 0:00:04]} 31. Rg1 {[%eval 59,15] [%emt 0:00:00]} 31... Kd7
{(Rf8) [%eval 86,24] [%emt 0:00:02]} 32. Kh3
{(Kf3) [%eval 59,14] [%emt 0:00:04]} 32... Rh6
{(Rf8) [%eval 86,25] [%emt 0:00:02]} 33. Rd1+ {[%eval 61,14] [%emt 0:00:02]}
33... Ke8 {(Kc8) [%eval 86,25] [%emt 0:00:00]} 34. Kg4
{[%eval 60,14] [%emt 0:00:02]} 34... Rh7 {(Rh5) [%eval 86,26] [%emt 0:00:00]}
35. a3 {(Bg3) [%eval 68,14] [%emt 0:00:03]} 35... Na5
{(Nd8) [%eval 85,22] [%emt 0:00:05]} 36. Nd2 {[%eval 57,13] [%emt 0:00:02]}
36... Nc6 {[%eval 85,21] [%emt 0:00:00]} 37. b4 {[%eval 68,15] [%emt 0:00:02]}
37... cxb4 {[%eval 71,21] [%emt 0:00:01]} 38. cxb4
{[%eval 68,15] [%emt 0:00:00]} 38... Kd7 {(a5) [%eval 85,23] [%emt 0:00:03]} 39.
Ne4+ {[%eval 107,15] [%emt 0:00:02]} 39... Kc8 {[%eval 85,25] [%emt 0:00:00]}
40. Nf6 {[%eval 100,15] [%emt 0:00:02]} 40... Rh8
{(Rg7) [%eval 75,23] [%emt 0:00:00]} 41. Kg5 {[%eval 84,13] [%emt 0:00:02]}
41... Rg8 {[%eval 75,24] [%emt 0:00:00]} 42. Rc1
{(Be3) [%eval 84,12] [%emt 0:00:02]} 42... Kb7 {[%eval 74,25] [%emt 0:00:02]}
43. Re1 {(Rg1) [%eval 70,15] [%emt 0:00:00]} 43... Nd8
{[%eval 63,21] [%emt 0:00:04]} 44. Kg4 {(Rd1) [%eval 70,17] [%emt 0:00:00]}
44... Rf8 {[%eval 63,23] [%emt 0:00:02]} 45. Be3
{(Rd1) [%eval 70,15] [%emt 0:00:00]} 45... Nc6
{(Rh8) [%eval 74,20] [%emt 0:00:05]} 46. Kg5 {[%eval 77,14] [%emt 0:00:02]}
46... Rg8 {[%eval 74,26] [%emt 0:00:00]} 47. Rd1
{(Rg1) [%eval 77,15] [%emt 0:00:02]} 47... Kc8 {[%eval 74,26] [%emt 0:00:02]}
48. Kg4 {(Rc1) [%eval 77,15] [%emt 0:00:00]} 48... Rh8
{[%eval 64,20] [%emt 0:00:03]} 49. Rc1 {(Rd3) [%eval 77,15] [%emt 0:00:00]}
49... Kb7 {[%eval 47,23] [%emt 0:00:03]} 50. Kg5
{(Rc3) [%eval 77,17] [%emt 0:00:00]} 50... Rg8 {[%eval 74,26] [%emt 0:00:02]}
51. Rg1 {[%eval 77,16] [%emt 0:00:00]} 51... Kc8
{(Nd8) [%eval 74,26] [%emt 0:00:02]} 52. Rg2
{(Bf2) [%eval 70,15] [%emt 0:00:03]} 52... Kb7 {[%eval 74,25] [%emt 0:00:02]}
53. Bf2 {(Rd2) [%eval 75,15] [%emt 0:00:00]} 53... Ka7
{(Nd8) [%eval 74,25] [%emt 0:00:03]} 54. Rg3
{(Rg1) [%eval 77,13] [%emt 0:00:01]} 54... Kb7 {[%eval 74,26] [%emt 0:00:02]}
55. Rd3 {[%eval 71,15] [%emt 0:00:00]} 55... Kc8 {[%eval 74,26] [%emt 0:00:01]}
56. Kg4 {(Rc3) [%eval 70,15] [%emt 0:00:00]} 56... Rf8
{(Rh8) [%eval 74,26] [%emt 0:00:02]} 57. Rc3 {[%eval 75,16] [%emt 0:00:02]}
57... Kb7 {[%eval 74,27] [%emt 0:00:00]} 58. Kg5 {[%eval 75,15] [%emt 0:00:01]}
58... Rg8 {[%eval 74,28] [%emt 0:00:00]} 59. Rg3 {[%eval 77,15] [%emt 0:00:01]}
59... Kc8 {[%eval 74,27] [%emt 0:00:02]} 60. Be1
{(Be3) [%eval 72,15] [%emt 0:00:01]} 60... Nd4 {[%eval 39,21] [%emt 0:00:02]}
61. Rd3 {[%eval 72,14] [%emt 0:00:00]} 61... Nf5 {[%eval 38,23] [%emt 0:00:02]}
62. Bf2 {[%eval 72,15] [%emt 0:00:00]} 62... Rg7 {[%eval 35,24] [%emt 0:00:01]}
63. Rc3 {(Rb3) [%eval 58,15] [%emt 0:00:01]} 63... Kb7
{[%eval 35,24] [%emt 0:00:01]} 64. Rc1 {[%eval 58,16] [%emt 0:00:00]} 64... Bf8
{[%eval 35,24] [%emt 0:00:01]} 65. Rd1 {[%eval 48,15] [%emt 0:00:01]} 65... Be7
{(Rf7) [%eval 35,25] [%emt 0:00:00]} 66. Be1
{(Rg1) [%eval 52,14] [%emt 0:00:02]} 66... c5 {[%eval 13,21] [%emt 0:00:03]} 67.
bxc5 {[%eval 35534,15] [%emt 0:00:00]} 67... Bxc5 {[%eval 96,20] [%emt 0:00:02]}
68. Rd7+ {[%eval 88,13] [%emt 0:00:00]} 68... Rxd7 {[%eval 0,21] [%emt 0:00:02]}
69. Nxd7 {[%eval 98,14] [%emt 0:00:00]} 69... Bxa3
{[%eval 47,21] [%emt 0:00:03]} 70. Kxg6 {[%eval 98,15] [%emt 0:00:00]} 70... Kc6
{(b5) [%eval 38,21] [%emt 0:00:02]} 71. Nb8+ {[%eval 73,16] [%emt 0:00:03]}
71... Kd5 {[%eval 26,23] [%emt 0:00:00]} 72. Nxa6
{[%eval 35534,17] [%emt 0:00:03]} 72... Ke4 {[%eval 29,25] [%emt 0:00:00]} 73.
Kf6 {(h5) [%eval 89,17] [%emt 0:00:03]} 73... Kxf4
{[%eval 26,23] [%emt 0:00:02]} 74. Nc7 {[%eval 89,17] [%emt 0:00:00]} 74... Kg4
{[%eval 26,24] [%emt 0:00:01]} 75. h5 {(Kxe6) [%eval 89,16] [%emt 0:00:00]}
75... Kxh5 {[%eval 26,22] [%emt 0:00:01]} 76. Kxe6
{(Bc3) [%eval 89,16] [%emt 0:00:00]} 76... Nd4+
{(Kg4) [%eval 23,23] [%emt 0:00:01]} 77. Kd5 {[%eval 82,16] [%emt 0:00:03]}
77... Nf3 {(Bb2) [%eval 22,28] [%emt 0:00:00]} 78. Bc3
{(Bf2) [%eval 87,16] [%emt 0:00:02]} 78... Kg4
{(b5) [%eval 22,26] [%emt 0:00:01]} 79. Nb5 {(e6) [%eval 89,18] [%emt 0:00:02]}
79... Bc5 {[%eval 21,27] [%emt 0:00:01]} 80. e6 {[%eval 89,19] [%emt 0:00:00]}
80... Ng5 {[%eval 21,28] [%emt 0:00:01]} 81. Bd2
{(Nd4) [%eval 88,19] [%emt 0:00:00]} 81... Nxe6 {[%eval 21,30] [%emt 0:00:01]}
82. Kxe6 {[%eval 88,23] [%emt 0:00:00]} 82... Kf3 {[%eval 21,31] [%emt 0:00:01]}
83. Kd5 {[%eval 87,22] [%emt 0:00:00]} 83... Ke2 {[%eval 21,31] [%emt 0:00:01]}
84. Bc3 {[%eval 87,23] [%emt 0:00:01]} 84... Kd3 {[%eval 21,33] [%emt 0:00:00]}
85. Be5 {(Bf6) [%eval 84,22] [%emt 0:00:02]} 85... Be3
{[%eval 21,32] [%emt 0:00:01]} 86. Bh8 {(Nd6) [%eval 84,20] [%emt 0:00:00]}
86... Bf4 {(Bc1) [%eval 21,32] [%emt 0:00:01]} 87. Bf6
{(Bd4) [%eval 82,23] [%emt 0:00:03]} 87... Bd2
{(Bc1) [%eval 21,37] [%emt 0:00:01]} 88. Bd4
{(Be7) [%eval 82,21] [%emt 0:00:02]} 88... Be3
{(Kc2) [%eval 21,35] [%emt 0:00:01]} 89. Be5
{(Bf6) [%eval 82,23] [%emt 0:00:01]} 89... Bc5
{(Bc1) [%eval 21,34] [%emt 0:00:01]} 90. Ba1
{(Bf4) [%eval 82,24] [%emt 0:00:02]} 90... Be3 {[%eval 21,39] [%emt 0:00:01]}
91. Bb2 {[%eval 82,23] [%emt 0:00:00]} 91... Bd2
{(Kc2) [%eval 21,37] [%emt 0:00:01]} 92. Bf6 {[%eval 82,24] [%emt 0:00:02]}
92... Be3 {(Bc1) [%eval 21,40] [%emt 0:00:00]} 93. Be7
{(Bd8) [%eval 82,24] [%emt 0:00:01]} 93... Bd2
{(Bc1) [%eval 21,39] [%emt 0:00:01]} 94. Bd8
{(Bd6) [%eval 82,25] [%emt 0:00:02]} 94... Be3
{(Bc3) [%eval 21,37] [%emt 0:00:01]} 95. Bh4
{(Be7) [%eval 82,25] [%emt 0:00:02]} 95... Bf4
{(Bc1) [%eval 21,34] [%emt 0:00:01]} 96. Bf2 {[%eval 82,25] [%emt 0:00:01]}
96... Be3 {(Bc1) [%eval 21,34] [%emt 0:00:00]} 97. Bg3
{[%eval 82,24] [%emt 0:00:01]} 97... Bd2 {(Bc1) [%eval 21,38] [%emt 0:00:00]}
98. Bh2 {(Bc7) [%eval 82,23] [%emt 0:00:01]} 98... Be3
{[%eval 21,37] [%emt 0:00:02]} 99. Bb8 {(Bd6) [%eval 82,26] [%emt 0:00:00]}
99... Bc5 {[%eval 21,43] [%emt 0:00:01]} 100. Bg3
{(Bf4) [%eval 82,26] [%emt 0:00:00]} 100... Bb4
{(Be3) [%eval 21,33] [%emt 0:00:01]} 101. Bf4
{(Bh4) [%eval 82,24] [%emt 0:00:02]} 101... Bd2
{(Bc3) [%eval 21,31] [%emt 0:00:01]} 102. Bc7
{(Bd6) [%eval 82,26] [%emt 0:00:01]} 102... Be3
{(Bc3) [%eval 21,39] [%emt 0:00:01]} 103. Bd6
{(Bd8) [%eval 82,25] [%emt 0:00:02]} 103... Bd2
{(Bc1) [%eval 21,37] [%emt 0:00:01]} 104. Be7
{(Nc7) [%eval 82,25] [%emt 0:00:01]} 104... Be3
{(Bc3) [%eval 21,40] [%emt 0:00:01]} 105. Bf8
{(Bd8) [%eval 82,25] [%emt 0:00:01]} 105... Bc1 {[%eval 21,34] [%emt 0:00:01]}
106. Bg7 {[%eval 82,24] [%emt 0:00:00]} 106... Bg5
{(Kc2) [%eval 21,35] [%emt 0:00:01]} 107. Bc3
{(Na7) [%eval 82,24] [%emt 0:00:01]} 107... Bd2
{(Bf4) [%eval 21,31] [%emt 0:00:01]} 108. Bb2 {[%eval 82,23] [%emt 0:00:01]}
108... Be3 {(Kc2) [%eval 21,35] [%emt 0:00:00]} 109. Ba1
{(Na7) [%eval 81,24] [%emt 0:00:01]} 109... Bd2
{(Bg5) [%eval 19,28] [%emt 0:00:01]} 110. Kc6 {[%eval 81,24] [%emt 0:00:01]}
110... Be3 {(Ke4) [%eval 21,32] [%emt 0:00:00]} 111. Be5
{(Bh8) [%eval 81,25] [%emt 0:00:01]} 111... Bc5
{(Ke4) [%eval 19,32] [%emt 0:00:01]} 112. Bf6
{(Bg7) [%eval 81,22] [%emt 0:00:01]} 112... Kc4
{(Ke4) [%eval 19,36] [%emt 0:00:01]} 113. Bh4
{(Bh8) [%eval 82,22] [%emt 0:00:01]} 113... Kb4
{(Bd4) [%eval 19,36] [%emt 0:00:02]} 114. Bg3
{(Nc7) [%eval 81,23] [%emt 0:00:01]} 114... Kc4 {[%eval 19,32] [%emt 0:00:01]}
115. Bf4 {(Na7) [%eval 81,23] [%emt 0:00:00]} 115... Kb4
{(Bb4) [%eval 19,35] [%emt 0:00:01]} 116. Bb8
{(Be5) [%eval 81,22] [%emt 0:00:01]} 116... Kc4
{(Be3) [%eval 19,33] [%emt 0:00:01]} 117. Bg3
{(Bh2) [%eval 81,24] [%emt 0:00:01]} 117... Kd3
{(Bb4) [%eval 19,33] [%emt 0:00:01]} 118. Be1
{(Na7) [%eval 81,25] [%emt 0:00:02]} 118... Kc4 {[%eval 19,38] [%emt 0:00:01]}
119. Bc3 {(Nc7) [%eval 81,27] [%emt 0:00:00]} 119... Be3
{[%eval 19,37] [%emt 0:00:01]} 120. Bg7 {(Be5) [%eval 81,26] [%emt 0:00:00]}
120... Bc5 {(Kd3) [%eval 19,37] [%emt 0:00:01]} 121. Bf6
{(Bh8) [%eval 81,25] [%emt 0:00:01]} 121... Kb4
{(Bb4) [%eval 19,34] [%emt 0:00:01]} 122. Be5 {[%eval 81,25] [%emt 0:00:02]}
122... Kc4 {(Be3) [%eval 19,38] [%emt 0:00:00]} 123. Bb2
{(Bh2) [%eval 81,25] [%emt 0:00:01]} 123... Bg1
{(Kd3) [%eval 11,32] [%emt 0:00:07]} 124. Bc1
{(Nd6+) [%eval 81,22] [%emt 0:00:01]} 124... Bf2
{(Kd3) [%eval 0,23] [%emt 0:00:01]} 125. Bf4
{(Nc7) [%eval 81,23] [%emt 0:00:01]} 125... Kb4
{(Bg1) [%eval 0,33] [%emt 0:00:01]} 126. Bg5
{(Nc7) [%eval 81,21] [%emt 0:00:01]} 126... Bg1
{(Bc5) [%eval 0,31] [%emt 0:00:01]} 127. Bd2+
{(Nd6) [%eval 81,21] [%emt 0:00:01]} 127... Kc4 {[%eval 0,35] [%emt 0:00:01]}
128. Bf4 {[%eval 81,22] [%emt 0:00:00]} 128... Kb4
{(Bc5) [%eval 0,38] [%emt 0:00:01]} 129. Bd6+
{(Nc7) [%eval 81,20] [%emt 0:00:02]} 129... Bc5 {[%eval 0,34] [%emt 0:00:01]}
130. Bc7 {(Be5) [%eval 0,21] [%emt 0:00:00]} 130... Bf2
{(Ka4) [%eval 0,56] [%emt 0:00:01]} 131. Nd6
{(Bd6+) [%eval 0,26] [%emt 0:00:01]} 131... Bg1
{(Bc5) [%eval 0,71] [%emt 0:00:01]} 132. Ne4 {[%eval 0,36] [%emt 0:00:01]}
1/2-1/2
[/pgn]

[d]r1bk1b1r/ppp2ppp/2p5/4Pn2/8/5N2/PPP2PPP/RNB2RK1 w - - 0 9
Well, this is certainly not an endgame. But the queens are off. Rybka, very realistically, evaluates this main Berlin position at a 15cps white edge; SF sees an 80cps white advantage!! I am not a theoretician at all, but people, Kramnik among them, say that the Berlin is supposed to be a very sound opening, drawish at that, so those whopping 80cps really amaze. I would be glad if anyone can show that the latest development behaves otherwise.

[d]r1b1kb1r/ppp2ppp/2p5/4Pn2/8/2N2N2/PPP2PPP/R1BR2K1 b - - 0 10
One of the drawish lines has Black playing Bd7, Be7 and later exchanging the dark-squared bishop for the white knight on g5, so that an opposite-coloured bishops ending arises. SF quite probably does not choose this line, as it does not scale the score properly, so it believes the opposite-coloured bishops ending, with a white pawn majority on the kingside and doubled black pawns on the queenside, is lost for Black. Wrongly so, because the opposite-coloured bishops hold.
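
For readers who have not looked at this part of an eval: the usual mechanism for what is described here is an endgame scale factor that damps the score when the remaining bishops are of opposite colours. A rough sketch of the idea, with every name and constant invented for illustration (this is not Stockfish's actual code):

Code: Select all

// Hypothetical endgame score scaling; 64 means "no scaling".
int scale_endgame_score(int egScore, bool oppositeBishops, int pawnsAhead) {
    int scale = 64;
    if (oppositeBishops)
        // Opposite-coloured bishops are very drawish: shrink the advantage
        // sharply, somewhat less so if the stronger side is pawns up.
        scale = pawnsAhead > 0 ? 32 : 16;
    return egScore * scale / 64;
}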

[d]2k4r/p1p1b1p1/1pn1p2p/2p1P3/4NPP1/2P1B1K1/PP5P/3R4 w - - 0 23
Instead of an opposite-coloured bishops ending, SF enters a same-coloured bishops ending, with the same pawn characteristics and probably some minor differences. Here Rybka gives a 60cps white edge, SF a full pawn! Now, it is true that SF did not lose the game, probably because Rybka went wrong somewhere, but frankly, why should SF suffer so much here, when it could easily have chosen the much better opposite-coloured bishops line?

[d]7r/2pkb3/ppn1p1p1/2p1P3/4NPKP/1PP5/P4B2/6R1 w - - 0 32
I will not comment; SF saves this somehow, but this is suffering.

Do you believe SF plays the Berlin well?

Maybe someone will post other examples of SF in the Berlin; that would be a nice contribution.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Scaling with opposite-coloured bishops

Post by Lyudmil Tsvetkov »

[d]r3k2r/pppb1pp1/2p4p/4Pn2/5B2/2N5/PPP2PPP/R2R2K1 b - - 0 14

It would be interesting to know how SF evaluates the position above; it should be pretty much drawish. Or maybe someone will post a more relevant Berlin position.

In any case, this should be a good position for judging the necessity and amount of scaling in a relatively rich position without queens, on the middle ground between middlegame and endgame. Quite possibly, in positions without queens, the scaling percentage must be much bolder, but who knows; you need a lot of testing.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Stuck

Post by Lyudmil Tsvetkov »

arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote:
arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote: Maybe it is also time to bet with Arjun that a uniform 60cps bonus for a blocked pawn on the 6th rank, excluding only h6, which already has a big bonus, will pass both short and LTC. :D I.e., a uniform bonus only for blocked pawns on the 6th, excluding the 5th rank, where term clashes are bigger and the chances of going astray increase.
Ok, I have pushed two quick tests to the framework to see if it is any improvement. 60 seems very high, so I have also pushed 40. :)
Thanks Arjun.

You are my biggest benefactor and possibly grave-digger. (or do they say undertaker?) :(

Concerning values, Stefan's 260 was even more shocking, but you have to consider that 60cps in SF values is some 30cps in standard ones, so a 30cps standard bonus for the 6th rank is not big at all. The question is whether it will still clash with something...apart from the already assigned bonus for an edge h6/a6 storming pawn, which should be negligible overall. (but excluding just the h6/a6 storming pawns from this would be even more consistent). I would be very grateful if you leave at least one of the tests for STC regardless of how the values score at 5 seconds. You know, those pawns scale differently, you have the bitter storm experience. :?

Many thanks again, Arjun!
Hi Lyudmil, first note that SF eval has been scaled down in recent times so it is now in line with other engines like Houdini and Komodo. You might have noticed this in the TCEC games of stage 3.

In any case, the evaluation score you see in the GUI is different from the internal values that are juggled around. The external score is scaled down in notation.cpp by this line:

Code: Select all

ss << "cp " << v * 100 / PawnValueEg;
where v is the final value of the internal evaluation
and PawnValueEg = 258.

I tried excluding the storm values from it, but in local tests, this did quite poorly. For some reason, it works better the way I have submitted it. (Fingers crossed).

We will see how this goes first. I will reschedule for 15" later.
Now Arjun, this is a bit murky for me, let us see how the test goes...
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Scaling with opposite-coloured bishops

Post by Lyudmil Tsvetkov »

Again, the paradox, and also the reason why bolder scaling might not work in tests, is that it is safe to scale this one:

[d]rnb1k1nr/pppp1ppp/8/8/8/8/PPPPPPPP/RNB1K1NR w KQkq - 0 1

but not this one:

[d]2bqk3/pppp1ppp/8/8/8/8/PPPPPPPP/2BQK3 w - - 0 1

Indeed, how safe is it to scale this one:

[d]6k1/1p2qpp1/p3b2p/8/8/P1BQP3/1P3PPP/6K1 w - - 0 1

From that perspective, at least to my current limited understanding, the decision whether to scale, and by how much, depends very much on total material, as you scale more with less material, but even more so on the presence of queens on the board. I would never scale with queens on, but maybe a limited scaling is warranted even in that case. What is certain, however, is that queenless positions, middlegame or endgame, could be scaled much more boldly.
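
A minimal sketch of the kind of rule proposed here, with queen presence and remaining non-pawn material deciding how boldly to pull the score toward the draw; every name and threshold below is invented for illustration:

Code: Select all

// Hypothetical drawishness scaling: never scale with queens on the board,
// scale queenless positions more boldly the less material remains.
int scale_toward_draw(int score, bool queensOnBoard, int nonPawnMaterial) {
    if (queensOnBoard)
        return score;                                // no scaling with queens on
    // Queenless: between 100% (full material) and 50% (bare endgame).
    int pct = 50 + 50 * nonPawnMaterial / 6000;      // 6000 ~ full non-pawn material
    if (pct > 100)
        pct = 100;
    return score * pct / 100;
}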
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Interpretation request

Post by Lyudmil Tsvetkov »

I do not know if this is the right thread, but I will ask here.

You know, I am very curious, and that is a bad thing.

Still, below is the data from the last 3 standard and SMP (3 threads) SF regression tests:

27.02.2014 standard 39.25 elo SMP 37.55 difference 1.70 elo

25.03.2014 standard 43.70 elo SMP 39.24 difference 4.46 elo

26.04.2014 standard 57.08 elo SMP 51.04 difference 6.04 elo

As you see, the difference between the standard single-core and SMP tests rises continuously, with some peculiar behaviour. What do I observe?

On 27.02, at 40 elo increase over SF DD, SF loses only 1.7 elo with SMP; on 26.04, at almost 60 elo increase over DD, 1/3 of the overall increase, SF already loses 6 elo with SMP; from 27.02 to 25.03 SF adds 4.5 elo and loses 2.76 elo at the same time with SMP, obviously the most macabre scaling period. From 25.03 to 26.04 SF adds 13.40 elo standard and loses 1.60 elo with SMP.

So, basically, the SMP performance on 27.02 is 3 times better than the SMP performance from 25.03 to 26.04. The SMP performance from 27.02 to 25.03 is absolutely disastrous for some reason.

Bearing in mind that SMP performance is very much tantamount to scalability, and closely related to performance at longer TC, it is clear that the changes introduced from DD until 27.02 scale much better than the changes introduced in the last month, and even more so than the changes introduced from February till the end of March.

How do you explain this peculiar behaviour? What were the changes that scaled so well in the beginning, and what are the changes that scale considerably less well now? What were the changes from February till March that scaled so extremely badly? Could we learn a lesson by looking at the history page about which changes are good to make and which are not, so that the scalability of SF is optimal?

Any comments are very much appreciated. I am new and weak in this area, so maybe people will enlighten me. At the end of the day, we want an engine that scales best and plays best at longer TC and on multiple threads, as under TCEC conditions.
arjuntemurnikar
Posts: 204
Joined: Tue Oct 15, 2013 10:22 pm
Location: Singapore

Re: Interpretation request

Post by arjuntemurnikar »

Lyudmil Tsvetkov wrote:I do not know if this is the right thread, but I will ask here.

You know, I am very curious, and that is a bad thing.

Still, below is the data from the last 3 standard and SMP (3 threads) SF regression tests:

27.02.2014 standard 39.25 elo SMP 37.55 difference 1.70 elo

25.03.2014 standard 43.70 elo SMP 39.24 difference 4.46 elo

26.04.2014 standard 57.08 elo SMP 51.04 difference 6.04 elo

As you see, the difference between the standard single-core and SMP tests rises continuously, with some peculiar behaviour. What do I observe?

On 27.02, at 40 elo increase over SF DD, SF loses only 1.7 elo with SMP; on 26.04, at almost 60 elo increase over DD, 1/3 of the overall increase, SF already loses 6 elo with SMP; from 27.02 to 25.03 SF adds 4.5 elo and loses 2.76 elo at the same time with SMP, obviously the most macabre scaling period. From 25.03 to 26.04 SF adds 13.40 elo standard and loses 1.60 elo with SMP.

So, basically, the SMP performance on 27.02 is 3 times better than the SMP performance from 25.03 to 26.04. The SMP performance from 27.02 to 25.03 is absolutely disastrous for some reason.

Bearing in mind that SMP performance is very much tantamount to scalability, and closely related to performance at longer TC, it is clear that the changes introduced from DD until 27.02 scale much better than the changes introduced in the last month, and even more so than the changes introduced from February till the end of March.

How do you explain this peculiar behaviour? What were the changes that scaled so well in the beginning, and what are the changes that scale considerably less well now? What were the changes from February till March that scaled so extremely badly? Could we learn a lesson by looking at the history page about which changes are good to make and which are not, so that the scalability of SF is optimal?

Any comments are very much appreciated. I am new and weak in this area, so maybe people will enlighten me. At the end of the day, we want an engine that scales best and plays best at longer TC and on multiple threads, as under TCEC conditions.
That is not correct.

When TC/depth increases, the draw ratio increases. The Elo calculation takes the draw ratio into account, so naturally, since SMP games tend to reach deeper depths, the Elo gain is lower because there are more draws. (If you look carefully, the win-loss ratio remains quite consistent.)

Also, as the elo difference between engines increases (e.g. SF DD vs master is now ~60 elo), the elo curve starts flattening out slightly. It flattens out even more for SMP. It is a normal thing. Nothing to worry about. :)

That is why the relative strength of engines at the TCEC time control (with 16 cores) is much closer than on rating lists at bullet/blitz TC on standard hardware. The draw ratio is orders of magnitude higher in TCEC than in, say, the LSratinglist. That's why you have so many close encounters even between engines that are 100-200 elo apart on standard rating lists. As TC/depth increases, the elo difference decreases.
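
To put a number on the draw-ratio effect, here is a small self-contained calculation (the game counts are invented): both samples have the same 2:1 win-loss ratio, but the sample with more draws yields a much smaller Elo difference.

Code: Select all

#include <cmath>
#include <cstdio>

// Elo difference implied by a score fraction (0 < score < 1).
double elo_from_score(double score) {
    return -400.0 * std::log10(1.0 / score - 1.0);
}

int main() {
    // Two hypothetical 10000-game samples, both with a 2:1 win-loss ratio.
    double lowDraws  = (4000 + 4000 / 2.0) / 10000.0;  // W=4000 D=4000 L=2000
    double highDraws = (1000 + 8500 / 2.0) / 10000.0;  // W=1000 D=8500 L=500
    std::printf("40%% draws: %+.1f Elo\n", elo_from_score(lowDraws));   // ~ +70.4
    std::printf("85%% draws: %+.1f Elo\n", elo_from_score(highDraws));  // ~ +17.5
    return 0;
}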
Last edited by arjuntemurnikar on Fri May 02, 2014 3:21 am, edited 1 time in total.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Interpretation request

Post by Lyudmil Tsvetkov »

OK, some statistics from the history page, mentally done:

from DD to 27.02 there are 25 successful patches: 10 eval, 8 search, 2 time management, one speed optimization, and 4 whose category I do not know, mainly Uri's patches. Of those, 21 patches were tested and passed with SPRT 0;6, 1 patch passed SPRT -3;1, and 4 patches, mainly Uri's simplifications, passed SPRT -4;0. Overall, 80% passed SPRT 0;6.

from 27.02 to 25.03 there are 4 successful patches, 1 eval, 1 search (restore old window size), one technical (TT replace) and 1 contempt. Of those, 2 patches passed SPRT 0;6 and 2 SPRT -3;1. Overall, 50% passed SPRT 0;6.

from 25.03 to 26.04 there are 17 successful patches, almost all of them eval, plus 3 or 4 contempt patches. Of those 17, 9 passed SPRT -3;1 and 8 SPRT 0;6. Overall, close to 50% passed SPRT 0;6.

Do you find this data of any interest and how do you comment on it?

My first impression is that eval patches since SF DD outnumber search patches by a factor of two or more. (but search patches might carry a bigger relative weight, I do not know)

My second impression is that the most successful scaling period is linked to the highest percentage of patches passing SPRT 0;6; simplification patches hardly existed then. The least successful period, as well as the most recent one, are linked to patches half of which passed SPRT -3;1, the simplification test. An obvious conclusion would be that SPRT -3;1 simplification tests are conducive to scaling deficiency: the more you simplify, the worse the engine scales with multiple threads, and by extension also with more time.
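
For readers who have not seen the bounds written out: an SPRT [elo0, elo1] test keeps playing games and stops as soon as a log-likelihood ratio crosses an acceptance or rejection bound; [0, 6] asks the patch to be a gain, while [-3, 1] only asks it not to be a regression. A rough self-contained sketch of the classic trinomial version (not fishtest's exact implementation; the drawElo constant here is made up):

Code: Select all

#include <cmath>

struct WDLProb { double win, draw, loss; };

// Win/draw/loss probabilities for a given (Bayes)Elo difference.
WDLProb probs(double elo, double drawElo) {
    double w = 1.0 / (1.0 + std::pow(10.0, (-elo + drawElo) / 400.0));
    double l = 1.0 / (1.0 + std::pow(10.0, ( elo + drawElo) / 400.0));
    return { w, 1.0 - w - l, l };
}

// Log-likelihood ratio of H1 (true strength elo1) vs H0 (elo0) given the
// observed game counts. With alpha = beta = 0.05 the test passes when
// llr > ln(19) ~ 2.94 and fails when llr < -2.94; otherwise keep playing.
double llr(int W, int D, int L, double elo0, double elo1, double drawElo = 200.0) {
    WDLProb p0 = probs(elo0, drawElo), p1 = probs(elo1, drawElo);
    return W * std::log(p1.win  / p0.win)
         + D * std::log(p1.draw / p0.draw)
         + L * std::log(p1.loss / p0.loss);
}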

It is also notable that an almost equal number of successful patches in the first period contributed 3 times as many elo as the patches in the last period. Maybe it is because of search patches, maybe because of simplification.
arjuntemurnikar
Posts: 204
Joined: Tue Oct 15, 2013 10:22 pm
Location: Singapore

Re: Interpretation request

Post by arjuntemurnikar »

Lyudmil Tsvetkov wrote:OK, some statistics from the history page, mentally done:

from DD to 27.02 there are 25 successful patches: 10 eval, 8 search, 2 time management, one speed optimization, and 4 whose category I do not know, mainly Uri's patches. Of those, 21 patches were tested and passed with SPRT 0;6, 1 patch passed SPRT -3;1, and 4 patches, mainly Uri's simplifications, passed SPRT -4;0. Overall, 80% passed SPRT 0;6.

from 27.02 to 25.03 there are 4 successful patches, 1 eval, 1 search (restore old window size), one technical (TT replace) and 1 contempt. Of those, 2 patches passed SPRT 0;6 and 2 SPRT -3;1. Overall, 50% passed SPRT 0;6.

from 25.03 to 26.04 there are 17 successful patches, almost all of them eval, plus 3 or 4 contempt patches. Of those 17, 9 passed SPRT -3;1 and 8 SPRT 0;6. Overall, close to 50% passed SPRT 0;6.

Do you find this data of any interest and how do you comment on it?

My first impression is that eval patches since SF DD outnumber search patches by a factor of two or more. (but search patches might carry a bigger relative weight, I do not know)

My second impression is that the most successful scaling period is linked to the highest percentage of patches passing SPRT 0;6; simplification patches hardly existed then. The least successful period, as well as the most recent one, are linked to patches half of which passed SPRT -3;1, the simplification test. An obvious conclusion would be that SPRT -3;1 simplification tests are conducive to scaling deficiency: the more you simplify, the worse the engine scales with multiple threads, and by extension also with more time.

It is also notable that an almost equal number of successful patches in the first period contributed 3 times as many elo as the patches in the last period. Maybe it is because of search patches, maybe because of simplification.
And again, as explained in my earlier post, the apparent "regression" for SMP is normal, so your conclusions about SPRT -3, 1 are absolutely out of place. There is no reason to doubt the -3, 1 simplification approach because the statistics are sound.

And as for the frequency of eval patches vs search patches, and simplification patches vs 0, 6 patches, it is completely irrelevant. Patches arise from ideas, and it is simply a random occurrence that in the early period, say, lots of search-related ideas were flowing around, while now lots of eval and simplification ideas are flowing around.

You are seeing patterns where none exist. :)
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Interpretation request

Post by Lyudmil Tsvetkov »

arjuntemurnikar wrote:
Lyudmil Tsvetkov wrote:OK, some statistics from the history page, mentally done:

from DD to 27.02 there are 25 successful patches: 10 eval, 8 search, 2 time management, one speed optimization, and 4 whose category I do not know, mainly Uri's patches. Of those, 21 patches were tested and passed with SPRT 0;6, 1 patch passed SPRT -3;1, and 4 patches, mainly Uri's simplifications, passed SPRT -4;0. Overall, 80% passed SPRT 0;6.

from 27.02 to 25.03 there are 4 successful patches, 1 eval, 1 search (restore old window size), one technical (TT replace) and 1 contempt. Of those, 2 patches passed SPRT 0;6 and 2 SPRT -3;1. Overall, 50% passed SPRT 0;6.

from 25.03 to 26.04 there are 17 successful patches, almost all of them eval, plus 3 or 4 contempt patches. Of those 17, 9 passed SPRT -3;1 and 8 SPRT 0;6. Overall, close to 50% passed SPRT 0;6.

Do you find this data of any interest and how do you comment on it?

My first impression is that eval patches since SF DD outnumber search patches by a factor of two or more. (but search patches might carry a bigger relative weight, I do not know)

My second impression is that the most successful scaling period is linked to the highest percentage of patches passing SPRT 0;6; simplification patches hardly existed then. The least successful period, as well as the most recent one, are linked to patches half of which passed SPRT -3;1, the simplification test. An obvious conclusion would be that SPRT -3;1 simplification tests are conducive to scaling deficiency: the more you simplify, the worse the engine scales with multiple threads, and by extension also with more time.

It is also notable that an almost equal number of successful patches in the first period contributed 3 times as many elo as the patches in the last period. Maybe it is because of search patches, maybe because of simplification.
And again, as explained in my earlier post, the apparent "regression" for SMP is normal, so your conclusions about SPRT -3, 1 are absolutely out of place. There is no reason to doubt the -3, 1 simplification approach because the statistics are sound.

And as for the frequency of eval patches vs search patches, and simplification patches vs 0, 6 patches, it is completely irrelevant. Patches arise from ideas, and it is simply a random occurrence that in the early period, say, lots of search-related ideas were flowing around, while now lots of eval and simplification ideas are flowing around.

You are seeing patterns where none exist. :)
There is a pattern, and it is very clear.

It is a pity that no one else wants to comment.

This is not chance data; it is very consistent.

But this makes perfect sense: with longer TC and more computing power, knowledge gains in importance, as the relative speed loss from computing that knowledge is lower; it is bigger at shorter TC, where resources are scarce. At the same time, with simplifications, a couple of million lines more or less computed at LTC on good hardware no longer matter, as the randomness is already too big.

So there is a pattern, and it is clear. Knowledge scales well at LTC; removing knowledge scales badly at LTC and on multiple threads. I hope SF gets the warning while it is still early. You cannot simplify all the time; you need to add knowledge. But how to do that, when people like their patches to succeed, and with simplification the chances of success are much higher? So basically everybody goes for simplification, while people should have been doing exactly the opposite: concentrating all their energy on getting knowledge patches in through consistent testing.

I really hope SF gets the signal while it is still early. You can imagine what happens with no-knowledge patches at 16 threads and 1 hour per game...