I disagree here.
bob wrote: This has been the "Holy Grail" of testing for years. It is a tough problem. One fairly good indicator that a test is tactical is that faster hardware produces better results, while in a true positional test this would not be the case. Either you have the knowledge or you don't. For example, a position where you can take black's a-pawn and give yourself a "distant pawn majority" (which turns into a distant passed pawn eventually), or you can take black's g-pawn, which weakens his pawns a bit but not nearly as much as the majority. The right position won't be depth-sensitive; it will simply determine whether the program understands majorities or not. A book like PPD or something similar might give some good positions...
swami wrote: Well, I have chosen only positions where the evaluation score for the best move is at least 0.20 higher than that of the second-best move, and this has been verified after 5 hours of analysis by Dann. These score differences are agreed on by Rybka/Zappa/Naum in unison. Otherwise the positions wouldn't pass the criteria.
bob wrote: I think much of this is tactical in nature. What I've always looked for is positions such as 1. e4 c5 2. Nf3 any 3. d4, where d4 is a pretty obvious move to control the center, nullify c5 attacking d4, etc. There are other moves that are perfectly playable, but d4 strikes right to the crux of the position, without being a move that wins anything. The BK test pawn lever positions are similar. Either a program "gets it" or it doesn't. Depth is not particularly important, although some require some depth to see the ultimate point of the correct move. I think the way you screened these is backward. I'd toss out positions where the best move scores significantly better than next-best, if you are using a computer to choose them. Some positional scores might well be .2 to .3 (if they don't include king safety issues), but most are a razor's edge away from the second-best, which is what makes a GM's best move better than my best move.
swami wrote: Whoa. That's a pretty high score from Crafty.
I'll try to look at these in some detail when I have time to see which look like the kind of positional tests I'd like to keep for eval testing and tuning...
You've a point that the +4 scores in some tests are really tactical in nature, although there were only a few such positions. I should stop calling the test suite positional and instead call it a set of puzzles where undermining occurs. That would make more sense.
I don't trust GMs' moves. I looked through a GM games database and had a tough time finding any good positions; it took me a long time to come up with a few. It was like sitting by the river trying to catch a fish when there were hardly any.
The next day, I looked through Rybka's games and easily found many positions that could make it into a good test suite. All I had to do was check the score difference between Rybka's best and second-best moves, and see whether the position in question qualified as an "undermining" pattern. If it did, I sent it to Dann, who would then run a deep analysis for hours with the top 3 engines, and if they all agreed in unison, he'd put the position on the 'Qualified' list. That was fun, really.
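A rough sketch of that screening step, for anyone who wants to script it. It assumes each engine's analysis has already been collected as (move, score in pawns) pairs, best first; the dictionary layout, the function name `qualifies`, and the moves and scores in `example` are all invented for illustration and are not taken from the actual suite or from any engine's API.

```python
# Hypothetical data layout: for one position, each engine reports
# (move, score in pawns) pairs, best move first. No real engine API is used.
MIN_MARGIN = 0.20  # the margin quoted in this thread


def qualifies(engine_lines, min_margin=MIN_MARGIN):
    """Return the agreed best move if every engine passes the screen, else None."""
    best_moves = set()
    for lines in engine_lines.values():
        if len(lines) < 2:
            return None  # a second-best move is needed to measure the margin
        (best, best_score), (_, second_score) = lines[0], lines[1]
        if best_score - second_score < min_margin:
            return None  # best move does not stand out enough for this engine
        best_moves.add(best)
    # All engines must name the same best move ("agree in unison").
    return best_moves.pop() if len(best_moves) == 1 else None


# Invented numbers, only to show the shape of the input.
example = {
    "Rybka": [("b5", 0.62), ("Rc8", 0.31)],
    "Zappa": [("b5", 0.55), ("a6", 0.30)],
    "Naum": [("b5", 0.70), ("Rc8", 0.41)],
}
print(qualifies(example))
```

Flipping the comparison to reject positions with a large margin would give the opposite screen that bob suggests above.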
I'd think an easier and quicker way to create more positions is to study correspondence games, especially those where computers were used for days. I don't know where those games can be downloaded, so I'll have to ask around.
I do see some engines clearly doing better in undermining but fantastically badly in open files and diagonals, while others did better in the latter than the former. I hope to get the 3rd test suite ready. It's a good hobby, I should tell you; I've really enjoyed every moment of it!
Even in a true positional test, meaning one with no material gain that computers can see, computers can perform better at longer time controls.
Suppose that a program does not know about candidate pawns but knows about passed pawns.
At a small depth, the program may see only a position with a candidate pawn when analyzing the right move, so it is not going to find the right move.
At a bigger depth, the program may see a passed pawn, because the right move forces a passed pawn if you search deep enough, and it may find the move for that reason, not because of tactics, meaning winning material.
So more time can help the program find the best move.
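Here is a toy model of that argument. It is not a real search: it only assumes an evaluation that rewards passed pawns and ignores candidates, and that the right move produces a passed pawn after a fixed number of plies. The constants and function names are all made up for the illustration.

```python
PASSED_PAWN_BONUS = 0.50  # assumed value of a passed pawn in the toy eval
SMALL_EDGE = 0.10         # assumed immediate gain of the alternative move
PLIES_TO_PASSER = 7       # assumed plies until the candidate becomes passed


def leaf_score(move, depth):
    """Score at the search horizon for an eval that knows passed pawns only."""
    if move == "right":
        # The candidate pawn earns nothing until it is actually passed,
        # which the search only reaches at PLIES_TO_PASSER or deeper.
        return PASSED_PAWN_BONUS if depth >= PLIES_TO_PASSER else 0.0
    return SMALL_EDGE


def chosen_move(depth):
    """Pick whichever move scores higher at the given search depth."""
    return max(("right", "other"), key=lambda m: leaf_score(m, depth))


for depth in (4, 6, 8, 10):
    print(depth, chosen_move(depth))
```

In this model the choice flips to the right move once the depth reaches the point where the passed pawn appears, with no material won anywhere.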