By “small”, I guess you mean “all the games it has ever played”.
By “unrepresentative sub-set”, I guess you mean “all positions it has encountered in all fair games played according to the rules of chess, positions its opponents are 50% responsible for creating”.
By “manipulable”, I guess you mean that the stronger of the two entities is able to steer the position towards a win.
This you have not done, nor has anyone else, nor can it be done. Reinforcement learning by self-play is designed to find the subset of all positions (the full set being unavailable) that occur most commonly, and to put maximal effort into learning about those positions. This is only sensible, if you think about it: as time and technology allow, the subset expands in size and the program improves. Stockfish, the world’s leading AB program, progresses according to exactly the same paradigm.
or measuring the poor-case behavior on the total of all problems.
Tactical test suites are not a total of all problems. Nor are they a representative subset of all problems. Your problem is that you are stuck on the old paradigm that chess is tactics. It isn’t, as DeepMind and now LC0 prove. We had a lot of chess programmers who were not very good at chess, and people who are not very good at chess become quite convinced that chess is all about tactics, so they play constantly on the lookout for tactics. People who are good at chess understand the adage “stop looking for tactics, play positional and the tactics will come by themselves”.
Tactical test suites are a (non-deliberately) manipulated subset of positions. In reality, a position gets into a test suite for one reason only: it is at or just beyond the boundary of what the historical set of chess programs can solve. As time progresses, those positions get solved, and new positions, again at the boundary, get added.
They are not deliberately designed for next year’s AB programs to solve, but that is generally the effect.
Amusing. And quite the opposite. What was it you said? “LC0 is a good example: it sucks in tactical positions, as is easily demonstrated from its performance in tactical test suits. But in games it just avoids tactically complex positions, so it still has a pretty high Elo for a poor Chess program.”
If you want to advertise that you cannot understand this, you are welcome.
“Avoidance of tactically complex positions” actually means “play positional and the tactics will come by themselves”, but I don’t think you understand it at that level.
Zero averaging programs don’t in any case “avoid tactical complexity”; they are quite asymmetric in that regard. They deliberately choose to steer towards positions where they have the tactics and the opponent doesn’t. But I don’t think you understand that either.
LC0 has a high Elo because, in real-world testing where it is pitted against another entity, it outsmarts that entity more often than not. Isn’t that the idea in chess?
Given a set of positions that was unwittingly designed to show progress in AB programs, biased towards tactical and against positional play, a non-representative subset of the positions likely to be encountered in a normal chess game, stressing the untruth “chess is tactics” and rejecting the truth “play positional and the tactics will come by themselves”, you can probably prove anything. That’s what people stuck in the old paradigm, unable to change, do.
Good. Invalidating something you don’t understand by using a bunch of assertions you don’t understand either was a waste of time in the first place.
I won't waste any more time on it.
Bye!!
BTW, it seems you read 'in' as 'is'...