Note that I never recommended relying on naked human judgement, but rather on human judgement aided by computer tools. I would say that the judgement of humans who get to use computers as tools is significantly better than computer (or human) judgement on its own. Those old test set errors are situations where a human who looks at the computer analysis says, "of course!", which means that the human using a computer is not wrong at all about those positions. It would be relatively easy for me to come up with a dozen positions that computers will get wildly wrong, but hard for me (or computers) to come up with positions that a human using a computer as a tool would get wrong. Someday this may change, but in my opinion we are many years away from it.

I guess that human judgement is wrong about as often as computer judgement is wrong, and that is surprisingly often. In other words, old test sets that have not been carefully scrutinized are always full of bugs.
-Sam