SzG wrote:
Adam Hair wrote:
All that is needed is a large enough set of positions where there
is a set of at least two "best moves". Positions where the engines
tend to choose the same move have to be removed, as Don has been
doing. Also, positions where engines cannot make up their mind on
their selection (when repeatedly tested at that position) should be
removed. Those positions just add random noise to the test. If these
things are done, then you can reliably measure how dissimilar two engines
are.
Hi Adam,
It seems to me that to create such a set of positions would take a year.
I believe that Don said the positions that were being used were an
offshoot of something else he and Larry Kaufman were doing.
But the positions could be randomly chosen from any quality database.
Here is the blueprint I see for creating a good set of positions for Don's
utility:
Let's equate move selection to a loaded die, where each face of the die
corresponds to a good move available for a given position. We desire
positions where the die is loaded (favoring one face) for each engine.
When an individual engine is repeatedly given a position to think about,
it should tend to roll the same number with high probability (choose the
same move).
For each position, the dice should not all be loaded the same way. That
would correspond to most/all of the engines selecting the same move. Those
positions can be found by testing a number of unrelated engines and
seeing for which positions they tend to choose the same move. It does
not have to be 100% agreement. Depending on the number of unrelated
engines used (more is better), statistics can be used to determine the
threshold percentage.
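
A minimal sketch of that first filter, assuming each position already has
one recorded move per unrelated engine. The function names, the sample
data and the 0.8 cutoff are all made up for illustration; the real
threshold would have to come from the statistics mentioned above.

```python
from collections import Counter


def agreement_fraction(moves):
    """Share of engines that picked the most popular move for one position."""
    counts = Counter(moves)
    return counts.most_common(1)[0][1] / len(moves)


def drop_consensus_positions(choices, threshold=0.8):
    """Keep only positions where fewer than `threshold` of the engines agree.

    choices: dict mapping a position id to a list of moves, one per engine.
    threshold: hypothetical cutoff, not a value taken from the thread.
    """
    return {
        pos: moves
        for pos, moves in choices.items()
        if agreement_fraction(moves) < threshold
    }


# Illustrative data only: five unrelated engines, two positions.
choices = {
    "pos1": ["e4", "e4", "e4", "e4", "e4"],    # everyone agrees, so drop it
    "pos2": ["Nf3", "d4", "Nf3", "c4", "d4"],  # real disagreement, so keep it
}
kept = drop_consensus_positions(choices)
```

With this sample data only "pos2" survives the filter.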
Also, we would need to see if the dice are loaded. It does no good to
have a large number of positions where each engine's move selection
is more or less random. This would violate an assumption of the test,
that each engine has a preferred move. Too many positions with this
characteristic render the test invalid. How can these positions be
found so that they can be removed? Choose a set of unrelated engines.
Test them on each position a number of times (I think 20 to 30 times
would be good, since that allows certain assumptions about the
distribution of the move selections to be used). Find the positions where
a large fraction of the engines do not have a definite move choice.
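
A rough sketch of that second filter, assuming the repeated runs for one
position have been collected per engine. The names and both cutoffs
(min_share, max_undecided) are placeholders I picked for illustration,
not values anyone in the thread has proposed.

```python
from collections import Counter


def modal_share(picks):
    """Fraction of repeated runs in which the engine chose its favorite move."""
    counts = Counter(picks)
    return counts.most_common(1)[0][1] / len(picks)


def is_noisy_position(runs_by_engine, min_share=0.7, max_undecided=0.5):
    """True if too many engines lack a definite move choice at this position.

    runs_by_engine: dict mapping an engine name to the list of moves it
        chose over repeated runs (for example 20 to 30 runs each).
    min_share: hypothetical bar for calling an engine's die "loaded".
    max_undecided: hypothetical share of undecided engines we tolerate.
    """
    undecided = sum(
        1 for picks in runs_by_engine.values() if modal_share(picks) < min_share
    )
    return undecided / len(runs_by_engine) > max_undecided


# Illustrative data only: three engines, 20 runs each.
steady = {
    "A": ["e4"] * 18 + ["d4"] * 2,
    "B": ["d4"] * 20,
    "C": ["c4"] * 19 + ["e4"],
}
noisy = {
    "A": ["e4"] * 7 + ["d4"] * 7 + ["c4"] * 6,
    "B": ["d4"] * 10 + ["Nf3"] * 10,
    "C": ["c4"] * 20,
}
```

Here the "steady" position would be kept, since every engine has a clear
favorite (and they differ from each other), while the "noisy" position
would be removed because two of the three engines are close to random.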
It would take some time to collect the necessary number of positions.
However, the set of positions does not have to be perfect, as far as the
criteria above go. Miguel's analysis program can deal with some random
noise. And a total of 2000 positions (or fewer, depending on the number
of engines being compared) would be enough. I don't think it would
take an excessive amount of time to accumulate that number of positions
from scratch, provided of course you are not trying to examine the move
choices by eye.