Would the following PV fingerprinting method be a reliable way to detect clones, at least superficially?
Start with any random set of middlegame and endgame FENs.
For each engine, snapshot the bestmove PV (a) at fixed depth and again (b) at a fixed, very short time control.
Then, note matches/non-matches for every ply.
Then, do the statistics.
After several FENs, you'd have something like 99% agreement at the first ply, 95% at the second, and so on down to ply 8, 9, or 10.
Then, just average the results for each ply.
This could be generated in minutes. You would have averages based on, say, 100 different positions.
Then, the debate would be how much deviation an engine must show before it is no longer considered a clone.
Or would the PV Fingerprinter be useless?
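The tallying step above can be sketched in a few lines of Python. Everything here (the helper name, the toy PVs) is made up for illustration, and the PVs are assumed to have already been collected as lists of UCI move strings, one pair per test position:

```python
from collections import defaultdict

def per_ply_agreement(pv_pairs, max_ply=10):
    """Given (pv_a, pv_b) pairs -- one pair per test position, each PV a
    list of UCI move strings -- return, for each ply, the fraction of
    positions whose PVs still agree at that ply (prefix match)."""
    matches = defaultdict(int)
    for pv_a, pv_b in pv_pairs:
        for ply in range(max_ply):
            # Stop counting at the first disagreement or short PV.
            if ply >= len(pv_a) or ply >= len(pv_b) or pv_a[ply] != pv_b[ply]:
                break
            matches[ply] += 1
    n = len(pv_pairs)
    return [matches[ply] / n for ply in range(max_ply)]

# Hypothetical PVs from two engines on three positions:
pairs = [
    (["e2e4", "e7e5", "g1f3"], ["e2e4", "e7e5", "b1c3"]),
    (["d2d4", "g8f6", "c2c4"], ["d2d4", "g8f6", "c2c4"]),
    (["e2e4", "c7c5", "g1f3"], ["d2d4", "d7d5", "c2c4"]),
]
rates = per_ply_agreement(pairs, max_ply=3)
print([round(r, 2) for r in rates])  # [0.67, 0.67, 0.33]
```

Prefix matching (rather than counting each ply independently) reflects the idea that once two engines' plans diverge, later coincidental matches are not meaningful.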
PV Fingerprinting
Moderator: Ras
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: PV Fingerprinting
benstoker wrote: [the PV fingerprinting proposal quoted above]
The idea has possibilities. But it would take some testing and analysis to see how prone it would be to produce "false positives"...
-
- Posts: 344
- Joined: Wed Sep 23, 2009 5:56 pm
- Location: Germany
Re: PV Fingerprinting
I rather think of the false negatives... If you completely change the evaluation weights of a clone, or write a new eval, then the selected moves should differ significantly, but the strength will not necessarily drop; only the playing style will change. Nevertheless, it is still a clone.
-
- Posts: 342
- Joined: Tue Jan 19, 2010 2:05 am
Re: PV Fingerprinting
bob wrote: The idea has possibilities. But it would take some testing and analysis to see how prone it would be to produce "false positives"...
benstoker wrote: [the PV fingerprinting proposal quoted above]
Maybe it could have other uses as well. You have a known control engine, and you want to get a rough idea of how close another engine is to the control.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: PV Fingerprinting
metax wrote: I rather think of the false negatives... [quoted above]
That's why you need to carefully choose the positions. For tactical ideas, a search will follow the same path regardless of the evaluation, and that can be recognized when you compare depths, PV moves and the material eval at the end.
It's not an easy task, but it is doable. Some also use bizarre positions that produce known problems in some programs as a way of recognizing them. But then some of us fix such problems from time to time and unintentionally render that detection mechanism invalid.
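Comparing depths, PV moves and scores in practice means pulling them out of each engine's UCI `info` output. Here is a rough sketch of such a parser, with a made-up sample line; since field order can vary between engines, the parser searches by keyword rather than by position:

```python
def parse_uci_info(line):
    """Extract depth, score (cp or mate), and PV from a UCI 'info' line.
    Returns None for lines that carry no PV (e.g. 'info string ...')."""
    tokens = line.split()
    if "pv" not in tokens or "depth" not in tokens:
        return None
    info = {"depth": int(tokens[tokens.index("depth") + 1])}
    if "score" in tokens:
        i = tokens.index("score")
        # e.g. ('cp', 35) for centipawns, or ('mate', 3)
        info["score"] = (tokens[i + 1], int(tokens[i + 2]))
    # In UCI output the PV conventionally runs to the end of the line.
    info["pv"] = tokens[tokens.index("pv") + 1:]
    return info

line = "info depth 12 seldepth 18 score cp 35 nodes 123456 nps 800000 pv e2e4 e7e5 g1f3"
print(parse_uci_info(line))
# {'depth': 12, 'score': ('cp', 35), 'pv': ['e2e4', 'e7e5', 'g1f3']}
```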
-
- Posts: 317
- Joined: Mon Jun 26, 2006 9:44 am
Re: PV Fingerprinting
benstoker wrote: [the PV fingerprinting proposal quoted above]
This idea is a non-starter. There are a variety of simple ways to fool your program. Small changes can produce significant changes in behavior.
For example, one way to defeat your "fingerprinter" is simply to change the way depth is counted by the program. Throw in a couple of other little changes, such as not reporting any information at shallow depths, and this will throw off your "fingerprinter" enough to make it useless.
-
- Posts: 342
- Joined: Tue Jan 19, 2010 2:05 am
Re: PV Fingerprinting
rjgibert wrote: This idea is a non-starter. There are a variety of simple ways to fool your program... [quoted above]
Does this mean that one can draw no conclusions from the fact that a program and its alleged clone show very similar PV lines over a large number of test positions?
-
- Posts: 12792
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: PV Fingerprinting
benstoker wrote: [the PV fingerprinting proposal quoted above]
Take the WAC test suite, and unleash 10 top engines on it at 5 minutes per position.
You will get an exact PV match at least 90% of the time out to the analyzed depth or the hash-table cutoff (often there are noise nodes pasted on the end from quiescence or some such, but these don't really count).
What did that tell you?
A PV is a plan. If the plan is obvious enough, all the engines will say the same thing, if they know what they are doing.
-
- Posts: 317
- Joined: Mon Jun 26, 2006 9:44 am
Re: PV Fingerprinting
benstoker wrote: Does this mean that one can draw no conclusions from the fact that a program and its alleged clone show very similar PV lines over a large number of test positions? [quoted above]
The idea is this. If a program actually searches to a depth of 13 when asked to search to a depth of 10, and the PV up to depth 10 remains unchanged, then that PV is unlikely to be any different across many different (strong) programs, because what the PV should be will be too stable. In effect, the extra plies searched simulate a program that searches to depth 10 with a super-accurate eval; in effect, a completely different eval.
You can try to get around this by carefully selecting the test positions, but then your test positions will become known to the cloner. You need to use a randomly generated set that can't be anticipated, but this has problems too.
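One cheap way to make the set hard to anticipate is to draw it freshly from a large pool at test time, with a seed chosen on the spot. A minimal sketch with a stand-in pool (real use would load thousands of FENs from a file; the names here are illustrative):

```python
import random

def sample_test_positions(pool, n, seed=None):
    """Draw n distinct positions from a large pool. Choosing the seed at
    test time means the cloner cannot tune against a fixed, published
    suite, while the same seed lets the test be reproduced later."""
    rng = random.Random(seed)
    return rng.sample(pool, n)

# Stand-in pool; in practice these would be FEN strings.
pool = [f"position-{i}" for i in range(1000)]
suite = sample_test_positions(pool, 100, seed=20100115)
print(len(suite), len(set(suite)))  # 100 100
```

Keeping the seed secret until after the test run gives unpredictability; publishing it afterwards gives reproducibility.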
-
- Posts: 342
- Joined: Tue Jan 19, 2010 2:05 am
Re: PV Fingerprinting
Dann Corbit wrote: Take the WAC test suite, and unleash 10 top engines on it at 5 minutes per position... [quoted above]
But to just get a so-called "fingerprint", why not limit it to 2 ply or 1 second? In fact, the shorter the better, because the further out you go, the more the good programs start to converge.
Anyway, why do people seem to get something out of looking at matching PVs to detect clones? You also seem to be saying that matching PVs says nothing about whether two programs are similar or are clones.
--Curious