Colleges use software like this to determine whether students are cheating by reusing others' code or documents:
http://www.cs.vu.nl/~dick/sim.html
I am wondering if this idea could be taken a step further.
Take this source code and split it into two pieces. The first piece would be run by the program author and would extract features from the source code. The feature file could then be registered somewhere online. As new programs appear, authors would be required to run their software through this code and provide the feature file. Hashing the code could ensure the actual source code is submitted. This might catch a lot of clones and prevent authors from stealing too much from others. And because only data extracted from the source code is registered, not the source code itself, the source code could still remain private.
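To make the idea a bit more concrete, here is a rough sketch of what the author-side piece might look like. This is not how SIM works internally; it is just one common approach (hashing runs of normalized tokens) that I am assuming for illustration. The names `normalize`, `extract_features`, `build_feature_file` and `similarity`, the keyword list, and the run length `k=5` are all made up for this example.

```python
import hashlib
import re

# Assumed token pattern: identifiers, numbers, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Small illustrative keyword list; a real tool would use the full language grammar.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "void"}


def normalize(source):
    """Tokenize and replace identifiers and numbers with placeholders,
    so that simply renaming variables does not change the features."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def extract_features(source, k=5):
    """Return a set of short hashes, one per run of k consecutive tokens."""
    tokens = normalize(source)
    features = set()
    for i in range(len(tokens) - k + 1):
        gram = " ".join(tokens[i:i + k])
        features.add(hashlib.sha256(gram.encode("utf-8")).hexdigest()[:16])
    return features


def build_feature_file(source, k=5):
    """Bundle the features with a hash of the exact source text.
    The source hash lets the author later show which source the
    features came from, without publishing the source itself."""
    return {
        "source_sha256": hashlib.sha256(source.encode("utf-8")).hexdigest(),
        "features": sorted(extract_features(source, k)),
    }


def similarity(a, b):
    """Jaccard similarity of two feature sets, between 0.0 and 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

The registry side would then only need to compare a newly submitted feature set against the stored ones with something like `similarity(set(new["features"]), set(old["features"]))` and flag high scores for a closer look. Whether hashed token runs leak too much about the source, and what threshold counts as a clone, are open questions this sketch does not answer.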
I have no idea how hard this might be to do, but maybe it is possible.
Mark
Similarity testing for source code.