Similarity testing for source code.

mjlef

Post by mjlef

Colleges use software like this to detect whether students are cheating by reusing other people's code or documents:

http://www.cs.vu.nl/~dick/sim.html

I am wondering if this idea could be taken a step further.

The SIM source code could be split into two pieces. The first piece would be run by the program's author and would extract features from the source code. The resulting feature file could then be registered somewhere online. As new programs appear, their authors would be required to run their source through this extractor and provide the feature file. A hash of the code could help ensure that the feature file was generated from the actual source code. The second piece would presumably compare each new feature file against the registered ones. This might catch a lot of clones and discourage authors from taking too much from others. And because only data extracted from the source code is shared, not the source code itself, the source could remain private.
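To make the two-piece idea concrete, here is a minimal sketch in Python. It is not SIM's actual algorithm, just one common way to do this kind of thing: replace identifiers and numbers with placeholders, hash every run of k consecutive tokens, and compare the resulting sets. The function names, the keyword list, the value of k, and the feature file layout are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical tokenizer: identifiers, integer runs, or any single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# Small illustrative keyword list (an assumption, not a full C grammar).
KEYWORDS = {"int", "char", "void", "if", "else", "for", "while", "return"}


def tokenize(source):
    """Split source into tokens, hiding identifier names and numeric values."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok[0].isalpha() or tok[0] == "_":
            tokens.append(tok if tok in KEYWORDS else "ID")
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def extract_features(source, k=5):
    """Piece 1 (run by the author): hashed k-token windows of the source."""
    tokens = tokenize(source)
    features = set()
    for i in range(len(tokens) - k + 1):
        gram = " ".join(tokens[i:i + k])
        features.add(hashlib.sha256(gram.encode("utf-8")).hexdigest()[:16])
    return features


def feature_file(source, k=5):
    """Bundle the features with a digest of the exact source they came from."""
    return {
        "source_sha256": hashlib.sha256(source.encode("utf-8")).hexdigest(),
        "k": k,
        "features": sorted(extract_features(source, k)),
    }


def similarity(features_a, features_b):
    """Piece 2 (run by the registry): Jaccard overlap of two feature sets."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "int eval(int a, int b) { return a * 3 + b; }"
    renamed = "int score(int x, int y) { return x * 7 + y; }"
    unrelated = "while (n > 0) { n = n / 2; count++; }"

    registered = feature_file(original)
    print(similarity(registered["features"], extract_features(renamed)))
    print(similarity(registered["features"], extract_features(unrelated)))
```

Renaming variables and changing constants does not change the features, so a simple clone would still score high. Two caveats: the digest only ties the feature file to some exact source text, so it would need to be checked later against whatever the author eventually discloses, and short hashed token windows can in principle be guessed, so how private this really is depends on k and on the tokenizer.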

I have no idea how hard this might be to do, but maybe it is possible.

Mark