Is It Me or is the Source Comparison Page Rigged?

Post by **Uri Blass** » Wed Sep 03, 2008 7:48 am

bob wrote:
Uri Blass wrote:
bob wrote:
RegicideX wrote:
tiger wrote:
Semantic similarity is what the courts are looking for in order to find copyright infringement.

Unfortunately, finding "semantic similarity" is not a completely objective process. In some cases the similarity is close enough to justify charges of plagiarism; in others the semantic similarity is so vague that the code is obviously different. But there is a large gray area where there is room for divergence of opinion.
As I said previously, there are lots of different axes to investigate in a multi-dimensional process. Suppose two people end up with similar evaluations, so that semantic analysis suggests plagiarism. But then suppose that when you get outside the semantics of the actual executable instructions, you find lots of identical scoring variables? We could argue about whether pawn = 100 is unique or not; I would suggest it is not really, because Deep Thought used pawn = 128. I believe Tord said he uses pawn = 256 in Glaurung (although I also think he said the value varies between opening and middlegame, so I don't know whether he has two values that are used for interpolation or whether the value is modified in some other way).

But drift away from that to what we looked at in cases of Crafty cloning. I have several arrays called "piece/square" tables: 64 values giving a rough approximation of how well a piece is placed if it is on that particular square. Some have two sets, one for the opening and one for the endgame. For kings, I have three for the endgame, to handle cases where all pawns are on one side, on the other side, or on both. What is the probability that two independent programmers will duplicate those? Close enough to zero to call it zero. This has to be investigated (it has not been done so far).

What about basic program structure? What is the probability that in a 40K-line program, two independent authors would have exactly the same program structure (with respect to which procedures do what)? Yes, both may have a MakeMove, DoMove, or whatever. But with 40K lines, and maybe 400 different procedures, what is the probability that two programs have exactly the same number of procedures, doing exactly the same things, with either different or the same names? Again, near zero.

What about data structures? They are large and complex. What is the probability that two independent programmers will use exactly the same ones? Near zero.
So there is _much_ to compare, and the challenge, given no C code to look at, is to actually find and decode them to see how they would look in C.

If two large programs are similar in all those ways, there is absolutely something fishy going on. But fishing is a slow activity. I do it all the time, and there are days when I catch nothing...
It is clear that the programmers are not independent, so the question of what the probability is for independent programmers is not relevant.

It would be relevant if Vas claimed that he did not read Fruit, but Vas has not claimed that.

The question is not the probability for independent programmers but which types of dependency are considered to be against the GPL.

Using ideas is also a type of dependency.
In Movei I use the idea of averaging the opening score and the endgame score based on the stage of the game.

I cannot say that it is independent of Fruit, because I learned the idea from Fruit.
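The opening/endgame averaging mentioned above (often called a tapered evaluation) can be sketched in Python as follows. This is a minimal illustration, not Movei's or Fruit's actual code: the 0..256 phase scale and the material weights used to compute the phase (knight = bishop = 1, rook = 2, queen = 4) are assumptions chosen for the example.

```python
# Minimal tapered-evaluation sketch. MAX_PHASE and the per-piece phase
# weights are illustrative assumptions, not values taken from Fruit or Movei.
MAX_PHASE = 256
PHASE_WEIGHTS = {"knight": 1, "bishop": 1, "rook": 2, "queen": 4}
# Sum of the weights with all non-pawn material on the board (both sides).
FULL_MATERIAL = 4 * 1 + 4 * 1 + 4 * 2 + 2 * 4  # = 24


def game_phase(piece_counts: dict[str, int]) -> int:
    """Map remaining non-pawn material to 0 (endgame) .. MAX_PHASE (opening)."""
    material = sum(PHASE_WEIGHTS[p] * n for p, n in piece_counts.items())
    material = min(material, FULL_MATERIAL)  # cap extra material from promotions
    return material * MAX_PHASE // FULL_MATERIAL


def tapered_score(opening: int, endgame: int, phase: int) -> int:
    """Weighted average of the opening and endgame scores by game phase."""
    if not 0 <= phase <= MAX_PHASE:
        raise ValueError("phase must lie in 0..MAX_PHASE")
    return (opening * phase + endgame * (MAX_PHASE - phase)) // MAX_PHASE


start = {"knight": 4, "bishop": 4, "rook": 4, "queen": 2}
print(game_phase(start))           # 256
print(tapered_score(40, 10, 128))  # 25
```

Python's `//` rounds toward negative infinity, whereas integer division in C truncates toward zero, so a C port of this sketch can differ by one unit for negative scores.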

I do not use the same piece-square tables as Fruit, but if a programmer uses the same piece-square tables, he can claim that the specific numbers in them are an idea, and that not being allowed to use them means not being allowed to use Fruit's ideas.

Another case is when programmers do not copy tables from Fruit but adjust their own tables to be closer to Fruit's tables.

When your program is different from Fruit, it is not obvious that Fruit's tables will be good for your program, but you need to decide what to test.

Because Fruit is a good program, you may decide to test changes that move your tables closer to Fruit's, by taking the average of your original tables and Fruit's tables.

If games show that the change is an improvement, you accept it.
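The averaging-and-testing procedure described above might look like this. The flat-list table layout and the `match_score` callback (standing in for "play a batch of test games and return a score") are hypothetical; in practice the callback is where the real, expensive work happens.

```python
from typing import Callable, Sequence


def blend_tables(own: Sequence[int], reference: Sequence[int],
                 weight: float = 0.5) -> list[int]:
    """Move each entry of `own` toward `reference`; weight=0.5 is the plain average.

    round() uses round-half-to-even, so exact .5 results go to the even integer.
    """
    if len(own) != len(reference):
        raise ValueError("tables must have the same length")
    return [round((1 - weight) * a + weight * b) for a, b in zip(own, reference)]


def try_blend(own: Sequence[int], reference: Sequence[int],
              match_score: Callable[[Sequence[int]], float]) -> list[int]:
    """Keep the blended table only if the (hypothetical) test match prefers it."""
    candidate = blend_tables(own, reference)
    return candidate if match_score(candidate) > match_score(own) else list(own)


own = [0, 10, -4, 8]
reference = [10, 20, 4, 12]
print(blend_tables(own, reference))  # [5, 15, 0, 10]
```

The accepted table matches neither input entry for entry, which is why a direct search for the reference program's numbers would not find them.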

In this case people are not going to find Fruit's numbers in your code, but it is clear that the numbers in your code are based on ideas from Fruit.

Uri
This is simple. If you copy lines of code, whether they are executable statements or data tables, you are guilty of plagiarism, unless the data is of such an obvious nature that different people would likely produce it by themselves. The list of character digits and their numeric equivalents is an example. So is a list of words such as the one in the simple program we were writing a couple of days ago. But not a set of numbers that give piece values for different squares. There is a near-infinite number of such values and groups of values...

I do not see why it is so difficult to understand the difference between ideas and source code.

1) In the case of piece-square tables it is not clear.
Suppose that you choose numbers that are close to another person's piece-square tables but not identical.

How close do they have to be before you are guilty of plagiarism?
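One way to make "how close" measurable is to compute a few simple statistics over two tables, as in the sketch below. The choice of statistics is an assumption for illustration; nothing in this discussion says which measure, or which threshold, would matter for plagiarism.

```python
from typing import Sequence


def table_distance(p: Sequence[int], q: Sequence[int]) -> dict[str, float]:
    """Summarize how far apart two equally sized piece-square tables are."""
    if len(p) != len(q) or not p:
        raise ValueError("tables must be non-empty and the same length")
    diffs = [abs(x - y) for x, y in zip(p, q)]
    return {
        "mean_abs_diff": sum(diffs) / len(diffs),  # average per-square gap
        "max_abs_diff": max(diffs),                # worst single square
        "identical": sum(d == 0 for d in diffs),   # squares with equal values
    }


print(table_distance([0, 10, 20, 30], [0, 12, 20, 25]))
# {'mean_abs_diff': 1.75, 'max_abs_diff': 5, 'identical': 2}
```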

2) Suppose that you simply give Rybka many quiet positions to analyze, positions where the evaluation does not change significantly with depth, and keep a file of positions with target evaluations (simply Rybka's evaluations).

Suppose that you then test changes to your evaluation parameters not by playing games but by measuring the distance from Rybka's evaluation.
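The tuning loop in point 2 could be sketched as below. Everything here is an assumption for illustration: positions are reduced to integer feature vectors, the evaluation is a linear sum of weight times feature, "distance" is the mean squared error against the stored target evaluations, and the search is a simple one-step-at-a-time coordinate search. Only the target numbers would come from the other engine; none of its code is used.

```python
from typing import Sequence

# Each sample: (feature vector of a quiet position, target eval from the reference engine).
Sample = tuple[Sequence[int], int]


def evaluate(weights: Sequence[int], features: Sequence[int]) -> int:
    """Toy linear evaluation: sum of weight * feature."""
    return sum(w * f for w, f in zip(weights, features))


def mse(weights: Sequence[int], data: Sequence[Sample]) -> float:
    """Mean squared distance between our evaluation and the target evaluations."""
    return sum((evaluate(weights, f) - t) ** 2 for f, t in data) / len(data)


def local_search(weights, data, step=1, max_passes=100):
    """Nudge one weight at a time by +/-step, keeping any change that lowers the error."""
    weights = list(weights)
    best = mse(weights, data)
    for _ in range(max_passes):
        improved = False
        for i in range(len(weights)):
            for delta in (step, -step):
                weights[i] += delta
                err = mse(weights, data)
                if err < best:
                    best, improved = err, True
                    break  # keep this change, move on to the next weight
                weights[i] -= delta  # undo a change that did not help
        if not improved:
            break
    return weights, best


# Hypothetical data: features are (pawn balance, mobility balance); the targets
# happen to equal 100 * pawns + 5 * mobility.
data = [((1, 0), 100), ((0, 2), 10), ((2, -1), 195), ((-1, 4), -80)]
print(local_search([90, 0], data))  # ([100, 5], 0.0)
```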

It is no accident that your evaluation ends up closer to Rybka's evaluation than other people's evaluations are.

Are you guilty of plagiarism?
Are you going to be guilty if we replace Rybka with Fruit, even if you did not use a single line of the source and simply used the output of the program?

Uri

Post by **bob** » Wed Sep 03, 2008 5:03 pm

The general answer would be this. Given any two PST sets of values, if one can find a formula F such that F(P[i]) = Q[i] for all i, where P is the PST from program A and Q is the PST from program B, any sane person would consider those to be copied. For example, in a centipawn vs. millipawn program comparison, where P[0] = 12 and Q[0] = 120, etc., one would conclude "copied".
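The test above can be made concrete for the simplest family of formulas, an affine map Q[i] = a * P[i] + b, which covers the centipawn/millipawn example (a = 10, b = 0). Restricting F to affine maps is an assumption of this sketch; the wording above allows any formula. Exact fractions are used so that scale factors such as 1/2 are not lost to floating-point rounding.

```python
from fractions import Fraction
from typing import Optional, Sequence


def affine_map(p: Sequence[int], q: Sequence[int]) -> Optional[tuple[Fraction, Fraction]]:
    """Return (a, b) with q[i] == a * p[i] + b for every i, or None if no such map exists."""
    if len(p) != len(q) or not p:
        raise ValueError("tables must be non-empty and the same length")
    # Find a second square whose P value differs from the first one.
    j = next((k for k in range(1, len(p)) if p[k] != p[0]), None)
    if j is None:
        # P is constant: an affine map exists only if Q is constant too.
        return (Fraction(0), Fraction(q[0])) if len(set(q)) == 1 else None
    a = Fraction(q[j] - q[0], p[j] - p[0])  # slope from two distinct points
    b = q[0] - a * p[0]                     # intercept
    # The candidate map must reproduce every entry exactly.
    if all(a * x + b == y for x, y in zip(p, q)):
        return a, b
    return None


print(affine_map([12, -5, 0, 30], [120, -50, 0, 300]))
# (Fraction(10, 1), Fraction(0, 1))
```

A looser check, closer to "any formula F", would only require that equal P values always map to equal Q values. That test is much weaker: if all entries of P are distinct, any Q at all would pass it.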
