Is It Me or is the Source Comparison Page Rigged?

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Is It Me or is the Source Comparison Page Rigged?

Post by bob »

michiguel wrote:
bob wrote:
John wrote:
bob wrote: ... We are talking about a direct and well known process for translating a high-level language into machine language, and then back again ...
Bob, you could serve the CCC well, by providing a link to peer-reviewed descriptions of this process.

Especially vital are reliable estimates of Type I versus Type II errors, and equally important, inter-rater reliability.

Minimizing both kinds of error, and maximizing the reliability, is where the advice of statisticians and psychologists is indispensable.
What on earth are you talking about? There is no "error" in this process. Given the line of C programming, a = b * c + 1;, there is one way to translate that to assembly language. You might slightly alter the order of the instructions to improve speed, but if the compiler does not have a bug, then the result must be semantically equivalent. Given the assembly code the compiler produced, it is again a direct translation to go from the assembly language back to the semantically-equivalent C.

So where are you trying to take this? I have no idea who you are, or what your background is. You can find mine easily enough. I have written assemblers and compilers, I have taught the courses for many years, and the others involved are also quite good and have several people, including myself, looking over their shoulders to make sure that translation errors do not occur.

So, where are you trying to go with this? And why?


There is no error in the process, but there might be in the interpretation.
When someone asks "What are the chances that code A has not been copied and derived from code B?", you open the door to statistics with the word "chances". The process has no error, but you end up with a similarity that may be quantifiable. If the code is 100% identical in semantics, that is one thing, but what if it is not? Where do you draw the line? How do you define "% of similarity"? We know that it is easy to define 100%, but anything else might not be trivial. You certainly cannot deny emphatically that statistics play any role. A quick search led me to this paper:

Shared Information and Program Plagiarism Detection
Xin Chen, Brent Francia, Ming Li, Brian Mckinnon, Amit Seker
University of California, Santa Barbara
http://citeseerx.ist.psu.edu/viewdoc/su ... .1.1.10.76

It may not be the best paper, but it is the first I found in which people are trying to put all this in quantifiable terms. This may be far from solved, but as I said, if things can be quantified, statistics have a role.

I quote two paragraphs. Note how the problem resembles genome, or DNA, sequence comparison, something I already pointed out earlier and that was not given any attention:

"A common thread between information theory and computer science is the study of the amount of information
contained in an ensemble [17, 18] or a sequence [9]. A fundamental and very practical question has challenged
us for the past 50 years: Given two sequences, how do we measure their similarity in the sense that the measure
captures all of our intuitive concepts of “computable similarities”? Practical reincarnations of this question
abound. In genomics, are two genomes similar? On the internet, are two documents similar? Among a pile of
student Java programming assignments, are some of them plagiarized?
This paper is a part of our continued effort to develop a general and yet practical theory to answer the
challenge. We have proposed a general concept of sequence similarity in [3, 11] and further developed more
suitable theories in [8] and then in [10]. The theory has been successfully applied to whole genome phylogeny
[8], chain letter evolution [4], language phylogeny [2, 10], and more recently classification of music pieces in
MIDI format [6]. In this paper, we report our project of the past three years aimed at applying this general
theory to the domain of detecting programming plagiarisms."

Miguel


First, for plagiarism in the classroom, the process is much simpler. You can either do semantic analysis by hand, or run both programs through an automated tool.

Here there is no room for "interpretation". Yes, one can make _mistakes_, and that is a reason for having multiple people double-checking. Going from C to assembly and from assembly to C is not magic or voodoo; each is a well-defined process. This investigation goes one level further, which is to go from the asm to a specific C source and match them up. That is also a well-defined process. There is a little "search" involved, but it is not any sort of "creative" process, as the relationship between C and assembly is pretty straightforward in either direction.
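
To make the quoted a = b * c + 1; example concrete, here is a small, self-contained C file with one plausible x86-64 translation written out as a comment beside it. The instruction selection and register names are illustrative (they vary by compiler and options) and are not the output of any build discussed in this thread; the point is only that the mapping runs in both directions with essentially no freedom.

Code: Select all

#include <stdio.h>

int a, b = 6, c = 7;

void example(void)
{
    a = b * c + 1;
    /* One plausible unoptimized x86-64 translation (Intel syntax; exact
     * registers and addressing differ between compilers, the shape does not):
     *
     *     mov   edx, DWORD PTR b[rip]   ; load b
     *     mov   eax, DWORD PTR c[rip]   ; load c
     *     imul  eax, edx                ; eax = b * c
     *     add   eax, 1                  ; eax = b * c + 1
     *     mov   DWORD PTR a[rip], eax   ; store the result in a
     *
     * Reading the assembly back, the two loads, the multiply, the add of the
     * constant 1 and the final store admit essentially one C rendering,
     * a = b * c + 1; which is the sense in which the translation is well
     * defined in both directions.
     */
}

int main(void)
{
    example();
    printf("a = %d\n", a);   /* prints a = 43 */
    return 0;
}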

Also, while a general-purpose tool would be wonderful, it would also be _some_ project. Here we are looking at a specific machine-language instance and trying to determine how well it matches a specific C instance. That is a precisely defined goal, which simplifies things greatly compared to the issues a general-purpose automated tool has to consider.

There are tools around, but they are aimed at comparing multiple programs in a common language, say C. Machine code to C is a different problem entirely and is far less common, which is probably why there is not a lot of work done on it. I had a PhD student a few years ago who looked at "process migration on heterogeneous processors", and the problems there were quite interesting. Given a machine-language program on machine A, and a corresponding state S that defines where the program is at this instant in time, he wanted to migrate that to a different architecture with a different machine language. So he had to first map A to A' (a machine-language translation), but then, _much_ more interesting, was mapping the state S to S'. Different numbers of registers, different instruction sets... it was an interesting study, and probably more related to the current discussion than other things I have worked on.
RegicideX

Re: Is It Me or is the Source Comparison Page Rigged?

Post by RegicideX »

GenoM wrote:
RegicideX wrote: You can claim all the expertise in the world -- it will not make the claim that there is no creativity involved in constructing a C source code from machine code anything less than sheer baloney.
Compare with that:
RegicideX wrote:
GenoM wrote:I've tried to follow this scientific discussion about research methodology.
I guess John Sidles and Alex K. are mostly worried that the translated code of Rybka is not exactly the same as the original code of Rybka. They are insisting on the view that the translated code cannot be exactly the same as the original source code, so no scientifically valid conclusions can be drawn from the results of the comparison.
Have I got it right?
No, you have not.<...>
Seems I have got it pretty right, haven't I?

Your way of arguing pretends to be very scientific, but as we see, you are not clear about your own position and views.
The only unclear thing is whether you can tell the difference between

1) There is creativity involved in C source reconstruction (absolutely true)

and

2) No valid scientific conclusions can be drawn from machine code comparisons (false)


People have been dealing with uncertainty in science for centuries now -- it's not a novel concept, although it can be tricky sometimes.
RegicideX

Re: Is It Me or is the Source Comparison Page Rigged?

Post by RegicideX »

tiger wrote: Semantic similarity is what the courts are looking for in order to find copyright infringement.
Unfortunately finding "semantic similarity" is not a completely objective process -- in some cases the similarity is close enough to justify charges of plagiarism, and in others the semantic similarity is so vague that the codes are obviously different. But there is a large gray area where there is room for divergence of opinion.
tiger
Posts: 819
Joined: Sat Mar 11, 2006 3:15 am
Location: Guadeloupe (french caribbean island)

Re: Is It Me or is the Source Comparison Page Rigged?

Post by tiger »

michiguel wrote:
bob wrote:
John wrote:
bob wrote: ... We are talking about a direct and well known process for translating a high-level language into machine language, and then back again ...
Bob, you could serve the CCC well, by providing a link to peer-reviewed descriptions of this process.

Especially vital are reliable estimates of Type I versus Type II errors, and equally important, inter-rater reliability.

Minimizing both kinds of error, and maximizing the reliability, is where the advice of statisticians and psychologists is indispensable.
What on earth are you talking about? There is no "error" in this process. Given the line of C programming, a = b * c + 1;, there is one way to translate that to assembly language. You might slightly alter the order of the instructions to improve speed, but if the compiler does not have a bug, then the result must be semantically equivalent. Given the assembly code the compiler produced, it is again a direct translation to go from the assembly language back to the semantically-equivalent C.

So where are you trying to take this? I have no idea who you are, or what your background is. You can find mine easily enough. I have written assemblers and compilers, I have taught the courses for many years, and the others involved are also quite good and have several people, including myself, looking over their shoulders to make sure that translation errors do not occur.

So, where are you trying to go with this? And why?


There is no error in the process, but there might be in the interpretation.
When someone asks "What are the chances that code A has not been copied and derived from code B?", you open the door to statistics with the word "chances". The process has no error, but you end up with a similarity that may be quantifiable. If the code is 100% identical in semantics, that is one thing, but what if it is not? Where do you draw the line? How do you define "% of similarity"? We know that it is easy to define 100%, but anything else might not be trivial. You certainly cannot deny emphatically that statistics play any role. A quick search led me to this paper:

Shared Information and Program Plagiarism Detection
Xin Chen, Brent Francia, Ming Li, Brian Mckinnon, Amit Seker
University of California, Santa Barbara
http://citeseerx.ist.psu.edu/viewdoc/su ... .1.1.10.76

It may not be the best paper, but it is the first I found in which people are trying to put all this in quantifiable terms. This may be far from solved, but as I said, if things can be quantified, statistics have a role.

I quote two paragraphs. Note how the problem resembles genome, or DNA, sequence comparison, something I already pointed out earlier and that was not given any attention:

"A common thread between information theory and computer science is the study of the amount of information
contained in an ensemble [17, 18] or a sequence [9]. A fundamental and very practical question has challenged
us for the past 50 years: Given two sequences, how do we measure their similarity in the sense that the measure
captures all of our intuitive concepts of “computable similarities”? Practical reincarnations of this question
abound. In genomics, are two genomes similar? On the internet, are two documents similar? Among a pile of
student Java programming assignments, are some of them plagiarized?
This paper is a part of our continued effort to develop a general and yet practical theory to answer the
challenge. We have proposed a general concept of sequence similarity in [3, 11] and further developed more
suitable theories in [8] and then in [10]. The theory has been successfully applied to whole genome phylogeny
[8], chain letter evolution [4], language phylogeny [2, 10], and more recently classification of music pieces in
MIDI format [6]. In this paper, we report our project of the past three years aimed at applying this general
theory to the domain of detecting programming plagiarisms."

Miguel




Thank you Miguel for your suggestions. I have read the paper you provided a link for and, after reading it, have done some additional research.

That was a six-hour journey.

Of particular interest is the NCD method (http://homepages.cwi.nl/~paulv/papers/cluster.pdf), because it is able to cluster items based on similarities. Using such a method would make it possible to evaluate distances between different programs. It would be extremely interesting, for example, to perform the experiment with Crafty, Fruit, Toga and Strelka, and then on other combinations of known related or unrelated chess programs.
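
For readers who have not seen it, the normalized compression distance from that paper is NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size under some real compressor. Below is a minimal C sketch of that formula using zlib as a stand-in compressor (build with -lz); the buffer handling and the toy strings in main are mine, and a serious experiment would use a stronger compressor and whole preprocessed sources or binaries rather than two short strings.

Code: Select all

/* Minimal NCD sketch: NCD(x,y) = (C(xy) - min(C(x),C(y))) / max(C(x),C(y)).
   zlib stands in for the compressor C; values near 0 mean "the compressor
   sees the inputs as near-identical", values near 1 mean "unrelated". */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

/* Compressed size of buf[0..len-1] in bytes, or 0 on failure. */
static unsigned long compressed_size(const unsigned char *buf, unsigned long len)
{
    uLongf out_len = compressBound(len);
    unsigned char *out = malloc(out_len);
    if (!out)
        return 0;
    if (compress2(out, &out_len, buf, len, Z_BEST_COMPRESSION) != Z_OK)
        out_len = 0;
    free(out);
    return out_len;
}

static double ncd(const unsigned char *x, unsigned long xlen,
                  const unsigned char *y, unsigned long ylen)
{
    unsigned long cx = compressed_size(x, xlen);
    unsigned long cy = compressed_size(y, ylen);

    unsigned char *xy = malloc(xlen + ylen);   /* concatenation x.y */
    memcpy(xy, x, xlen);
    memcpy(xy + xlen, y, ylen);
    unsigned long cxy = compressed_size(xy, xlen + ylen);
    free(xy);

    unsigned long cmin = cx < cy ? cx : cy;
    unsigned long cmax = cx > cy ? cx : cy;
    return ((double)cxy - (double)cmin) / (double)cmax;
}

int main(void)
{
    /* Two made-up snippets, one a lightly renamed copy of the other. */
    const char *p = "int eval(void) { return material + pst_score; }";
    const char *q = "int evaluate(void) { return material + pst_score; }";
    printf("NCD = %.3f\n",
           ncd((const unsigned char *)p, strlen(p),
               (const unsigned char *)q, strlen(q)));
    return 0;
}

On inputs this short the zlib header overhead dominates, so the printed number only demonstrates that the formula runs; the method becomes meaningful on full files.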

Unfortunately, a direct comparison of source code would not work very well. The sources should first be preprocessed, at least to remove comments and to format them in a consistent way, and finally something should be done about identifiers.
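
A rough sketch of the first two preprocessing steps (comments out, whitespace normalized) might look like the filter below. It is my own toy filter, not anything from an existing tool; it ignores comment markers inside string literals, and normalizing identifiers would additionally need a real tokenizer.

Code: Select all

/* Toy normalizer: strips C comments and collapses runs of whitespace so that
   formatting differences do not dominate a later comparison.
   Usage: ./normalize < search.c > search.flat
   Caveat: comment markers inside string literals are not handled. */
#include <stdio.h>

int main(void)
{
    int c, prev_space = 1;

    while ((c = getchar()) != EOF) {
        if (c == '/') {
            int d = getchar();
            if (d == '*') {                    /* block comment: skip to the closing marker */
                int last = 0;
                while ((c = getchar()) != EOF) {
                    if (last == '*' && c == '/')
                        break;
                    last = c;
                }
                c = ' ';                       /* a comment acts as a separator */
            } else if (d == '/') {             /* line comment: skip to end of line */
                while ((c = getchar()) != EOF && c != '\n')
                    ;
                c = ' ';
            } else {                           /* plain '/', e.g. a division */
                putchar('/');
                prev_space = 0;
                if (d == EOF)
                    break;
                c = d;
            }
        }
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            if (!prev_space)
                putchar(' ');
            prev_space = 1;
        } else {
            putchar(c);
            prev_space = 0;
        }
    }
    putchar('\n');
    return 0;
}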

But it opens new possibilities for me. Thanks.



// Christophe
tiger
Posts: 819
Joined: Sat Mar 11, 2006 3:15 am
Location: Guadeloupe (french caribbean island)

Re: Is It Me or is the Source Comparison Page Rigged?

Post by tiger »

RegicideX wrote:
tiger wrote: Semantic similarity is what the courts are looking for in order to find copyright infringement.
Unfortunately finding "semantic similarity" is not a completely objective process -- in some cases the similarity is close enough to justify charges of plagiarism, and in others the semantic similarity is so vague that the codes are obviously different. But there is a large gray area where there is room for divergence of opinion.


Seriously? You mean some people could disagree when presented with evidence? ;-)

OK, yes, I know there is a gray area.

What is wrong with bringing up the issue of fairness? What about defining more clearly what is allowed and what is not?

As I see it, taking protected GPL code and making it my own is not acceptable, but on the other hand we have a gray area, so to hell with copyright and the GPL. Tomorrow we have an army of closed-source Toga derivatives, Strelka derivatives and 20 self-declared geniuses at the top of the CCRL.



// Christophe
BubbaTough
Posts: 1154
Joined: Fri Jun 23, 2006 5:18 am

Re: Is It Me or is the Source Comparison Page Rigged?

Post by BubbaTough »

Tomorrow we have an army of closed-source Toga derivatives, Strelka derivatives and 20 self-declared geniuses at the top of the CCRL.
You mean we have to wait for tomorrow for that? :lol:. The way everyone talks, it sounds like people think everyone has already been doing that for a while now, and the only reason open-source programs remain at or near the top is that they are near enough to some local optimum that the 'derivative' programs always end up weaker. I have no idea if it's true or not (and am really not that interested, actually), but I have heard Fruit blamed for the success of almost every strong amateur program around since I first came to this board. And I have noticed it is rare for programs to start out weak and slowly get stronger every year until they are really strong (which is how you would kind of expect most strong programs to emerge if they are not derived).

-Sam
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Is It Me or is the Source Comparison Page Rigged?

Post by bob »

RegicideX wrote:
tiger wrote: Semantic similarity is what the courts are looking for in order to find copyright infringement.
Unfortunately finding "semantic similarity" is not a completely objective process -- in some cases the similarity is close enough to justify charges of plagiarism, and in others the semantic similarity is so vague that the codes are obviously different. But there is a large gray area where there is room for divergence of opinion.
As I said previously, there are lots of different axes to investigate in a multi-dimensional process. Suppose two people end up with similar evaluations, so that semantic analysis suggests plagiarism. But then suppose that when you get outside the semantics of the actual executable instructions, you find lots of identical scoring variables? We could argue about whether pawn = 100 is unique or not; I would suggest not really, because Deep Thought used pawn = 128, and I believe Tord said he uses pawn = 256 in Glaurung (although I also think he said the value varies between opening and middlegame, so I don't know whether he has two values that are used for interpolation or whether the value is modified in some other way).

But things change when you drift away from that. When we looked at cases of cloning in Crafty: I have several arrays called "piece/square" tables, 64 values giving a rough approximation of how well a piece is placed if it is on that particular square. Some have two sets, one for the opening, one for the endgame. For kings, I have three for the endgame, to handle cases where all pawns are on one side, the other, or both. What is the probability that two independent programmers will duplicate those? Close enough to zero to call it zero. This has to be investigated (not done so far).

What about basic program structure? What is the probability that in a 40K-line program, two independent authors would have exactly the same program structure (with respect to which procedures do what)? Yes, both may have a MakeMove, DoMove, or whatever. But with 40K lines, and maybe 400 different procedures, what is the probability that two programs have exactly the same number, doing exactly the same things, with either different or the same names? Again, near zero.

What about data structures? They are large and complex. What is the probability that two independent programmers will use the exact same ones? Near zero. So there is _much_ to compare, and the challenge, given no C code to look at, is to actually find/decode them to see how they would look in C.

If two large programs are similar in all those ways, there is absolutely something fishy going on. But fishing is a slow activity. I do it all the time, and there are days when I catch nothing...
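
To make the piece/square-table point concrete, this is the general shape of the data being described. The numbers below are invented for this post and are not taken from Crafty, Fruit, Rybka or any other engine, and the indexing convention (a1 = 0 ... h8 = 63) is just one common choice.

Code: Select all

/* Shape of the evaluation data under discussion: per-piece 64-entry
   "piece/square" tables, usually with separate opening and endgame sets.
   ALL values here are invented for illustration only. */
#include <stdio.h>

/* Rough knight bonus in hundredths of a pawn, indexed a1 = 0 .. h8 = 63. */
static const int knight_pst_opening[64] = {
    -50, -40, -30, -30, -30, -30, -40, -50,
    -40, -20,   0,   5,   5,   0, -20, -40,
    -30,   5,  10,  15,  15,  10,   5, -30,
    -30,   0,  15,  20,  20,  15,   0, -30,
    -30,   5,  15,  20,  20,  15,   5, -30,
    -30,   0,  10,  15,  15,  10,   0, -30,
    -40, -20,   0,   0,   0,   0, -20, -40,
    -50, -40, -30, -30, -30, -30, -40, -50,
};

static const int knight_pst_endgame[64] = {
    /* typically flatter: centralization still matters, edges are still bad */
    -40, -30, -20, -20, -20, -20, -30, -40,
    -30, -10,   0,   5,   5,   0, -10, -30,
    -20,   0,  10,  10,  10,  10,   0, -20,
    -20,   5,  10,  15,  15,  10,   5, -20,
    -20,   5,  10,  15,  15,  10,   5, -20,
    -20,   0,  10,  10,  10,  10,   0, -20,
    -30, -10,   0,   5,   5,   0, -10, -30,
    -40, -30, -20, -20, -20, -20, -30, -40,
};

int main(void)
{
    /* Knight on e4: file e = 4, rank 4 = row index 3, so square 3*8 + 4. */
    int sq = 3 * 8 + 4;
    printf("e4 knight: opening %d, endgame %d\n",
           knight_pst_opening[sq], knight_pst_endgame[sq]);
    return 0;
}

Even with every entry restricted to a handful of plausible values, one 64-entry table has an astronomical number of possible fillings, and a full engine carries dozens of such tables, which is why an exact match across them is treated as evidence of copying rather than coincidence.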
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Is It Me or is the Source Comparison Page Rigged?

Post by michiguel »

tiger wrote:
michiguel wrote:
bob wrote:
John wrote:
bob wrote: ... We are talking about a direct and well known process for translating a high-level language into machine language, and then back again ...
Bob, you could serve the CCC well, by providing a link to peer-reviewed descriptions of this process.

Especially vital are reliable estimates of Type I versus Type II errors, and equally important, inter-rater reliability.

Minimizing both kinds of error, and maximizing the reliability, is where the advice of statisticians and psychologists is indispensable.
What on earth are you talking about? There is no "error" in this process. Given the line of C programming, a = b * c + 1;, there is one way to translate that to assembly language. You might slightly alter the order of the instructions to improve speed, but if the compiler does not have a bug, then the result must be semantically equivalent. Given the assembly code the compiler produced, it is again a direct translation to go from the assembly language back to the semantically-equivalent C.

So where are you trying to take this? I have no idea who you are, or what your background is. You can find mine easily enough. I have written assemblers and compilers, I have taught the courses for many years, and the others involved are also quite good and have several people, including myself, looking over their shoulders to make sure that translation errors do not occur.

So, where are you trying to go with this? And why?


There is no error in the process, but there might be in the interpretation.
When someone asks "What are the chances that code A has not been copied and derived from code B?", you open the door to statistics with the word "chances". The process has no error, but you end up with a similarity that may be quantifiable. If the code is 100% identical in semantics, that is one thing, but what if it is not? Where do you draw the line? How do you define "% of similarity"? We know that it is easy to define 100%, but anything else might not be trivial. You certainly cannot deny emphatically that statistics play any role. A quick search led me to this paper:

Shared Information and Program Plagiarism Detection
Xin Chen, Brent Francia, Ming Li, Brian Mckinnon, Amit Seker
University of California, Santa Barbara
http://citeseerx.ist.psu.edu/viewdoc/su ... .1.1.10.76

It may not be the best paper, but it is the first I found in which people are trying to put all this in quantifiable terms. This may be far from solved, but as I said, if things can be quantified, statistics have a role.

I quote two paragraphs. Note how the problem resembles genome, or DNA, sequence comparison, something I already pointed out earlier and that was not given any attention:

"A common thread between information theory and computer science is the study of the amount of information
contained in an ensemble [17, 18] or a sequence [9]. A fundamental and very practical question has challenged
us for the past 50 years: Given two sequences, how do we measure their similarity in the sense that the measure
captures all of our intuitive concepts of “computable similarities”? Practical reincarnations of this question
abound. In genomics, are two genomes similar? On the internet, are two documents similar? Among a pile of
student Java programming assignments, are some of them plagiarized?
This paper is a part of our continued effort to develop a general and yet practical theory to answer the
challenge. We have proposed a general concept of sequence similarity in [3, 11] and further developed more
suitable theories in [8] and then in [10]. The theory has been successfully applied to whole genome phylogeny
[8], chain letter evolution [4], language phylogeny [2, 10], and more recently classification of music pieces in
MIDI format [6]. In this paper, we report our project of the past three years aimed at applying this general
theory to the domain of detecting programming plagiarisms."

Miguel




Thank you Miguel for your suggestions. I have read the paper you provided a link for and, after reading it, have done some additional research.

That was a six-hour journey.

Of particular interest is the NCD method (http://homepages.cwi.nl/~paulv/papers/cluster.pdf), because it is able to cluster items based on similarities. Using such a method would make it possible to evaluate distances between different programs. It would be extremely interesting, for example, to perform the experiment with Crafty, Fruit, Toga and Strelka, and then on other combinations of known related or unrelated chess programs.

Unfortunately, a direct comparison of source code would not work very well. The sources should first be preprocessed, at least to remove comments and to format them in a consistent way, and finally something should be done about identifiers.

But it opens new possibilities for me. Thanks.

// Christophe


I believe it may be possible to compare all the programs from their binaries and calculate how similar they are to each other. I do not underestimate the difficulty, but comparing whole genomes is not an easy task either, and people are doing it. A lot of resources were spent on those projects, so I would not be surprised if computer science ends up learning some techniques from the bioinformatics area (giving something back).

Currently, there are systems (http://turnitin.com/static/index.html) that allow an automatic check of assignments turned in by students (not code, just text). You get back a score with red, orange, yellow, green etc. flags. I truly believe this should be possible with code. I do not know how effective it can be, though. But anyway, my original point is that it is not possible to deny that statistics have a role.
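
As a toy illustration of the kind of red/orange/yellow/green report described above, one could bucket a pairwise similarity score into flags; the thresholds and the three example scores below are arbitrary placeholders, not measurements of any real programs.

Code: Select all

/* Toy flagging of a similarity score in [0, 1]: 1 = effectively identical,
   0 = unrelated.  Thresholds are arbitrary placeholders. */
#include <stdio.h>

static const char *flag(double similarity)
{
    if (similarity > 0.90) return "red";
    if (similarity > 0.75) return "orange";
    if (similarity > 0.50) return "yellow";
    return "green";
}

int main(void)
{
    double example_scores[] = { 0.97, 0.62, 0.20 };   /* placeholder values */
    for (int i = 0; i < 3; i++)
        printf("pair %d: %.2f -> %s\n", i, example_scores[i],
               flag(example_scores[i]));
    return 0;
}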

Miguel
Uri Blass
Posts: 10790
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Is It Me or is the Source Comparison Page Rigged?

Post by Uri Blass »

bob wrote:
RegicideX wrote:
tiger wrote: Semantic similarity is what the courts are looking for in order to find copyright infringement.
Unfortunately finding "semantic similarity" is not a completely objective process -- in some cases the similarity is close enough to justify charges of plagiarism, and in others the semantic similarity is so vague that the codes are obviously different. But there is a large gray area where there is room for divergence of opinion.
As I said previously, there are lots of different axes to investigate in a multi-dimensional process. Suppose two people end up with similar evaluations, so that semantic analysis suggests plagiarism. But then suppose that when you get outside the semantics of the actual executable instructions, you find lots of identical scoring variables? We could argue about whether pawn = 100 is unique or not; I would suggest not really, because Deep Thought used pawn = 128, and I believe Tord said he uses pawn = 256 in Glaurung (although I also think he said the value varies between opening and middlegame, so I don't know whether he has two values that are used for interpolation or whether the value is modified in some other way).

But things change when you drift away from that. When we looked at cases of cloning in Crafty: I have several arrays called "piece/square" tables, 64 values giving a rough approximation of how well a piece is placed if it is on that particular square. Some have two sets, one for the opening, one for the endgame. For kings, I have three for the endgame, to handle cases where all pawns are on one side, the other, or both. What is the probability that two independent programmers will duplicate those? Close enough to zero to call it zero. This has to be investigated (not done so far).

What about basic program structure? What is the probability that in a 40K-line program, two independent authors would have exactly the same program structure (with respect to which procedures do what)? Yes, both may have a MakeMove, DoMove, or whatever. But with 40K lines, and maybe 400 different procedures, what is the probability that two programs have exactly the same number, doing exactly the same things, with either different or the same names? Again, near zero.

What about data structures? They are large and complex. What is the probability that two independent programmers will use the exact same ones? Near zero. So there is _much_ to compare, and the challenge, given no C code to look at, is to actually find/decode them to see how they would look in C.

If two large programs are similar in all those ways, there is absolutely something fishy going on. But fishing is a slow activity. I do it all the time, and there are days when I catch nothing...
It is clear that programmers are not independent, so the question of what the probability is for independent programmers is not relevant.

It could be relevant if Vas had claimed that he did not read Fruit, but Vas did not claim that.

The question is not the probability for independent programmers, but what types of dependency are considered to be against the GPL.

Using ideas is also a type of dependency.
In Movei I use the idea of averaging the opening score and the endgame score based on the stage of the game.

I cannot say that it is independent of Fruit, because I learned the idea from Fruit.

I do not use the same piece-square tables as Fruit, but a programmer who uses the same piece-square table could claim that the specific numbers in the table are an idea, and that if he is not allowed to use them then he cannot use ideas from Fruit.

Other cases can arise when programmers do not copy tables from Fruit but change their own tables to be closer to Fruit's tables.

When you have a different program than Fruit, it is not obvious that Fruit's tables are going to be good for your program, but you need to decide what to test.

You can decide that, because Fruit is a good program, you will test changes that make your tables closer to Fruit's by taking the average of your original tables and Fruit's tables.

If you find that the change is productive based on games, you accept it.

In this case people are not going to find Fruit's numbers in your code, but it is clear that the numbers in your code are based on ideas from Fruit.
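
A minimal sketch of the two techniques being described: the phase-based average of opening and endgame scores, and the blending of one author's tables toward another's. All of the weights, values and the 50/50 blend below are illustrative, not Movei's or Fruit's actual numbers.

Code: Select all

/* Sketch of the two ideas above.  All numbers are illustrative only. */
#include <stdio.h>

/* "Average between opening score and endgame score based on the stage of
   the game": interpolate linearly on a game-phase value, where phase is
   max_phase at the starting position and 0 with no pieces left. */
static int tapered_eval(int opening_score, int endgame_score,
                        int phase, int max_phase)
{
    return (opening_score * phase + endgame_score * (max_phase - phase))
           / max_phase;
}

/* "Taking average between your original tables and fruit tables": blend two
   64-entry piece/square tables entry by entry; the result would then be kept
   only if it tests well in games. */
void blend_tables(const int mine[64], const int theirs[64], int out[64])
{
    for (int sq = 0; sq < 64; sq++)
        out[sq] = (mine[sq] + theirs[sq]) / 2;
}

int main(void)
{
    /* Opening term says +30, endgame term says -10, and the position is
       three quarters of the way up the phase scale: */
    printf("%d\n", tapered_eval(30, -10, 18, 24));   /* prints 20 */
    return 0;
}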

Uri
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Is It Me or is the Source Comparison Page Rigged?

Post by bob »

Uri Blass wrote:
bob wrote:
RegicideX wrote:
tiger wrote: Semantic similarity is what the courts are looking for in order to find copyright infringement.
Unfortunately finding "semantic similarity" is not a completely objective process -- in some cases the similarity is close enough to justify charges of plagiarism, and in others the semantic similarity is so vague that the codes are obviously different. But there is a large gray area where there is room for divergence of opinion.
As I said previously, there are lots of different axes to investigate in a multi-dimensional process. Suppose two people end up with similar evaluations, so that semantic analysis suggests plagiarism. But then suppose that when you get outside the semantics of the actual executable instructions, you find lots of identical scoring variables? We could argue about whether pawn = 100 is unique or not; I would suggest not really, because Deep Thought used pawn = 128, and I believe Tord said he uses pawn = 256 in Glaurung (although I also think he said the value varies between opening and middlegame, so I don't know whether he has two values that are used for interpolation or whether the value is modified in some other way).

But things change when you drift away from that. When we looked at cases of cloning in Crafty: I have several arrays called "piece/square" tables, 64 values giving a rough approximation of how well a piece is placed if it is on that particular square. Some have two sets, one for the opening, one for the endgame. For kings, I have three for the endgame, to handle cases where all pawns are on one side, the other, or both. What is the probability that two independent programmers will duplicate those? Close enough to zero to call it zero. This has to be investigated (not done so far).

What about basic program structure? What is the probability that in a 40K-line program, two independent authors would have exactly the same program structure (with respect to which procedures do what)? Yes, both may have a MakeMove, DoMove, or whatever. But with 40K lines, and maybe 400 different procedures, what is the probability that two programs have exactly the same number, doing exactly the same things, with either different or the same names? Again, near zero.

What about data structures? They are large and complex. What is the probability that two independent programmers will use the exact same ones? Near zero. So there is _much_ to compare, and the challenge, given no C code to look at, is to actually find/decode them to see how they would look in C.

If two large programs are similar in all those ways, there is absolutely something fishy going on. But fishing is a slow activity. I do it all the time, and there are days when I catch nothing...
It is clear that programmers are not independent, so the question of what the probability is for independent programmers is not relevant.

It could be relevant if Vas had claimed that he did not read Fruit, but Vas did not claim that.

The question is not the probability for independent programmers, but what types of dependency are considered to be against the GPL.

Using ideas is also a type of dependency.
In Movei I use the idea of averaging the opening score and the endgame score based on the stage of the game.

I cannot say that it is independent of Fruit, because I learned the idea from Fruit.

I do not use the same piece-square tables as Fruit, but a programmer who uses the same piece-square table could claim that the specific numbers in the table are an idea, and that if he is not allowed to use them then he cannot use ideas from Fruit.

Other cases can arise when programmers do not copy tables from Fruit but change their own tables to be closer to Fruit's tables.

When you have a different program than Fruit, it is not obvious that Fruit's tables are going to be good for your program, but you need to decide what to test.

You can decide that, because Fruit is a good program, you will test changes that make your tables closer to Fruit's by taking the average of your original tables and Fruit's tables.

If you find that the change is productive based on games, you accept it.

In this case people are not going to find Fruit's numbers in your code, but it is clear that the numbers in your code are based on ideas from Fruit.

Uri
This is simple. If you copy lines of code, whether they are executable statements or data tables, you are guilty of plagiarism, unless the data is of such an obvious nature that different people would likely produce it by themselves. The list of character digits and their numeric equivalents is an example, as is a list of words such as in the simple program we were writing a couple of days ago. But not a set of numbers that give piece values for different squares; there is a near-infinite number of such values and groups of values...

I do not see why it is so difficult to understand the difference between ideas and source code.