Rybka 1.0 vs. Strelka

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Uri Blass
Posts: 10790
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Wanted: some opposition to the provided evidence

Post by Uri Blass »

tiger wrote:
Uri Blass wrote:
tiger wrote:
Uri Blass wrote:
bob wrote:
Uri Blass wrote:
tiger wrote:Zach is showing code snippets where Rybka 1.0 is actually more similar to Fruit than Strelka 2.0.

A few days ago there was some vocal opposition to the idea that Rybka 1.0 could be a derived work of Fruit 2.1.

Where is the opposition now?

There are several skilled people ready to explain why many programmers think (without daring to say it) that Rybka started its life as Fruit 2.1.

The evidence is now being shown factually. Feel free to contradict it factually.



// Christophe
There is a second possibility: that Rybka started her life with part of Fruit but never had the full source.

I know that Movei started its life with some of TSCP's structures and names of variables and constants (but no working chess code).

Uri
Unfortunately that is enough to settle this immediately. "part of fruit" is unacceptable since _all_ of fruit is GPL'ed. This issue is black or white, with no grey at all.

BTW, in some cases development is obvious. You can go back to rec.games.chess.computer circa November 1994 or so, and find posts by me where I was working on a _new_ program (now called Crafty). I started with the move generator and published the source there and got lots of feedback. You can also find discussions about search and evaluation as they were written, not copied. So starting with someone else's code is not a normal development course.
I do not know what the normal course is.

I know some programs, like Trace, that started with the full TSCP code, and I am not sure that most chess programs started without code from other programs.

In my case, I started with a legal move generator, but I used some constants and variables from TSCP and also some function names.

My move generator never used the mailbox that is part of TSCP, and it used some structures of my own that are not part of TSCP, so it is clearly different from the TSCP move generator.

Uri


Is TSCP protected by the GPL?

It looks like you have acted without even looking at the licence your model has been published under. Read the licence and you will know whether you have infringed on it or not.

If you have, maybe it's not too late to apologize in good faith and to clean your work. That is more honorable behaviour than the silence we see from another person.

I see that most of your posts seem tainted by the fear that you have yourself done something wrong. What is legal and what is not is already well defined and will not change because you defend yourself here.

So just clean your stuff first.



// Christophe
I simply responded to Bob Hyatt.
TSCP is not protected by the GPL.

I think that what is legal and what is not legal is not well defined.

What is a derivative work?
If you have copied one line, I do not think that you can call it a derivative work, and two programs can also have one common line without any copying.

What is the minimal number of copied lines at which your program becomes a derivative work?

Things are not clear.

Uri


The GPL licence is clear:
From section 0: ...a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications...
So there is no minimal number of lines required to start to infringe on the GPL. It starts at one line, or in the case of programs presented on one line (there are such programs), it starts at a few characters.

What's left to be proven is that some part of a source code that is supposed to infringe on the GPL has indeed been taken from a GPL source code and is not an original creation.

This is going to be left to the judgment of software experts. So if all you have copied is "i++;" I guess you are safe. If you have copied a one-hundred-line routine, I think you are not, because the experts are likely to rule that there is no chance that you have come up with exactly the same 100 lines of code by pure luck.

Remember, it does not matter if the routine you have copied is of vital importance in your program or if it is not. It does not even matter if the routines you have copied come from a program that has absolutely nothing to do with yours. For example, you are not allowed to use a routine from a financial program even if you are going to use it in a chess program.

The spirit of the GPL is that some guy gives away his work out of generosity. What he asks of you in exchange is to do the same if you take all or even just a part of his work to include it in your creations. So if your intention is to keep your work closed, then the least you can do is to respect the will of the guy and not copy ANY part of his work. Hence the relative intolerance against such re-use, even for what you call "unimportant" parts.

It is also the reason why there is no minimal number of identical lines you would be allowed to use. You are not allowed to use this work at all if you do not want to accept the rules of the game, so if there is evidence of re-use of code, as small as it is, you are caught.

It's up to you to decide if you could be caught or not with your current code, knowing the original programs you may have copied in part.

Now if you want to be safe, at least make sure that you do not have 800 lines of code identical to those of some program protected by the GPL.

It has been shown that at least 800 lines of Strelka 2.0 are identical to lines of Fruit 2.1. Ask an expert what he thinks about it. I don't think the answer will be unclear.



// Christophe
I doubt if 800 is correct

http://64.68.157.89/forum/viewtopic.php?t=23095&start=0

Nobody has commented on it or counted the identical parts, and it is not clear how to count what is identical.

I think that the word "equivalent" is more correct than "identical" in some of the cases.

From the following example, 1,4,5,6,8,9 are equivalent and not identical:


FRUIT:
static void parse_setoption( char string []) //1
{
const char *name; //2
char* value; //3

name = strstr(string, "name "); //4
value = strstr(string, "value "); //5
if (name == NULL || value == NULL || name >= value)return;//6
value[-1] = '\0'; //7
name += 5; //8
value += 6; //9

STRELKA:
void parse_setoption(char string[]) //1 without static
{
char *name, *value;//2,3 in one line
int size;

name = strstr(string,"name "); //4
value = strstr(string,"value "); //5
if (name == NULL || value == NULL || name >= value) return;//6
value[-1] = 0; //7
name += 5; //8
value += 6; //9
tiger
Posts: 819
Joined: Sat Mar 11, 2006 3:15 am
Location: Guadeloupe (french caribbean island)

Re: Wanted: some opposition to the provided evidence

Post by tiger »

Uri Blass wrote:
I think that the word "equivalent" is more correct than "identical" in some of the cases.

From the following example, 1,4,5,6,8,9 are equivalent and not identical:


Uri, are you doing this on purpose? Is this a diversion of some sort or what?

Your chess program is written in C, correct?

So you know that in C, whitespace and layout are free.

And you know that the code produced by Norman that you have quoted above has been run through a program that normalizes the presentation in order to make automatic comparison easier.

In the lines that you have commented as 4, 5, and 6, the difference between Fruit and Strelka is a space. This space is not significant for the compiler, so will not produce any difference in the generated executable. It looks like the source formatting program used by Norman did not do its work correctly because in both sources it should have formatted the code in exactly the same way, with the space in both, or no space in both.

The differences you have pointed out are a bug in the formatting program. The lines would be exactly identical if the formatting program had done its work correctly.
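To make the "formatting program" point concrete, here is a minimal C sketch of a layout-insensitive line comparison. It is not Norman's tool, which is not shown in this thread; the names normalize_layout and same_layout_free are hypothetical. It drops whitespace except where two word characters would otherwise merge, copies string and character literals verbatim (so "name " keeps its space), and cuts a trailing // comment. It knows nothing else about C, so treat it as a toy.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Word characters: letters, digits and '_' (identifier/number tokens). */
static int is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copies src into dst with layout stripped: whitespace runs are dropped,
 * except that one space is kept where it separates two word characters
 * ("const char" stays, "name = strstr" becomes "name=strstr").
 * String/char literals are copied verbatim and a trailing // comment is cut.
 * The output is never longer than the input; returns -1 if cap is too small. */
static int normalize_layout(const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    int saw_space = 0;

    if (cap <= strlen(src)) return -1;
    for (const char *p = src; *p != '\0';) {
        if (isspace((unsigned char)*p)) { saw_space = 1; p++; continue; }
        if (p[0] == '/' && p[1] == '/') break;           /* comments don't count */
        if (saw_space && n > 0 && is_word_char(dst[n - 1]) && is_word_char(*p))
            dst[n++] = ' ';
        saw_space = 0;
        if (*p == '"' || *p == '\'') {                    /* copy literal verbatim */
            char quote = *p;
            dst[n++] = *p++;
            while (*p != '\0' && *p != quote) {
                if (*p == '\\' && p[1] != '\0') dst[n++] = *p++;  /* keep escapes */
                dst[n++] = *p++;
            }
            if (*p == quote) dst[n++] = *p++;
            continue;
        }
        dst[n++] = *p++;
    }
    dst[n] = '\0';
    return 0;
}

/* 1 if two source lines differ only in whitespace and // comments. */
static int same_layout_free(const char *a, const char *b)
{
    char na[256], nb[256];

    if (normalize_layout(a, na, sizeof na) != 0 ||
        normalize_layout(b, nb, sizeof nb) != 0)
        return 0;
    return strcmp(na, nb) == 0;
}
```

With it, the Fruit and Strelka versions of lines 4, 5 and 6 compare equal, while line 1 (the missing static) and line 7 ('\0' versus 0) still differ: those two are equivalent rather than merely reformatted.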

In 1 there is a difference of space (bad formatting) and the word static does not exist in Strelka. This word makes no difference in functionality.

Sorry, I can't see the differences in 8 and 9.

As you have pointed out the difference about 2 and 3 is that in Strelka both variables are declared at once, in one line, which is exactly the same as using two lines, one per declaration.

Here you have actually a code snippet that shows 100% identity in the code generated for both Fruit and Strelka.
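The claim about lines 1 to 9 can at least be checked at the level of behaviour with a small C sketch. Both functions below reproduce the quoted lines; the function names, the out_name/out_value parameters and the int return value are assumptions added so the results can be compared (the quoted functions return void and continue past line 9, which is not shown). Strelka's unused "int size;" is left out.

```c
#include <stddef.h>
#include <string.h>

/* Quoted lines 1-9, Fruit style: static, two declarations, const name,
 * '\0' terminator. Only the hand-back of the results at the end is added. */
static int split_fruit_style(char string[], const char **out_name, char **out_value)
{
    const char *name;
    char *value;

    name = strstr(string, "name ");
    value = strstr(string, "value ");
    if (name == NULL || value == NULL || name >= value) return -1;
    value[-1] = '\0';   /* ends the name where " value" starts */
    name += 5;          /* skip "name "  */
    value += 6;         /* skip "value " */

    *out_name = name;   /* added: report results instead of using them */
    *out_value = value;
    return 0;
}

/* Quoted lines 1-9, Strelka style: not static, one declaration line,
 * no const, plain 0 terminator. */
int split_strelka_style(char string[], const char **out_name, char **out_value)
{
    char *name, *value;

    name = strstr(string,"name ");
    value = strstr(string,"value ");
    if (name == NULL || value == NULL || name >= value) return -1;
    value[-1] = 0;
    name += 5;
    value += 6;

    *out_name = name;
    *out_value = value;
    return 0;
}
```

Fed two copies of "setoption name Hash value 64", both should yield name "Hash" and value "64" and leave their buffers byte-for-byte equal. This checks behaviour only; whether a given compiler emits identical machine code would have to be confirmed by compiling each version (for example with gcc -O2 -S) and comparing the output, and the static keyword can still affect symbol visibility or inlining.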

I'm not sure where you are trying to drag us with your quibbling about "identical" and "equivalent", because I'm pretty sure that you know that the generated code is exactly the same in both cases.



// Christophe
bnemias
Posts: 373
Joined: Thu Aug 14, 2008 3:21 am
Location: Albuquerque, NM

Re: Wanted: some opposition to the provided evidence

Post by bnemias »

tiger wrote:the generated code is exactly the same in both cases.
I'm no expert in disassembly, but it seems clear to me that for the above statement, ANY source producing identical code is identical for the purposes of comparing actual source with a disassembled binary. The variable names could be different, presumably.

I would be looking at implementation quirks such as comment 7:

Fruit

static void parse_setoption( char string []) //1
{
const char *name; //2
char* value; //3

name = strstr(string, "name "); //4
value = strstr(string, "value "); //5
if (name == NULL || value == NULL || name >= value)return;//6
value[-1] = '\0'; //7
name += 5; //8
value += 6; //9  
Strelka

void parse_setoption(char string[]) //1 without static
{
char *name, *value;//2,3 in one line
int size;

name = strstr(string,"name "); //4
value = strstr(string,"value "); //5
if (name == NULL || value == NULL || name >= value) return;//6
value[-1] = 0; //7
name += 5; //8
value += 6; //9
Comment 7's implementation detail is far from the obvious choice. It's certainly not a very good practice.
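Line 7 works because value points into the caller's buffer: writing '\0' at value[-1] overwrites the space before "value " and so ends the name string. Here is a minimal C sketch of that side effect, with a more conventional alternative for contrast. Both function names are hypothetical, and split_copying is only an illustrative alternative, not code from Fruit or Strelka.

```c
#include <stddef.h>
#include <string.h>

/* The line-7 idiom: terminate the name in place by overwriting the byte
 * just before "value ". Cheap, but it silently edits the caller's buffer. */
static int split_in_place(char *line, const char **name, const char **value)
{
    char *n = strstr(line, "name ");
    char *v = strstr(line, "value ");

    if (n == NULL || v == NULL || n >= v) return -1;
    v[-1] = '\0';
    *name = n + 5;
    *value = v + 6;
    return 0;
}

/* Illustrative alternative: leave the input untouched and copy the name
 * (everything between "name " and the space before "value ") into name_buf. */
static int split_copying(const char *line, char *name_buf, size_t cap,
                         const char **value)
{
    const char *n = strstr(line, "name ");
    const char *v = strstr(line, "value ");

    if (n == NULL || v == NULL || n >= v) return -1;
    n += 5;
    size_t len = (v > n) ? (size_t)(v - 1 - n) : 0;
    if (len >= cap) return -1;
    memcpy(name_buf, n, len);
    name_buf[len] = '\0';
    *value = v + 6;
    return 0;
}
```

After split_in_place, the caller's buffer reads "setoption name Hash": it has been truncated. Calling it on a string literal would be undefined behaviour, since literals may not be modified. split_copying avoids both problems at the cost of a buffer and a memcpy.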
tiger
Posts: 819
Joined: Sat Mar 11, 2006 3:15 am
Location: Guadeloupe (french caribbean island)

Re: Wanted: some opposition to the provided evidence

Post by tiger »

bnemias wrote:
Comment 7's implementation detail is far from the obvious choice. It's certainly not a very good practice.


Yes, and it is expected to find the same coding mistakes in both sources from time to time.

Bob would certainly tell you that these mistakes are extremely revealing when one is comparing source codes.



// Christophe
Uri Blass
Posts: 10790
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Wanted: some opposition to the provided evidence

Post by Uri Blass »

tiger wrote:
I'm not sure where you are trying to drag us with your quibbling about "identical" and "equivalent", because I'm pretty sure that you know that the generated code is exactly the same in both cases.

I made a mistake: apart from 1, which has an unimportant difference, I listed the identical parts instead of the non-identical parts.

I meant that 1, 2, 3 and 7 are not identical but equivalent.
I did not say that the exe they generate is different, only that the code is not identical.

It is easy to change the names of variables or to put two variables on one line to get non-identical code, so of course it proves nothing. In spite of that, I consider the wording important, because a few identical lines are more suspicious than a few equivalent lines: there is a bigger probability that two different people write a few equivalent lines independently. (I do not claim that the equivalence between Strelka and Fruit is not enough to conclude that the author used copy and paste for some parts.)

Uri
tiger
Posts: 819
Joined: Sat Mar 11, 2006 3:15 am
Location: Guadeloupe (french caribbean island)

Re: Wanted: some opposition to the provided evidence

Post by tiger »

Uri Blass wrote:
I meant that 1, 2, 3 and 7 are not identical but equivalent.
I did not say that the exe they generate is different, only that the code is not identical.


OK, I understand.

It is true that equivalent lines are what is generally produced by the process of disassembling the executable code, and this is how Strelka has been built.

The comparative analysis shows a significant number of routines that are equivalent (in the sense that they produce identical code or in the worst case equivalent code). An expert would very quickly label them one by one as obviously copied from Fruit, with slight changes.

It is so unlikely that all these identical or equivalent or similar routines can have been written by pure luck that an expert will not have to count or label the lines. It's simply obvious.

I think it is not required to be an expert to see that these routines have been copied. All that is needed is moderate skills in C programming.

Then as all the similarities emerge, the big picture becomes clear: Fruit has been taken as the basis and has then been modified. The "infrastructure", by which I mean all the parts that surround the search and evaluation (initialization, reading positions, commands, ...), is mostly Fruit. The infrastructure is what is written first in a chess program. You do not write the search and the evaluation first, because without the infrastructure there is no way you can test them. You would be coding for days without being able to test anything. That is not how chess programs are developed. The infrastructure is developed first and then the search and evaluation are written and can then be matured progressively.

I'm not saying that this scenario is the only possible or that it is certain, but I believe it is highly plausible.



// Christophe
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Wanted: some opposition to the provided evidence

Post by Zach Wegner »

tiger wrote:The infrastructure is what is written first in a chess program. You do not write the search and the evaluation first, because without the infrastructure there is no way you can test them. You would be coding for days without being able to test anything. That is not how chess programs are developped. The infrastructure is developped first and then the search and evaluation are written and can then be matured progressively.
This is especially true if you consider where Rybka was less than two years before the 1.0 release: http://www.vrichey.de/cct6/index_table.htm

You don't start with a regular program, improve its search and evaluation to be world class, and then replace all of the infrastructure with Fruit.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Wanted: some opposition to the provided evidence

Post by bob »

Uri Blass wrote:
I think that the word "equivalent" is more correct than "identical" in some of the cases.

From the following example, 1,4,5,6,8,9 are equivalent and not identical:
Sorry, but that is garbage. The general usage is identical semantics, not identical syntax. int a=1 and int alpha=1 are considered the same if you can replace all occurrences of a with alpha in the first and it works the same. Ditto for a "static" modifier, which in the case above simply says that procedure can only be called by procedures in the same source file, and doesn't change a single thing semantically.

If your approach were taken, students would _never_ be accused of plagiarism. Even in publishing: I could just print your article in Russian and be home free, since the two are not "identical".
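The "replace all occurrences of a with alpha" test can be sketched in C as a token-by-token comparison that allows a consistent, one-to-one renaming of identifiers while requiring keywords, numbers and punctuation to match exactly. This is a toy under stated assumptions: same_up_to_renaming and its helpers are hypothetical, the keyword list is deliberately short, and string literals and comments get no special treatment.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 64
#define MAX_TOKEN_LEN 32

/* Splits a line into word tokens (identifiers, numbers) and single
 * punctuation characters. Returns the token count, or -1 on overflow. */
static int tokenize(const char *s, char toks[][MAX_TOKEN_LEN], int max)
{
    int n = 0;
    while (*s != '\0') {
        if (isspace((unsigned char)*s)) { s++; continue; }
        if (n == max) return -1;
        int len = 0;
        if (isalnum((unsigned char)*s) || *s == '_') {
            while (isalnum((unsigned char)*s) || *s == '_') {
                if (len + 1 >= MAX_TOKEN_LEN) return -1;
                toks[n][len++] = *s++;
            }
        } else {
            toks[n][len++] = *s++;
        }
        toks[n][len] = '\0';
        n++;
    }
    return n;
}

/* Tokens that may not be renamed: numbers, punctuation, a few keywords. */
static int is_fixed(const char *t)
{
    static const char *const fixed[] = {"int", "char", "const", "static", "void",
                                        "if", "else", "for", "while", "return", "NULL"};
    if (!isalpha((unsigned char)t[0]) && t[0] != '_') return 1;
    for (size_t i = 0; i < sizeof fixed / sizeof fixed[0]; i++)
        if (strcmp(t, fixed[i]) == 0) return 1;
    return 0;
}

/* 1 if line b is line a with identifiers consistently renamed (one-to-one). */
static int same_up_to_renaming(const char *a, const char *b)
{
    char ta[MAX_TOKENS][MAX_TOKEN_LEN], tb[MAX_TOKENS][MAX_TOKEN_LEN];
    const char *from[MAX_TOKENS], *to[MAX_TOKENS];
    int na = tokenize(a, ta, MAX_TOKENS);
    int nb = tokenize(b, tb, MAX_TOKENS);
    int nmap = 0;

    if (na < 0 || na != nb) return 0;
    for (int i = 0; i < na; i++) {
        if (is_fixed(ta[i]) || is_fixed(tb[i])) {
            if (strcmp(ta[i], tb[i]) != 0) return 0;  /* must match exactly */
            continue;
        }
        int known = 0;
        for (int k = 0; k < nmap; k++) {
            int same_from = strcmp(from[k], ta[i]) == 0;
            int same_to = strcmp(to[k], tb[i]) == 0;
            if (same_from != same_to) return 0;       /* not one-to-one */
            if (same_from) { known = 1; break; }
        }
        if (!known) { from[nmap] = ta[i]; to[nmap] = tb[i]; nmap++; }
    }
    return 1;
}
```

With this, "int a=1;" and "int alpha=1;" match; "int a=1;" and "int a=2;" do not, and neither do "a=b;" and "x=x;", because mapping both a and b onto x is not a renaming.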
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Wanted: some opposition to the provided evidence

Post by bob »

bnemias wrote:
I'm no expert in disassembly, but it seems clear to me that for the above statement, ANY source producing identical code is identical for the purposes of comparing actual source with a disassembled binary. The variable names could be different, presumably.

That is one of many tests... is the executable identical? But there are ways to prevent that from happening. For example, doing a sort a bit earlier in the code, rather than exactly where it is needed.

Other student tricks:

for (i=0; i<n; i++) {
...
}

vs

loop_index = 0;
while (loop_index < n) {
...
loop_index++;
}

Those are semantically identical, even though they are syntactically different. Any decent programmer would pick up on that immediately. And comments don't count, since they are not used by the compiler. They are actually the easiest thing to change to try to disguise plagiarism.
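Here are the two loops above, filled in with a hypothetical body (summing an array stands in for the elided "..."), as a C sketch that can be run to confirm they compute the same thing. The function names are illustrative only.

```c
/* "for" form: i is initialised, tested and incremented in the loop header. */
static long sum_for(const int *a, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* "while" form: the same three steps spelled out by hand. */
static long sum_while(const int *a, int n)
{
    long total = 0;
    int loop_index = 0;
    while (loop_index < n) {
        total += a[loop_index];
        loop_index++;
    }
    return total;
}
```

One caveat on the equivalence: it holds only while the body contains no continue. In the for version, continue still runs i++; in the while version it skips loop_index++, and the loop can spin forever.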
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Wanted: some opposition to the provided evidence

Post by bob »

There is another issue that CT mentioned that is key here. Most people here do not teach computer science. I have done it for 38 years. Even simple programs assigned in beginning courses can be written in so many ways that we do not see code that is even close, much less semantically identical. When you get to upper-level courses where the programs go beyond 500 lines, they don't even look similar, much less nearly identical. For a multi-thousand-line chess program, the probability of two different programmers writing the exact same code, or even 90% the same, is close enough to zero that a statistician would not quibble over the number.

I have re-written a chess program probably about a dozen times over the last 40 years. No version was even remotely similar to the previous version, other than in the fact that both have move generators, search functions and evaluation code. And if anybody would duplicate code, the same author would be a perfect candidate. So two different people writing even 800 lines of code that are the same is simply not "luck"...