A Simple Experiment for Advancing the Discussion

bob · Post by **bob** » Sun Aug 31, 2008 8:32 pm

RegicideX wrote:

That is, the user enters a string consisting of literal strings like "one plus two" and the parser should print out the result of the arithmetic operation, in this case 3. To make matters simple, only one digit numbers and only one operation should be entered. Thus all operations should be of the form "X operation Y" where X and Y are literal representations of the numbers from zero to nine and "operation" should be "plus" "minus" and "times." An error message for invalid input could be present.
That is exactly what my code above does...
Here is the misunderstanding: when I said the input should be "one plus two" I meant that the input should be the string "one plus two" and not "1 + 2"

A simple misunderstanding.

All that would change would be to add a small awk script to replace every occurrence of "one" by "1", etc... before doing the same expr call... If you'd like to see the whole thing just let me know...

chrisw · Post by **chrisw** » Sun Aug 31, 2008 8:40 pm

bob wrote:
RegicideX wrote:

That is, the user enters a string consisting of literal strings like "one plus two" and the parser should print out the result of the arithmetic operation, in this case 3. To make matters simple, only one digit numbers and only one operation should be entered. Thus all operations should be of the form "X operation Y" where X and Y are literal representations of the numbers from zero to nine and "operation" should be "plus" "minus" and "times." An error message for invalid input could be present.
That is exactly what my code above does...
Here is the misunderstanding: when I said the input should be "one plus two" I meant that the input should be the string "one plus two" and not "1 + 2"

A simple misunderstanding.
All that would change would be to add a small awk script to replace every occurrence of "one" by "1", etc... before doing the same expr call... If you'd like to see the whole thing just let me know...

I doubt he could care less to see it for the simple reason of total irrelevance.

1. Rybka and Fruit are in C and we were discussing their C implementations and the assembler produced from compiling the C. Why C? Because the source is available for Fruit and the code concerned was a C-recreation by reverse engineering from Rybka.

2. Therefore the program to model Alex's literal string entry idea should be in C, not some other language or script of your choice.

3. This is not a contest to see who writes the fanciest code. It's a little experiment to see the results of writing a simple parser in C, once it's been turned into assembler.

But you knew all that already.

In my opinion, doing this by having looked at a previous programmer's ideas, and not bothered about speed, memory, winning the university geekiest coder competition, but a simple hack job, will likely produce similar code.

RegicideX · Post by **RegicideX** » Sun Aug 31, 2008 9:03 pm

All that would change would be to add a small awk script to replace every occurrence of "one" by "1", etc... before doing the same expr call... If you'd like to see the whole thing just let me know...

We're definitely not writing operating systems here. I am also not doubting that one can be unfathomably clever in writing a program -- a look at some "Obfuscated C" competitions should prove that.

But "normal" code writing can produce similar code. Chris W. also makes a good point that there can be a psychological "anchoring effect" of seeing and studying a piece of code -- the code structure sticks in one's head.

I'll post something (either code or executable, I haven't decided yet) after the long weekend -- family comes before internet discussions.

chrisw · Post by **chrisw** » Sun Aug 31, 2008 9:25 pm

RegicideX wrote:

All that would change would be to add a small awk script to replace every occurrence of "one" by "1", etc... before doing the same expr call... If you'd like to see the whole thing just let me know...
We're definitely not writing operating systems here. I am also not doubting that one can be unfathomably clever in writing a program -- a look at some "Obfuscated C" competitions should prove that.

But "normal" code writing can produce similar code. Chris W. also makes a good point that there can be a psychological "anchoring effect" of seeing and studying a piece of code -- the code structure sticks in one's head.
I'll post something (either code or executable, I haven't decided yet) after the long weekend -- family comes before internet discussions.

Yes, described much better than my long-winded picture.

psychological anchoring effect of seeing and studying a piece of code.

Well done. Perfect expression and description. Fits exactly for the trivial pieces of code of the UCI.

bob · Post by **bob** » Sun Aug 31, 2008 9:30 pm

chrisw wrote:Hi Alex,

Well, unlike Bob, who seems to me to be just trying to be difficult, and in the interests of progress, I wrote some crappy code in C, appended below.

How am I trying to be difficult? You are making your usual assumptions. Nowhere did he mention "C". I did not quite catch the "one" vs "1". But here is my code to do the entire process:

"evaluate":
#!/bin/csh
set noglob
set v1 = `echo $1 | awk -f swap`
set v3 = `echo $3 | awk -f swap`
echo `expr $v1 $2 $v3 | awk -f unswap`

swap:
/zero/{print "0"}
/one/{print "1"}
/two/{print "2"}
/three/{print "3"}
/four/{print "4"}
/five/{print "5"}
/six/{print "6"}
/seven/{print "7"}
/eight/{print "8"}
/nine/{print "9"}

unswap:
/0/{print "zero"}
/1/{print "one"}
/2/{print "two"}
/3/{print "three"}
/4/{print "four"}
/5/{print "five"}
/6/{print "six"}
/7/{print "seven"}
/8/{print "eight"}
/9/{print "nine"}

output:

scrappy% ./evaluate one + two
three
scrappy% ./evaluate nine - four
five
scrappy% ./evaluate two * four
eight
scrappy% ./evaluate nine / three
three

Seems to me that if people are asked blind to produce some code for this problem, they may well write different stuff.

But, if they studied, took a quick look, at similar code that already did the job, they might think, well in my case below ..... oh, ok split the input string up into three strings, compare each of those strings with ASCII text data to find a match, set some variables with the match results and perform the desired operation. Oh, and he used strcmp, that's easy, I don't need to go check in my C-guide now ....

And then they write their code. No copy. No cut 'n paste, just see the basic outline of the idea and hack it out.

Et voila, betcha the code produced by another program was then similar, even though entirely written by the second programmer.

Why? Because the second programmer looked at the first programmer's ideas, thought why bother reinventing the wheel for something so similar and something so trivial, worked out the ideas behind it and sat down and hacked out the code all by himself. Why even bother optimising? Don't need any speed here, who cares about memory usage. Wham bang done.
Code: Select all
			{
				char		str[] = "four times nine";
				char*		strptr;
				int			i;
				char		substring[3][10];
				char*		substringptr;
				char		numtext[9][8] = {"one","two","three","four","five","six","seven","eight","nine"};
				char		optext[3][8] = {"plus","minus","times"};
				int			x,y,z;
				int			result;

				strptr = &str[0];
				i = 0;
				do
				{
					substringptr = &substring[i][0];
					while ((*strptr != ' ') && (*strptr != '\0'))
					{
						*substringptr = *strptr;
						strptr++;
						substringptr++;
					}
					*substringptr = '\0';
					strptr++;
					i++;
				} while (i<3);

				// get x=first num
				i = 0;
				do
				{
					if (strcmp(&numtext[i][0],&substring[0][0]) == 0)
					{
						x=i+1;
						break;
					}
					i++;
				} while (i<9);

				// get y=second num
				i = 0;
				do
				{
					if (strcmp(&numtext[i][0],&substring[2][0]) == 0)
					{
						y=i+1;
						break;
					}
					i++;
				} while (i<9);

				// get operation
				i = 0;
				do
				{
					if (strcmp(&optext[i][0],&substring[1][0]) == 0)
					{
						z=i;
						break;
					}
					i++;
				} while (i<3);	// optext has only three entries

				if (z==0) result = x+y;
				if (z==1) result = x-y;
				if (z==2) result = x*y;
				result = result;
			}
RegicideX wrote:It is clear by now that the only piece of evidence worth taking seriously in the Rybka discussion so far is the UCI code -- at least the only public evidence.

But "worth taking seriously" does not mean that it is anywhere close to providing proof or conclusive evidence. The problems are that

1) The functions being compared are very similar in purpose and they are similar in purpose in most if not all chess programs.

2) The procedures presented are relatively short.

3) The reconstructed code is by no means identical -- there are many dissimilarities.

When two procedures are both relatively short and have extremely similar purpose in most chess engines, the probability of observing similarities is relatively large.

Furthermore

3) The compiling process takes away a lot of the individuality of the program due to various optimization procedures.

4) The conjectural reconstruction of the code can have a bias in the direction of proving similarities. You need to look at all possible ways of reconstructing the code in order to show that the initial source is a clone.

In order to advance the discussion we can perform a simple experiment. A few programmers here can write a simple parser for making simple arithmetic operations.

That is, the user enters a string consisting of literal strings like "one plus two" and the parser should print out the result of the arithmetic operation, in this case 3. To make matters simple, only one digit numbers and only one operation should be entered. Thus all operations should be of the form "X operation Y" where X and Y are literal representations of the numbers from zero to nine and "operation" should be "plus" "minus" and "times." An error message for invalid input could be present.

After writing the program, it should be compiled and then submitted for disassembling. Then we should compare the recreated codes among themselves and see how much similarity there is. If we observe a lot of similarity, comparable to the Rybka/Fruit UCI parser similarity, then the anti-Rybka case falls flat -- at least as far as the UCI code goes. If no two programs have significant similarity then the UCI code evidence gains more weight.

Of course, this requires some work -- and while I'm willing to write the source code, I am conveniently lacking expertise in disassembling so I can not participate there (which is the hardest part of this exercise).

But if we do have takers this would be one way to move the debate into more objective directions.

Seems to me that anyone looking at that code that you wrote would immediately think "no way, that's horribly long" and would not even look at the code in detail.

You made one bad assumption. "C". Nothing in his original post suggests C. Which means I would choose the most efficient tool I had at hand to make this work. It took 5 minutes total to write and debug the above. The only bug is that one needs to use "set noglob" in the shell they are using or the "*" on the command line will get turned into a filename "glob".

So far we have two approaches. I wonder if anyone else will bite. If you want this written in C, which was not originally stated as a requirement, I can probably do that in about 10 minutes. Actually, for the sake of argument... this took me exactly 5 minutes to write:

Code: Select all

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]) {
  char *words[10] = {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"};
  int i, answer, operand1, operand2;
  for (i=0; i<10; i++) {
    if (!strcmp(argv[1], words[i]))  operand1 = i;
    if (!strcmp(argv[3], words[i]))  operand2 = i;
  }
  switch (argv[2][0]) {
    case '+':  answer = operand1 + operand2; break;
    case '-':  answer = operand1 - operand2; break;
    case '*':  answer = operand1 * operand2; break;
    case '/':  answer = operand1 / operand2; break;
    default: printf("invalid operator\n"); exit(1);
  }
  for (i=0; i<10; i++)
    if (i == answer) break;
  if (i < 10)
    printf("%s\n",words[i]);
  else
    printf("answer is > 9, no output produced\n");
}

here is the output:

scrappy% ./xpr three + five
eight
scrappy% ./xpr seven - three
four
scrappy% ./xpr two * four
eight
scrappy% ./xpr nine / three
three
scrappy% ./xpr four * four
answer is > 9, no output produced
scrappy% ./xpr two % three
invalid operator

Feel free to "compare" since I wasn't about to take the time to look at that mess you wrote... This might be cleaned up a bit to become simpler, still.

bob · Post by **bob** » Sun Aug 31, 2008 9:31 pm

Feel free to "compare" since I wasn't about to take the time to look at that mess you wrote... This might be cleaned up a bit to become simpler, still. BTW another 2 minutes and I can make it output results up to ninetynine.

bob · Post by **bob** » Sun Aug 31, 2008 9:33 pm

chrisw wrote:
bob wrote:
RegicideX wrote:

That is, the user enters a string consisting of literal strings like "one plus two" and the parser should print out the result of the arithmetic operation, in this case 3. To make matters simple, only one digit numbers and only one operation should be entered. Thus all operations should be of the form "X operation Y" where X and Y are literal representations of the numbers from zero to nine and "operation" should be "plus" "minus" and "times." An error message for invalid input could be present.
That is exactly what my code above does...
Here is the misunderstanding: when I said the input should be "one plus two" I meant that the input should be the string "one plus two" and not "1 + 2"

A simple misunderstanding.
All that would change would be to add a small awk script to replace every occurrence of "one" by "1", etc... before doing the same expr call... If you'd like to see the whole thing just let me know...
I doubt he could care less to see it for the simple reason of total irrelevance.

1. Rybka and Fruit are in C and we were discussing their C implementations and the assembler produced from compiling the C. Why C? Because the source is available for Fruit and the code concerned was a C-recreation by reverse engineering from Rybka.

2. Therefore the program to model Alex's literal string entry idea should be in C, not some other language or script of your choice.

3. This is not a contest to see who writes the fanciest code. It's a little experiment to see the results of writing a simple parser in C, once it's been turned into assembler.

But you knew all that already.

In my opinion, doing this by having looked at a previous programmer's ideas, and not bothered about speed, memory, winning the university geekiest coder competition, but a simple hack job, will likely produce similar code.

OK, I wrote the thing in C. Writing, compiling, testing, took < 5 minutes, so I obviously didn't spend a lot of time copying others. Compare our codes. Mine was 23 lines long and could maybe be cleaned up a bit. Duplicate code just doesn't happen, regardless of how many times you want to claim it does.

Alexander Schmidt · Post by **Alexander Schmidt** » Sun Aug 31, 2008 9:33 pm

RegicideX wrote:1) The functions being compared are very similar in purpose and they are similar in purpose in most if not all chess programs.

OK, show us some similar code in, let's say, Crafty and Fruit. Or Glaurung and Slowchess. Or TSCP and Pepito.

I wait, ty.

Zach Wegner · Post by **Zach Wegner** » Sun Aug 31, 2008 9:37 pm

chrisw wrote:Yes, described much better than my long-winded picture.

psychological anchoring effect of seeing and studying a piece of code.

Well done. Perfect expression and description. Fits exactly for the trivial pieces of code of the UCI.

Chris,

Did you read this post? http://www.talkchess.com/forum/viewtopi ... 747#213747

bob · Post by **bob** » Sun Aug 31, 2008 9:39 pm

RegicideX wrote:

All that would change would be to add a small awk script to replace every occurrence of "one" by "1", etc... before doing the same expr call... If you'd like to see the whole thing just let me know...
We're definitely not writing operating systems here. I am also not doubting that one can be unfathomably clever in writing a program -- a look at some "Obfuscated C" competitions should prove that.

But "normal" code writing can produce similar code. Chris W. also makes a good point that there can be a psychological "anchoring effect" of seeing and studying a piece of code -- the code structure sticks in one's head.

I'll post something (either code or executable, I haven't decided yet) after the long weekend -- family comes before internet discussions.

Sorry, but that "structure" stuff is baloney. There are way too many studies on what the human mind can "remember" without long-term memorization practices to force something into long-term memory. You might well remember overall structure "initialize stuff, input a move, do an iterated search that recursively calls an alpha/beta function, endpoints get the result of a static evaluation, etc..." but that will _never_ lead to duplicate code. Once I read all the stuff posted here, I will take a stab at comparing CW's code to mine, just to see how much similarity there is. Ought to be interesting, even for an incredibly simple task such as the one you suggested...
