A Simple Experiment for Advancing the Discussion

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

RegicideX

A Simple Experiment for Advancing the Discussion

Post by RegicideX »

It is clear by now that the only piece of evidence worth taking seriously in the Rybka discussion so far is the UCI code -- at least the only public evidence.

But "worth taking seriously" does not mean that it is anywhere close to providing proof or conclusive evidence. The problems are that

1) The functions being compared are very similar in purpose and they are similar in purpose in most if not all chess programs.

2) The procedures presented are relatively short.

3) The reconstructed code is by no means identical -- there are many dissimilarities.

When two procedures are both relatively short and have extremely similar purpose in most chess engines, the probability of observing similarities is relatively large.

Furthermore

3) The compiling process takes away a lot of the individuality of the program due to various optimization procedures.

4) The conjectural reconstruction of the code can have a bias in the direction of proving similarities. You need to look at all possible ways of reconstructing the code in order to show that the initial source is a clone.


In order to advance the discussion we can perform a simple experiment. A few programmers here can write a simple parser for making simple arithmetic operations.

That is, the user enters a string consisting of literal strings like "one plus two" and the parser should print out the result of the arithmetic operation, in this case 3. To make matters simple, only one digit numbers and only one operation should be entered. Thus all operations should be of the form "X operation Y" where X and Y are literal representations of the numbers from zero to nine and "operation" should be "plus" "minus" and "times." An error message for invalid input could be present.

After writing the program, it should be compiled and then submitted for disassembling. Then we should compare the recreated codes among themselves and see how much similarity there is. If we observe a lot of similarity, comparable to the Rybka/ Fruit UCI parser similarity then the anti-Rybka case falls flat -- at least as far as the UCI code goes. If no two programs have significant similarity then the UCI code evidence gains more weight.

Of course, this requires some work -- and while I'm willing to write the source code, I am conveniently lacking expertise in disassembling so I can not participate there (which is the hardest part of this exercise).

But if we do have takers this would be one way to move the debate into more objective directions.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A Simple Experiment for Advancing the Discussion

Post by bob »

RegicideX wrote:It is clear by now that the only piece of evidence worth taking seriously in the Rybka discussion so far is the UCI code -- at least the only public evidence.

But "worth taking seriously" does not mean that it is anywhere close to providing proof or conclusive evidence. The problems are that

1) The functions being compared are very similar in purpose and they are similar in purpose in most if not all chess programs.

2) The procedures presented are relatively short.

3) The reconstructed code is by no means identical -- there are many dissimilarities.

When two procedures are both relatively short and have extremely similar purpose in most chess engines, the probability of observing similarities is relatively large.

Furthermore

3) The compiling process takes away a lot of the individuality of the program due to various optimization procedures.

4) The conjectural reconstruction of the code can have a bias in the direction of proving similarities. You need to look at all possible ways of reconstructing the code in order to show that the initial source is a clone.


In order to advance the discussion we can perform a simple experiment. A few programmers here can write a simple parser for making simple arithmetic operations.

That is, the user enters a string consisting of literal strings like "one plus two" and the parser should print out the result of the arithmetic operation, in this case 3. To make matters simple, only one digit numbers and only one operation should be entered. Thus all operations should be of the form "X operation Y" where X and Y are literal representations of the numbers from zero to nine and "operation" should be "plus" "minus" and "times." An error message for invalid input could be present.

After writing the program, it should be compiled and then submitted for disassembling. Then we should compare the recreated codes among themselves and see how much similarity there is. If we observe a lot of similarity, comparable to the Rybka/ Fruit UCI parser similarity then the anti-Rybka case falls flat -- at least as far as the UCI code goes. If no two programs have significant similarity then the UCI code evidence gains more weight.

Of course, this requires some work -- and while I'm willing to write the source code, I am conveniently lacking expertise in disassembling so I can not participate there (which is the hardest part of this exercise).

But if we do have takers this would be one way to move the debate into more objective directions.
There is no corresponding function of a chess engine that is that simple. A better idea would be to write a piece of code that can answer the question "Given this position, is the game over due to checkmate or stalemate?" And then compare source to make it easy. If sources are similar (which is so unlikely as to be considered impossible) then you could compile them and have the compiler produce assembly output which retains variable names and procedure names. If that is similar, then keep going in the direction of "harder to detect". But there is no reason to start on the hardest end until you verify that the original source is close enough.
RegicideX

Re: A Simple Experiment for Advancing the Discussion

Post by RegicideX »


There is no corresponding function of a chess engine that is that simple. A better idea would be to write a piece of code that can answer the question "Given this position, is the game over due to checkmate or stalemate?" And then compare source to make it easy.
I agree with your first sentence, but the evidence presented was for a parser, not for the deep bowels of the chess engine. So until we have evidence of striking code similarities in the engine, we should stick with parsers -- and not that complex parsers at that.

I disagree that we should compare the source. Comparing the source makes it easier to find student plagiarism -- but here we need to compare machine code (or plausible recreations of C code) because we only have machine code in the Rybka case.

Going directly to the source code would bias the exercise a lot towards dissimilarity -- if nothing else because you'll have much different variable names.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A Simple Experiment for Advancing the Discussion

Post by bob »

RegicideX wrote:

There is no corresponding function of a chess engine that is that simple. A better idea would be to write a piece of code that can answer the question "Given this position, is the game over due to checkmate or stalemate?" And then compare source to make it easy.
I agree with your first sentence, but the evidence presented was for a parser, not for the deep bowels of the chess engine. So until we have evidence of striking code similarities in the engine, we should stick with parsers -- and not that complex parsers at that.

I disagree that we should compare the source. Comparing the source makes it easier to find student plagiarism -- but here we need to compare machine code (or plausible recreations of C code) because we only have machine code in the Rybka case.

Going directly to the source code would bias the exercise a lot towards dissimilarity -- if nothing else because you'll have much different variable names.
A parser is far more complex than a simple one-operator expression evaluation. But if you are interested, here is my contribution: You use it on any linux system and type

evaluate <expression>

#!/bin/csh
set noglob
expr $*

for example:
scrappy% evaluate 1 + 2
3
scrappy% evaluate 5 - 3
2
scrappy% evaluate 4 * 5
20
scrappy% evaluate 60 / 12
5

Just a bit too simple to draw any conclusions from. And I'd be willing to bet that not a person here would think of using expr to deal with this. So even here, you would be pretty unlikely to choose the same language, the same constructs, and then the same lines of code...
trojanfoe

Re: A Simple Experiment for Advancing the Discussion

Post by trojanfoe »

Why should any such experiment be considered; isn't it like asking "what number am I thinking of ?" and if an insufficient number of people are able to correctly guess then the whole investigation should be scrapped? This seems like yet another attempt to disqualify the disassembly evidence.
RegicideX

Re: A Simple Experiment for Advancing the Discussion

Post by RegicideX »


#!/bin/csh
set noglob
expr $*
"expr" can deal with literal representations of numbers and operations?

Just a bit too simple to draw any conclusions from. And I'd be willing to bet that not a person here would think of using expr to deal with this.
Maybe that's because it doesn't work for what I actually proposed.

But just because you can be clever in how you program a procedure, it doesn't mean that two programmers who don't care about being clever in such a way can not stumble on similar implementations.

And that's what we're discussing.
RegicideX

Re: A Simple Experiment for Advancing the Discussion

Post by RegicideX »

trojanfoe wrote:Why should any such experiment be considered; isn't it like asking "what number am I thinking of ?" and if an insufficient number of people are able to correctly guess then the whole investigation should be scrapped? This seems like yet another attempt to disqualify the disassembly evidence.
Actually, if people are not able to "guess" similarly then the UCI evidence should be considered stronger.

The exercise is only useful if you care about judging how strong the actual evidence presented is. The "gut feeling" resulting after reading highly conjectural code can be misleading, given the similarity of purpose and the shortness of the code presented.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A Simple Experiment for Advancing the Discussion

Post by bob »

RegicideX wrote:

#!/bin/csh
set noglob
expr $*
"expr" can deal with literal representations of numbers and operations?

Just a bit too simple to draw any conclusions from. And I'd be willing to bet that not a person here would think of using expr to deal with this.
Maybe that's because it doesn't work for what I actually proposed.

But just because you can be clever in how you program a procedure, it doesn't mean that two programmers who don't care about being clever in such a way can not stumble on similar implementations.

And that's what we're discussing.
Is this not what you asked someone to write:
That is, the user enters a string consisting of literal strings like "one plus two" and the parser should print out the result of the arithmetic operation, in this case 3. To make matters simple, only one digit numbers and only one operation should be entered. Thus all operations should be of the form "X operation Y" where X and Y are literal representations of the numbers from zero to nine and "operation" should be "plus" "minus" and "times." An error message for invalid input could be present.
That is exactly what my code above does... This would also be a "one-liner" in snobol/spitbol as well. And APL.
chrisw

Re: A Simple Experiment for Advancing the Discussion

Post by chrisw »

Hi Alex,

Well, unlike Bob, who seems to me to be just trying to be difficult, and in the interests of progress, I wrote some crappy code in C, appended below.

Seems top me that if people are asked blind to produce some code for this probnlem, they may well write different stuff.

But, if they studied, took a quick look, at similar code that already did the job, they might think, well in my case below ..... oh, ok split the input string up into three strings, compare each of those strings with ascii text data to find a match, set some variables with the match results and perform the desrired operation. Oh, and he used strcmp, that's easy, I don't need to go check in my C-guide now ....

And then they write their code. No copy. No cut 'n paste, just see the basic outline of the idea and hack it out.

Et voila, betcha the code produced by another program was then similar, even though entirely written by the second programmer.

Why? Because the second programmer looked at the first programmers ideas, thought why bother reinventing the wheel for something so similar and something so trivial, worked out the ideas behind it and sat down and hacked out the code all by himself. Why even bother optimising? Don't need any speed here, who cares about memory usage. Wham bang done.




Code: Select all

			{
				char		str[] = "four times nine";
				char*		strptr;
				int			i;
				char		substring[3][10];
				char*		substringptr;
				char		numtext[9][8] = {"one","two","three","four","five","six","seven","eight","nine"};
				char		optext[3][8] = {"plus","minus","times"};
				int			x,y,z;
				int			result;

				strptr = &str[0];
				i = 0;
				do
				{
					substringptr = &substring[i][0];
					while ((*strptr != ' ') && (*strptr != NULL))
					{
						*substringptr = *strptr;
						strptr++;
						substringptr++;
					}
					*substringptr = NULL;
					strptr++;
					i++;
				} while (i<3);

				// get x=first num
				i = 0;
				do
				{
					if (strcmp(&numtext[i][0],&substring[0][0]) == 0)
					{
						x=i+1;
						break;
					}
					i++;
				} while (i<9);

				// get y=second num
				i = 0;
				do
				{
					if (strcmp(&numtext[i][0],&substring[2][0]) == 0)
					{
						y=i+1;
						break;
					}
					i++;
				} while (i<9);

				// get operation
				i = 0;
				do
				{
					if (strcmp(&optext[i][0],&substring[1][0]) == 0)
					{
						z=i;
						break;
					}
					i++;
				} while (i<9);

				if (z==0) result = x+y;
				if (z==1) result = x-y;
				if (z==2) result = x*y;
				result = result;
			}
RegicideX wrote:It is clear by now that the only piece of evidence worth taking seriously in the Rybka discussion so far is the UCI code -- at least the only public evidence.

But "worth taking seriously" does not mean that it is anywhere close to providing proof or conclusive evidence. The problems are that

1) The functions being compared are very similar in purpose and they are similar in purpose in most if not all chess programs.

2) The procedures presented are relatively short.

3) The reconstructed code is by no means identical -- there are many dissimilarities.

When two procedures are both relatively short and have extremely similar purpose in most chess engines, the probability of observing similarities is relatively large.

Furthermore

3) The compiling process takes away a lot of the individuality of the program due to various optimization procedures.

4) The conjectural reconstruction of the code can have a bias in the direction of proving similarities. You need to look at all possible ways of reconstructing the code in order to show that the initial source is a clone.


In order to advance the discussion we can perform a simple experiment. A few programmers here can write a simple parser for making simple arithmetic operations.

That is, the user enters a string consisting of literal strings like "one plus two" and the parser should print out the result of the arithmetic operation, in this case 3. To make matters simple, only one digit numbers and only one operation should be entered. Thus all operations should be of the form "X operation Y" where X and Y are literal representations of the numbers from zero to nine and "operation" should be "plus" "minus" and "times." An error message for invalid input could be present.

After writing the program, it should be compiled and then submitted for disassembling. Then we should compare the recreated codes among themselves and see how much similarity there is. If we observe a lot of similarity, comparable to the Rybka/ Fruit UCI parser similarity then the anti-Rybka case falls flat -- at least as far as the UCI code goes. If no two programs have significant similarity then the UCI code evidence gains more weight.

Of course, this requires some work -- and while I'm willing to write the source code, I am conveniently lacking expertise in disassembling so I can not participate there (which is the hardest part of this exercise).

But if we do have takers this would be one way to move the debate into more objective directions.
RegicideX

Re: A Simple Experiment for Advancing the Discussion

Post by RegicideX »

That is, the user enters a string consisting of literal strings like "one plus two" and the parser should print out the result of the arithmetic operation, in this case 3. To make matters simple, only one digit numbers and only one operation should be entered. Thus all operations should be of the form "X operation Y" where X and Y are literal representations of the numbers from zero to nine and "operation" should be "plus" "minus" and "times." An error message for invalid input could be present.
That is exactly what my code above does...
Here is the misunderstanding: when I said the input should be "one plus two" I meant that the input should be the string "one plus two" and not "1 + 2"

A simple misunderstanding.