Is It Me or is the Source Comparison Page Rigged?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

GenoM
Posts: 910
Joined: Wed Mar 08, 2006 9:46 pm
Location: Plovdiv, Bulgaria

Re: Is It Me or is the Source Comparison Page Rigged?

Post by GenoM »

chrisw wrote:
GenoM wrote:I've tried to follow this scientific discussion about research methodology.
I guess John Sidles and Alex K. are mostly worried that the translated code of Rybka is not exactly the same as the original code of Rybka. They are insisting on the view that translated code cannot be exactly the same as the original source code, so no valid (in scientific terms) conclusions can be drawn from the results of the comparison.
Have I got it right?
Well, they're saying several things.

The Go Parser has, by force, to carry out several operations to parse the input text string. The input text string contains words from a restricted list of words, such as "infinite" or "ponder". The Go Parser parses the text and sets variables to tell the engine what to do, again forced.

Go Parser must carry out operations A,B,C,D,E,F,G,H and it must leave the results of its parse in variables, call them a,b,c,d,e,f,g,h

Operation A might be comparestrings("infinite", inputstring); if true, set variable a.

Operation A, across a range of programs implementing a Go Parser, will usually be semantically identical. Likewise B,C,D,E,F,G,H. No reason why not. Too trivial for anything else.

Fruit does them in order ABCDEFGH, declaring variables abcdefgh
Rybka does them in order BDCEFGHA, declaring variables bdcefgha
Even Zach's ZCT does them, slightly differently, but basically if you look at his code, there they are. "infinite", "ponder" etc. Has to be so.
Each program has its own idiosyncrasies in addition, some more so, some less so.

We have Go Parsers, all doing the same thing, by force, and composed of sub-components also doing the same things by force. The sub-components being so trivial and so forced that they are reasonably likely to be semantically identical. Not always, not necessarily, but usually.
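
To make that concrete, here is a minimal sketch of such a forced parser (invented code and invented names like parse_go and time_limit -- it is not Fruit's or Rybka's actual source, just the shape any engine is pushed into):

#include <stdlib.h>
#include <string.h>

static int infinite, ponder;        /* the a, b, ... the engine reads later */
static double time_limit;

static void parse_go(char string[])
{
    const char *ptr = strtok(string, " ");             /* skip the word "go" */

    for (ptr = strtok(NULL, " "); ptr != NULL; ptr = strtok(NULL, " ")) {
        if (strcmp(ptr, "infinite") == 0) {             /* Operation A -> a */
            infinite = 1;
        } else if (strcmp(ptr, "ponder") == 0) {        /* Operation B -> b */
            ponder = 1;
        } else if (strcmp(ptr, "movetime") == 0) {      /* Operation C -> c */
            ptr = strtok(NULL, " ");
            if (ptr != NULL) time_limit = atof(ptr) / 1000.0;
        }
        /* ... "wtime", "btime", "depth", "nodes" etc. follow the same pattern */
    }
}

Shuffle the order of the if/else branches and of the variable declarations and you get another engine's version; the operations themselves barely have room to differ.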

The main likely differences are in the ordering of it all. It matters not very much in what order the parsing is done and the variables filled.

It's mostly in the ordering that differences between program writers will emerge. With some differences in style. Some differences in variable type. Some differences in initial values. All of these differences we see in the Rybka and Fruit code.

What the antis did was to take the ordering, both of variables and program flow, and make them identical. They converted Rybka's BDCEFGHA to be identical to Fruit's ABCDEFGH, didn't reveal that they'd done this, implying that the Rybka source was just a natural disassembly, and misled the forum and the computer chess community in order to cause damage to Vas and Rybka.

The Go Parser evidence is completely busted. If this were a court case and this misleading by hidden manipulation of evidential data were revealed part way through the case, the prosecution barristers would resign and the judge would throw the case out. With costs.
So in general you do not challenge the results of the experiment but have a different explanation of them. Do I understand it right?
take it easy :)
chrisw

Re: Is It Me or is the Source Comparison Page Rigged?

Post by chrisw »

bob wrote:
chrisw wrote:
RegicideX wrote:
bob wrote: Fine. you are unable to follow the discussion.
Baseless assertion, followed by verbiage.

Aha. So you do _not_ understand "semantic equivalence". This is simply proving that for the same inputs, the two pieces of code produce the same output.
Then the Rybka code and the Fruit code are not semantically equivalent -- there are plenty of differences in their outcome for the same input.

But of course, that's not what you mean. You mean that you should get to ignore differences --including semantic non-equivalence-- and focus only on what you find similar.

Order is immaterial so long as changes do not violate data dependencies, name dependencies or control dependencies.
Order makes the source codes look much more similar than they really are. If ten variables in a row are initialized in the same order in machine code, then that's alarming. Having variables initialized all over the place, and many of them nonexistent, is not at all alarming.

Changing the order without saying so is at best sneaky, at worst dishonest.
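
To spell out what "order is immaterial" does and does not mean, a toy example (entirely invented, not taken from either program):

#include <string.h>

/* The three assignments below have no data, name or control dependencies
   on one another (distinct a, b, c assumed), so the two orderings are
   semantically equivalent: for any inputs they leave identical values
   in *a, *b and *c. */
void version_1(const char *s, int depth_given, int *a, int *b, int *c)
{
    *a = strcmp(s, "infinite") == 0;   /* order A, B, C */
    *b = strcmp(s, "ponder") == 0;
    *c = depth_given;
}

void version_2(const char *s, int depth_given, int *a, int *b, int *c)
{
    *b = strcmp(s, "ponder") == 0;     /* order B, C, A -- same outputs */
    *c = depth_given;
    *a = strcmp(s, "infinite") == 0;
}

/* Swapping statements across a data dependency is a different matter:
   the reordered version no longer computes the same value. */
int ordered(void)   { int x = 0, y; x = x + 1; y = x; return y; }   /* returns 1 */
int reordered(void) { int x = 0, y; y = x; x = x + 1; return y; }   /* returns 0 */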

It is worth going back to the first presentation of the misleading source comparison data, the "Here's something to start with" thread ....

Norman Schmidt explains the "methodology", with absolutely no mention at all of any reordering of either variables or program flow. No mention of "semantics".
Norman wrote:
Please note that: Fruit's ASSERTs have been removed, comments have been removed, and there are differences between the two... mainly: where Fruit has TRUE, Rybka has 1; where Fruit has FALSE, Rybka has 0;
where Fruit has double data types, Rybka uses integer;
the search->info struct has been replaced with a direct reference to the independent variable.
Truthfully, many of the differences reflect the implementation of back-end C bitboard routines in place of object-oriented C++ class references.
Fruit often does error checking that is absent in Rybka, e.g.:
if (ptr == NULL) my_fatal("parse_go(): missing argument\n");
Another difference:
Fruit calls string_equal to accept input, whereas Rybka uses the C function:
int strcmp( const char *string1, const char *string2 )
but these functions are essentially equivalent...
string_equal is simply a Fabien re-write with ASSERTs included:
bool string_equal( const char s1 [], const char s2 [] )
{
   ASSERT(s1 != NULL);
   ASSERT(s2 != NULL);
   return strcmp(s1, s2) == 0;
}
which is something he did often...see Fruit 2.1 util.cpp
one click... a global search and replace (for example, replacing true with 1) would rectify some of the larger differences.
Sven Schule challenged on the variable ordering:
Sven wrote:
since you don't have the Rybka source, how can you know about the location and order of variable declarations in Rybka? Declarations of local variables often do not produce assembler code, you just see the variables when they are used.
Hyatt replied without any reference to the creative reordering that had been done:
Bob wrote:
If you look at the order of the variables as they either appear in memory or on the stack, you can almost infer the order they were declared.
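(A toy illustration of what Bob is describing -- the function is invented and the offsets are hypothetical, since the real layout depends on the compiler and optimisation settings:)

void parse_something(void)
{
    int a;   /* an unoptimised compile will typically place these locals   */
    int b;   /* at successive stack offsets, e.g. [esp+4], [esp+8],        */
    int c;   /* [esp+12], [esp+16] -- so the offsets seen in a disassembly */
    int d;   /* hint at (but do not prove) the order of declaration        */

    a = 0; b = 0; c = 0; d = 0;
    (void)a; (void)b; (void)c; (void)d;   /* silence unused-variable warnings */
}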
And Alexander Schmidt was convinced:
Alexander wrote:
TY Norman, this is completely convincing. I don't have much programming experience (mainly VBA) but these are too many similarities for a coincidence.
Unfortunately people will not read it and will not believe it...
And Christophe Theron remained absolutely silent on the creative reordering:
Christophe wrote:
Nothing at all.
Can we possibly use the _same_ vocabulary here, which includes the definition of words?

We have a set of sticks in pile A that are colored, where each stick is a different color. We have a second set of sticks and have been asked to answer the question "are the sticks in pile B identical to the sticks in pile A".
Of course your sets of sticks match. They're the necessary and forced sub-components of the Go Parser.
RegicideX

Re: Is It Me or is the Source Comparison Page Rigged?

Post by RegicideX »

bob wrote: yes there is non-equivalent code. But in a large program, one expects to find very little semantically equivalent code. That is the issue. Do you actually believe that in writing a 44,000 line program, that I will by pure chance duplicate what others have done _exactly_ here and there? when I don't even see this happen in 100-1000 line student programs???
I read things like these and I can only wonder if we are talking about the same things. We are talking about the UCI parser -- which has nowhere near 44,000 lines of code, more like 100-150 -- and whose purpose is by design extremely similar, if not nearly identical, across programs. Of course you should expect similarities in such a procedure.

I also note that Robert sometimes says something with which I agree, but then adds that I should "go read" something to see that it is true.

There is a lot of mis-communication taking place -- I'm not going to participate in all these posts, and my lack of participation should not be taken as either agreement or disagreement.
Uri Blass
Posts: 10310
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Is It Me or is the Source Comparison Page Rigged?

Post by Uri Blass »

bob wrote:
RegicideX wrote:
GenoM wrote:I've tried to follow this scientific discussion about research methodology.
I guess John Sidles and Alex K. are mostly worried that the translated code of Rybka is not exactly the same as the original code of Rybka. They are insisting on the view that translated code cannot be exactly the same as the original source code, so no valid (in scientific terms) conclusions can be drawn from the results of the comparison.
Have I got it right?
No, you have not. It is perfectly true that the compiler adds a layer of uncertainty by changing the code around -- that's just part of the picture though. The real point is that a certain degree of similarity is not unlikely in programs that are short, simple and need to provide extremely similar if not identical output.
OK, but we are talking about chess programs. My program (Crafty) currently has 44,864 lines of code. That is hardly "small". The final part about "similar output" is meaningless. We have so many ways to search trees, so many ways to extend and reduce, so many ways to order the moves to make the search more efficient, a nearly limitless number of ways to turn a position into a numeric value that can be passed thru the search to find the best move, that this point is simply meaningless. Finding semantically equivalent code can happen on two levels. Nobody cares about tiny pieces of code such as PopCnt(), LSB(), MSB() and the like. But when you get into procedures that are hundreds of lines long, big chunks of semantically identical code lead to but one conclusion. Again, for a given algorithm, there are an infinite number of ways syntactically to express it. So duplication is _highly_ improbable. And that is the basis that caused a couple of people to start to look at this more closely. It is simply so improbable unless code was physically copied.
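(For reference, this is the sort of tiny routine meant by PopCnt() -- a generic textbook version, not any particular engine's code:)

/* Kernighan-style population count: number of set bits in a 64-bit word.
   Routines this small are forced and near-identical across bitboard
   engines, which is why nobody treats them as evidence of copying. */
static int PopCnt(unsigned long long b)
{
    int count = 0;
    while (b) {
        b &= b - 1;    /* clear the lowest set bit */
        count++;
    }
    return count;
}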


The actual source comparison shows some similarities but also lots of dissimilarities -- which is what is to be expected.
Partially true. There will necessarily be parts of the code that are different, since program B (supposedly derived from A) is significantly stronger. But chunks of duplicated code are _not_ normal in any program of more than a few lines. So that concept I do not agree with.



But if you see 10 or 15 variables which are initialized in the same order in machine code, then despite the changes made by the compiler, something is fishy -- not necessarily conclusive, but fishy.
Agreed. And if they are all initialized, even if _not_ in the same order, it is _still_ fishy. In fact, the compiler could change that order for other reasons if it wanted to. For example, it is more efficient to initialize things in the order they appear in memory to take advantage of cache prefetching. And if you are trying to de-compile, you see the final order, not the programmed order, and rearranging makes perfect sense when trying to match things up.
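(A small invented example of the kind of reordering a compiler may do on its own -- the struct and field names are made up:)

struct search_input { int depth; int nodes; int time_limit; };

void init_input(struct search_input *si)
{
    si->time_limit = 0;   /* source order: time_limit, depth, nodes */
    si->depth      = 0;
    si->nodes      = 0;
    /* The three stores are independent, so an optimising compiler is free
       to emit them in memory order (depth, nodes, time_limit), e.g. to get
       sequential writes for the hardware prefetcher.  The machine code
       then no longer reflects the order the programmer typed. */
}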

It turns out though, that the order of variable initialization was faked to make the code look more similar than the disassembler made it look -- the same goes for the order in which various "if" clauses are checked. So there is really nothing fishy in the code regarding the order.
EXTREMELY poor choice of wording with "faked". Nothing being done in this investigation is "faked". If you want to use words that are inflammatory or accusatory, feel free. But don't expect much in the way of discussion if that is the way you want to proceed. Get some experience or knowledge about the field and you will see how wrong "faked" is.


There is plenty of semantically non-equivalent code, and what similarity there is, is mostly about the general structure of the program -- making allowance for the fact that even in the structure there are dissimilarities.
again, that is baloney. yes there is non-equivalent code. But in a large program, one expects to find very little semantically equivalent code. That is the issue. Do you actually believe that in writing a 44,000 line program, that I will by pure chance duplicate what others have done _exactly_ here and there? when I don't even see this happen in 100-1000 line student programs???
The situation with your students is different because one student does not study the work of another student.

Uri
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Is It Me or is the Source Comparison Page Rigged?

Post by bob »

RegicideX wrote:
bob wrote: Then I can reconstruct many semantically equivalent programs, one of which is the actual source the compiler used to produce the object code.
Therefore, you must use creativity to simply guess what was the original source code among the many possible variants. Q.E.D.
No, I use solid knowledge to _enumerate_ the possibilities. Why is this so hard to understand? There is plenty of information around to explain the process in a clear form. There is no "guessing" or I would get lots of impossible codes as well. That is not how it works.
And I have a target, the program I suspect was the original source that was copied.
Come on! You don't see how that makes your procedure entirely unscientific?

Confirmation bias --trying to prove your hunch instead of trying to disprove it-- is a basic and fundamental thing you should avoid in an investigation. Everyone is guilty of it -- but we should try to avoid it.
I am trying to prove that the integral of 2x is x^2 + C. It is a mathematical process, with no room for guesswork or opinions. It is just a process. Writing a program is creative. Comparing two programs is not.

Not only do you not try to avoid it -- but you think that it is good methodology to choose exactly the code reconstruction that justifies your hunch!
correct...
chrisw

Re: Is It Me or is the Source Comparison Page Rigged?

Post by chrisw »

GenoM wrote:
chrisw wrote:
GenoM wrote:I've tried to follow this scientific discussion about research methodology.
I guess John Sidles and Alex K. are mostly worried that the translated code of Rybka is not exactly the same as the original code of Rybka. They are insisting on the view that translated code cannot be exactly the same as the original source code, so no valid (in scientific terms) conclusions can be drawn from the results of the comparison.
Have I got it right?
Well, they're saying several things.

The Go Parser has, by force, to carry out several operations to parse the input text string. The input text string contains words from a restricted list of words, such as "infinite" or "ponder". The Go Parser parses the text and sets variables to tell the engine what to do, again forced.

Go Parser must carry out operations A,B,C,D,E,F,G,H and it must leave the results of its parse in variables, call them a,b,c,d,e,f,g,h

Operation A might be comparestrings("infinite", inputstring); if true, set variable a.

Operation A, across a range of programs implementing a Go Parser, will usually be semantically identical. Likewise B,C,D,E,F,G,H. No reason why not. Too trivial for anything else.

Fruit does them in order ABCDEFGH, declaring variables abcdefgh
Rybka does them in order BDCEFGHA, declaring variables bdcefgha
Even Zach's ZCT does them, slightly differently, but basically if you look at his code, there they are. "infinite", "ponder" etc. Has to be so.
Each program has its own idiosyncrasies in addition, some more so, some less so.

We have Go Parsers, all doing the same thing, by force, and composed of sub-components also doing the same things by force. The sub-components being so trivial and so forced that they are reasonably likely to be semantically identical. Not always, not necessarily, but usually.

The main likely differences are in the ordering of it all. It matters not very much in what order the parsing is done and the variables filled.

It's mostly in the ordering that differences between program writers will emerge. With some differences in style. Some differences in variable type. Some differences in initial values. All of these differences we see in the Rybka and Fruit code.

What the antis did was to take the ordering, both of variables and program flow, and make them identical. They converted Rybka's BDCEFGHA to be identical to Fruit's ABCDEFGH, didn't reveal that they'd done this, implying that the Rybka source was just a natural disassembly, and misled the forum and the computer chess community in order to cause damage to Vas and Rybka.

The Go Parser evidence is completely busted. If this were a court case and this misleading by hidden manipulation of evidential data were revealed part way through the case, the prosecution barristers would resign and the judge would throw the case out. With costs.
So in general you do not challenge the results of the experiment but have a different explanation of them. Do I understand it right?
The first thing I challenge is the secret and misleading manipulation of the data, which seeks to get the result that somebody must have copied something.

It's not the first time "scientists" have secretly manipulated their input data, and probably it won't be the last, but it for sure invalidates and destroys the antis' case.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Is It Me or is the Source Comparison Page Rigged?

Post by bob »

chrisw wrote:
bob wrote:
RegicideX wrote:
GenoM wrote:I've tried to follow this scientific discussion about research methodology.
I guess John Sidles and Alex K. are mostly worried that the translated code of Rybka is not exactly the same as the original code of Rybka. They are insisting on the view that translated code cannot be exactly the same as the original source code, so no valid (in scientific terms) conclusions can be drawn from the results of the comparison.
Have I got it right?
No, you have not. It is perfectly true that the compiler adds a layer of uncertainty by changing the code around -- that's just part of the picture though. The real point is that a certain degree of similarity is not unlikely in programs that are short, simple and need to provide extremely similar if not identical output.
OK, but we are talking about chess programs.
No we're not. We're talking about the antis' chief piece of evidence, the corrupted, manipulated, misleading, transmogrified and cheated alleged source code listing of the "Go Parser". A piece of code quite forced, maybe 50 lines long, that takes a text string of known possible words and sets appropriate variables for the engine. All forced and all trivial in the sub-components.

Your side deliberately, secretly and misleadingly manipulated the source listing to line it up with a target in order to fool the forum, discredit a fellow programmer and damage his business.
OK, for starters, all the programmers out there with a procedure named "go_parser" please raise your hand. Mine is not up. If you had any idea what was going on, you would know that this has not been a "misleading" procedure to this point. It has been done the only way it _can_ be done, the same way it has been done thousands of times in the past.

But you do know that.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Is It Me or is the Source Comparison Page Rigged?

Post by bob »

GenoM wrote:
chrisw wrote:
GenoM wrote:I've tried to follow this scientific discussion about research methodology.
I guess John Sidles and Alex K. are mostly worried that the translated code of Rybka is not exactly the same as the original code of Rybka. They are insisting on the view that translated code cannot be exactly the same as the original source code, so no valid (in scientific terms) conclusions can be drawn from the results of the comparison.
Have I got it right?
Well, they're saying several things.

The Go Parser has, by force, to carry out several operations to parse the input text string. The input text string contains words from a restricted list of words, such as "infinite" or "ponder". The Go Parser parses the text and sets variables to tell the engine what to do, again forced.

Go Parser must carry out operations A,B,C,D,E,F,G,H and it must leave the results of its parse in variables, call them a,b,c,d,e,f,g,h

Operation A might be comparestrings("infinite", inputstring); if true, set variable a.

Operation A, across a range of programs implementing a Go Parser, will usually be semantically identical. Likewise B,C,D,E,F,G,H. No reason why not. Too trivial for anything else.

Fruit does them in order ABCDEFGH, declaring variables abcdefgh
Rybka does them in order BDCEFGHA, declaring variables bdcefgha
Even Zach's ZCT does them, slightly differently, but basically if you look at his code, there they are. "infinite", "ponder" etc. Has to be so.
Each program has its own idiosyncrasies in addition, some more so, some less so.

We have Go Parsers, all doing the same thing, by force, and composed of sub-components also doing the same things by force. The sub-components being so trivial and so forced that they are reasonably likely to be semantically identical. Not always, not necessarily, but usually.

The main likely differences are in the ordering of it all. It matters not very much in what order the parsing is done and the variables filled.

It's mostly in the ordering that differences between program writers will emerge. With some differences in style. Some differences in variable type. Some differences in initial values. All of these differences we see in the Rybka and Fruit code.

What the antis did was to take the ordering, both of variables and program flow, and make them identical. They converted Rybka's BDCEFGHA to be identical to Fruit's ABCDEFGH, didn't reveal that they'd done this, implying that the Rybka source was just a natural disassembly, and misled the forum and the computer chess community in order to cause damage to Vas and Rybka.

The Go Parser evidence is completely busted. If this were a court case and this misleading by hidden manipulation of evidential data were revealed part way through the case, the prosecution barristers would resign and the judge would throw the case out. With costs.
So in general you do not challenge the results of the experiment but have a different explanation of them. Do I understand it right?
Here is a good question for Chris. He dislikes the "go_parser" analysis. OK. What _would_ he need to see to begin to have doubts? Then the group that is looking at this can focus precisely on what he thinks is necessary to offer acceptable evidence. Evaluation is a part of the code where duplication is inconceivable. So how about the evaluation constants? Would a long list of duplicate numbers be significant? What about a list of identical positional ideas using similar (can't be identical here since A is mailbox, B is bitboard) positional concepts? Or maybe something easier to find like identical data structures, or identical hash implementations? Or identical approaches to ordering moves, including "quirks" that do not appear in other programs?

Exactly what would be considered acceptable proof? I suspect, myself, that there is _nothing_ that would be accepted -- just another one of those things that make you go hmmm...
chrisw

Re: Is It Me or is the Source Comparison Page Rigged?

Post by chrisw »

bob wrote:
chrisw wrote:
bob wrote:
RegicideX wrote:
GenoM wrote:I've tried to follow this scientific discussion about research methodology.
I guess John Sidles and Alex K. are mostly worried that the translated code of Rybka is not exactly the same as the original code of Rybka. They are insisting on the view that translated code cannot be exactly the same as the original source code, so no valid (in scientific terms) conclusions can be drawn from the results of the comparison.
Have I got it right?
No, you have not. It is perfectly true that the compiler adds a layer of uncertainty by changing the code around -- that's just part of the picture though. The real point is that a certain degree of similarity is not unlikely in programs that are short, simple and need to provide extremely similar if not identical output.
OK, but we are talking about chess programs.
No we're not. We're talking about the antis' chief piece of evidence, the corrupted, manipulated, misleading, transmogrified and cheated alleged source code listing of the "Go Parser". A piece of code quite forced, maybe 50 lines long, that takes a text string of known possible words and sets appropriate variables for the engine. All forced and all trivial in the sub-components.

Your side deliberately, secretly and misleadingly manipulated the source listing to line it up with a target in order to fool the forum, discredit a fellow programmer and damage his business.
OK, for starters, all the programmers out there with a procedure named "go_parser" please raise your hand. Mine is not up. If you had any idea what was going on, you would know that this has not been a "misleading" procedure to this point. It has been done the only way it _can_ be done, the same way it has been done thousands of times in the past.

But you do know that.
For the alleged Rybka source code, you can't even state that it is called Go Parser. It might be called Fred. Zach does something along those lines; he posted some source with "infinite", "ponder" and so on.

I don't think you did the misleading, Bob. You were misled.

Fooled once, shame on them. You have an algorithm for the "twice" case, I believe.
RegicideX

Re: Is It Me or is the Source Comparison Page Rigged?

Post by RegicideX »

No, I use solid knowledge to _enumerate_ the possibilities.
No, you don't -- there is no enumeration going on -- just the presentation of a single source, a presentation tilted heavily towards proving rather than disproving your case.
Not only do you not try to avoid it -- but you think that it is good methodology to choose exactly the code reconstruction that justifies your hunch!
correct...
Then you are simply not a reliable source when it comes to objectively analyzing the similarities between two sources -- your technical expertise notwithstanding. But at least you're honest about it.