Is It Me or is the Source Comparison Page Rigged?

RegicideX · Post by **RegicideX** » Mon Sep 01, 2008 1:55 am

Sorry to add fuel to the Rybka debates, but I think this is important enough to mention.

The case for the UCI "go" parser similarities is made here:

http://pagesperso-orange.fr/ct_chess/Fr ... rt_go.html

However, it seems that the results on that page are NOT directly copied from the actual sources -- several modifications were made to make the code look more similar than it is. I'll explain below.

The actual source for Fruit 2.1 is easily available on the net (the file with the start_go parser is "protocol.cpp") while the reconstructed source for Rybka is taken from a post in this forum by Rick Faben:

http://64.68.157.89/forum/viewtopic.php ... t&start=10

(If the above link does not work, check the earlier thread "Here is something to start with" where you can find a link that works.)

Now comes the shady part. The order in which variables are declared, the order in which they are initialized, and the order in which the "if" statements are checked all differ in several places, for both programs, from the sources presented on the "orange.fr" page. The "orange.fr" page presents the source modified so that the order of the variables and "if" statements is as similar as possible.

Let me give some examples (I have not checked for all modifications):

Rick Faben's Rybka source:

bool ponder, infinite;
int movestogo, movetime, btime, wtime, binc, winc;

orange.fr Rybka code:

bool infinite, ponder;
int binc, btime, movetime, winc, wtime;
int movestogo;

Fruit 2.1 original source:

bool infinite, ponder;
int depth, mate, movestogo;
sint64 nodes;
double binc, btime, movetime, winc, wtime;

The orange.fr page reproduces the Fruit 2.1 source almost correctly -- except for swapping the order of the int and double declaration lines and omitting the "sint64 nodes" line -- but it completely changes the order of declarations in the Rybka code.

It is clear that the code was changed from Rick Faben's source to make it more similar to the Fruit declarations.

When we go to the variable initialization part, the following code is the same in both Rick Faben's Rybka code and in the orange.fr Rybka code -- but as we'll see, the Fruit code is changed to match the Rybka code.

Rick Faben's Rybka code:

infinite = 0;
ponder = 0;
movestogo = 25;
winc = 0;
wtime = 0;
binc = 0;
btime = 0;
movetime = 0;

Actual Fruit 2.1 code:

infinite = false;
ponder = false;

depth = -1;
mate = -1;
movestogo = -1;

nodes = -1;

binc = -1.0;
btime = -1.0;
movetime = -1.0;
winc = -1.0;
wtime = -1.0;

Modified Fruit 2.1 code in the orange.fr page:

depth = -1;
mate = -1;
nodes = -1;

infinite = false;
ponder = false;
movestogo = -1;
winc = -1.0;
wtime = -1.0;
binc = -1.0;
btime = -1.0;
movetime = -1.0;

Again the Fruit code has been changed to match the Rick Faben Rybka code.
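A reordering like this can be checked mechanically rather than by eye. Below is a minimal sketch, assuming only the variable names quoted in this post; the function names `same_order` and `same_names` and the two list names are hypothetical and not taken from either program. It distinguishes "same variables" from "same variables in the same order".

```cpp
#include <algorithm>
#include <string>
#include <vector>

using Names = std::vector<std::string>;

// True when both listings contain the same names in the same order.
bool same_order(const Names& a, const Names& b) {
    return a == b;
}

// True when both listings contain the same names, ignoring order.
// The arguments are taken by value so the caller's lists stay untouched.
bool same_names(Names a, Names b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}

// Initialization order in the Fruit 2.1 source, as quoted in this post.
const Names fruit_actual = {"infinite", "ponder", "depth", "mate",
                            "movestogo", "nodes", "binc", "btime",
                            "movetime", "winc", "wtime"};

// Initialization order of the Fruit code as shown on the comparison page,
// again as quoted in this post.
const Names fruit_as_shown = {"depth", "mate", "nodes", "infinite",
                              "ponder", "movestogo", "winc", "wtime",
                              "binc", "btime", "movetime"};
```

For these two lists, `same_names(fruit_actual, fruit_as_shown)` is true while `same_order(fruit_actual, fruit_as_shown)` is false: the same eleven variables, listed in a different order.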

There are similar modifications in the order of the "if" statements -- it takes too much typing to write that here too, but you can check it for yourselves.

Clearly the orange.fr page has modified the source codes in order to make lines more similar than they actually are. Why this is so, and who did it remains a matter for speculation.

Dr.Wael Deeb · Post by **Dr.Wael Deeb** » Mon Sep 01, 2008 1:59 am

With all my respect, we don't need yet another Rybka flamewar thread....
We've had more than enough....

John · Post by **John** » Mon Sep 01, 2008 2:20 am

A principle well-known to both medical researchers and stage magicians is that "People see what they expect to see."

That is why criteria and methods for evaluating the similarity of codes must be publicly specified in advance, and subsequent evaluation must be blinded.

Departing from these standards leads swiftly to pseudo-science.

It is dismayingly commonplace for people to ignore these principles. The reason is that pretty much everyone---scientists and lay-people alike---places great faith in their own rationality. Yet experience has shown over and over that this faith is utterly misguided, even among the most senior and experienced investigators.

RegicideX · Post by **RegicideX** » Mon Sep 01, 2008 2:45 am

Dr.Wael Deeb wrote:With all my respect, we don't need yet another Rybka flamewar thread....
We've had more than enough....

I agree -- and I would ask everyone to be civil about this.

But people are forming an opinion about the similarity of two pieces of code by studying source code that was modified to look more similar than it is. This is not right.

Names are in fact different -- but making them similar anyway is understandable. But changing the order of variable declaration, the order of variable initialization and the order of "if" statements is a bit too much. This on top of the fact that various blocks of code do not actually correspond to each other.

I think it's important to point this out given that the good name of a programmer is involved.

bob · Post by **bob** » Mon Sep 01, 2008 3:04 am

I have not compared anything there, but let me add this: The issue is "semantic equivalence", not "identical source lines."

Anybody would recognize that a=1; b=2; is exactly the same, semantically, as writing a=1; b=a+1; or b=2; a=1; or b=2; a=b-1; and any of several thousand other ways to express the same semantic operations. The idea is "do the two programs do the same thing semantically?" If the answer is yes, then they are _very_ close relatives. All this means is that to compare, _one_ of the programs is going to have to be altered by shifting lines around for a comparison to work. And it doesn't mean anything dishonest or deceptive is going on.
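The four spellings mentioned above can be written out as a small sketch. The function names `variant1` through `variant4` are hypothetical labels for illustration only; each returns the final values of a and b.

```cpp
#include <utility>

// Each function performs one of the four orderings from the post and
// returns the final (a, b) pair.
std::pair<int, int> variant1() { int a, b; a = 1; b = 2;     return {a, b}; }
std::pair<int, int> variant2() { int a, b; a = 1; b = a + 1; return {a, b}; }
std::pair<int, int> variant3() { int a, b; b = 2; a = 1;     return {a, b}; }
std::pair<int, int> variant4() { int a, b; b = 2; a = b - 1; return {a, b}; }
```

All four return the pair (1, 2), even though no two of them share the same source lines in the same order.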

John · Post by **John** » Mon Sep 01, 2008 3:18 am

bob wrote:... "_one_ of the programs is going to have to be altered by shifting lines around for a comparison to work. And it doesn't mean anything dishonest or deceptive is going on..."

Bob, isn't it the case that "shifting lines around for a comparison to work"---especially if done by-hand, by non-blinded investigators---provides unacceptable opportunities for self-deception, as peer-reviewed studies in science and medicine clearly show?

This is true, even if everyone involved is intelligent, experienced, mathematically skilled, and has the best intentions.

That is why no one can obtain good results from a bad protocol, no matter how skilled or intelligent the investigators are.

RegicideX · Post by **RegicideX** » Mon Sep 01, 2008 3:19 am

bob wrote: All this means is that to compare, _one_ of the programs is going to have to be altered by shifting lines around for a comparison to work. And it doesn't mean anything dishonest or deceptive is going on.

No, you don't have to shift the lines at all -- and if you shift them then you should say so.

And this does matter: the probability of finding the same declarations in the same order is pretty low usually --although there are exceptions-- and making it look like you have the same order is thus misleading.
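To put a rough number on that: if n statements are mutually independent, they can be written in n! different orders. The sketch below assumes, purely for illustration, that every order is equally likely. That is a strong assumption, since programmers often sort by type or alphabetically (the Fruit declarations quoted earlier are alphabetical within each type). The function name `orderings` is hypothetical.

```cpp
#include <cstdint>

// Number of distinct orderings of n mutually independent statements (n!).
// Overflows a 64-bit result for n > 20.
constexpr std::uint64_t orderings(unsigned n) {
    return n <= 1 ? 1 : n * orderings(n - 1);
}
```

For the eight initializations in the Rybka listing, `orderings(8)` is 40320, so under the equal-likelihood assumption a given order would turn up by chance about once in 40320 times.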

When you add that in fact various variables are added and missing in both programs, that various chunks of code are not similar at all -- then you can smell a rat in the anti-Rybka case.

The "semantical equivalence" issue is relative to the level of abstraction you want to look at. At one level of abstraction you can say that all UCI "go" parsers must be semantically equivalent; at a lower level of abstraction you can say that "a=1; b=2; c=a+b" is different from "c=1+2;"

So we're not advancing much by talking about semantical equivalence.

bob · Post by **bob** » Mon Sep 01, 2008 3:27 am

John wrote:
bob wrote:... "_one_ of the programs is going to have to be altered by shifting lines around for a comparison to work. And it doesn't mean anything dishonest or deceptive is going on..."
Bob, isn't it the case that "shifting lines around for a comparison to work"---especially if done by-hand, by non-blinded investigators---provides unacceptable opportunities for self-deception, as peer-reviewed studies in science and medicine clearly show?

This is true, even if everyone involved is intelligent, experienced, mathematically skilled, and has the best intentions.

That is why no one can obtain good results from a bad protocol, no matter how skilled or intelligent the investigators are.

I don't see how. The idea is to maintain semantical correctness. Can someone make a mistake? Of course. And the "group" looking at this stuff now is checking from several different directions to prevent obvious errors from creeping in. But this is a subjective process at the moment, and this is the only way I can see that they could make progress. Give them a chance to look at all the code. Data values will be interesting since they are not so difficult to compare. I caught multiple Crafty clones because they duplicated huge arrays I used for move generation. So just hang on until more (much more) becomes available. I suspect that by then, nobody is going to be quibbling over single lines here and there...

This comparison is not so difficult IMHO. In the past, when I taught a course in writing an assembler and a course on writing a compiler, I had to do this regularly because I had to verify that the optimized object code one of the student compilers produced was semantically equivalent to the original source program I gave them to compile. It just takes some time.

bob · Post by **bob** » Mon Sep 01, 2008 3:39 am

RegicideX wrote:
bob wrote: All this means is that to compare, _one_ of the programs is going to have to be altered by shifting lines around for a comparison to work. And it doesn't mean anything dishonest or deceptive is going on.
No, you don't have to shift the lines at all -- and if you shift them then you should say so.

And this does matter: the probability of finding the same declarations in the same order is pretty low usually --although there are exceptions-- and making it look like you have the same order is thus misleading.

Fine. You are unable to follow the discussion. That's not a real problem. But if you don't understand what I explained, there is not a lot of point in arguing about what is going on... And I'll leave it at that. We are not looking "for the same declarations". We are looking for "semantical equivalence" (have I said that enough times yet? Apparently not). Order of declarations is irrelevant. Order of statements is irrelevant so long as semantics are unchanged. To say it is misleading is silly, unless you just don't understand what is going on.

RegicideX wrote:
When you add that in fact various variables are added and missing in both programs, that various chunks of code are not similar at all -- then you can smell a rat in the anti-Rybka case.

You can smell a rat _anywhere_ if you want to badly enough. But just hang on for a while and the exterminator will be along.

RegicideX wrote:
The "semantical equivalence" issue is relative to the level of abstraction you want to look at. At one level of abstraction you can say that all UCI "go" parsers must be semantically equivalent; at a lower level of abstraction you can say that "a=1; b=2; c=a+b" is different from "c=1+2;"

Aha. So you do _not_ understand "semantical equivalence". This is simply proving that for the same inputs, the two pieces of code produce the same output. Order is immaterial so long as changes do not violate data dependencies, name dependencies or control dependencies. And no, not all UCI interfaces need to be semantically equivalent. I'm more familiar with the winboard protocol, where there are a minimal number of commands you have to handle to play a game, and where there are a bunch of other commands, of which you can implement any that you want, to get/provide better information. There is a lot of room for variance in how the protocol code works, what it does, what it doesn't do, and in some amusing cases, what it does that is horrible programming (comparing int to float as one example, longjmp() as a lesser example.)
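The dependency condition can be made concrete with a minimal sketch. It assumes each statement is summarized only by the variables it writes and reads, and it covers data and name dependencies between two adjacent statements, not control dependencies. The names `Stmt`, `overlaps` and `can_swap` are hypothetical and do not come from any tool mentioned in this thread.

```cpp
#include <set>
#include <string>

// A statement summarized by the variables it writes and the variables it reads.
struct Stmt {
    std::set<std::string> writes;
    std::set<std::string> reads;
};

// True when the two sets share at least one variable name.
bool overlaps(const std::set<std::string>& x, const std::set<std::string>& y) {
    for (const auto& name : x) {
        if (y.count(name) != 0) return true;
    }
    return false;
}

// Two adjacent statements may be swapped when neither writes a variable
// that the other reads or writes.
bool can_swap(const Stmt& first, const Stmt& second) {
    return !overlaps(first.writes, second.reads)    // read-after-write
        && !overlaps(first.reads, second.writes)    // write-after-read
        && !overlaps(first.writes, second.writes);  // write-after-write
}
```

With `Stmt a1{{"a"}, {}}` standing for a=1 and `Stmt b2{{"b"}, {}}` for b=2, `can_swap(a1, b2)` is true. With `Stmt b_from_a{{"b"}, {"a"}}` standing for b=a+1, `can_swap(a1, b_from_a)` is false, because the second statement reads what the first one writes.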

We are not talking about lower level than "semantical equivalence" nor at a higher-level, because the pure concept of semantical equivalence is the ideal way to detect plagiarism. Again, you can find lots of good papers on this subject scattered around... Semantical equivalence is a _specific_ level of equality that is not open to interpretation.

"linguistic study of meanings in language. The semantic aspect of any expression is its meaning as opposed to its form."

We are looking at the form. If the form is the same, then there is _high_ probability that the language instances are highly related.

RegicideX wrote:
So we're not advancing much by talking about semantical equivalence.

John · Post by **John** » Mon Sep 01, 2008 3:42 am

Bob, your chess and programming skills are undoubted, but have you ever taught a statistics course?

I commend this page on Bonferroni corrections to all.

Modern computerized analysis methods allow (literally) millions of hypotheses to be searched, and in consequence, it is infeasible to evaluate the significance of any criteria applied ex post facto.

That is why ex post facto analysis reliably confirms investigator prejudices, and in consequence, yields results that are without scientific merit.
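For readers unfamiliar with the correction, here is a minimal sketch of the standard textbook formulas, not taken from the linked page. When m hypotheses are tested, the Bonferroni correction tests each one at alpha / m. Without any correction, and assuming the tests are independent, the chance of at least one false positive is 1 - (1 - alpha)^m. The function names are hypothetical.

```cpp
#include <cmath>

// Per-test significance threshold under a Bonferroni correction.
double bonferroni_threshold(double alpha, unsigned num_tests) {
    return alpha / num_tests;
}

// Chance of at least one false positive across num_tests independent tests,
// each run at significance level alpha with no correction applied.
double familywise_error_rate(double alpha, unsigned num_tests) {
    return 1.0 - std::pow(1.0 - alpha, num_tests);
}
```

With alpha = 0.05 and 20 uncorrected independent tests, `familywise_error_rate(0.05, 20)` is roughly 0.64. In other words, a spurious "match" somewhere is more likely than not. `bonferroni_threshold(0.05, 20)` lowers the per-test threshold to 0.0025.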
