RegicideX wrote:bob wrote:
Order makes the source codes look much more similar than they really are. If ten variables in a row are initialized in the same order in machine code, then that's alarming. Having variables initialized all over the place, many of them nonexistent, is not at all alarming.
Again, before making statements, do a little research on translating source code to machine language. Instructions get re-ordered by the compiler, in fact, to improve speed and reduce pipeline stalls.
Maybe there would be something to your "correction" if it were not for the fact that I said the same thing about the compiler changing the code in a previous post.
It still doesn't change the fact that a lot of identically ordered variable initialization in machine code should be alarming. Reading what I actually wrote should help you.
Again, I am not sure where this is supposed to go. We _know_ about compiling from C to assembly, and we _know_ how to "uncompile" back to C code. Let's say we take a C source program A, compile it to machine language, and call the result "B". Then we "decompile" the machine language B and end up with C. A and C are semantically equivalent, by definition. But they might not be identical, for many reasons. 1. Variable names (non-global ones, anyway) get lost during compilation unless the compiler is told to keep them around for debugging use. 2. We only have the compiler's "final product" to look at, and we can't possibly know how it rearranged the source code to make it execute faster. So once we have C, which absolutely came from A, it is just a matter of massaging C to make it look like A, or massaging A to make it look like C, and we eventually end up with a perfect match. We established semantic equality to start with, between A, B and C. But that is difficult for the casual person to see, so moving things around, while maintaining that semantic equivalence, until we get identical source code finishes the project off and leaves a clearly identifiable connection.
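To make the A-versus-C point concrete, here is a minimal sketch in C. Everything in it is made up for illustration: the function names, the constants, and the "evaluation" formula are hypothetical and come from no real program. eval_a() stands in for source A, and eval_c() stands in for what a decompiler might hand back, with local names gone and the initializations in a different order.

```c
#include <assert.h>

/* Hypothetical "source A": meaningful names, one initialization order. */
int eval_a(int material, int mobility)
{
    int pawn_bonus = 10;
    int king_safety = 25;
    int tempo = 3;
    return material * pawn_bonus + mobility * tempo - king_safety;
}

/* Hypothetical "decompiled C": local names are lost, the initializations
 * appear in a different order, and the expression has been rearranged.
 * The arithmetic is the same. */
int eval_c(int a0, int a1)
{
    int v2 = 3;
    int v1 = 25;
    int v0 = 10;
    return a1 * v2 - v1 + a0 * v0;
}

/* Spot check over a small input range. Returns 1 if both versions agree
 * on every tested input, 0 otherwise. This is a sanity test, not a proof
 * of semantic equivalence. */
int equivalent_on_range(int limit)
{
    for (int m = -limit; m <= limit; m++) {
        for (int n = -limit; n <= limit; n++) {
            if (eval_a(m, n) != eval_c(m, n)) {
                return 0;
            }
        }
    }
    return 1;
}

/* Convenience wrapper that aborts if the two versions ever disagree. */
void self_check(void)
{
    assert(equivalent_on_range(50) == 1);
}
```

Renaming v0, v1, v2 and reordering the three initializations in eval_c() gives back text that matches eval_a(). That is the kind of "massaging" meant above: the behavior never changes, only the presentation.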
Nothing sinister. Nothing dishonest. Those of us who do this understand the difficulty of "decompiling", because the compiler does far more than just translate C to machine language. It can re-order instructions, move or eliminate common sub-expressions to save time, and unroll loops. And when we decompile, we end up with the "C program" the compiler created from our original, rather than our less efficient original. Then we have to continue the conversion, maintaining semantic equivalence, to try to work our way back to the original code.
I don't see any "creativity" in that. We are not creating _anything_. It is quite technical in nature, and requires specific skills. But it is _not_ creative. If there were any justification, I suppose a large project group could automate the process. But it would only be useful for this one task, when most are only interested in the first half of the process, getting from source to fast machine language. But clearly, in theory, this is a two-way algorithm. If you can go from A to B, you can go from B back to A. This must be true. It is absolutely true. Might be one tough job, but so is building a fence across the USA's southern border. Hard, but nobody would say "impossible" or "improbable it could be completed". Just a big job. And somehow this gets turned into a matter of statistical sampling, clinical trials and the like, when it is a direct transformation from A to B with no witchcraft or voodoo involved at any point.
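As a rough illustration of two of those transformations, the sketch below shows a loop as a programmer might write it, next to a hand-written approximation of what might come back from a decompiler once the common sub-expression has been computed once and the loop unrolled by two. The function names and the computation are hypothetical, and real compiler output will differ in its details.

```c
#include <assert.h>

/* As the programmer might write it: (a + b) appears twice per iteration. */
int sum_scaled_original(const int *x, const int *y, int n, int a, int b)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += x[i] * (a + b) + y[i] * (a + b);
    }
    return total;
}

/* Hand-written approximation of the optimized form: the common
 * sub-expression (a + b) is computed once, outside the loop, and the
 * loop body is unrolled by a factor of two. */
int sum_scaled_optimized(const int *x, const int *y, int n, int a, int b)
{
    int k = a + b;
    int total = 0;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        total += x[i] * k;
        total += y[i] * k;
        total += x[i + 1] * k;
        total += y[i + 1] * k;
    }
    if (i < n) {
        /* Leftover element when n is odd. */
        total += x[i] * k;
        total += y[i] * k;
    }
    return total;
}

/* Returns 1 if both versions agree on one small fixed input. */
int same_result_on_sample(void)
{
    const int x[] = {1, 2, 3};
    const int y[] = {4, 5, 6};
    return sum_scaled_original(x, y, 3, 2, 3) ==
           sum_scaled_optimized(x, y, 3, 2, 3);
}
```

The second function looks quite unlike the first, yet both return the same value for the same input (barring integer overflow). Working back from the second form to the first is mechanical, not creative.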
Changing the order without saying so is at best sneaky, at worst dishonest.
It is actually neither. Unless you consider your compiler and processor to be "sneaky or dishonest"...
That's pretty silly. I don't expect my compiler to try to make arguments and present evidence in an honest and straightforward manner -- I do expect that from human interlocutors, and it's clear that humans changed the code to make it look more similar than it is, without mentioning anything about it.
That's the problem. We were having a technical discussion between people who _understood_ how this worked. We tried having it here, and there was a demand to see what was being done. Several people produced the incomplete results available so far. And now it seems we were dishonest for showing data that anybody familiar with the process would instantly understand. It was not intended to be dishonest, and that is why more data is not presently forthcoming: so that more can be completed, cross-checked, and displayed in a way that won't generate hundreds of questions and claims of dishonesty.
If you ask a good compiler guy about this, he would not think twice about what is being discussed here; it would be expected. At some point, the ideal solution would be to take source A and executable B, which some believe contains parts of A, and decompile B to C. And show that step first. And then start the "massaging" to undo the various tricks the compiler used to speed up the code, and show that. And then try to massage the resulting C' (order, names, and such) so that it maintains semantic equality with B while matching A as closely as possible. The closer they can be made to match, the more code A and B have in common. If they could be made to match perfectly (which we do not expect, since we know A and B play differently), then we would have established absolute proof that B came from A. As it is, we might find that significant parts can be made to match up, so we know that significant parts came from A. Or we might not be able to show that much at all came from A.
Work continues. New results are coming daily. And they are being carefully checked. And one day everyone will see everything that has been found and can make an intelligent judgement on the results. Without all the name calling, claims of dishonesty and dark motives and such.
Just let the "process" proceed at the only pace it can proceed at, which is limited by each individual's ability to spend X hours a day on this. And sooner, rather than later, there will be something to look at that is more polished and easier to follow than what has been shown to date. I'd rather see the ongoing discussions carried out here, but these threads simply make that impossible.
In the past, we were able to do this. For Crafty clone claims, someone would post some evidence, I would analyze it and post more evidence, and we would carry out the investigation in the open, where everyone could follow it in real time. But that didn't work in this case, for obvious reasons...
So, we wait.