Reverse engineering, a legitimate form of 'discovery'

bnemias · Post by **bnemias** » Tue Jan 19, 2010 5:04 am

Dann Corbit wrote:If we can denude the work of the best professional programmers, where is the incentive for new workers to go into that field?

I have mixed feelings on this. I don't disagree really. But at this very site, we see a significant percentage of hobbyist programmers. A few of the best programmers are indeed hobbyists only, and their work in chess would merit a degree.

I do wonder if in most cases the same results can be achieved using the open source model where the is nothing to denude.

Dann Corbit · Post by **Dann Corbit** » Tue Jan 19, 2010 5:20 am

bnemias wrote:
Dann Corbit wrote:If we can denude the work of the best professional programmers, where is the incentive for new workers to go into that field?
I have mixed feelings on this. I don't disagree really. But at this very site, we see a significant percentage of hobbyist programmers. A few of the best programmers are indeed hobbyists only, and their work in chess would merit a degree.

I suspect that Tord is the best chess programmer in the world. If not Tord, then probably Fabian. They just don't have the interest to pursue at full speed.

The question is not necessarily related to "Where is the best work done?" What I am wondering is (supposing Tord or Frabian wanted to write a professional program) "Is the effort of full time chess programming better spent elsewhere?"
I think that almost certainly the answer to this question is "Yes" for more than one reason.

I do wonder if in most cases the same results can be achieved using the open source model where the is nothing to denude.

The majority of effort in a professional program is not in the engine (IMO). It is in the opening book and the slick GUI and the database, etc.
Then we have the technical support and documentation that will be needed for the majority of users, if the tool set is to ever go mainstream.
Even that is just the tip of the iceberg.

In short, for chess programs to achieve the sort of pinnacle of effort that goes into (for instance) video games, things would have to change a lot.

Dann Corbit · Post by **Dann Corbit** » Tue Jan 19, 2010 5:28 am

Uri Blass wrote:
Dann Corbit wrote:
Osipov Jury wrote:It talks a lot about the RE and its legality, morality, etc.
But nobody talks about the technical possibility of the reverse. This is probably due to poor understanding of the subject matter.
It is mathematically provable that it is impossible to fully reverse any binary back into source code. It has been demonstrated that this is the equivalent of the halting problem.
I do not know what is mathematically provable but
I have a simple algorithm to do it for source that is limited in size when the only problem is that it takes a long time
To make things simple assume that the size of the source code is
not more than 1 Mbyte.

If you have an exe file and want to produce source that give you the same exe then simply try all the possible source codes when you start from all the possible source code of size of 1 bytes and continue with all source code with size of 2 bytes and continue in this way.

This assumes that you know the compiler that produced it. Will a C decompiler work on a C++ executable? How about Fortran? What if the program was written in Delphi?

I guess that from 1000 bytes of binary (after stripping off the headers, etc) there are still an awfully large possible combination of possible programs to generate the binary.

Most of the source code that you try are going to give a compilation error
and programs that you can compile are usually going to give a different exe than the exe that you want but in theory after enough time you are going to get exactly the exe that you need.

The more interesting question is if it is possible to have a significantly faster algorithm to build the source code (not exponential but polynomial)

Uri

If you know the compiler that made it, there are some tools that can do enough to at least make it less work than translation of the assembly language by hand.

Suggested reading:
P.T.Breuer and J.P.Bowen, "Decompilation: The enumeration of types and grammars," Tech. Rep. PRG-TR-11-92, Oxford University Computing Laboratory, 11 Keble Road, Oxford OX1 3QD, 1992.

Dann Corbit · Post by **Dann Corbit** » Tue Jan 19, 2010 5:30 am

Dann Corbit wrote:
Uri Blass wrote:
Dann Corbit wrote:
Osipov Jury wrote:It talks a lot about the RE and its legality, morality, etc.
But nobody talks about the technical possibility of the reverse. This is probably due to poor understanding of the subject matter.
It is mathematically provable that it is impossible to fully reverse any binary back into source code. It has been demonstrated that this is the equivalent of the halting problem.
I do not know what is mathematically provable but
I have a simple algorithm to do it for source that is limited in size when the only problem is that it takes a long time
To make things simple assume that the size of the source code is
not more than 1 Mbyte.

If you have an exe file and want to produce source that give you the same exe then simply try all the possible source codes when you start from all the possible source code of size of 1 bytes and continue with all source code with size of 2 bytes and continue in this way.
This assumes that you know the compiler that produced it. Will a C decompiler work on a C++ executable? How about Fortran? What if the program was written in Delphi?

I guess that from 1000 bytes of binary (after stripping off the headers, etc) there are still an awfully large possible combination of possible programs to generate the binary.
Most of the source code that you try are going to give a compilation error
and programs that you can compile are usually going to give a different exe than the exe that you want but in theory after enough time you are going to get exactly the exe that you need.

The more interesting question is if it is possible to have a significantly faster algorithm to build the source code (not exponential but polynomial)

Uri
If you know the compiler that made it, there are some tools that can do enough to at least make it less work than translation of the assembly language by hand.

Suggested reading:
P.T.Breuer and J.P.Bowen, "Decompilation: The enumeration of types and grammars," Tech. Rep. PRG-TR-11-92, Oxford University Computing Laboratory, 11 Keble Road, Oxford OX1 3QD, 1992.

See also:
http://citeseerx.ist.psu.edu/viewdoc/do ... 1&type=pdf

slobo · Post by **slobo** » Tue Jan 19, 2010 4:13 pm

Dann Corbit wrote:
Osipov Jury wrote:It talks a lot about the RE and its legality, morality, etc.
But nobody talks about the technical possibility of the reverse. This is probably due to poor understanding of the subject matter.
It is mathematically provable that it is impossible to fully reverse any binary back into source code. It has been demonstrated that this is the equivalent of the halting problem.

For an individual binary created by a known compiler, it is possible to do fairly well, but it is still a very difficult and arduous task requiring human intervention.

I guess that only small sections of Rybka were reverse engineered (wild guess of 30%), but they turned out to be the most important sections and so the results proved to be shockingly strong.

There are some tools to automate the process now (IdaPro has a thing called HexRays decompiler http://www.hex-rays.com/decompiler.shtml)
I guess that even with these tools it is a very difficult task.

There are fundamental questions of right and wrong and many dark corners in these undertakings. I hope that in the long run, the field of computer chess is not harmed. If we can denude the work of the best professional programmers, where is the incentive for new workers to go into that field?

After reading this post I wonder: how was it possible that you blamed reverse engineering before. You always presented it as something immoral.

You admit now that it is an extremly hard enterprise, with unpredictable result. Who manage to get the desired information using such a process, must understand the original code's idea first, and only after this understanding he can start the reconstruction of the code. And more: this reconstruction of the code will always be one's own interpretation of the reversed engineered mess of signs.

Conclusion: reverse engineering is a work of high knowledge and, at the same time, of artistic imagination.

Sven · Post by **Sven** » Tue Jan 19, 2010 6:11 pm

slobo wrote:
Dann Corbit wrote:
Osipov Jury wrote:It talks a lot about the RE and its legality, morality, etc.
But nobody talks about the technical possibility of the reverse. This is probably due to poor understanding of the subject matter.
It is mathematically provable that it is impossible to fully reverse any binary back into source code. It has been demonstrated that this is the equivalent of the halting problem.

For an individual binary created by a known compiler, it is possible to do fairly well, but it is still a very difficult and arduous task requiring human intervention.

I guess that only small sections of Rybka were reverse engineered (wild guess of 30%), but they turned out to be the most important sections and so the results proved to be shockingly strong.

There are some tools to automate the process now (IdaPro has a thing called HexRays decompiler http://www.hex-rays.com/decompiler.shtml)
I guess that even with these tools it is a very difficult task.

There are fundamental questions of right and wrong and many dark corners in these undertakings. I hope that in the long run, the field of computer chess is not harmed. If we can denude the work of the best professional programmers, where is the incentive for new workers to go into that field?
After reading this post I wonder: how was it possible that you blamed reverse engineering before. You always presented it as something immoral.

You admit now that it is an extremly hard enterprise, with unpredictable result. Who manage to get the desired information using such a process, must understand the original code's idea first, and only after this understanding he can start the reconstruction of the code. And more: this reconstruction of the code will always be one's own interpretation of the reversed engineered mess of signs.

Conclusion: reverse engineering is a work of high knowledge and, at the same time, of artistic imagination.

What do you think about the following, *hypothetical* case:

Someone writes just another chess engine, nothing special, around 2200 ELO, with an "ad-hoc" implementation (exaggerated, of course) of eval and a very dumb pure alpha-beta search. Then he buys, or somehow gets access to, a top ELO 3000 program which is commercial and "closed-source" software. He disassembles and decompiles it, learns in detail how its search and eval parts work, and then replaces his own search and eval with those from the top program, slightly adapting board representation and other basic stuff accordingly. He finds out that the new eval gives him about 200 ELO improvement, and adding the new and excellent search on top of that gives another 500 ELO points. Now he has a 2900 ELO program, and after fixing some bugs and adding some improvements he arrives at 3000 ELO, too.

And finally, he puts the "combined" sources on the web without telling about their origin, including

a) his own, original sources of everything else than eval and search, and
b) sources for eval and search coming mostly from decompiling the top program, therefore not a copy but a derivative work matching the original, secret top program's source code very closely.

Would you say that this were a legitimate form of "discovery"?

Here is my opinion.

Everything prior to putting the resulting sources on the net could be tolerable. Publishing the "new" strong engine as binary could be tolerable or not (I would say it were not but that is not my point here). But publishing sources that were derived from essential parts of the top program, and thereby revealing the secrets of the commercial, closed-source program, would not be legitimate for me, it would be on the same level as copyright infringement for me, even if a court would decide that there was no source code *copied*.

Now my key point is: We may discuss for years about the question whether doing so would be *legal* or not. We may discuss whether *reverse engineering itself* is legal or not in the case of software, especially chess programs. But if our community would *accept* the scenario that I described above as a "legitimate form of 'discovery'" then something would be wrong in our minds since we would agree on ignoring each other's personal rights to keep his own discoveries secret as long as *he* wants to.

Note: I am not a "fan" of commercial chess programs. I have never bought one. I like open source software very much. But I would certainly not like someone who puts my decompiled secret algorithms on the net without my permission, and furthermore without mentioning the origins.

So far my *hypothetical* scenario. Note this post is no claim or attempt of a proof about the current IpHoeRobGorritBird case. I only wanted to extend the picture of reverse engineering we have seen here in this thread a little bit, in order to show that RE can be much more than

Slobo wrote:a work of high knowledge and, at the same time, of artistic imagination

, it can also hurt someone else very deeply. And that is beyond what courts and lawyers are dealing with, it is about how we feel what is good or bad.

Sven

Milos · Post by **Milos** » Tue Jan 19, 2010 6:26 pm

Sven Schüle wrote:Someone writes just another chess engine, nothing special, around 2200 ELO, with an "ad-hoc" implementation (exaggerated, of course) of eval and a very dumb pure alpha-beta search. Then he buys, or somehow gets access to, a top ELO 3000 program which is commercial and "closed-source" software. He disassembles and decompiles it, learns in detail how its search and eval parts work, and then replaces his own search and eval with those from the top program, slightly adapting board representation and other basic stuff accordingly. He finds out that the new eval gives him about 200 ELO improvement, and adding the new and excellent search on top of that gives another 500 ELO points. Now he has a 2900 ELO program, and after fixing some bugs and adding some improvements he arrives at 3000 ELO, too.

This all sounds logical. However, I would like you to explain how you then arrive to 3080 ELO, 80 ELO beyond top ELO program you decompiled, when nobody of the competition was not able even to come close to 3000 for previous 2 years???

Can you make a program 100 ELO stronger than source binary (and not any binary, but the binary of the strongest existing chess engine) just by reverse engineering? If you think so, I guess you have no clue about chess programming...

Sven · Post by **Sven** » Tue Jan 19, 2010 6:58 pm

Milos wrote:
Sven Schüle wrote:Someone writes just another chess engine, nothing special, around 2200 ELO, with an "ad-hoc" implementation (exaggerated, of course) of eval and a very dumb pure alpha-beta search. Then he buys, or somehow gets access to, a top ELO 3000 program which is commercial and "closed-source" software. He disassembles and decompiles it, learns in detail how its search and eval parts work, and then replaces his own search and eval with those from the top program, slightly adapting board representation and other basic stuff accordingly. He finds out that the new eval gives him about 200 ELO improvement, and adding the new and excellent search on top of that gives another 500 ELO points. Now he has a 2900 ELO program, and after fixing some bugs and adding some improvements he arrives at 3000 ELO, too.
This all sounds logical. However, I would like you to explain how you then arrive to 3080 ELO, 80 ELO beyond top ELO program you decompiled, when nobody of the competition was not able even to come close to 3000 for previous 2 years???

Can you make a program 100 ELO stronger than source binary just by reverse engineering? If you think so, I guess you have no clue about chess programming...

You misquote me, and you attack me while I gave no reason for that. Stop that, please.

1) Nowhere have I written "3080" in my example. You are thinking in a different direction than what has been my intent.

2) Although the numbers in my hypothetical example were not chosen randomly, they are still an example.

3) If you want to compare my example to the IpRob* case, feel free to do it, but please do that in a separate thread while choosing an own topic for it. This thread is about reverse engineering in general, and I replied with an example, a question and my opinion to it.

Also discussing about IpRob* ELO strength within this thread where it is all about legality, legitimacy and ethical aspects of reverse engineering seems misplaced for me.

One question may be allowed, since you obviously defend the originality of IpRob*:

Have you created Ippolit?

Sven

Milos · Post by **Milos** » Tue Jan 19, 2010 7:22 pm

Sven Schüle wrote: You misquote me, and you attack me while I gave no reason for that. Stop that, please.

1) Nowhere have I written "3080" in my example. You are thinking in a different direction than what has been my intent.

2) Although the numbers in my hypothetical example were not chosen randomly, they are still an example.

3) If you want to compare my example to the IpRob* case, feel free to do it, but please do that in a separate thread while choosing an own topic for it. This thread is about reverse engineering in general, and I replied with an example, a question and my opinion to it.

Also discussing about IpRob* ELO strength within this thread where it is all about legality, legitimacy and ethical aspects of reverse engineering seems misplaced for me.

One question may be allowed, since you obviously defend the originality of IpRob*:

Have you created Ippolit?

I suggest to read my post carefully.
First I did not say you wrote 3080, I wrote it myself as an additional hypothetical example.

Second, I did not mention Ippo/Robbo/etc. in any word. It is you who mentions them. So all this insinuations from you about me discussing/comparing Ippo/Robbo are just false. And what gives you the right to insinuate this and puts you in a position of exclusive interpreter of other ppls opinions?

Third, you are not the only one allowed to make "hypothetical" examples, and since this is a free country (and forum) I can extend my example to whoever I like to, including you.

Last, I have no clue why this all scares you so much...

Sven · Post by **Sven** » Tue Jan 19, 2010 8:09 pm

Milos wrote:
Sven Schüle wrote: You misquote me, and you attack me while I gave no reason for that. Stop that, please.

1) Nowhere have I written "3080" in my example. You are thinking in a different direction than what has been my intent.

2) Although the numbers in my hypothetical example were not chosen randomly, they are still an example.

3) If you want to compare my example to the IpRob* case, feel free to do it, but please do that in a separate thread while choosing an own topic for it. This thread is about reverse engineering in general, and I replied with an example, a question and my opinion to it.

Also discussing about IpRob* ELO strength within this thread where it is all about legality, legitimacy and ethical aspects of reverse engineering seems misplaced for me.

One question may be allowed, since you obviously defend the originality of IpRob*:

Have you created Ippolit?
I suggest to read my post carefully.
First I did not say you wrote 3080, I wrote it myself as an additional hypothetical example.

Second, I did not mention Ippo/Robbo/etc. in any word. It is you who mentions them. So all this insinuations from you about me discussing/comparing Ippo/Robbo are just false. And what gives you the right to insinuate this and puts you in a position of exclusive interpreter of other ppls opinions?

Third, you are not the only one allowed to make "hypothetical" examples, and since this is a free country (and forum) I can extend my example to whoever I like to, including you.

Last, I have no clue why this all scares you so much...

I leave it to the judgement of the readers whether this:

Milos wrote:However, I would like you to explain how you then arrive to 3080 ELO, 80 ELO beyond top ELO program you decompiled, when nobody of the competition was not able even to come close to 3000 for previous 2 years???

Can you make a program 100 ELO stronger than source binary (and not any binary, but the binary of the strongest existing chess engine) just by reverse engineering?

is more like an "additional hypothetical example" or like the explicit mentioning of the real case, just without using program names.

And if it were accepted as another example then it were off-topic for this thread since you have challenged the possibility of an "X+80" rated engine being result of reverse engineering (you know what I mean by X here), while the topic is very different from that.

Finally I ask you for the second time, please stop attacking me. I didn't attack you as far as I'm aware. Apart from annoying me, this could lead to wrong conclusions about how and why you defend against a non-existent attack.

Sven

Reverse engineering, a legitimate form of 'discovery'

Re: Reverse engineering, a legitimate form of 'discovery'

Re: Reverse engineering, a legitimate form of 'discovery'

Re: Reverse engineering, a legitimate form of 'discovery'

Re: Reverse engineering, a legitimate form of 'discovery'

Re: Reverse engineering, a legitimate form of 'discovery'

Re: Reverse engineering, a legitimate form of 'discovery'

Re: Reverse engineering, a legitimate form of 'discovery'

Re: Reverse engineering, a legitimate form of 'discovery'

Re: Reverse engineering, a legitimate form of 'discovery'

Re: Reverse engineering, a legitimate form of 'discovery'