hgm wrote: I don't think that reasoning is sound. I can build an engine that plays better Chess than I do. Why shouldn't I be able to write a compiler that makes better assembly than I do? Even if I would be the world's number one expert. We have engines that beat the World champ at Chess...
bob wrote: Here's the biggest win you get, from someone that has been programming in ASM forever...
lucasart wrote: I appreciate that your knowledge of assembly is far better than mine, and probably almost everyone here.
How does that help? You are STILL doing the test.
The compiler has to follow the semantics of the language. It doesn't know anything about the semantics of the program itself, other than what it sees. But that is not much.
For example, if you use a switch(), the compiler must, by the standard, do what the specs say: if the switch variable does not match any of the cases, control has to fall through correctly, so the compiler is required to emit a range test to handle that possibility. In my code, I might well KNOW that the "piece" value will only be one of 13 possibilities, and I can exclude that "if piece < -6 or piece > +6" sanity test. There are lots of such things one can exploit, but it requires knowledge about the program that can't readily be discerned just by looking at the source code for one procedure. Other things might include the min and max range of a value, where the compiler only knows "int" or "long" but you know it is 17 bits max. So there is room to out-perform the compiler, but nowhere near what there was 20-30 years ago, when hand-coded asm was the way to go everywhere.
Compilers are good at the classic optimizations: common-subexpression elimination, loop-invariant detection, and such. So I doubt a 10:1 speedup is possible, UNLESS one intentionally (or unknowingly) writes C code that ties the compiler's hands. Bad C code can still produce bad asm code. But with care, one can write C code that produces pretty well-optimized asm code. Unless you step outside the semantics of the language somewhere. For example, there is no standard way to get at the popcnt() instruction. Compilers do include intrinsics for such things, but they are not portable.
I've spent a lot of time looking at compiler output and am generally impressed. They can be improved on, with effort, and perhaps with a lot of pragma usage to tell the compiler things it can't spot for itself.
However, the arguments you are making have little to do with assembly vs C. Take the switch argument: the fact that 0 <= piece <= 11 can help to optimize the code is of course not known by a C compiler, but nor is it known by an assembler. Let's say your piece values range from 0 to 11. How about writing this instead?

Code: Select all
if (piece >= 12) goto skip;
switch (piece) {
    ...
};
skip:
In C++, with a typed enum, you can let the compiler know the range of possible values. The switch then knows that the set of possible values for a "Piece" is relatively compact, so the compiler may decide to use a jump table, making the branch an O(1) operation.

Code: Select all
enum Piece {wPawn, wKnight...};
Regarding int vs long: there are many integer types in C, and it's all about choosing the right one, and knowing that sign extension is not a zero-cost operation. If you use unsigned judiciously, there is some (tiny) performance gain to be had. But in reality, writing compact and cache-friendly code is far more important than these microscopic optimizations.
Again, there are things you know that the compiler can't know. A simple example: you want to flip the side-to-move (wtm) variable. You could do the common "wtm = !wtm", which is horrible, or you could do the less obvious "wtm ^= 1". The latter is a one-instruction operation that will execute as fast as any instruction in the CPU. The former is a conditional comparison with a branch: the semantics of ! say "if wtm is zero, !wtm is 1; if wtm is non-zero, !wtm is 0". The gotcha is that "is non-zero". The compiler can't know that the only two possible values are 0 and 1 and do the xor optimization, since we don't have any true boolean data type in C.
Two points, one we disagree on.
What I am trying to say is that you will not make your code faster by using assembly, except to use hardware features otherwise unavailable in C (in the case of chess, I think there are only 4 functions that deserve assembly or intrinsics: popcnt(), lsb(), msb(), prefetch()).
But you will make your program faster by writing *better C* code. And in order to write efficient C code, some knowledge of assembly, and how a compiler/optimizer works certainly helps.
(1) it is always a possibility that a better algorithm can be developed, one that dwarfs any hand-coded asm code. No argument there. No point in hand-coding a poor algorithm.
(2) hand-coding, for an experienced x86 (or any other processor) asm programmer can produce significant performance improvements. How big a gain is debatable, but I would NEVER say that a compiler can't be beat. I've looked at too much compiler output. They are good. Very good. But they are nowhere near perfect.
Whether (2) is a good idea is a different topic. Asm code is harder to read, harder to write, harder to debug, and harder to modify later. In return, you will generally get something that executes faster. How much faster? 2x is really not uncommon at all. The question is, for a chess engine, whether that 2x speedup is worth the effort required to get it, weighed against the effort required to make future changes. That is certainly a valid topic for debate.
Different topic, entirely. An OS, by definition, is constantly changing. We've had pure asm kernels in the past: IBM OS/360 was one example, the Xerox operating system CP-V another. The problem with them was the difficulty of making changes. The Linux kernel is very fast already; 2x would not make any measurable speed improvement for most users, and the advantage of having a C code base is a significant offsetting factor in favor of C.
You can see that the Linux kernel has a very small percentage of assembly code. It only uses assembly where it must (to do things not possible in C, or to access hardware features like task switching, port I/O, etc.). Everything else is in C, even though Torvalds is a performance freak, and a hardcore assembly programmer too.
That is the ONLY argument that makes sense. But there are plenty of places that STILL hand-code asm code for performance. Lawrence Livermore Lab for one... Sometimes speed is the overriding consideration, and there asm is the right answer. The only answer, in fact. If development time and such are important, then asm is never the answer.
And even if there is some performance gain that assembler gives you over *perfectly written* C code (which I doubt), it is certainly not worth doing. Imagine if your entire search() function were in assembly. It would be pages and pages long, and completely illegible. And you'd have to write plenty of versions of it for all sorts of different platforms (and maintain them all, risking divergence between them). A ridiculous waste of time, not to mention that you'd have to constantly add new versions to support new CPU features, etc.
