C++ coding guidelines

mar · Post by **mar** » Tue Oct 23, 2012 6:10 pm

wgarvin wrote:Be wary of constructors that initialize everything in a performance-critical struct (e.g. something you'll allocate a vector of thousands of, should have a do-nothing ctor or just a default ctor).

I agree that it's generally a very bad idea to use a vector of complex classes, but I guess no one really does that anyway. I prefer using a vector of pointers in that case as it allows random access.
Also, most (all?) STL vector implementations use pointers (begin, end and capacity), which is good for iterators but bad for size() (it involves a division by the element size; sure, the compiler can optimize that into a multiplication/shift, but still), so iterating a vector the "natural" way (i=0..<size) should be slower than using iterators (unless the compiler does a magnificent job).
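
A minimal sketch of the two loop styles being compared here. The function names are made up for illustration, and which loop is actually faster depends on the compiler and the library, so treat this as something to measure rather than a result:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index-based loop. size() is read once into a local, so the loop condition
// does not depend on how the library computes size().
long long sum_by_index(const std::vector<int>& v) {
    long long total = 0;
    const std::size_t n = v.size();
    for (std::size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

// Iterator-based loop. The end iterator is fetched once and compared directly.
long long sum_by_iterator(const std::vector<int>& v) {
    long long total = 0;
    for (std::vector<int>::const_iterator it = v.begin(), last = v.end();
         it != last; ++it)
        total += *it;
    return total;
}

// Both loops visit the same elements, so they must agree.
long long checked_sum(const std::vector<int>& v) {
    const long long by_index = sum_by_index(v);
    assert(by_index == sum_by_iterator(v));
    return by_index;
}
```

Looking at the generated assembly for both functions is probably the quickest way to see whether the compiler treats them differently at all.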

wgarvin wrote:
ilari wrote:And not all code has to be fast. In a typical desktop application only a very small percentage of the code is performance-critical.
This really varies with application domain. It's mostly true in computer chess, but (for example) mostly false in games.

I guess Ilari meant something else by desktop application. Game engines are like chess engines - they need to be as fast as possible. But chess GUIs (and game editors) don't.

Gerd Isenberg · Post by **Gerd Isenberg** » Tue Oct 23, 2012 7:01 pm

Sven Schüle wrote:
diep wrote:Where is the guarantee that enum types are 'unsigned integer' rather than signed, which is SLOWER in 64 bits?
Nowhere, and you don't need such a guarantee. The compiler decides about the integral type it uses for an enum type depending on the enum values used by it.

Why should either unsigned int or signed int be "slower" than the other one on 64 bit systems? Do you have any hard proof for that thesis?

Sven

It is about explicit sign extension versus implicit zero extension of a 32-bit signed or unsigned int into a 64-bit index register. So an unsigned integer may use its native 32-bit register like ebx, and then index via rbx. A signed integer needs an extra sign extension.

http://support.amd.com/us/Processor_Tec ... _guide.pdf

3.20 Sign of Integer Operands

In summary, use unsigned types for:
• Division and remainders
• Loop counters
• Array indexing

Use signed types for:
• Integer-to-floating-point conversion
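
As a rough illustration of the loop-counter and array-indexing advice, here is the same loop written with a signed and an unsigned 32-bit index. The function names are hypothetical, and the comments describe what a compiler may emit on x86-64, not what it must emit; for a loop this simple many compilers will produce identical code for both:

```cpp
#include <cassert>

// Signed 32-bit index: before data[i] can be addressed through a 64-bit
// register, the index may need an explicit sign extension (e.g. MOVSXD).
int sum_signed_index(const int* data, int count) {
    assert(count >= 0);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += data[i];
    return total;
}

// Unsigned 32-bit index: writing a 32-bit register zero-extends into the
// full 64-bit register, so no separate extension instruction is needed.
int sum_unsigned_index(const int* data, unsigned count) {
    int total = 0;
    for (unsigned i = 0; i < count; ++i)
        total += data[i];
    return total;
}
```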

wgarvin · Post by **wgarvin** » Tue Oct 23, 2012 8:06 pm

abulmo wrote:
Did you actually compile that code? I ask because int is 32-bits on most 64-bit platforms, including MSVC/Intel/GCC on x64 under either Windows or Linux...
Pointers are 64 bits. Vincent Diepeveen's example uses an array of ints and thus does pointer arithmetic using 64 bits. While 64-bit pointers tend to slow down a program, the 64-bit ABI, which uses registers instead of the stack to pass parameters, usually makes the program much faster. Linux has a new (and somewhat experimental) 32-bit ABI (x32), similar to the 64-bit ABI but with 32-bit pointers; perhaps the best of both worlds?

Okay, I had to download some docs from Intel and actually skim through them to find out how this really works on x64.

I don't think any "pointer math" actually occurs, but the address calculations are done using 64-bit registers, so Vincent was actually correct that sign-extending 8- or 16-bit values is not completely free here. Funny how this kind of info is not easy to find with Google; I guess not many programmers worry much about this kind of detail (and if they do they probably have those manuals handy anyway).

I think there are basically two cases, both of which require the index (in this case a 32-bit int variable) to be sign-extended to 64 bits.
(1) if the base address of the array is at an immediate offset from some other 64-bit register such as RSP, RBP etc. (this is the case for any array on the stack) then the addressing would be something like [RSP+4*RAX+nnn]. Even for a global variable, if there is a suitable fixed address already in a register, (such as the address of some other global variable, or the address of the start of the module, or some global table... anything where the compiler knows the fixed offset between that thing and this global array variable) then method (1) can be used.
(2) otherwise, it has to form a 64-bit address for the array in a register first. It can load the 64-bit immediate with one instruction (this should never add latency, as it can be done anywhere earlier in the code). Then it is the same as method 1 but using this register as the base, i.e. something like [RDX+4*RAX].

In either case, the only thing you pay for (in latency) would be sign-extending a 32-bit "int" value into 64-bit RAX. Using "unsigned" instead of "int" would avoid that, since the zero-extension is actually free if you just write to EAX, i.e. something like MOVZX EAX,byte ptr [whatever].

So yeah. x86 can handle simple cases like this with its addressing modes (no actual instructions doing "pointer arithmetic") but a signed index does need to be sign-extended to 64 bits first, which is different from 32-bit x86 code. Bummer.

...On the other hand, if the compiler knows that accessing your array at negative indexes is "undefined behaviour", it might cleverly skip the sign-extension and effectively give you zero-extension of 32-bit to 64-bit instead. I'm not sure what array indexing situations are "undefined behaviour" but many of them are, and unfortunately modern compilers are starting to get more aggressive about taking advantage of stuff like that.

[Edit: to be clear, x64 has no addressing mode that mixes different-size operands in the address calculation; you can't add a 32-bit index register to a 64-bit base address for example. If you're calculating a 64-bit address, you have to use 64-bit register(s) for base/index. Immediates are 32 bits and are sign-extended to 64 bits for free, but registers used in the address calculation have to be sign-extended in advance unless the compiler can prove that using any negative value would be undefined behaviour. You are allowed to use different operand and address sizes; you can load a 64-bit value from a 32-bit address or a 32-bit value from a 64-bit address. You just can't mix sizes within the address calculation itself.]
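
A small sketch of the two language-level facts this relies on, with hypothetical function names. Unsigned arithmetic is defined to wrap, so the compiler has to preserve that wraparound, whereas overflowing a signed int is undefined behaviour, which is what gives the compiler room to skip the extension. Using a pointer-sized index type sidesteps the question, assuming std::size_t is 64 bits wide as it is on the usual x86-64 ABIs:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 32-bit unsigned arithmetic wraps modulo 2^32 by definition, so the
// successor of the largest value is 0.
std::uint32_t next_index(std::uint32_t i) {
    return static_cast<std::uint32_t>(i + 1u);
}

// std::size_t is pointer-sized on typical x86-64 ABIs (an assumption, not a
// guarantee of the language), so the index can be used in the address
// calculation without any widening step.
long long sum_wide_index(const int* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    long long total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += data[i];
    return total;
}
```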

diep · Post by **diep** » Wed Oct 24, 2012 12:59 pm

Wylie, you shouldn't make yourself any illusions about huge companies that introduce a new concept they try to sell and then googling about its disadvantages. You'll get some weirdo websites then instead of what you want most of the time.

There are some extreme examples of that.

Just google on how useful solar panels are for example and how useful windmills are. That's by far best example.

Actually, on the market electricity companies have to PAY to operate windmills, as you have to pay to deliver energy when the market already has enough, and they have to pay BIGTIME on all the occasions it ain't producing. Thrown away money it is - besides, such windmills get produced in China and work just 10-15 years, versus a coal plant over here - they just closed it. It operated nonstop for 100 years.

Now i don't want a discussion on what political decision you prefer there - i just give the bare financial facts versus if you google you'll find total other facts.

Try to google on side effects of medicines from WITHIN hospitals; so the same terminal your doctor is using.

In Netherlands i stumbled upon a PROBLEM there... ...all the side effects were reported in English, so the Dutch hospitals, all with Dutch settings, are NOT finding *any* of those pages with side effects. The original pharmaceuticals good weather story is the only thing you'll find.

This in contradiction to when i later on tried the same query at home - yet my laptop has American settings...

This is always the case when trying to use google when big cash gets made - and in this case we speak about a rather 'innocent' disadvantage, as i don't think they sell less than when it would be with big capitals on the net, yet something that really slows down well written code.

The internet is of LIMITED usage in such cases and if you THINK about it you will realize WHY.

C is simply guaranteeing that 'integer' is the fastest datatype, and in x64 this simply is no longer the case, which is a slowdown of well written software.

wgarvin · Post by **wgarvin** » Wed Oct 24, 2012 8:49 pm

diep wrote:Wylie, you shouldn't make yourself any illusions about huge companies that introduce a new concept they try to sell and then googling about its disadvantages. You'll get some weirdo websites then instead of what you want most of the time.

There are some extreme examples of that.

That's not quite what I meant. I meant that a few years ago I used to be able to google for things like "x64 sign extension movsxd" and get a bunch of forum threads where assembly enthusiasts or compiler writers or whatever were discussing the topic in some useful detail. And sometimes that's still the case. But in this case, I didn't find much useful info at all with a few minutes of google searching. I think maybe hand-rolled assembly is not as popular nowadays as it was even ten years ago.

I was just surprised that there's no detailed discussion somewhere from sometime last year, about sign extension in x64, already indexed by google.

Everyone is trusting the compiler to do great optimizations on their code, but the source you feed to it can make a big difference to the quality of the final result. We can't be the only community of programmers who wants to know details like this.

wgarvin · Post by **wgarvin** » Wed Oct 24, 2012 9:03 pm

mar wrote:
wgarvin wrote:Be wary of constructors that initialize everything in a performance-critical struct (e.g. something you'll allocate a vector of thousands of, should have a do-nothing ctor or just a default ctor).
I agree that it's generally a very bad idea to use a vector of complex classes, but I guess no one really does that anyway. I prefer using a vector of pointers in that case as it allows random access.
Also, most (all?) STL vector implementations use pointers (begin, end and capacity), which is good for iterators but bad for size() (it involves a division by the element size; sure, the compiler can optimize that into a multiplication/shift, but still), so iterating a vector the "natural" way (i=0..<size) should be slower than using iterators (unless the compiler does a magnificent job).

I don't know how off-the-shelf STL implementations do it, but ours doesn't do that. We store pointer and size. So begin() returns the pointer, and end() has to compute something like pointer+size*sizeof(T). And size() just returns the size, so using it in a loop condition is probably not bad (but in our implementation it's a bitfield, so my habit is to copy it to a stack local var and use that var in the loop condition). I don't usually modify a container's contents while iterating over it; obviously you have to be careful with the cached size if you do that.
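
For illustration, a pointer-plus-size layout along these lines could look roughly like the sketch below. The class and function names are hypothetical and it is a non-owning view rather than a full container; it is not the actual in-house implementation described above (for one thing, the size here is a plain member, not a bitfield):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning array that stores a pointer and an element count.
template <typename T>
class PtrSizeArray {
public:
    PtrSizeArray(T* data, std::size_t size) : data_(data), size_(size) {}

    T* begin() const { return data_; }
    // Pointer arithmetic scales by sizeof(T), i.e. data + size * sizeof(T) bytes.
    T* end() const { return data_ + size_; }
    // A plain load: no pointer subtraction or division is involved.
    std::size_t size() const { return size_; }

    T& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

private:
    T* data_;
    std::size_t size_;
};

// Copies size() into a local once and uses that in the loop condition.
// Only valid as long as the container is not resized inside the loop.
long long sum_cached_size(const PtrSizeArray<int>& a) {
    long long total = 0;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}
```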

ilari · Post by **ilari** » Wed Oct 24, 2012 11:13 pm

wgarvin wrote:
ilari wrote:C++ should not be thought of as better C - it's a different language and it should be used differently. Fruit is a good example of how to use C++ without really using C++, and Stockfish is a great example of how C++ should be used.
I completely disagree with this. C++ is a horrible language. If you have to write a high-level program and you can't just use something like Python, then I guess C++ can get the job done. But for low-level code, most of its language features are a double-edged sword, making it easy to bleed away the performance advantage of C without even noticing.

I think C++ is a very good tool when it's used by people who know what they're doing. It has served us quite well for Cute Chess, where we need a good combination of low-level control and high-level features. The bottlenecks were easy to identify and thanks to C++ they were also easy to optimize.

If you want to write "nice C++" then go ahead, but writing "proper C++ code" with lots of RAII, virtual functions, default constructors that initialize everything to zero or NULL, etc. is a good way to get slow programs. Plus nobody, and I mean nobody, can reliably write exception-safe C++ code that does anything tricky. Exception-safety in C++ is ridiculous and difficult. If you care about having correct programs, a much safer way to write them is to turn exceptions off and write all of your code without them!

You don't have to use every feature of C++, that's not what I was trying to say. For example we don't use exceptions at all. But we do use a lot of RAII, polymorphism, generic containers, Qt's unicode strings, etc. And in the very few places where we needed to minimize CPU or memory consumption, we did it with very C-like code. Rejecting C++'s high level features completely wouldn't have made our program more efficient in any noticeable way.

Anyway, for a programmer familiar with C it is very easy to use C++ as just a "slightly more convenient C". You can put little inline accessor methods in your structs, you can declare your variables in the middle of a function, and you can use a little bit of RAII style for some convenient things (such as profiling, or making sure a block of memory gets freed when exiting from a function that has a complex control flow). Sometimes it might make sense to use C++ templates a little bit, to reduce source code duplication (you'll still have near-duplicated code in the binary, but sometimes that is what you want, and templates might be less ugly than macros for your specific use-case.. it varies). For some kinds of code, std::vector is a much simpler way to get obviously-correct code than some manual malloc/free would be. Another pure win is std::sort, you can use it on ordinary arrays or std::vector or whatever else, and it will usually be faster than qsort() or anything else.

So you do advocate using C++'s features and even STL where they make sense. That was exactly my argument. There's no good reason why Fruit couldn't use std::string instead of char* for example. And I doubt you could make Stockfish much faster by converting the code to the kind of "better C" used in Fruit. BTW, you can declare variables in the middle of a function in C, unless you're using some ancient C89 compiler.
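
To make the "slightly more convenient C" style a bit more concrete, here is a minimal sketch combining two of the conveniences mentioned: a small RAII guard that frees a malloc'ed block on every exit path, and std::sort applied to a plain array. The names MallocGuard and median_copy are invented for this example and assume C++11:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical RAII guard: frees the block when the guard goes out of scope.
class MallocGuard {
public:
    explicit MallocGuard(void* p) : p_(p) {}
    ~MallocGuard() { std::free(p_); }
    MallocGuard(const MallocGuard&) = delete;
    MallocGuard& operator=(const MallocGuard&) = delete;

private:
    void* p_;
};

// Writes the median (upper median for an even count) of `values` to *out.
// Returns false if there is nothing to sort or the allocation fails.
bool median_copy(const int* values, std::size_t count, int* out) {
    assert(out != nullptr);
    if (count == 0)
        return false;
    int* scratch = static_cast<int*>(std::malloc(count * sizeof(int)));
    if (scratch == nullptr)
        return false;
    MallocGuard guard(scratch);           // freed on every return below
    std::copy(values, values + count, scratch);
    std::sort(scratch, scratch + count);  // works directly on a raw array
    *out = scratch[count / 2];
    return true;
}
```

The guard is the only C++-specific machinery here; the rest would look much the same in C with qsort() and an explicit free() before each return.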

A big benefit of using C++ as a "better C" is that performance-wise, it doesn't cost any more than the equivalent C code. Be wary of constructors that initialize everything in a performance-critical struct (e.g. something you'll allocate a vector of thousands of, should have a do-nothing ctor or just a default ctor). Don't use STL (other than for convenience in non-performance-critical code), don't ever use the iostreams garbage, turn off RTTI and don't ever use dynamic_cast, and DISABLE EXCEPTIONS at least for your 32-bit builds. (In 64-bit builds the performance cost of having them enabled is pretty much zero, just a tiny bit of lookup-table bloat). Don't even use virtual methods unless you would have used some sort of dispatch anyways in your C code (but generally you should try not to use them in any performance-sensitive code). C++ is just as fast as C if you don't use any of the language features that have some extra cost. If you stick to "C subset" plus some conveniences like inline methods, then you can't really go wrong!

I don't really like this "be wary of this and that" mentality of premature low level optimization. I think it's better to just first write the code without worrying too much about low level performance and then profile the code. After profiling the bottlenecks can be optimized.
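
For reference, the constructor advice quoted above amounts to something like the following sketch (struct and function names are hypothetical). Whether the zeroing constructor costs anything measurable in a given program is exactly the kind of question a profiler should answer:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Constructor that initializes every member: each element of a large vector
// is written once on construction, even if it is overwritten right after.
struct ZeroedEntry {
    ZeroedEntry() : key(0), score(0) {}
    unsigned long long key;
    int score;
};

// User-provided empty constructor: members are left uninitialized, so they
// must be written before they are read.
struct RawEntry {
    RawEntry() {}
    unsigned long long key;
    int score;
};

// Builds a table and fills every entry, so no uninitialized member is read.
template <typename Entry>
std::vector<Entry> make_table(std::size_t count) {
    std::vector<Entry> table(count);
    assert(table.size() == count);
    for (std::size_t i = 0; i < count; ++i) {
        table[i].key = i;
        table[i].score = static_cast<int>(i) * 2;
    }
    return table;
}
```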

ilari wrote:And not all code has to be fast. In a typical desktop application only a very small percentage of the code is performance-critical.
This really varies with application domain. It's mostly true in computer chess, but (for example) mostly false in games. Modern games spend about 90% of their time spread across 50% of the code (millions of lines of it).

It's clear that we are developing completely different kinds of programs. I don't view modern hardcore computer games as typical desktop applications. But I know how crazy it is at companies like Naughty Dog where they're trying to squeeze every drop of performance out of very complex hardware using Assembly or whatever. I recently read an article about the development of Starcraft. The stuff about linked lists sounds exactly like the kind of hell that I will be avoiding like the plague.

diep · Post by **diep** » Thu Oct 25, 2012 3:08 am

wgarvin wrote:
diep wrote:Wylie, you shouldn't make yourself any illusions about huge companies that introduce a new concept they try to sell and then googling about its disadvantages. You'll get some weirdo websites then instead of what you want most of the time.

There are some extreme examples of that.
That's not quite what I meant. I meant that a few years ago I used to be able to google for things like "x64 sign extension movsxd" and get a bunch of forum threads where assembly enthusiasts or compiler writers or whatever were discussing the topic in some useful detail. And sometimes that's still the case. But in this case, I didn't find much useful info at all with a few minutes of google searching. I think maybe hand-rolled assembly is not as popular nowadays as it was even ten years ago.

I was just surprised that there's no detailed discussion somewhere from sometime last year, about sign extension in x64, already indexed by google. Everyone is trusting the compiler to do great optimizations on their code, but the source you feed to it can make a big difference to the quality of the final result. We can't be the only community of programmers who wants to know details like this.

google for latency benchmarks of HPC networks.

HPC has a bigger turnover today than 10 years ago. There is QDR InfiniBand now and FDR InfiniBand, and a few other standards are still alive.

There are 2 totally different manufacturers of InfiniBand: QLogic and Mellanox, with Mellanox being the largest.

Yet where are all those benchmarks? Only something these companies wrote themselves you can find - oh and dozens from 8 - 10 years ago or so.

All sorts of technical information has dried up simply. Don't ask me why - i don't have the answer.

This has for sure a DIFFERENT reason.

Another example. Post in the HPC mailing lists a simple yet technical question.

Like i posted one some months ago: "what's the best method to do memory migration using MPI?"

0 answers my friend.

As if you ask them to betray war secrets - that's how everything low level and HPC type gets treated now.

The guys posting most - just like in these forums - who are all these new guys?

Let me do however 2 guesses
a) many of those low level jobs and HPC jobs went to India and similar third-world nations

The number of Indian guys (the ones LIVING in India) who read this and other technical forums - you can count on 1 hand - most of them aren't answering your questions on those forums for 'hobby' is my guess...

b) the new approach in the USA that's been there for a few years now, where it's nearly impossible to easily talk to scientists in the USA; meanwhile for all sorts of jobs and contracts they had in the past - nowadays you need to have a US passport to carry out the job or something - that total distrust of everything that's outside the borders of the USA - it's the next big reason for this new wave of total technical darkness - if you ask me.

Who seeds black darkness gets black darkness back if you ask me.

Rebel · Post by **Rebel** » Thu Oct 25, 2012 2:08 pm

wgarvin wrote:
ilari wrote:C++ should not be thought of as better C - it's a different language and it should be used differently. Fruit is a good example of how to use C++ without really using C++, and Stockfish is a great example of how C++ should be used.
I completely disagree with this. C++ is a horrible language. If you have to write a high-level program and you can't just use something like Python, then I guess C++ can get the job done. But for low-level code, most of its language features are a double-edged sword, making it easy to bleed away the performance advantage of C without even noticing.

Thanks for the good post on this, Wylie. Obviously our ASM background plays a major role in our dislike of C++. ASM programmers are code freaks and hate to give up control to the compiler. And that's what C++ (or any other high-level language) is about: selling your programmer soul to another piece of software.

Nevertheless, I feel that in the end we are going to lose the battle: writing a chess engine in pure 64-bit ASM code (with all the new possibilities) might easily be counterproductive in terms of time investment versus a disappointing speed gain. Maybe not today, but in the long run compilers may well be able to outsmart the human mind.

Snipped all your good stuff.

Rebel · Post by **Rebel** » Thu Oct 25, 2012 2:39 pm

ilari wrote:I don't really like this "be wary of this and that" mentality of premature low level optimization. I think it's better to just first write the code without worrying too much about low level performance and then profile the code. After profiling the bottlenecks can be optimized.

Exactly my way of working. Whenever I want to add something speed-sensitive (say an eval term) I program it in C, using C in its simplest form. Then I test it, and when it is OK the piece of C code is disassembled, manually optimized and finally imported into the ASM part.
