C++ coding guidelines

Discussion of chess software programming and technical issues.

Moderators: hgm, Dann Corbit, Harvey Williamson

rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: C++ coding guidelines

Post by rbarreira »

ilari wrote:Google's C++ style guide is pretty good, and has good reasoning behind every rule.
Indeed it's a very good guide. But some of the rationale behind the rules makes you go WTF at C++ (not at the guide itself), for example:

"Overloading also has surprising ramifications. For instance, if a class overloads unary operator&, it cannot safely be forward-declared. "

Ouch.

I like that this guide also focuses a bit on making C++ more "what you see is what you get", with rules such as "All parameters passed by reference must be labeled const. "
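To make that last rule concrete, here is a minimal sketch (all names invented) of the convention: read-only inputs go by const reference, while anything the function may mutate goes by pointer, so the call site shows the mutation.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helpers illustrating the "const reference" rule quoted
// above: inputs the function only reads are const references, outputs
// are pointers.
int CountPieces(const std::string& fen) {   // input: const reference
    int n = 0;
    for (std::string::size_type i = 0; i < fen.size(); ++i)
        if (std::isalpha(static_cast<unsigned char>(fen[i]))) ++n;
    return n;
}

void ParsePieceCount(const std::string& fen, int* count) {  // output: pointer
    *count = CountPieces(fen);
}
```

At the call site, `ParsePieceCount(fen, &count)` makes the possible mutation of `count` visible, which is exactly the "what you see is what you get" property praised above.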
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: C++ coding guidelines

Post by diep »

rbarreira wrote:
ilari wrote:Google's C++ style guide is pretty good, and has good reasoning behind every rule.
Indeed it's a very good guide. But some of the rationale behind the rules makes you go WTF at C++ (not at the guide itself), for example:

"Overloading also has surprising ramifications. For instance, if a class overloads unary operator&, it cannot safely be forward-declared. "

Ouch.

I like that this guide also focuses a bit on making C++ more "what you see is what you get", with rules such as "All parameters passed by reference must be labeled const. "
It seems Google is very enthusiastic about using the underscore everywhere, including in member variable names.

So on small screens, where it's tough to see that underscore, you can neither write Google-style code nor read it well. And since half of the programmer population already has problems reading most screens, that's bad luck with Google.

In fact I remember once discovering that the underscore was not visible at all in the specific character set used by some Apple machines. Luckily that isn't a major problem in the Adobe code base.

But well, I guess Google ain't Apple :)

I'm not a big fan of underscores myself, though I SOMETIMES use them as well. Yet Google's style is totally based on using the underscore everywhere.

The restriction on variable names, basically disallowing n_xyz where xyz is a full name (since the n_ prefix alone is not allowed), I'm not a fan of either.

For example, in my own code, in local loops, so within say X lines of code, it should be no problem to use 'n' as a counter, or 'i' or 'x' or 'y', as that's totally trivial usage of such a variable.

Google's style guide doesn't allow this!

All in all they restrict things a lot, which is good in C++.

Yet see how huge their style guide has become; maybe they should drop C++ and strip it down until it's just a sort of C, allowing only basic classes.

Another surprise is disallowing multiple inheritance. Isn't that one of the MIRACLES of C++, praised so much in the 90s?

I seem to remember a whole lot of slides where C++ was promoted BECAUSE it has multiple inheritance.

One of the problems in Java, which doesn't have multiple inheritance, is that you start to create huge numbers of classes which all cooperate somehow and influence each other, so that only the very best Java architects can keep the code from simply getting too slow (besides the huge slowdown it has anyway because of all those classes).

Right now Google doesn't yet suffer from the problem of not hiring good people, but of course they are already starting to get this problem: as people from nations that are not exactly first-world nations move up in the ranks and start hiring personnel themselves, they just hire soldiers whom I'd never give a contract job coding up something critical...

It's a nerd's style guide; though better than most style guides I've read for C++, it really forces you to write tons of characters.

My only really big criticisms are: why disallow multiple inheritance, and why make the underscore this important?

The underscore really is a nerd thing... and it's totally different from the mnemonic-based style guides we've seen over the past 25 years.

Putting an underscore behind every member variable is a pathetic way of doing things IMHO. This could have been solved better.

For example, by always starting member variables with the letter 'm'.

Also note that to type the _ you first have to press shift with the left hand and then type the underscore with the right hand.

So this is NOT an ergonomic style guide at all.
This is an RSI-promoting style guide.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: C++ coding guidelines

Post by Sven »

abulmo wrote:
rreagan wrote:Even if the class is tiny (an int wrapper), it may make sense to have a class. For example, in my engine I have a class for Color, PieceType, Piece, Rank, File, Square, and so on. In most engines those are just ints, but there is an invariant to uphold. A Color should be one of three values (white, black, none). If a Color is anything else, I want to know about it, because it's probably a bug.
I do not like this approach. IMHO, it concentrates on wrong details. It is well known that the number of bugs a programmer emits is proportional to the length of the program. Wrapping everything into class will add a complexity that translates into more bugs in the program. Always checking the invariants will also make the program slower. All this is the way of programming bloated code.
If I didn't use classes to wrap these simple types, I would use enums like Stockfish.
enums is a better approach, although I think C++ is too pernickety about them.
I dislike those simple-type-wrapper classes, too. "Rank" and "File" (if meant as numbers and not as sets of squares) are best modelled by typedefs to some int type. The same applies to "Square", if that is meant as a number, too (I call it "SqrId" or "SquareId"). "Piece" sounds like an object with state (e.g. square id, color, piece type), so it is a possible class candidate, although I believe that modelling pieces as objects *might* tend toward a slightly inefficient design.

"Color" and "PieceType" are typical cases for an enum type. But a simple typedef together with a set of constant definitions serves the same purpose, only lacking a little bit of type safety. If the latter is desired then the Stockfish solution including a couple of simple operators is the way to go.

But type safety is one thing, it still does not protect a program against one of its biggest enemies: bugs in itself causing memory corruption. And to detect these you *need* to check invariants (assertions) in a DEBUG version (of course it would be too slow for a release version). In my view this is no "code bloating", it is essential for safe programming. The best programs are not necessarily the shortest ones but often those with fewest bugs and optimal readability.
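A minimal sketch of the typedef-plus-constants approach and the debug-only assertions described above (all names illustrative):

```cpp
#include <cassert>

// Plain typedef: zero overhead, but no type safety.
typedef int SquareId;                  // 0..63
const SquareId NO_SQUARE = -1;

// Enum: same runtime cost, a little more type safety.
enum Color { WHITE, BLACK, COLOR_NONE };

// The invariant lives in an assert(), so it is checked in the DEBUG
// build and compiled away in a release build built with -DNDEBUG.
inline int ColorIndex(Color c) {
    assert(c == WHITE || c == BLACK);  // debug-only invariant check
    return static_cast<int>(c);
}
```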

Sven
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: C++ coding guidelines

Post by diep »

Sven Schüle wrote:
abulmo wrote:
rreagan wrote:Even if the class is tiny (an int wrapper), it may make sense to have a class. For example, in my engine I have a class for Color, PieceType, Piece, Rank, File, Square, and so on. In most engines those are just ints, but there is an invariant to uphold. A Color should be one of three values (white, black, none). If a Color is anything else, I want to know about it, because it's probably a bug.
I do not like this approach. IMHO, it concentrates on wrong details. It is well known that the number of bugs a programmer emits is proportional to the length of the program. Wrapping everything into class will add a complexity that translates into more bugs in the program. Always checking the invariants will also make the program slower. All this is the way of programming bloated code.
If I didn't use classes to wrap these simple types, I would use enums like Stockfish.
enums is a better approach, although I think C++ is too pernickety about them.
I dislike those simple-type-wrapper classes, too. "Rank" and "File" (if meant as numbers and not as sets of squares) are best modelled by typedef's to some int type. The same applies to "Square", if that is meant as a number, too (I call it "SqrId" or "SquareId"). "Piece" sounds like an object with state (e.g. square id, color, piece type), so it is a possible class candidate, although I believe that modelling pieces as objects *might* tend to a slightly inefficient design.

"Color" and "PieceType" are typical cases for an enum type. But a simple typedef together with a set of constant definitions serves the same purpose, only lacking a little bit of type safety. If the latter is desired then the Stockfish solution including a couple of simple operators is the way to go.

But type safety is one thing, it still does not protect a program against one of its biggest enemies: bugs in itself causing memory corruption. And to detect these you *need* to check invariants (assertions) in a DEBUG version (of course it would be too slow for a release version). In my view this is no "code bloating", it is essential for safe programming. The best programs are not necessarily the shortest ones but often those with fewest bugs and optimal readability.

Sven
Where is the guarantee that enum types are 'unsigned int' rather than signed int, which is SLOWER in 64-bit mode?
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: C++ coding guidelines

Post by Sven »

diep wrote:Where is the guarantee that enum types are 'unsigned int' rather than signed int, which is SLOWER in 64-bit mode?
Nowhere, and you don't need such a guarantee. The compiler decides about the integral type it uses for an enum type depending on the enum values used by it.

Why should either unsigned int or signed int be "slower" than the other one on 64 bit systems? Do you have any hard proof for that thesis?
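For what it's worth, the standard indeed leaves the underlying type to the compiler, but since C++11 you can pin it down yourself. A small sketch (enumerator names invented):

```cpp
#include <type_traits>

// The standard only promises an integral type that can hold all the
// enumerators; whether it is signed is left to the implementation.
enum Loose { L_A = 0, L_B = 255 };        // underlying type: compiler's choice

// C++11 fixed underlying type: now it is guaranteed unsigned int.
enum Fixed : unsigned int { F_A = 0, F_B = 1 };

static_assert(std::is_unsigned<std::underlying_type<Fixed>::type>::value,
              "Fixed is guaranteed to be unsigned");
```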

Sven
rreagan
Posts: 102
Joined: Sun Sep 09, 2007 6:32 am

Re: C++ coding guidelines

Post by rreagan »

abulmo wrote:
rreagan wrote:Even if the class is tiny (an int wrapper), it may make sense to have a class. For example, in my engine I have a class for Color, PieceType, Piece, Rank, File, Square, and so on. In most engines those are just ints, but there is an invariant to uphold. A Color should be one of three values (white, black, none). If a Color is anything else, I want to know about it, because it's probably a bug.
I do not like this approach. IMHO, it concentrates on wrong details. It is well known that the number of bugs a programmer emits is proportional to the length of the program. Wrapping everything into class will add a complexity that translates into more bugs in the program.
I doubt there will be bugs introduced because of the added lines containing "class", "public", and "private". I have some member functions for converting to and from strings, which I would also have without classes. There is not much extra code to use a class wrapper.

It's also trivial in C++ to write a template class that will handle all such "fruit list" types that are basically int-wrappers. If code bloat were a real problem here (it's not), I could take care of all of these with a single template class, which would cut the lines of code quite a bit. Even without a template class, the extra code is minimal, and it seems clear to me that it will result in more solid code, preventing far more bugs than the additional boilerplate code introduces.
Always checking the invariants will also make the program slower. All this is the way of programming bloated code.
I have two kinds of checks. One is a typical ASSERT macro, and another is a TEST macro. The ASSERT macro runs as you would expect, only in the debug build. The TEST macro runs at all times, but only during program startup. There is basically a sanity test of several thousand TEST statements that runs every time the program starts.

I would also argue that it decreases the total lines of code, because the ASSERT macros all live inside the classes, so there is little need for them in the calling code. The class instance checks its values before it returns them. If I had assert macros in the calling code, as you would be forced to do if you did not use classes, you would have an order of magnitude more assert statements, and as you pointed out, that is sure to lead to more bugs with such a large increase in lines of code. Or I'm likely to forget to add all of the correct assert statements in some calling code here or there, and that bug goes undetected.

It's a little extra work to do it this way, and it takes a little longer to get things off the ground, but IMO it's well worth it later on.

In any case, you at least must use the enum approach and not just a typedef. A typedef doesn't add any type safety, which is sure to let bugs in. And you need to place the enum inside a namespace or struct, to avoid polluting the global namespace with your functions. And by the time you add that, you only need to add "public" and "private" and you have a class :P
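The template idea mentioned above might look something like this rough sketch (all names hypothetical):

```cpp
#include <cassert>

// One template handles every "fruit list" int-wrapper type. The Tag
// type gives each wrapper its own identity, so a Color cannot be
// passed where a Rank is expected, and the invariant check lives
// inside the class instead of at every call site.
template <typename Tag, int Max>
class Wrapped {
public:
    explicit Wrapped(int v) : v_(v) { assert(v >= 0 && v < Max); }
    int value() const { assert(v_ >= 0 && v_ < Max); return v_; }
private:
    int v_;
};

struct ColorTag {};
struct RankTag {};
typedef Wrapped<ColorTag, 3> Color;   // white, black, none
typedef Wrapped<RankTag, 8> Rank;     // ranks 1..8 stored as 0..7
```

Since the constructor is explicit, `Color c(1)` compiles but `Rank r = c` does not, which is the type safety being argued for.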
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: C++ coding guidelines

Post by diep »

Sven Schüle wrote:
diep wrote:Where is the guarantee that enum types are 'unsigned int' rather than signed int, which is SLOWER in 64-bit mode?
Nowhere, and you don't need such a guarantee. The compiler decides about the integral type it uses for an enum type depending on the enum values used by it.

Why should either unsigned int or signed int be "slower" than the other one on 64 bit systems? Do you have any hard proof for that thesis?

Sven
int x = SQUARE_A5, array[25],y;

y = array[x];

In 32-bit mode this is faster than in 64-bit mode.

In 64-bit mode the 'signed' x must get sign-extended first to index into the array: the bit pattern 0xffffffff means -1 as a 32-bit int, yet taken as a plain 64-bit index it would be 2^32 - 1, so the compiler has no choice but to generate a sign-extension instruction.

So that's why most code is slower in 64-bit than in 32-bit mode if it isn't using 64-bit data structures. Note that Nalimov had predicted this, but hadn't given the explanation why.

It is very bad that this is the case, as I had standardized everything on 'int' in the 90s.

So data structures that used to be 'unsigned char', I standardized them all on 'int', as int is supposed to be fastest.

Yet on x64 it isn't.

Now the easy way out is to use 'unsigned int' everywhere possible.

Yet then we stumble upon the problem that:

int x = 5;
unsigned int y = 5;

if( x == y ) // fine for 5 == 5, but x gets converted to unsigned, which goes wrong for negative x

Which is why I had standardized everything on 'int' in the first place, to avoid mixing types.
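The trap with mixed signed/unsigned comparisons shows up once the signed side goes negative: the usual arithmetic conversions turn it into a huge unsigned value. A small sketch (hypothetical helper):

```cpp
#include <cassert>

// In x < y the signed operand is converted to unsigned before the
// comparison, so a negative x becomes a huge positive value.
inline bool MixedLess(int x, unsigned int y) {
    return x < y;   // -1 becomes 0xFFFFFFFF before the compare
}
```

Here `MixedLess(-1, 2u)` is false, which is exactly the kind of surprise that makes standardizing on one type attractive.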

Now some will say: "well, why not use 64-bit integers everywhere?"

OK, that has problems if you still want to compile for 32 bits, but it also means a huge increase in the size of your program, and with today's tiny L1 caches, processors simply aren't ready for it.

In fact the i7 is even less ready for that, as it hosts 2 threads on its tiny L1 cache.

As for the x64 problems, ask Gerd for further details here.

Note that another problem of processors is that they are totally optimized for using just a few registers; meanwhile they have plenty of rename registers to hide the shortage of registers in 32-bit mode, so the advantage of having more registers in 64-bit mode gets totally voided by that.

Though it would be interesting to see benchmarks there for more recent processors, like the i7 and newer i7s.

So in itself, if you port code in the normal manner, using the default 'int', it's slower in 64-bit mode.

That said, realize that in NORMAL software you simply cannot mix types too much. That's asking for trouble. So you HAVE to standardize on what the standard says is fastest, which is simply the 32-bit integer.

So I have no 100% plan yet for how to solve this for Diep, as I really dislike mixing signed with unsigned.

One of the ideas I'm playing with is to make Diep's search and evaluation 100% 'unsigned int' everywhere. Also in the evaluation.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: C++ coding guidelines

Post by wgarvin »

ilari wrote:C++ should not be thought of as better C - it's a different language and it should be used differently. Fruit is a good example of how to use C++ without really using C++, and Stockfish is a great example of how C++ should be used.
I completely disagree with this. :lol: C++ is a horrible language. If you have to write a high-level program and you can't just use something like Python, then I guess C++ can get the job done. But for low-level code, most of its language features are a double-edged sword, making it easy to bleed away the performance advantage of C without even noticing.

If you want to write "nice C++" then go ahead, but writing "proper C++ code" with lots of RAII, virtual functions, default constructors that initialize everything to zero or NULL, etc. is a good way to get slow programs. Plus nobody, and I mean nobody, can reliably write exception-safe C++ code that does anything tricky. Exception-safety in C++ is ridiculous and difficult. If you care about having correct programs, a much safer way to write them is to turn exceptions off and write all of your code without them!

Anyway, for a programmer familiar with C it is very easy to use C++ as just a "slightly more convenient C". You can put little inline accessor methods in your structs, you can declare your variables in the middle of a function, and you can use a little bit of RAII style for some convenient things (such as profiling, or making sure a block of memory gets freed when exiting from a function that has a complex control flow). Sometimes it might make sense to use C++ templates a little bit, to reduce source code duplication (you'll still have near-duplicated code in the binary, but sometimes that is what you want, and templates might be less ugly than macros for your specific use-case.. it varies). For some kinds of code, std::vector is a much simpler way to get obviously-correct code than some manual malloc/free would be. Another pure win is std::sort, you can use it on ordinary arrays or std::vector or whatever else, and it will usually be faster than qsort() or anything else.

A big benefit of using C++ as a "better C" is that performance-wise, it doesn't cost any more than the equivalent C code. Be wary of constructors that initialize everything in a performance-critical struct (e.g. something you'll allocate a vector of thousands of, should have a do-nothing ctor or just a default ctor). Don't use STL (other than for convenience in non-performance-critical code), don't ever use the iostreams garbage, turn off RTTI and don't ever use dynamic_cast, and DISABLE EXCEPTIONS at least for your 32-bit builds. (In 64-bit builds the performance cost of having them enabled is pretty much zero, just a tiny bit of lookup-table bloat). Don't even use virtual methods unless you would have used some sort of dispatch anyway in your C code (but generally you should try not to use them in any performance-sensitive code). C++ is just as fast as C if you don't use any of the language features that have some extra cost. If you stick to "C subset" plus some conveniences like inline methods, then you can't really go wrong!
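The RAII-for-cleanup use mentioned above can be as small as this sketch (pre-C++11 style, since that's the flavor under discussion; names invented):

```cpp
#include <cstdlib>

// The malloc'd block is freed on every exit path, however tangled
// the control flow of the function gets.
class ScopedFree {
public:
    explicit ScopedFree(void* p) : p_(p) {}
    ~ScopedFree() { std::free(p_); }   // free(NULL) is a safe no-op
private:
    void* p_;
    ScopedFree(const ScopedFree&);             // non-copyable
    ScopedFree& operator=(const ScopedFree&);
};

int FillAndReadLast(int n) {
    int* buf = static_cast<int*>(std::malloc(n * sizeof(int)));
    ScopedFree guard(buf);            // freed no matter which return runs
    if (buf == 0) return -1;
    for (int i = 0; i < n; ++i) buf[i] = i;
    if (n > 100) return 0;            // early exit: guard still frees buf
    return buf[n - 1];
}
```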
ilari wrote:And not all code has to be fast. In a typical desktop application only a very small percentage of the code is performance-critical.
This really varies with application domain. It's mostly true in computer chess, but (for example) mostly false in games. Modern games spend about 90% of their time spread across 50% of the code (millions of lines of it). Megabytes of code get run every frame (30 or 60 times per second). So in games, we try our best not to write stupid-slow code anywhere, because it all adds up and you get a sort of "performance death by a thousand cuts". Example: we use our own container classes with the same API as the STL containers such as std::vector. However, they are implemented differently with the goal of minimizing generated code size, because when you use thousands of different types with std::vector, off-the-shelf STL implementations generate literally megabytes of unnecessary code bloat, which makes the program run slower (due to paging, TLB misses, icache thrashing and so on). For specific use-cases like that, using some general-purpose library like STL is a good way to waste memory and CPU cycles. :D

Anyway, for computer chess, there are certainly parts of the program where performance is irrelevant (winboard/uci protocol stuff, top-level decision making etc.) but the "major" parts of the program such as evaluation and search do need to be fast. Move generation seems to be slightly less important, but to have a fast program you probably want that to be fast too. Code carefully, and understand the exact semantics of the code you write! I suggest not using a C++ feature unless you understand its performance consequences pretty well. Which really means, if you want to use a feature, try it out and investigate the performance consequences: is it slower than straight C? How much? Look at the generated assembly code. Is the compiler doing something stupid? A lot of people rely on the optimizer to just magically fix their code so it is fast. But there are two problems with that: (1) optimizers are really kind of dumb, plus the compiler implementer had to pick and choose what stuff was likely to be worth the CPU cycles at compile time, so even if they theoretically could do a certain type of optimization, that doesn't mean they actually do it. (2) sometimes language rules actually prevent the compiler from doing the "obvious" optimization. E.g. the possibility of pointer aliasing might prevent it from moving some loads and stores around. Even if you the programmer know that there is no aliasing, the compiler often has no way to know for sure, so it ends up doing the "safe" thing.
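A tiny illustration of the aliasing point (hypothetical function): if the compiler cannot prove the two pointers never overlap, it must reload through one of them on every iteration; when you the programmer know they never alias, hoisting the load into a local makes that knowledge explicit.

```cpp
// If 'out' and 'in' might alias, the compiler must assume each store
// to out[i] could change *in, and reload it on every iteration.
// Copying *in into a local first gives it one load instead of n.
// This transformation is only valid because the caller guarantees
// the two regions never overlap.
void Scale(int* out, const int* in, int n) {
    const int factor = *in;    // hoisted load
    for (int i = 0; i < n; ++i)
        out[i] *= factor;
}
```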
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: C++ coding guidelines

Post by wgarvin »

diep wrote:
Sven Schüle wrote:
diep wrote:Where is the guarantee that enum types are 'unsigned int' rather than signed int, which is SLOWER in 64-bit mode?
Nowhere, and you don't need such a guarantee. The compiler decides about the integral type it uses for an enum type depending on the enum values used by it.

Why should either unsigned int or signed int be "slower" than the other one on 64 bit systems? Do you have any hard proof for that thesis?

Sven
int x = SQUARE_A5, array[25],y;

y = array[x];

In 32-bit mode this is faster than in 64-bit mode.

In 64-bit mode the 'signed' x must get sign-extended first to index into the array: the bit pattern 0xffffffff means -1 as a 32-bit int, yet taken as a plain 64-bit index it would be 2^32 - 1, so the compiler has no choice but to generate a sign-extension instruction.
Did you actually compile that code? I ask because int is 32 bits on most 64-bit platforms, including MSVC/Intel/GCC on x64 under either Windows or Linux... any of those compilers will probably use MOVSX to load the byte and sign-extend it to 32 bits (which costs no more than a 32-bit or 64-bit load would). The performance will be the same whether it's signed or unsigned.

If you used a 64-bit type though (or int is 64 bits on your compiler) then it's true the sign extension will add a tiny bit of latency there (probably MOVSXD, one extra micro-op to sign-extend the 32-bit value into 64 bits). But that's part of the point of the "int" type: it's 32 bits on these compilers for a reason, to avoid costs like this in the 95% of code that doesn't need full 64-bit values. :lol:

Also, for any array larger than that example one, even a small cost to sign-extend is easily outweighed by the benefit of fitting them in fewer cache lines. Even with large arrays and random access, using smaller types is probably pure win (because of fewer L2 misses, or fewer TLB misses... etc).

However, I think the general point Vincent was really making was "be careful with your types", because mixing them can sometimes cause the compiler to insert unnecessary extra instructions. Don't just assume the compiler is smart enough to optimize them out: sometimes it can't. Sometimes you have to help it, by being clearer about what code you want it to generate (in this case, don't mix different sizes of types in your source code... read small values from an array into larger-sized local variables of type "int" or "int32_t" or whatever, and do all math on those).
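A sketch of that last suggestion: keep the table narrow so more entries fit per cache line, but widen into plain int locals before doing the arithmetic (values and names invented):

```cpp
// The table stays unsigned char to save cache; each element is read
// with one widening load (e.g. MOVZX) into an int local, and all the
// math then happens on ints, with no mixed-size operands.
static const unsigned char piece_value[4] = { 0, 1, 3, 9 };

inline int ScoreDiff(int a, int b) {
    int va = piece_value[a];   // widen once, up front
    int vb = piece_value[b];
    return va - vb;            // int-only arithmetic
}
```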
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: C++ coding guidelines

Post by abulmo »

wgarvin wrote:Did you actually compile that code? I ask because int is 32 bits on most 64-bit platforms, including MSVC/Intel/GCC on x64 under either Windows or Linux...
Pointers are 64 bits. Vincent Diepeveen's example uses an array of ints and thus does pointer arithmetic in 64 bits. If 64-bit pointers tend to slow a program down, on the other hand the 64-bit ABI, which passes parameters in registers instead of on the stack, usually makes the program much faster. Linux has a new (and somewhat experimental) 32-bit ABI (x32), similar to the 64-bit ABI but with 32-bit pointers; perhaps the best of both worlds?
Richard