Coding Guidelines

Introduction

These are a set of basic rules that I developed at a previous company to minimise incompatibilities between the interfaces of different people's code by adopting a standard set of coding rules and guidelines. They are intended to be a minimum set, since I don't see the point in dictating silly things like code layout.

1. Use the STL

Basic data structures should be implemented using the STL template classes wherever they are appropriate. Most of these are Abstract Data Types (ADTs) implemented using templates but there are other less-used headers too. You should not write your own list structures, sets etc.

String processing should use the STL string type in preference to char*. Indeed, char* should be considered obsolete.

The STL provides:

string      dynamic string type
vector      dynamic array ADT
list        basic list ADT
deque       double-ended queue ADT
map         associative storage ADT
multimap    as map ADT but with multiple entries per key
set         associative storage ADT
multiset    as set ADT but with multiple entries per key
algorithm   sorts, searches etc. acting on ADTs
functional  predicates etc. acting on ADTs
iostream    I/O system - but see notes below on I/O
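As a small illustration of how these headers work together, the sketch below counts words with std::map and then orders them with an algorithm acting on a std::vector. The function names count_words, longer and longest_first are made up for this example.

```cpp
#include <algorithm>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: count how often each whitespace-separated word occurs.
std::map<std::string, int> count_words(const std::string& text)
{
  std::map<std::string, int> counts;   // associative storage: word -> occurrences
  std::istringstream input(text);
  std::string word;                    // grows to fit each word, no buffer to size
  while (input >> word) ++counts[word];
  return counts;
}

// Ordering predicate: true if left is strictly longer than right.
static bool longer(const std::string& left, const std::string& right)
{
  return left.size() > right.size();
}

// Hypothetical helper: return the distinct words, longest first.
// Words of equal length keep the alphabetical order they had in the map.
std::vector<std::string> longest_first(const std::map<std::string, int>& counts)
{
  std::vector<std::string> words;
  for (std::map<std::string, int>::const_iterator i = counts.begin(); i != counts.end(); ++i)
    words.push_back(i->first);
  std::stable_sort(words.begin(), words.end(), longer);
  return words;
}
```

No list, tree or string buffer had to be written by hand here; the containers manage their own memory.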

There are a number of books around on using the STL. However, I can't recommend one because I haven't found a good one yet. A reasonably-useful online guide is to be found at SGI's website.

2. Use STLplus

The STLplus library has three objectives: it extends the STL by providing extra template classes; it deals with portability issues as discussed in the section "Make your code portable"; and it provides a lot of utilities which you will find useful.

As this documentation is now part of the STLplus documentation, I need say no more!

3. I/O System

In C++ you have a choice between three I/O systems (unistd from POSIX, stdio from C and iostream from C++). Mixing them leads to incompatible interfaces, so it is good practice to standardise on one.

However, the argument is complicated by the fact that I believe a fourth choice is the best: the STLplus TextIO system. This is similar to the C++ iostream classes and has all their advantages, but with some other major advantages too. All the components in STLplus that use I/O or provide printing facilities do so through TextIO, so standardising on this package ensures consistency not only between different people's code, but with STLplus itself.

If you've used iostream, then you already know how to use most of TextIO. The online documentation will fill in the rest. The documentation also explains why I wrote TextIO to replace iostream.

There is also support in the STLplus library for a binary dump format. See the persistence functions for details.

4. Modularise

The recommendation is to have one subsystem declared per header. A subsystem may be a class, possibly with sub-classes declared in the same header. Or it could be a collection of closely-related functions. For that matter it could be one function. The header file should have the same name as the subsystem (no naff abbreviations, you're not limited to 8-letter filenames any more and haven't been for decades) with the extension .hpp for C++ headers and .h for C-only headers.

Source code should be contained in a file with the same name as the header but with a .cpp extension for C++ and a .c extension for C-only.

I recommend not putting template implementations in headers since headers also need to be human-readable. It is amazing how many headers are not human-readable and I consider this incompetence. However, template implementations must be included in the same way as headers, so they should not be put in .cpp files. My solution to this is to have a third file type with a .tpp extension which contains template implementations. This I #include at the end of the .hpp file that declares the templates so that any code calling a template will have access to both declaration and implementation.
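The .hpp/.tpp split might look roughly like the sketch below. The file names and the swap_values template are invented for illustration, and the contents of the .tpp file are shown inline (where the #include would be) so that the example compiles as a single unit.

```cpp
// ---- swap_values.hpp (hypothetical file name) ----
#ifndef SWAP_VALUES_HPP
#define SWAP_VALUES_HPP

// Declaration only: this is the part a human reads.
// Exchanges the values held in a and b.
template<typename T>
void swap_values(T& a, T& b);

// In the real header the next line would pull in the implementation:
//   #include "swap_values.tpp"
// The contents of that file follow inline for the purposes of this sketch.

// ---- swap_values.tpp (hypothetical file name) ----
template<typename T>
void swap_values(T& a, T& b)
{
  T temp = a;
  a = b;
  b = temp;
}

// ---- back in swap_values.hpp: the sentinel closes on the very last line ----
#endif
```

Anyone who includes the .hpp gets both the declaration and the implementation, but only needs to read the short declaration part.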

5. #Include Rules

Include only the minimum set of headers in a header file needed to make all the types used in the header available. Any additional headers needed in the C++ body should be included in the body file. This minimises the number of includes that someone including your header will inherit from you and is considered friendly.

Use a sentinel within each header so that the includes in a file become order independent. A sentinel puts a pre-processor conditional around the whole header file which means that, no matter how many times it is included, the contents will only be included once. At the very start of the file (I mean lines 1 and 2), for a header called my_stuff.h, the sentinel would look like this:

#ifndef MY_STUFF_H
#define MY_STUFF_H

and at the very end of the file (and I mean the very last line):

#endif

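The name of the sentinel here is created by uppercasing the filename and changing the dot to an underscore. Some people add a double leading underscore to the name. This is common, but be aware that the C++ standard reserves names containing a double underscore for the compiler and its libraries, so the plain form above is the safer choice. The aim is to ensure that all sentinel names are different. The second style is: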

#ifndef __MY_STUFF_H
#define __MY_STUFF_H

Finally, never include the "using namespace std" clause in a header. All STL classes referred to in the header should have the std:: namespace prefix added - for example, string should be referred to as std::string within headers. The reason for this rule is that it is considered unfriendly to people who may wish to include your header in their code to dump all of the std namespace into their code against their wishes, which is what the using... clause does.
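Putting the rules of this section together, a complete header might look like the sketch below. It assumes a file called my_stuff.hpp; the class and its members are invented for illustration.

```cpp
#ifndef MY_STUFF_HPP
#define MY_STUFF_HPP

#include <string>   // the only header the declarations below actually need

// No "using namespace std" here: every standard name carries its std:: prefix.
class my_stuff
{
public:
  explicit my_stuff(const std::string& name) : m_name(name) {}
  const std::string& name() const { return m_name; }
private:
  std::string m_name;
};

#endif
```

Including this header twice, or in any order relative to other headers, has the same effect as including it once, and nothing from std is injected into the includer's namespace.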

In body files, you are free to do what you like with namespaces, since no-one includes body files. By the way, the preferred way of including a C++ system header is:

#include <string>

Note the lack of an extension.

Also note that for C system headers, there are two forms. The normal form still works:

#include <stdlib.h>

This is just as in C and makes the stdlib functions and types available. However, you can drop the ".h" and add a "c" prefix, which places the header's contents in the std:: namespace:

#include <cstdlib>

If you now add the "using namespace std" then you're back to where you started, but you could alternatively refer to the contents of stdlib with the std:: prefix.
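As a brief sketch of the second option, the function below reaches the C library's strtol through the std:: prefix. The wrapper name to_long is made up for this example.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: convert decimal text to a long.
// strtol comes from the C library, reached here through the std:: namespace.
long to_long(const std::string& text)
{
  return std::strtol(text.c_str(), 0, 10);   // 0: no end pointer wanted; 10: decimal
}
```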

6. Exceptions

Exceptions should only be used for error conditions. They should not be part of the normal execution path of a program. It may seem that an exception can be used to return a value of a different type from the declared return type of a function, but this is extremely bad practice since it obfuscates your code. It also has performance implications, because compiler writers are under absolutely no obligation whatsoever to make exception handling fast. Indeed, there is an unwritten rule that code optimisation should focus on speeding up the normal operation of a program, not the erroneous operation, so the implementation of exception handling is usually designed to minimise impact on normal operation.

Mere user errors or input errors should be indicated by returning an appropriate value from a function, setting a flag, dropping out of a loop or other 'normal' C++ operations. Only program failures should be handled by exceptions.
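The distinction can be sketched as follows. Both functions are invented for illustration: the first treats bad input as an ordinary outcome and reports it through its return value, while the second treats an out-of-range index as a fault in the calling code and throws.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// User input error: reported through the return value, not an exception.
// Returns true and sets result only if text holds a whole number and nothing else.
bool parse_int(const std::string& text, int& result)
{
  std::istringstream input(text);
  int value = 0;
  char extra = 0;
  if (!(input >> value)) return false;   // not a number at all
  if (input >> extra) return false;      // trailing junk such as "12abc"
  result = value;
  return true;
}

// Program failure: an index out of range means the calling code is broken,
// so an exception is appropriate.
int element_at(const std::vector<int>& values, std::size_t index)
{
  if (index >= values.size())
    throw std::out_of_range("element_at: index out of range");
  return values[index];
}
```

A caller of parse_int handles failure with a plain if statement; a caller of element_at would normally never see the exception at all.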

7. New/Delete versus Malloc/Free

You should always delete an object created using new and always free an object created using malloc. This is because the C++ memory manager is not guaranteed to be compatible with the C memory manager, even though it usually is. Note there is a difference between "guaranteed" and "usually". Just because "it works" with your compiler does not make it correct. It will probably not work with another compiler or a later edition of your current one.

Furthermore, realloc should only be used on memory allocated with malloc, never memory created with new.

You need to keep open the possibility of adding either a cached or debugging version of the memory manager. For example, a cached memory manager could speed up new and delete but in a way which could make them incompatible with malloc and free.

The easiest way of ensuring this rule is to only use new and delete and consider malloc, realloc and free to be obsolete, which of course they are, along with most of the C runtime.
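A minimal sketch of the pairing rule, using invented names throughout. The second function is an assumption about how you might replace a malloc/realloc growing buffer: it lets a std::vector do the growing instead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type used only for this illustration.
struct record
{
  std::string name;
  int value;
};

// Allocate with new, release with delete: the two always travel as a pair.
int new_delete_pair()
{
  record* r = new record;
  r->name = "example";
  r->value = 42;
  int result = r->value;
  delete r;              // never free(r): r did not come from malloc
  return result;
}

// Where C code would malloc a buffer and realloc it as it fills,
// a std::vector grows by itself and releases its memory when it goes out of scope.
std::size_t grow_without_realloc()
{
  std::vector<int> buffer;
  for (int i = 0; i < 1000; ++i) buffer.push_back(i);
  return buffer.size();
}
```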

8. No Static Objects

You may wish to repackage some or all of your code as a shared library (DLL in Windows-speak) so all code should really be written with this possibility in mind. There can be problems with globals (specifically class globals which need to be constructed) in shared libraries and these problems vary between operating systems and compilers.

My preference is to try to avoid the problem by avoiding statics altogether. This is easy when you start from scratch, but with legacy code it is not always as simple as it sounds.

Fortunately, basic types such as bool, int and all pointers are not affected by this problem. Thus, if you really must have a global class object, make it a global pointer to a class object and dynamically allocate the object on first use:

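// a plain pointer needs no constructor, so it is safe as a static
static my_stuff* stuff = 0;

bool do_something_now(...)
{
  // construct the object on first use rather than at load time
  if (!stuff) stuff = new my_stuff(...);
  ...
}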
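For reference, here is a complete, compilable variant of the same idea. The body of my_stuff is invented, and the function returns a call count rather than a bool purely so that the effect is visible.

```cpp
#include <string>

// Hypothetical class standing in for my_stuff: it has a constructor,
// so a global object of this type would need constructing at load time.
class my_stuff
{
public:
  explicit my_stuff(const std::string& label) : m_label(label), m_calls(0) {}
  int count_call() { return ++m_calls; }
private:
  std::string m_label;
  int m_calls;
};

// A pointer is a basic type, so this static needs no construction.
static my_stuff* stuff = 0;

// Returns how many times the shared object has been used so far.
int do_something_now()
{
  if (!stuff) stuff = new my_stuff("shared");   // constructed on first use only
  return stuff->count_call();
}
```

Note that this sketch is not thread-safe as written and never deletes the object; both are acceptable only if the program's design allows for them.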

9. Make your code portable

I believe that everyone is responsible for writing portable code at all times. It is not an SEP (someone else's problem). You do not know what will happen to your code in the future - notice for example how Linux is faring now against Windows. Ten years ago Unix-type OSs were seen as scientific-interest only - now they are mainstream. Do you want to be adaptable in the future? Then make your code portable!

There are three issues relating to portability:

1) Portability between compilers

On Windows you might use Visual C++ or you might use GNU gcc via the Cygwin port. On Linux you'll be using GNU gcc. Therefore all code must compile with both compilers. In practice this is pretty easy since there are only slight differences between them.

You do need to take compiler portability into account when structuring your directories for storing source code. For gcc users, there is a standard make file used for STLplus which will work across all platforms and you are strongly recommended to use that. However, it does impose some rules on how you structure your source:

Each project should be in a directory with the same name as the project (no spaces in the name). Then, beneath that, there may be an optional second-level directory called "source". All source files should be in that subdirectory and there should be no further source code subdirectories below that.

Following these conventions means that switching between compilation systems will be easier than falling off a dog.

2) Portability between Run-time Libraries

You should only use standard library functions - ANSI C run-time library and the standard C++ run-time library. You should not use any non-standard system calls. Nor should you use any extensions to the libraries, such as extra classes that a compiler vendor may have added to the STL. Nor should you use non-standard 'features' of the standard library functions.

3) Portability between Operating Systems

Item (2) goes a long way towards meeting this requirement, but there are some things that you have to do which are different between Windows and Unix. The three specific areas that could affect your development are file-system handling, internet access and subprocess handling. These are solved by using the STLplus library interfaces for the File System, TCP and Subprocesses respectively. These implement both a Windows and a Unix version of these subsystems accessed through a platform-independent interface.

If you need to add other functionality that is platform-specific, then you should think about providing a Unix and a Windows implementation. You should encapsulate (that means hide) it behind a common platform-independent interface in the same way as the above STLplus subsystems. There should therefore be no "#ifdef WIN32" or other platform-specific compiler switches anywhere else in your application code.
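The encapsulation described above might be sketched like this. The join_path function is invented for illustration and is not the STLplus File System interface; the sketch also assumes the compiler-predefined _WIN32 macro as the platform switch.

```cpp
#include <string>

// ---- public interface (would live in the header): no platform switches visible ----
// Hypothetical function: joins a directory and a file name with the native separator.
std::string join_path(const std::string& directory, const std::string& file);

// ---- implementation (would live in the body file): the only place with #ifdef ----
#ifdef _WIN32
static const char native_separator = '\\';
#else
static const char native_separator = '/';
#endif

std::string join_path(const std::string& directory, const std::string& file)
{
  if (directory.empty()) return file;
  char last = directory[directory.size() - 1];
  if (last == native_separator) return directory + file;   // separator already present
  return directory + native_separator + file;
}
```

Application code calls join_path and never needs to know which platform it is running on.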

10. Avoid the C Runtime Library

The truth is that the C runtime library is obsolete. Yes, it is. Practically all of the functionality of the C runtime is provided in a better, more effective and more robust form in the C++ runtime library. For example, the I/O routines of stdio are superseded and vastly improved by iostream and by the STLplus TextIO classes.

Furthermore, there are some functions in the C runtime that are positively dangerous and should never be used. Their existence in a program is positive proof of an incompetent programmer. An example is the monster called sprintf. Let me explain why this should never, ever, ever ever be used. Ever.

First look at the interface:

int sprintf(char *, const char *, ...);

The first argument is a char* buffer to print into. The function prints text into this buffer according to the format string which is the second argument and the argument-vector parameters represented by the ellipsis (...). What's missing is a parameter that tells sprintf how long the buffer is - so it has no way of knowing if an overflow happens. If the buffer is not long enough, then the function quite happily runs off the end and corrupts other data structures in memory. This kind of memory bug is extremely difficult to diagnose and fix. A common bodge (yes, it is a bodge, not a solution) is to simply make the buffer very large. However, that just pushes the problem further away; it doesn't fix it. Consider the case where one of the parameters in the format string is a command-line argument. You as a programmer have no control over the length of this argument. Therefore you have no way of deciding how big the buffer should be. Any "guess" at the size is a bodge.

This horror of a function is commonly exploited by virus writers who send very long requests to web servers so that sprintf overflows its buffer and overwrites program code, replacing it with virus code. If sprintf did not exist in this form, we'd probably have fewer viruses.

Both iostream and TextIO provide functions for formatting text in a string that has no potential overflow problems. Therefore there is no justifiable use for sprintf.
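Using the standard library side of that claim, a minimal sketch of overflow-free formatting with std::ostringstream looks like this. The function name format_setting is made up for the example; the TextIO equivalent is not shown here.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds "name=value" text without any fixed-size buffer.
std::string format_setting(const std::string& name, int value)
{
  std::ostringstream output;
  output << name << "=" << value;   // the stream grows to fit whatever is written
  return output.str();
}
```

However long the name argument is, the result is simply a longer string; there is no buffer size to guess.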

In any case, using char* for string handling is obsolete because you have to write buckets of code to constantly check for possible buffer overflows. You should be using std::string which dynamically allocates more memory as required, so you can get on with writing the real code.

Note also that rule 7 explained why malloc/free/realloc are obsolete and potentially dangerous.

Just about the only useful C runtime header left is ctype.h which defines classifying functions for char. For example, you test isdigit(ch) to see if the character is a digit (0 to 9). Even these functions, though, should be superseded by the C++ locale classes, which provide classifications relevant to the user's spoken language.
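As a sketch of the locale-based alternative, the standard header <locale> provides a two-argument form of isdigit that takes the locale to classify against. The helper name count_digits is invented for this example.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: counts the digits in text using the C++ locale form of isdigit.
// By default it classifies against a copy of the current global locale.
std::size_t count_digits(const std::string& text, const std::locale& loc = std::locale())
{
  std::size_t count = 0;
  for (std::string::size_type i = 0; i < text.size(); ++i)
    if (std::isdigit(text[i], loc)) ++count;
  return count;
}
```

Passing std::locale::classic() gives the traditional "C" behaviour regardless of how the global locale has been set.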