slashbinbash.de / Structure and Complexity of Fundamental C++ Features

I have been working in a large C and C++ code base for the last 10 years. The code base is around 30 years old, and it requires you to have a broad knowledge of different C and C++ versions. Working in an old code base for so long, you get to see which design decisions panned out and which didn't. This lead me to think about the technical reasons for why certain C++ language features exist, and what long-term impact they have in terms of code complexity and code structure.

When we hire new C++ developers, we eventually expect them to write modern C++ in a legacy code base that they cannot easily rewrite, for several reasons. Many of them haven't worked with C or C++98. They don't know why certain code looks the way it looks. They don't know what patterns to look for in the language, and how to map these patterns to modern C++. They often lack the experience to know which constructs are worth keeping, and which ones need replacing.

What are the goals of this article?

I want to show that there are fundamental ideas and concepts in computer programming, because of the types of problems that we are trying to solve.
I want to show how these concepts translate into code, how the evolution of these concepts looks like between C, C++, and modern C++, and why this evolution was necessary from a technical standpoint.
I want to impart on the reader the value of simple code, the purpose of structure, the cost of abstractions, and how important it is to be able to reason about code from the local context.

Who is the target audience?

If you are a beginner in C++, then this article is probably NOT for you. Please check out the following web site to get started with C++: https://isocpp.org/get-started
If you are an intermediate C++ developer who is thrown into a legacy code base, and you are trying to make sense of it, then this article might be for you. You will learn some patterns of the C and C++ language, what concepts stand behind them, and how to map them to modern C++.
If you are an advanced C++ developer who is contemplating the meaning of programming, why languages are designed the way they are, and what long-term impact certain design decisions have on code, then this article might be for you. I discuss several of these topics throughout the different chapters.
If you are an expert C++ developer, then this article is probably NOT for you. This article does not discuss the state of the art, or any future developments.

Disclaimer: The code examples in this article are used to illustrate a point. Don't take them as advice on how to write code.

Overview

Introduction
Function
Structure
Statement
Enumeration
Union
Array
Slice
Error Handling
Namespace
Class
Ownership & Lifetime
Code Structure
Conclusion

Introduction

For some C++ language features, the answer to why they exist is trivial. If you need to perform different operations based on a condition, you will most likely use some kind of conditional statement. If you need to iterate over an array of items, you will probably use some kind of loop. There is no debate about the usefulness of a conditional statement or loop because they serve a concrete purpose.

For other C++ language features, the answer to why they exist is not trivial at all. This means that the concept is more complex, you have to spend more time to understand its purpose and when it is worth using such a concept in your code. There are language features that look useful at first, but they turn out to be terrible to maintain and debug. You might not be able to revert the decision of using a certain language feature without changing large parts of your code.

Over the last year, I have come to realize that I prefer a simple subset of C++ that is easy to write and easy to understand. The premise is: if you use simple language tools, you can concentrate on solving the actual problem at hand, rather than solving language problems.
A language problem is when you are struggling to map a solution in your head, to the programming language that you are using. You might be conflicted about what language features to use. This conflict might arise due to arbitrary beliefs about how programs should be written, but it could also arise from trying to combine complex language features that don't work well together.

In practice, using simple language tools means that there must be a technical benefit to using a language feature, one that directly helps you solve the real problem at hand. A language feature that does not help you solve the real problem at hand, only adds complexity to the code without serving any purpose. Therefore, you may just as well not use it.
I find this approach very liberating. However, it can be difficult to differentiate between a technical benefit and a stylistic choice. You have to know the language and its implementation quite well to know the difference.

This opinion is very much influenced by having programmed in C++ for over 10 years, in a large code base whose age can be estimated by counting the rings that different C and C++ versions have left in the code. I also have to mention Casey Muratori and Jonathan Blow. Their interviews and programming streams have made me revisit some of the assumptions I had about programming.

Function

Functions are the most important and versatile tool that you have as a programmer. Before you reach for anything more complex, consider if you can write it as a function.

Function: Code Reuse

Functions let you reuse code in different parts of your program.

You have to decide how you want to provide access to a function. Is the function public or private? This can be enforced on a code level, by choosing between internal linkage or extern linkage. Or it can be mandated on a filesystem level, where developers agree only to include the public header files from other domains or libraries, and not the private header files.

Internal linkage means that the function is only visible inside the source file or translation unit that it is defined in. External linkage means that the function can be called by any other function in your program.

If you want your function to have internal linkage, you can use an anonymous/unnamed namespace (C++):

namespace {

void doTask();

}

Or the static keyword (C/C++):

static void doTask();

If you want your function to have external linkage, and be used by other translation units, you declare the function in a header file:

void doTask();

And define the function in a source file:

void doTask()
{
    // TODO
}

You can also have the function definition in your header file. However, in C and C++ you are only allowed to have one function definition in your program (ODR: one definition rule). By including the header file in different source files, you create multiple definitions of that function.
This is why you see function definitions in header files using the keywords static or inline. What this means is that each translation unit that includes the header file, gets a copy of that function. This copy has internal linkage or is inlined into the function that uses it.

On a filesystem level, it is common to differentiate between public and private header files. Public means that the header files provide an interface to the user. Private means the header files provide definitions or include dependencies that should only be visible internally, to the implementation. Public header files are usually placed in an include directory, whereas private header files are placed next to the source files.

The mandate is:

As a user, you should only include header files from the include directory.
As a developer, you should not leak implementation details to the user.

Function: Abstraction

Functions are abstractions. They hide details about how a task is performed. You have to choose how much detail you want to hide from the reader.

Each abstraction creates a layer around the work that the computer is actually doing. It is common to have different abstraction layers in a system, whose functions deal with different aspects of the software. However, needless abstractions, or the wrong use of abstractions, make it difficult to understand what work is actually being done.

A high abstraction layer function could be something like loading a Word document, a low abstraction layer function could be something like accessing the filesystem to open a file.

There is a general understanding that a function should only call other functions from the same abstraction layer, or functions from abstraction layers below it. For example, you generally don't want UI code in your filesystem functions, but you might want your UI code to be able to open and save files.

Having different abstraction layers means that functions must communicate the state of the program to each other, so that actions can be taken in the appropriate abstraction layer.

Below is an example of different functions, at different levels of abstraction:

isEven(n);

countEven(numbers);

compareNumbers(a, b);

sortNumbers(numbers);

printNumbers(numbers);

readNumbers(file);

writeNumbers(file, numbers);

calculateNumbers();

createSpreadsheet();

loadSpreadsheet(file);

saveSpreadsheet(file);

runProgram();

Function: Generalization

Generalization in programming means that you find a more general definition from a set of specific cases.

For instance, if you have multiple functions like:

countOnes(numbers);

countTwos(numbers);

countThrees(numbers);

One possible generalization of these functions is:

countNumber(n, numbers);

A generalization always means that the implementation is less specific to the actual use-case.

We will later see how generalization works when creating types.

Function Overloading

In C++, function overloading allows you to generalize a function over different parameter types.

You can use the same name for functions that have a different amount of parameters.

For example:

void setColorFromRGBA(int r, int g, int b, int a);

void setColor(Color color);

Can be written as:

void setColor(int r, int g, int b, int a);

void setColor(Color color);

You can use the same name for functions that have the same amount of parameters, but different parameter types.

For example:

int countNumberInt(int i, const std::vector<int>& numbers);

int countNumberFloat(float f, const std::vector<float>& numbers);

Can be written as:

int countNumber(int i, const std::vector<int>& numbers);

int countNumber(float f, const std::vector<float>& numbers);

This does not break the one definition rule because the type and number of the parameters differentiate the function definitions from each other. The compiler will resolve which function is used depending on the type and number of arguments that you give it.

There are situations, however, where function overloading has an obfuscating effect. When you read code and you see two functions with the same name, you expect them to behave similarly. If you use function overloading, and you use the same name for two functions that do something entirely different, you are actively subverting the expectations of the reader.

Function Template

Function templates allow you to define a generic implementation of a function, that the compiler will then use to generate different versions of that function, depending on the template parameter.

Function templates are a tool for code generation.

For instance, the last example that I used:

int countNumber(int i, const std::vector<int>& numbers);

int countNumber(float f, const std::vector<float>& numbers);

Can be written in a more generic form:

template <typename T>
int countNumber (T t, const std::vector<T>& numbers) { ... }

Using the function template will cause the compiler to generate the appropriate code:

// generate an int version of the function
std::vector<int> intNumbers = {1, 2, 1, 2, 3};

countNumber(3, intNumbers);

// generate a float version of the function
std::vector<float> floatNumbers = {1.0f, 2.0f, 1.0f, 2.0f, 3.0f};

countNumber(2.0f, floatNumbers);

// generate a double version of the function
std::vector<double> doubleNumbers = {1.0, 2.0, 1.0, 2.0, 3.0};

countNumber(1.0, doubleNumbers);

Function templates are very useful for data structures and library functions. But be careful when you use templates in your application. They make it difficult to reason about code. If you provide the wrong template parameters, the generated compile errors can be horrendous.

Function templates have the tendency to spread through your code. If you want to use a function template in different source files, it has to be completely defined in a header file. Otherwise, the compiler does not know what code to generate. Such a function template will leak implementation details and dependencies to its users. You can also define a function template inside the source file that uses it, but this limits the usefulness of the function template.

Function: Interface

An interface creates a layer of abstraction, typically at a domain boundary. The interface is a contract between the developer and the user. The contract ties a set of behaviors to the correct usage of the interface.

The simplest form of an interface in C++ is the header file. The header file is a collection of function declarations, type definitions, and constants.

For example:

enum class Color
{
    Red, Green, Blue, Yellow, Cyan, Magenta,
};

void drawPoint(int x, int y);

void drawLine(int x0, int y0, int x1, int y1);

void drawRect(int x, int y, int width, int height);

void drawString(int x, int y, const std::string& str);

void setColor(int r, int g, int b);

void setColor(Color color);

An interface should serve as a dependency guard. This means that implementation details should not be leaked across the interface. This protects the user code from seeing platform or domain specific types and functions, that are used internally by the implementation.

An interface allows the developer to make changes and improvements to its implementation without affecting the code of the user. The user can also replace the implementation of the interface without changing their code.

In object-oriented programming, the word interface has a special meaning and is strongly related to polymorphism and virtual function tables.

Function: Function Pointer

A function pointer allows you to reference and call different function implementations, as long as they have the same function signature.

A function pointer allows you to dynamically change the behavior of the program at run-time. This flexibility comes at the cost of introducing abstraction into your program. This is because you don't know what implementation is called when reading the code - the implementation is only known at run-time.

We will later see how to group sets of function pointers more succinctly.

Selecting Behavior at Run-Time

Lets say you have three functions that each implement a different sorting algorithm.

void bubbleSort(std::vector<int>& numbers);

void quickSort(std::vector<int>& numbers);

void mergeSort(std::vector<int>& numbers);

Suppose that you want to select an algorithm at the start of the application, that should be used at a later point during program execution. This can be done in two ways.

The first approach is to write a function that takes the type of algorithm as its first parameter and selects the appropriate sorting algorithm:

enum class Algo
{
    BubbleSort,
    QuickSort,
    MergeSort,
};

void sort(Algo algo, std::vector<int>& numbers)
{
    switch (algo)
    {
    case Algo::BubbleSort:
        bubbleSort(numbers);
        break;
    case Algo::QuickSort:
        quickSort(numbers);
        break;
    case Algo::MergeSort:
        mergeSort(numbers);
        break;
    default:
        assert(false);
        break;
    }
}

The user can pass an Algo as an argument whenever they need to call the sort function.

Algo selectedAlgo = Algo::QuickSort;

sort(selectedAlgo, numbers);

This solution is straight forward. It clearly specifies which algorithms can be used. Adding a new algorithm requires you to add a new value to the enum, and also extend the switch case.

The second approach is to use a function pointer and to select the desired algorithm by assigning the function address to the function pointer:

using SortFn = void(*)(std::span<int>);

SortFn sort = &quickSort;

sort(numbers);

There is no overhead when adding a new algorithm, compared to the first approach. However, anyone can assign any implementation to the function pointer, as long as it has the same function signature. The same rules that apply to pointers in general, also apply to function pointers. If you fail to initialize the function pointer before invoking the function, the behavior is undefined (?).

Injecting Behavior at Run-Time

Function pointers can be passed to or returned from other functions. This is often used in combination with algorithms or function callback mechanisms.

For example, you might want to register a function at a callback mechanism to get regular notifications about certain events:

void myMessageHandler(Message message)
{
    // handle the message
}

setMessageHandler(&myMessageHandler);

The receiver of the message handler then stores the function pointer and sends notifications whenever an event occurs:

if (someEventOccurred)
{
    if (handleMessage != nullptr)
    {
        handleMessage(Message::SomeEventOccurred);
    }
}

This is one of the standard techniques for decoupling different domains or systems from each other. The domain that receives the function pointer does not have a direct dependency to the domain that provides the implementation. The downside is that you don't know what implementation is called by just looking at the code - the implementation is only known at run-time.

There are different variants of this technique that use lambda expressions or classes, but they all boil down to the same principle.

Function: Lambda (C++11)

Lambda expressions were introduced in C++11. A lambda allows you to create a function object that captures the variables in the scope that it is defined in.

If the capture clause is empty, the lambda does not capture anything. A lambda without a capture is basically a function pointer. The difference is that you can define a lambda inside the function scope.

This allows you to write the following code:

void doTask(List& list)
{
    auto compareName = [](const Item& lhs, const Item& rhs) -> int {
        // compare names and return -1, 0, 1
    };

    sort(list, compareName);
}

Function: Lambda with Capture

A lambda with a capture is a function pointer with associated data. The lambda can only capture variables that are in its surrounding scope. You have access to the captured variables inside the body of the lambda.

In C, you can find the following pattern when you register a function callback with another library:

void handleMessage(Message message, void *userdata)
{
    MyObject *myObject = (MyObject *)userdata;

    doStuff(myObject);
}

void doTask(void)
{
    MyObject *myHeapObject = ...;

    setMessageHandler(&handleMessage, myHeapObject);
}

The library or API cannot know the types you have defined in your application. Therefore, when the library calls your function, you are passed a void pointer to the user data that you provide. This means that you are responsible for casting it to the appropriate type.

In C++, you can sometimes find function objects, also called functors. This is usually implemented as a class that overloads the function call operator:

class MessageHandler
{
public:
    MessageHandler(MyObject* obj) : myObject(obj) {}

    void operator()(Message message)
    {
        myObject->doStuff();
    }

private:
    MyObject* myObject;
};

void doTask()
{
    MessageHandler messageHandler{myHeapObject};

    setMessageHandler(messageHandler);
}

In C++11, a lambda with a capture achieves the same thing as the function object example above. It has improved type-safety over the C version. You don't have to define a class every time that you want to use a function object. Since the lambda can be defined at function scope, you can capture all objects that are also visible to the function, either by value or reference.

void doTask()
{
    MyObject* myHeapObject = ...;

    auto handleMessage = [=](Message message) {
        myHeapObject->doStuff();
    };

    setMessageHandler(handleMessage);
}

In any case, you have to be aware of the lifetime of the captured object. If the object is destroyed before the lambda is called, accessing the destroyed object through its pointer or reference is undefined behavior. In this case it is generally safer to copy the state into the lambda and not hold any pointers or references to objects with undetermined lifetimes.

Thus, the complexity of using a lambda that captures state lies in ensuring that the captured objects are valid when the function is called.

Structure

Structures are the second most important and versatile tool that you have as a programmer.

Structures allow you to create user-defined types and to group related data.

The structures chapter focuses exclusively on plain structures that only store data.

Structure: User-Defined Types

Structures allow you to define new types in the language. This means that the compiler will check that you are using these types correctly, helping you to avoid programming errors at compile time, rather than at run-time.

Structures are especially useful when combined with functions. They make it easier to express intent and to define good interfaces. Once you use structures in your function signatures, you can modify the structures without changing the function signature.

Structure: Aggregation

Structures allow you to group values that are related to each other.

struct Point
{
    int x = 0;
    int y = 0;
};

struct Rect
{
    int x = 0;
    int y = 0;
    int width = 0;
    int height = 0;
};

struct Address
{
    std::string city;
    std::string street;
    int streetNumber = 0;
    int postalCode = 0;
};

Structures can also be combined into more complex structures:

struct Person
{
    std::string name;
    int age = 0;

    Address homeAddress;
    Address workAddress;
};

struct School
{
    std::array<Room, 5> rooms;
    std::array<Teacher, 5> teachers;
    std::array<Student, 100> students;
};

This makes it possible to reuse types in many different situations.

Structure: Abstraction

Structures are abstractions. This is even more obvious with structures than it is with functions:

struct Person
{
    std::string name;
    int age = 0;
};

In reality, a person is much more complex than a name and an age. But for the purposes of a program, it might be sufficient to think of a person as a name and an age, if this is the data that is needed to execute the program.

Let me repeat this point. The data is more important than the concept that the structure represents. The name makes it easy to refer to the data. The name also makes it easy to differentiate this type from other types. It is not a representation of the actual thing in your program.

Structure: Generalization

Generalization in programming means that you find a more general definition from a set of specific cases.

Generalization is similar for structures as it is for functions.

For example:

struct RedCar
{
    std::string name;
};

struct GreenCar
{
    std::string name;
};

struct BlueCar
{
    std::string name;
};

Can be written as:

struct Car
{
    Color color;
    std::string name;
};

Car redCar{Color::red, "red car"};
Car greenCar{Color::green, "green car"};
Car blueCar{Color::blue, "blue car"};
Car yellowCar{Color::yellow};

As trivial as this example might seem, there are situations where it is easier to generalize a structure by adding a property, than defining multiple different types that have to be handled separately.

Structure: Opaque Type

Before you can use a struct, you have to define it. Code that accesses the members of a struct must see its definition.

However, a pointer to a struct is already defined by the pointer type. To handle a pointer to a struct, only the declaration of the struct (its name) is needed. With a so called forward declaration, you can return and pass pointers to structs without knowing the definition of the struct.

Opaque types are often used as handles to objects and resources, hiding implementation details behind an abstract type.

Structure: Dynamic Interface

We discussed how you can create an interface by declaring a set of functions. We also discussed how you can change the behavior of a program at run-time by using function pointers.

Dynamic interfaces allow you to define a set of related functions whose behavior can be changed at run-time.

struct StreamInterface
{
    bool (*open) (Stream*);
    bool (*close)(Stream*);
    int  (*write)(Stream*, const void* buffer, size_t size);
    int  (*read) (Stream*, void* buffer, size_t size);
};

StreamInterface streamInterface;

streamInterface.open = &openFile;
streamInterface.close = &closeFile;
streamInterface.write = &writeFile;
streamInterface.read = &readFile;

streamInterface.open(stream);
streamInterface.close(stream);

For this particular example, you might consider using a class interface (see Class: Dynamic Interface).

Statement

There is no argument about the general usefulness of conditional statements and loops. They are the cornerstones of structured programming. Their purpose is well understood. They directly shape the execution and logic of a program.

There are different versions and flavors of conditional statements and loops. Which one you choose will depend on the specific use-case. I want to draw your attention to one type of for loop that I think is beneficial.

Most languages nowadays have a range-based for loop. You should consider using it instead of an index-based for loop. The reason is that index-based for loops have more syntactical pieces, thereby increasing the chance of errors like index-out-of-bounds memory errors.

Enumeration

An enumeration is an easy way to define a set of constant values that can be compared to each other.

This means that instead of defining each constant, its type, and its value yourself, you can use an enumeration to achieve the same thing.

From:

const int MONDAY = 0;
const int TUESDAY = 1;
const int WEDNESDAY = 2;
const int THURSDAY = 3;
const int FRIDAY = 4;
const int SATURDAY = 5;
const int SUNDAY = 6;

To:

enum Weekday
{
    MONDAY,
    TUESDAY,
    WEDNESDAY,
    THURSDAY,
    FRIDAY,
    SATURDAY,
    SUNDAY,
};

The use of the C-style enums has a few pitfalls that you need to be aware of.

The names of the enumerators pollute the global namespace. If any library that you include uses the name MONDAY, you will have to rename it.
You should not compare values of different enumerations with each other. The enumerators are not unique outside of the enumeration they are defined in. This means that the comparison MONDAY == JANUARY will compile, and might even return true, but this is most likely a bug. Some compilers can issue a warning for this.^[1]
You can do arithmetic with the values of an enumeration. This leads to questions like: What does WEDNESDAY + SUNDAY mean?

Enumeration: Scoped Enumeration (C++11)

C++11 introduced the scoped enumeration, that tries to address all the potential issues that can arise from using enumerations wrong.

enum class Weekday
{
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday,
};

All names are put into a namespace, in this case the namespace is called Weekday.
You can compare the values of an enumeration with each other, but you cannot compare values from two different enumerations.
The values don't convert implicitly to integer values, meaning that you cannot do arithmetic with the values of an enumeration.

Unless I have very good reasons, scoped enumerations are the default type of enumerations that I use.

Union

Unions are useful if you need an object that can represent multiple types that are mutually exclusive. For example, you might need an object that can be an integer, or a float, or a string, but it can only be one of these things at a time. You choose which type it is at run-time.

If you assign an integer value to a union you have to read an integer value from it. Trying to read from a union with a different type is undefined behavior. This means that you have to store some information about the type of the object either globally or locally. One strategy is to store the type information directly with the union, which is also known as a tagged union or discriminated union.

We will see later how a similar solution can be implemented by using classes.

Union: Tagged Union

Tagged unions store type information next to the union. This type information must be checked at run-time to determine which variable of the union to access.

For example:

struct Value
{
    enum class Tag
    {
        Integer,
        Float,
        String
    } tag;

    union
    {
        int i;
        float f;
        const char* s;
    };
};

With the user code looking like this:

switch (value.tag)
{
case Value::Tag::Integer:
    value.i;
    break;
case Value::Tag::Float:
    value.f;
    break;
case Value::Tag::String:
    value.s;
    break;
}

You can also use plain data structures inside the union. Plain means that they cannot have any user-defined special functions like constructors, destructors, etc. This excludes the use of standard C++ types like std::string inside unions.

struct DataMessage
{
    double* data;
    int size;
};

struct ErrorMessage
{
    int code;
    const char* reason;
};

struct Message
{
    enum class Tag
    {
        Data,
        Error
    } tag;

    union
    {
        DataMessage data;
        ErrorMessage error;
    };
};

You have to be very careful when accessing the union because accessing the wrong type is undefined behavior.

Union: std::variant (C++17)

std::variant is a safer alternative to the tagged union. It can also store non-trivial types.

struct DataMessage
{
    std::vector<double> data;
};

struct ErrorMessage
{
    int code = 0;
    std::string reason;
};

using Message = std::variant<DataMessage, ErrorMessage>;

This results in the following user code:

if (std::holds_alternative<DataMessage> (message))
{
    auto& dataMessage = std::get<DataMessage> (message);
}
else if (std::holds_alternative<ErrorMessage> (message))
{
    auto& errorMessage = std::get<ErrorMessage> (message);
}
else
{
    ...
}

Or you could also write:

if (auto* data = std::get_if<DataMessage> (&message))
{
    data->data;
}
else if (auto* error = std::get_if<ErrorMessage> (&message))
{
    error->code;
}
else
{
    ...
}

Trying to access the variant with std::get checks the type and throws a std::bad_variant_access exception if you access the wrong type.

Unfortunately, there is no switch or match statement in C++ that allows you to exhaustively check the variant for each type. There is the std::visit function, which I am not a fan of. The std::visit function is something you design when you cannot add a match statement to your programming language. It is akin to using iterators before the introduction of the range-based for loop.

Array

An array is a contiguous block of memory that can store multiple elements of one type. This includes primitive data types, structures, pointers, function pointers, other arrays, etc.

There are fixed size arrays, that do not change their size at run-time. If you have a table of data that you use to look up values, or if you don't need more than N elements or N bytes to store some data at run-time, then a fixed size array is a good match.

There are dynamic size arrays, that do change their size at run-time. These are useful when you don't know how many elements you will need to store in advance. These arrays can grow as you add more elements to them. However, references or pointers to elements can become invalid when the array grows.

The last thing that you need to consider is the storage duration or lifetime of the array and its elements.

Array: C-Style

In C, you will sometimes see the following use of arrays:

#define NUM_ELEMENTS 100

int elements[NUM_ELEMENTS];

getNumbers(elements, NUM_ELEMENTS);

calcSum(elements, NUM_ELEMENTS);

In C++, there is also this variant, if you intend to allocate the array on the heap:

constexpr int kNumElements = 100;

int* elements = new int[kNumElements];

getNumbers(elements, kNumElements);

calcSum(elements, kNumElements);

delete[] array;

To pass the array to a function, you sometimes see functions being defined as such:

int calcSum(int *array, int size)
{
    int sum = 0;

    for (int i = 0; i < size; i++)
    {
        sum += array[i];
    }

    return sum;
}

Note that the array decays to a pointer when it is passed to the function. This has two effects:

It is implied in the use of the pointer that it is an array, and not a pointer to a single value. There is no way for the compiler to check if this is true.
You must always pass the size of the array to know how many elements you can iterate over before reaching the end of the array.

There is no bounds-checking when accessing the array. Accessing elements outside of the array is undefined behavior.

Arrays with dynamic or automatic storage duration are not initialized by default.

In C, there are also variable length arrays (VLAs). VLAs are not a C++ language feature. Be very careful with variable length arrays. If used incorrectly, they can cause stack overflows.

Unless there is a very good reason, I generally don't use C-style arrays. But you must be able to recognize this style of code and know what its implications are.

Array: std::array (C++11)

The std::array implements a fixed size array. The size of the array needs to be known at compile time.

std::array aims to be a safer version of the array. You can ask for its size. You can iterate over the elements with a range-based for loop. You can access the array with bounds-checking by using std::array::at.

std::array is not initialized by default unless it has static storage duration.

Array: std::vector

The std::vector implements a heap allocated dynamic array. The array may grow when more elements are added to it. Pointers, references, and iterators will become invalid when the array grows because this typically involves reallocating the buffer to a new address.

You can use the std::vector as a heap allocated fixed size array if you resize it at the beginning to the required size, and then only use std::vector::operator[] or std::vector::at to read and write to it.

std::vector is not initialized by default.

If I need a heap allocated array and there are no other particular constraints, then std::vector is my default choice.

Slice

Slices allow you to define the position and length of a memory region, without having ownership of that memory.

For example, lets say you have the following list of comma separated values:

"apple,orange,car,watermelon"

If you want to get each word in that list, you need to parse the string and return a position and length for every word in the string. This gives you a list of slices that you can use to refer to the words individually, without having to copy the contents of the string.

The advantage of slices is that they are cheap to copy. Creating sub-slices is also cheap. Each slice knows its size, allowing you to perform bounds checking. However, if the memory region is freed, the slice becomes invalid.

Slice: C-Style

In C and C++ code, you will sometimes see function signatures that take a pointer to some type and a size:

int calcSum(int *array, size_t size);

It is implied that the pointer points to an array in memory, and the size is the number of elements in the array. Note that this signature doesn't differentiate if it is being passed the entirety of an array, or a slice of an array.

Another variant of this can be seen with C-strings:

int countChar(const char *str, char c);

It is implied that the pointer must point to a character array in memory that is terminated by a null character. Otherwise, you wouldn't know how long the string is.

Slice: std::span (C++20)

std::span is the general implementation of a slice in C++. It is essentially a pointer into memory and a size. It is a safer alternative to the C-style handling of arrays and slices.

std::span can be created from arrays, std::array, and std::vector. Once you have an std::span you can create sub-spans. std::span mirrors the STL container functions, making it easy to use. You can use std::span with ranged-based for loops. You can access the buffer with bounds-checking by using std::span::at (C++26).

Most importantly, passing std::span by value doesn't create any copies of the underlying buffer. However, if the underlying buffer is destroyed, the std::span becomes invalid.

Slice: std::string_view (C++17)

The std::string_view is basically a read-only slice of a string, with a similar interface to an std::string. Different from a std::string, the std::string_view operations do not allocate and deallocate memory, or create copies of the existing string. This means that you can use std::string_view::substr without paying for an allocation.

A benefit of using an std::string_view is that you can assign string literals to it and it does not create a copy on the heap. The length is stored in the std::string_view so you don't have to recompute the length every time, or pass the length around as a second parameter.

Error Handling

Error handling is an integral part of computer programming. In this chapter I want to discuss some error handling techniques that you will commonly see in the C and C++ language.

Why is error handling necessary in the first place?

When a function reports an error it means that the function cannot handle the error itself. The user of the function must either handle the error or pass it to a layer of the program that has enough context to handle the error appropriately.
In all but the most trivial functions there are several error cases that can occur. The arguments of the function might be wrong. Some action might fail for whatever reason. The function might use an API that returns errors that need to be propagated up the callstack.

Despite what you might think, error handling in C and C++ is mandatory. Not all error handling techniques enforce error handling on the user side but this does not mean that you can ignore errors. If you see any of these error handling patterns you must use them as intended. The least that you can do is to assert that the function call was successful.

In some cases, you cannot handle an error immediately, in which case you may propagate the error up the callstack. However, keep in mind that if an error reaches a domain boundary, you have to consider if you want to pass the error across your interfaces, or if the error should remain internal to the implementation.

The decision if you should pass an error across an interface depends on several factors. It makes sense to report errors that the user can actually handle, either by using the interface, or by gracefully aborting the operation. You also have to be careful not to leak implementation details with your errors.

Error Handling: No Error

Some functions do not communicate errors:

void doStuff(int value);

No error code is returned, there is nothing to check for. You have to assume that if an error occurs inside the function, the function handles it. The function may still throw an exception, but you wouldn't know that from just looking at the function signature.

Error Handling: Success or Failure

Some functions either succeed or fail to perform an action.

// returns true on success, otherwise false
bool doStuff(int value);

if (doStuff(42))
{
    // success
}
else
{
    // failure
}

The function tells you if it failed but it does not tell you the reason why it failed. What this means is that even if the function told you what went wrong, you might not be able to do anything about it, as the user.

Error Handling: Error Code

Some functions explicitly communicate the reason for the error so that users can take different actions depending on the error. Note that these functions don't compute and return any actual value, except for the error code.

You may see functions like this:

// returns 0 if successful or an error code
int doStuff(int value);

int err = doStuff(42);

if (err == 0)
{
    // success
}
else
{
    // failure
}

You have to be very careful when testing the result of a function that returns 0 for success. A common mistake is to write this:

// returns 0 if successful or an error code
int doStuff(int value);

int result = doStuff(42);

if (result) // 0 is implicitly cast to false
{
    // this is NOT the success case
}

You will also see some variants that use enumerations:

enum Error
{
    kNoError = 0,
    kHardwareError,
    kBadArgumentsError,
};

Error doStuff(int value);

Error err = doStuff(42);

if (err == kNoError)
{
    // success
}
else
{
    // failure
}

Or the enum class in C++11:

enum class Result
{
    Ok,
    ReadError,
    FileError,
};

Result doStuff(int value);

switch (doStuff(42))
{
case Result::Ok:
    break;
case Result::ReadError:
    break;
case Result::FileError:
    break;
}

Error Handling: Shared Value or Error Code Range

Some functions return either a value or an error.

The range of a return value is split into a value range and an error range.

// returns index of item or -1 if the item does not exist
int indexOf(Item* item);

int result = indexOf(item);
if (result >= 0)
{
    // success
}
else
{
    // error
}

We have explicitly chosen to use a signed integer as a return value, to be able to report an error, instead of using an unsigned integer which would be more appropriate. This means that all negative integers must be an error. We have thereby halved the range of valid indices that we can represent.

Error Handling: Value or Nil

Some function return either a value or nothing. This means that the error is communicated by indicating the absence of value.

A typical example of this is returning a valid pointer or nullptr:

// returns nullptr if item does not exist
Item* getItem(int index);

Item* item = getItem(42);
if (item != nullptr)
{
    // success
}
else
{
    // failure, don't dereference the pointer
}

You must check that the pointer is not nullptr before dereferencing it.

Error Handling: Value or Error Code

The value and the error have separate representations in the function signature. The error is usually the return value of the function, and the value is copied into one of the function arguments. It is common that the value is only set if the function is successful. Otherwise, assume that the value is invalid and should not be used.

// returns true if the item was copied successfully, otherwise it returns false
bool copyItem(int index, Item* item);

Item item;
if (copyItem(42, &item))
{
    // item is valid
}
else
{
    // item is not valid, don't use it!
}

You will also see this with an error code or an enumeration:

Error readFile(File file, ByteBuffer& buffer);

ByteBuffer buffer;
Error err = readFile(file, buffer);
if (err == kNoError)
{
    // ...
}
else
{
    // buffer is not valid, don't use it!

    switch (err)
    {
    case kReadError:
        break;
    case kFileNotFoundError:
        break;
    }
}

Note that in both cases, the function may not write to the value in the case of an error. This means that you should not use or access it.

Error Handling: std::optional (C++17)

std::optional is a general solution for the Value or Nil case. This means that the function either returns a value or nothing. It has the advantage of being explicit. Getting the value from an optional requires an additional step that will check if the optional has a value or not.

For example:

std::optional<size_t> indexOf(Item* item);

size_t index = indexOf(item); // error: cannot assign optional to size_t

size_t index = indexOf(item).value(); // will throw exception if optional has no value

if (auto index = indexOf(item))
{
    index.value(); // index has a value
}

size_t index = indexOf(item).value_or(0);

std::optional can also be used in cases where a value might not always be set:

std::optional<int> someValue;

if (someCondition)
{
    someValue = 10;
}

if (someValue.has_value())
{
    doStuff(someValue.value());
}

Prior to std::optional, you might have used a Shared Value or Error Code Range in this example, or a Success or Failure flag.

Error Handling: std::expected (C++23)

std::expected is a general solution for the Value or Error Code and Shared Value or Error Code Range case. This means that the function either returns a value or an error code.

Example:

enum class Error
{
    BadArguments,
    InsufficientData,
    CalculationFailed,
};

using Result = std::expected<int, Error>;

Result doStuff(int value);

Result result = doStuff(42);

if (result.has_value())
{
    result.value();
}
else
{
    switch (result.error())
    {
    case Error::BadArguments:
        break;
    case Error::InsufficientData:
        break;
    case Error::CalculationFailed:
        break;
    }
}

What makes this better than the Value or Error Code solution is the fact that you cannot access an invalid value by accident.

Error Handling: Exceptions

I have not worked in a large C++ code base that uses exceptions. I don't know what correct exception handling looks like in a large C++ project. Therefore, I'm not qualified to give advice about exception handling. However, I can point out a few things about exceptions that I do know.

If you use the C++ standard library containers and functions, a lot of them throw exceptions. This means that you must read the C++ standard library specification to know what kind of exceptions are thrown. Here are a few examples:

std::vector::at throws an std::out_of_range exception. This means that if you access the vector with a wrong index, which is undefined behavior, the program doesn't just continue execution after such a critical error, it terminates.
std::stoi throws a std::invalid_argument or std::out_of_range exception. In this case you must catch the exception if you want to know if the string was converted correctly to an integer.
Both the keyword new and the function std::make_unique throw a std::bac_alloc exception. Both will either return a valid pointer or throw an exception.

From a safety perspective, it is good that the program doesn't continue execution after encountering critical errors. This means that an unexperienced developer using these functions may cause some crashes here and there, but they are not introducing undefined behavior or vulnerabilities to the program.

From a code structure perspective, it appears that exceptions greatly simplify the code because you don't have to handle the error case explicitly in every function. The error path in your code is hidden, except for the code that throws the exception, and the code that catches the exception. What remains is the success path, which not only affects the functions but also the function signatures. This means that if you decide to use exceptions for all your error handling, its a design decision that you cannot easily retrofit to an error code based error handling.

From a complexity perspective, exceptions trade code complexity for cognitive complexity. There is hidden logic when using exceptions. If you are writing modern C++, you are using exceptions everywhere, even without writing a single try-catch statement. Every function that you call can throw an exception, causing code execution to stop, and to immediately return from every function in the callstack. Every function call has a hidden return statement, so to speak.

What does this mean for your code?

There is the notion of exception safety. How well is your code prepared for a situation in which every function call could potentially throw an exception. If you want your code to have at least a "basic exception guarantee", you must use RAII everywhere in order to not leak resources. You can read more about RAII in the section Class: RAII.

Another thing that puzzles me is, how do you know what exceptions to catch by looking at the functions that you use. Do you have to document all possible exceptions in your code documentation, like in the C++ standard library? Do you define what exceptions can be thrown in a certain domain and then only catch these exceptions?
The C++ Core Guidelines tell you that you should have an error handling strategy before you start a project, but what is a good exception handling strategy? I feel like you are left to figure this out on your own.

Lastly, not all languages have the concept of exceptions. This means that you have to learn how error handling works in the respective language.

Let me use an example that I gave in Shared Value or Error Code Range:

// returns index of item or -1 if the item does not exist
int indexOf(Item* item);

With exceptions, you could write this:

// throws exception if item does not exist
size_t indexOf(Item* item);

The benefit is that you are not halving the range of valid indices, and the function signature conveys the success path. However, is this really a function that should throw an exception?

Maybe this is a better choice:

std::optional<size_t> indexOf(Item* item);

I am posing this question because it might be the case that even with exceptions, there are cases where you still need to know how to do error handling without exceptions.

Namespace

In C and C++ there is the one definition rule (ODR). We have already talked about the one definition rule as it relates to functions. This rule also applies to types, variables, and enumerations.

When you are in control of all the source code in your project, this might not be such an issue because you can resolve name collisions as they appear during development. However, when you include external libraries in your project, resolving name collisions becomes a much bigger problem.

This is why many C libraries follow a naming scheme, where library functions have a small prefix. For example the SDL library uses the prefix SDL, to differentiate its types and functions from other libraries.

SDL_Window* window = SDL_CreateWindow(...);

The GLFW library uses the prefix GLFW:

GLFWwindow* window = glfwCreateWindow(...);

The way that C++ tries to solve this is by using namespaces. The most obvious example of this is the C++ standard library, that provides its types and functions via the std namespace. This makes sense because the types and functions that it defines, like std::array and std::vector, have short names that would otherwise clash with functions and types that you might have in your own project.

Other languages have the concept of modules or packages, and named imports, which deal with name collisions differently than C++ does. Every language has to deal with name collisions in some way or another.

Class

The classes chapter focuses exclusively on the object-oriented language features.

All of the points that I listed in the Structure chapter also apply to classes. Structures and classes are technically identical, except for the default visibility of their members. The members of structures are public by default. The members of classes are private by default.

Classes combine data and functions. A class without functions is just a struct. A class without data is just a namespace.

Classes are similar to interfaces, in the sense that the design of a class suggests how the class should be used. Therefore, when designing classes, you should think of how you want the class to be used.

Classes allow you to create rich user-defined types with syntax that feels natural in the language, as well as deep and intricate semantics that are unique to that type. Therein also lies the complexity of classes.

Classes impose structure. They encourage you to add more methods to existing classes, and to add properties close to these methods, rather than to use free-standing functions and structures that can be accessed by everyone. Once you add a method to a class, it is harder to generalize the method, or to move the method to a different class, or to make it a free-standing function.

Class: Object

In C, an object is defined as "a region of data storage in the execution environment, the contents of which can represent values." Objects and alignment - cppreference.com

That is a very general definition and only tells you that an object is something that lives in memory. As such, it has an address, a size, and a value, among other properties.

In C++, the definition is mostly the same as in C, with a few additions that don't exist in the C programming language.

In object-oriented programming, an object is generally defined as something that stores state and has some functions associated with it.

In object-oriented programming languages with classes, an instance of a class is usually what you refer to as the object. The object has member variables and member functions that you can call on the object, to modify its state.

What is more important than the definition, is how you use objects in your program, how many objects there are of a type, who owns these objects, and how long the lifetime of an object is.

Multiple Objects of a Type

In my personal opinion, object-oriented programming is a good fit for programs that need to handle multiple objects of one type.

When writing a program, there are some things that you only need one object of, and other things that you need multiple objects of. For example, in a text editor, you might have multiple text file objects that store the required data and state for the text file to be displayed, edited, and searched correctly. To manage the open text file objects, you will probably have one editor object.

To use the text file objects, you have to be able to address them individually by some kind of unique identifier.

In C, you might see the following function calls:

Document *docA = createNewDocument();

openFile(docA, "foo.txt");
findWord(docA, "foo");

Document *docB = createNewDocument();

openFile(docB, "bar.txt");
findWord(docB, "bar");

closeAllDocuments();

Notice how each function gets passed a reference to the object that should be modified. Notice also how the createNewDocument() and closeAllDocuments() calls imply that there must be one global instance that manages the documents. This global instance does not need to be explicitly referenced in these calls because, by design, there can only be one.

In C++, you can use classes and write the following code:

Document* docA = createNewDocument();

docA->open("foo.txt");
docA->findWord("foo");

Document* docB = createNewDocument();

docB->open("bar.txt");
docB->findWord("bar");

closeAllDocuments();

It is understood that calling a method of an object is equivalent to calling a function on an object.

When you use classes, there is the question of which functions should be class methods, and which functions should be free-standing functions. With a concept as broad as a Document class, all functions that operate on the class will seem to be a good fit for class methods. However, this can easily lead to a Document class with hundreds of methods, making it difficult for users to find the few methods that they need for implementing their use-case.

For example, you might want to search for a word in the document. You might argue from an object-oriented perspective, that findWord should be a method, so that you can invoke it by writing doc.findWord(). I have also seen equivalents to creating a WordFinder class in such situations. If you approach it from an interface design perspective, you might land somewhere in the middle between extending the Document class with yet another method, and creating a separate class for each operation that you can do on a document. This might be a collection of free-standing functions that provide operations on a document, separated by different categories or use-cases.

This is where I want to reiterate that class design is very much interface design.

Data Structures as Objects

In my personal opinion, object-oriented programming is a good fit for data structures, where the coupling of data and functions comes very natural. This is due to the fact that data structures can be defined by a fixed amount of functions, that give the user access to all of the data, in a controlled manner.

Another sign that object-oriented programming is a good fit for data structures, is that you usually have multiple instances of a data structure in your program. This means that whenever you access or modify an instance of the data structure, you have to reference the object in the function call.

Have a look at the following C++ example:

struct Stack;

void init(Stack& stack, size_t size);

void free(Stack& stack);

void clear(Stack& stack);

bool isEmpty(const Stack& stack);

int peek(const Stack& stack);

void pop(Stack& stack);

void push(Stack& stack, int value);

This interface results in the following user code:

Stack stack;

init(stack, 10);

push(stack, 1);
push(stack, 3);
push(stack, 7);

peek(stack);
pop(stack);

if (!isEmpty(stack))
    clear(stack);

free(stack);

Using a class, you can rewrite the interface as such:

class Stack
{
public:
    Stack(size_t size);

    ~Stack();

    void clear();

    bool isEmpty() const;

    int peek() const;

    void pop();

    void push(int value);

private:
    ...
};

Which results in the following user code:

Stack stack{size};

stack.push(1);
stack.push(3);
stack.push(7);

stack.peek();
stack.pop();

if (!stack.isEmpty())
    stack.clear();

// stack is freed at the end of the scope (RAII)

Except for the RAII part, you could argue that this is a purely aesthetic choice, and you would be right. But the class fully encapsulates what the data structure is. You cannot remove any methods without breaking the data structures. Adding new methods to the data structure doesn't add any more value to it.

Class: Generalization

The same ideas about generalization that applied to structures also apply to classes. The only difference that I want to discuss is generalizing classes by using class templates because they tie in perfectly with the use of data structures, and the use of the C++ standard library containers.

Class Template

The class template is very similar to the function template. It allows you to insert type placeholders for the data and methods that you define, that are resolved at compile time.

Class templates are a code generation tool.

Class templates have the tendency to spread through the code. If you are not careful, you might create layers of abstract templated code that is impossible to reason about and to debug. This is why I generally avoid templates, unless I'm using them for code generation, like the example below.

Taking the stack example from above, we can see that the stack data structure only works for integer values. If we wanted to store floating point values on the stack, we would have to copy the implementation and create a FloatStack.

Using templates, we can define a stack that works for different primitive types, and use the compiler to generate different version of that stack for us. This might look something like this:

template <typename T>
class Stack
{
public:
    Stack(size_t size) { ... }

    ~Stack() { ... }

    void clear() { ... }

    bool isEmpty() const { ... }

    T peek() const { ... }

    void pop() { ... }

    void push(T value) { ... }

private:
    std::vector<T> buffer;
}

Or we could use the stack that is provided by the C++ standard library:

std::stack<int> integers;
std::stack<float> floats;

using FrameStack = std::stack<Frame>;

In most cases, a class template has to be completely defined in a header file, otherwise they cannot be used in other parts of the program to generate the required code. This means that class templates will leak implementation details.

Class: Abstraction

The points regarding Structure: Abstraction are also true for classes.

In C++, there is the concept of an abstract class. The abstract class is abstract in the literal sense, it represents a concept that does not have a concrete manifestation. The abstract class is also abstract in the technical sense, you cannot create an object or instance of an abstract class. You must derive from it and implement its pure virtual functions.

With a language feature that has "abstract" in the name, it is guaranteed that using this feature will add abstraction and complexity to your code. The question is how useful is it to talk about abstract things in your code, that, by definition, cannot have a real representation. A user does not ask to have abstract shapes in their drawing application. They want lines, rectangles, polygons, etc.

Therefore, abstract classes primarily serve a technical purpose. Abstract classes are the object-oriented approach to function pointers, making use of the virtual function table. By deriving an abstract class, and overwriting its virtual functions, you are influencing the selection of the function behavior at run-time. I discussed the basics of this in Function: Function Pointer and Structure: Dynamic Interface.

Abstract classes can be used as base classes, but not every base class must be an abstract class.

Although the base class is very similar to the dynamic interface, each have their own purpose. The interface sits at the domain boundary and serves as a view into the domain for outside users. The base class sits at the core of the implementation and serves as an abstract common denominator between specialized implementations.

Class: Encapsulation

One aspect of encapsulation that I want to discuss is restricting access to members of a class.

You can use the keyword private to restrict access to member functions and member variables. This means that code that uses a class cannot call private member functions, and it cannot access private member variables. Private members can only be accessed in the implementation of the class.

Restricting access to members is especially useful when you have class invariants. For example, if you have a class that stores items in an array and a size that indicates how many items are in the array, then adding an item to the array must always be followed by an update to the size. Failing to update the size correctly will result in errors when accessing the array.
The solution is to make the array and size members private, to add public methods like add and remove, and to ensure that these public methods update the size correctly.

However, making members private only restricts the access. It does not hide private members and their types from the rest of the application. If a class has a private member variable whose type is std::vector, any code that includes the header with this class, must also include the header of std::vector. Classes unavoidably leak implementation details, unless you explicitly design them not to do so. And there are plenty of situations where you must hide what files are included, otherwise you leak platform or framework specific dependencies to your users.

To truly hide the implementation details of a class, a so called Pimpl (pointer to implementation) is often used. This technique is essentially an opaque type, as discussed in Structure: Opaque Type. The pointer to the implementation is stored as a private member variable in the class. The public methods of the class then call the methods of the Pimpl, which performs the actual computation.

I want to stress the fact that hiding or restricting access to data and functions is not unique to object-oriented programming. As I have discussed above, C++ classes don't do the best job at hiding implementation details, requiring techniques that are traditionally used in the C programming language, to truly hide implementation details.

Class: Inheritance

Most canonical examples of inheritance show ways of modeling is-a relationships through class hierarchies. The Shape (Rect, Circle) example, or the Animal (Cat, Dog) example come to mind.

In reality, however, these examples are very misleading because they suggest that you can and should model your problem domain in terms of class hierarchies.

Instead, you should think of inheritance as a tool that allows you to extend the properties and functionality of an existing class. This is what it is doing on a technical level. When you are extending a class, you are adding new member variables and methods that can be used with instances of the derived class.

However, C++ already has tools to accomplish this by aggregating structures, as outlined in Structure: Aggregation. Therefore, it is not always obvious when inheritance is a good tool or not. The Google C++ Guidelines encourage you to use aggregation over inheritance. The C++ Core Guidelines has dozens of guidelines related to classes, that address all kinds of quirky and outright nasty problems that arise from using inheritance.

Classes have a calcifying effect on your code, especially when you use deep class hierarchies. If your logic is spread throughout multiple levels of the hierarchy, and you have many specializations and extensions, then you are not going to change this class hierarchy anymore. It is more likely that you add more classes to the hierarchy.

If I use inheritance, I use it to extend the functionality of classes, and I try to restrict the class hierarchy to two or three levels at most (including the base class).

Class: Polymorphism

In the section about tagged unions, we saw an example of a type that could be one of many different message types. We used an enum tag to differentiate the message type at run-time.

We could have solved this by using inheritance and polymorphism:

class Message
{
public:
    ~Message() = default;

    virtual void print() {}
};

class ErrorMessage : public Message
{
public:
    void print() override { ... }

    int code = 0;
    std::string reason;
};

class DataMessage : public Message
{
public:
    void print() override { ... }

    std::vector<double> data;
};

This allows us to pass objects of type Message that could either be an ErrorMessage, a DataMessage, or any other type that inherits from the Message class.

To go from the base class Message to the derived class ErrorMessage, you can use the keyword dynamic_cast, which checks what type of message the object refers to, before it is cast to the derived type in the class hierarchy. If the type is not correct, dynamic_cast returns a nullptr. dynamic_cast uses the run-time type information (RTTI), to do this check at run-time.

void someFunction(Message* message)
{
    message->print();

    if (auto* errorMessage = dynamic_cast<ErrorMessage*> (message))
    {
        ...
    }
    else if (auto* dataMessage = dynamic_cast<DataMessage*>(message))
    {
        ...
    }
    else
    {
        ...
    }
}

Polymorphism comes at the cost of abstraction. You don't know the type of the object by reading the source code. You don't know what implementation is used when you call a virtual method.

Inheritance requires you to use references or pointers if you want to refer to the base class of an object. Otherwise, this may result in object slicing.

Class: Dynamic Interface

Class interfaces provide you with an object-oriented way of defining dynamic interfaces. This is a run-time feature which is implemented by using a virtual function table (see Structure: Dynamic Interface).

In C++, a class interface is an abstract class that consists only of pure virtual functions (and an empty virtual destructor).

In Structure: Dynamic Interface, we saw an example of a function pointer table:

struct StreamInterface
{
    bool (*open) (Stream*);
    bool (*close)(Stream*);
    int  (*write)(Stream*, const void* buffer, size_t size);
    int  (*read) (Stream*, void* buffer, size_t size);
};

This can be defined as an abstract class with only pure virtual functions:

class IStream
{
public:
    virtual ~IStream() = default;

    virtual bool open() = 0;

    virtual bool close() = 0;

    virtual int write(const void* buffer, size_t size) = 0;

    virtual int read(void* buffer, size_t size) = 0;
};

The implementation is provided by a derived class that must implement all of the functions:

class FileStream : public IStream
{
public:
    FileStream(File* file);

    ~FileStream() override;

    bool open() override;

    bool close() override;

    size_t write(const void* buffer, size_t size) override;

    size_t read(void* buffer, size_t size) override;

private:
    File* file;
};

You can then create a FileStream and return it to your users as a pointer or reference to an IStream. Or you can pass it to a function that accepts a pointer or reference to an IStream. Be careful to pass the IStream by reference, and not by value, otherwise this can result in object slicing.

The solution is dynamic because you can have different implementations of IStream that can be changed at run-time (i.e. FileStream, NetworkStream). The base class serves as an interface because you are not leaking implementation details about the FileStream to the users of IStream.

Although the dynamic interface is very similar to a base class on a technical level, each have their own purpose. The interface sits at the domain boundary and serves as a view into the domain for outside users. The base class sits at the core of the implementation and serves as an abstract common denominator between specialized implementations.

Class: RAII

RAII stands for Resource Acquisition Is Initialization.

RAII is a useful technique that you should be aware of even if you don't use it all the time. The fundamental problem that it tries to solve is managing resources that are acquired at the beginning of a scope, that need to be released before the scope is exited.

Below is an example of a programming mistake that leaks memory:

{
    MyObject* objA = new MyObject();

    if (objA->someCondition() == false)
        return; // ERROR: leaks objA

    MyObject* objB = new MyObject();

    if (objB->someCondition() == false)
        return; // ERROR: leaks objA and objB

    objA->doStuff();
    objB->doStuff();

    delete objB;
    delete objA;
}

As you can see, the instances of MyObject are leaked because we exit the scope before deleting the objects. There are many different ways to deal with this.

One solution you might see is deleting the object before every return statement:

{
    MyObject* objA = new MyObject();

    if (objA->someCondition() == false)
    {
        delete objA;
        return;
    }

    MyObject* objB = new MyObject();

    if (objB->someCondition() == false)
    {
        delete objB;
        delete objA;
        return;
    }

    objA->doStuff();
    objB->doStuff();

    delete objB;
    delete objA;
}

This often leads to a lot of duplicated memory management code.

In some code bases you might see the use of a goto:

{
    MyObject* objA = new MyObject();

    if (objA->someCondition() == false)
        goto errorA;

    MyObject* objB = new MyObject();

    if (objB->someCondition() == false)
        goto errorB;

    objA->doStuff();
    objB->doStuff();

errorB:
    delete objB;
errorA:
    delete objA;
}

This often leads to hard to follow code and you have to be extra careful with the order of the goto cases.

Another solution you might see is structuring all the code such that there is only one exit point from the function:

{
    MyObject* objA = new MyObject();

    if (objA->someCondition())
    {
        MyObject* objB = new MyObject();

        if (objB->someCondition())
        {
            objA->doStuff();
            objB->doStuff();
        }

        delete objB
    }

    delete objA;
}

This often leads to deeply nested if statements.

On first sight, regardless of the solution, it looks like you have solved the memory leak problem. But that's not quite true. C++ has exceptions. If someCondition() or doStuff() throws an exception, the object will not be deleted and you leak memory.

Using std::unique_ptr and its RAII features, you could write the following code:

{
    auto objA = std::make_unique<MyObject>();

    if (objA->someCondition() == false)
        return;

    auto objB = std::make_unique<MyObject>();

    if (objB->someCondition() == false)
        return;

    objA->doStuff();
    objB->doStuff();
}

Now the code meets the basic exception safety guarantee. Regardless of how many returns are used in this scope, or how many exceptions are thrown, std::unique_ptr makes sure that the resource is always released when exiting the scope.

So what does this all mean?

If you have a resource that should be managed inside a scope, and you want to be 100% sure that the resource is always released, you can use RAII to your advantage.

The cppreference.com RAII example shows how you can use RAII to safely lock() and unlock() a mutex inside a function scope, with multiple return points, using a std::lock_guard. You can create similar guard classes for your own use-cases.

To take advantage or RAII, you must define the constructor and destructor of a class. The constructor either creates the resource, or initializes the class with a given resource. The destructor releases the resource.

Ownership & Lifetime

You might have noticed that I didn't cover topics like smart pointers in this article.

Smart pointers try to address several programming concerns:

Memory management
Ownership
Lifetime

What technique you use to manage memory is often domain and programming language specific. The only commonality is that you need to know about memory management, even in languages with garbage collection.

One aspect of solving the memory management question is that of ownership. Who owns the memory, are different domains sharing ownership of that memory, and who is responsible for freeing it? In C++ you can use std::unique_ptr and std::shared_ptr to explicitly denote ownership of a dynamically allocated object. But that's not your only option.

Another aspect to consider is the lifetime of an object. Object lifetime fundamentally shapes the structure of your program, even if you are not consciously thinking about it. This is a skill that is not taught well enough in programming education.

There are four different storage durations in C++.

Static storage duration
Automatic storage duration
Dynamic storage duration
Thread storage duration

Whenever you use a variable or create an object, you have to decide which of these storage durations it should have, thereby determining its lifetime.

However, the question about the lifetime of an object goes beyond the storage duration. For example, if you are programming a text editor that can display multiple text files, then each text file may be a collection of objects that only need to live as long the text file is open. When the text file is closed, these objects can be freed. The object that manages the open text files will probably have a longer lifetime than the individual text files. And the program lifetime will probably be longer than that of the manager.

Even in this simple example, you can see that there are hierarchical dependencies between lifetimes, going all the way up to the lifetime of your program. Breaking down the lifetime of your objects into these larger groups will help you in finding a suitable memory management strategy for your application.

But it does not stop there. If you have persistency between separate program executions, you have to consider what run-time state has to be stored to disk, what stored state has to be loaded into RAM, and when do they happen with regards to your object lifetimes.

It is crucial to identify and define the lifetime of the objects in your program. This will guide the structure and the logic of your program.

Code Structure

The act of structuring code has a calcifying effect. The more structure you add to your code, the harder it becomes to shape it.

When you are prototyping an idea you want the code to be mendable, while you are exploring the space of possible solutions. Any kind of structure that is enforced by the programming language will slow you down.

At some point, you want some parts of the code to be more structured, more rigid, and less likely to be changed, like interfaces. Whereas other parts of the code need to remain flexible and easy to change, so that the code can be adapted to new requirements.

It is commonly understood that if you have good interfaces and proper modularization, it should be easy to change the implementation. This applies to any programming language.

However, I want to discuss the structure inherent in the concepts of a programming language. Using these concepts will impose structure on your code.

Code Structure: Type

C++ is a statically typed programming language. As such, the compiler checks if the type is used correctly. If a type is not used correctly, the compiler will stop the compilation with an error. This helps to avoid programming mistakes early on.

Type safety is also important when making structural changes to the code. If you change a type that is used in multiple places, you can rely on the compiler to guide you through the necessary changes. This usually means having to change dozens if not hundreds of functions and call sites that rely on this type.

The more complicated the type system is, and the more complicated your types are, the more difficult it becomes to do these kinds of changes. Therefore, you should carefully consider what types you use for functions and interfaces, as this will determine the structure of your code.

Code Structure: Core Type

Core language types like strings, slices, tuples, tagged unions, optionals, and results, are semantically rich, and are good default solutions. They have been tested across many languages, over many years. With the help of statements, they can shape the structure of your code into a set of uniform and agreed upon language patterns.

Having strong core types, that are integrated well into the language, and solve 95% of your issues, is what makes programming day-to-day easier. You don't have to waste your time solving language problems because the language already provides you with a general solution.

Unfortunately, C++ is lagging behind other languages in this regard. There have been improvements in recent years but the support for some of the core types remains rudimentary, or feels like a work-in-progress.

As of this writing, std::string and std::string_view are missing standard string manipulating functions like split and trim. The UTF-8 support in C++ is slowly improving, while UTF-8 has been the standard in other languages and domains for decades. Converting between string representations is a pain. The std::variant visitor is a poor solution to what should be a proper match statement. std::span didn't have a bounds-checking access function until recently (C++26).

Code Structure: Statement

Statements, conditions, and loops, are the essential tools for structured programming. They are semantically rich, they can fundamentally improve the correctness of the code, they make programming day-to-day easier, and their use has a direct impact on the execution and logic of a program.

For example, the range-based for loop has rightfully become the standard in any modern programming language. It is a good default. You have to work harder if you want an index-based for loop. If you use the C++ standard containers, you will rarely have to write your own iterators. This is much better than having to use iterators.

Core language types and statements go hand in hand by supporting and playing off each other. Both can shape the structure of your code into a set of uniform and agreed upon language patterns. The language can guide you towards these patterns by providing good statements.

Unfortunately, C++ is lagging behind other languages in this regard. There have been improvements in recent years but some concepts are lacking proper language support, or they feel like a work-in-progress.

std::variant needs a match statement. Switch statements do not work with string comparisons. Numeric ranges, that are expressed as 1..10 in other languages, are currently expressed as std::ranges::iota_view or std::views::iota in the C++ standard library.

Code Structure: Error Handling

In the Error Handling chapter, I have discussed different error handling techniques in detail.

The error code techniques add a significant amount of structure to your project. In every function you have to implement the actions that the function has to take, as well as all the plumbing that is necessary to pass and handle errors that can occur. This gives you a direct view on what the code is doing. There are no hidden control paths.

Exceptions hide this structure and trade the code complexity for cognitive complexity. You have to know how exceptions work behind the scenes. All your resources have to be handled with RAII, otherwise you risk leaking memory. If done correctly, the error handling is reduced to the point where the exception is thrown, and where the exception is caught. You can focus on implementing the success path without having to worry about error propagation.

Just like with statements, error handling is such an integral part of programming that you cannot really avoid it, if you want to program correctly.

Code Structure: Interface

Interfaces are designed to be stable. They are a contract between the user of an interface and the developer that implements it. As such, interfaces impose structure on both their users and implementers, with the intent to establish a common abstraction that they can talk about.

Depending on the granularity or stability of an interface, you may have to write another abstraction around it, that is more specific to the use-case. You might also have to transform between different data representations, since the model in your application is often not structured in a way that it can be passed directly to the interface.

This imposed structure creates a lot of work and makes the program rigid. However, if good interfaces are put at the right places in a system, you can prevent implementation details and dependencies from being leaked across domain boundaries, allowing you to safely change the implementation at a later time. This is an investment into the long-term stability of your software structure.

Since interfaces have such a big impact, they have to be designed carefully. Otherwise, they can lead to fundamental problems in your software structure.

Code Structure: Class

Classes in C++ try to formalize what object-oriented programming means in the language and its implementation. Therefore, classes impose structure per definition, just like statements do in structured programming. Since classes are also user-defined types, all structural aspects of types that I mentioned above, also apply to classes. Classes also share similarities to interfaces because they expose a set of related functions whose behavior depends on the state of the object.

Once you start using classes, your problem solving toolset will mainly revolve around classes, and the use of object-oriented design patterns. You cannot use classes to their fullest without structuring all your code according to them. This would be acceptable if classes didn't have so many problems in C++.

Classes calcify your code. Especially when you start using inheritance. Having a deep class hierarchy, where functionality is split between different levels of the hierarchy, means that you are never going to change this hierarchy ever again. You are more likely to add more classes to the hierarchy.

Classes make it easy to create complexity. Especially when you start using polymorphism. If used haphazardly, you won't be able to figure out what your application is doing, without running it in the debugger.

Classes leak implementation details and dependencies. The keyword private only restricts access to certain members but it does not hide them. This generally means that when you include a class, you will also include some of its dependencies.

Classes lead to programming language problems like: "Which class does this function belong to?" This is a treacherous one and can lead to two extremes:

Classes that have too many methods, dealing with different use-cases and aspects that are irrelevant to the core concept of the class. A monolith of a class, also known as the god class.
A project with too many classes, each of them implementing one specific use-case. There is one class per file. Each class has one public method. A sprawling web of classes whose function only becomes apparent when viewed as a whole.

The class as a concept does not really answer any real structuring concerns on the single unit level, or module level, or systems level. But when you create a class, it has a serious structural impact on all of these levels, without the users of the class being able to opt-out.

And yet, we use classes all the time. They are a good tool. But I try to be more aware of their structural impact and push against it, when needed.

Code Structure: Template

I have mentioned several times before, that templates tend to spread through the code when you start using them.

When the compiler can deduce the template parameters it is not as bad, because the use of the template does not leak into the call site. But when you have to explicitly specify template parameters, or you have to pass your template parameters to other template functions or template classes, or you are doing compile time polymorphism with templates, this is when templates start to become viral inside your code base.

This can lead to a few problems:

The code is full of templates, making it harder to read and modify.
The template code is difficult to reason about because you do not know what types are used to generate the code.
Most template code must be visible to the caller in order for the code generation to work, thereby leaking implementation details and dependencies into all domains of your code.
Compile errors caused by templates have terrible error messages in C++.
Templates have compile time and compilation size implications that should not be ignored.

I haven't even mentioned all the template meta-programming aspects, which is like a whole different language inside of C++.

I usually stick with simple code generation for most of my day-to-day work. I'm looking forward to proper compile time execution and compile time reflection, for anything more advanced.

Code Structure: Abstraction

I have discussed how functions, structures, and interfaces create abstractions. Lets take a step back and look at how these abstractions affect the program structure.

Once a project becomes large enough, you can start to see the formation of different levels or layers of abstraction. For example, you might have a layer that wraps a platform specific implementation for how data is written to disk, or transferred over the network. The level above contains the business logic and its data models. On top of that, there is a level containing the UI implementation.

Ideally, each layer should be separated by an interface that would allow you to replace the implementation of each level without disturbing the other levels.
For this to work, there typically is a rule that dependencies only point down, and never point up. For example, The UI code can access the business logic, but the business logic must not access the UI code.

When you program in a code base with different layers of abstraction, you have to be aware of what programming style and language features are appropriate for that level. For example, if you are working on a mathematical function in a library, it is inappropriate to use logging in this function. First, you don't expect a mathematical function to log to a file if anything should go wrong. Second, you are adding a needless dependency to a logging library in a mathematical function that could be used anywhere in the code base.
You could argue that logging should be located closer to the UI level implementation. Functions below the UI level must communicate their error state up to the UI level.

You could also look at it from the perspective of language concepts. For example, if you are writing code in one of the lower layers, that directly calls platform specific APIs, it might be inappropriate to use design patterns like Factory, Strategy, and Builder classes, which are meant to compose and configure behavior in one of the higher layers. You can still use a function to create a complex object depending on a parameter, and you can still use a function to select different behaviors at runtime. These are valid techniques that can be used without the introduction of unnecessary structure in a layer that should mainly be focused on the API calls.

What I'm trying to say is, be aware of the code that you are working on. Which layer are you on? What is appropriate for this layer and what isn't? Otherwise you might be adding structure that feels like over-engineering to other people, or dependencies that should be avoided on certain layers.

Conclusion

In this article, I presented the concepts and features that I use to solve about 90% of the day-to-day problems that I encounter. The remaining 10% are either specialized code, or dealing with the intricacies of C++, that require features that are not universally applicable, like macros, iterators, operator overloading, etc.

I discussed the cost of using certain language features and what impact it has on the overall code structure and complexity. This discussion is by no means exhaustive. Nevertheless, I hope that it has become clear that you don't have to use a language feature if it is badly implemented, has a bad interface, or has bad syntax. Especially when the decision to use said feature means that your code becomes more difficult to maintain.

To summarize:

Functions and structures are by far the most useful concepts in any language. They give you enough ways to structure and generalize your code, as well as to create abstractions and interfaces.
Concepts like statements, arrays, slices, errors, and optional return values are universal. They exist in any language because they are at the core of what programs do.
To achieve dynamic behavior that is decided at run-time, rather than at compile time, you have the option of using conditions to select the desired behavior, or you can use one of the many different variants of function pointer indirection. This comes at the cost of introducing more abstraction into your code, which makes it difficult to read and debug. You don't know what implementation is used until run-time.
To generate different versions of code depending on a type, you have function templates and class templates at your disposal. Use them carefully because templates have the tendency to spread through the code once you start using them, making the code more difficult to read and debug.
Classes work well when you have a strong coupling of data and functions that form a closed unit, like a data structure. Be careful when using inheritance and polymorphism because they introduce abstractions and indirections. There are many rules about using inheritance and polymorphism correctly, that you don't need to consider if you don't use these features.
Use C++ language and standard library features that have a good syntax-to-semantics ratio. Consider wrapping the rest in utility functions that can be replaced when the language has improved the ergonomics of these features.

Overview

Introduction

Function

Function: Code Reuse

Function: Abstraction

Function: Generalization

Function Overloading

Function Template

Function: Interface

Function: Function Pointer

Selecting Behavior at Run-Time

Injecting Behavior at Run-Time

Function: Lambda (C++11)

Function: Lambda with Capture

Structure

Structure: User-Defined Types

Structure: Aggregation

Structure: Abstraction

Structure: Generalization

Structure: Opaque Type

Structure: Dynamic Interface

Statement

Enumeration

Enumeration: Scoped Enumeration (C++11)

Union

Union: Tagged Union

Union: std::variant (C++17)

Array

Array: C-Style

Array: std::array (C++11)

Array: std::vector

Slice

Slice: C-Style

Slice: std::span (C++20)

Slice: std::string_view (C++17)

Error Handling

Error Handling: No Error

Error Handling: Success or Failure

Error Handling: Error Code

Error Handling: Shared Value or Error Code Range

Error Handling: Value or Nil

Error Handling: Value or Error Code

Error Handling: std::optional (C++17)

Error Handling: std::expected (C++23)

Error Handling: Exceptions

Namespace

Class

Class: Object

Multiple Objects of a Type

Data Structures as Objects

Class: Generalization

Class Template

Class: Abstraction

Class: Encapsulation

Class: Inheritance

Class: Polymorphism

Class: Dynamic Interface

Class: RAII

Ownership & Lifetime

Code Structure

Code Structure: Type

Code Structure: Core Type

Code Structure: Statement

Code Structure: Error Handling

Code Structure: Interface

Code Structure: Class

Code Structure: Template

Code Structure: Abstraction

Conclusion

Further Reading