
Exploring C++26

Checking out features in the latest C++26 spec


The C++26 spec is finally done, and Herb Sutter has called it the “most compelling release since C++11.” That is quite the claim given how much of an impact C++11 has had on the language. Herb’s enthusiasm is always infectious, so I figured I would join in on the fun and read the spec to see what new language features were added.

This article is more or less a list of findings I think will be useful for those of us who work in slightly more constrained environments, rather than an overall look at everything in the spec. I’ll try to look into features that help solve real problems within the embedded world and tend to be annoying to maintain (userspace daemons, debug utilities, platform layers, etc.), but some examples might be slightly esoteric to help convey the intent of the feature.

Before we begin, I should note the usual caveats. Those who are not used to reading ISO specs like the current C++26 draft should probably start with the cppreference page. The cppreference page is very “to the point” about what is being added, and if you need more detail, you can cross-reference the ISO PDF. From there, always check the cppreference compiler support section to see which features are available to you. As of this writing, the meta-clang layer supports Clang 20, and openembedded-core supports GCC 15.2. Pretty much all references will be from these links, so make sure to check those out if you need to cross-reference anything.

Why C++26?

A lot of embedded C++ ends up looking older than it needs to because teams optimize for predictability and portability. That is a totally reasonable approach, especially if you need massive support across multiple toolchains. The downside is that codebases with homegrown abstractions, such as custom containers, traps, overflow helpers, and other custom platform code tend to become less predictable as time goes on. C++26 standardizes several things that might help make these a bit more robust in nature without relying on potential UB. Let’s jump into it.

std::execution

Let’s start with a familiar problem from the embedded Linux world. A service or daemon starts out very simple. It might read a message, validate it, possibly transform it, publish it, and eventually update some state. But then it grows threads, queues, timeouts, and more. Before long, the real logic is hidden under a sea of synchronization, and the daemon becomes a massive pain to maintain.

Enter std::execution - the execution control library in C++26 that standardizes a model for async execution around schedulers, senders, receivers, and composition algorithms. The core motivation behind P2300 was that the older std::future/std::promise/std::async model did not scale well to real async systems and did not compose cleanly enough for modern workloads.

std::execution works by allowing users to write code that can be used with different execution resources without baking threading or queueing behavior into application logic. It gives us a standard way to express async workflow as a series of stages, allowing us to focus on the work itself.

Example: Frame reading daemon

Imagine a daemon that reads one frame from a device or socket, validates it, and then publishes it to the rest of the system. std::execution is perfect here as we can break each section up into individual steps, making it way easier to test and reason about.

#include <execution>
#include <stdexcept>
#include <vector>
#include <cstdint>

struct frame {
    std::vector<std::uint8_t> bytes;
};

// Let's pretend these are implemented elsewhere in the daemon
frame read_frame();
bool validate_frame(const frame& f);
void publish_frame(const frame& f);

template <typename Scheduler>
auto process_one_frame(Scheduler sched) {
    return std::execution::schedule(sched)
         // Step 1:
         // Run the first stage on the provided scheduler. This will help
         // keep scheduling policy separate from business logic
         | std::execution::then([] {
               return read_frame();
           })
         // Step 2:
         // Next, validate the frame we just read. If the validation fails,
         // stop the pipeline by throwing an error
         | std::execution::then([](frame f) {
               if (!validate_frame(f)) {
                   throw std::runtime_error("ERROR: Invalid frame");
               }
               return f; // Return validated frame for next stage
           })
         // Step 3:
         // Publish the validated frame to the next part of the system
         | std::execution::then([](frame f) {
               publish_frame(f);
           });
}

The key idea here is that the async workflow is described as a clear set of steps:

  1. Read a frame
  2. Validate it
  3. Publish it

Something like this is much easier to follow than spreading the same logic across multiple threads, queue handlers, or callback entry points.

For embedded Linux developers, std::execution’s value is that it makes daemon-style async flows look like normal code again. The scheduler stays configurable, while the actual packet or frame handling logic remains easy to read and maintain.

Static Reflection

Static reflection is one of the headline C++26 changes. In fact, it is so big that I could probably write an entire blog post on reflection alone. If you have yet to see Herb’s talk on static reflection where he mentions that C++ has the ability to become the new lingua franca, then definitely take some time to check it out as it is extremely fascinating…ambitious claims aside.

The design in P2996 centers on reflection values (std::meta::info), the reflection operator ^^ (love calling this guy the “cat’s ears operator”), compile-time queries through consteval, and splicing reflected entities back into code.

The paper explicitly covers extremely powerful and complex paradigms such as member enumeration and related introspection capabilities. These make real compile-time structural queries possible which can be insanely powerful. However, for embedded Linux, the best way to think about reflection is probably as a way to help delete duplicate metadata. A lot of platform code still defines the same thing twice - once as a C++ type, and again as a macro list, serializer table, or maybe something like a config schema. Reflection lets the type itself become the source of truth. Heck, even getting rid of legacy X-macro use cases would be a huge win alone.

Example: Dumping a config struct without a parallel table

#include <meta>
#include <string>
#include <type_traits>
#include <cstdint>

// A simple config struct that we want to dump as text
struct daemon_config {
    std::uint32_t poll_ms;
    std::uint32_t retry_count;
    bool verbose;
};

template <typename T>
std::string dump_config(const T& obj) {
    std::string out;
    // This tells the compiler what members we are allowed to inspect
    // from the current location
    constexpr auto ctx = std::meta::access_context::current();

    // Ask reflection for all non-static data members of T
    //
    // ^^T moves from the "normal" world into the "reflection" world.
    // The result is metadata describing T, not an object of type T
    //
    // nonstatic_data_members_of(...) returns compile-time metadata
    // for each normal data member in the struct
    template for (constexpr auto m :
                  std::meta::nonstatic_data_members_of(^^T, ctx)) {
        // Get the member name as text and append "name=" to the output str
        out += std::meta::identifier_of(m);
        out += "=";

        // Turn the reflected member back into "normal" member access
        //
        // If m reflects "poll_ms", this is like writing: obj.poll_ms
        // If m reflects "retry_count", this is like writing: obj.retry_count
        const auto& value = obj.[:m:];

        // Handle bool separately so we print "true"/"false" instead of "1"/"0"
        if constexpr (
            std::is_same_v<std::remove_cvref_t<decltype(value)>, bool>) {
            out += value ? "true" : "false";
        } else {
            // For int fields in this struct, convert the val to text
            out += std::to_string(value);
        }
        // End this line before moving to the next reflected member
        out += "\n";
    }

    return out;
}

The goal of this function is to turn a struct into text like:

1
2
3
poll_ms=1000
retry_count=3
verbose=true

All without having to write per-field code. Normally, you would hand-write a helper that builds the std::string, but that duplicates the struct manually, and adding a field later requires updating not only the struct but also every one of those helper functions. Reflection is especially attractive in those kinds of use cases. Examples off the top of my head would be:

  • Config dumps (as shown above)
  • Serializer/deserializer glue code
  • ABI or layout verification helpers

Those tend to be everywhere in our codebases, and we can drastically reduce areas where type changes can silently drift from their associated glue code.

This one does come with a bit of a cost, however. Given how powerful these facilities are, it is very easy to get into the weeds with dense template metaprogramming. If your team is not used to reading modern C++, there is also an argument that the X-macro alternative is more straightforward. Those are valid arguments, so you need to weigh the maintenance win against readability and, possibly, the cost of training people on modern C++.

<hazard_pointer> and <rcu>

Many platform daemons read things like:

  • Subscription lists
  • Cached device state
  • Config snapshots

The hard part of making those structures concurrent is often not the swap itself, but rather reclaiming old objects safely as readers may still hold references. <hazard_pointer> and <rcu> help solve the problem in a standardized manner.

Read-Copy-Update

Let’s first start with Read-Copy-Update (RCU), as it is often the better starting point thanks to the usage model being fairly straightforward to explain. Readers enter a cheap, read-side critical section, load the current snapshot, use it, and then exit. Writers publish a replacement and retire the old snapshot after pre-existing readers are guaranteed to be out.

The <rcu> interface includes a series of classes and functions to help achieve this sort of scenario in a standardized way.

Example: Config snapshot

#include <rcu>
#include <atomic>
#include <memory>
#include <vector>
#include <string>

// This is the actual data we want to share between threads.
// Readers will look at one snapshot at a time, while
// writers will build a brand new snapshot and then swap it in
struct config_snapshot : std::rcu_obj_base<config_snapshot> {
    std::vector<std::string> entries;
};

class shared_config {
public:
    // ptr_ always points at the "current" snapshot that readers should use
    shared_config() : ptr_(new config_snapshot{}) {}
    // The caller gets a pointer to the current snapshot here.
    // While the read-side lock is held, RCU guarantees that this snapshot
    // will not be destroyed out from under us
    const config_snapshot* acquire() const {
        domain_.lock();
        // memory_order_acquire ensures later reads of the snapshot's contents
        // see a fully published obj
        return ptr_.load(std::memory_order_acquire);
    }
    // After this, the caller must no longer keep using the snapshot ptr
    void release() const {
        domain_.unlock();
    }
    // Replace the current configuration with a brand new snapshot.
    // We build a fresh snapshot and then atomically publish it
    void replace(std::vector<std::string> new_entries) {
        auto* next = new config_snapshot{};
        // Fill it in completely before publishing it
        next->entries = std::move(new_entries);

        // Atomically swap the new snapshot in as the live one.
        // ptr_ now points to "next", "old" is the prev snapshot
        auto* old = ptr_.exchange(next, std::memory_order_acq_rel);
        // retire(...) tells RCU to destroy the old snapshot later
        // once it is safe and no readers depend on it
        old->retire(std::default_delete<config_snapshot>{}, domain_);
    }

private:
    // Use the default RCU domain, which tracks read-side critical
    // sections and deferred cleanup (rcu_domain itself is not
    // user-constructible; rcu_default_domain() hands out the shared one)
    std::rcu_domain& domain_ = std::rcu_default_domain();

    // Atomic pointer to the current live snapshot.
    std::atomic<config_snapshot*> ptr_;
};

Many services need “cheap” concurrent reads of a current view of state with infrequent updates. The standard library now gives you an RCU vocabulary for that use case instead of requiring one to write a custom reclamation subsystem.

Hazard Pointers

Many people are moving towards lock-free designs, and C++’s standard library is right there to assist. Hazard pointers solve a very specific problem in lock-free code - safely looking at an object that might otherwise be destroyed by another thread. One of the hardest parts of lock-free programming is knowing whether the object a pointer refers to is still alive by the time you actually dereference it. If another thread removes that node from the structure and frees it at just the wrong moment, a reader can end up following a dangling pointer.

Hazard pointers prevent that by letting a reader announce that it is currently using an object, so that object will not be reclaimed until the reader is done. This makes them useful for lock-free stacks, queues, linked lists, and similar structures where readers or pop operations may briefly hold a raw pointer while another thread could be trying to retire that same object.

The C++26 interface provides a straightforward way to express this. A reader acquires a hazard pointer object, uses `protect()` to safely claim the node it is about to inspect, and later calls `reset_protection()` once it is finished with that object.

Example: Hazard-protected node access

#include <hazard_pointer>
#include <atomic>
#include <cstdint>

// std::hazard_pointer_obj_base<node> allows this object
// to use the features of hazard pointers
struct node : std::hazard_pointer_obj_base<node> {
    std::uint32_t value{};
    std::atomic<node*> next{nullptr};
};

std::uint32_t read_head_value(std::atomic<node*>& head) {
    // This is a guard saying, "I may be using a node right now."
    auto hp = std::make_hazard_pointer();
    // Make sure the node we get back is now marked as in use
    // by this hazard pointer so it won't be reclaimed when in use
    node* p = hp.protect(head);
    // Standard empty list check
    if (p == nullptr) {
        return 0;
    }

    // p is now safe to dereference because it is protected by hp
    std::uint32_t v = p->value;
    // We're done, so we can remove the protection.
    // The node can become eligible for reclamation if no other
    // hp protects it.
    hp.reset_protection();

    return v;
}

The key thing to understand is that protect() is not just a normal pointer load, but rather a safe variant as it’s tied to the hazard pointer’s reclamation mechanism. That is the key difference between this and simply doing an atomic read of head and dereferencing the result directly.

If you’re looking for some sort of lock-free read paths but still need safe memory reclamation, mess around with the new hazard pointer feature. It might help clean up some existing codebases that could be implementing their own reclamation scheme.

Contracts

If you’re like me, you probably use assert() more often than you would like to admit. Traditional assert() is inherited from C and implemented as a macro. C++26 now gives us a standardized contracts mechanism for expressing assumptions about program state.

Contracts have three main forms: preconditions, postconditions, and the contract_assert statement. Preconditions and postconditions are attached to function declarations and lambdas, and they let us describe what must be true before a function runs, and what must be true after it returns. contract_assert is then used to check internal conditions inside the function or lambda body. These are a great way to describe things like, “this frame must be large enough,” or “this configuration should have been validated before we got here.”

Example: Polling interval

#include <contracts>
#include <cstdint>

// Pre: caller must not pass 0 as a poll interval of 0 makes little sense
// Post: returned poll interval is always within the supported range
std::uint32_t normalize_poll_ms(std::uint32_t requested_ms)
    pre(requested_ms != 0)
    post(result: result >= 100 && result <= 5000)
{
    // By the time we get here, the precondition should have been met
    contract_assert(requested_ms != 0);

    // Clamp small values up to the min supported interval
    if (requested_ms < 100) {
        requested_ms = 100;
    }

    // Clamp large values down to the max supported interval
    if (requested_ms > 5000) {
        requested_ms = 5000;
    }

    // Sanity check to guarantee the range before returning
    contract_assert(requested_ms >= 100 && requested_ms <= 5000);

    return requested_ms;
}

You could have used assert() here, but contracts allow you to express expectations and guarantees that are part of the interface itself, letting you emphasize intent. You define your assumptions up front, and then your contract_assert statements verify those internal assumptions.

One caveat worth pointing out is that contracts are runtime checks, not compile-time ones. They are expected to hold during execution, so if a path is never exercised, a contract violation on that path will never be observed. Still, this is a step in the right direction as C++ continues to move towards stronger correctness and safety expectations.

<stdckdint.h>

C++26 adopts <stdckdint.h> from the C23 standard, which includes checked integer arithmetic functions like ckd_add, ckd_sub, and ckd_mul. These perform integer arithmetic and report whether the result can be represented in the destination type. They are perfect for the areas where overflow bugs show up: computing buffer offsets, converting between units, totaling packet sizes, and so on. With these in the standard library, you can drop homegrown fallbacks and compiler-specific builtins that do the same thing. And since the header comes from C23, you can use it in your C codebases as well!

Example: Compute total size of a packet

#include <stdckdint.h>
#include <cstddef>

// Compute the total size of a packet made up of a header,
// a payload, and a trailer.
bool compute_total_size(std::size_t header,
                        std::size_t payload,
                        std::size_t trailer,
                        std::size_t& total) {
    std::size_t temp = 0;

    // temp = header + payload
    // If the sum doesn't fit in std::size_t, ckd_add returns true
    if (ckd_add(&temp, header, payload)) {
        return false;
    }

    // total = temp + trailer
    // Same as before, fail if the result cannot be represented
    if (ckd_add(&total, temp, trailer)) {
        return false;
    }

    return true;
}

That is simple and portable, allowing developers to write overflow checks that are explicit and hard to misread.

<debugging>

Finally, C++26 adds the <debugging> header and associated functions for handling things such as breakpoints, and it is genuinely useful for platform work. Debug-trigger hooks often end up wrapped in platform-specific code or #ifdef trees. A standard debugger-aware breakpoint looks much cleaner in comparison.

Example: Breakpoint-aware code

#include <debugging>
#include <cstdint>

struct device_status {
    std::uint32_t error_count;
    bool link_up;
};

// Called when a device hits a state we did not expect
void handle_unexpected_device_state(const device_status& status) {
    // If a debugger is attached, we can stop here to inspect
    // the current device state
    if (!status.link_up && status.error_count > 10) {
        std::breakpoint_if_debugging();
    }

    // Normal logging/logic cont...
}

Wrapping up

Most people are probably going to focus on reflection when talking about the latest C++26 spec. That makes sense, as it is probably the most powerful feature the spec gives us. As with any new version of the language, you do not need to use every feature. Assuming you have the ability to bump the version, see which features make the most sense for your project and start there. Small wins are more satisfying than major refactors to critical codepaths that take many months of testing and debugging. As always, test your code.

Bonus

Related, but make sure to check out this talk by Keith Stockdale on migrating the Sea of Thieves codebase from C++14 to C++20. It helps show some common pitfalls that can happen and is an interesting talk overall!

Further Reading

Some useful links on modern C++:

Happy metaprogramming!

This post is licensed under CC BY 4.0 by the author.