Archive for the ‘Programming Recipes’ Category

My previous post has been well received by the ecosystem so I have decided to write a short follow-up article on another classical problem that can be solved with similar patterns.

In finance, the drawdown is the measure of the decline from a historical peak in some series of data (e.g. price of a stock on a certain period of time).

For example, here is the hypothetical price series of the fake “CPlusPlus” stock:

You know, the 2008 crisis affected C++ too, renaissance in 2011/2012, some disappointments in 2014/2015 because C++14 was a minor release and “Concepts” didn’t make it, and nowadays the stock is increasing since programmers feel hopeful about C++20.

Drawdowns are the differences between the value at one year and the value at the previous maximum peak. For instance, in 2008 we have a drawdown of 28-12 (16) and in 2015 the “dd” is 35-21 (14):

The Maximum drawdown is just the highest value among all the drawdowns. In the series above, the maximum drawdown is 16.

In economics, MDD is an indicator of risk and so an important problem to solve.

Let’s see how to solve this problem and how to bring more value out of it.

The maximum difference

The MDD problem can be formulated as follows: given an array A, find the maximum difference A[j] - A[i] with j < i. The constraint on i and j makes this problem challenging. Without it we can just find the maximum and minimum elements of the array.

The bruce force approach is quadratic in time: for each pair of indexes i, j we keep track of the maximum difference. The code follows:

int MaxDrawdownQuadratic(const vector<int>& stock)
    auto mdd = 0;
    for (auto j=0u; j<stock.size(); ++j)
        for (auto i=j; i<stock.size(); ++i)
            mdd = std::max(mdd, stock[j] - stock[i]);
    return mdd;

When I facilitate Coding Gym, I suggest the attendees who are stuck to start from the naive solution, if any. The naive solution might be a red herring but it can lead to the optimal solution when we ask and answer key questions.

In this case the key question is “how can we remove/optimize the inner loop?”.

The current solution starts from stock[j] and it goes forward to calculate the differences between that value and all the following ones. This approach considers stock[j] like a peak and goes forward.

The key question turns to “how to avoid going through all the following elements”?

Think about the difference stock[j] - stock[i]. For every i, such a difference is maximized when stock[j] is the maximum value so far (for each j from 0 to i - 1). Thus, we can ignore all the other values since stock[j] - stock[i] would be lower.

The insight comes when we “reverse” our way of thinking about the pairs: we shouldn’t start from stock[j] and then go forward to find the lowest value. Instead, we should start from stock[i] having the previous biggest value cached!

So, instead of looking at each pair, we can just keep track of the maximum value preceding any other index. Here is the idea:

int MaxDrawdown(const vector<int>& stock)
    auto mdd = 0;
    auto maxSoFar = stock.front();
    for (auto i=1u; i<stock.size(); ++i)
        mdd = std::max(mdd, maxSoFar - stock[i]);
        maxSoFar = std::max(maxSoFar, stock[i]);
    return mdd;

The solution is linear in time.

Now it’s time for me to show you how to get more out of this problem. It’s time for me to inspire you to achieve more every time you face with puzzles. There is no limitation.

In the rest of the post, I’m just freely playing with the problem to express myself.

Emerging patterns

As I elaborated in the previous post, sometimes we can combine known patterns to solve problems.

I don’t have a special recipe that brings out patterns from code. Sometimes I just “see” them between the lines (as for Kadane). Other times I try some tricks to discover them. Or I don’t find any patterns at all.

I recently met again the MDD problem (after some years) and it looked to me similar to Kadane. I was biased by Kadane and my mind was unconsciously suggesting me to apply similar patterns to the MDD problem because the code looked similar. It might be dangerous! It doesn’t work every time. Ever heard about “the hammer syndrome”? This is the caveat of “thinking inside the box”. Anyway, in this case my sixth sense was right.

First of all I had an intuition: I realized that all the evolving values of maxSoFar  are totally independent from any other decision points of the algorithm. I could enumerate them separately. One trick to use when searching for patterns is asking the question “which computations can be isolated or extracted?”.

maxSoFar is just a cumulative maximum. For instance:

4 3 2 6 8 5 7 20

The cumulative maximum is:

4 4 4 6 8 8 8 20

The pattern that can generate such a series is prefix sum (when “sum” is not addition but maximum).

So I refactored the original code by isolating the cumulative maximum calculation into a separate vector:

int MaxDrawdown(const vector<int>& stock)
    std::vector<int> maxs(stock.size());
    std::partial_sum(std::begin(stock), std::end(stock), std::begin(maxs), [](auto l, auto r) { return std::max(l, r); });
    auto mdd = 0;
    for (auto i=1u; i<stock.size(); ++i)
        mdd = std::max(mdd, maxs[i] - stock[i]);       
    return mdd;

The next trick is to figure out if the loop hides another pattern. The question is: what kind of operation is underneath the calculation of mdd?

We have some hints:

  • at every step we calculate maxs[i] - stock[i] so we read the ith-value from two sequences,
  • every result of such a calculation is then reduced by applying std::max

Do you know this pattern?

Sure! It’s zip | map | reduce!

See this post for more details.

In other words:

  • zip maxs with stock (we pair them off)
  • map every pair with subtraction
  • reduce the intermediate results of map with std::max

In C++ we can express such a pattern with std::inner_product (I’m not saying “use this in production”, I’m just letting by brain work):

int MaxDrawdown(const vector<int>& stock)
    std::vector<int> maxs(stock.size());
    std::partial_sum(std::begin(stock), std::end(stock), std::begin(maxs), [](auto l, auto r) { return std::max(l, r); });
    return std::inner_product(std::begin(maxs), std::end(maxs), 
                              [](auto l, auto r) { return std::max(l, r); }, 

Now we have a solution that is harder for people not familiar with STL algorithms, an additional scan as well as more memory usage…

Beauty is free

First of all, although the code is not intended for production use, I am already satisfied because my brain has worked out. As you see, the line between production code and “training code” might be more or less marked. In my opinion, our brain can benefit from both training and production “styles” (when they differ).

Now, I would like to push myself even more by giving my own answer to the following question:

What might this code look like in next-generation C++?

What about using ranges? Might that help solve the issues introduced before?

Here is my answer:

int MaxDrawdown(const vector<int>& stock)
    auto maxs = view::partial_sum(stock, [](auto l, auto r){ return std::max(l, r); });
    auto dds = view::zip_with(std::minus<>{}, maxs, stock);
    return ranges::max(dds);

The combination of view::zip_with and ranges::max has displaced std::inner_product. In my opinion, it’s much more expressive.

I hope someone will propose and defend some function objects for min and max so we can avoid the lambda – after all, we have std::minus and std::plus, why not having std::maximum and std::minimum (or such)?

If you are wondering if this code does only one scan, the answer is yes. Every view here is lazy and does not use extra memory.

We can happily argue again that “beauty is free”.


usually the MDD is calculated as a ratio because it makes more sense to display it as a percentage. For example:

float MaxDrawdown(const vector<int>& stock)
    auto maxs = view::partial_sum(stock, [](auto l, auto r){ return std::max(l, r); });
    auto dds = view::zip_with([](auto peak, auto ith) { return (float)(peak-ith)/peak; }, maxs, stock);
    return 100.0f * ranges::max(dds);


Playing with patterns

Consider again the brute force solution:

int MaxDrawdownQuadratic(const vector<int>& stock)
    auto mdd = 0;
    for (auto j=0u; j<stock.size(); ++j)
        for (auto i=j; i<stock.size(); ++i)
            mdd = std::max(mdd, stock[j] - stock[i]);
    return mdd;

We have seen that the optimal solution consists in just scanning the array forward and “caching” the biggest stock[j] so far.

A similar schema applies if we think the solution backwards: we scan the array backwards and cache the lowest price so far:

int MaxdrawdownBackwards(const vector<int>& stock)
    auto mdd = 0;
    auto minSoFar = stock.back();
    for (auto i=stock.size()-1; i>0; --i)
        mdd = std::max(mdd, stock[i-1] - minSoFar);
        minSoFar = std::min(minSoFar, stock[i-1]);
    return mdd;

Getting to the ranges-based solution is not so hard since we know how the problems is broken down into patterns: the forward cumulative maximum is replaced with the backward cumulative minimum. Still the prefix sum pattern. We just change the proper bits:

  • stock is iterated backwards
  • std::min displaces std::max

zip | map | reduce stays the same except for the inputs order (we have to subtract stock[i] to the i-th minimum) and the direction of stock (backwards).

Thus, here is the code:

int MaxDrawdownBackwards(const vector<int>& stock)
   auto mins = view::partial_sum(view::reverse(stock), [](auto l, auto r){ return std::min(l, r); });
   return ranges::max(view::zip_with(std::minus<>{}, view::reverse(stock), mins));

If you have some difficulties at this point, write down the “intermediate” STL code without ranges.

The same challenge gave us the opportunity to find another solution with the same patterns.

Playing with patterns is to programmers creativity as playing with colors is to painters creativity.

Playing with patterns is a productive training for our brain.


Problem variations

Playing with patterns is also useful for tackling problem variations fluently. For instance, if the problem changes to “calculate the minimum drawdown”, we just have to replace ranges::max with ranges::min. That’s possible because we know how the problem has been broken down into patterns.

The MDD problem has interesting variations that can be solved with the same patterns (customizing the proper bits). A couple of challenges are left to the willing reader:

  • Given an array A, find the maximum difference A[j] - A[i] with i < j (MDD is j < i).
    Rephrasing: given a series of stock prices, find the maximum profit you can obtain by buying the stock on a certain day and selling it in a future day. Try your solutions here (alas, up to C++14).
  • Given an array of stock prices, each day, you can either buy one share of that stock, sell any number of shares of stock that you own, or not make any transaction at all. What is the maximum profit you can obtain with an optimum trading strategy? See the full problem and try your solutions here (alas, up to C++14).


Have fun practicing with STL algorithms and ranges!


“How many moves ahead could you calculate?”

“Just one, the best one”.

The legendary answer by José Capablanca – a world chess champion of the last century – indicates a commonly known fact: chess champions win by being better at recognizing patterns that emerge during the game. They remember meaningful chess positions better than beginners. However, experts do not remember random positions effectively better than non-experts.

Patterns serve as a kind of shorthand that’s easier to remember than a meaningless configuration of pieces that could not occur in a real game.

This fact does not apply to chess only. Our brain works by constantly recognizing, learning and refining patterns on the world.

The reason is efficiency: the brain applies such an optimization to ignore some of the possible choices we have in every situation. Thus, experts get better results while thinking less, not more.

You should know another category of experts who are usually good at recognizing patterns: programmers.

We, programmers, get better results while thinking in patterns. We decompose complex problems in combinations of simpler patterns. However, many problems cannot be solved with known patterns. Here is where creativity kicks in.

However, not only creativity enables human beings to solve unknown problems with new ideas, but it’s also the capacity to reinterpret known problems in new and inspiring ways. It’s the art of creating new associations. As Jules Henri Poincaré once said “creativity is the ability of unite pre-existing elements in new combinations that are useful”. In programming, “pre-existing elements” are commonly called patterns.

That’s the spirit of this article. I will revisit a classical problem from another perspective.

The article is a bit verbose because the style is didactic: I spend some time explaining the example problem and the solution. If you already know the Maximum Subarray Problem, you can skip the following section. Even though you know Kadane’s algorithm, it’s worth reading the dedicated section anyway because I get to the solution from a slightly different point of view than the canonical one.

The Maximum Subarray Problem

Let me introduce one protagonist of the story, the famous “Maximum Subarray” problem whose linear solution has been designed by Joseph “Jay” Kadane in the last century.

Here is a simple formulation of the problem:

Given an array of numbers, find a contiguous subarray with the largest sum.

We are just interested in the value of the largest sum (not the boundaries of the subarray).



Max subarray: [4,-1,2,1]. Sum: 6.

Clearly, the problem is interesting when the array contains negative numbers (otherwise the maximum subarray is the whole array itself).

The literature around this problem is abundant. There exist different conversations about that. We’ll focus only on Kadane’s algorithm, the most famous (and efficient) solution.

The theory behind Kadane’s algorithm is not straightforward and it’s beyond the scope of this didactic post. The algorithm lies in an area of programming called Dynamic Programming, one of the hardest techniques to solve problems in computer science – and one of the most efficient as well. The basic principle of Dynamic Programming consists in breaking a complex problem into simpler sub-problems and caching their results.

For example, the task of calculating “1+1+1+1” can be broken down this way: first we calculate “1+1=2”, then we add “1” to get “3” and finally we add “1” to get “4”. Implicitly, we have “cached” the intermediate results since we did not started from scratch every time – e.g. to calculate (1+1)+1 we started from “1+1=2”. A more complex example is Fibonacci: each number is calculated from the famous formula fibo(n) = fibo(n-1) + fibo(n-2).

For example:

fibo(4) = fibo(3) + fibo(2)

= fibo(2) + fibo(1) + fibo(2)

= fibo(1) + fibo(0) + fibo(1) + fibo(2)

= fibo(1) + fibo(0) + fibo(1) + fibo(1) + fibo(0)

The sub-problems are called “overlapping” since we solve the same sub-problem multiple times (fibo(2) called twice and fibo(1) and fibo(0) three times). However, the main characteristic of Dynamic Programming is that we do not recalculate the sub-problems that we have already calculated. Instead, we “cache” them. Without stepping into further details, there exist two opposite approaches which come with a corresponding caching strategy: Top-Down and Bottom-Up. Roughly speaking, the former is recursive, the latter is iterative. In both we maintain a map of already solved sub-problems. More formally, in the Top-Down approach the storing strategy is called memoization, whereas in the Bottom-Up one it is called tabulation.

In Bottom-Up we go incrementally through all the sub-problems and reuse the previous results. For instance:

table[0] = 0;
table[1] = 1;

for(auto i=2; i<=n; ++i)
   table[i] = table[i-1] + table[i-2];
return table[n];

On the other hand, Top-Down involves recursion:

// suppose memo has size n+1
int fibo(std::vector<int>& memo, int n) 
   if(n < 2)
      return n;

   if(memo[n] != 0)
      return memo[n];

   memo[n] = fibo(memo, n-1) + fibo(memo, n-2);
   return memo[n];

Now that we have a bit of background, let’s quickly meet Kadane’s algorithm.

Kadane’s algorithm

Kadane’s algorithm lies in the Bottom-Up category of Dynamic Programming, so it works by first calculating a solution for every sub-problem and then by using the final “table” of results in some way. In Fibonacci, we use the table by just popping out its last element. In Kadane we do something else. Let’s see.

My explanation of the algorithm is a bit different from the popular ones.

First of all, we should understand how the table is filled. Differently than Fibonacci, Kadane’s table[i] does not contain the solution of the problem at index i. It’s a “partial” result instead. Thus, we call such a table “partial_maxsubarray”.

partial_maxsubarray[i] represents a partial solution on the subarray ending at the ith index and including the ith element. The last condition is the reason why the result is partial_. Indeed, the final solution might not include the ith element.

Let’s see what this means in practice:


partial_maxsubarray[0] means solving the problem only on [-3], including -3.
partial_maxsubarray[1] is only on [-3, 1], including 1.
partial_maxsubarray[2] is only on [-3, 1, -3], including -3.
partial_maxsubarray[3] is only on [-3, 1, -3, 4], including 4.
partial_maxsubarray[4] is only on [-3, 1, -3, 4, -1], including -1.
partial_maxsubarray[5] is only on [-3, 1, -3, 4, -1, 2], including 2.
partial_maxsubarray[6] is only on [-3, 1, -3, 4, -1, 2, 1], including 1.
partial_maxsubarray[7] is only on [-3, 1, -3, 4, -1, 2, 1, -5], including -5.
partial_maxsubarray[8] is only on [-3, 1, -3, 4, -1, 2, 1, -5, 4], including 4.

For each index i, the ith element will be included in the partial_maxsubarray calculation. We have only a degree of freedom: we can change where to start.

Consider for example partial_maxsubarray[2]. If the main problem was on [-3, 1, -3], the solution would have been 1 (and the subarray would have been [1]). However, partial_maxsubarray[2] is -2 (and the subarray is [1, -3]), because of the invariant.

Another example is partial_maxsubarray[4] that is not 4 as you might expect, but 3 (the subarray is [4, -1]).

How to calculate partial_maxsubarray[i]?

Let’s do it by induction. First of all, partial_maxsubarray[0] is clearly equal to the first element:

partial_maxsubarray[0] = -3

Then, to calculate the next index (1) we note that we have only one “degree of freedom”: since we must include 1 anyway, we can either extend the current subarray by one (getting [-3, 1]) or we can start a new subarray from position 1. Let me list the two options:

  1. extend the current subarray, getting [-3, 1], or
  2. start a new subarray from the current index, getting [1].

The choice is really straightforward: we choose the subarray with the largest sum! Thus, we choose the second option (partial_maxsubarray[1] = 1).

To calculate partial_maxsubarray[2]:

  1. keep 1, [1, -3], or
  2. start a new subarray [-3]

Clearly, the former is better (partial_maxsubarray[2] = -2).

Again, partial_maxsubarray[3]:

  1. keep 4, [1, -3, 4], or
  2. start a new subarray [4]

The latter is larger (partial_maxsubarray[3] = 4).

Do you see the calculation underneath?

For each index, we calculate partial_maxsubarray[i] this way:

partial_maxsubarray[i] = max(partial_maxsubarray[i-1] + v[i], v[i])

At each step i, we decide if either start a new subarray from i or extend the current subarray by one on the right.

Once we have filled partial_maxsubarray, do you see how to use it to calculate the solution to the main problem?

Let’s recall how we calculated partial_maxsubarray[2]:

partial_maxsubarray[0] = -3
partial_maxsubarray[1] = 1

partial_maxsubarray[2] = max(partial_maxsubarray[1] + v[2], v[2])

Since v[2] is -3, we ended up with -2. Thus, partial_maxsubarray[1] is larger than partial_maxsubarray[2].

Running the algorithm on the remaining indexes we get:

partial_maxsubarray[3] = 4
partial_maxsubarray[4] = 3
partial_maxsubarray[5] = 5
partial_maxsubarray[6] = 6
partial_maxsubarray[7] = 1
partial_maxsubarray[8] = 5

It turns out that partial_maxsubarray[6] has the largest value. This means there is a subarray ending at index 6 having the largest sum.

Thus, the solution to the main problem is simply calculating the maximum of partial_maxsubarray.

Let’s write down the algorithm:

int kadane(const vector<int>& v)
    vector<int> partial_maxsubarray(v.size());
    partial_maxsubarray[0] = v[0];
    for (auto i = 1u; i<v.size(); ++i) 
      partial_maxsubarray[i] = std::max(partial_maxsubarray[i-1] + v[i], v[i]);

    return *max_element(begin(partial_maxsubarray), end(partial_maxsubarray));

If you knew this problem already, you have probably noticed this is not the canonical way to write Kadane’s algorithm. First of all, this version uses an extra array (partial_maxsubarray) that is not used at all in the classical version. Moreover, this version does two iterations instead of just one (the first for loop and then max_element).

“Marco, are you kidding me?” – Your subconscious speaks loudly.

Stay with me, you won’t regret it.

Let me solve the two issues and guide you to the canonical form.

To remove the support array, we need to merge the two iterations into one. We would kill two birds with one stone.

We can easily remove the second iteration (max_element) by calculating the maximum along the way:

int kadane(const vector<int>& v)
    vector<int> partial_maxsubarray(v.size());
    partial_maxsubarray[0] = v[0];

    auto maxSum = partial_maxsubarray[0];
    for (auto i = 1u; i<v.size(); ++i) 
        partial_maxsubarray[i] = std::max(partial_maxsubarray[i-1] + v[i], v[i]);
        maxSum = max(maxSum, partial_maxsubarray[i]);
    return maxSum;

After all, a maximum is just a forward accumulation – it never goes back.

Removing the extra array can be done by observing that we do not really use it entirely: we only need the previous element. Even in Fibonacci, after all we only need the last two elements to calculate the current one (indeed, removing the table in Fibonacci is even easier). Thus, we can replace the support array with a simple variable:

int kadane(const vector<int>& v)
    int partialSubarraySum, maxSum;
    partialSubarraySum = maxSum = v[0];
    for (auto i = 1u; i<v.size(); ++i) 
        partialSubarraySum = max(partialSubarraySum + v[i], v[i]);
        maxSum = max(maxSum, partialSubarraySum);
    return maxSum;

The code above is likely more familiar to readers who already knew Kadane’s algorithm, isn’t it?

Now, let’s have some fun.

Emerging patterns

As most people, the first time I saw Kadane’s algorithm, it was in the canonical form. At the time, I didn’t notice anything particular. It was 2008 and I was at the university.

Many years passed and I met the problem again in 2016. In the last years, I have been regularly practicing with coding challenges to develop my ability to “think in patterns”. With “pattern” I mean simply a “standard solution to a standard problem, with some degree of customization”. For example, “sorting an array of data” or “filtering out a list” are patterns. Many implementations of patterns are usually provided in programming languages standard libraries.

I am used to consider every C++ standard algorithm as a pattern. For example, std::copy_if and std::accumulate are, for me, two patterns. Some algorithms are actually much more general in programming. For example, std::accumulate is usually known in programming as fold or reduce. I have talked about that in a previous post. On the other hand, something like std::move_backwards is really C++ idiomatic.

Thinking in patterns can be some good for many reasons.

First of all, as I have mentioned at the beginning of this article, our brain is designed to work this way. Cognitive scientist call “the box” our own state of the art, our own model of the world that enables us to ignore alternatives. Clearly, the box has pros and cons. Constantly thinking inside the box works as long as we deal with known problems. Thinking outside the box is required to solve new challenges. This is creativity.

When I think of creativity, I think of cats: they can be coaxed but they don’t usually come when called. We should create conditions which foster creativity. However, something we can intentionally influence is training our own brain with pattern recognition and application. To some extent, this is like “extending our own box”. This is what I have been doing in the last years.

Another benefit of thinking in patterns is expressivity. Most of the times, patterns enable people to express algorithms fluently and regardless of the specific programming language they use. Patterns are more declarative than imperative. Some patterns are even understandable to non-programmers. For example, if I ask my father to “sort the yogurt jars by expiration date and take the first one”, that’s an easy task for him.

So, in 2016 something incredible happened. When I met again Kadane’s algorithm, my brain automatically recognized two patterns from the canonical form. After visualizing the patterns in my mind, I broke down the canonical form in two main parts. This is why I first showed you this version of the algorithm:

int kadane(const vector<int>& v)
    vector<int> partial_maxsubarray(v.size());

    partial_maxsubarray[0] = v[0];
    for (auto i = 1u; i<v.size(); ++i) 
      partial_maxsubarray[i] = std::max(partial_maxsubarray[i-1] + v[i], v[i]);

    return *max_element(begin(partial_maxsubarray), end(partial_maxsubarray));

The second pattern is clearly maximum (that is a special kind of reduce, after all).

What is the first one?

Someone might think of reduce, but it is not. The problem with reduce is that it does not have “memory” of the previous step.

The pattern is prefix sum. Prefix sum is a programming pattern calculating the running sum of a sequence:

array = [1, 2, 3, 4]
psum  = [1, 3, 6, 10]

How does that pattern emerge from Kadane’s algorithm?

Well, “sum” is not really an addition but it’s something different. The update function emerges from the loop:

thisSum = std::max(previousSum + vi, vi);

Imagine to call this line of code for every element of v (vi).

In C++, prefix sum is implemented by partial_sum. The first element of partial_sum is just v[0].

Here is what the code looks like with partial_sum:

int kadane(const vector<int>& v)
    vector<int> partial_maxsubarray(v.size());

    partial_sum(begin(v), end(v), begin(partial_maxsubarray), [](auto psumUpHere, auto vi){
        return max(psumUpHere + vi, vi);
    return *max_element(begin(partial_maxsubarray), end(partial_maxsubarray));

When I ran this code getting a green bar I felt very proud of myself. I didn’t spend any effort. First of all, my brain recognized the pattern from the hardest version of the code (the canonical form). My brain popped this insight from my unconscious tier to my conscious reasoning. Then I did an intermediate step by arranging the code in two main parts (the cumulative iteration and then the maximum calculation). Finally, I applied partial_sum confidently.

You might think this is useless. I think this is a great exercise for the brain.

There is something more.

Since C++17, the code is easy to run in parallel:

int kadane(const vector<int>& v)
    vector<int> partial_maxsubarray(v.size());

    inclusive_scan(execution::par, begin(v), end(v), begin(partial_maxsubarray), [](auto psumUpHere, auto vi){
        return max(psumUpHere + vi, vi);

    return *max_element(execution::par, begin(partial_maxsubarray), end(partial_maxsubarray));

inclusive_scan is like partial_sum but it supports parallel execution.

Beauty is free

Some time ago I read a short post titled “Beauty is free” that I cannot find anymore. The author showed that the execution time of an algorithm coded with raw loops gave the same performance as the same one written with STL algorithms.

Compared to the canonical form, our “pattern-ized” alternative does two scans and uses an extra array. It’s clear that beauty is not free at all!

The reason why I am writing this article now and not in 2016 is that I have finally found some time to try my solution with range v3. The result – in my opinion – is simply beautiful. Check it out:

int kadane(const vector<int>& v)
   return ranges::max(view::partial_sum(v, [](auto s, auto vi) { return std::max(s+vi, vi); }));

view::partial_sum is a lazy view, meaning that it applies the function to the ith and (i-1)th elements only when invoked. Thus, the code does only one scan. Moreovwer, the support array is vanished.

Running a few performance tests with clang -O3, it seems that the optimizer does a better job on this code rather than on the canonical form. On the other hand, the code does not outperform the canonical one on GCC. As I expect, running the range-based code in debug is about 10 times slower. I have not tried on Visual Studio. My tests were not accurate so please take my affirmations with a grain of salt.

I would like to inspire you to take action. Practice is fundamental.

A common question people ask me is “how can I practice?”. This deserves a dedicated post. Let me just tell you that competitive programming websites are a great source of self-contained and verifiable challenges, but they are not enough. You should creatively use real-world problems to practice.

Ranges are the next-generation of STL. Ranges are the next-generation of C++.

However, if you want to learn how to use ranges, you have to know and apply STL patterns first.

Ranges are beyond the scope of this article. The documentation is here. A few good posts here and here. Help is needed as it was in 2011 to popularize C++11.

I hope to blog again on some extraordinary patterns and how to use them. For now, I hope you have enjoyed the journey through a new interpretation of Kadane’s algorithm.

Some scientific notions of this article come from The Eureka Factor.

C++17 added support for non-member std::size, std::empty and std::data. They are little gems for generic programming. Such functions have the same purpose of std::begin and the rest of the family: not only can’t you call functions on C-arrays (e.g. arr.begin() or arr.size()), but also free-functions allow for more generic programming because they can be added afterwards on classes you cannot modify.

This post is just a note about using std::size and std::empty on static C-strings (statically sized). Maybe it’s a stupid thing but I found more than one person others than me falling into such “trap”. I think it’s worth sharing.

To make it short, some time ago I was working on a generic function to compare strings under a certain logic that is not important to know. In an ideal world I would have used std::string_view, but I couldn’t mainly for backwards-compatibility. I could, instead, put a couple of template parameters. Imagine this simplified signature:

template<typename T1, typename T2>
bool compare(const T1& str1, const T2& str2);

Internally, I was using std::size, std::empty and std::data to implement my logic. To be fair, such functions were just custom implementations of the standard ones (exhibiting exactly the same behavior) – because at that time C++17 was not available yet on my compiler and we have had such functions for a long time into our company’s C++ library.

compare could work on std::string, std::string_view (if available) and static C-strings (e.g. “hello”). While setting up some unit tests, I found something I was not expecting. Suppose that compare on two equal strings results true, as a normal string comparison:

EXPECT_TRUE(compare(string("hello"), "hello"));

This was not passing at runtime.

Internally, at some point, compare was using std::size. The following is true:

std::size(string("hello")) != std::size("hello");

The reason is trivial: “hello” is just a statically-sized array of 6 characters. 5 + the null terminator. When called in this case, std::size just gives back the real size of such array, which clearly includes the null terminator.

As expected, std::empty follows std::size:

EXPECT_TRUE(std::empty("")); // ko

EXPECT_TRUE(std::empty(string(""))); // ok

EXPECT_TRUE(std::empty(string_view(""))); // ok

Don’t get me wrong, I’m not fueling an argument: the standard is correct. I’m just saying we have to be pragmatic and handle this subtlety. I just care about traps me and my colleagues can fall into. All the people I showed the failing expectations above just got confused. They worried about consistency.

If std::size is the “vocabulary function” to get the length of anything, I think it should be easy and special-case-free. We use std::size because we want to be generic and handling special cases is the first enemy of genericity. I think we all agree that std::size on null-terminated strings (any kind) should behave as strlen.

Anyway, it’s even possible that we don’t want to get back the length of the null-terminated string (e.g. suppose we have an empty string buffer and we want to know how many chars are available), so the most correct and generic implementation of std::size is the standard one.

Back to compare function I had two options:

  1. Work around this special case locally (or just don’t care),
  2. Use something else (possibly on top of std::size and std::empty).

Option 1 is “local”: we only handle that subtley for this particular case (e.g. compare function). Alas, next usage of std::size/empty possibly comes with the same trap.

Option 2 is quite intrusive although it can be implemented succinctly:

namespace mylib
   using std::size; // "publish" ordinary std::size
   // on char arrays
   template<size_t N>
   constexpr auto size(const char(&)[N]) noexcept
      return N-1;

   // other overloads...(e.g. wchar_t)

You can even overload on const char* by wrapping strlen (or such). This implementation is not constexpr, though. As I said before: we cannot generally assume that the size of an array of N chars is N – 1, even if it’s null-terminated.

mylib::empty is similar.

EXPECT_EQ(5, mylib::size("hello"));  // uses overload
EXPECT_EQ(5, mylib::size(string("hello")); // use std::size
EXPECT_EQ(3, (mylib::size(vector<int>{1,2,3})); // use std::size

Clearly, string_view would solves most of the issues (and it has constexpr support too), but I think you have understood my point.

[Edit] Many people did not get my point. In particular, some have fixated on the example itself instead of getting the sense of the post. They just suggested string_view for solving this particular problem. I said that string_view would help a lot here, however I wrote a few times throughout this post that string_view was not viable.

My point is just be aware of the null-terminator when using generic functions like std::size, std::empty, std::begin etc because the null-terminator is an extra information that such functions don’t know about. That’s it. Just take actions as you need.

Another simple example consists in converting a sequence into a vector of its underlying type. We don’t want to store the null-terminator for char arrays. In this example we don’t even need to use std::size but just std::begin and std::end (thanks to C++17 template class deduction):

template<typename T>
auto to_vector(const T& seq)
  return vector(begin(seq), end(seq));

Clearly, this exhibits the same issue discussed before, requiring extra logic/specialization for char arrays.

I stop here, my intent was just to let you know about this fact. Use this information as you like.



TL;DR: Just know how std::size and std::empty work on static C-strings.

  • static C-strings are null-terminated arrays of characters (size = number of chars + 1),
  • std::size and std::empty on arrays simply give the total number of elements,
  • be aware of the information above when using std::size and std::empty on static C-strings,
  • it’s quite easy to wrap std::size and std::empty for handling strings differently,
  • string_view could be helpful.

I know, it’s been a while since the last time I published something on my blog. The main reason is that in my spare time – apart from private life – I’ve been committed to organize events and activities in Italy and also to work on a personal project with a great friend of mine. Anyway, I found some time to share a new blog post I hope you will like.

This article is also part of my series C++ in Competitive Programming.

In the very first installment of this series, I showed an example whose solution amazed some people. Let me recall the problem: we have to find the minimum difference between any two elements in a sorted sequence of numbers. For example:

[10, 20, 40, 100, 200, 300, 1000]

The minimum difference is 10, that is 20-10. Any other combination is greater. Then, I showed an unrolled solution for such problem (not the most amazing one!):

Imagine there is always at least one element in the sequence. Note that we calculate elems[i+1]-elems[i] for each i from 0 to length-1, meanwhile, we keep track of the maximum of such differences. I see a pattern, do you?

Before getting to the point, let me show you another example. We want to calculate the number of equal adjacent characters in a string:


That is 4:


Again, here is a solution:

We compare s[i] with s[i-1] and, meanwhile, we count how many trues we get. Careful readers will spot a little difference from the previous example: at each step, we access s[i] and s[i-1], not s[i+1] and s[i]. I’ll develop this subtlety in a moment. Now, please, give me another chance to let you realize the pattern yourself, before I elaborate more.

This time we have two vectors of the same size, containing some values and we want to calculate the maximum absolute difference between any two elements. Imagine we are writing a test for a numerical computation, the first vector is our baseline (expectation) and the second vector is the (actual) result of the code under test. Here is some code:

Here we access the ith-elements of both the vectors at the same time. Is that similar to the other examples?

It’s time to get to the point, although you are already there if you have read the intro of this series – if you have not, please stay here and don’t spoil the surprise yourself!

Actually, there is not so much difference among the examples I showed. They are all obeying the same pattern: given two sequences, the pattern combines every two elements from input sequences at the same position using some function and accumulates these intermediate results along the way.

Actually, in functional programming, this pattern is the composition of three different patterns:

zip | map | fold

zip makes “pairs” from input sequences, map applies a function to each pair and returns some result, fold applies an operation to reduce everything to a single element. A picture is worth a thousand words:

For simplicity, imagine that zip and map are combined together into a single operation called zipWith.

Now we have two customization points:

  1. which function zipWith uses to combine each pair, and
  2. which function fold uses to reduce each result of zipWith to a single element.

The general case of this pattern operates on any number of sequences, making a tuple for each application of zip (e.g. imagine we zip the rows of a matrix, we get its columns).

In C++ we have an algorithm that (partially) implements this pattern: inner_product. I said “partially” just because it accepts only two ranges and for this reason I say “pairs of elements”, not tuples – as in the general case. In C++17’s parallel STL, inner_product is made in parallel by transform_reduce (be aware of the additional requirements).

In the future we’ll do such things by using new tools that will be incorporated into the standard: ranges. For now, inner_product is an interesting and (sometimes) understimated tool we have. Regardless you are going to use such pattern in real-world code, I think that understanding when this pattern applies is mindblowing. If you regularly practice competitive programming as I do, you have the opportunity to recognize many patterns to solve your problems. In the last years I have found several cases when this pattern worked smoothly and I am sharing here a few.

The simplest form of inner_product takes two ranges of the same length and a starting value, and it calculates the sum of the products of each pair. It’s literally an inner product between two vectors. inner_product has two additional customization points to replace “product” and “sum” as we wish.

Let’s have some fun.

It’s time to code the solutions to the previous challenges in terms of inner_product. I start from the latter.

I recall that we want to find the maximum absolute difference of two vectors of double. In this case we replace “product” with “absolute difference” and “sum” with “max”. Or, we combine each pair by calculating the absolute difference and we keep track of the maximum along the way. I stress the fact that we reduce the combined pairs along the way and not at the end: inner_product is a single-pass algorithm (e.g. it works on stream iterators).

Here is the code:

I tend to use standard function objects as much as possible. They are just clearer, shorter and (should be) well-known. Thinking a bit more we come up with:

Better than the other? It’s debatable, I leave the choice to you.

The other two challenges are on a single sequence, aren’t they? Does the pattern still apply?

It does.

Zipping two distinct ranges is probably more intuitive, but what about zipping a sequence with itself? We only have to pass correct iterators to inner_product. We want to combine s[i] with s[i-1]. zipWith should use operator== and fold operator+. For the first sequence we take S from the second character to the last character. For the second sequence we take S from the first character to the second last character. That is:

  1. S.begin() + 1, S.end()
  2. S.begin(), S.end()-1

We have to pass the first three iterators to inner_product:

We zip with equal_to which uses operator== under the hood and we fold with plus<> which applies operator+. As you see, not having to specify the second sequence’s boundary is quite handy. If the solution is not clear, I hope you will find this picture useful:

When equal_to is called, the left hand side is S[i] and the right hand side is S[i-1]. In other words, the first range is passed as the first parameter to zipWith and the second range as the second parameter.

Careful readers will spot a subtle breaking change: the solution is not protected against an empty string anymore. Advancing an iterator that is not incrementable (e.g. end) is undefined behavior. We have to check this condition ourself, if we need. This example on HackerRank never fall into such condition, so the solution is just fine.

Finally, in the first exercise we are requested to calculate the minimum difference between any two elements in a sorted sequence of numbers. I intentionally wrote elems[i+1]-elems[i] and not elems[i]-elems[i-1]. Why? Just to show you another form of the same pattern. This one I like less because the call to inner_product is more verbose:

We can apply the other pattern by (mentally) turning the loop into elems[i] – elems[i-1]:

As before, the solution is not protected against an empty sequence. You understand that zipping a sequence on itself (by shifting it) is never protected against an empty range.

This pattern works in all the examples above just because the stride between two elements is 1. If it was greater, it would have failed – I know, we can use boost::strided or such but I don’t mind here. Basically, we have processed adjacent elements in a “window” of size 2. There are scenarios where this window can be larger and inner_product still applies.

As an example we take Max Min on HackerRank. This problem is very close to calculating the minimum difference between any two elements in a sorted sequence of numbers. It states: Given a list of N non-unique integers, your task is to select K integers from the list such that its unfairness is minimized. If (x1,x2,…,xk) are K numbers selected from the list, the unfairness is defined as:

max(x1, x2, …, xk) – min(x1, x2,…, xk)

A possible solution to this problem consists in sorting the sequence and applying zipWith | fold as we did in the very first example. The only difference is that the distance between the two elements we zip together is K-1:

Do not misunderstand: inner_product still steps by one every time and still combines elements in pairs. It’s just that we zip the sequence with itself by shifting it by K-1 positions and not by just 1. Here is what I mean:

As you see, although the size of the window is K, inner_product still works in the same way as before. The pairs that it conceptually creates (remember that the first sequence is shifted by K-1 positions) are depicted below:


This works only if K is less than or equal to N (and N has to be at least 1).

The pattern fits this problem because we turn the sequence in a particular structure: we sort it. We have to select K elements and we know that min(x1,…,xk) is x1 and max(x1,…,xk) is xk. This is just an effect of sorting. So we just check all these possible windows, incrementally, by using only x1 and xk. We may ignore erverything inside the window. Another interesting property is that the first range passed to inner_product is always greater (the problem states that all elements are distinct) than the second, for each iteration. This is why we can use minus<> for zipWith. If we wanted the opposite, we would have changed the order of the iterators or we would have iterated backwards. Using algorithms make variations simpler than rolling a for loop.


Recap for the pragmatic C++ competitive coder:

  • In C++, (zip | map | fold) on two ranges is implemented by inner_product:
    • set the first callable to customize fold (plus by default);
    • set the second callable to customize (zip | map) – combined in a single operation (multiplies by default);
    • the order of the iterators matter: the first range is the left hand side of zipWith, the second range is the right hand side;
  • zipping a sequence on itself is just the same pattern;
    • be aware it won’t work with ranges shorter than the number of positions we shift the sequence by (e.g. 1 in the first 3 examples);
  • praciting recognizing and understanding coding patterns is food for the brain.

string_view odi et amo

Posted: January 3, 2017 in Programming Recipes
Tags: ,

string_view-like wrappers have been successfully used in C++ codebases for years, made possible by libraries like boost::string_ref. I think all of you know that string_view has joined the C++ standard library since C++17.

Technically, basic_string_view is an object that can refer to a constant contiguous sequence of char-like objects with the first element of the sequence at position zero. The standard library provides several typedefs for standard character types and std::string_view is simply an alias for:


For simplicity, I’ll just refer to string_view for the rest of the post but what I’m going to discuss is valid for the other aliases as well.

You can imagine string_view as a smart const char* which provides any const member function of std::string as well as a few handy utilities to reduce its span. You cannot enlarge a string_view until you reassign it. Other languages (e.g. Go) have similar constructs that permit to grow the range as well as to participate in the ownership of such range. Although string_view does not, the power of such simple wrapper is huge, though.

The applications of string_view are many and it’s relatively simple to let string_view join your codebase. For years, I’ve been using a proprietary implementation of string_view dated back to the 90s and then improved on the base of boost::string_ref and recently on std::string_view. If you start today, it’s very likely you can adoperate your compiler’s string_view implementation (e.g. latest Visual Studio 2017 RC, clang and GCC support it), you can grab an implementation from the web or you can just use boost::string_ref or another library (e.g. Google’s, folly).

One can think that using string_view is as simple as using std::string with the only difference that string_view does not take the ownership of the char sequence and cannot change its content. That’s not completely true. Adoperating string_view requires you to pay attention to a few other traps that I’m going to describe later on. Before starting, let me show you a couple of simple examples.

Generally speaking, string_view is a good friend when we need to do text processing (e.g. parsing, comparing, searching), but first of all, string_view is an adapter: it allows different string types to be adapted into a std::string-like container. This means that string_view provides iterator support and STL naming conventions (e.g. size, empty). To create a string_view, we only require a null-terminated const char* or both a const char* and a length. Note that in the latter case we don’t need the char sequence to be null-terminated.

Suppose now that our codebase hosts many different string types but we want to write only one function doing a certain task on constant strings. Can string_view help? It can, if the string types manage a contiguous sequence of characters and also provide (read) access to it. Examples:

Then we may write only one function for our task:

ReturnType readonly_on_string_function(string_view sv); // only one implementation

Into readonly_on_string_function we can exploit the whole set of const functions of std::string. Just this simple capability is priceless. You know what I mean if you use more than three string types into your codebase 🙂

To show you other string_view functionalities, let me consider the problem of splitting a string. This problem can be tackled in many ways (e.g. iterator-based, range-based, etc) but let me keep things simple:

The worst things of this function are (imho):

  • we create a new string for each token (this possibly ends up with dynamic allocation);
  • we can split only std::string and no other types.

Since string_view provides every const function of string, let’s try simply replacing string with string_view:

Not only is the code still valid, but also potentially less demanding because we just allocate 8/16 bytes (respectively on 32 and 64 bit platforms – a pointer and a length) for each token.

Now, let’s use some utilities to shrink the span. Suppose I get a string from some proprietary UI framework control, providing its own string representation:

auto name = uiControl.GetText();

Then imagine we want to remove all the whitespaces from the start and the end of such string (we want to trim). We can do it without changing the string itself, just by using string_view:

remove_prefix moves the start of the view forward by n characters, remove_suffix does the opposite. Edge cases have been handled succinctly.

Now we have a string_view containing only the “good” part of the string. At this point, let me end with a bang: we’ll use the sanitized string to query a map without allocating extra memory for the key. How? Thanks to heterogeneous lookup of associative containers:

That’s possible because less<> is a transparent comparator and string_view can be implicitly constructed from std::string (thus, we don’t need to write operator< between std::string and std::string_view). That’s powerful.

It should be clear that string_view can be dramatically helpful to your daily job and I think it’s quite useless to show you other examples to support this fact. Rather, let me discuss a few common pitfalls I have met in the last years and how to cope with them.

#1: “losing sight of the string”

The first error I have encountered many times is storing string_view as a member variable and forgetting that it will not participate in the ownership of the char sequence:

Suppose that Parse is never called with a temporary (moreover, we can enforce that assumption just by deleting such overload), this code is still fine because the caller of Parse has also ‘current’ in scope. Then some time later, a programmer that is not very familiar with string_view (or who is simply heedless) puts the following error in the code:

‘someProcessing’ is a temporary string and then StatefulParser will very likely refer to garbage.

So, string_view (as well as span, array_view, etc) is often not recommended as a data member. However, I think that string_view as data member sometimes is useful and in these scenarios we need to be prudent, just like using references and pointers as data members.

#2: replacing const string& with string_view

string_view seems a drop-in replacement of const std::string& because it provides the whole set of std::string‘s const functions and also because it’s a view (reference). So, the general rule you hear pretty much everywhere (especially nowdays that string_view has officially joined the C++ standard) is “whenever you see const string&, just replace it with string_view“.

So let’s do that:

void I_dont_know_how_string_will_be_used_but_i_am_cool(const string& s);

We turn into:

void I_dont_know_how_string_will_be_used_but_i_am_cool(string_view s);

As users of this function, we are now permitted to pass whatever valid string_view, aren’t we?

As writers of this function, we may have now serious problems.

We have introduced a subtle change to our interface that breaks a sort of guarantee that we had before:  null-termination. string_view does not require (and then does not necessarily handle) a null-terminated sequence. On the other hand, string guarantees to get one back – with c_str().

Maybe you don’t need that feature, in this case the rest of the interface should be ok. Otherwise, if you are lucky, your code simply stops compiling because you are using c_str() somewhere in the code. Else, you are using data(), and the code continues compiling just fine because string_view provides data() as well.

This is not a syntactic detail. What should be clear is that the interface of ‘I_dont_know_how_string_will_be_used_but_i_am_cool’ is not seamlessly changed because now the user can just pass in a not null-terminated sequence of characters:

string something = "hello world";
I_dont_know_how_string_will_be_used_but_i_am_cool(string_view{, 5}); // hello

Suppose at some point you call a C-function expecting a null-terminated string (it’s common), then you call .data() on string_view. What you obtain is “hello world\0” instead of what the user expected (“hello”). In this case, you maybe only get a logical error, because the \0 is at the end of the string. In this other case you are not so lucky:

char buff[] = {'h', 'e', 'l', 'l', 'o'};
I_dont_know_how_string_will_be_used_but_i_am_cool(string_view{buff, 5});

Even if uncommon (generally string_view refers to real strings, that are always null-terminated), that’s even worse, isn’t it?

In general, string_view “relaxes” (does not have) that requirement on null-termination (it’s just a wrapper on const char*). Imagine that the DNA, the identity, of string_view is made of both the pointer to the sequence of characters and the number of referred characters (the length of the span). On the other hand, since string::c_str() guarantees that the returned sequence of characters is null-terminated, you can think that the identity of a string is just what c_str() returns – the length is a redundant information (e.g. computable by strlen(str.c_str())).

To conclude this point, replacing const string& with string_view is safe as far as you don’t expect a null-terminated string – if you are using c_str() then you can figure that out at compile time because the code simply not compile, otherwise you are possibly in trouble.

Since we are on the subject: replacing const string& with string_view has also another (minor) consequence because string_view involves some work, that is copying a pointer and a length. The latter is an extra, compared to const string&. That’s just theory. In practice you measure when in doubt.

#3: string = string_view::data() + string_view::size()

From the previous point, it should be evident that wherever you need to create a string from a string_view you have to use both data() and size(), and not only data(). You have to use the DNA of string_view. I have reviewed this error many times:

string_view sv = ...;
string s =; // possibly UB

It does not work in general, for the same reasons I have just showed you (e.g. this constructor of std::string requires a null-terminated sequence of characters).

From C++17 you can just use one of string’s constructors:

string s { sv };

Before C++17, we have to use data() + size():

string s {, sv.size() };

Clearly, as for std::string, you have to do the same for other string types. E.g.:

CString cstr {, sv.size() };
#4: numerical conversions

Although C and C++ provide many functions to perform conversions between a number and a string/C-string (and viceversa), none supports a range of characters (e.g. begin + end, or begin + length). Moreover, every C/C++ conversion function expects the input string to be null-terminated. These facts lead to the conclusion that it does not exist any function able to convert a string_view into a number out of the box. We can use some C/C++ functions, but we have limitations. I’ll show you some in this section.

For instance, using atoi or C++11 functions we fall into traps or undefined behavior:

So, how to properly convert a string_view into a number? Many ways exist, generally motivated by different requirements and compromises. For the rest of this section I’ll refer only to int conversions because the end of the story is similar for other numeric types.

Sometimes, although it seems counterintuitive, to fulfill the null-termination requirement we can create an intermdiate std::string (or char array):

Actually, having a std::string we can rely on any C and C++ conversion function. Such intermdiate step of copying into a std::string is sometimes affordable because certain numeric types – like int – have a small number of maximum digits (e.g. int is 11). As far as the char sequence really contains one of such little data, the resulting std::string will be created without allocating dynamic memory thanks to SSO (Small String Optimization). Clearly, that shortcut does not hold for bigger numeric types and in general is not portable.

Other fragile solutions I encountered were based on sscanf and friends:

In some cases this code does not behave how we expect – e.g. when the converted value overflows and when the sequence contains leading whitespaces. Although I don’t recommend this approach, compared to the previous one, it only allocates a fixed amount of characters (e.g. 24) on the stack.

In many other cases, the approach is strictly based on how string_view is employed. This means that we have to make some assumptions. For example, suppose we write a parser for urls where we assume that each token is separated by ‘/’. Since atoi and strtol stops on the last character interpreted, if the whole url is both well-formed and stored into a null-terminated string (assumptions/preconditions) we can use such functions quite safely:

Basically, we assumed that the character past the end of any string_view is either a delimiter or the null-terminator. Pragmatically, many times we can make such assumptions, even if they distance our solution from genericity.

So, I encountered code like that:

In this example we use strtol to read an int and then we return the rest of the string_view. We basically try to “consume”  an int from the beginning of the string_view.

Note that C and C++ conversion functions have more or less relaxed policies on errors (mainly for performance reasons). For instance, if the conversion cannot be performed, strtol returns 0 and if the representation overflows, it sets errno to ERANGE. Instead, in the latter case the return value of atoi is undefined. What I really mean is that if you decide to use such functions then you are going to accept the consequences of their limitations. So, just pay attention to such limitations and take actions if needed. For example, a more defensive version of the previous code is:

The fact that it makes sense to check against the null-terminator (if (*entrPtr != 0)) is the fundamental assumption we made here. Generally such assumption is easy to make. Scenarios like this, instead:

string whole = "12345";
parse_int ( {, 3}, i );

are still not covered, because the length of the string_view is not taken into account. For this, we have at least three options: create and use an intermdiate std::string (or use a std::stringstream – however only std::string benefits the SSO), improve the sscanf-based solution that somehow uses such information, or write a conversion function manually. It’s quite clear that C++ lacks a set of simple functions to convert char ranges to numbers easily, efficiently and with a robust error handling.

Actually, I think the most elegant, robust and generic solution is based on boost::spirit:

However, if you don’t already depend on boost, it’s quite inconvenient to do just for converting strings into numbers.

We have a happy ending, though. Finally, C++17 fills this gap by introducing elementary string conversion functions:

This new function will just convert the given range of characters into an integer. It is locale-independent, non-allocating, and non-throwing. Only a small subset of parsing policies used by other libraries (such as sscanf) is provided. This is intended to allow the fastest possible implementation. Clearly, overloads for other numeric types are provided by the standard.

To be thorough, here is an example of the opposite operation, using to_chars:

Both to_chars and from_chars return a minimal output which contains an error flag and a pointer to the first character at which the parsing stopped (e.g. something like what is written into endPtr in the strtol example).

Are you already looking forward to putting your hands on them?!


Here is wrap-up of the main points we covered in this post:

  • string_view is a smart const char*: an object that refers to a constant sequence of characters, keeps track of its length and provides any const function of std::string;
  • just like a reference or a pointer, you have to pay attention to storing string_view as a member variable;
  • string_view’s DNA is both the char sequence and the length:
    • the pointed sequence of characters is not necessarily null-terminated (e.g. c_str() does not exist);
    • whenever you need to copy the content of a string_view into a string(-like container), you have to use both;
  • bear in mind that replacing const string& with string_view implies the user can start passing not null-terminated strings into your functions (just ask yourself if that makes sense);
  • To convert a string_view into a number:
    • pre-C++17: use boost::spirit if you can, agree to compromises and use C/C++ functions with their limitations, or roll some utilities yourself;
    • since C++17: use from_chars.
  • string_view is already available in:
    • Microsoft Visual Studio 2017 RC
    • clang HEAD 4.0 (or in 3.8, under the experimental include folder)
    • gcc HEAD 7.0

A couple of weeks ago I found a simple bug in the dusty corners of a production repository I usually work on. A certain feature was rarely used and it seemed to be covered by a test…Actually it was not and some years ago a guy, feeling guarded by this test, changed the code bugging the feature, but nobody complained. Recently a user tried to resume this old feature but it didn’t work as expected.

I don’t want to bother you with all the details of the story but just to give you a bit of context: there was an old piece of code reading a small file through C FILE handles. Some years ago that piece of code was migrated to C++ streams (since files to read were really small and simple) and a silly bug was introduced. Since this bug was really simple I wondered if it was caused by inattention or ignorance, then I had a chat with the programmer who committed the change and I discovered it was caused by the latter reason. The fix was really easy.

Some days later I discussed about this problem with some friends and I realized they were unaware of this problem too. So I decided to write a short post about this story, maybe it is useful to other coders.

Imagine you are using streams to read some data from the standard input. This is what the input looks like:

some words in a line
some words in a line

And then imagine the following code reading that input:

int num; string line;
while ( (cin >> num) && getline(cin, line) )
; // something

Did you spot any problems?

If not, don’t worry, it’s a bit subtle.

Consider the invisible characters contained in the input stream:

some words'\n'

Actually this is not formally true on Windows, but in general you have a LF char at the end of each line. Let’s follow the code flow:

  • cin >> num correctly reads the int, stopping at (in the language of streams: “detecting but not consuming”) ‘\n’
  • getline(cin, line) now reads the next line until it encounters a line separator (‘\n’ by default). But ‘\n’ is still in the stream buffer and then getline returns immediately, storing nothing in line.
  • Again cin >> num is evaluated but this time it fails, because the stream is not fed with an int. failbit is set then. The loop terminates.
  • The user complains because the feature does not work as he expects. Ok, sorry this is not part of the code flow…

We just experienced a difference between operator>> and getline: the first skips any leading whitespace (actually any separator – according to the locale in use) before performing the read operation, instead, the second does not.

Basically, it has to do with the difference between formatted and unformatted input function. Stream operators (like operator>> for strings) belong to the former category, getline to the latter. In short – among other differences – formatted input functions skip leading separators (e.g. whitespaces, LF) by default, unformatted functions do not.

The long story is: both formatted and unformatted functions create basic_istream<CharT>::sentry objects for preparing input streams for I/O (e.g. checking the validity of the stream). One of the operations performed in the sentry constructor is skipping leading whitespaces. For deciding whether skipping or not it uses two information:

  • a bool parameter  passed to the constructor, that is false by default. Don’t get confused: it’s false when you want the sentry object to skip whitespaces (in fact, it’s generally called noskipws – e.g. _Noskip on Visual Studio).
  • ios_base::skipws flag (set or not on the stream object).

If _Noskip is false and the ios_base::skipws is true then leading whitespaces will be skipped.

I am sure you already imagine the rest of the story: when a formatted function creates a sentry, the parameter is left to its default value (false) and since cin‘s ios_base::skipws is true, operations like cin >> i work as expected even if some whitespaces stand in front of the int value. Conversely, unformatted functions create sentry instances by explicitly passing true to the constructor. For this reason the lonely leading ‘\n’ is not discarded.


Beware something about formatted/unformatted functions is changed between C++98 and C++11, in particular about istream& operator>>(streambuf*). In fact in C++98 it was a formatted operation, now it is unformatted.


Why does getline preserve leading separators? I think because it’s kind of raw read operation. Note that if the delimiter is found, it is extracted and discarded (e.g. it is not stored and the next input operation will begin after it). This is important, because it enables such a code to work as expected:

stringstream ss("the first line\nthe second line)"
while (getline(ss, line)) 
{ ... // line does not contain '\n' at the end

How we fixed this issue?

The simplest thing you can do is just:

while ( (cin >> num >> std::ws) && getline(cin, line) )

The left hand side reads the int and then skips leading separators from the stream. std::ws is an input manipulator created for this purpose.

A bunch of other solutions are possible. For example the one using ignore:

while ( (cin >> num).ignore(numeric_limits<streamsize>::max(), '\n') && std::getline(cin, line))

Here we discard as many leading separators as possible, that is until either count characters are discarded, the delimiter (specified by the second parameter) is found or the end of the stream is reached.

Not only is the former solution simple and effective enough, but it also prevents oversights like:

some words

Above the user left an empty line after the first number. The std::ws solution does not break, the ignore one does instead. On the other hand, std::ws solution does not preserve leading whitespaces, if you need them. But it was not our case (you can imagine the final code looked a bit more defensive than the one I showed here, but the main observations hold).

One can also develop a proxy object to allow code like this:

cin >> num >> std::ws >> as_line >> line;

as_line may also embody the std::ws part:

cin >> num >> as_line >> line;

It’s not hard to code such a machinery. For example:

struct lines_proxy
	istream& operator()(string& s)
		return getline(is >> std::ws, s);

	istream& is;

struct line_t {} as_line;

lines_proxy operator>>(istream& is, line_t)
	return{ is };

istream& operator>>(lines_proxy p, string& s)
	return p(s);


while (cin >> num >> as_line >> line)

Just for fun.

It’s over for today. The leading actor of our story was a really silly problem but the lesson learned was interesting: even if streams were designed as a “formatted” abstraction on top of I/O buffers, unformatted operations are still there. Mixing formatted and unformatted operations should be done with care.

This post is just a reminder to myself because I fell for it, again…Well, let me step back and explain the scenario.

Suppose we are using this generic approach for memoization:

template <typename R, typename... Args>
auto memoize(R(*fn)(Args...))
    std::map<std::tuple<Args...>, R> table;
    return [fn, table](Args... args) mutable -> R {
        auto argt = std::make_tuple(args...);
        auto memoized = table.find(argt);
        if(memoized == table.end())
            auto result = fn(args...);
            table[argt] = result;
            return result;
            return memoized->second;

Suppose also – at some point – two new requirements pop up:

  • ability to remove/update some entries of the table – because something changes somewhere else and those results become old,
  • ability to switch to other functions (the table is not needed there).

We decide to store the table and the lambda into a struct. For the sake of simplification, this is a specialized version of such a class using a toy-function simulate:

struct Memoization
        calc = [this](int i)
            auto memoized = table.find(i);
            if(memoized == table.end()) {
                auto result = simulate(i); // <- somewhere
                table[i] = result;
                return result;
            } else {
                return memoized->second;

    function<int(int)> calc;
    std::map<int, int> table;

Even if this design is probably silly, we can easily satisfy both the requirements (e.g. calc can be set to something else, and table is fully accessible).

This code is fine for rapid prototyping, so we don’t refactor it yet but instead we experiment a bit more. For example, we create a factory for creating different Memoization instances depending on some configuration. Each configuration merely results in a new core function to use (e.g. remember the second requirement). Since it’s prototyping, we leave the constructor as is and we instead set the function by hand:

Memoization CreateMemo(const Configuration& config)
   Memoization memo;
   // using config to create memo
   // e.g. memo.calc = ...
   return memo;

We try this code and we note that it behaves differently on two compilers: on Visual Studio 2013 we get a crash at some point while calling calc(), instead on clang it seems to work smoothly. We start debugging and we immediately spot what’s going on…Do you note that we are one step away from falling into a dangling reference problem? Actually, this accident happens on both clang and Visual Studio, but some (un)lucky condition makes this work on the former.

The culprit – in this case – is RVO, but the issue is…both capturing this into calc – that is a member variable – and copying/moving *this.

By capturing this, we have coupled calc to this->table, that won’t change anymore – say when you do move (or copy). this has not special treatment inside the callable object created by the lambda expression. It’s just a Memoization pointer and when the lambda gets moved (or copied), so does the pointer. Shallowly.

In our factory function, RVO is probably working a bit differently between clang and VS. VS is not using RVO here, clang instead uses RVO and the problem is apparently concealed. By disabling RVO on clang (e.g. -fno-elide-constructors), we get the same problem found on VS.

Did you get the point? In case of the original “memoized” version of calc (which captures this), after the move (or the copy), the returned Memoization instance has a reference to the local memo->table, not to the new one. Finally, the local temporary instance is destroyed and we get a dangling reference. This problem is subtle since the code could still “work” under certain conditions – e.g. RVO. It should be clear that copying instead of moving has the same effect.

Maintaining the original design, the problem can be quickly solved, for example by constructing the object directly:

class Memoization
    // same as before

   Memoization(function<int(int)> calcFn)
      : calc(calcFn)
   // ...
// ...

Memoization CreateMemo(const Configuration& config)
   if (config. ...)
      return {...} // won't copy nor move

But the main issue holds and this could lead to disaster:

Memoization m1;
auto m2 = m1; // [this] in m2.calc points to m1

Not only is it dangerous, but also wrong: you probably expect each Memoization instance has its own copy of the table – i.e. for this reason a solution – say – with shared ownership of the table doesn’t fit well.

At the end of the story we changed our design and we came up with another better solution. But this is not the main point of the article. Even if my example is probably goofy, this experience left a valuable lesson: capturing this into a member variable lambda is valid C++ code and may cause headache if we copy/move *this. Sometimes I think we have a duty to set some limits, for preventing traps other people could fall into. For this reason, I came to a couple of observations:

(You understand that moving instead of copying does not twist the meaning, so let me just mention copying so I do not need to write copy/move every time).

First, we usually don’t need to capture this into a member variable lambda at all. Exploring better ways is always more advisable.

Some of you could still complain that C++ is missing an opportunity by letting this behavior go undisturbed. You could expect the compiler magically sets the this pointer to the copied instance, don’t you? Honestly, I have no strong opinions on that, I’m just thinking aloud.

Second: we can judge pragmatically. Capturing this into a member variable lambda and copying *this just do not get along. Doing either is realistically better than adding some special treatment.

For this reason, I see a terse idiom: either capture this into a member variable lambda or copy *this (or neither).

This is eventually another subtle case the Rule of Zero does not cover and – in case you cannot live without capturing this – I think deleting copy and move operations in the host class is such a desirable design decision to apply (and to document?) – being understood that maybe you don’t need to capture this into a member variable lambda at all.

Last month Marco Foco and I facilitated a workshop on refactoring legacy C++ code. It was an improved version of the same workshop we presented at the Italian Agile Day in November, with Gianluca Padovani as well.

To give you a bit of context, some days before the attendees cloned a certain Git repository we indicated and they compiled the code (by using CMake to generate the projects in their environment) on their machines. The workshop was divided in 4 parts, each one focusing on a C++ theme. They were: productivity, memory management, algorithms and generic programming. During each part, we first spent 10 minutes explaining a few C++11/14 concepts and then we gave 25 minutes to work on some refactoring exercises. At the end of each part, a brief retrospective.

In this post I’m going to describe three pitfalls attendees fell into, related to initializer_list  at least at first sight.

The workshop code was a simple version of Yahtzee, the famous dice game. It was test-driven and, among others, we wrote a test suite covering the score calculation. For instance, suppose a player rolls the dice getting:

2 2 3 3 3

She gets a full house, or 25 points. A class is responsible for recognizing and calculating this kind of stuff. To roll the dice, another class is involved, a sort of “IDiceRoller”. It is an interface that provides only one function:

virtual void Roll(int(&dice)[5]) = 0;

In the test suite, we implemented a simple fake object (not a mock – that could be an exercise) to manipulate and control the dice. Imagine:

class FakeDiceRoller : public IDiceRoller
  FakeDiceRoller() { ... set _dice[i] = 1 ... } 
  void AssignDiceValues(int values[5]) { ... copy values to _dice ... }

  void Roll(int(&dice)[5]) { ... copy _dice to dice ... }
  int _dice[5];

// in a test fixture

// FakeDiceRoller _roller is a member variable

int dice[5] = {1,1,2,3,4};

FakeDiceRoller had intentionally a poor interface and design. The point was: how could you improve it? Suppose you couldn’t change the interface of the domain interface IDiceRoller, as – likely – in the real life. The first series of exercises was about productivity. Sure, AssignDiceValues was one of the point we wanted participants to think about, At some point they gave a try:


They compiled and…they failed.

“Cannot convert initializer list argument to ‘int*'”.

People started trying to figure out why initializer_list was not covertible to int[]…

“This has nothing to do with initializer_list”. I stated. “This is the language and it’s saying the function parameter int values[5] is just int* dice, you cannot initialize int* from {1,2,3,4,5}”.

Then a gentleman took the floor and shouted “use the same signature as Roll, accepting a const reference to array instead of a non-const reference”. That was:

void AssignDiceValues(const int(&values)[5])
// usage

It was fine.

But we were more subtle. Now the attendees started merging two lines in one, getting code like the following:


Do you spot any problem here?

Now FakeDiceRoller‘s _dice is:

[1, 1, 1, 1, 0]

This is because the language zero initializes missing ints.

It happended the test at issue checked a poker of ones. And ok, we had four ones. It happended also the test was a bit wrong, because it didn’t check the 5th value of the dice (say it was a 3, the test had to check we scored a poker AND a 3). Who wrote the test made a mistake. C’est la vie.

Can our test environment prevent us doing this kind of imprudence? Now this simple requirement can be translated to a C++ exercise: we want to inline-initialize an array of strictly N elements. In short, this should fail under our conditions:

void AssignDiceValues(const int(&values)[5]);
_roller.AssignDiceValues({1,2,3,4}); // missing last die

How can this be done? I give you 3 attempts. Other solutions are possible, here I want the simplest approaches possible. The good news was that many people at the workshop suggested the last, that I think is the best.

Attempt #1: initializer_list

Since initializer_list contents is sculpted in the code – at compile-time – its .size() function should be constexpr, shouldn’t it? Yes, but from C++14:

void AssignDiceValue(initializer_list<int> values)
   static_assert(values.size() == 5, "Please provide exactly 5 values"); // only from C++14
   copy(values.begin(), values.end(), _dice);

Actually this doesn’t work yet, as a reader commented.

Attempt #2: strict_array

template<typename T, size_t N>
struct strict_array : array<T, N>
   template<typename... V>
   strict_array(V... vals) // no &&/forward to simplify
      : array<T, N>( {vals...} )
     static_assert(sizeof...(vals) == N, "Please provide exactly 5 values");

void AssignDiceValues(const strict_array<int, 5>& values);
_roller.AssignDiceValues({1,2,3,4,5}); // ok
_roller.AssignDiceValues({1,2,3,4}); // static_assert fires
_roller.AssignDiceValues({1,2,3,4,5,6}); // static_assert fires and array<int, 5> constructor complains

Attempt #3: just the language

We don’t want to add complexity to my framework. We don’t need static_assert nor new bizarre array types. We can use the language and bring my requirement out by design. Just needing to add a tiny level of abstraction:

class Die
   int value;
  Die(int val) : value(val) {} // mandatory
  // operator int() { return value; } // if really needed

void AssignDiceValues(const array<Die, 5>& values);
_roller.AssignDiceValues({1,2,3,4,5}); // ok
_roller.AssignDiceValues({1,2,3,4}); // ko
_roller.AssignDiceValues({1,2,3,4,5,6}); // ko

This is the solution I like the most. It’s a design decision.

Two main points about initializer_list to remember when you refactor legacy “initialization” code:

  • int arr[] is int*. Don’t expect the language to magically deduce an initializer_list
  • initializer_list‘s size() is constexpr only from C++14.

Next. Another task where initialization was involved regarded the game configuration: a game could be configured with a few options. Since the codebase was an hybrid of old C++ and modern C++, a tuple was employed.

YathzeeGame game ( make_tuple(5, 6, 2) ); // 5 dice, [1..6] values, 2 players

A gentleman spotted the following in the dark corners of the codebase:

vector<YahtzeeGame> games;
games.push_back(make_tuple(5, 6, 2));
games.push_back(make_tuple(5, 6, 3));
games.push_back(make_tuple(5, 6, 4));
// other stuff

Excited about C++11, he tried to refactor:

vector<YahtzeeGame> games = { {5, 6, 2}, {5, 6, 3}, {5, 6, 4} };

And does it compile?



I’m kidding you!

I rephrase: do the following statements compile?

YahtzeeGame game { 5, 6, 2 }; or YahtzeeGame game { {5, 6, 2} };

No, they don’t neither. Some participants asked “why is not initializer_list supported here?”.

“initializer_list is not guilty”, I replied. First: how can one expect an initializer_list to be used to contstruct a tuple? initializer_list is – by-definition – homogeneous! Just the opposite of tuple. tuple should have a constructor taking…initializer_list<?>. Some people started likening tuple to pair: “I can do it with pair”.

Yes, you can do with pair because the real reason is tuple’s constructor, that is explicit, and – as you know – copy initialization considers only non-explicit constructors. That is, you can do:

tuple<int, string, foo> t {10, "hello", {fooArg1, fooArg2}};

But not:

tuple<int, string, foo> t = {10, "hello", {fooArg1, fooArg2}};


tuple<int, string, foo> make_my_tuple() {

   return {10, "hello", {fooArg1, fooArg2}};


So you may refactor the initial code by adding make_tuple:

vector<YahtzeeGame> games = { make_tuple(5, 6, 2), make_tuple(5, 6, 3), make_tuple(5, 6, 4) };

Also here initializer_list was in the clear. When something is wrong with initialization, since curly braces initialization (aka uniform initialization) and initializer_list share the same syntax, and since almost all the standard containers support initializer_list construction, someone could jab at this type. As a reader commented, N4387 proposes (among other stuff) getting rid of this limitation.

The third and last example is another story.

To calculate the scores, a class with a CalculateScores function was provided. This function was monolithic, imagine a big if cascade:

if (...single dice...) {
if (...pair dice...) {
if (...tris dice...) {
if ( {

We proposed to decouple this function and make it modular. This way one can create several versions of the game, for example one with no special points (e.g. no full, poker, straight), another with extra points, etc. People designed a simple IRule interface, providing a function:

virtual void Apply(const GameState& state, ScoreTable& scores) = 0;

ScoreTable was already in the code and it just stored the results of the calculation. The idea was to apply rules in chain. Straightforward.

A funny anecdote: at some point I asked “how can you improve this if cascade?”. One person replied “we can use a switch-case”. I responded: “yes but…it’s pretty much the same. What can we do from a design point of view to make this code more modular?” Another guy said “we can design an interface and several concrete classes”. And suddenly the person who proposed the switch-case got up and left the room! Ouch…is an interface so bad?!

No more chatting! People coded this interface, created the rules and…they had to store them somewhere. They opted for a vector of unique_ptrs:

vector<unique_ptr<IRule>> rules;

And they serenely wrote:

vector<unique_ptr<IRule>> rules = { make_unique<Single>(), make_unique<Double>(), make_unique<Tris>(), make_unique<Full>(), ... };

And they got impatient for testing their code, having unit tests from their side – contrary to what they have at work 🙂

I felt a tremor in the force…

“Noooo. Another compiler error” 😦

Said desperate programmers whining from the trenches.

“call to deleted constructor of ‘std::unique_ptr<Single, std::default_delete<Single> >”

“What the fuck?” Some of them kindly complained!

This time, they really made initializer_list fell guilty. And this time they were right. initializer_list doesn’t support move-only types. The main reason is that its begin() and end() return const pointers. There is a proposal to address this issue and several smart guys advanced their idioms – for example here.

As before, I wanted a simple solution for my modern C++ novices, to let them play and experience with C++11/14. I seized the moment: “guys, let’s do a simple exercise with variadics “:

auto rules = CreateVector<unique_ptr<IRule>>( make_unique<Single>, make_unique<Double>(), ... );

The idea was very simple and so was the implementation:

template<typename T, typename H>
void CreateVectorImpl(vector<T>& v, H&& single)

template<typename T, typename H, typename... Tail>
void CreateVectorImpl(vector<T>& v, H&& head, Tail&&... tail)
   CreateVectorImpl(v, forward<Tail>(tail)...);

template<typename T, typename... Tail>
vector<T> CreateVector(Tail&&... tail)
   vector<T> v;
   CreateVectorImpl(v, forward<Tail>(tail)...);
   return v;

Alessandro Vergani (who were there to help us) sent me this (specific – but slick) solution:

template <typename Type>
void setup_rules(vector<unique_ptr<IRule>>& v)

template <typename Type, typename Type2, typename... OtherTypes>
void setup_rules(vector<unique_ptr<IRule>>& v)
   setup_rules<Type2, OtherTypes...>(v);

template<typename... Types>
vector<unique_ptr<IRule>> CreateRules()
   vector<unique_ptr<IRule>> rules;
   return rules;

// usage: auto rules = CreateRules<Single, Double, Poker>();

Wrapping up the story, I believe people at the workshop tried to burden initializer_list too much. They got errors on something related to initialization with curly braces and they accused initializer_list. In the first case, the main misunderstanding was related to the language itself: int[] is just int*, in C++11 as in C++98. Rectifying was quite simple, by using a const reference to an array. initializer_list doesn’t have to do with that, not even here. It’s the language. And just by using the language we addressed the other requirement about prohibiting “uninitialized” dice. Here, some people thought they could just static_assert initializer_list’s size. I deem it’s not worth.

At first sight, the second case is even more related to initializer_list, because every container is constructible from initializer_list. Why tuple differs? If people don’t think about the mathematical difference between initializer_list (aka: homogeneous) and tuple (aka: heterogeneous) they can fall into a trap. Pair is the same story, but curly braces are because of uniform initialization. And pair’s  constructor is not explicit, thus copy initialzation is possible and the trap is just veiled.

In the last example, initializer_list tried to escape through the window, but this time it couldn’t. Imagine initializer_list as arrays, globally stored somewhere. Even if they are (maybe) used only once (in the line you perform the initialization), the compiler is solely responsible for their state. We know there are many workarounds but I’d really like having an official feature in the standard to address this issue (e.g. N4166).

[Note: thanks to Davide Di Gennaro for having reviewed this post and for having suggested some improvements. A paragraph has been completely written by Davide.]

[From Wikipedia] In programming, named parameters refer to a computer language’s support for function calls that clearly state the name of each parameter within the function call itself. A function call using named parameters differs from a regular function call in that the values are passed by associating each one with a parameter name, instead of providing an ordered list of values.

Compare with a traditional positional function call:

createArray(10, 20); // what does this mean, precisely?
createArray(length=10, capacity=20); // oh, I see...
createArray(capacity=20, length=10); // same as previous

A typical example:

// some pseudo-language
window = new Window {
   xPosition = 10,
   yPosition = 20,
   width = 100,
   height = 50

This approach is especially useful if a function takes a large number of optional parameters, and users are usually going to accept the default for most of them.
Several languages support named parameters (e.g. C#, Objective-C, …). C++ does not. In this post, I’m going to explore some of the classical ways to emulate named parameters in C++ as well as mention new approaches.


Only to mention, the most trivial way to emulate named parameters is through comments 🙂

Window window {
 10, // xPosition
 20, // yPosition
 100, // width
 50 // height

This approach is very popular among windows developers, as MSDN pubishes Windows API usage examples with such comments.

Named Parameter Idiom

Imported from Java programming style is the Named parameter idiom (and this is just another post about). The idea is simple: create a proxy class which houses all the parameters. Optional ones are settable via a fluent interface (e.g. by using method chaining):

// 1
File f { OpenFile{"path"} // this is mandatory
   . ... };

// 2 classical version (not compliant with the auto-everything syntax)
File f = OpenFile { ... }
   ... ;

// 3 for auto-everything syntax just add a layer (I prefer 1)
auto f = CreateFile ( OpenFile("path")
 . ... ));

OpenFile is a sort of parameter object and File’s constructor accepts an OpenFile instance. Some authors (for instance, here) argue that OpenFile should have only private members and declare File as friend. This makes sense if you want to add more complex logic to set your parameters via the fluent interface. If you merely want to set plain parameters then a public struct suffices.

In this approach:

  • mandatory parameters are still positional (as the constructor of OpenFile is a regular function call),
  • optional parameters must be copy/move-assignable (basically, settable) – possible runtime penalty,
  • you need to write an extra class (the proxy).

I – almost – never found usages in critical code.

A final note: the fluent interface can be polymorphic as I wrote more than two years ago.

Parameter Pack Idiom

Similar to the one above it’s the parameter pack idiom – from Davide Di Gennaro’s Advanced C++ Metaprogramming – a technique using proxy objects to set parameters via assignment operator (=), resulting in a sweet syntactic sugar:

MyFunction(begin(v), end(v), where[logger=clog][comparator=greater<int>()]);

The actors involved here are:

  1. logger and comparator are global constants; the assignment operator just returns a wrapped copy of the assigned value,
  2. where is a global constant of type “parameter pack”, whose operator[] just returns a new proxy that replaces one of its members with the new argument.

In symbols:

where = {a, b, c }
where[logger = x] → { a,b,c }[ argument<0>(x) ]  →   {x,b,c}

Sketching an implementation, just to give you an idea:

// argument
template <size_t CODE, typename T = void>
struct argument
   T arg;
   argument(const T& that)
      : arg(that)

// void argument - just to use operator=
template <size_t CODE>
struct argument<CODE, void>
   argument(int = 0)
   template <typename T>
   argument<CODE, T> operator=(const T& that) const
     return that;
   argument<CODE, std::ostream&> operator=(std::ostream& that) const
      return that;

// argument pack (storing the values)
template <typename T1, typename T2, typename T3>
struct argument_pack
   T1 first;
   T2 second;
   T3 third;
   argument_pack(int = 0)
   argument_pack(T1 a1, T2 a2, T3 a3)
     : first(a1), second(a2), third(a3)
   template <typename T>
   argument_pack<T, T2, T3> operator[](const argument<0, T>& x) const
      return argument_pack<T, T2, T3>(x.arg, second, third);
   template <typename T>
   argument_pack<T1, T, T3> operator[](const argument<1, T>& x) const
      return argument_pack<T1, T, T3>(first, x.arg, third);
   template <typename T>
   argument_pack<T1, T2, T> operator[](const argument<2, T>& x) const
      return argument_pack<T1, T2, T>(first, second, x.arg);

enum { LESS, LOGGER };
const argument<LESS> comparator = 0;
const argument<LOGGER> logger = 0;
typedef argument_pack<basic_comparator, less<int>, std::ostream> pack_t;
static const pack_t where(basic_comparator(), less<int>(), std::cout);

For the complete code, please refer to Davide’s book.

While this technique may look interesting, in practice it’s hard to generalize. In the book, in fact, it’s included as an example of “chaining” multiple calls to operator[].


Andrzej Krzemieński published an interesting post Intuitive interface, where he mentions an alternative approach.
Named parameters are introduced by companion tags (empty structs used just to select different overloads of the same function). Notable examples of tags are from the STL:

std::function<void()> f{std::allocator_arg, a}; // treats a as an allocator instead of a callble object
std::unique_lock<std::mutex> l{m, std::defer_lock}; // don't lock now

Andrzej proposes tags to improve readability:

// not real STL
std::vector<int> v1(std::with_size, 10, std::with_value, 6);

I like his approach, but – as it stands – you possibly need to create lots of overloads and you cannot choose the order of the parameters. However, there are no requirements on copy-assignment, default values and forwarding is also clear. From the article: “However, tags are not an ideal solution, because they pollute the namespace scope, while being only useful inside function (constructor) call.”

Additionally (from the comments to the article) a reader proposes a slightly different idea that uses a proxy:

std::vector<int> v1(std::with_size(10), std::with_value(6));


Boost has the Parameter Library.

It’s possibly the most complete option if you really need named parameters in C++. An example:

// class code
#include <boost/parameter/name.hpp>
#include <boost/parameter/preprocessor.hpp>
#include <string>


   (int), // the return type of the function, the parentheses are required.
   function_with_named_parameters, // the name of the function.
   tag, // part of the deep magic. If you use BOOST_PARAMETER_NAME you need to put "tag" here.
   (required // names and types of all required parameters, parentheses are required.
      (foo, (int))
      (bar, (float))
   (optional // names, types, and default values of all optional parameters.
      (baz, (bool) , false)
      (bonk, (std::string), "default value")
   if (baz && (bar > 1.0)) return foo;
      return bonk.size();

//client code
function_with_named_parameters(1, 10.0);
function_with_named_parameters(7, _bar = 3.14);
function_with_named_parameters( _bar = 0.0, _foo = 42);
function_with_named_parameters( _bar = 2.5, _bonk= "Hello", _foo = 9);
function_with_named_parameters(9, 2.5, true, "Hello");

Modern named parameters

Modern C++ opened some new doors. Can the new language features lead to slimmer implementations of named parameters?


Method chaining is verbose. I don’t like adding all the functions returning the object itself. What about defining just a struct and assign all the members through a lambda?

struct FileRecipe
   string Path; // mandatory
   bool ReadOnly = true; // optional
   bool CreateIfNotExist = false; // optional
   // ...

class File
   File(string _path, bool _readOnly, bool _createIfNotexist)
      : path(move(_path)), readOnly(_readOnly), createIfNotExist(_createIfNotExist)

   string path;
   bool readOnly;
   bool createIfNotExist;

auto file =  CreateFile( "path", [](auto& r) { // sort of factory
   r.CreateIfNotExist = true;

You still have to provide a parameter object but this approach scales quite better than the classical named parameter idiom in which even chaining functions have to be written.

A variant consists in making File constructible from a FileRecipe (like the named parameter idiom).

How to improve the fluency of mandatory parameters? Let’s mix this approach with tags:

auto file =  CreateFile( _path, "path", [](auto& r) {
   r.CreateIfNotExist = true;

But they are still positional. If you rest easy with a runtime error if a mandatory parameter is missing then use an optional type and check it at runtime.

CreateFile is trivial and left to the reader.

I’ve recently used this approach to configure test recipes and mocks. For example, I needed to create tests of a trivial dice game. Every game had a configuration and tests used to look like:

TEST_F(SomeDiceGameConfig, JustTwoTurnsGame)
   GameConfiguration gameConfig { 5u, 6, 2u };

By using the approach above we could have:

TEST_F(SomeDiceGameConfig, JustTwoTurnsGame)
   auto gameConfig = CreateGameConfig( [](auto& r) {
       r.NumberOfDice = 5u;
       r.MaxDiceValue = 6;
       r.NumberOfTurns = 2u;

Diehards may suggest a macro to reduce verbosity:

TEST_F(SomeDiceGameConfig, JustTwoTurnsGame)
   auto gameConfig = CREATE_CONFIG(
       r.NumberOfDice = 5u;
       r.MaxDiceValue = 6;
       r.NumberOfTurns = 2u;

Exploiting Variadics

Variadics can improve techniques I described above. What about Andrej’s tags approach? Tags could be preferred over the lambda + parameter object because you don’t have to create another object, you don’t have problems with settability and you consider all the parameters the same (e.g. by using the lambda approach you have to treat mandatory parameters differently). But I think tags would be better, if I could:

  • define only one overload of my constructor (or function),
  • decide the order of the parameters (pairs tag-value),
  • the two above + having optional and mandatory parameters.

Something simple like:

File f { _readonly, true, _path, "some path" };

or (my preference):

File f { by_name, Args&&... args) {} 

My idea is: I just want to use variadics to let the user decide the order and let her omit optional parameters.

Imagine two constructors:

File(string path, bool readonly, bool createIfNotExist) {} // all mandatory

template<typename... Args>
File(by_name_t, Args&&... args) {}

A instance of File can be created by using both. If you use the variadic one then I’ll look for all parameters in the pack and delegates the other constructor to really make the instance. Search is (at compile-time) linear over the pack that contains parameters in the order chosen by the caller.

[Note: my implementation is just a proof of concept I did more than one year ago (I only added decltype(auto) somewhere). It could be done better and better.]

Here is how the class designer may look at her class:

File(string path, bool readonly, bool createIfNotExists /*...*/)
   : _path (move(path)), _createIfNotExist(createIfNotExist), _readonly(readonly) // ,etc...

template<typename Args...>
File(named_tag, Args&&... args)
   : File{ REQUIRED(path), OPTIONAL(read, false) // , etc... } // delegating

Prior to show you a working code, it’s clear we can apply the same idea to proxies and obtain:

auto f = File { by_name, readonly=true, path="path" };

The real difference here is about forwarding: with proxies, we benefit from the syntax sugar (the operator=) but now we have to store the values and forward them (not ideal for non-movable/copyable types – and other problems could arise).

Here you can play with the code (and here is the same file on Gist). I first started with the tag version and then I tried with proxies. For this reason there are two versions: the former works with tags ([tag, value]…) and the latter with proxies ( [tag=value]…). Some code could (and should) be refactored.

You’ll find two sections called “PACK UTILS” (two versions: tag and proxy). These contain code I wanted to play originally (e.g. playing with variadics). I also think these kind of operations can be done by using std::forward_as_tuple and then by exploiting tuple’s utilities.

Another part of the code contains macros to retrieve parameters and to generate tags.

The final section is a full example.

Here is what a class looks like:

class window
    // classic constructor
    window( string pTitle, int pH, int pW,
    int pPosx, int pPosy, int& pHandle)
       : title(move(pTitle)), h(pH), w(pW), posx(pPosx), posy(pPosy), handle(pHandle)

    // constructor using proxies (e.g. _title = "title")
    template<typename... pack>
    window(use_named_t, pack&&... _pack)
       : window { REQUIRED_NAME(title), // required
                  OPTIONAL_NAME(h, 100), // optional
                  OPTIONAL_NAME(w, 400), // optional
                  OPTIONAL_NAME(posx, 0), // optional
                  OPTIONAL_NAME(posy, 0), // optional
                  REQUIRED_NAME(handle) } // required

    // constructor using tags (e.g. __title, "title")
    template<typename... pack>
    window(use_tags_t, pack&&... _pack)
       : window { REQUIRED_TAG(title), // required
                  OPTIONAL_TAG(h, 100), // optional
                  OPTIONAL_TAG(w, 400), // optional
                  OPTIONAL_TAG(posx, 0), // optional
                  OPTIONAL_TAG(posy, 0), // optional
                  REQUIRED_TAG(handle) } // required

  string title;
  int h, w;
  int posx, posy;
  int& handle;

You see, both named and tag constructors always delegate the real constructor to perform initialization.

The following code fragment shows how the caller uses the contraption:

int i=5;
// tags version
window w1 {use_tags, __title, "Title", __h, 10, __w, 100, __handle, i};
cout << w1 << endl;

// proxies version
window w2 {use_named, _h = 10, _title = "Title", _handle = i, _w = 100};
cout << w2 << endl;

// classic version
window w3 {"Title", 10, 400, 0, 0, i};
cout << w3 << endl;

by_name here is called use_named, but the meaning is the same.


  • mandatory and optional parameters are uniform (named or tagged)
  • order is not defined a priori
  • tag approach has no forwarding issues


  • compile-time errors could be hard to understand (static_assert helps a bit)
  • available parameters should be documented
  • pollution of the namespace scope still remains
  • default-values are always evaluated (some improvements for laziness are possible)
  • proxy approach is not ideal for forwarding.

A note about the first trouble: Clang is a gentleman and it complains politely. For instance, suppose I forget a title for my window. Here is the output:

main.cpp:28:2: error: static_assert failed "Required parameter"
        static_assert(pos >= 0, "Required parameter");
        ^             ~~~~~~~~
main.cpp:217:14: note: in instantiation of template class 'get_at<-1, 0>' requested here
                :       window { REQUIRED_NAME(title),

This way you know precisely where you miss a required parameter. This could be improved.

A minimalistic approach using std::tuple

[Note: completely by Davide Di Gennaro]

We can exploit some of the power of std::tuple to write an extremely compact and portable implementation. We will stick to some simple principles:

  • the parameter pack will be a special tuple, where a “tag type” is immediately followed by its value (so the type would be something like std::tuple<age_tag, int, name_tag, string, … >)
  • the standard library already has utility functions to forward / concatenate objects and tuples that guarantee optimal performance and correctness in move semantics
  • we will use a macro to introduce global constants that represent a tag
  • the syntax for constructing a parameter pack will be (tag1=value1)+(tag2=value2)+…
  • the client will take a parameter pack as a reference to template type, e.g.
template <typename pack_t>
void MyFunction([whatever], T& parameter_pack)      // or const T&, T&&, etc.
  • With a function call, the client will extract a value from the pack and (say) move it into a local variable.

Ideally, here’s how the code will look like:

namespace tag
   CREATE_TAG(age, int);
   CREATE_TAG(name, std::string);

template <typename pack_t>
void MyFunction(T& parameter_pack)
   int myage;
   std::string myname;
   bool b1 = extract_from_pack(tag::name, myname, parameter_pack);
   bool b2 = extract_from_pack(tag::age, myage, parameter_pack);
   assert(b1 && myname == "John");
   assert(b2 && myage == 18);

int main()
   auto pack =  (tag::age=18)+(tag::name="John");

Here is how the implementation may look like. We will omit most of the potential optimizations for sake of clarity (and they are probably unnecessary).

First, the macro:

#include <tuple>
#include <utility>

template <typename T>
struct parameter {};

#define CREATE_TAG(name, TYPE) \
   struct name##_t \
   { \
      std::tuple<parameter<name##_t>, TYPE> operator=(TYPE&& x) const \
      {  return std::forward_as_tuple(parameter<name##_t>(), x); } \
      name##_t(int) {} \
}; \
const name##_t name = 0

The expansion of CREATE_TAG(age, int); creates a class and a global object. Note that this will work if positioned inside a namespace.

struct age_t
   std::tuple<parameter<age_t>, int> operator=(int&& x) const
      return std::forward_as_tuple(parameter<age_t>(), x);
   age_t(int) {}

const age_t age = 0;


Conceptually the assignment

age = 18

Translates into something similar to:

make_tuple(parameter<age_t>(), 18);

Observe that we wrote:

std::tuple<parameter<age_t>, int> operator=(int&& x) const

As written, we require an r-value on the right. First, this is an extra safety feature: to increase the readability of the code with parameter packs, you may want to assign constants, not variables (otherwise, renaming the variable would be sufficient). e.g.

int myage = 18;
f(myage); // ok, clear

g((...) + (age=18)); // ok, clear
g((...) + (age=myage)); // compiler error, and redundant from a readability point of view

Second, we can exploit move semantics:

The difference between

std::tuple<parameter<age_t>, int> operator=(int&& x) const
   return std::make_tuple(parameter<age_t>(), x);


std::tuple<parameter<age_t>, int> operator=(int&& x) const
   return std::forward_as_tuple(parameter<age_t>(), x);

is very subtle. The latter returns std::tuple<…, int&&>, but since the return type is tuple<…, int> then tuple’s move constructor is invoked.
Alternatively we could write

std::tuple<parameter<age_t>, int> operator=(int&& x) const
   return std::make_tuple(parameter<age_t>(), std::move(x));

Now, we add a suitable tuple-concatenation operator.

We informally agree that all tuples starting with parameter<T> have been generated by our code, so without any explicit validation, we just cat them:

template <typename TAG1, typename... P1, typename TAG2, typename... P2>
std::tuple<parameter<TAG1>, P1..., parameter<TAG2>, P2...>
operator+ (std::tuple<parameter<TAG1>, P1...>&& pack1, std::tuple<parameter<TAG2>, P2...>&& pack2)
    return std::tuple_cat(pack1, pack2);

Very simply, this function will do a simple pattern matching on two tuples: if they both look like:

tuple<parameter<tag>, type, [maybe something else]>

then they are joined together.

Finally, we publish a function to perform the extraction of an argument from the pack. Note that this function has move semantics (i.e. after a parameter is moved out of the pack).

template <typename TAG, typename T, typename... P, typename TAG1>
bool extract_from_pack(TAG tag, T& var, std::tuple<parameter<TAG1>, P...>& pack);

the effect of this function is:

if the “pack” contains parameter<TAG>, then var receives the value immediately following, and the function returns true. otherwise something bad happens (we can choose between: a compiler error, return false, throw exception, and a few more…)

To make this selection possible, actually the function will be:

template <typename ERR, typename TAG, typename T, typename... P, typename TAG1>
bool extract_from_pack(TAG tag, T& var, std::tuple<parameter<TAG1>, P...>& pack)

So we will invoke it as:

extract_from_pack< erorr_policy > (age, myage, mypack);

Due to variadic templates pattern matching, extract_from_pack knows that the pack has the form tuple<parameter<TAG1>, … > so it needs to examine recursively if TAG is equal to TAG1. We will do this dispatching the call to a class:

extract_from_pack< erorr_policy > (age, myage, mypack);


extractor<0, erorr_policy >::extract (age, myage, mypack);

which in turn calls

extractor<0, erorr_policy >::extract (age, myage, std::get<0>(pack), mypack);

which has two overloads:

extract(TAG, … , TAG, …)

which succeeds, performs the assignment and returns true, or

extract(TAG, … , DIFFERENT_TAG, …)

which keeps on iterating, calling again

extractor<2, erorr_policy >::extract (age, myage, mypack);

when iteration is not possible, error_policy::err(…) is invoked.

template <size_t N, typename ERR>
struct extractor
   template <typename USERTAG, typename T, typename TAG, typename... P>
   static bool extract(USERTAG tag, T& var, std::tuple<parameter<TAG>, P...>&& pack)
      return extract(tag, var, std::get<N>(pack), std::move(pack));

   template <typename USERTAG, typename T, typename TAG, typename... P>
   static bool extract(USERTAG tag, T& var, parameter<TAG> p0, std::tuple<P...>&& pack)
      return extractor<(N+2 >= sizeof...(P)) ? size_t(-1) : N+2, ERR>::extract(tag, var, std::move(pack));

   template <typename USERTAG, typename T, typename... P>
   static bool extract(USERTAG tag, T& var, parameter<USERTAG>, std::tuple<P...>&& pack)
      var = std::move(std::get<N+1>(pack));
      return true;

template <typename ERR>
struct extractor<size_t(-1), ERR>
   template <typename TAG, typename T, typename DIFFERENT_TAG, typename... P>
   static bool extract(TAG tag, T& var, std::tuple<parameter<DIFFERENT_TAG>, P...>&& pack)
   { return ERR::err(tag); }

template <typename ERR, typename TAG, typename T, typename... P, typename TAG1>
bool extract_from_pack(TAG tag, T& var, std::tuple<parameter<TAG1>, P...>& pack)
   return extractor<0, ERR>::extract(tag, var, std::move(pack));

Due to the flexible nature of parameter packs, the best error policy would be a plain “return false” (any stronger error would in fact make that parameter mandatory). So:

struct soft_error
   template <typename T>
   static bool err(T)
      return false;

However we are free to chose any of these:

struct hard_error
   template <typename T>
   static bool err(T); // note that static_assert(false) here won’t work. can you guess why?

struct throw_exception
   template <typename T>
   static bool err(T)
      throw T();
      return false;

An additional improvement could be a redundancy check, that prevents code like (age=18)+(age=19).

The code for this is short, but it requires some subtle manipulation with variadic templates, so we leave it as an excercise.

Final notes

I have not discussed about runtime techniques, e.g.:

void MyFunction ( option_parser& pack )
   auto name = pack.require("name").as<string>();
   auto age = pack.optional("age", []{ return 10; }).as<int>();

Whereas the opening techniques I have presented are fairly consolidated, last ideas are just work in progress and I’m still trying to understand if they make sense. The code I’ve shown is just a proof of concept and it does not claim to be optimal nor suitable for production. I wrote these lines more than one year ago, to practice with variadics and I finally found some time to summarize my ideas in a decent (I hope) post and share with you. If this will be helpful or inspiring to anyone, I’ll be really glad.

I found a recent proposal aiming to introduce named arguments in C++ here. Cool!

Anyhow I have a question for you: where would you have wanted to use named parameters in C++?

I read the article “Enforcing the Rule of Zero” from latest Overload (of ACCU) and I’d like to point out something that I misapprehended at a first reading.

I’m not explaining what the Rule of Zero is. If you’ve never heard about it, I heartily suggest you to read Martinho Fernandes’ article. The rule states (basically): classes that have custom destructors, copy/move constructors or copy/move assignment operators should deal exclusively with ownership. This rule is an application of the Single Responsability Principle.

Back to the article, a clever point the author underlines is about polymorphic deletion: what to do when we want to support polymorphic deletion, or when our classes have virtual functions? Quoting Herb Sutter’s: If A is intended to be used as a base class, and if callers should be able to destroy polymorphically, then make A::~A public and virtual. Otherwise make it protected (and not-virtual).

For example:

struct A
   virtual void foo();

struct B : A
   void foo() override ;
   int i;

A* a = new B{}; 

delete a; // ops, undefined behavior

To correctly destroy B, A should have a virtual destructor:

struct A
   virtual ~A() {}
   virtual void foo();

Problem: now the compiler won’t automatically generate move operations for A. And, worse, in C++14 this deficiency is extended to copy operations as well. So, the following may solve all the problems:

struct A
   //A() = default; // if needed
   virtual ~A() = default;
   A(const A&)=default;
   A& operator=(const A&)=default;
   A& operator=(A&&)=default;

   virtual void foo();

This is called the Rule of Five. The author then wisely proposes to follow the second part of the Rule of Zero, that is: “Use smart pointers & standard library classes for managing resources”. I add: or use a custom wrapper, if an STL’s class does not fit with your needs. The key point is: use abstraction, use RAII, use C++.

Then the author suggests to use a shared_ptr:

struct A
   virtual void foo() = 0;

struct B : A
   void foo() override {}

shared_ptr<A> ptr = make_shared<B>(); // fine

This works well and it avoids the (more verbose) Rule of Five.

Why shared_ptr and not simply unique_ptr? Let me remark the key point: the article never says “use any smart pointer”, because not every smart pointer would do. In fact a plain unique_ptr would not have worked.

One of the (many) differences between unique and shared pointers is the deleter. Both can have a custom deleter, but in  unique_ptr the deleter is part of the type signature (namely, unique_ptr<T, Deleter>) and for shared_ptr is not: shared_ptr has a type-erased deleter (and in fact also a type-erased allocator).

This implies that shared_ptr<B> has a deleter which internally remembers that the real type is B.

Consider the example I borrowed from the article: when make_shared<B> is invoked, a shared_ptr<B> is constructed as expected. shared_ptr<B> constructs a deleter which will delete the B*. Later, shared_ptr<B> is passed to shared_ptr<A>’s constructor: since A* and B* are compatible pointers, shared_ptr<B>’s deleter is “shared” as well. So even if the type of the object is  shared_ptr<A>, its deleter still remembers to delete a pointer of type B*.

Conversely, unique_ptr<T> has a default deleter of type std::default_deleter<T>. This is because the unique_ptr is intended to be use exactly as a light wrapper of a delete operation (with unique ownership – and not merely a scoped one). std::default_deleter<A> can be constructed from std::default_deleter<B> (since pointers are compatible), but it will delete an A*This is by design, since unique_ptr is intended to mimic closely operator new and delete, and the (buggy) code { A* p = new B; delete p; } will call delete(A*).

A possible way to work around this issue is to define a custom type-erased deleter for unique_ptr. We have several ways to implement this. One uses std::function and the other uses a technique described by Davide Di Gennaro in his book Advanced C++ Metaprogramming, called Trampolines. The idea for the latter was suggested by Davide, in a private conversation.

Using std::function is the simplest way:

template<typename T>
struct concrete_deleter
   void operator()(void* ptr) const
      delete static_cast<T*>(ptr);


unique_ptr<A, function<void(void*)> ptr { new B{}, concrete_deleter<B>{} };

Here we are using type-erasure, again. std::function<void(void*)> is a wrapper to any callable object supporting operator()(void*). When the unique_ptr has to be destroyed it calls the deleter, that is actually a concrete_deleter<B>. Internally, concrete_deleter<T> casts the void* to T*. To reduce verbosity and avoid errors like { new B(), concrete_deleter<A>{} }, you can write a make_unique-like factory.

The second solution is cunning and it implements type-erasure without a virtual function call (an optimization which goes beyond what std::function can really use):

struct concrete_deleter
   using del_t = void(*)(void*);
   del_t del_;

   template <typename T>
   static void delete_it(void *p)
      delete static_cast<T*>(p);

   template<typename T>
     : del_(&delete_it<T>)

   void operator()(void* ptr) const

template<typename T, typename... Args>
auto make_unique_poly(Args&&... args)
   return unique_ptr<T, concrete_deleter>{new T{forward<Args>(args)...}, static_cast<T*>(nullptr)};

unique_ptr<A, concrete_deleter> ptr = make_unique_poly<B>();

The idea is storing the type information in the del_ function pointer, directly.


As many readers suggested, this can be done also by using a lambda. This way we get rid of the concrete_deleter support struct. I’m just afraid of this solution (that was in the first draft of this post) because if you use a generic type like the following:

unique_ptr<A, void(*)(void*)>

When you read the code you don’t know, at a first sight, what unique_ptr means. Worse, you may re-assign the unique_ptr to another one, passing a totally different lambda that observes the same signature.

Moreover, as Nicolas Silvagni commented, the size of unique_ptr<A, concrete_deleter> (or the one using a lambda) is greater than unique_ptr<A> (typically doubles – e.g. 8 bytes vs 16 bytes, on 64bit architecture). To prevent this, an intrusive approach is possible (read the comments for details). Alas, an intrusive approach does not follow the design of unique_ptr (and of STL wrappers in general) that is non-intrusive.


So to sum up, here are the possible workarounds:

  1. Use shared_ptr (if possibile),
  2. Apply the Rule of Five (so declare a virtual destructor),
  3. Use unique_ptr with a custom deleter.


What do you think?


Many thanks to Davide Di Gennaro for reviewing this article and suggesting me some improvements. Some ideas arised from a private conversation we had.