In finance, the **drawdown** is the measure of the decline from a historical peak in some series of data (e.g. the price of a stock over a certain period of time).

For example, here is the hypothetical price series of the fake “CPlusPlus” stock:

You know, the 2008 crisis affected C++ too, there was a renaissance in 2011/2012, some disappointment in 2014/2015 because C++14 was a minor release and “Concepts” didn’t make it, and nowadays the stock has been rising since programmers feel hopeful about C++20.

Drawdowns are the differences between the value at one year and the value at the previous maximum peak. For instance, in 2008 we have a drawdown of 28-12 (16) and in 2015 the “dd” is 35-21 (14):

The **Maximum drawdown** is just the highest value among all the drawdowns. In the series above, the maximum drawdown is 16.

In economics, **MDD** is an indicator of risk and so an important problem to solve.

Let’s see how to solve this problem and how to bring more value out of it.

The MDD problem can be formulated as follows: given an array `A`, find the maximum difference `A[j] - A[i]` with `j < i`. The constraint on `i` and `j` makes this problem challenging. Without it, we could just find the maximum and minimum elements of the array.

The brute force approach is quadratic in time: for each pair of indexes (i, j) we keep track of the maximum difference. The code follows:

```cpp
int MaxDrawdownQuadratic(const vector<int>& stock)
{
    auto mdd = 0;
    for (auto j=0u; j<stock.size(); ++j)
    {
        for (auto i=j; i<stock.size(); ++i)
        {
            mdd = std::max(mdd, stock[j] - stock[i]);
        }
    }
    return mdd;
}
```

When I facilitate Coding Gym, I suggest that attendees who are stuck start from the naive solution, if any. The naive solution might be a red herring, but it can lead to the optimal solution **when we ask and answer key questions**.

In this case the key question is “how can we remove/optimize the inner loop?”.

The current solution starts from `stock[j]` and goes forward, calculating the differences between that value and all the following ones. This approach treats `stock[j]` as a peak and scans forward.

The key question then becomes: “how can we avoid going through all the following elements?”

Think about the difference `stock[j] - stock[i]`. For every `i`, such a difference is maximized when `stock[j]` is the maximum value among the elements preceding `i` (that is, for `j` from 0 to `i - 1`). Thus, we can ignore all the other values, since `stock[j] - stock[i]` would be lower.

The insight comes when we “reverse” our way of thinking about the pairs: we shouldn’t start from `stock[j]` and then go forward to find the lowest value. Instead, we should start from `stock[i]`, having the previous biggest value cached!

So, instead of looking at each pair, we can just keep track of the maximum value preceding any other index. Here is the idea:

```cpp
int MaxDrawdown(const vector<int>& stock)
{
    auto mdd = 0;
    auto maxSoFar = stock.front();
    for (auto i=1u; i<stock.size(); ++i)
    {
        mdd = std::max(mdd, maxSoFar - stock[i]);
        maxSoFar = std::max(maxSoFar, stock[i]);
    }
    return mdd;
}
```

The solution is linear in time.

Now it’s time for me to show you how to get more out of this problem – and to inspire you to achieve more every time you face puzzles. There is no limit.

In the rest of the post, I’m just freely playing with the problem to express myself.

As I elaborated in the previous post, sometimes we can combine known patterns to solve problems.

I don’t have a special recipe that brings out patterns from code. Sometimes I just “see” them between the lines (as for Kadane). Other times I try some tricks to discover them. Or I don’t find any patterns at all.

I recently met the MDD problem again (after some years) and it looked similar to Kadane’s. I was **biased** by Kadane, and my mind was unconsciously suggesting that I apply similar patterns to the MDD problem because the code looked similar. That can be dangerous! It doesn’t work every time. Ever heard of “the hammer syndrome”? This is the caveat of thinking **inside** the box. Anyway, in this case my sixth sense was right.

First of all, I had an intuition: I realized that all the evolving values of `maxSoFar` are **totally independent** of any other decision points of the algorithm. I could enumerate them separately. One trick to use when searching for patterns is asking: “which computations can be isolated or extracted?”.

`maxSoFar` is just a **cumulative maximum**. For instance:

4 3 2 6 8 5 7 20

The cumulative maximum is:

4 4 4 6 8 8 8 20

The pattern that can generate such a series is **prefix sum** (when “sum” is not addition but **maximum**).

So I refactored the original code by isolating the cumulative maximum calculation into a separate vector:

```cpp
int MaxDrawdown(const vector<int>& stock)
{
    std::vector<int> maxs(stock.size());
    std::partial_sum(std::begin(stock), std::end(stock), std::begin(maxs),
                     [](auto l, auto r) { return std::max(l, r); });
    auto mdd = 0;
    for (auto i=1u; i<stock.size(); ++i)
    {
        mdd = std::max(mdd, maxs[i] - stock[i]);
    }
    return mdd;
}
```

The next trick is to figure out whether the loop hides another pattern. The question is: what kind of operation is underneath the calculation of `mdd`?

We have some hints:

- at every step we calculate `maxs[i] - stock[i]`, so we read the i-th value from two sequences;
- every result of such a calculation is then reduced by applying `std::max`.

Do you know this pattern?

Sure! It’s **zip | map | reduce**!

See this post for more details.

In other words:

- **zip** `maxs` with `stock` (we pair them off)
- **map** every pair with subtraction
- **reduce** the intermediate results of map with `std::max`

In C++ we can express such a pattern with `std::inner_product` (I’m not saying “use this in production”; I’m just letting my brain work):

```cpp
int MaxDrawdown(const vector<int>& stock)
{
    std::vector<int> maxs(stock.size());
    std::partial_sum(std::begin(stock), std::end(stock), std::begin(maxs),
                     [](auto l, auto r) { return std::max(l, r); });
    return std::inner_product(std::begin(maxs), std::end(maxs),
                              std::begin(stock),
                              0,
                              [](auto l, auto r) { return std::max(l, r); },
                              std::minus<>{});
}
```

Now we have a solution that is harder to read for people not familiar with STL algorithms, and it adds an extra scan as well as extra memory usage…

First of all, although the code is not intended for production use, I am already satisfied because my brain got a workout. As you see, the line between production code and “training code” can be more or less sharp. In my opinion, our brain can benefit from both training and production “styles” (when they differ).

Now, I would like to push myself even more by giving my own answer to the following question:

*What might this code look like in next-generation C++?*

What about using ranges? Might that help solve the issues introduced before?

Here is my answer:

```cpp
int MaxDrawdown(const vector<int>& stock)
{
    auto maxs = view::partial_sum(stock, [](auto l, auto r){ return std::max(l, r); });
    auto dds = view::zip_with(std::minus<>{}, maxs, stock);
    return ranges::max(dds);
}
```

The combination of `view::zip_with` and `ranges::max` has displaced `std::inner_product`. In my opinion, it’s much more expressive.

I hope someone will propose and defend function objects for min and max so we can avoid the lambda – after all, we have `std::minus` and `std::plus`, so why not have `std::maximum` and `std::minimum` (or some such)?

If you are wondering whether this code does only one scan, the answer is yes. Every view here is lazy and uses no extra memory.

We can happily argue again that “beauty is free”.

**Note:** usually the MDD is calculated as a ratio, because it makes more sense to display it as a percentage. For example:

```cpp
float MaxDrawdown(const vector<int>& stock)
{
    auto maxs = view::partial_sum(stock, [](auto l, auto r){ return std::max(l, r); });
    auto dds = view::zip_with([](auto peak, auto ith) { return (float)(peak-ith)/peak; }, maxs, stock);
    return 100.0f * ranges::max(dds);
}
```

Consider again the brute force solution:

```cpp
int MaxDrawdownQuadratic(const vector<int>& stock)
{
    auto mdd = 0;
    for (auto j=0u; j<stock.size(); ++j)
    {
        for (auto i=j; i<stock.size(); ++i)
        {
            mdd = std::max(mdd, stock[j] - stock[i]);
        }
    }
    return mdd;
}
```

We have seen that the optimal solution consists in just scanning the array forward and “caching” the biggest `stock[j]` so far.

A similar scheme applies if we think about the solution backwards: we scan the array backwards and cache the **lowest** price so far:

```cpp
int MaxDrawdownBackwards(const vector<int>& stock)
{
    auto mdd = 0;
    auto minSoFar = stock.back();
    for (auto i=stock.size()-1; i>0; --i)
    {
        mdd = std::max(mdd, stock[i-1] - minSoFar);
        minSoFar = std::min(minSoFar, stock[i-1]);
    }
    return mdd;
}
```

Getting to the ranges-based solution is not so hard, since we know how the problem is broken down into patterns: the forward cumulative maximum is replaced with the *backward* cumulative minimum. It’s still the *prefix sum* pattern. We just change the proper bits:

- `stock` is iterated backwards
- `std::min` displaces `std::max`

*zip | map | reduce* stays the same except for the order of the inputs (we subtract the i-th minimum from `stock[i]`) and the direction of `stock` (backwards).

Thus, here is the code:

```cpp
int MaxDrawdownBackwards(const vector<int>& stock)
{
    auto mins = view::partial_sum(view::reverse(stock), [](auto l, auto r){ return std::min(l, r); });
    return ranges::max(view::zip_with(std::minus<>{}, view::reverse(stock), mins));
}
```

If you have some difficulties at this point, write down the “intermediate” STL code without ranges.

The same challenge gave us the opportunity to find another solution with the same patterns.

Playing with patterns is to a programmer’s creativity as playing with colors is to a painter’s creativity.

Playing with patterns is a productive training for our brain.

Playing with patterns is also useful for tackling problem variations fluently. For instance, if the problem changes to “calculate the **minimum** drawdown”, we just have to replace `ranges::max` with `ranges::min`. That’s possible because we know how the problem has been broken down into patterns.

The MDD problem has interesting variations that can be solved with the same patterns (customizing the proper bits). A couple of challenges are left to the willing reader:

- Given an array `A`, find the maximum difference `A[j] - A[i]` with `i < j` (in MDD it’s `j < i`). Rephrasing: given a series of stock prices, find the maximum profit you can obtain by buying the stock on a certain day and selling it on a future day. Try your solutions here (alas, up to C++14).
- Given an array of stock prices, each day you can either buy one share of that stock, sell any number of shares of stock that you own, or not make any transaction at all. What is the maximum profit you can obtain with an optimum trading strategy? See the full problem and try your solutions here (alas, up to C++14).

Have fun practicing with STL algorithms and ranges!

“Just one, the best one”.

The legendary answer by José Capablanca – a world chess champion of the last century – indicates a commonly known fact: chess champions win by being better at **recognizing patterns** that emerge during the game. They remember meaningful chess positions better than beginners. However, experts do not remember *random* positions effectively better than non-experts.

Patterns serve as a kind of shorthand that’s easier to remember than a meaningless configuration of pieces that could not occur in a real game.

This fact does not apply to chess only. Our brain works by constantly recognizing, learning and refining patterns of the world.

The reason is efficiency: the brain applies such an optimization to ignore some of the possible choices we have in every situation. Thus, experts get better results while thinking less, not more.

You should know another category of experts who are usually good at recognizing patterns: **programmers**.

We, programmers, get better results while *thinking in patterns*. We decompose complex problems into combinations of simpler patterns. However, many problems cannot be solved with known patterns. Here is where **creativity** kicks in.

However, creativity is not only what enables human beings to solve unknown problems with new ideas; it’s also the capacity to reinterpret known problems in new and inspiring ways. It’s the art of creating new associations. As Jules Henri Poincaré once said, creativity is “the ability to unite pre-existing elements in new combinations that are useful”. In programming, “pre-existing elements” are commonly called **patterns**.

That’s the spirit of this article. I will revisit a classical problem from another perspective.

The article is a bit verbose because the style is didactic: I spend some time explaining the example problem and the solution. If you already know the Maximum Subarray Problem, you can skip the following section. Even though you know Kadane’s algorithm, it’s worth reading the dedicated section anyway because I get to the solution from a slightly different point of view than the canonical one.

Let me introduce one protagonist of the story, the famous “Maximum Subarray” problem whose linear solution has been designed by Joseph “Jay” Kadane in the last century.

Here is a simple formulation of the problem:

*Given an array of numbers, find a contiguous subarray with the largest sum.*

We are just interested in the value of the largest sum (not the boundaries of the subarray).

Example:

```
[-3,1,-3,4,-1,2,1,-5,4]
```

Max subarray: [4,-1,2,1]. Sum: 6.

Clearly, the problem is interesting when the array contains negative numbers (otherwise the maximum subarray is the whole array itself).

The literature around this problem is abundant and there are many discussions about it. We’ll focus only on **Kadane’s algorithm**, the most famous (and efficient) solution.

The theory behind Kadane’s algorithm is not straightforward and it’s beyond the scope of this didactic post. The algorithm lies in an area of programming called **Dynamic Programming**, one of the hardest techniques to solve problems in computer science – and one of the most efficient as well. The basic principle of Dynamic Programming consists in breaking a complex problem into simpler sub-problems and caching their results.

For example, the task of calculating “1+1+1+1” can be broken down this way: first we calculate “1+1=2”, then we add “1” to get “3” and finally we add “1” to get “4”. Implicitly, we have “cached” the intermediate results, since we did not start from scratch every time – e.g. to calculate (1+1)+1 we started from “1+1=2”. A more complex example is Fibonacci: each number is calculated from the famous formula *fibo(n) = fibo(n-1) + fibo(n-2)*.

For example:

```
fibo(4) = fibo(3) + fibo(2)
        = fibo(2) + fibo(1) + fibo(2)
        = fibo(1) + fibo(0) + fibo(1) + fibo(2)
        = fibo(1) + fibo(0) + fibo(1) + fibo(1) + fibo(0)
```

The sub-problems are called “overlapping” since we solve the same sub-problem multiple times (fibo(2) is called twice, fibo(1) three times, and fibo(0) twice). However, the main characteristic of Dynamic Programming is that we do not recalculate the sub-problems that we have already calculated. Instead, we “cache” them. Without stepping into further details, there exist two opposite approaches which come with a corresponding caching strategy: Top-Down and Bottom-Up. Roughly speaking, the former is recursive, the latter is iterative. In both we maintain a map of already solved sub-problems. More formally, in the Top-Down approach the storing strategy is called **memoization**, whereas in the Bottom-Up one it is called **tabulation**.

In Bottom-Up we go incrementally through all the sub-problems and reuse the previous results. For instance:

```cpp
table[0] = 0;
table[1] = 1;
for (auto i=2; i<=n; ++i)
    table[i] = table[i-1] + table[i-2];
return table[n];
```

On the other hand, Top-Down involves recursion:

```cpp
// suppose memo has size n+1
int fibo(std::vector<int>& memo, int n)
{
    if (n < 2)
        return n;
    if (memo[n] != 0)
        return memo[n];
    memo[n] = fibo(memo, n-1) + fibo(memo, n-2);
    return memo[n];
}
```

Now that we have a bit of background, let’s quickly meet *Kadane’s algorithm*.

Kadane’s algorithm lies in the Bottom-Up category of Dynamic Programming, so it works by first calculating a solution for every sub-problem and then by using the final “table” of results in some way. In Fibonacci, we use the table by just popping out its last element. In Kadane we do something else. Let’s see.

My explanation of the algorithm is a bit different from the popular ones.

First of all, we should understand how the table is filled. Differently from Fibonacci, Kadane’s table[i] does **not** contain the solution of the problem at index i. It’s a “partial” result instead. Thus, we call such a table *partial_maxsubarray*.

*partial_maxsubarray[i]* represents a partial solution on the subarray **ending at the i-th index and including the i-th element**. The last condition is the reason why the result is *partial*. Indeed, the final solution might not include the i-th element.

Let’s see what this means in practice:

```
[-3,1,-3,4,-1,2,1,-5,4]

partial_maxsubarray[0] means solving the problem only on [-3], including -3.
partial_maxsubarray[1] is only on [-3, 1], including 1.
partial_maxsubarray[2] is only on [-3, 1, -3], including -3.
partial_maxsubarray[3] is only on [-3, 1, -3, 4], including 4.
partial_maxsubarray[4] is only on [-3, 1, -3, 4, -1], including -1.
partial_maxsubarray[5] is only on [-3, 1, -3, 4, -1, 2], including 2.
partial_maxsubarray[6] is only on [-3, 1, -3, 4, -1, 2, 1], including 1.
partial_maxsubarray[7] is only on [-3, 1, -3, 4, -1, 2, 1, -5], including -5.
partial_maxsubarray[8] is only on [-3, 1, -3, 4, -1, 2, 1, -5, 4], including 4.
```

For each index i, the i-th element will be included in the partial_maxsubarray calculation. We have only one degree of freedom: we can change where to start.

Consider for example partial_maxsubarray[2]. If the main problem was on [-3, 1, -3], the solution would have been 1 (and the subarray would have been [1]). However, partial_maxsubarray[2] is -2 (and the subarray is [1, -3]), because of the invariant.

Another example is partial_maxsubarray[4] that is not 4 as you might expect, but 3 (the subarray is [4, -1]).

How to calculate partial_maxsubarray[i]?

Let’s do it by induction. First of all, partial_maxsubarray[0] is clearly equal to the first element:

partial_maxsubarray[0] = -3

Then, to calculate the next index (1) we note that we have only one “degree of freedom”: since we must include 1 anyway, we can either extend the current subarray by one (getting [-3, 1]) or we can start a new subarray from position 1. Let me list the two options:

- extend the current subarray, getting [-3, 1], or
- start a new subarray from the current index, getting [1].

The choice is really straightforward: we choose the subarray with the largest sum! Thus, we choose the second option (partial_maxsubarray[1] = 1).

To calculate partial_maxsubarray[2]:

- keep 1, [1, -3], or
- start a new subarray [-3]

Clearly, the former is better (partial_maxsubarray[2] = -2).

Again, partial_maxsubarray[3]:

- keep 4, [1, -3, 4], or
- start a new subarray [4]

The latter is larger (partial_maxsubarray[3] = 4).

Do you see the calculation underneath?

For each index, we calculate partial_maxsubarray[i] this way:

```
partial_maxsubarray[i] = max(partial_maxsubarray[i-1] + v[i], v[i])
```

At each step i, we decide whether to start a new subarray from i or to extend the current subarray by one on the right.

Once we have filled partial_maxsubarray, do you see how to use it to calculate the solution to the main problem?

Let’s recall how we calculated partial_maxsubarray[2]:

```
partial_maxsubarray[0] = -3
partial_maxsubarray[1] = 1
partial_maxsubarray[2] = max(partial_maxsubarray[1] + v[2], v[2])
```

Since v[2] is -3, we ended up with -2. Thus, partial_maxsubarray[1] is larger than partial_maxsubarray[2].

Running the algorithm on the remaining indexes we get:

```
partial_maxsubarray[3] = 4
partial_maxsubarray[4] = 3
partial_maxsubarray[5] = 5
partial_maxsubarray[6] = 6
partial_maxsubarray[7] = 1
partial_maxsubarray[8] = 5
```

It turns out that partial_maxsubarray[6] has the largest value. This means there is a subarray ending at index 6 having the largest sum.

Thus, the solution to the main problem is simply **calculating the maximum of partial_maxsubarray**.

Let’s write down the algorithm:

```cpp
int kadane(const vector<int>& v)
{
    vector<int> partial_maxsubarray(v.size());
    partial_maxsubarray[0] = v[0];
    for (auto i = 1u; i<v.size(); ++i)
    {
        partial_maxsubarray[i] = std::max(partial_maxsubarray[i-1] + v[i], v[i]);
    }
    return *max_element(begin(partial_maxsubarray), end(partial_maxsubarray));
}
```

If you already knew this problem, you have probably noticed this is not the canonical way to write Kadane’s algorithm. First of all, this version uses an extra array (partial_maxsubarray) that the classical version does not use at all. Moreover, this version does two iterations instead of just one (the first for loop, and then max_element).

“Marco, are you kidding me?” – Your subconscious speaks loudly.

Stay with me, you won’t regret it.

Let me solve the two issues and guide you to the canonical form.

To remove the support array, we need to merge the two iterations into one. We would kill two birds with one stone.

We can easily remove the second iteration (max_element) by calculating the maximum along the way:

```cpp
int kadane(const vector<int>& v)
{
    vector<int> partial_maxsubarray(v.size());
    partial_maxsubarray[0] = v[0];
    auto maxSum = partial_maxsubarray[0];
    for (auto i = 1u; i<v.size(); ++i)
    {
        partial_maxsubarray[i] = std::max(partial_maxsubarray[i-1] + v[i], v[i]);
        maxSum = max(maxSum, partial_maxsubarray[i]);
    }
    return maxSum;
}
```

After all, a maximum is just a *forward accumulation* – it never goes back.

Removing the extra array can be done by observing that we do not really use it entirely: we only need the **previous element**. After all, even in Fibonacci we only need the last two elements to calculate the current one (indeed, removing the table in Fibonacci is even easier). Thus, we can replace the support array with a simple variable:

```cpp
int kadane(const vector<int>& v)
{
    int partialSubarraySum, maxSum;
    partialSubarraySum = maxSum = v[0];
    for (auto i = 1u; i<v.size(); ++i)
    {
        partialSubarraySum = max(partialSubarraySum + v[i], v[i]);
        maxSum = max(maxSum, partialSubarraySum);
    }
    return maxSum;
}
```

The code above is likely more familiar to readers who already knew Kadane’s algorithm, isn’t it?

Now, let’s have some fun.

Like most people, the first time I saw Kadane’s algorithm it was in the canonical form. At the time, I didn’t notice anything particular. It was 2008 and I was at university.

Many years passed and I met the problem again in 2016. In recent years, I have been regularly practicing with coding challenges to develop my ability to “think in patterns”. By “pattern” I simply mean a “standard solution to a standard problem, with some degree of customization”. For example, “sorting an array of data” or “filtering out a list” are patterns. Many implementations of patterns are usually provided in programming languages’ standard libraries.

I am used to considering every C++ standard algorithm a pattern. For example, `std::copy_if` and `std::accumulate` are, for me, two patterns. Some algorithms are actually much more general in programming. For example, `std::accumulate` is usually known in programming as **fold** or **reduce**. I have talked about that in a previous post. On the other hand, something like `std::move_backward` is really C++-idiomatic.

*Thinking in patterns* can do us good for many reasons.

First of all, as I have mentioned at the beginning of this article, our brain is designed to work this way. Cognitive scientists call “the box” our own state of the art, our own model of the world, which enables us to ignore alternatives. Clearly, the box has pros and cons. Constantly thinking *inside* the box works as long as we deal with known problems. Thinking *outside* the box is required to solve new challenges. This is *creativity*.

When I think of creativity, I think of cats: they can be coaxed but they don’t usually come when called. We should create conditions which foster creativity. However, something we can intentionally influence is training our own brain with pattern recognition and application. To some extent, this is like “extending our own box”. This is what I have been doing in the last years.

Another benefit of thinking in patterns is *expressivity*. Most of the times, patterns enable people to express algorithms fluently and regardless of the specific programming language they use. Patterns are more declarative than imperative. Some patterns are even understandable to non-programmers. For example, if I ask my father to “sort the yogurt jars by expiration date and take the first one”, that’s an easy task for him.

So, in 2016 something incredible happened. When I met Kadane’s algorithm again, my brain automatically recognized two patterns in the canonical form. After visualizing the patterns in my mind, I broke the canonical form down into two main parts. This is why I first showed you this version of the algorithm:

```cpp
int kadane(const vector<int>& v)
{
    vector<int> partial_maxsubarray(v.size());
    partial_maxsubarray[0] = v[0];
    for (auto i = 1u; i<v.size(); ++i)
    {
        partial_maxsubarray[i] = std::max(partial_maxsubarray[i-1] + v[i], v[i]);
    }
    return *max_element(begin(partial_maxsubarray), end(partial_maxsubarray));
}
```

The second pattern is clearly **maximum** (which is a special kind of **reduce**, after all).

What is the first one?

Someone might think of **reduce**, but it is not. The problem with reduce is that it does not have “memory” of the previous step.

The pattern is **prefix sum**. Prefix sum is a programming pattern calculating the running sum of a sequence:

```
array = [1, 2, 3, 4]
psum  = [1, 3, 6, 10]
```

How does that pattern emerge from Kadane’s algorithm?

Well, “sum” is not really an addition but it’s something different. The update function emerges from the loop:

```cpp
thisSum = std::max(previousSum + vi, vi);
```

Imagine calling this line of code for every element of v (*vi*).

In C++, prefix sum is implemented by `partial_sum`. The first element of the result is just v[0].

Here is what the code looks like with partial_sum:

```cpp
int kadane(const vector<int>& v)
{
    vector<int> partial_maxsubarray(v.size());
    partial_sum(begin(v), end(v), begin(partial_maxsubarray),
                [](auto psumUpHere, auto vi){ return max(psumUpHere + vi, vi); });
    return *max_element(begin(partial_maxsubarray), end(partial_maxsubarray));
}
```

When I ran this code getting a green bar I felt very proud of myself. I didn’t spend any effort. First of all, my brain recognized the pattern from the hardest version of the code (the canonical form). My brain popped this insight from my unconscious tier to my conscious reasoning. Then I did an intermediate step by arranging the code in two main parts (the cumulative iteration and then the maximum calculation). Finally, I applied *partial_sum* confidently.

You might think this is useless. I think this is a great exercise for the brain.

There is something more.

Since C++17, the code is easy to run in parallel:

```cpp
int kadane(const vector<int>& v)
{
    vector<int> partial_maxsubarray(v.size());
    inclusive_scan(execution::par, begin(v), end(v), begin(partial_maxsubarray),
                   [](auto psumUpHere, auto vi){ return max(psumUpHere + vi, vi); });
    return *max_element(execution::par, begin(partial_maxsubarray), end(partial_maxsubarray));
}
```

`inclusive_scan` is like `partial_sum` but it supports parallel execution. One caveat: `inclusive_scan` requires the binary operation to be associative for the result to be well-defined, and this one is not, so take the parallel version above as an experiment rather than a drop-in replacement.

Some time ago I read a short post titled “Beauty is free” that I cannot find anymore. The author showed that the execution time of an algorithm coded with raw loops gave the same performance as the same one written with STL algorithms.

Compared to the canonical form, our “pattern-ized” alternative does two scans and uses an extra array. It’s clear that beauty is not free at all!

The reason why I am writing this article now and not in 2016 is that I have finally found some time to try my solution with range v3. The result – in my opinion – is simply beautiful. Check it out:

```cpp
int kadane(const vector<int>& v)
{
    return ranges::max(view::partial_sum(v, [](auto s, auto vi) { return std::max(s+vi, vi); }));
}
```

**view::partial_sum** is a lazy view, meaning that it applies the function to the i-th and (i-1)-th elements only when invoked. Thus, the code does only one scan. Moreover, the support array has vanished.

Running a few performance tests with clang -O3, it seems that the optimizer does a better job on this code than on the canonical form. On the other hand, the code does not outperform the canonical one on GCC. As I expected, running the range-based code in debug is about 10 times slower. I have not tried Visual Studio. My tests were not rigorous, so please take these statements with a grain of salt.

I would like to inspire you to take action. Practice is fundamental.

A common question people ask me is “how can I practice?”. This deserves a dedicated post. Let me just tell you that competitive programming websites are a great source of self-contained and verifiable challenges, but they are not enough. You should creatively use real-world problems to practice.

Ranges are the next generation of the STL. Ranges are the next generation of C++.

However, if you want to learn how to use ranges, you have to know and apply STL patterns first.

Ranges are beyond the scope of this article. The documentation is here. A few good posts here and here. Help is needed as it was in 2011 to popularize C++11.

I hope to blog again on some extraordinary patterns and how to use them. For now, I hope you have enjoyed the journey through a new interpretation of Kadane’s algorithm.

Some scientific notions of this article come from The Eureka Factor.

This post is just a note about using **std::size** and **std::empty** on static C-strings (statically sized). Maybe it’s a trivial thing, but I found ~~more than one person~~ others besides me falling into such a “trap”. I think it’s worth sharing.

To make it short, some time ago I was working on a generic function to compare strings under a certain logic that is not important to know. In an ideal world I would have used std::string_view, but I couldn’t mainly for backwards-compatibility. I could, instead, put a couple of template parameters. Imagine this simplified signature:

```cpp
template<typename T1, typename T2>
bool compare(const T1& str1, const T2& str2);
```

Internally, I was using std::size, std::empty and std::data to implement my logic. To be fair, such functions were just custom implementations of the standard ones (exhibiting exactly the same behavior) – because at that time C++17 was not yet available on my compiler, and we have had such functions for a long time in our company’s C++ library.

*compare* could work on std::string, std::string_view (if available) and static C-strings (e.g. “hello”). While setting up some unit tests, I found something I was not expecting. Suppose that compare on two equal strings returns true, like a normal string comparison:

```cpp
EXPECT_TRUE(compare(string("hello"), "hello"));
```

This was not passing at runtime.

Internally, at some point, compare was using **std::size**. The following is true:

```cpp
std::size(string("hello")) != std::size("hello");
```

The reason is trivial: “hello” is just a statically sized array of 6 characters: 5 + the **null terminator**. When called in this case, std::size just gives back the real size of the array, which clearly includes the null terminator.

As expected, **std::empty** follows std::size:

```cpp
EXPECT_TRUE(std::empty(""));              // ko
EXPECT_TRUE(std::empty(string("")));      // ok
EXPECT_TRUE(std::empty(string_view(""))); // ok
```

Don’t get me wrong, I’m not fueling an argument: the standard is correct. I’m just saying we have to be pragmatic and handle this subtlety. I just care about traps me and my colleagues can fall into. All the people I showed the failing expectations above just got confused. They worried about consistency.

If std::size is the “vocabulary function” to get the length of anything, I think it should be easy and special-case-free. We use std::size because we want to be generic and handling special cases is the first enemy of genericity. I think we all agree that std::size on null-terminated strings (any kind) should behave as **strlen**.

Anyway, it’s even possible that we don’t want to get back the length of the null-terminated string (e.g. suppose we have an empty string buffer and we want to know how many chars are available), so the most correct and generic implementation of std::size is the standard one.

Back to the *compare* function, I had two options:

- Work around this special case locally (or just don’t care),
- Use something else (possibly on top of std::size and std::empty).

Option 1 is “local”: we only handle that subtlety for this particular case (e.g. the *compare* function). Alas, the next usage of std::size/empty possibly comes with the same trap.

Option 2 is quite intrusive although it can be implemented succinctly:

namespace mylib {
    using std::size; // "publish" ordinary std::size

    // on char arrays
    template<size_t N>
    constexpr auto size(const char(&)[N]) noexcept { return N - 1; }

    // other overloads... (e.g. wchar_t)
}

You can even overload on **const char*** by wrapping strlen (or similar). That implementation is not *constexpr*, though. As I said before: we cannot generally assume that the size of an array of N chars is N – 1, even if it’s null-terminated.

*mylib::empty* is similar.
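A possible shape of *mylib::empty*, mirroring the *mylib::size* overload above (a sketch of mine, not necessarily the original one):

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

namespace mylib {
    using std::empty; // "publish" ordinary std::empty

    // on char arrays: empty means "only the null terminator is there"
    template<std::size_t N>
    constexpr auto empty(const char(&)[N]) noexcept { return N == 1; }
}
```

For any non-array type, the `using` declaration forwards to the standard behavior.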

EXPECT_EQ(5, mylib::size("hello")); // uses overload
EXPECT_EQ(5, mylib::size(string("hello"))); // uses std::size
EXPECT_EQ(3, (mylib::size(vector<int>{1,2,3}))); // uses std::size

Clearly, *string_view* would solve most of the issues (and it has *constexpr* support too), but I think you have understood my point.

**[Edit]** Many people did not get my point. In particular, some fixated on the example itself instead of getting the sense of the post. They just suggested string_view for solving this particular problem. I said that string_view would help a lot here; however, I wrote a few times throughout this post that string_view was not viable.

My point is just be aware of the null-terminator when using generic functions like std::size, std::empty, std::begin etc because the null-terminator is an extra information that such functions don’t know about. That’s it. Just take actions as you need.

Another simple example consists in converting a sequence into a vector of its underlying type. We don’t want to store the null-terminator for char arrays. In this example we don’t even need std::size but just std::begin and std::end (thanks to C++17 class template argument deduction):

template<typename T>
auto to_vector(const T& seq) {
    return vector(begin(seq), end(seq));
}

Clearly, this exhibits the same issue discussed before, requiring extra logic/specialization for char arrays.
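One possible fix (my own variant, in the spirit of the *mylib::size* overload shown earlier) is an extra overload for char arrays that drops the trailing null terminator:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Generic version: for a char array it also copies the null terminator.
template<typename T>
auto to_vector(const T& seq) {
    return std::vector(std::begin(seq), std::end(seq)); // C++17 CTAD
}

// Overload for char arrays: skip the trailing null terminator.
template<std::size_t N>
auto to_vector(const char (&arr)[N]) {
    return std::vector(std::begin(arr), std::end(arr) - 1);
}
```

The array overload is more specialized, so it wins for string literals.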

I stop here, my intent was just to let you know about this fact. Use this information as you like.

**TL;DR**: Just know how std::size and std::empty work on static C-strings.

- static C-strings are null-terminated arrays of characters (size = number of chars + 1),
- std::size and std::empty on arrays simply give the total number of elements,
- be aware of the information above when using std::size and std::empty on static C-strings,
- it’s quite easy to wrap std::size and std::empty for handling strings differently,
- string_view could be helpful.

This article is also part of my series C++ in Competitive Programming.

In the very first installment of this series, I showed an example whose solution amazed some people. Let me recall the problem: we have to find the **minimum difference between any two elements in a sorted sequence of numbers**. For example:

[10, 20, 40, 100, 200, 300, 1000]

The minimum difference is 10, that is 20-10. Any other combination is greater. Then, I showed an unrolled solution for such problem (not the most amazing one!):
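A possible unrolled implementation might look like this (a sketch of mine, assuming at least one element, as discussed below):

```cpp
#include <cassert>
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Minimum difference between any two elements of a sorted sequence:
// only adjacent elements matter, so we scan them once.
int min_adjacent_difference(const std::vector<int>& elems) {
    auto minDiff = std::numeric_limits<int>::max();
    for (std::size_t i = 0; i + 1 < elems.size(); ++i)
        minDiff = std::min(minDiff, elems[i + 1] - elems[i]);
    return minDiff;
}
```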

Imagine there is always at least one element in the sequence. Note that we calculate **elems[i+1]-elems[i]** for each i from 0 up to length-2 and, meanwhile, we keep track of the minimum of such differences. I see a pattern, do you?

Before getting to the point, let me show you another example. **We want to calculate the number of equal adjacent characters in a string**:

ABAAACCBDBB

That is 4:

ABA**AA**C**C**BDB**B**

Again, here is a solution:
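A possible implementation (a sketch, not necessarily the original one):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count the positions where a character equals the previous one.
int count_equal_adjacents(const std::string& s) {
    auto count = 0;
    for (std::size_t i = 1; i < s.size(); ++i)
        count += (s[i] == s[i - 1]); // each true counts as 1
    return count;
}
```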

We compare **s[i]** with **s[i-1]** and, meanwhile, we count how many **true**s we get. Careful readers will spot a little difference from the previous example: at each step, we access s[i] and s[i-1], not s[i+1] and s[i]. I’ll develop this subtlety in a moment. Now, please, give me another chance to let you realize the pattern yourself, before I elaborate more.

This time we have two vectors of the same size, containing some values and we want to calculate the maximum absolute difference between any two elements. Imagine we are writing a test for a numerical computation, the first vector is our baseline (expectation) and the second vector is the (actual) result of the code under test. Here is some code:
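A hand-rolled sketch of such a check (my own reconstruction, assuming equally-sized vectors):

```cpp
#include <cassert>
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Maximum absolute difference between corresponding elements of two
// equally-sized vectors (expected vs actual).
double max_abs_difference(const std::vector<double>& expected,
                          const std::vector<double>& actual) {
    auto maxDiff = 0.0;
    for (std::size_t i = 0; i < expected.size(); ++i)
        maxDiff = std::max(maxDiff, std::abs(expected[i] - actual[i]));
    return maxDiff;
}
```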

Here we access the ith-elements of both the vectors at the same time. Is that similar to the other examples?

It’s time to get to the point, although you are already there if you have read the intro of this series – if you have not, please stay here and don’t spoil the surprise yourself!

Actually, there is not so much difference among the examples I showed. They are all obeying the same pattern: given two sequences, the pattern combines every two elements from input sequences at the same position using some function and accumulates these intermediate results along the way.

Indeed, in functional programming, this pattern is the composition of three different patterns:

zip | map | fold

**zip** makes “pairs” from input sequences, **map** applies a function to each pair and returns some result, **fold** applies an operation to reduce everything to a single element. A picture is worth a thousand words:

For simplicity, imagine that zip and map are combined together into a single operation called **zipWith**.

Now we have two customization points:

- which function *zipWith* uses to combine each pair, and
- which function *fold* uses to reduce each result of *zipWith* to a single element.

The general case of this pattern operates on any number of sequences, making a *tuple* for each application of zip (e.g. imagine we zip the rows of a matrix, we get its columns).

In C++ we have an algorithm that (partially) implements this pattern: inner_product. I say “partially” just because it accepts only two ranges, and for this reason I speak of “pairs of elements”, not tuples – as in the general case. In C++17’s parallel STL, *inner_product* is parallelized by transform_reduce (be aware of the additional requirements).

In the future we’ll do such things by using new tools that will be incorporated into the standard: *ranges*. For now, *inner_product* is an interesting and (sometimes) underestimated tool. Regardless of whether you are going to use this pattern in real-world code, I think that understanding when it applies is mind-blowing. If you regularly practice competitive programming as I do, you have the opportunity to recognize many patterns to solve your problems. In the last years I have found several cases where this pattern worked smoothly and I am sharing a few here.

The simplest form of *inner_product* takes two ranges of the same length and a starting value, and it calculates the sum of the products of each pair. It’s literally an inner product between two vectors. *inner_product* has two additional customization points to replace “product” and “sum” as we wish.

Let’s have some fun.

It’s time to code the solutions to the previous challenges in terms of *inner_product*. I’ll start from the last one.

I recall that we want to find the maximum absolute difference of two vectors of double. In this case we replace “product” with “absolute difference” and “sum” with “max”. Or, we combine each pair by calculating the absolute difference and we keep track of the maximum along the way. I stress the fact that we reduce the combined pairs along the way and not at the end: *inner_product* is a single-pass algorithm (e.g. it works on stream iterators).

Here is the code:
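A possible shape of the solution (a sketch; the two lambdas are the two customization points, fold and zipWith):

```cpp
#include <cassert>
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// inner_product with both customization points replaced:
// the first callable is the fold (max), the second is zipWith
// (absolute difference).
double max_abs_difference(const std::vector<double>& expected,
                          const std::vector<double>& actual) {
    return std::inner_product(
        std::begin(expected), std::end(expected), std::begin(actual), 0.0,
        [](double acc, double d) { return std::max(acc, d); }, // fold
        [](double l, double r) { return std::abs(l - r); });   // zipWith
}
```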

I tend to use standard function objects as much as possible. They are just clearer, shorter and (should be) well-known. Thinking a bit more we come up with:

Better than the other? It’s debatable, I leave the choice to you.

The other two challenges are on a single sequence, aren’t they? Does the pattern still apply?

It does.

Zipping two distinct ranges is probably more intuitive, but what about zipping a sequence with itself? We only have to pass the right iterators to *inner_product*. We want to combine **s[i]** with **s[i-1]**: **zipWith** should use **operator==** and **fold** should use **operator+**. For the first sequence we take S from the second character to the last one. For the second sequence we take S from the first character to the second last one. That is:

- S.begin() + 1, S.end()
- S.begin(), S.end()-1

We have to pass the first three iterators to *inner_product*:
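The call might look like this (a sketch, using the iterators listed above):

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <string>

// zip s[1..N-1] with s[0..N-2]: zipWith is equal_to, fold is plus.
// Note: this assumes a non-empty string (begin(s) + 1 is otherwise UB).
int count_equal_adjacents(const std::string& s) {
    return std::inner_product(std::begin(s) + 1, std::end(s), std::begin(s),
                              0, std::plus<>{}, std::equal_to<>{});
}
```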

We zip with **equal_to** which uses **operator==** under the hood and we fold with **plus<>** which applies **operator+**. As you see, not having to specify the second sequence’s boundary is quite handy. If the solution is not clear, I hope you will find this picture useful:

When **equal_to** is called, the left hand side is S[i] and the right hand side is S[i-1]. In other words, the first range is passed as the first parameter to *zipWith* and the second range as the second parameter.

Careful readers will spot a subtle breaking change: **the solution is not protected against an empty string anymore**. Advancing an iterator that is not incrementable (e.g. *end*) is undefined behavior. We have to check this condition ourselves, if needed. This example on HackerRank never falls into such a condition, so the solution is just fine.

Finally, in the first exercise we are requested to calculate the minimum difference between any two elements in a sorted sequence of numbers. I intentionally wrote **elems[i+1]-elems[i]** and not **elems[i]-elems[i-1]**. Why? Just to show you another form of the same pattern. This one I like less because the call to *inner_product* is more verbose:

We can apply the other pattern by (mentally) turning the loop into **elems[i] – elems[i-1]**:
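A sketch of that form (first range shifted by one, *minus<>* as zipWith, min as fold):

```cpp
#include <cassert>
#include <algorithm>
#include <functional>
#include <limits>
#include <numeric>
#include <vector>

// zip elems[1..N-1] with elems[0..N-2]: zipWith is minus, fold is min.
// Assumes a non-empty, sorted sequence.
int min_adjacent_difference(const std::vector<int>& elems) {
    return std::inner_product(
        std::begin(elems) + 1, std::end(elems), std::begin(elems),
        std::numeric_limits<int>::max(),
        [](int acc, int d) { return std::min(acc, d); }, // fold
        std::minus<>{});                                 // zipWith
}
```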

As before, the solution is not protected against an empty sequence. You understand that zipping a sequence on itself (by shifting it) is never protected against an empty range.

This pattern works in all the examples above just because **the stride between two elements is 1**. If it were greater, the pattern would fail – I know, we could use boost::strided or similar, but never mind here. Basically, we have processed adjacent elements in a “window” of size 2. There are scenarios where this window can be larger and *inner_product* still applies.

As an example, take Max Min on HackerRank. This problem is very close to calculating the minimum difference between any two elements in a sorted sequence of numbers. It states: **Given a list of N non-unique integers, your task is to select K integers from the list such that its unfairness is minimized.** If *x1, x2, …, xk* are the selected integers, the unfairness is:

*max(x1, x2, …, xk) – min(x1, x2,…, xk)*

A possible solution to this problem consists in sorting the sequence and applying *zipWith | fold* as we did in the very first example. The only difference is that **the distance between the two elements we zip together is K-1**:
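A sketch of such a solution (assuming 1 <= K <= N, as noted below):

```cpp
#include <cassert>
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <numeric>
#include <vector>

// Sort, then zip the sequence with itself shifted by K-1 positions:
// zipWith is minus (max - min of each window), fold is min.
int min_unfairness(std::vector<int> v, std::size_t K) {
    std::sort(std::begin(v), std::end(v));
    return std::inner_product(
        std::begin(v) + (K - 1), std::end(v), std::begin(v),
        std::numeric_limits<int>::max(),
        [](int acc, int d) { return std::min(acc, d); }, // fold
        std::minus<>{});                                 // zipWith
}
```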

Do not misunderstand: *inner_product* still steps by one every time and still combines elements in pairs. It’s just that we zip the sequence with itself by shifting it by *K-1* positions and not by just 1. Here is what I mean:

As you see, although the size of the window is K, *inner_product* still works in the same way as before. The pairs that it conceptually creates (remember that the first sequence is shifted by *K-1* positions) are depicted below:

**This works only if K is less than or equal to N** (and N has to be at least 1).

The pattern fits this problem because we turn the sequence into a particular structure: we **sort** it. We have to select K elements and we know that *min(x1,…,xk)* is **x1** and *max(x1,…,xk)* is **xk**. This is just an effect of sorting. So we just check all these possible windows, incrementally, by using only **x1** and **xk**. We may ignore everything inside the window. Another interesting property is that, after sorting, the first range passed to *inner_product* is always greater than or equal to the second, for each iteration. This is why we can use *minus<>* for *zipWith*. If we wanted the opposite, we would change the order of the iterators or iterate backwards. Using algorithms makes variations simpler than rolling a for loop.

Recap for the pragmatic C++ competitive coder:

- In C++, **(zip | map | fold)** on **two** ranges is implemented by **inner_product**:
    - set the first callable to customize *fold* (*plus* by default);
    - set the second callable to customize *(zip | map)* – combined in a single operation (*multiplies* by default);
    - the order of the iterators matters: the first range is the left hand side of *zipWith*, the second range is the right hand side;
- zipping a sequence with itself is just the same pattern;
- be aware it won’t work with ranges shorter than the number of positions we shift the sequence by (e.g. 1 in the first 3 examples);
- practicing recognizing and understanding coding patterns is food for the brain.

Technically, `basic_string_view` is an object that can refer to a **constant** contiguous sequence of char-like objects with the first element of the sequence at position zero. The standard library provides several typedefs for standard character types and **std::string_view** is simply an alias for:

basic_string_view<char>

For simplicity, I’ll just refer to *string_view* for the rest of the post but what I’m going to discuss is valid for the other aliases as well.

You can imagine string_view as a **smart const char*** which provides any **const** member function of std::string as well as a few handy utilities to reduce its span. You cannot enlarge a string_view unless you reassign it. Other languages (e.g. Go) have similar constructs that permit growing the range as well as participating in the ownership of that range. string_view does neither, yet the power of such a simple wrapper is huge.

The applications of string_view are many and it’s *relatively* simple to let string_view join your codebase. For years, I’ve been using a proprietary implementation of string_view dating back to the 90s, then improved on the basis of boost::string_ref and recently of std::string_view. If you start today, it’s very likely you can adopt your compiler’s string_view implementation (e.g. the latest Visual Studio 2017 RC, *clang* and *GCC* support it), you can grab an implementation from the web or you can just use boost::string_ref or another library (e.g. Google’s, folly’s).

One can think that using **string_view** is as simple as using std::string, with the only difference that string_view does not take ownership of the char sequence and cannot change its content. That’s not completely true. Adopting string_view requires you to pay attention to a few other traps that I’m going to describe later on. Before starting, let me show you a couple of simple examples.

Generally speaking, string_view is a good friend when we need to do text processing (e.g. parsing, comparing, searching), but first of all, string_view is an **adapter:** it allows different string types to be adapted into a std::string-like container. This means that string_view provides iterator support and STL naming conventions (e.g. size, empty). To create a string_view, we only require a null-terminated const char* or both a const char* and a length. Note that in the latter case we don’t need the char sequence to be null-terminated.

Suppose now that our codebase hosts many different string types but we want to write only one function doing a certain task on constant strings. Can string_view help? It can, if the string types manage a contiguous sequence of characters and also provide (read) access to it. Examples:
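For instance, here is a hypothetical proprietary string type (name and members are made up for illustration) that can be adapted, together with std::string and plain C-strings, into the same view:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// A hypothetical proprietary string type: it owns a contiguous char
// sequence and gives read access to it.
class MyString {
    std::string data_;
public:
    explicit MyString(const char* s) : data_(s) {}
    const char* Data() const { return data_.data(); }
    std::size_t Length() const { return data_.size(); }
};

// Different string types adapted into the same view:
std::string_view to_view(const std::string& s) { return s; } // implicit
std::string_view to_view(const MyString& s) { return {s.Data(), s.Length()}; }
std::string_view to_view(const char* s) { return s; } // null-terminated
```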

Then we may write only one function for our task:

ReturnType readonly_on_string_function(string_view sv); // only one implementation

Into *readonly_on_string_function* we can exploit the whole set of const functions of *std::string*. Just this simple capability is priceless. You know what I mean if you use more than three string types in your codebase.

To show you other string_view functionalities, let me consider the problem of **splitting** a string. This problem can be tackled in many ways (e.g. iterator-based, range-based, etc) but let me keep things simple:
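A simple split might look like this (a sketch, splitting on a single delimiter character):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Naive split: every token is copied into a brand new std::string.
std::vector<std::string> split(const std::string& str, char delim = ' ') {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos != std::string::npos) {
        const auto next = str.find(delim, pos);
        tokens.push_back(str.substr(pos, next - pos));
        pos = (next == std::string::npos) ? next : next + 1;
    }
    return tokens;
}
```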

The worst aspects of this function are (imho):

- we create a new string for each token (this possibly ends up with dynamic allocation);
- we can split only *std::string* and no other types.

Since string_view provides every const function of string, let’s try simply replacing *string* with *string_view*:
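The string_view variant might look like this (same sketch, with the two types swapped):

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// The same split with string replaced by string_view: each token is
// now just a view into the original buffer (a pointer and a length).
std::vector<std::string_view> split(std::string_view str, char delim = ' ') {
    std::vector<std::string_view> tokens;
    std::size_t pos = 0;
    while (pos != std::string_view::npos) {
        const auto next = str.find(delim, pos);
        tokens.push_back(str.substr(pos, next - pos));
        pos = (next == std::string_view::npos) ? next : next + 1;
    }
    return tokens;
}
```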

Not only is the code still valid, but it is also potentially **less demanding**, because we just allocate 8/16 bytes (respectively on 32 and 64 bit platforms – a pointer and a length) for each token.

Now, let’s use some utilities to **shrink** the span. Suppose I get a string from some proprietary UI framework control, providing its own string representation:

auto name = uiControl.GetText();

Then imagine we want to remove all the whitespaces from the start and the end of such string (we want to **trim**). We can do it without changing the string itself, just by using string_view:
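A possible trim (a sketch; for brevity it only strips plain spaces):

```cpp
#include <cassert>
#include <string_view>

// Trim without touching (or copying) the underlying string.
std::string_view trim(std::string_view sv) {
    const auto first = sv.find_first_not_of(' ');
    if (first == std::string_view::npos)
        return {}; // empty or all-blank input
    sv.remove_prefix(first);                                    // leading
    sv.remove_suffix(sv.size() - 1 - sv.find_last_not_of(' ')); // trailing
    return sv;
}
```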

**remove_prefix** moves the start of the view forward by `n` characters; **remove_suffix** does the opposite. Edge cases have been handled succinctly.

Now we have a string_view containing only the “good” part of the string. At this point, let me end with a bang: we’ll use the sanitized string to query a map without allocating extra memory for the key. How? Thanks to **heterogeneous lookup** of associative containers:
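For instance (the map content is made-up sample data):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// less<> is transparent, so find() accepts anything comparable with
// the key: a string_view query allocates no temporary std::string.
std::map<std::string, int, std::less<>> ages = {{"John", 40}, {"Gerri", 35}};

int age_of(std::string_view name) {
    const auto it = ages.find(name); // heterogeneous lookup
    return it != ages.end() ? it->second : -1;
}
```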

That’s possible because *less<>* is a **transparent comparator** and string_view can be implicitly constructed from std::string (thus, we don’t need to write operator< between std::string and std::string_view). That’s powerful.

It should be clear that string_view can be dramatically helpful to your daily job and I think it’s quite useless to show you other examples to support this fact. Rather, let me discuss a few common pitfalls I have met in the last years and how to cope with them.

The first error I have encountered many times is storing string_view as a member variable and forgetting that it does not participate in the ownership of the char sequence:

Suppose that *Parse* is never called with a temporary (moreover, we can enforce that assumption just by *deleting* such an overload); this code is still fine because the caller of *Parse* also has ‘current’ in scope. Then, some time later, a programmer who is not very familiar with string_view (or who is simply heedless) introduces the following error:

‘someProcessing’ is a temporary string and then *StatefulParser* will very likely refer to garbage.
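A minimal sketch of the scenario (class and member names are my reconstruction from the text): the parser stores a view, so the caller must keep the underlying string alive.

```cpp
#include <cassert>
#include <string>
#include <string_view>

class StatefulParser {
    std::string_view current_;
public:
    void Parse(std::string_view current) { current_ = current; }
    void Parse(std::string&&) = delete; // reject temporaries, as suggested
    std::string_view Current() const { return current_; }
};

// Fine: the lvalue outlives the parser usage.
std::string_view roundtrip(const std::string& s) {
    StatefulParser p;
    p.Parse(s);
    return p.Current();
}

// Caught at compile time thanks to the deleted overload:
// StatefulParser p;
// p.Parse(std::string{"someProcessing"}); // error: use of deleted function
```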

So string_view (as well as *span*, *array_view*, etc.) is often not recommended as a data member. However, I think that string_view as a data member is sometimes useful; in those scenarios we need to be prudent, just like when using references and pointers as data members.

string_view seems a drop-in replacement for *const std::string&* because it provides the whole set of *std::string*‘s const functions and also because it’s a view (a reference). So, the general rule you hear pretty much everywhere (especially nowadays that string_view has officially joined the C++ standard) is: “*whenever you see const string&, just replace it with string_view*“.

So let’s do that:

void I_dont_know_how_string_will_be_used_but_i_am_cool(const string& s);

We turn into:

void I_dont_know_how_string_will_be_used_but_i_am_cool(string_view s);

As users of this function, we are now permitted to pass **whatever** valid string_view, aren’t we?

As writers of this function, we **may** have now serious problems.

We have introduced a subtle change to our interface that breaks a sort of guarantee that we had before: **null-termination**. string_view does not require (and then does not necessarily handle) a **null-terminated** sequence. On the other hand, string guarantees to get one back – with **c_str()**.

Maybe you don’t need that feature, in this case the rest of the interface should be ok. Otherwise, if you are lucky, your code simply stops compiling because you are using **c_str()** somewhere in the code. Else, you are using **data()**, and the code continues compiling just fine because string_view provides **data()** as well.

This is not a syntactic detail. What should be clear is that the interface of ‘I_dont_know_how_string_will_be_used_but_i_am_cool’ has not changed seamlessly, because **now the user can just pass in a not null-terminated sequence of characters**:

string something = "hello world";
I_dont_know_how_string_will_be_used_but_i_am_cool(string_view{something.data(), 5}); // hello

Suppose at some point you call a C function expecting a null-terminated string (it’s common), so you call *.data()* on the string_view. What you obtain is “hello world\0” instead of what the user expected (“hello”). In this case, you may only get a logical error, because a \0 is still at the end of the underlying string. In this other case you are not so lucky:

char buff[] = {'h', 'e', 'l', 'l', 'o'};
I_dont_know_how_string_will_be_used_but_i_am_cool(string_view{buff, 5});

Even if uncommon (*generally* string_view refers to real strings, which are always null-terminated), that’s even worse, isn’t it?

In general, string_view “relaxes” (does not have) that requirement on null-termination (it’s just a wrapper on **const char***). Imagine that the **DNA**, the **identity**, of string_view is made of both the **pointer** **to the sequence of characters** and the **number of referred characters** (the **length** of the span). On the other hand, since *string::c_str()* **guarantees** that the returned sequence of characters is null-terminated, you can think that the identity of a string is just what *c_str()* returns – the length is a redundant information (e.g. computable by strlen(str.c_str())).

To conclude this point, **replacing const string& with string_view is safe as long as you don’t expect a null-terminated string** – if you are using *c_str()* then you can figure that out at compile time, because the code simply does not compile; otherwise you are possibly in trouble.

Since we are on the subject: replacing const string& with string_view has another (minor) consequence, because passing string_view involves *some* work, that is, copying a pointer and a length. The latter is extra compared to const string&. That’s just theory: in practice, measure when in doubt.

From the previous point, it should be evident that wherever you need to create a string from a string_view you have to use both **data() and size()**, and not only *data()*. You have to use the DNA of string_view. I have reviewed this error many times:

string_view sv = ...;
string s = sv.data(); // possibly UB

It does not work in general, for the same reasons I have just shown you (e.g. this constructor of *std::string* requires a null-terminated sequence of characters).

From C++17 you can just use one of string’s constructors:

string s { sv };

Before C++17, we have to use *data() + size()*:

string s { sv.data(), sv.size() };

Clearly, as for *std::string*, you have to do the same for other string types. E.g.:

CString cstr { sv.data(), sv.size() };

Although C and C++ provide many functions to perform conversions between a number and a string/C-string (and vice versa), **none supports a range of characters** (e.g. begin + end, or begin + length). Moreover, every C/C++ conversion function **expects the input string to be null-terminated**. These facts lead to the conclusion that **no function exists that converts a string_view into a number** out of the box. We can use some C/C++ functions, but with limitations. I’ll show you some in this section.

For instance, using atoi or C++11 functions we fall into traps or undefined behavior:
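Here is a safe-to-run sketch of the trap (the buffer is null-terminated here, just longer than the view):

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <string_view>

std::string whole = "12345";
std::string_view sv{whole.data(), 3}; // the view refers to "123"

// atoi reads up to the null terminator of the *underlying* buffer and
// completely ignores the view's length:
int trap = std::atoi(sv.data()); // 12345, not 123!

// std::stoi does not accept a string_view at all; forcing it through
// data() falls into the same trap - and if the buffer is not
// null-terminated at all, both calls are undefined behavior.
```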

So, how to properly convert a string_view into a number? Many ways exist, generally motivated by different requirements and compromises. For the rest of this section I’ll refer only to int conversions because the end of the story is similar for other numeric types.

Sometimes, although it seems counterintuitive, **to fulfill the null-termination requirement we can create an intermediate std::string (or char array)**:
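A sketch of this approach (std::stoi throws on failure, which may or may not suit you):

```cpp
#include <cassert>
#include <string>
#include <string_view>

// The intermediate std::string is null-terminated, so any classic
// conversion function works; for short numeric tokens SSO usually
// avoids the dynamic allocation.
int to_int(std::string_view sv) {
    return std::stoi(std::string{sv.data(), sv.size()});
}
```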

Actually, having a *std::string* we can rely on any C and C++ conversion function. Such an intermediate step of copying into a *std::string* is **sometimes** affordable because certain numeric types – like int – have a **small maximum number of digits** (e.g. 11 for int). As long as the char sequence **really contains** such a small payload, the resulting *std::string* will be created without allocating dynamic memory thanks to **SSO** (*Small String Optimization*). Clearly, that shortcut does not hold for bigger numeric types and in general is not portable.

Other fragile solutions I encountered were based on *sscanf and friends*:

In some cases this code does not behave how we expect – e.g. when the converted value overflows or when the sequence contains leading whitespace. Although I don’t recommend this approach, compared to the previous one it only allocates a fixed number of characters (e.g. 24) on the stack.

In many other cases, the **approach** is **strictly based on how string_view is employed**. This means that we have to make some **assumptions**. For example, suppose we write a parser for urls where we assume that each token is separated by ‘/’. Since atoi and strtol stop at the last character interpreted, **if the whole url is both well-formed and stored into a null-terminated string** (assumptions/preconditions) we can use such functions quite safely:

Basically, we assumed that the character past the end of any string_view is either a delimiter or the null-terminator. Pragmatically, many times we can make such assumptions, even if they distance our solution from genericity.

So, I have encountered code like this:

In this example we use **strtol** to read an int and then we return the **rest** of the string_view. We basically try to “consume” an int from the beginning of the string_view.

Note that C and C++ conversion functions have more or less **relaxed policies on errors** (mainly for performance reasons). For instance, if the conversion cannot be performed, **strtol** returns 0, and if the representation overflows, it sets **errno** to *ERANGE*. Instead, in the latter case the return value of **atoi** is *undefined*. What I really mean is that if you decide to use such functions then you have to accept the consequences of their limitations. So, just pay attention to those limitations and take actions if needed. For example, a more defensive version of the previous code is:
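A sketch of such a defensive version (my reconstruction; the ‘/’ delimiter follows the url example above):

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <optional>
#include <string_view>

// Defensive strtol-based parse. It still assumes sv.data() points
// into a null-terminated buffer and that the character past the token
// is either the null terminator or a delimiter.
std::optional<int> parse_int(std::string_view sv) {
    char* endPtr = nullptr;
    errno = 0;
    const long res = std::strtol(sv.data(), &endPtr, 10);
    if (endPtr == sv.data())
        return std::nullopt; // no digits at all
    if (errno == ERANGE || res < INT_MIN || res > INT_MAX)
        return std::nullopt; // out of range
    if (*endPtr != '\0' && *endPtr != '/')
        return std::nullopt; // unexpected trailing character
    return static_cast<int>(res);
}
```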

The fact that it makes sense to check against the null-terminator (`if (*endPtr != 0)`) is the fundamental assumption we made here. Generally such an assumption is easy to make. Scenarios like this, instead:

string whole = "12345";
parse_int( {whole.data(), 3}, i );

are still not covered, because **the length of the string_view is not taken into account**. For this, we have at least three options: create and use an intermediate *std::string* (or use a *std::stringstream* – however only *std::string* benefits from the *SSO*), improve the *sscanf-based* solution so that it uses that information, or write a conversion function manually. It’s quite clear that C++ lacks a set of simple functions to convert char ranges to numbers easily, efficiently and with robust error handling.

Actually, I think the most elegant, robust and generic solution is based on boost::spirit:

However, if you don’t already depend on boost, it’s quite inconvenient to do just for converting strings into numbers.

We have a happy ending, though. Finally, C++17 fills this gap by introducing elementary string conversion functions:
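Here is a sketch of from_chars in action (wrapped in a small helper of mine returning an optional):

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// from_chars takes the range explicitly: the view's length is finally
// honored and no null terminator is involved.
std::optional<int> to_int(std::string_view sv) {
    int value = 0;
    const auto res = std::from_chars(sv.data(), sv.data() + sv.size(), value);
    if (res.ec != std::errc{})
        return std::nullopt;
    return value;
}
```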

This new function will just convert the given range of characters into an integer. It is locale-independent, non-allocating, and non-throwing. Only a small subset of parsing policies used by other libraries (such as sscanf) is provided. This is intended to allow the fastest possible implementation. Clearly, overloads for other numeric types are provided by the standard.

To be thorough, here is an example of the opposite operation, using **to_chars**:
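A sketch of to_chars (the wrapper and buffer size are my own choices):

```cpp
#include <cassert>
#include <charconv>
#include <string>

// to_chars writes the textual representation into a caller-provided
// buffer and returns a pointer past the last written character.
std::string int_to_string(int value) {
    char buff[16]; // large enough for any 32-bit int
    const auto res = std::to_chars(buff, buff + sizeof(buff), value);
    return std::string(buff, res.ptr); // res.ec is errc{} on success
}
```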

Both to_chars and from_chars return a minimal output which contains an error flag and a pointer to the first character at which the parsing stopped (e.g. something like what is written into *endPtr* in the *strtol* example).

Are you already looking forward to putting your hands on them?!

Here is a wrap-up of the main points we covered in this post:

- **string_view** is a **smart const char***: an object that refers to a constant sequence of characters, keeps track of its length and provides any **const** function of *std::string*;
- just like a reference or a pointer, you have to **pay attention to storing string_view as a member variable**;
- **string_view’s DNA** is both the **char sequence** and the **length**:
    - the pointed sequence of characters is **not necessarily null-terminated** (e.g. *c_str()* does not exist);
    - whenever you need to copy the content of a **string_view** into a string(-like container), you have to use both;
- bear in mind that **replacing const string& with string_view** implies the user can start passing not null-terminated strings into your functions (just ask yourself if that makes sense);
- to convert a **string_view** into a number:
    - pre-C++17: use boost::spirit if you can, agree to compromises and use C/C++ functions with their limitations, or roll some utilities yourself;
    - since C++17: use from_chars;
- **string_view** is already available in:
    - Microsoft Visual Studio 2017 RC
    - clang HEAD 4.0 (or in 3.8, under the *experimental* include folder)
    - gcc HEAD 7.0
At the end of October we organized the C++ Day, an event entirely dedicated to C++, made by the Italian C++ Community that I lead and coordinate. If you feel like reading some lines about the event, have a look here. It was a great day!

Some days after, I left for **Seattle**, to attend the **Microsoft MVP Summit** at the Microsoft Campus in Redmond. An awesome experience!

Coincidentally, the ISO C++ Standard meeting was happening exactly the same week I was in Redmond. I couldn’t miss it! So, at the end of the Summit, a few other MVPs (like Marius Bancila and Raffaele Rialdi) and I went to Issaquah to attend the meeting for half a day. The game was afoot.

The experience was really amazing. First of all, we added our names to the attendance sheet to get ourselves immortalized in the minutes. That was suggested by Herb, who was very kind to all of us.

I am also glad we met many members of the committee. They were really kind and welcoming.

I think attending one of such meetings is a must for whoever cares about the C++ language and also wants to understand how things are discussed and evolved.

You probably know that the committee is divided into a few working and study groups – WG and SG. The working groups are **Evolution**, **Library Evolution**, **Library**, and **Core**. We were sitting in the **Evolution Working Group** (**EWG**), where we heard discussions about a few proposals for C++17 and C++20.

A proposal presentation starts with the author(s) defending the idea, going through the paper and showing examples. In the sessions I attended, that part was quick (about 10 minutes) and other people interrupted only occasionally, for small clarifications.

Then the discussion starts. It is **coordinated** by a person who goes around the room and **moderates** the discussion. Each member who wants to say something just raises their hand and **politely waits** to take the floor. Too many times I have attended meetings where people just interrupt each other. This was exactly the opposite!

Speaking with some guys of the committee, I discovered that some (crucial) discussions are instead **plenary** (they involve the whole committee and not only a certain working group) and they take place in another – bigger – room.

The discussions I attended ended with a **poll**: something like “how many people agree? How many disagree? How many *strongly* disagree?”. It also happened that a discussion was simply postponed because the co-author – Bjarne Stroustrup – was not there.

Each proposal is deeply inspected by **bringing out lots of details**, counterexamples and observations. That part was the most instructive for me.

On that point, I realized that one thing is particularly vital for the committee: **heterogeneity**. People have different backgrounds/interests and they use C++ in different ways. The details that come out of the discussions reflect this **heterogeneity**. Without it, we would lose many details and observations.

For example, at some point **Peter Sommerlad** took the floor and asked something like “so, if we accept this proposal we should start **teaching** people to stop doing X”. Peter made that observation because he is a professor and his point of view is often influenced by his main job.

Other examples were concerns about legacy and old code, which a certain proposal could break under circumstances that a few people were dealing with daily. Also interesting were the observations made by **compiler implementers**, because they often already see how complicated it would be to implement a certain new C++ feature.

The experience was definitely worth it. Thanks to Herb and Andrew Pardoe for their hospitality.

Sometimes these meetings take place in Europe, so if you cannot go to the USA, just wait for one happening on this side of the world and attend! I’ll do it again.

The following week I went to Berlin for **Meeting C++ 2016**. I am happy I was part of the **staff**. There I had the opportunity to meet **Bjarne Stroustrup** and to dine with him and with other special people.

This amazing experience concludes my “C++ & friends weeks”.

My short-term plans consist mostly of **blogging** – I want to write a new “C++ in Competitive Programming” installment, as well as a couple of posts I have had in mind for months – and **planning** the next **events** and **activities**. In 2017 my spare time won’t be any greater than in 2016, but I hope to be more active.

In June I want to organize a new C++ event in Italy. If you feel like supporting/sponsoring/helping, please get in touch with me.

However, since each challenge has specific **requirements** and **constraints**, we can still imagine it as a small and simplified component of a bigger system. Indeed, a challenge specifies:

- the input format and the constraints of the problem;
- the expected output;
- the environment constraints (e.g. the solution should take less than 1s);
- [optional] amount of time available to solve the challenge (e.g. 1 hour).

Questions like: “can I use a quadratic solution?” or “may I use extra space?” will be answered only by taking into account all the problem information. Balancing competing trade-offs is a key in Competitive Programming too.

This post is a bit philosophical: it’s about what I consider the essence of Competitive Programming and why a professional could benefit from it. Since February of this year I’ve been organizing monthly “coding dojos” on these themes, here in Italy. After some months, I’m seeing good results in people regularly joining my dojos and practicing at home. It’s likely that in the future I’ll dedicate a post to how the dojos are organized, but if you want more information now, please get in touch.

Follow me through this discussion and I’ll explain some apparently hidden values of Competitive Programming and why I think they are precious for everyone.

Let me start by stating that **the only real compromise in Competitive Programming is one that makes an acceptable solution**. However, we distinguish two distinct “coding modes”: **competition** and **practice**. Competition is about being both fast and accurate at the same time. You just have to correctly solve all the challenges as fast as possible. On the other hand, practice mode is the opportunity to take time and think more in depth about compromises, different approaches and paradigms, scalability, etc. In practice mode we can also explore the usage of standard algorithms. A story about that concept: at one of my dojos I proposed this warm-up exercise:

*Given two arrays of numbers A and B, for i from 0 up to N-1, count how many times A[i] > B[i].*

For example:

A = [10, 2, 4, 6, 1] B = [3, 2, 3, 10, 0]

The answer is 3 (10>3, 4>3, 1>0).

This exercise was trivial and people just solved it by looping and counting.

During the retrospective I asked people: “can you solve this problem without looping explicitly?”. I lowered my aim: “Suppose I provide you with a function which processes the arrays pair-wise”. One guy plucked up courage and replied “like ZIP?”. I said “Yes!”. I dashed off to the blackboard and showed the idea.

The C++ algorithm to use in this case is **std::inner_product**, which we already met in the very first post of this series. I’ll get back to inner_product again in the future; meanwhile, here is the slick solution to this problem:

As we already saw, inner_product is kind of a **zip_and_reduce** function. Each pair is processed by the second function (greater) and results are sequentially accumulated by using the first one (plus). People realized the value of applying a known pattern to the problem. We discussed also about how easy it is to change the processing functions, getting different behaviors by maintaining the same core algorithm.

In this post I’ll show different scenarios by turning on “practice mode” and “competition mode” separately, discussing the different ways to approach problems and care about compromises in both.

Consider the problem we tackled in the previous post: the “most common word”. Since we used std::map, the solution was generic enough to work, with little effort, with types other than strings. For example, given an array of integers, we can find the **mode** (as statisticians call it) of the sequence.

Imagine this solution still fits the requirements of the challenge.

In “competition mode”, we stay with this solution, just because it works. The pragmatic competitive coder was lucky because she already had a solution on hand, so she just exults and moves on to the next challenge of the contest.

Now, let’s switch “practice mode” on. First of all, we notice two interesting constraints of the challenge: 1) the elements of the array belong to the range 0-1’000; 2) the size of the sequence is within 100’000. Often, constraints hide shortcuts and simplifications.

Ranting about life in general and not only about programming, I like to say that “constraints hide opportunities”.

In Competitive Programming this means that whenever we see particular sentences like “assume no duplicates exist” or “no negative values will be given”, it’s very likely that this information should be used somehow.

Sometimes “very likely” becomes a command and we won’t have other options to solve the challenge: we **have to** craft a solution which exploits the constraints of the problem. Such challenges are interesting since they require us to find a way to cut off the space of all the possible solutions.

Anyway, back to the mode problem, since we know the domain is 0-1’000, we can just create a **histogram** (or frequency table) and literally count the occurrences of each number:

Although the solution is similar to the previous one (because std::map provides operator[]), this is dramatically faster because of *contiguity.*

However, we agreed to a compromise: we pay a fixed amount of space every time to allocate the histogram (~4 KB) and we support only a fixed domain of ints (those within 0-1’000). That’s ok for this challenge, but it may not be in other cases. Balancing speed, space and genericity is something to think about in practice mode. At my coding dojos in Italy we discuss such things at the end of each challenge, during a 15-min retrospective.

At this point we need to answer a related question: “are we always permitted to allocate that extra space?”. Well, it depends. Speaking about stack allocations – or allocations known “at compile-time” – compilers have their own limits (which can be tuned), generally on the order of a few MB.

The story is different for dynamic allocations. On websites like HackerRank we may have a look at the environment page which contains the limits for each language. For C++, we are usually allowed to allocate up to 512 MB. Other websites can limit that amount of space, typically per challenge. In these situations, we have to be creative and find smarter solutions.

Take InterviewBit as an example – a website focused on programming interviews. A challenge states: “given a read only array of n + 1 integers between 1 and n, find one number that repeats, in linear time using constant space and traversing the stream sequentially”. It’s clear that we cannot use an auxiliary data structure, so forget maps or arrays. A solution consists in applying a “fast-slow pointer strategy”, such as Floyd’s cycle-finding algorithm. I won’t show the solution because very soon we’ll have a guest post by Davide Di Gennaro who will show how to apply Floyd’s algorithm to solve another – more realistic – problem.

**Space and time have a special connection**. The rule of thumb is that we can set up faster solutions at the price of using more space, and vice versa. It’s another compromise. Sorting algorithms are an example: some people are amazed when they discover that, under certain conditions, we can sort arrays in linear time. For example, counting sort can perform very well when we have to sort very long sequences containing values in a small range – like a long string of lowercase alphabetic letters.

Counting sort allocates an extra frequency table (a histogram, as we have just seen) containing the frequencies of the values in the range. It exhibits linear time and space complexity, that is O(N+K), where N is the length of the array to sort and K is the size of the domain. However, what if the elements go from 1 to N^2? Counting sort now exhibits quadratic complexity! If we need to care about this problem, we can try another algorithm, like Radix sort, which basically sorts data with integer keys by grouping keys by the individual digits which share the same significant position and value.

Radix sort also has other limitations, but I won’t go into details because it’s not the point of this article. Again, I just want to highlight that other decisions have to be taken, depending on the requirements and the constraints of the problem. A general rule is that we can often trade space for faster solutions. Space and time go hand in hand.

Time is constrained as well: typically 1 or 2 seconds on HackerRank. To give you some feeling about what that means, let’s write down some timing evidence measured on HackerRank’s environment.

std::accumulate – O(n) – on vector<int>:

100’000 elements: 65 microseconds

1’000’000 elements: 678 microseconds

10’000’000 elements: 6973 microseconds (6.973 milliseconds)

However, don’t be swayed by complexity. Time is also affected by other factors. For example, let’s try accumulating on std::set:

100’000 elements: 747 microseconds

1’000’000 elements: 16063 microseconds (16.063 milliseconds)

10’000’000 elements: timeout (>2 seconds)

You see that contiguity and locality make a huge difference.

Imagine now what happens if we pass from N to N^2. Roughly speaking, we can just square the time needed to accumulate a vector of 100’000 elements to get an approximation of what would happen: we obtain ~4.5 milliseconds. If the challenge expects at most 1 second, we can afford it.

Should we submit such a quadratic solution?

Again, in competition mode we may. Instead, in practice mode we should think about it. What if the input grows? Moreover, some contests run extra tests once a day, generally stressing the solution by pushing inputs to the limit. In this case, we need to understand our constraints and limits very carefully.

Compromises also originate from the **format of input**. For example, if we are given a sorted sequence of elements, we can binary-search that sequence for free (that is, we don’t need to sort the sequence ourselves). We can code even more optimized searches if the sequence has some special properties. For instance, if the sequence represents a time series containing samples of a 10-sec acquisition, at 100 Hz, we know precisely – and in constant time – where the sample of the 3rd second is. It’s like accessing an array by index.

The more we specialize (and optimize) a solution, the more we lose genericity. It’s just another compromise, another subject to discuss.

Compromises are also about **types**. In previous posts we have faced problems using integer values and we have assumed that int was fine. It’s not always the case. For example, many times problems have 32-bit integer inputs and wider outputs – which overflow 32 bits. Sometimes the results do not even fit 64-bit ints. Since we don’t have “big integers” in C++, we have to design such a facility ourselves or…switch to another language. It’s not a joke: in “competition mode”, we just use another language if possible. It’s quicker.

Instead, in “practice mode” we can design – or take from an external source – a big integer class, to be reused in “competition mode”. This aspect is particularly important: when I asked some top coders for their thoughts on how to constantly improve and do better, their shared advice was to **solve problems and refine a support library**. Indeed, top-level competitive coders have their snippets on hand during contests.

This is a simplistic approximation to what we actually do as Software Engineers: we solve problems first of all by trying to directly use existing/tested/experienced solutions (e.g. libraries, system components). If it’s not the case, we have two options: trying to adapt an existing solution to embrace the new scenario, or writing a new solution (“writing” may mean also “introducing a new library”, “using some new system”, etc). Afterwards, we can take some time to understand if it’s possible to merge this solution to the others we already have, maybe with some work (we generally call it *refactoring*). You know, the process is more complicated, but the core is that.

Using an existing solution or not is a compromise, mostly because the adapting and refactoring parts require time and resources.

I think everyone would love writing the most efficient solution that is also the simplest. Generally it’s not the case.

However it happens in many challenges – especially in the ones of difficulty Easy/Medium – that “average solutions” are accepted as well (these solutions work for the challenge, even if they are not the “best” – in terms of speed, space, etc).

The balance between the **difficulty** of a solution and its **effectiveness** is another compromise to take into account. About that, let me tell you a story from a few days ago at my monthly coding dojo. We had this exercise:

*Given a string S, consisting of N lowercase English letters, reduce S as much as possible by deleting any pair of adjacent letters with the same value. N is at most 100.*

An example:

aabccbdea

becomes “dea”, since we can first delete “aa”, getting:

bccbdea

Then we delete “cc” and get “bbdea”; finally we remove “bb” and get “dea”.

The attendees proposed a few solutions, none of which exhibited linear time and constant space complexity. Such a solution exists, but it’s a bit complicated. I proposed that challenge because I wanted people to reason about the compromises and, indeed, I made one point very clear: the maximum size of the string is 100. It’s little.

We had two leading solutions: one quadratic and one linear in both time and space. The quadratic one could be written in a few lines by using adjacent_find:

The other one used a deque (actually, I have seen such a solution applied many times in other contexts):

Both passed. Afterwards, we discussed the compromises of these solutions and understood the limits of both. In a more advanced dojo, I’ll propose solving this challenge in linear time and constant space!

Competitive Programming gives us the opportunity to see with our own eyes the impact of self-contained pieces of code, first of all in terms of time spent and space allocated. As programmers, we are forced to make predictions and estimations.

Estimating and predicting time/space is fundamental to deciding quickly which solution is the most viable when different compromises cross our path. Back to the mode example, using std::map or std::vector makes a huge difference. However, when the domain is small we can simply not care about that difference. Somehow, this happens also in ordinary software development. For instance, how many times do we use streams in production code? Sometimes? Never? Often? I think the fairest answer would be “it depends”. Streams are high-level abstractions, so they should be relatively easy to use. However, they are generally slower than C functions. On the other hand, C functions are low-level facilities, more error prone and less flexible. What to do depends entirely on our needs and possibilities, and very often also on our attitude and our company’s.

Although Competitive Programming offers a simplified and approximated reality, facing a problem “as-is” is an opportunity to understand compromises, estimate time and space, adapt known solutions/patterns to new scenarios, learn something completely new, and think outside the box.

The latter point is related to another important fact that I see with my own eyes during my coding dojos in Italy: **young non-professional people often find clever solutions quicker than experienced programmers**. The reason is complex and I have discussed it with psychologists as well. To put it in simple terms, this happens because young people generally have a “weaker mindset”, have less experience and **are not afraid of failure**. The latter point is important as well: many professionals are terribly afraid of failure and of making a bad impression. Thus, an experienced solution that worked seems more reliable than something completely new. That’s basically human behavior. The problem comes up when the experienced programmer cannot think outside the box and is not able to find new and creative ways to solve a problem.

Competitive Programming is a quick and self-contained way to practice pushing our “coding mind” to the limits. As a Software Engineer, programmer and professional, I find it precious.

Competitive Programming offers both a “practice mode”, during which I can “stop and think”, reasoning about compromises, time, space, etc. And a “competition mode”, where I can “stress-test” my skills and experience.

Learning new things is often mandatory. This needs time, will and patience. But you know, it’s a compromise…

Recap for the pragmatic C++ competitive coder:

- **Compromises** in Competitive Programming have different shapes:
  - Dimension and format of the input;
  - Time and space limits;
  - Data types;
  - Adaptability of a well-known solution;
  - Simplicity of the solution.
- The **essence** of Competitive Programming consists in **two phases**:
  - **Competition**: the solution has just to work;
    - Top coders generally keep snippets and functions on hand to use during contests.
  - **Practice**: deeply understanding compromises, variations and different implementations of the solution;
    - Top coders generally refine their snippets and functions.
- A challenge may have **simplifications** and **shortcuts**:
  - The more we use those, the less generic our solution will be;
  - Hopefully, the more we use those, the more optimized (and/or simple) our solution will be;
  - Many challenges **require** us to find shortcuts in the problem constraints and description.

Some posts ago, I anticipated that understanding containers is crucial for using standard algorithms effectively. In a few words, the reason is that each container has some special features or is particularly suited to certain scenarios. On the other hand, **algorithms work only in terms of iterators**, which completely hide this fact. That’s great for writing generalized code, but it also merits attention because, to exploit a particular property of a container, you generally have to choose the best option yourself.

The only “specialization” that algorithms (may) do is in terms of iterators. Iterators are grouped in categories, which basically distinguish **how iterators can be moved**. For instance, consider std::advance, which moves an iterator by N positions. On *random-access* iterators, *std::advance* just adds an offset (e.g. it += N), which is a **constant** operation. On the other hand, on *forward iterators* (which can basically advance only one step at a time), *std::advance* is obliged to call *operator++* N times, a **linear** operation.

Choosing – at compile time – different implementations depending on the nature of the iterators is a typical C++ idiom which works well in many situations (this technique is an application of **tag dispatching**, a classical C++ metaprogramming idiom). However, to exploit the (internal) characteristics of a container, we have to know how the container works, which (particular) member functions it provides and the differences between using the generic standard algorithm X and the specialized member function X.

As an example, I mentioned **std::find** and **std::map::find**. What’s the difference between them? Shortly, *std::find* is an idiomatic **linear search over a range of elements**. It basically goes on until either the target value or the end of the range is reached. Instead, *std::map::find*…Wait, I am not a spoiler! As usual, let me start this discussion through a challenge:

*Some days ago I gave my birthday party and I invited some friends. I want to know which name is the most common among my friends. Or, given a sequence of words, I want to know which one occurs the most.*

In this trivial exercise we need a way to count occurrences of words. For example:

matteo riccardo james guido matteo luca gerri paolo riccardo matteo

*matteo* occurs three times, *riccardo* twice, the others once. We print *matteo*.

Imagine counting the words by incrementing a counter for each of them. Incrementing a counter should be a fast operation. Finally, we’ll just print the string corresponding to the greatest counter.

The most common data structure for this task is generally known as an **associative array**: basically, a collection of unique elements – for some definition of “uniqueness” – which, at least, provides fast lookup time. The most common type of associative container is a **map** (or *dictionary*): a collection of key-value pairs, such that each possible key appears just once. The name “map” resembles the concept of **function** in mathematics: a relation between a domain (keys) and a codomain (values), where each element of the domain is related (*mapped*) to exactly one element of the codomain.

Designing maps is a classic problem of Computer Science, because inserting, removing and looking up these correspondences should be fast. Associative containers are designed to be especially efficient in **accessing** their elements **by key**, as opposed to sequence containers, which are more efficient in **accessing** elements **by position**. The most straightforward and elementary associative container you can imagine is actually an **array**, where **keys coincide with indexes**. Suppose we want to count the most frequent character of a string of lowercase letters:

*freq* contains the frequencies of each lowercase letter (0 is a, 1 is b, and so on). *freq[c – ‘a’]* results in the distance between the char c and the first letter of the alphabet (‘a’), which is the corresponding index in the freq array (we already saw this idiom in the previous post). To get the most frequent char we just retrieve the iterator (a pointer, here) to the element with the highest frequency (std::max_element returns such an iterator), then we calculate the distance from the beginning of *freq* and finally we transform this index back to the corresponding letter.

Note that in this case lookup costs O(1). Although an array shows many limitations (e.g. cannot be enlarged, keys are just numbers lying in a certain range), we’ll see later in this series that (not only) in competitive programming these “frequency tables” are extremely precious.

A plain array does not help with our challenge: how to map instances of *std::string*?

In Computer Science many approaches to the “dictionary problem” exist, but the most common fall into a couple of implementations: **hashing** and **sorting**. With hashing, the idea is to – roughly speaking – map keys to integral values that index a table (array). The trio “insert, lookup, remove” has average constant time, and linear in the worst case. Clearly this depends on several factors, but explaining hash tables is beyond the target of this post.

The other common implementation **keeps elements in order** to exploit binary search for locating an element in logarithmic time. Often trees (commonly self-balancing binary search trees) are employed to maintain this ordering relation among the elements and also for having logarithmic performance on insert, lookup and removal operations.

The C++ STL provides both the hash-based (from C++11) and the sort-based implementations, providing also variants for non-unique keys (*multi*). From now I’ll refer to sort-based implementation as tree-based because this is the data structure used by the major C++ standard library implementers.

There is more: STL provides two kinds of associative containers: **maps** and **sets**. A **map** implements a dictionary – collection of key-value pairs. A **set** is a container with unique keys. We’ll discover that they provide pretty much the same operations and that under the hood they share a common implementation (clearly, either hash-based or tree-based). Also, a hash container and a tree container can be used **almost** interchangeably (from an interface point of view). For this reason, I’ll focus on the most used associative container: a tree-based map. We’ll discuss later about some general differences.

Please, give a warm welcome to the most famous C++ associative container: std::map. It’s a sorted associative container that contains key-value pairs with **unique** keys. Here is a list of useful facts about *std::map*:

- Search, removal, and insertion operations have **logarithmic** time complexity;
- elements are kept **in order**, according to a customizable comparator – part of the type and specified as a template argument (*std::less* by default – actually the type is different since C++17, read on for more details);
- iterators are **bidirectional** (pay attention that increments/decrements by 1 are “amortized” constant, whereas by N they are **linear**);
- each map element is an instance of **std::pair<const Key, Value>**.

The latter point means that we are **not permitted to change keys** (because it would imply reordering). If needed, you can get the entry, remove it from the map, update the key, and then reinsert it.

Ordered associative containers use only a single comparison function, which establishes the concept of **equivalence**: equivalence is based on the relative ordering of object values in a sorted range.

Two objects have equivalent values with respect to the sort order used by an associative container c if neither **precedes** the other in c’s sort order:

In the general case, the comparison function for an associative container isn’t **operator<** or even **std::less**, it’s a user-defined predicate (available through **std::key_comp** member function).

An important observation: in C++, every time you have to provide a “less” comparison function, the standard assumes you implement a strict weak ordering.

Let’s use *std::map* to solve the challenge:

How it works: each time we read a string, we increment the corresponding counter by 1. **map::operator[]** returns a reference to the value that is mapped to a key that is **equivalent** to a given key, performing an insertion if such a key does not already exist. At the end of the loop, freq is basically a *histogram* of words: each word is associated with the number of times it occurs. Then we just need to iterate on the histogram to figure out which word occurs the most. We use **std::max_element**: a one-pass standard algorithm that returns the greatest element of a range, according to some comparison function (std::less by default, a standard function object which – unless specialized – invokes operator< on the objects to compare).

Given that map entries are pairs, we don’t use pair’s *operator<* because it compares lexicographically (it compares the first elements and **only if** they are equivalent, compares the second elements). For instance:

"marco", 1 "andrea", 5

according to *pair::operator<*, “marco” is greater than “andrea”, so it would result as the max_element. Instead, we have to consider only the second value of the pairs. Thus we use:

If your compiler does not support generic lambdas (*auto* parameters), explicitly declare *const pair<const string, int>&*. **const string** is not fussiness: if you only type *string*, you get an extra subtle copy that converts from *pair<const string, int>* to *pair<string, int>*.

Suppose now we have an extra requirement: *if two or more names occur the most, print the lexicographically least*. Got it?

matteo riccardo matteo luca riccardo

In this case, “matteo” and “riccardo” occur twice, but we print “matteo” because it is **lexicographically lower** than “riccardo”.

How to accommodate our solution to fit this extra requirement? There’s an interesting effect of using a sorted container: when we forward iterate over the elements, we go from lexicographically *lower* strings to lexicographically *greater* strings. This property, combined with *max_element*, automatically supports the extra requirement. In fact, *max_element* works like this: *if more than one element in the range is equivalent to the greatest element, it returns the iterator to the first such element*. Since the first such element is (lexicographically) the lowest, we already fulfill the new requirement!

Guess what happens if we want to print the lexicographically **greatest** string…it’s just a matter of iterating *backward*! Having these properties clear in mind is a great thing. In this series we’ll meet many others.

Let’s continue our journey through *std::map*. Suppose that part of another challenge is to make our own contacts application. We are given some operations to perform. Two of them consist in adding and finding. For the add operation, we will have to add a new contact if it does not exist, or to update it otherwise. For the find operation, we will have to print the number of contacts who have a name starting with that **partial** name. For example, suppose our list contains:

marco, matteo, luca

**find(“ma”)** will result in 2 (**ma**rco and **ma**tteo).

The best data structure for this kind of task is probably a trie, but the pragmatic competitive programmer knows that in several cases *std::map* suffices. We can take advantage of the fact that a map keeps things in order. The challenge is also an excuse to show you **how to insert into a std::map**, since there are different ways to achieve this task.

We have two cases:

- insert/update an entry
- count names starting with some prefix

Our map looks like:

map<string, string> contacts; // let's suppose the contact number to be a string as well

The first one has been discussed a lot in many books and blogs, so much so that C++17 contains an idiomatic function insert_or_assign. In a few words, to efficiently insert or assign into a map, we first look for the contact in the structure; in case of a match we update the corresponding entry, otherwise we insert it.

This is the simplest way to do that:

contacts[toInsertName] = toInsertNumber;

You may ask: why does C++17 bother with a specific function for this one-liner? Because we are C++ programmers. Because that one-liner is succinct, but it hides a subtle cost when the entry is not in the map: **default construction + assignment**.

As we have seen, *contacts[toInsertName]* performs a lookup and returns either the corresponding entry or a brand-new one. In the latter case a default construction happens. Then, *= toInsertNumber* does an assignment into the just-created string. Is that expensive? Maybe not in this case, but it may be in general, and this kind of thing matters in C++.

Here is a more enlightening example: suppose we have a cache implemented with *std::map*:

You don’t want to update anything if the key is already there. Rather, you first look for the value corresponding to the key and only if it’s not there do you invoke the lambda to calculate it. Can you solve it by using *operator[]*? Maybe (it depends on the value type), but it’s neither effective nor efficient. Often *std::map* novices come up with this code:
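The anti-pattern typically looks like this (a sketch, wrapped in a hypothetical helper; `find_or_compute` and `compute` are names of my choosing):

```cpp
#include <map>
#include <string>

// Anti-pattern: both find and emplace search the tree,
// so a cache miss costs two lookups
template <typename Map, typename Key, typename Factory>
typename Map::mapped_type& find_or_compute(Map& cache, const Key& key, Factory compute) {
    auto it = cache.find(key);                    // first lookup
    if (it == cache.end())
        it = cache.emplace(key, compute()).first; // second lookup
    return it->second;
}
```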

**map::find** locates the element with key equivalent to a certain key, or *map::end* if such element does not exist. **map::emplace** inserts a new element into the container constructed in-place with the given args if there is no element with the key in the container. emplace returns a *pair<iterator, bool>* consisting of an iterator to the inserted element, or the already-existing element if no insertion happened, and a bool denoting whether the insertion took place. True for “insertion”, false for “no insertion”.

This code implements what I call **the infamous** **double-lookup anti-pattern**. Basically, both find and emplace search the map. It would be better to – somehow – take advantage of the result of the first lookup to eventually insert a new entry into the map. Is that possible?

Sure.

This is the idea: if the key we look for is not in the map, the position it should be inserted in is exactly the position where the first element that does not compare less than the key is located. Let me explain. Consider this sorted list of numbers:

1 2 3 5 6 7 8

If we want to insert 4, where should it be inserted? At the position of the first number that does not compare less than 4. In other words, at the first element greater than or equal to 4. That is 5.

This is nothing more than what mathematics defines as **lower bound**. *std::map* provides lower_bound, and for consistency the STL defines std::lower_bound to perform a similar operation on **sorted ranges.** As a generic algorithm, *lower_bound* is a **binary search**.

Here is what the idiom looks like:
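A sketch of the idiom, wrapped in a hypothetical helper for clarity (`get_or_emplace` is a name of my choosing):

```cpp
#include <map>
#include <string>

// Single-lookup "get or insert": lower_bound finds where the key is
// (or should go), key_comp() checks equivalence, emplace_hint inserts there
template <typename Map, typename Key, typename Value>
typename Map::iterator get_or_emplace(Map& data, const Key& key, const Value& value) {
    auto lb = data.lower_bound(key);
    if (lb != std::end(data) && !data.key_comp()(key, lb->first))
        return lb;                            // key already present
    return data.emplace_hint(lb, key, value); // correct hint: amortized O(1)
}
```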

Since *lower_bound* returns the first element that does not compare less than key, it can be key itself or not. The former case is handled by the right hand side of the if condition: *data.key_comp()* returns the comparator used to order the elements (*operator<* by default). Since **two equal elements do not compare less**, **this check has to be false**. Otherwise, key is less than *lb->first* because lb points to one element **past** key (or to the end, if such element does not exist). Makes sense?

*emplace_hint* is like emplace, but it also takes an extra iterator to “suggest” where the new element should be constructed and placed into the tree. If correct, the hint makes the insertion O(1). *map::insert* has an overload taking this extra iterator too, resulting in a similar behavior (but remember that insert takes an already built **pair**).

A simplification for pragmatic competitive programmers: when you do not use custom comparators, you may generally use operator== for checking equality:

if (lb != end(data) && key==lb->first)

Ha, C++17 has this insert_or_assign that searches the map and uses Value’s operator= in case the entry has to be updated, or inserts it otherwise (move semantics is handled as well). There is also another reason why *insert_or_assign* is important, but I’ll spend a few words about that later, when *unordered_map* is introduced.

Since I introduced *lower_bound*, I must say there is also an upper_bound: it locates the first element which compares **greater** than the searched one. For example:

1 2 3 4 5

**upper_bound(3)** locates 4 (position 3). What’s the point? Let’s turn our list into:

1 2 2 2 3 4

**lower_bound(2)** locates…2 (position 1), whereas **upper_bound(2)** results in position 4 (element 3). **Combining** lower_bound(2) with upper_bound(2) **we find the range of elements equivalent** to 2! “Range” in the C++ sense: upper_bound(2) is one past the last 2. This is extremely useful in *multimap* and *multiset*, and indeed a function called **equal_range**, which returns the combination of lower and upper bounds, exists. *equal_range* is provided by all the associative containers (in the unordered_* ones only for interface interchangeability) and by the STL as an algorithm on sorted sequences – std::equal_range.

We’ll see applications of these algorithms in the future.

So, what about our challenge? Suppose it’s ok to use *operator[]* for inserting/updating elements – in this case string constructor + operator= are not a problem. We need to count how many elements start with a certain prefix. Is that easy? Sure, we have a sorted container, remember? Here is my idea: if we call lower_bound(P) we get either end, the first element equal to P or …suspense… the first element **starting** with P. Since *lower_bound* returns the position of the first element which does not compare less than P, the first element that looks like P$something is what we get, if such an element exists.

Now what? I’m sure you already did this step in your mind. We just iterate until we find either the end or the first element that *does not start* with P. From the previous post we already know how to verify if a string begins with another:
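Putting the two steps together, a sketch might look like this (the helper name `count_by_prefix` is mine; the prefix check uses the compare-with-offsets trick):

```cpp
#include <map>
#include <string>

// lower_bound(P) jumps to the first candidate; then iterate while names start with P
int count_by_prefix(const std::map<std::string, std::string>& contacts, const std::string& P) {
    int count = 0;
    for (auto it = contacts.lower_bound(P); it != std::end(contacts); ++it) {
        if (it->first.compare(0, P.size(), P) != 0) // first name not starting with P
            break;
        ++count;
    }
    return count;
}
```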

We are paying both a prefix comparison and a linear iteration from lb (write it as *O(|P| \* K)*, where *|P|* is the length of the prefix P and *K* is the number of strings starting with P). Advanced data structures that deal with these potential problems more efficiently exist, but they are beyond the scope of this post. In a moment I’ll make another observation about this code.

I realized that the post is becoming longer than I imagined, so let me recap what we have met so far:

- How to insert:
  - *operator[]* + *operator=*;
  - infamous double-lookup anti-pattern (e.g. *find* + *insert*);
  - efficient “get or insert”: *lower_bound* + *emplace_hint*/*insert*;
  - efficient “get or assign”/“insert or assign”: *lower_bound* + Value’s *operator=*;
  - C++17 *insert_or_assign*.

- How to lookup:
  - *find* (aka: exact match);
  - *lower_bound*/*upper_bound* (aka: tell me more information about what I’m looking for);
  - *operator[]* (aka: always give me one instance – possibly a new one, if it can be default-constructed);
  - bonus: *at* (aka: throw an exception if the element is not there);
  - bonus: *count* (aka: how many elements equivalent to K exist? 0 or 1 on non-multi containers).

- Taking advantage of sorting. For instance:
- we combined max_element’s “stability” – hold the first max found – with map’s order to get the max element that is also the lexicographically least (or the greatest, by iterating backward);
- we learnt how to locate and iterate on elements which start with a given prefix.

**To erase an element**, you just call **map::erase(first, last)**, or **erase(iterator)**, or **erase(key)**. More interesting is how to implement *erase_if*, an idiom simplified by C++11 because erase now returns the iterator following the last removed element. This idiom can be applied to every associative container:
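A sketch of the idiom (the name carries a suffix of my choosing to avoid clashing with C++20’s std::erase_if):

```cpp
#include <map>

// erase returns the iterator following the removed element (C++11),
// so we can erase while iterating without invalidating our position
template <typename Container, typename Predicate>
void erase_if_assoc(Container& c, Predicate pred) {
    for (auto it = std::begin(c); it != std::end(c); ) {
        if (pred(*it))
            it = c.erase(it); // advance via erase's return value
        else
            ++it;
    }
}
```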

So, we still have an open question, right? **What’s the difference between std::find and map::find?**

You know, *std::find* is a linear search on a range. Now, I hope you understood that *map::find* is a logarithmic search and that it uses a notion of **equivalence**, instead of **equality**, to search for elements.

Actually, there is a bit more.

Let’s raise the bar: **what’s the difference between std::lower_bound and map::lower_bound?** First of all: is it possible to use *std::lower_bound* on a *std::map* at all?

Basically, *std::lower_bound* – just like all the other algorithms – works only in terms of iterators; on the other hand *map::lower_bound* – just like all the other map’s algorithms – exploits map’s internal details, performing better. For example, *std::lower_bound* uses *std::advance* to move iterators and you know that advancing a *map::iterator* results in **linear** time. Instead, *map::lower_bound* does a tree traversal (an internal detail), avoiding such overhead.

Although exceptions exist, the rule of thumb is: **if a container provides its own version of a standard algorithm, don’t use the standard one**.

Let me tell you a story about this rule. Remember that the comparator is part of the static interface of an associative container (it’s part of the type), unlike what happens in other languages like C# where the comparator is decoupled from the type and can be dynamically injected:

Dictionary<string, int> dict = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

Some time ago I had a discussion with a colleague about this aspect of C++: he was using a *map<string, SomeValueType>* to store some information, but he was using it only for case-insensitive searches by calling *std::find* (the linear one). That code made my hair stand on end: “why not use a case-insensitive comparator and make this choice part of the map type?” – I asked. He complained: “C++ is breaking encapsulation: I won’t commit my map to a specific comparator. My clients mustn’t know how elements are sorted”.

At first blush I got annoyed, but after a couple of emails I understood his point (it was about the architecture of the system he designed, rather than about C++ itself). At the end of a quite long – and certainly interesting – discussion, I came up with a solution to – more or less – save both ways: I introduced a new type which inherited from std::map and allowed injecting a comparator at construction time, like in C# (using *less<>* by default). I don’t recommend this approach (for example, because the comparator can’t be inlined and every comparison costs a virtual call – it uses *std::function* under the hood), but at least we are not linearly searching the map anymore…

This story is just to say: **use containers effectively**. Don’t go against the language. *std::map* is not for linear search, just as *std::vector* is not for pushing elements at the front.

I’d like to mention a fact about caching. *std::map* is generally considered a cache-unfriendly container. Ok, we can use allocators, but bear with me: by default we are just pushing tree nodes onto the heap and moving through indirections. On the other hand, we are all generally happy with contiguous storage, like vectors or arrays, aren’t we? Is it possible to easily design cache-friendly associative containers? It is, when the most common operation is lookup. After all, what about **using binary search on a sorted vector**? That’s the basic idea. Libraries like boost (flat_map) provide this kind of container.

As my friend *Davide Di Gennaro* pointed out, given the triplet of operations *(insert, lookup, erase)*, the best complexity you get for general-purpose usage is *O(logN, logN, logN)*. However, you can amortize one operation by sacrificing the others. For example, if you do many lookups but few insertions, *flat_map* performs *O(N, logN, N)*, but the middle operation has a much lower constant factor (e.g. it’s cache-friendly).

Consider this example: we want to improve our algorithm to count our contact names which start with a certain prefix. This time, we use a **sorted vector** and **std::lower_bound** to find the first string starting with the prefix P. In the previous code we just iterated through the elements until a mismatch was found (a string not starting with P).

This time, we try thinking differently: say we have found the position of the first string starting with P. Call that position “lb” (lower bound). Now, it should be clear that we must find the next string not starting with P. Define this string to be the first **greater** than lb, provided that “greater” means “not starting with P”. At this point, do you remember which algorithm finds the first element greater than another, in a sorted sequence?

*std::upper_bound.*

So we can employ *upper_bound*, using a particular predicate, and we expect logarithmic complexity. What will this predicate look like? Suppose we count “ma” prefixes. Strings starting with “ma” are all **equivalent**, aren’t they? So, we can use a predicate which compares only “ma” (P) with the current string. When the current string starts with P then it’s equivalent to P and the predicate will return false. Otherwise, it will return true. After all, starting the search from lower_bound’s result, we can get either *ma$something* or *$different-string*:
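A sketch on a **sorted vector** of names (the helper name `count_prefixes` is mine; in the lambda, s1 is the searched value P and s2 is the current element):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Two binary searches: lower_bound finds the first string starting with P,
// upper_bound (with a custom predicate) finds the first one past them
long count_prefixes(const std::vector<std::string>& names, const std::string& P) {
    auto lb = std::lower_bound(std::begin(names), std::end(names), P);
    auto ub = std::upper_bound(lb, std::end(names), P,
        [](const std::string& s1, const std::string& s2) {
            // false while s2 starts with s1 (== P), true as soon as it doesn't
            return s2.compare(0, s1.size(), s1) != 0;
        });
    return ub - lb; // random access iterators: constant-time distance
}
```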

Some clarifications: the first parameter of the lambda (s1) is **always** the value (or a conversion of it) passed to upper_bound (P, in our case). This is a guarantee to remember. The lambda returns false as long as the current string (s2) starts with P, and true as soon as it does not. *std::upper_bound* will handle the rest.

Why didn’t we use this approach directly on *std::map*? As I said at the beginning of this section, standard algorithms work in terms of iterators. Using *std::upper_bound* on *std::map* would result in *logN * N*. That additional N factor is caused by advancing iterators, which is linear on map::iterators. On the other hand, a sorted vector provides random access iterators, so the final cost of counting prefixes is O(|P| * logN), given that we have sacrificed insertion and removal operations (at most, linear).

C++14 and C++17 add new powerful tools to associative containers:

**Heterogeneous lookup**: you are no longer required to pass the exact same object type as the key or element in member functions such as *find* and *lower_bound*. Instead, you can pass any type for which an overloaded *operator<* is defined that enables comparison to the key type. Heterogeneous lookup is enabled on an opt-in basis when you specify the **std::less<>** or **std::greater<>** “diamond functor” comparator when declaring the container variable, like *map<SomeKey, SomeValue, less<>>*. See here for an example. This works only for sorted associative containers.

This feature is also known by some people as “transparent comparators”, because comparators that “support” this feature have to define a type **is_transparent = std::true_type**. This is basically required for backwards compatibility with existing code (see for example here for a more detailed explanation). A terrific usage of this feature is, for example, searching on a map<string, Val> by passing a *const char\** (no conversion to string will be performed).
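A minimal sketch of that const char* case (the helper name `has_contact` is mine):

```cpp
#include <map>
#include <string>

// With the transparent less<> comparator, find accepts anything comparable
// with the key type: the const char* is not converted to a temporary string
bool has_contact(const std::map<std::string, int, std::less<>>& m, const char* name) {
    return m.find(name) != m.end(); // heterogeneous lookup (C++14)
}
```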

**try_emplace** and **insert_or_assign**, as an improvement of the insertion interface for unique-keys maps (more details here).

**Ordered By Default**: mostly for design and ABI compatibility reasons, ordered associative containers now specify as a default compare functor the new **std::default_order_t** (that behaves like **std::less** – more details here).

**Splicing maps and sets**: (following words by Herb Sutter) you will now be able to directly move internal nodes from one node-based container directly into another container of the same type (differing at most in the comparator template parameter), either one node at a time or the whole container. Why is that important? Because it guarantees no memory allocation overhead, no copying of keys or values, and even no exceptions if the container’s comparison function doesn’t throw. The mechanics are provided by new functions *.extract* and *.move*, and corresponding new *.insert* overloads (approved proposal here).

**Structured bindings**: we should be able to iterate on maps this way:
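For instance, something like this sketch (the helper name `join_entries` is mine; the point is the `[name, count]` unpacking in the range-for):

```cpp
#include <map>
#include <string>

// Structured bindings (C++17): unpack each pair directly in the range-for
std::string join_entries(const std::map<std::string, int>& m) {
    std::string out;
    for (const auto& [name, count] : m)
        out += name + "=" + std::to_string(count) + " ";
    return out;
}
```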

We end this long post by spending some words about **unordered** associative containers. I don’t show you multi* containers because they are more or less the same as the corresponding non-multi ones. Clearly, they support multiple instances of the same key and, as I said, *equal_range* plays an important role for lookups. I’ll probably spend more time on multi containers when needed in future challenges.

After this section we’ll see a final example using *unordered_set*.

As *std::map* does, **std::unordered_map** contains key-value pairs with unique keys. Unlike *std::map*, internally the elements **are not sorted** in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the **hash** of its key. This allows fast access to individual elements, since once the hash is computed, it refers to the exact bucket the element is placed into. For this reason, search, insertion, and removal of elements have **average** **constant-time complexity**. Due to the nature of hash, it’s hard/impossible to know in advance how many collisions you will get with your hash function. This can add an element of unpredictability to the performance of a hash table. For this reason we use terms like “average”, “amortized” or “statistically” constant-time when referring to operations of a hash container.

This is not a discussion on hash tables, so let me introduce some C++ things:

- the STL provides a default **std::hash** template to calculate the hash of standard types;
- *std::hash* can be specialized for your own types (or you can specify your own functor);
- when a collision happens, an “equality” functor is used to determine whether two elements in the same bucket are different (**std::equal_to** by default);
- it’s possible to iterate through the elements of a bucket;
- some hash-specific functions are provided (like *load_factor*, *rehash* and *reserve*);
- *unordered_map* provides **almost all** the functions of std::map.

The latter point simplifies interchanging *std::map* with *std::unordered_map*. There are two important things to say about this fact: 1) *lower_bound* and *upper_bound* are not provided, but *equal_range* is; 2) **passing hints to unordered_ containers’ insertion functions is not really useful** – at most it lets the insertion “exit quickly”.

We know that on ordered associative containers, conditional insertion with *lower_bound* is the best way of performing an “insert or update”. So what? How can we say that ordered/unordered containers are more or less interchangeable if we miss *lower_bound*/*upper_bound*? We may apply *equal_range*:
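A sketch of the equal_range-based idiom (the helper name `insert_or_update` is mine; it compiles against both map flavors):

```cpp
#include <string>
#include <unordered_map>

// equal_range works on ordered and unordered maps alike: an empty range
// means "not found", and range.first doubles as an insertion hint
// (ordered maps use it; unordered ones simply ignore it)
template <typename Map, typename Key, typename Value>
void insert_or_update(Map& m, const Key& key, const Value& value) {
    auto range = m.equal_range(key);
    if (range.first != range.second)
        range.first->second = value;             // found: assign
    else
        m.emplace_hint(range.first, key, value); // missing: insert
}
```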

This idiom is equivalent to the one using lower_bound (both semantically and from a performance point of view), plus it works on unordered maps.

Note that in C++17, **try_emplace** and **insert_or_assign** will dramatically improve the usability of unordered associative containers, efficiently handling the case when we need to first perform a lookup and then insert a new element if it is not present (first of all, the hash value won’t be recalculated). That’s the real value of such additions: not only is *insert_or_assign* more efficient, it is also clearer and truly interchangeable.

There are some general rules/facts to take into account when in doubt between tree-based or hash-based containers. They are general, this means – as always – that when **really** in doubt you can start with one, profile your code, change to the other (again, thanks to interface compatibility), profile again, and compare.

By the way, here are some facts for the pragmatic competitive coder:

- on average, **hash-based** lookup is **faster**;
- generally, **hash-based** containers occupy **more memory** (e.g. a big array) than tree-based ones (e.g. nodes and pointers);
- **tree-based** containers keep elements in **order**; is that feature important to you? (e.g. prefix lookups, getting top elements, etc.)

Since I introduced **std::unordered_map**…let’s do a challenge involving **unordered_set**! Jokes apart, this long post has mostly hosted maps; I’d like to conclude with set, a really helpful and minimal associative container that we will meet again in the future.

That’s the challenge:

*Given N unique integers, count the number of pairs of integers whose difference is K.*

For example, for this sequence:

1 5 3 4 2

With K = 2, the answer is 3 (we have three pairs whose difference is 2: 4-2, 3-1, 5-3).

The trivial solution to this problem has **quadratic** time complexity: we enumerate all the pairs and we increment a counter when the difference is K.

The way to tackle this problem is to convert the problem space from one in which we consider pairs to a search for a single value. The i-th element *A[i]* contributes to the final count only if *A[i] + K* is in the sequence. For instance:

1 5 3 4 2

With K = 2: 1 contributes because 1 + 2 = 3 is in the list. Likewise, 3 is fine because 3 + 2 = 5 is in the list. And the same for 2, because we spot 2 + 2 = 4.

We can then store the input into an *unordered_set* (on average, constant time lookup), iterate on the elements and for each value *A* search for *A + K*:
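A sketch of the whole solution (the function name `count_pairs_with_diff` is mine):

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Count pairs with difference K: for each value A, check whether A + K
// is in the set (average O(1) lookup)
long count_pairs_with_diff(const std::vector<int>& input, int K) {
    std::unordered_set<int> values(std::begin(input), std::end(input));
    return std::count_if(std::begin(values), std::end(values),
        [&](int a) { return values.count(a + K) > 0; });
}
```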

Some sugar: std::count_if makes it clear that we are counting how many elements satisfy a predicate. Our predicate is true when **currElem + K exists in the set**: we use **unordered_set::count(A)** to get the number of elements equal to A (either 0 or 1 since we use a non-multi set). As an idiom, on non-multi associative containers, *container::count(Key)* gives 1 (say, true) if Key exists, 0 (say, false) otherwise.

On average, this solution has linear time complexity.

Another approach that uses no extra space and that involves sorting exists. Can you find it?

That’s enough for now.

Recap for the pragmatic C++ competitive coder:

- Don’t reinvent containers whenever standard ones fit your needs. Consider STL associative containers:
  - **std::map**, **std::set**, **std::multimap** and **std::multiset** are sorted, generally implemented as self-balancing binary search trees;
  - **std::unordered_map**, **std::unordered_set**, **std::unordered_multimap** and **std::unordered_multiset** are not sorted, implemented as hash tables;

- Understand idioms to use STL associative containers effectively:
  - Does an element with key equivalent to K exist? Use **count(K)**;
  - Where is the element with key equivalent to K? Use **find(K)**;
  - Where should the element with key equivalent to K be inserted? Use **lower_bound(K)** on sorted containers;
  - Insert a new element: use **emplace**, **insert**;
  - Insert a new element, knowing where: use **emplace_hint**, **insert** (only sorted containers take hints effectively; unordered ones are presumptuous!);
  - Insert or update an element: **operator[] + operator=**, (C++17) **insert_or_assign**, **equal_range + hint insertion** (this one also preserves interface compatibility between ordered/unordered), **lower_bound + hint insertion** (only on sorted containers);
  - Get the element with key equivalent to K, or a default if not present: use **operator[K]**;
  - Get the element with key equivalent to K, or an exception if not present: use **at(K)**;
  - Erase elements: use **erase(K)**, **erase(first, last)**, **erase(it)**.

- Understand the difference between containers’ member functions and STL generic algorithms. For example:
  - **std::find** and ***$associative_container$::find*** do different searches, using different criteria;
  - **std::lower_bound** and ***$sorted_container$::lower_bound*** do the same, but the former performs worse than the member function because the latter works in terms of the container’s internal details and its structure.

- Exploit standard algorithms’ properties. For example:
  - **std::max_element** and **std::min_element** are “stable”: the max/min returned is always the **first** one found.

- *Prefer* standard algorithms to hand-made for loops:
  - **std::max_element/min_element**, to get the **first** biggest/smallest element in a range;
  - **std::count/count_if**, to count how many elements satisfy specific criteria;
  - **std::find/find_if**, to find the first element which satisfies specific criteria;
  - **std::lower_bound**, **std::upper_bound**, **std::equal_range**, to find the “bounds” of an element within a sorted range.

In this installment of “C++ in Competitive Programming” we’ll meet a fundamental data structure in Computer Science, one that manages a sequence of characters, using some encoding: a **string**. As usual, let me introduce this topic through a simple challenge:

*A palindrome is a word, phrase, number, or other sequence of characters which reads the same backward and forward. Given a string print “YES” if it is palindrome, “NO” otherwise. The string contains only alphanumeric characters.*

For example:

abacaba

is palindrome; whereas:

abc

is not.

We need a type representing a sequence of characters and in C++ we have **std::string**, the main string datatype since 1998 (corresponding to the first version of the ISO C++ standard – known as **C++98**). Under the hood, imagine std::string as a pointer to a null-terminated (‘\0’-terminated) char array. Here is a list of useful facts about std::string:

- std::string **generalizes** how sequences of characters are manipulated and stored;
- roughly speaking, it **manages** its content in a similar way std::vector does (e.g. growing automatically when required);
- apart from reserve, resize, push_back, etc., std::string provides **typical string operations** like substr, compare, etc.;
- it’s **independent of the type of encoding** used: all its members, as well as its iterators, operate in terms of bytes;
- implementers of std::string generally embed a **small** array in the string object itself to manage **short** strings.

The latter point is referred to as the **Small String Optimization**: short strings (generally **15/22** chars) are allocated directly in the string object itself and don’t require a separate allocation (thanks to this guy on reddit who pointed out I was wrong here). Have a look at this table, which shows the maximum size that each library implementation stores inside the std::basic_string.

The problem description states that the input is in the form:

string-till-end-of-stream

Thus we are allowed to simply read the string this way:

Usually in CC a string is preceded by its length:

N string-of-len-N

In this case we don’t need to read N at all:

That will skip every character until the first whitespace and then will read the string that follows the whitespace into S.

Now let’s face the problem of determining whether S is palindrome or not.

It should be easy to understand that S is palindrome if **reverse(S)** is equal to S. So let’s start coding such a naive solution:
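A sketch of the naive solution (the function name is mine):

```cpp
#include <string>

// Naive check: build reverse(S) with a hand-written loop, then compare
bool is_palindrome(const std::string& S) {
    std::string tmp(S.size(), ' ');    // properly-sized destination
    for (size_t i = 0; i < S.size(); ++i)
        tmp[i] = S[S.size() - i - 1];  // array-like access to characters
    return S == tmp;                   // char-by-char comparison
}
```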

As you can see, **we access characters as we do with an array**. As with std::vector, std::string also makes it possible to use iterators and member functions to access individual characters. In the last line we applied **operator==** to verify that “char-by-char S is strictly equal to tmp”. We could also use the **string::compare()** member function:

*compare()* returns a signed integer indicating the **relation between the strings**: 0 means they compare equal; a positive value means the string is lexicographically greater than the passed string; a negative value means the string is lexicographically less than the passed string. This function also supports comparison with offsets: suppose we want to **check if a string is a prefix of another** (that is, a string **starts with** a certain prefix). This is the most effective way to do that:
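A sketch (the helper name `starts_with` is mine; compare(pos, count, str) compares only the first count characters starting at pos):

```cpp
#include <string>

// True if s starts with prefix: compare only the first prefix.size() chars
bool starts_with(const std::string& s, const std::string& prefix) {
    return s.size() >= prefix.size()
        && s.compare(0, prefix.size(), prefix) == 0;
}
```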

Bear this trick in mind.

*compare()* also provides overloads taking C-strings, preventing implicit conversions to std::string.

Now let’s turn back to reversing the string. Actually, we don’t need to write a for loop manually because **reversing a range is a common function already provided by the STL**. Including <algorithm> we get the algorithms library, which defines functions for a variety of purposes (e.g. searching, sorting, counting, manipulating) that operate on ranges of elements. To reverse a sequence in-place we use **std::reverse**:
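For instance (a sketch; we reverse a copy since std::reverse works in-place):

```cpp
#include <algorithm>
#include <string>

// std::reverse reverses a range in-place, replacing our manual loop
bool is_palindrome_reverse(const std::string& S) {
    std::string tmp = S;
    std::reverse(std::begin(tmp), std::end(tmp));
    return S == tmp;
}
```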

Since we don’t want to modify the original string, we can use **std::reverse_copy** to copy the elements from the source range to the destination range, in reverse order:
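Along these lines (a sketch, with a properly-sized destination string):

```cpp
#include <algorithm>
#include <string>

// std::reverse_copy writes the reversed sequence into a destination range,
// leaving the source untouched
bool is_palindrome_rcopy(const std::string& S) {
    std::string tmp(S.size(), ' '); // properly-sized and initialized destination
    std::reverse_copy(std::begin(S), std::end(S), std::begin(tmp));
    return S == tmp;
}
```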

Here – as for std::vector – we have two options: creating an empty string, reserving enough space and then pushing letters back; or creating a properly-sized and zero-initialized string and then assigning every single char. Since char is a cheap data type, the latter option is generally faster (basically because push_back does some branching to check whether the character to add fits the already initialized sequence). For this reason I prefer filling a string this way. As I pointed out in the previous post, a reader from reddit suggested using this approach also for std::vector<int> and honestly I agree. Larger types may have a different story. Anyway, as usual I suggest profiling your code when in doubt. **For Competitive Programming challenges this kind of finesse makes no difference**.

This solution is a bit more efficient than the previous one because it takes only two passes (one pass for *reverse_copy* and another one for *operator==*). We have also got rid of the explicit loop. **What really makes this solution bad is the usage of that extra string**. If we turn back to the initial for loop, it should be clear that we just need to check that each pair of characters that would be swapped are the same:

S = abacaba
S[0] == S[6]
S[1] == S[5]
S[2] == S[4]
S[3] == S[3]

That is, with i from 0 to N/2 we check that:

S[i] == S[N-i-1]

Applying this idea:
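A sketch of the half-length loop (the function name is mine):

```cpp
#include <string>

// Check only the mirrored pairs: S[i] against S[N-i-1], for i in [0, N/2)
bool is_palindrome_half_loop(const std::string& S) {
    const size_t N = S.size();
    for (size_t i = 0; i < N / 2; ++i)
        if (S[i] != S[N - i - 1])
            return false;
    return true;
}
```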

Ok, this solution seems better. Can we do even better? Sure. Let’s meet another standard function.

Some algorithms operate on two ranges at a time: **std::equal** belongs to this family. A function that **checks if two ranges are strictly the same**:

This function returns `true` if the elements in both ranges match. It works by making a pair-wise comparison, from left to right. By default the comparison operator is operator==. For example:

string c1 = "ABCA", c2 = "ABDA";
'A' == 'A' // yes, go on
'B' == 'B' // yes, go on
'C' == 'D' // no, stop

The comparison can be customized by passing a custom predicate as last parameter.

Now, consider the problem of checking if a string is palindrome. Our loop compares the first and the last character, the second and the second last character, and so on. If it finds a mismatch, it stops. It’s basically std::equal applied to S in one direction and to reverse(S) in the other. We just need to adapt the second range to go from the end of S to the beginning. That’s a job for reverse iterators:
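Something along these lines (a sketch):

```cpp
#include <algorithm>
#include <string>

// std::equal walks S forward while the reverse iterator walks it backward
bool is_palindrome_equal(const std::string& S) {
    return std::equal(std::begin(S), std::end(S), std::rbegin(S));
}
```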

Reverse iterators go backward from the end of a range. Incrementing a reverse iterator means “going backward”.

There is only one thing left: this solution makes N steps, whereas only N/2 are really needed. We perform redundant checks. For instance:

abacaba
[0, 6]: a
[1, 5]: b
[2, 4]: a
[3, 3]: c // middle point
[4, 2]: a (already checked [2, 4])
[5, 1]: b (already checked [1, 5])
[6, 0]: a (already checked [0, 6])

Taking this consideration into account we get:

*std::next* returns a copy of the input iterator advanced by N positions (this version does not require random access iterators).
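Taking the N/2 observation into account, the check can be sketched as a one-liner (the function name is mine):

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// compare only the first half of S against the mirrored second half
bool isPalindromeHalf(const std::string& s) {
    return std::equal(std::begin(s), std::next(std::begin(s), s.size() / 2), std::rbegin(s));
}
```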

We finally have a one-liner, single-pass and constant space solution.

**I apologize if it took a while to get here**: not only did I introduce some string functions, but I also approached the problem incrementally, showing how simple operations can be written in terms of standard algorithms. This process is precious since it helps you get familiar with the standard. Sometimes it does not make sense to struggle to find an algorithm that fits a piece of code, other times it does. The more you use the standard, the easier it will be to identify these scenarios pretty much automatically.

Don’t worry, in future posts I’ll skip trivial steps.

Now let me raise the bar, just a little bit.

In Competitive Programming many variations of this topic exist. This is an example:

*Given a string, determine the index of the character whose removal will make the string a palindrome. Suppose a solution always exists.*

For example:

acacba

if we remove the second last character (‘b’), we get a palindrome

This problem – known as *palindrome index* – can be solved by introducing another useful algorithm that works like *std::equal* but returns the first mismatch, if any, instead of a bool. Guess the name of this algorithm… yeah, it’s called **std::mismatch**.

This problem is quite easy to solve:

- locate the first char that breaks the “palindromeness” – call that position **m** (mismatch)
- check whether the string would be a palindrome without the **m-th** char
- if so, the palindrome index is **m**, otherwise it is **N – m – 1** (basically, the char “on the other side”)

Since this solution can be implemented in many ways I take advantage of this opportunity to introduce other string operations. First of all, here is how **std::mismatch** works:
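For instance (the example strings are mine, chosen so that the mismatch happens at position 2):

```cpp
#include <algorithm>
#include <iterator>
#include <string>

std::string S1 = "ABCDE", S2 = "ABXDE";
// stops at the first pair of chars that differ
auto mism = std::mismatch(std::begin(S1), std::end(S1), std::begin(S2));
// mism.first is S1.begin() + 2 (points to 'C'), mism.second is S2.begin() + 2 (points to 'X')
```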

You see that ‘C’ and ‘X’ result in a mismatch. *mism* is a *std::pair*: *mism.first* is *S1.begin() + 2* and *mism.second* is *S2.begin() + 2*. Basically, they point to ‘C’ in the first string and to ‘X’ in the second. Suppose now we need to find the “palindrome index”. Consider “acacba” as an example, compared – through std::mismatch – with its reverse “abcaca”:

*mism.first* points to ‘c’ and *mism.second* points to ‘b’. Since we know a solution always exists, one of these two chars makes S not a palindrome. To determine which one, we need to check whether S without the mismatch point **mism** would be a palindrome. For this check, we create a new string from the concatenation of two substrings of S:

- From the beginning to mism-1, and
- From mism+1 to the end

Although I don’t like this solution much, I have browsed others (not only written in C++) on HackerRank and this turned out to be the most popular. So let me show you my translation into C++ code:
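A sketch along those lines (the function name is mine):

```cpp
#include <algorithm>
#include <iterator>
#include <string>

int palindromeIndex(const std::string& S) {
    // locate the mismatch point by comparing S with its reverse
    auto mism = std::mismatch(std::begin(S), std::end(S), std::rbegin(S));
    auto diffIdx = std::distance(std::begin(S), mism.first);
    // S without the mismatch point: [0, diffIdx) + [diffIdx + 1, end)
    auto toCheck = S.substr(0, diffIdx) + S.substr(diffIdx + 1);
    if (std::equal(std::begin(toCheck), std::end(toCheck), std::rbegin(toCheck)))
        return diffIdx;
    return S.size() - diffIdx - 1;  // the char "on the other side"
}
```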

Let me introduce you **substr()**: *S.substr(pos, count)* returns the substring of S that starts at character position **pos** and spans **count** chars (or until the end of the string, whichever comes first) – S[pos, pos + count). If pos + count extends past the end of the string, or if count is **std::string::npos**, the returned substring is [pos, size()). For example:

string S = "hello";
auto sub = S.substr(1, 3); // sub is "ell"

It’s now evident that *toCheck* consists of the concatenation of S from 0 to diffIdx-1 and from diffIdx + 1 to the end:

acacba -> diffIdx = 1 toCheck = a + acba = aacba

Just for completeness, another (possibly more efficient) way to obtain toCheck consists in using **std::copy**:
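A sketch of the copy-based variant (the helper name is mine):

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// hypothetical helper: S without the char at position diffIdx
std::string removeAt(const std::string& S, std::size_t diffIdx) {
    std::string toCheck(S.size() - 1, ' ');                   // pre-sized once, then overwritten
    auto it = std::copy(S.begin(), S.begin() + diffIdx, toCheck.begin());
    std::copy(S.begin() + diffIdx + 1, S.end(), it);          // skip the mismatch point
    return toCheck;
}
```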

This solution works and passes all the tests, but I find it annoying to use extra space here.

Suppose we are free to modify the original string: it’s easier (and possibly faster) to remove the character at the mismatch point by using **string::erase**:
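A sketch of this erase-based variant (the function name is mine; the string is taken by value since we are free to modify it):

```cpp
#include <algorithm>
#include <iterator>
#include <string>

int palindromeIndexErase(std::string S) {
    auto mism = std::mismatch(std::begin(S), std::end(S), std::rbegin(S));
    auto diffIdx = std::distance(std::begin(S), mism.first);
    const auto N = S.size();
    S.erase(mism.first);  // drop the left candidate in place – no new string is created
    if (std::equal(std::begin(S), std::end(S), std::rbegin(S)))
        return diffIdx;
    return N - diffIdx - 1;
}
```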

This avoids both creating extra (sub)strings and allocating extra memory (note that the internal sequence of characters can be shifted within the same block). The final part of the algorithm is similar:

The final cost of this solution is linear.

Now, **what if we cannot change the string?**

A solution consists in checking two subranges separately. Basically we just need to exclude the mismatch point and verify that the resulting string would be a palindrome, so we check:

- From the beginning of S to the mismatch point (excluded) with the corresponding chars on the other side;
- From one-past the mismatch point to the half of S with the corresponding chars on the other side.

Actually, the first check is already performed when we call mismatch, so we don’t need to repeat it.

To code the second check, just remember the second string goes from **diffIt + 1** to the half of the string. So, we just need to correctly advance the iterators:

Let’s see this snippet in detail: **next(diffIt)** is just **diffIt + 1**. **begin(S) + S.size()/2** is just the **half of S**. The third iterator, **rbegin(S) + diffIdx**, is the starting point of the string on the other side. Here is the complete solution:
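A sketch of this constant-space variant, following the iterators just described (the function name is mine; it assumes a mismatch exists, as the problem guarantees):

```cpp
#include <algorithm>
#include <iterator>
#include <string>

int palindromeIndexConst(const std::string& S) {
    auto mism = std::mismatch(std::begin(S), std::end(S), std::rbegin(S));
    auto diffIdx = std::distance(std::begin(S), mism.first);
    // the first diffIdx pairs were already verified by mismatch; now check
    // the range (mismatch point, half of S) against the chars on the other side
    if (std::equal(std::next(mism.first), std::begin(S) + S.size() / 2,
                   std::rbegin(S) + diffIdx))
        return diffIdx;
    return S.size() - diffIdx - 1;
}
```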

If you followed my reasoning about positions, then it’s just a matter of applying standard algorithms with some care for iterators.

You may complain this code seems tricky, so let me rewrite it in terms of explicit loops:
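A possible loop-based rewrite (names are mine):

```cpp
#include <string>

int palindromeIndexLoops(const std::string& S) {
    const int N = S.size();
    int m = 0;
    while (m < N / 2 && S[m] == S[N - m - 1])  // find the mismatch point
        ++m;
    // check the rest of the string as if S[m] were removed
    for (int i = m + 1; i <= N / 2; ++i) {
        if (S[i] != S[N - i])                  // mirrored char, shifted by one
            return N - m - 1;
    }
    return m;
}
```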

In the STL-based solution we clearly need to think in terms of iterators. The mismatch part is trivial (actually it could be replaced with a call to *std::mismatch*, as in the STL-based solution), but the calls to *std::equal* are a bit more difficult to understand. At the same time, it should be **evident** that std::equal checks that two ranges are identical. Nothing more. Also, if we replace std::string with another data structure that provides iterators, our code won’t change. Our algorithm is decoupled from the structure of the data.

On the other hand, in the for-based approach the logic is completely hidden inside the iterations and the final check. Moreover, this code depends on the random-access nature of the string.

Judge yourself.

This short section is dedicated to conversions between strings and numeric types. I will start by saying that, in terms of speed, the following functions can be beaten given certain assumptions or in particular scenarios. For example, you may remember Alexandrescu’s talk about optimization (and here is a descriptive post) where he shows some improvements on string/int conversions. In CC the functions I’ll introduce are generally fine. It can happen that in uncommon challenges you’re required to take some shortcuts to optimize a conversion, mainly because the domain has some peculiarities. I’ll talk about domains and constraints in the future.

The STL provides several functions for performing conversions between strings and numeric types. Conversions from numbers to string can be easily obtained since C++11 through a set of new overloads:
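For instance:

```cpp
#include <string>

std::string a = std::to_string(42);   // "42"
std::string b = std::to_string(3.5);  // formatted like printf's %f
```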

A disadvantage of this approach is that we pay for a new instance of std::string every time we invoke **to_string**. Sometimes – especially when many conversions are needed – this approach is cheaper:
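A sketch of the buffer-reuse idea (I use the standard snprintf here; the helper name is mine):

```cpp
#include <array>
#include <cstdio>
#include <string>

// reuse a stack buffer instead of allocating a std::string each time;
// in CC you would typically print buf.data() directly
std::string intToString(int value) {
    constexpr int char_needed = 12;   // '-' + 10 digits + '\0' for an int32
    std::array<char, char_needed> buf{};
    std::snprintf(buf.data(), buf.size(), "%d", value);
    return buf.data();
}
```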

Or use *vector<char>* for allocating the string dynamically.

char_needed is the maximum number of chars needed to represent an int32: basically 12, that is a possible minus sign, at most 10 digits, and the null terminator.

From C++17 we’ll have string_view to easily wrap this array into a string-like object:

Moreover, from C++17 we’ll have string::data() as non-const member, so we’ll be able to write directly into a std::string:

In CC sprintf is good enough, even if **sprintf_s** (or another secure version) is preferred.

Anyhow, prefer using **std::to_string** if the challenge allows that.

Conversions in the other direction are a bit more confusing because we have both C++11 functions and C functions. Let me start by saying that rolling a simple algorithm to convert a string into an unsigned integer is easy, rather elegant, and worth knowing:

To convert to an int32 we just need to handle the minus sign:
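A sketch of the hand-rolled conversion (the function name is mine; no error handling, as usual in CC):

```cpp
#include <string>

int toInt(const std::string& s) {
    int sign = 1;
    std::size_t i = 0;
    if (s[i] == '-') { sign = -1; ++i; }  // handle the minus sign
    int value = 0;
    for (; i < s.size(); ++i) {
        char nxt = s[i];
        value = value * 10 + (nxt - '0'); // shift left one decimal digit, add the new one
    }
    return sign * value;
}
```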

**nxt – ‘0’** is an idiom: if *digit* is a char in [0-9], **digit – ‘0’** results in the corresponding integral value. E.g.:

'1' - '0' = 1 (int)

The inverse operation is simply **char(cDigit + ‘0’)**. E.g.:

char(1 + '0') = '1' (char)

In C++ (as in C) adding an int to a char will result in an int value: for this reason a cast back to char is needed.

With these snippets we are just moving through the **ASCII table**. ‘1’ – ‘0’ represents how far ‘1’ is from ‘0’, that is 1 position. 1 + ‘0’ is one position after ‘0’, that is ‘1’. With this idea in mind we can easily perform trivial lowercase to uppercase conversions:

// only works if c is lowercase
char(c - 'a' + 'A')

And viceversa:

// only works if c is uppercase
char(c - 'A' + 'a')

Anyhow, as one guy commented on reddit, the ASCII table is designed in such a way that flipping a single bit is enough to get the same results:

// assuming c is a letter
char toLower(char c) { return c | 0x20; }
char toUpper(char c) { return c & ~0x20; }

But remember that the C++ standard (from C, in *<cctype>*) **already provides** functions to **convert** characters to upper/lower case, to **check** if one character is upper/lower case, digit, alpha, etc. See here for more details.

In CC, these tricks should be kept on hand. For example, this challenge requires implementing a simple version of the Caesar cipher:

*Given a string S of digits [0-9], change S by shifting each digit by K positions to the right.*

For example 1289 with K = 3 results in 4512.

We can easily solve this task by applying the tricks we have just learned:
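A sketch (the function name is mine; it returns the shifted string for convenience):

```cpp
#include <string>

// shift every digit K positions to the right, wrapping around modulo 10
std::string shiftDigits(std::string S, int K) {
    for (char& c : S) {
        int digit = c - '0';                // char digit -> int
        c = char((digit + K) % 10 + '0');   // shift, wrap, int -> char digit
    }
    return S;
}
```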

Note that I used a range-based for loop even though a standard algorithm could solve this problem; I won’t introduce it just yet, though.

Now, let’s see some standard functions to convert strings to numbers. Since C++11 we have the **‘sto’** functions (plus versions for unsigned values and floating points) which convert a std::string/std::wstring into numeric values (they also support different bases). Being STL functions, they throw exceptions if something goes wrong: *std::invalid_argument* is thrown if no conversion could be performed; *std::out_of_range* is thrown if the converted value would fall outside the range of the result type or if the underlying function (std::strtol or std::strtoll) sets *errno* to ERANGE. For example:

This family of functions optionally outputs the number of processed characters:
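Something like this (the input string is mine):

```cpp
#include <string>

std::string input = "42abc";
std::size_t pos = 0;
int value = std::stoi(input, &pos);  // value == 42, pos == 2 (chars consumed)
// std::stoi("abc") would throw std::invalid_argument instead
```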

On the other hand, C functions don’t throw exceptions; instead they return zero in case of errors. For instance:
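A quick sketch:

```cpp
#include <cstdlib>

int ok  = std::atoi("123");    // 123
int bad = std::atoi("hello");  // 0 – indistinguishable from a genuine zero
```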

That’s enough for now.

Recap for the pragmatic C++ competitive coder:

- Don’t reinvent containers whenever standard ones fit your needs:
  - Consider **std::string** to handle a sequence of characters
  - **std::string::compare** indicates the relation between strings
  - **std::string::substr** creates substrings

- *Prefer* standard algorithms to hand-made for loops:
  - **std::copy**, to copy the elements in a range to another
  - **std::reverse**, to reverse the order of the elements in a range
  - **std::reverse_copy**, to copy the elements in a range to another, but in reverse order
  - **std::equal**, to know if two ranges are the same
  - **std::mismatch**, to locate the first mismatch point of two ranges, if any

- Understand options for converting strings to numbers, and viceversa:
  - **std::to_string**, to convert numeric values into strings (a new instance of std::string is returned)
  - **std::array** (**std::string** in C++17) + C’s **sprintf** (or equivalent – e.g. the secure **_s** version) when reusing the same space is important
  - **std::sto*** functions to translate strings into numeric values (remember they throw exceptions)
  - C’s **atoi** & friends when throwing exceptions is not permitted/feasible
- Remember the tricks to convert char digits into ints and viceversa:
  - **digitAsChar – ‘0’** = corresponding int
  - **char(digitAsInt + ‘0’)** = corresponding char

I’m assuming you are already familiar with concepts like iterator, container and algorithm. Most of the time I’ll give hints for using these C++ tools effectively in Competitive Programming.

That’s the problem specification: *You are given an array of integers of size N. Can you find the sum of the elements in the array? It’s guaranteed the sum won’t overflow the int32 representation.*

First of all, we need an “array of size N”, where N is given at runtime. The C++ **STL** (*Standard Template Library*) provides many useful and cleverly designed data structures (containers) we don’t need to reinvent. Sometimes more complicated challenges require us to write them from scratch. Advanced exercises reveal less common data structures that cannot be light-heartedly included into the STL. We’ll deal with some examples in the future.

It’s not our case here. The primary lesson of this post is: **don’t reinvent the wheel**. Many times standard containers fit your needs, especially the simplest one: std::vector, basically a dynamic sequence of **contiguous** elements:

For the purpose of this basic post, here is a list of important things to remember about **std::vector**:

- it’s **guaranteed** to store elements **contiguously**, so our cache will love it;
- elements can be accessed through **iterators**, using **offsets** on regular **pointers** to elements, using the **subscript operator** (e.g. v[index]) and with convenient **member functions** (e.g. at, front, back);
- it **manages its size automatically**: it can grow as needed. The real **capacity** of the vector is usually different from its length (its **size**, in STL speak);
- enlarging that capacity can be done explicitly with the **reserve** member function, the standard way to gently tell the vector: “get ready to accommodate N elements”;
- adding a new element at the end of the vector (**push_back**/**emplace_back**) does not cause relocation as long as the internal storage can accommodate the extra element (that is: vector.size() + 1 <= vector.capacity());
- on the other hand, adding (not overwriting) an entry at **any other position** requires **relocating** the elements (possibly within the same block of memory, if the capacity allows it), since **contiguity** has to be guaranteed;
- the previous point means that inserting an element at the end is generally faster than inserting it at any other position (for this reason std::vector provides **push_back**, **emplace_back** and **pop_back** member functions);
- knowing in advance the number of elements to store is information that can be exploited by applying the synergic duo **reserve + push_back** (or **emplace_back**).

The latter point leads to an important pattern: **inserting at the end is O(1) as long as the vector capacity can accommodate the extra element – vector.size() + 1 <= vector.capacity().** You may ask: why not enlarge the vector first and then just assign values? We can do that by calling **resize**:

**resize** enlarges the vector up to N elements. The new elements must be **initialized**, either to some value or to the default one – as in this case. This additional work does not matter in this challenge; however, initialization may – in general – cause some overhead (read, for example, these thoughts by Thomas Young). As a reader pointed out on reddit, **push_back** hides branching logic that has some cost. For this reason he suggests that two sequential passes over the data (which is contiguous) may be faster. I think this can be true especially for small data; however, the classical recommendation is to profile your code in case of such questions. In my opinion, getting into the habit of using reserve + *_back is better and potentially faster in general cases.

The heart of the matter is: **need a dynamic array? Consider std::vector.** In competitive programming std::vector is 99% of the time the best replacement for a dynamic C-like array (e.g. T* or T**). The other 1% comprises more advanced challenges requiring us to design different kinds of dynamic arrays that break some of std::vector’s guarantees to gain some domain-specific performance. Replacing std::vector with custom optimized containers is more common in real-life code (to get an idea, have a look here, here and here).

If N were given at compile time, a static array could be better (as long as N is small – say, less than one million – otherwise we’d get a stack overflow). For this purpose, std::array is our friend – basically a richer replacement for T[]. “Richer replacement” means that std::array is the STL adaptation of a C array. It provides the member functions we generally find in STL containers, like .size(), .at(), .begin()/.end(). std::array combines the performance and accessibility of a C-style array with the benefits of a standard container. Just use it.

Since **much information is stated in the problem’s requirements**, we’ll see that static-sized arrays are extraordinarily useful in competitive programming. In the future I’ll spend some time on this topic.

Now, let’s look at my snippet again: can we do better? Of course we can (from my previous post):

At this point we have the vector filled and we need to compute the sum of the elements. A hand-made for loop could do that:
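Something like this (the function name is mine):

```cpp
#include <vector>

int sumOf(const std::vector<int>& v) {
    int sum = 0;
    for (auto i = 0u; i < v.size(); ++i)
        sum += v[i];
    return sum;
}
```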

Can we do better?

Sure, by using the first numeric algorithm of this series: ladies and gentlemen, please welcome **std::accumulate**:
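For instance (the values are mine):

```cpp
#include <iterator>
#include <numeric>
#include <vector>

std::vector<int> v = {1, 2, 3, 4};
int sum = std::accumulate(std::begin(v), std::end(v), 0);  // 0+1+2+3+4 = 10
```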

One of the most important loops in programming is one that adds a range of things together. This abstraction is known as **reduction** or **fold**. In C++, reduction is mimicked by **std::accumulate**. Basically, it accumulates elements **from left to right** by applying a binary operation:
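A simplified sketch of what accumulate does under the hood (the name is mine, to avoid clashing with the real one):

```cpp
#include <vector>

template <typename It, typename T>
T my_accumulate(It first, It last, T init) {
    for (; first != last; ++first)
        init = init + *first;   // strictly left to right: ((init + x0) + x1) + ...
    return init;
}
```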

accumulate with three parameters uses **operator+** as binary operation.

std::accumulate **guarantees**:

- the **order of evaluation** is left to right (also known as a **left fold**);
- the time **complexity** is linear in the length of the range;
- **if the range is empty**, the initial value is returned (that’s why we have to provide one).

The reduction function appears in this idiomatic form:

ResultType f(ResultType accumulated, ElementType element);

So the result type may be different from the underlying type of the range (ElementType). For example, given a vector of const char*, here is a simple way to calculate the length of the longest string by using std::accumulate (credits to Davide Di Gennaro for suggesting this example):
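A sketch of the idea (the example strings are mine):

```cpp
#include <algorithm>
#include <cstring>
#include <iterator>
#include <numeric>
#include <vector>

std::vector<const char*> names = {"Davide", "Marco", "Alessandro"};
// the accumulated value is a length (size_t), not a const char*
std::size_t longest = std::accumulate(std::begin(names), std::end(names), std::size_t{0},
    [](std::size_t len, const char* s) { return std::max(len, std::strlen(s)); });
// longest == 10 ("Alessandro")
```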

To accumulate from the right (known as a **right fold**) we just use **reverse iterators**:
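For instance, with a non-commutative operation like string concatenation the two directions give different results (the values are mine):

```cpp
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

std::vector<std::string> words = {"a", "b", "c"};
std::string left  = std::accumulate(std::begin(words),  std::end(words),  std::string{});  // "abc"
std::string right = std::accumulate(std::rbegin(words), std::rend(words), std::string{});  // "cba"
```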

Right fold makes some difference – for example – when a non-associative function (e.g. subtraction) is used.

In functional programming fold is very generic and can be used to implement other operations. In this great article, Paul Keir describes how to get the equivalent results in C++ by accommodating std::accumulate.

Does std::accumulate have any pitfalls? **There exist cases where a += b is better than a = a + b** (the latter is what std::accumulate does in its loop). Although hacks are doable, I think that if you fall into such a scenario, a for loop would be the simplest and most effective option.

Here is another example of using std::accumulate to multiply the elements of a sequence:
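Something like this (the values are mine; note the initial value is 1, the identity for multiplication):

```cpp
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

std::vector<int> nums = {1, 2, 3, 4};
int product = std::accumulate(std::begin(nums), std::end(nums), 1, std::multiplies<>{});  // 24
```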

*std::multiplies<>* is a** standard function object** (find others here).

Using standard function objects makes the usage of algorithms more succinct. For example, the problem of finding the missing number from an array of integers states: given an array of N integers called “baseline” and another array of N-1 integers called “actual”, find the number that exists in “baseline” but not in “actual”. Duplicates may exist. (This problem is a generalization of the “find the missing number” problem, where the first array is actually the range from 0 to N; a clever solution is to apply the famous Gauss formula N(N+1)/2 and subtract the sum of the elements of “actual” from this value.) An example:

baseline = [3, 1, 2]
actual = [3, 1]

The missing number is 2.

A simple linear solution is calculating the sum of both sequences and then subtracting the results; this way we obtain the missing number. However, this solution may easily result in integer overflow, which is undefined behavior in C++ for signed types. A wiser solution consists in xor-ing the elements of both arrays and then xor-ing the results.

Xor is a **bitwise** operation – it does not “need” new bits – so it never overflows. To realize how this solution works, remember the key properties of xor: a ^ a = 0, a ^ 0 = a, and xor is both commutative and associative.

Suppose “a” is the result of xor-ing all the elements **but** the missing one – basically the result of xor-ing “actual”. Call the missing number “b”. Xor-ing “a” with the missing element “b” gives the same value as xor-ing together all the elements of the “baseline” array – call that total “c”. We now have all the information to find the missing value, since “a” ^ “c” is exactly “b”, the missing number. Here is the corresponding succinct C++ code:
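A sketch of that idea (the function name is mine):

```cpp
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

// xor-ing both arrays together leaves exactly the missing element
int missingNumber(const std::vector<int>& baseline, const std::vector<int>& actual) {
    auto a = std::accumulate(std::begin(actual), std::end(actual), 0, std::bit_xor<>{});
    return std::accumulate(std::begin(baseline), std::end(baseline), a, std::bit_xor<>{});
}
```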

Let’s go back to the initial challenge. We can do even better.

To realize how, it’s important to get into the habit of thinking in terms of iterators rather than containers. Since standard algorithms work on ranges (pairs of iterators), we don’t need to store input elements into the vector at all:
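A sketch of that single line (I use an istringstream as a stand-in for std::cin, with the input length N leading the data):

```cpp
#include <iterator>
#include <numeric>
#include <sstream>

std::istringstream is("4 1 2 3 4");  // stand-in for std::cin: N, then N numbers
int sum = std::accumulate(
    std::next(std::istream_iterator<int>(is)),  // skip the leading N
    std::istream_iterator<int>(),
    0);  // sum == 10
```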

Advancing by one – using **next** – is a legitimate action since the problem rigorously describes what the input looks like. This snippet solves the challenge in a single line, in **O(n) time** and **O(1) space**. That’s pretty good. It’s also our first optimization (not actually required) since our solution dropped to O(1) space – using std::vector was O(n).

That’s an example of what I called “standard reasoning” in the introduction of this series. Thinking in terms of standard things like iterators – objects that decouple algorithms from containers – is convenient and should become a habit. Although it seems counterintuitive, from our perspective as C++ coders, thinking in terms of iterators is not possible without knowing containers. For example, we’ll never use std::find on a std::map; instead we’ll call the member function map.find(), and the reason is that we know how std::map works. In a future post I’ll show you other examples on this subtle topic.

Our solution leads to ranges naturally:

**view::tail** takes all the elements starting from the second (again, I skipped the input length), and **ranges::istream** is a convenient function which generates a range from an input stream (**istream_range**). If we had needed to skip more elements at the beginning, we would have used **view::drop**, which removes the first *N* elements from the front of a source range.

The iterator-based and range-based solutions look very similar; however – as I said in the introduction of this series – iterators are not easy to compose, whereas ranges are composable by design. In the future we’ll see examples of solutions that look extremely different because of this fact.

In Competitive Programming these **single-pass** algorithms are really useful. The STL provides several single-pass algorithms for accomplishing tasks like finding an element in a range, counting occurrences, verifying some conditions, etc. We’ll see other applications in this series.

In the next post we’ll meet another essential container – **std::string **– and we’ll see other algorithms.

Recap for the pragmatic C++ competitive coder:

- Don’t reinvent containers whenever standard ones fit your needs:
  - Dynamic array? (e.g. int*) *Consider* **std::vector**
  - Static array? (e.g. int[]) Use **std::array**
- *Prefer* standard algorithms to hand-made for loops:
  - often more efficient, more correct and consistent
  - more maintainable (a “language in the language”)
  - use standard function objects when possible
  - use **std::accumulate** to *combine* a range of things together
  - if customizing an algorithm results in complicated code, write a loop instead
- Think in terms of standard *things*:
  - iterators separate algorithms from containers
  - understand containers’ member functions