Posts Tagged ‘C++’

Crafting software is about balancing competing trade-offs. It’s impossible to optimize every factor of a system – speed, usability, accuracy, etc – at the same time. Moreover, the solutions of today impact the decisions and solutions of tomorrow. In Competitive Programming, on the other hand, the best solution is simply one that makes each test-case pass.

However, since each challenge has specific requirements and constraints, we can still imagine it as a small and simplified component of a bigger system. Indeed, a challenge specifies:

  • the input format and the constraints of the problem;
  • the expected output;
  • the environment constraints (e.g. the solution should take less than 1s);
  • [optional] amount of time available to solve the challenge (e.g. 1 hour).

Questions like “can I use a quadratic solution?” or “may I use extra space?” can be answered only by taking into account all the problem information. Balancing competing trade-offs is key in Competitive Programming too.

This post is a bit philosophical: it’s about what I consider the essence of Competitive Programming and why a professional could benefit from it. Since February of this year I’ve been organizing monthly “coding dojos” on these themes, here in Italy. After some months, I’m seeing good results in people regularly joining my dojos and practicing at home. It’s likely that in the future I’ll dedicate a post to how the dojos are organized, but if you want more information now, please get in touch.

Follow me through this discussion and I’ll explain some apparently hidden values of Competitive Programming and why I think they are precious for everyone.

Let me start by stating that the only real compromise in Competitive Programming is one that makes an acceptable solution. However, we can distinguish two distinct “coding modes”: competition and practice. Competition is about being both fast and accurate at the same time: you just have to correctly solve all the challenges as fast as possible. On the other hand, practice mode is the opportunity to take time and think more deeply about compromises, different approaches and paradigms, scalability, etc. In practice mode we can also explore the usage of standard algorithms. A story about that concept: at one of my dojos I proposed this warm-up exercise:

Given two arrays of numbers A and B, for i from 0 up to N-1, count how many times A[i] > B[i].

For example:

A = [10, 2, 4, 6, 1]
B = [3, 2, 3, 10, 0]

The answer is 3 (10>3, 4>3, 1>0).

This exercise was trivial and people just solved it by looping and counting.

During the retrospective I asked people: “can you solve this problem without looping explicitly?”. I lowered my aim: “Suppose I provide you with a function which processes the arrays pair-wise”. One guy plucked up courage and replied “like ZIP?”. I said “Yes!”. I dashed to the blackboard and showed the idea.

The C++ algorithm to use in this case was std::inner_product, that we already met in the very first post of this series. I’ll get back to inner_product again in the future, meanwhile here is the slick solution to this problem:
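It fits in very few lines – a sketch, assuming A and B are two equally-sized vector<int>s:

// counts how many times A[i] > B[i]
int count = inner_product(begin(A), end(A), begin(B), 0,
                          plus<>{},      // accumulates the partial results
                          greater<>{});  // processes each pair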

As we already saw, inner_product is kind of a zip_and_reduce function. Each pair is processed by the second function (greater) and the results are sequentially accumulated by using the first one (plus). People realized the value of applying a known pattern to the problem. We also discussed how easy it is to change the processing functions, getting different behaviors while maintaining the same core algorithm.

In this post I’ll show different scenarios by turning on “practice mode” and “competition mode” separately, discussing how to approach problems and weigh compromises in both.

Consider the problem we tackled in the previous post: the “most common word”. Since we used std::map, the solution was generic enough to work with types other than strings with a little effort. For example, given an array of integers, we can find the mode (as statisticians call it) of the sequence.
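A sketch of that adaptation (assuming the input lives in a vector<int> and ignoring ties):

// mode of a sequence, map-based
int mode(const vector<int>& v)
{
    map<int, int> freq;
    for (auto x : v)
        freq[x]++;  // count the occurrences of each value
    return max_element(begin(freq), end(freq),
        [](const auto& l, const auto& r) { return l.second < r.second; })->first;
}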

Imagine this solution still fits the requirements of the challenge.

In “competition mode”, we stay with this solution, just because it works. The pragmatic competitive coder is lucky because she already has a solution on hand, so she just exults and moves on to the next challenge of the contest.

Now, let’s switch “practice mode” on. First of all, we notice two interesting constraints of the challenge: 1) the elements of the array belong to the range 0-1’000; 2) the size of the sequence is within 100’000. Often, constraints hide shortcuts and simplifications.

Ranting about life in general and not only about programming, I like to say “constraints hide opportunities”.

In Competitive Programming this means that whenever we see particular sentences like “assume no duplicates exist” or “no negative values will be given”, it’s very likely that this information should be used somehow.

Sometimes “very likely” becomes a command and we won’t have other options to solve the challenge:  we have to craft a solution which exploits the constraints of the problem. Such challenges are interesting since they require us to find a way to cut off the space of all the possible solutions.

Anyway, back to the mode problem, since we know the domain is 0-1’000, we can just create a histogram (or frequency table) and literally count the occurrences of each number:
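A sketch of the idea (the domain being 0-1’000, the histogram needs 1001 entries):

// mode of a sequence whose values are within 0-1'000
int mode(const vector<int>& v)
{
    array<int, 1001> histogram{};  // ~4 KB, zero-initialized
    for (auto x : v)
        histogram[x]++;
    return distance(begin(histogram),
                    max_element(begin(histogram), end(histogram)));
}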

Although the solution is similar to the previous one (because std::map provides operator[]), this is dramatically faster because of contiguity.

However, we agreed to a compromise: we pay a fixed amount of space to allocate the histogram every time (~4 KB) and we support only a fixed domain of ints (the ones within 0-1’000). That’s ok for this challenge, but it may not be acceptable in other situations. Balancing speed, space and genericity is something to think about in practice mode. In my coding dojos in Italy we usually discuss such things at the end of each challenge, during a 15-min retrospective.

At this point we need to answer a related question: “are we always permitted to allocate that extra space?”. Well, it depends. Speaking about stack allocations – or allocations known “at compile-time” – compilers have their own limits (that can be tuned), generally on the order of a few MB.

The story is different for dynamic allocations. On websites like HackerRank we may have a look at the environment page which contains the limits for each language. For C++, we are usually allowed to allocate up to 512 MB. Other websites can limit that amount of space, typically per challenge. In these situations, we have to be creative and find smarter solutions.

Take InterviewBit as an example – a website focused on programming interviews. A challenge states: “given a read only array of n + 1 integers between 1 and n, find one number that repeats, in linear time using constant space and traversing the stream sequentially”. It’s clear that we cannot use an auxiliary data structure, so forget maps or arrays. A solution consists in applying a “fast-slow pointer” strategy – that is, Floyd’s cycle detection algorithm. I won’t show the solution because very soon we’ll have a guest post by Davide Di Gennaro who will show how to apply Floyd’s algorithm to solve another – more realistic – problem.

Space and time have a special connection. The rule of thumb is that we can set up faster solutions at the price of using more space, and vice versa. It’s another compromise. Sorting algorithms are an example: some people are amazed when they discover that, under certain conditions, we can sort arrays in linear time. For example, you know that counting sort can perform very well when we have to sort very long sequences which contain values in a small range – like a long string of lowercase alphabetic letters.

Counting sort allocates an extra frequency table (a histogram, as we have just seen), containing the frequencies of the values in the range. It exhibits linear time and space complexity, that is O(N+K), where N is the length of the array to sort and K is the length of the domain. However, what if the elements go from 1 to N^2? Counting sort now exhibits quadratic complexity! If we need to care about this problem, we can try using another algorithm, like Radix sort, which basically sorts data with integer keys by grouping keys by the individual digits which share the same significant position and value.

Radix sort also has other limitations, but I won’t go into details because it’s not the point of this article. Again, I just want to highlight that other decisions have to be taken, depending on the requirements and the constraints of the problem. A general rule is that we can often exchange space for speed. Space and time go hand in hand.

Time is constrained as well: typically 1 or 2 seconds on HackerRank. To give you some feeling about what that means, let’s write down some timings measured on HackerRank’s environment.

std::accumulate – O(n) – on vector<int>:

100’000 elements: 65 microseconds

1’000’000 elements: 678 microseconds

10’000’000 elements: 6973 microseconds (6.973 milliseconds)

However, don’t be swayed by complexity. Time is also affected by other factors. For example, let’s try accumulating on std::set:

100’000 elements: 747 microseconds

1’000’000 elements: 16063 microseconds (16.063 milliseconds)

10’000’000 elements: timeout (>2 seconds)

You see that contiguity and locality make a huge difference.

Imagine now what happens if we pass from N to N^2. Roughly speaking, we can just square the time needed to accumulate a vector of 100’000 elements to get an approximation of what would happen: we obtain ~4.5 milliseconds. If the challenge expects at most 1 second, we are able to afford it.

Should we submit such a quadratic solution?

Again, in competition mode we may. In practice mode, instead, we should think about it. What if the input grows? Moreover, special contests run extra tests once a day, generally stressing the solution by pushing inputs to the limit. In these cases, we need to understand our constraints and limits very carefully.

Compromises also originate from the format of the input. For example, if we are given a sorted sequence of elements, we can binary-search that sequence for free (that is, we don’t need to sort the sequence ourselves). We can code even more optimized searches if the sequence has some special properties. For instance, if the sequence represents a time series containing samples of a 10-sec acquisition, at 100 Hz, we know precisely – and in constant time – where the sample of the 3rd second is. It’s like accessing an array by index.

The more we specialize (and optimize) a solution, the more we lose genericity. It’s just another compromise, another subject to discuss.

Compromises are also about types. In previous posts we have faced problems using integer values and we have assumed that int was fine. It’s not always the case. For example, many times problems have 32-bit integer inputs and wider outputs – which overflow 32 bits. Sometimes the results do not even fit 64-bit ints. Since we don’t have “big integers” in C++, we have to design such a facility ourselves or…switch to another language. It’s not a joke: in “competition mode”, we just use another language if possible. It’s quicker.

Instead, in “practice mode” we can design (or take from an external source) a big integer class, to be reused in “competition mode”. This aspect is particularly important: when I asked some top coders their thoughts on how to constantly improve and do better, their shared advice was to solve problems and refine a support library. Indeed, top-level competitive coders have their snippets on hand during contests.

This is a simplistic approximation of what we actually do as Software Engineers: we solve problems first of all by trying to directly use existing/tested/experienced solutions (e.g. libraries, system components). If that’s not possible, we have two options: trying to adapt an existing solution to embrace the new scenario, or writing a new solution (“writing” may also mean “introducing a new library”, “using some new system”, etc). Afterwards, we can take some time to understand if it’s possible to merge this solution with the others we already have, maybe with some work (we generally call it refactoring). You know, the process is more complicated, but that’s the core of it.

Whether or not to use an existing solution is a compromise. Mostly because the adapting and refactoring parts require time and resources.

I think everyone would love writing the most efficient solution that is also the simplest. Generally it’s not the case.

However, in many challenges – especially the Easy/Medium ones – “average” solutions are accepted as well (these solutions work for the challenge, even if they are not the “best” in terms of speed, space, etc).

The balance between the difficulty of a solution and its effectiveness is another compromise to take into account. About that, let me tell you a story from a few days ago at my monthly coding dojo. We had this exercise:

Given a string S, consisting of N lowercase English letters, reduce S as much as possible by deleting any pair of adjacent letters with the same value. N is at most 100.

An example:

aabccbdea

becomes “dea”, since we can first delete “aa”, getting:

bccbdea

Then we delete “cc” and get: “bbdea”, finally we remove “bb” and get “dea”.

The attendees proposed a few solutions, none of which exhibited linear time and constant space complexity. Such a solution exists, but it’s a bit complicated. I proposed that challenge because I wanted people to reason about the compromises and, indeed, I made one point very clear: the maximum size of the string is 100. That’s small.

We had two leading solutions: one quadratic and one linear in both time and space. The quadratic one could be written in a few lines by using adjacent_find:
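It might have looked something like this (a sketch; reduce is a name of my choosing):

// repeatedly remove the first pair of adjacent equal chars - O(N^2) worst case
string reduce(string s)
{
    auto it = adjacent_find(begin(s), end(s));
    while (it != end(s))
    {
        s.erase(it, next(it, 2));                // drop the pair
        it = adjacent_find(begin(s), end(s));    // search again from scratch
    }
    return s;
}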

The other one used a deque (actually, I have seen such a solution applied many times in other contexts):
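A sketch of that idea:

// stack-like reduction: linear in both time and space
string reduce(const string& s)
{
    deque<char> st;
    for (auto c : s)
    {
        if (!st.empty() && st.back() == c)
            st.pop_back();   // c annihilates the previous char
        else
            st.push_back(c);
    }
    return {begin(st), end(st)};
}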

Both passed. Afterwards, we discussed the compromises of these solutions and understood the limits of both. In a more advanced dojo, I’ll propose solving this challenge in linear time and constant space!

Competitive Programming gives us the opportunity to see with our own eyes the impact of self-contained pieces of code, first of all in terms of time spent and space allocated. As programmers, we are forced to make predictions and estimations.

Estimating and predicting time/space is fundamental to deciding quickly which solution is the most viable when different compromises cross our path. Back to the mode example, using std::map or std::vector makes a huge difference. However, when the domain is small we are allowed just not to care about that difference. Somehow, this happens in ordinary software development too. For instance, how many times do we use streams in production code? Sometimes? Never? Often? I think the fairest answer would be “it depends”. Streams are high-level abstractions, so they should be relatively easy to use. However, they are generally slower than C functions. On the other hand, C functions are low-level facilities, more error-prone and less flexible. What to do entirely depends on our needs and possibilities, and very often also on our own attitude and our company’s.

Although Competitive Programming offers a simplified and approximated reality, facing a problem “as-is” is an opportunity to understand compromises, estimate time and space, adapt known solutions/patterns to new scenarios, learn something completely new, and think outside the box.

The latter point is related to another important fact that I see with my own eyes during my coding dojos in Italy: young non-professional people often find clever solutions quicker than experienced programmers. The reason is complex and I have discussed it with psychologists too. To put it in simple terms, this happens because young people generally have a “weaker mindset”, have less experience and are not afraid of failure. That last point is important as well: many professionals are terribly afraid of failures and of making a bad impression. Thus, an experienced solution that worked seems more reliable than something completely new. That’s basically human behavior. The problem comes up when the experienced programmer cannot think outside the box and is not able to find new and creative ways to solve a problem.

Competitive Programming is a quick and self-contained way to practice pushing our “coding mind” to the limits. As a Software Engineer, programmer and professional, I find it precious.

Competitive Programming offers both a “practice mode”, during which I can “stop and think”, reasoning about compromises, time, space, etc., and a “competition mode”, where I can “stress-test” my skills and experience.

Learning new things is often mandatory. This needs time, will and patience. But you know, it’s a compromise…

Recap for the pragmatic C++ competitive coder:

  • Compromises in Competitive Programming have different shapes:
    • Dimension and format of the input;
    • Time and space limits;
    • Data types;
    • Adaptability of a well-known solution;
    • Simplicity of the solution.
  • The essence of Competitive Programming consists in two phases:
    • Competition: the solution just has to work;
      • Top coders generally keep snippets and functions on hand to use during contests.
    • Practice: deeply understanding compromises, variations and different implementations of the solution;
      • Top coders generally refine their snippets and functions.
  • A challenge may have simplifications and shortcuts:
    • The more we use those, the less generic our solution will be;
    • Hopefully, the more we use those, the more optimized (and/or simple) our solution will be;
    • Many challenges require us to find shortcuts in the problem constraints and description.

I apologize if I didn’t publish any new posts in May but I was busy with the organization of the Italian C++ Conference 2016. If you feel like reading some lines about this event, check out this post.

In this installment of “C++ in Competitive Programming” we’ll meet a fundamental data structure in Computer Science, one that manages a sequence of characters, using some encoding: a string. As usual, let me introduce this topic through a simple challenge:

A palindrome is a word, phrase, number, or other sequence of characters which reads the same backward and forward. Given a string print “YES” if it is a palindrome, “NO” otherwise. The string contains only alphanumeric characters.

For example:

abacaba

is a palindrome; whereas:

abc

is not.

We need a type representing a sequence of characters and in C++ we have std::string, the main string datatype since 1998 (corresponding to the first version of the ISO C++ standard – known as C++98). Under the hood, imagine std::string as a pointer to a null-terminated (‘\0’-terminated) char array. Here is a list of useful facts about std::string:

  • std::string generalizes how sequences of characters are manipulated and stored;
  • roughly speaking, it manages its content in a similar way std::vector does (e.g. growing automatically when required);
  • apart from reserve, resize, push_back, etc. std::string provides typical string operations like substr, compare, etc;
  • it’s independent of the encoding used: all its members, as well as its iterators, will still operate in terms of bytes;
  • implementers of std::string generally embed a small array in the string object itself to manage short strings.

The latter point is referred to as the Small String Optimization and it means that short strings (generally 15/22 chars) are allocated directly in the string itself and don’t require a separate allocation (thanks to this guy on reddit who pointed out I was wrong here). Have a look at this table which shows the maximum size that each library implementation stores inside the std::basic_string.

The problem description states that the input is in the form:

string-till-end-of-stream

Thus we are allowed to simply read the string this way:
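Something as simple as:

string S;
cin >> S;   // reads the whitespace-delimited token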

Usually in CC a string is preceded by its length:

N string-of-len-N

In this case we don’t need to read N at all:
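One way that matches the description below (a sketch):

string S;
cin.ignore(numeric_limits<streamsize>::max(), ' ');  // skip everything up to the first whitespace
cin >> S;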

That will skip every character until the first whitespace and then will read the string that follows the whitespace into S.

Now let’s face the problem of determining whether S is a palindrome or not.

It should be easy to understand that S is a palindrome if reverse(S) is equal to S. So let’s start coding such a naive solution:
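A sketch:

// naive approach: build the reverse of S by hand, then compare
string tmp(S.size(), '\0');
for (size_t i = 0; i < S.size(); ++i)
    tmp[i] = S[S.size() - i - 1];
cout << (S == tmp ? "YES" : "NO");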

As you can see, we access characters as we do with an array. As with std::vector, std::string also makes it possible to use iterators and member functions to access individual characters. In the last line we applied operator== to verify that “char-by-char S is strictly equal to tmp”. We could also use the string::compare() member function:
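cout << (S.compare(tmp) == 0 ? "YES" : "NO");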

compare() returns a signed integer indicating the relation between the strings: 0 means they compare equal; a positive value means the string is lexicographically greater than the passed string; a negative value means the string is lexicographically less than the passed string. This function also supports comparisons with offsets: suppose we want to check if a string is a prefix of another (that is, a string starts with a certain prefix). This is the most effective way to do that:
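Assuming prefix holds the candidate prefix:

// true if S starts with prefix
bool startsWith = S.compare(0, prefix.size(), prefix) == 0;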

Bear this trick in mind.

compare() also provides overloads taking C-strings, preventing implicit conversions to std::string.

Now let’s turn back to reversing the string. Actually we don’t need to write a for loop manually because reversing a range is a common function already provided by the STL. Including <algorithm> we get the algorithms library that defines functions for a variety of purposes (e.g. searching, sorting, counting, manipulating) that operate on ranges of elements. To reverse a sequence in-place we use std::reverse:
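reverse(begin(S), end(S));   // S is now reversed in-place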

Since we don’t want to modify the original string, we can use std::reverse_copy to copy the elements from the source range to the destination range, in reverse order:
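A sketch:

string tmp(S.size(), '\0');                  // properly-sized destination
reverse_copy(begin(S), end(S), begin(tmp));  // tmp is now reverse(S)
cout << (S == tmp ? "YES" : "NO");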

Here – as with std::vector – we have two options: creating an empty string, reserving enough space and then pushing letters back, or creating a properly-sized and zero-initialized string and then assigning every single char. Since char is a cheap data type, the latter option is generally faster (basically because push_back does some branching to check if the character to add fits the already initialized sequence). For this reason I prefer filling a string this way. As I pointed out in the previous post, a reader from reddit suggested using this approach also for std::vector<int> and honestly I agree. Larger types may have a different story. Anyway, as usual I suggest profiling your code when in doubt. For Competitive Programming challenges this kind of finesse makes no difference.

This solution is a bit more efficient than the previous one because it’s only two passes (one pass for reverse_copy and another one for operator==). We have also got rid of the explicit loop. What really makes this solution bad is the usage of that extra string. If we turn back to the initial for loop, it should be clear that we just need to check that each pair of characters that would be swapped are the same:

S = abacaba
S[0] == S[6]
S[1] == S[5]
S[2] == S[4]
S[3] == S[3]

That is, with i from 0 to N/2 we check that:

S[i] == S[N-i-1]

Applying this idea:
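A sketch:

// compare chars pair-wise, up to the middle
bool palindrome = true;
const auto N = S.size();
for (size_t i = 0; i < N/2; ++i)
{
    if (S[i] != S[N - i - 1])
    {
        palindrome = false;
        break;
    }
}
cout << (palindrome ? "YES" : "NO");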

Ok, this solution seems better. Can we do even better? Sure. Let’s meet another standard function.

Some algorithms operate on two ranges at a time: std::equal belongs to this family. A function that checks if two ranges are strictly the same:
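In its simplest form (r1 and r2 being two hypothetical ranges):

bool same = equal(begin(r1), end(r1), begin(r2));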

This function returns true if the elements in both ranges match. It works by making a pair-wise comparison, from left to right. By default the comparison operator is operator==. For example:

string c1 = "ABCA", c2="ABDA";
'A' == 'A' // yes, go on
'B' == 'B' // yes, go on
'C' == 'D' // no, stop

The comparison can be customized by passing a custom predicate as last parameter.

Now, consider the problem of checking if a string is a palindrome. Our loop compares the first and the last character, the second and the second last character, and so on. If it finds a mismatch, it stops. It’s basically std::equal applied to S in one direction and to reverse(S) in the other. We just need to adapt the second range to go from the end of S to the beginning. That’s a job for reverse iterators:
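cout << (equal(begin(S), end(S), rbegin(S)) ? "YES" : "NO");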

Reverse iterators go backward from the end of a range. Incrementing a reverse iterator means “going backward”.

There is only one thing left: this solution now makes N steps, whereas only N/2 are really needed. We perform redundant checks. For instance:

abacaba
[0, 6]: a
[1, 5]: b
[2, 4]: a
[3, 3]: c // middle point
[4, 2]: a (already checked [2, 4])
[5, 1]: b (already checked [1, 5])
[6, 0]: a (already checked [0, 6])

Taking this consideration into account we get:
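// compare only the first half with the reversed second half
cout << (equal(begin(S), next(begin(S), S.size()/2), rbegin(S)) ? "YES" : "NO");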

std::next returns a copy of the input iterator advanced by N positions (this version does not require random access iterators).

We finally have a one-liner, single-pass and constant space solution.

I apologize if it took a while to get here: not only did I introduce some string functions, but I also approached the problem incrementally, showing how simple operations can be written in terms of standard algorithms. This process is precious since it helps to get familiar with the standard. Sometimes it does not make sense to struggle to find an algorithm that fits a piece of code, other times it does. The more you use the standard, the more easily you will identify these scenarios, pretty much automatically.

Don’t worry, in future posts I’ll skip trivial steps.

Now let me raise the bar, just a little bit.

Palindrome index

 

In Competitive Programming many variations of this topic exist. This is an example:

Given a string, determine the index of the character whose removal will make the string a palindrome. Suppose a solution always exists.

For example:

acacba

if we remove the second last character (‘b’), we get a palindrome

This problem – known as palindrome index – can be solved by introducing another useful algorithm that actually works like std::equal but it also returns the first mismatch, if any, instead of a bool. Guess the name of this algorithm…yeah, it’s called std::mismatch.

This problem is quite easy to solve:

  • locate the first char that breaks the “palindromeness” – call that position m (the mismatch point);
  • check whether the string would be a palindrome with the m-th char removed;
  • if so, the palindrome index is m, otherwise it is N – m – 1 (basically, the char “on the other side”).

Since this solution can be implemented in many ways, I take advantage of this opportunity to introduce other string operations. First of all, here is how std::mismatch works:
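For example (S1 and S2 are two strings I’m making up for illustration):

string S1 = "ABCA", S2 = "ABXA";
auto mism = mismatch(begin(S1), end(S1), begin(S2));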

You see that ‘C’ and ‘X’ result in a mismatch. mism is a std::pair of iterators: mism.first is S1.begin() + 2 and mism.second is S2.begin() + 2. Basically, they point to ‘C’ in the first string and to ‘X’ in the second. Suppose now we need to find this “palindrome index”. Consider as an example:
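string S = "acacba";
auto mism = mismatch(begin(S), end(S), rbegin(S));  // compare S with its reverse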

mism.first points to ‘c’ and mism.second points to ‘b’. Since we know a solution always exists, one of these two chars makes S not a palindrome. To determine which one, we need to check whether S without the mismatch point mism would be a palindrome or not. For this check, we create a new string from the concatenation of two substrings of S:

  • From the beginning to mism-1, and
  • From mism+1 to the end

Although I don’t like this solution so much, I have browsed others (not only written in C++) on HackerRank and this one turned out to be the most popular. So let me show you my translation into C++ code:
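A sketch of it (palindromeIndex is a name of my choosing):

// returns the index whose removal makes S a palindrome
int palindromeIndex(const string& S)
{
    auto mism = mismatch(begin(S), end(S), rbegin(S));
    auto diffIdx = distance(begin(S), mism.first);
    // S without the mismatch point
    auto toCheck = S.substr(0, diffIdx) + S.substr(diffIdx + 1);
    if (equal(begin(toCheck), end(toCheck), rbegin(toCheck)))
        return diffIdx;
    return S.size() - diffIdx - 1;   // the char "on the other side"
}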

Let me introduce substr(): S.substr(pos, count) returns the substring of S that starts at character position pos and spans count chars (or until the end of the string, whichever comes first) – S[pos, pos + count). If pos + count extends past the end of the string, or if count is std::string::npos, the returned substring is [pos, size()). For example:
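string S = "hello";
auto sub = S.substr(1, 3);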

substr results in “ell”.

It’s now evident that toCheck consists of the concatenation of S from 0 to diffIdx-1 and from diffIdx + 1 to the end:

acacba -> diffIdx = 1 
toCheck = a + acba = aacba

Only for completeness, another (possibly more efficient) way to obtain toCheck consists in using std::copy:
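A sketch:

string toCheck(S.size() - 1, '\0');
auto out = copy(begin(S), mism.first, begin(toCheck));  // up to the mismatch point
copy(next(mism.first), end(S), out);                    // past the mismatch point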

This solution works and passes all the tests, but I find it annoying to use extra space here.

Suppose we are free to modify our original string: it’s easier (and possibly faster) to remove the candidate iterator by using string::erase:
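That is:

S.erase(mism.first);   // drops the single char at the mismatch point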

This avoids both creating extra (sub)strings and allocating extra memory (note that the internal sequence of characters can be relocated into the same block). The final part of the algorithm is similar:
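Something like this (N being the size of S before the erase):

// same check as before, on the modified S
if (equal(begin(S), next(begin(S), S.size()/2), rbegin(S)))
    return diffIdx;
return N - diffIdx - 1;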

The final cost of this solution is linear.

Now, what if we cannot change the string?

A solution consists in checking two substrings separately. Basically, we just need to exclude the mismatch point and verify that what remains is a palindrome. We check:

  • From the beginning of S to the mismatch point (excluded) with the corresponding chars on the other side;
  • From one-past the mismatch point to the half of S with the corresponding chars on the other side.

Actually, the first check is already performed when we call mismatch, so we don’t need to repeat it.

To code the second check, just remember the second string goes from diffIt + 1 to the half of the string. So, we just need to correctly advance the iterators:
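Something along these lines:

bool ok = equal(next(diffIt), begin(S) + S.size()/2, rbegin(S) + diffIdx);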

Let’s see this snippet in detail: next(diffIt) is just diffIt + 1. begin(S) + S.size()/2 is just the half of S. The third iterator, rbegin(S) + diffIdx, is the starting point of the string on the other side. Here is the complete solution:
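A sketch of the whole thing (again, palindromeIndex is my naming; the problem guarantees a solution exists):

int palindromeIndex(const string& S)
{
    auto diffIt = mismatch(begin(S), end(S), rbegin(S)).first;
    auto diffIdx = distance(begin(S), diffIt);
    if (equal(next(diffIt), begin(S) + S.size()/2, rbegin(S) + diffIdx))
        return diffIdx;
    return S.size() - diffIdx - 1;
}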

If you followed my reasoning about positions then it’s just the matter of applying standard algorithms with some care for iterators.

You may complain this code seems tricky, so let me rewrite it in terms of explicit loops:
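Possibly something like this:

int palindromeIndex(const string& S)
{
    const int N = S.size();
    int i = 0, j = N - 1;
    while (i < j && S[i] == S[j])   // locate the mismatch
    {
        ++i;
        --j;
    }
    int l = i + 1, r = j;           // skip S[i] and re-check the rest
    while (l < r && S[l] == S[r])
    {
        ++l;
        --r;
    }
    return l >= r ? i : N - i - 1;
}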

In the STL-based solution we clearly need to think in terms of iterators. The mismatch part is trivial (actually it could be replaced with a call to std::mismatch, as in the STL-based solution), but the calls to std::equal are a little bit more difficult to understand. At the same time, it should be evident that std::equal checks that two ranges are identical. Nothing more. Also, if we replace std::string with another data structure that provides iterators, our code won’t change. Our algorithm is decoupled from the structure of the data.

On the other hand, in the for-based approach the logic is completely hidden inside the iterations and the final check. Moreover, this code depends on the random-access nature of the string.

Judge for yourself.

 

Conversions

 

This short section is dedicated to conversions between strings and numeric types. I will start by saying that, in terms of speed, the following functions can be beaten given certain assumptions or in particular scenarios. For example, you may remember Alexandrescu’s talk about optimization (and here is a descriptive post) where he shows some improvements on string/int conversions. In CC the functions I’ll introduce are generally ok. It can happen that in uncommon challenges you’re required to take some shortcuts to optimize a conversion, mainly because the domain has some particularities. I’ll talk about domains and constraints in the future.

The STL provides several functions for performing conversions between strings and numeric types. Conversions from numbers to string can be easily obtained since C++11 through a set of new overloads:
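For example:

auto s = to_string(123);   // "123"
auto d = to_string(4.5);   // "4.500000"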

A disadvantage of this approach is that we pay a new instance of std::string every time we invoke to_string. Sometimes – especially when many conversions are needed – this approach is cheaper:
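A sketch (value being the int to convert; char_needed is explained below):

char buf[char_needed];
sprintf(buf, "%d", value);   // buf can be reused across conversions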

Or use vector<char> for allocating the string dynamically.

char_needed is the maximum number of chars needed to represent an int32. This value is basically:
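One way to compute it:

constexpr auto char_needed = numeric_limits<int>::digits10 + 3;
// digits10 (9) + 1 possible extra digit + minus sign + null terminator = 12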

From C++17 we’ll have string_span to easily wrap this array into a string-like object:

Moreover, from C++17 we’ll have string::data() as non-const member, so we’ll be able to write directly into a std::string:
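Something like this (legal from C++17, where data() returns a non-const char*):

string s(char_needed, '\0');
sprintf(s.data(), "%d", value);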

In CC sprintf is good enough, even if sprintf_s (or another secure version) is preferred.

Anyhow, prefer using std::to_string if the challenge allows that.

Conversions in the other direction are a bit more confusing because we have both C++11 functions and C functions. Let me start just by saying that rolling a simple algorithm to convert a string into an unsigned integer is easy, pretty elegant and interesting to know about:
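A sketch (toUnsigned is my naming; no error handling):

unsigned toUnsigned(const string& s)
{
    unsigned value = 0;
    for (auto nxt : s)
        value = value * 10 + (nxt - '0');
    return value;
}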

To convert to an int32 we just need to handle the minus sign:
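For example:

int toInt(const string& s)
{
    const bool negative = s[0] == '-';
    int value = 0;
    for (size_t i = negative ? 1 : 0; i < s.size(); ++i)
        value = value * 10 + (s[i] - '0');
    return negative ? -value : value;
}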

nxt – ‘0’ is an idiom: if digit is a char in [0-9], digit – ‘0’ results in the corresponding integral value. E.g.:

'1' - '0' = 1 (int)

The inverse operation is simply char(cDigit + ‘0’). E.g.:

char(1 + '0') = '1' (char)

In C++ (as in C) adding an int to a char will result in an int value: for this reason a cast back to char is needed.

With these snippets we are just moving through the ASCII table. ‘1’ – ‘0’ represents how far ‘1’ is from ‘0’, that is 1 position. 1 + ‘0’ is one position after ‘0’, that is ‘1’. With this idea in mind we can easily perform trivial lowercase to uppercase conversions:

// only works if c is lowercase
char(c - 'a' + 'A')

And vice versa:

// only works if c is uppercase
char(c - 'A' + 'a')

Anyhow, as one guy commented on reddit, the ASCII table is designed in such a way that flipping a single bit is enough to get the same results:

// assuming c is a letter
char toLower(char c) { return c | 0x20; }
char toUpper(char c) { return c & ~0x20; }

But remember that the C++ standard (from C, in <cctype>) already provides functions to convert characters to upper/lower case, to check if a character is upper/lower case, a digit, alpha, etc. See here for more details.

In CC, these tricks should be kept on hand. For example, this challenge requires implementing a simple version of the Caesar cipher:

Given a string S of digits [0-9], change S by shifting each digit by K positions to the right.

For example 1289 with K = 3 results in 4512.

We can easily solve this task by applying the tricks we have just learned:
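A sketch:

// shift each digit forward by K, wrapping around at 10
for (auto& c : S)
    c = char((c - '0' + K) % 10 + '0');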

Note I used a range-based for loop even though a standard algorithm could help solve this problem – I won’t introduce it just yet, though.

Now, let’s see some standard functions to convert strings to numbers. Since C++11 we have the “sto” functions (stoi, stol, stoll, plus counterparts for unsigned values and floating points) which convert a std::string/std::wstring into numeric values (they also support different bases). Being STL functions, they throw exceptions if something goes wrong: a std::invalid_argument is thrown if no conversion could be performed, a std::out_of_range is thrown if the converted value would fall out of the range of the result type or if the underlying function (std::strtol or std::strtoll) sets errno to ERANGE. For example:
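auto i = stoi("42");                // 42
auto x = stoi("2a", nullptr, 16);   // 42, parsed as hexadecimal
// stoi("forty-two") throws std::invalid_argument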

This family of functions optionally outputs the number of processed characters:
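For instance:

size_t pos = 0;
auto i = stoi("123abc", &pos);   // i = 123, pos = 3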

On the other hand, C functions don’t throw exceptions; instead, they return zero in case of errors. For instance:
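auto i = atoi("42");          // 42
auto e = atoi("forty-two");   // 0, no exception thrown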

That’s enough for now.

Recap for the pragmatic C++ competitive coder:

  • Don’t reinvent containers whenever standard ones fit your needs:
    • Consider std::string to handle a sequence of characters
      • std::string::compare indicates the relation between strings
      • std::string::substr creates substrings
  • Prefer standard algorithms to hand-made for loops:
    • std::copy, to copy the elements in a range to another
    • std::reverse, to reverse the order of the elements in a range
    • std::reverse_copy, to copy the elements in a range to another, but in reverse order
    • std::equal, to know if two ranges are the same
    • std::mismatch, to locate the first mismatch point of two ranges, if any
  • Understand options for converting strings to numbers, and vice versa:
    • std::to_string, to convert numeric values into strings (a new instance of std::string is returned)
    • std::array (std::string in C++17) + C’s sprintf (or equivalent – e.g. secure _s version) when reusing the same space is important
    • std::sto* functions to translate strings into numeric values (remember they throw exceptions)
    • C’s atoi & friends when throwing exceptions is not permitted/feasible
    • Remember tricks to convert char digits into ints and vice versa:
      • digitAsChar – ‘0’ = corresponding int
      • char ( digitAsInt + ‘0’ ) = corresponding char

One of the first challenges in HackerRank‘s “Warmup” section is probably the “Hello World” of learning arrays in any language: calculating the sum of a sequence of elements. Although this exercise is trivial, I’ll tackle it to break the ice and show you a few concepts that lay the groundwork for more complicated challenges.

I’m assuming you are already familiar with concepts like iterator, container and algorithm. Most of the time I’ll give hints for using these C++ tools effectively in Competitive Programming.

That’s the problem specification: You are given an array of integers of size N. Can you find the sum of the elements in the array? It’s guaranteed the sum won’t overflow the int32 representation.

First of all, we need an “array of size N”, where N is given at runtime. The C++ STL (Standard Template Library) provides many useful and cleverly designed data structures (containers) we don’t need to reinvent. Sometimes more complicated challenges require us to write them from scratch. Advanced exercises reveal less common data structures that cannot be light-heartedly included into the STL. We’ll deal with some examples in the future.

It’s not our case here. The primary lesson of this post is: don’t reinvent the wheel. Many times standard containers fit your needs, especially the simplest one: std::vector, basically a dynamic sequence of contiguous elements:
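Reading the input might look like this (a sketch):

size_t N;
cin >> N;       // the size of the array
vector<int> v;
v.reserve(N);   // get ready to accommodate N elements
for (size_t i = 0; i < N; ++i)
{
    int value;
    cin >> value;
    v.push_back(value);
}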

For the purpose of this basic post, here is a list of important things to remember about std::vector:

  • it’s guaranteed to store elements contiguously, so our cache will love it;
  • elements can be accessed through iterators, using offsets on regular pointers to elements, using the subscript operator (e.g. v[index]) and with convenient member functions (e.g. at, front, back);
  • it manages its size automatically: it can grow as needed. The real capacity of the vector is usually different from its length (size, in STL speak);
  • enlarging that capacity can be done explicitly by using the reserve member function, which is the standard way to gently tell the vector: “get ready to accommodate N elements”;
  • adding a new element at the end of the vector (push_back/emplace_back) does not cause relocation as long as the internal storage can accommodate the extra element (that is: vector.size() + 1 <= vector.capacity());
  • on the other hand, adding (not overwriting) an entry at any other position requires relocating elements (possibly within the same block of memory, if the capacity allows it), since contiguity has to be guaranteed;
  • the previous point means that inserting an element at the end is generally faster than inserting it at any other position (for this reason std::vector provides push_back, emplace_back and pop_back member functions);
  • knowing in advance the number of elements to store is an information that can be exploited by applying the synergic duo reserve + push_back (or emplace_back).

The latter point leads to an important pattern: inserting at the end is O(1) as long as the vector capacity can accommodate the extra element – vector.size() + 1 <= vector.capacity(). You may ask: why not enlarge the vector first and then just assign values? We can do that by calling resize:
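A sketch:

vector<int> v;
v.resize(N);   // N zero-initialized elements (same as vector<int> v(N))
for (size_t i = 0; i < N; ++i)
    cin >> v[i];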

resize enlarges the vector up to N elements. The new elements must be initialized to some value, or to the default one – as in this case. This additional work does not matter in this challenge, however initialization may – in general – cause some overhead (read, for example, these thoughts by Thomas Young). As a reader pointed out on reddit, push_back hides some branching logic that can cause some cost. For this reason he suggests that two sequential passes over the data (which is contiguous) may be faster. I think this can be true especially for small data, however the classical recommendation is to profile your code in case of such questions. In my opinion getting into the habit of using reserve + *_back is better and potentially faster in general cases.

The heart of the matter is: need a dynamic array? Consider std::vector. In competitive programming std::vector is 99% of the time the best replacement for a dynamic C-like array (e.g. T* or T**). The other 1% is due to more advanced challenges requiring us to design different kinds of dynamic arrays that break some of std::vector’s guarantees to gain some domain-specific performance. Replacing std::vector with custom optimized containers is more common in real-life code (to get an idea, have a look here, here and here).

If N was given at compile-time, a static array could be better (as long as N is small – say less than one million – otherwise we get a stack overflow). For this purpose, std::array is our friend – basically a richer replacement of T[]. “Richer replacement” means that std::array is the STL-adaptation of a C-array. It provides member functions we generally find in STL containers like .size(), .at(), .begin()/.end(). std::array combines the performance and accessibility of a C-style array with the benefits of a standard container. Just use it.

Since much information is stated in the problem’s requirements, we’ll see that static-sized arrays are extraordinarily useful in competitive programming. In the future I’ll spend some time about this topic.

Now, let’s look at my snippet again: can we do better? Of course we can (from my previous post):
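Possibly something along these lines, pairing copy_n and back_inserter with an istream_iterator:

vector<int> v;
v.reserve(N);
copy_n(istream_iterator<int>(cin), N, back_inserter(v));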

At this point we have the vector filled and we need to compute the sum of the elements. A hand-made for loop could do that:
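A sketch:

int sum = 0;
for (auto value : v)
    sum += value;
cout << sum;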

Can we do better?

Sure, by using the first numeric algorithm of this series: ladies and gentlemen, please welcome std::accumulate:
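cout << accumulate(begin(v), end(v), 0);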

One of the most important loops in programming is one that adds a range of things together. This abstraction is known as reduction or fold. In C++, reduction is mimicked by std::accumulate. Basically, it accumulates elements from left to right by applying a binary operation:
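In spirit, accumulate(first, last, init, op) does something like this:

// pseudocode of a left fold
auto acc = init;
for (; first != last; ++first)
    acc = op(acc, *first);
return acc;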

accumulate with three parameters uses operator+ as binary operation.

std::accumulate guarantees:

  • the order of evaluation is left to right (known also as left fold), and
  • the time complexity is linear in the length of the range, and
  • if the range is empty, the initial value is returned (that’s why we have to provide one).

The reduction function appears in this idiomatic form:
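ResultType f(ResultType acc, ElementType element);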

So the result type may be different from the underlying type of the range (ElementType). For example, given a vector of const char*, here is a simple way to calculate the length of the longest string by using std::accumulate (credits to Davide Di Gennaro for having suggested this example):
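A sketch (strs being a vector<const char*>):

auto longest = accumulate(begin(strs), end(strs), size_t{0},
    [](size_t len, const char* s) { return max(len, strlen(s)); });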

To accumulate from the right (known as right fold) we just use reverse iterators:
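auto sum = accumulate(rbegin(v), rend(v), 0);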

Right fold makes some difference – for example – when a non-associative function (e.g. subtraction) is used.

In functional programming fold is very generic and can be used to implement other operations. In this great article, Paul Keir describes how to get the equivalent results in C++ by accommodating std::accumulate.

Does std::accumulate have any pitfalls? There exist cases where a+=b is better than a = a + b (the latter is what std::accumulate does in the for loop). Although hacks are doable, I think if you fall into such scenarios, a for loop would be the simplest and the most effective option.

Here is another example of using std::accumulate to multiply the elements of a sequence:
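With v a vector of ints:

auto product = accumulate(begin(v), end(v), 1, multiplies<>{});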

std::multiplies<> is a standard function object (find others here).

Using standard function objects makes the usage of algorithms more succinct. For example, the problem of finding the missing number from an array of integers states: given an array of N integers called “baseline” and another array of N-1 integers called “actual”, find the number that exists in “baseline” but not in “actual”. Duplicates may exist. (This problem is a generalization of the “find the missing number” problem, where the first array is actually a range from 0 to N and a clever solution is to apply the famous Gauss formula N(N+1)/2 and subtract the sum of the elements of “actual” from this value.) An example:
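baseline = [1, 2, 3]
actual   = [3, 1]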

The missing number is 2.

A simple linear solution is calculating the sum of both sequences and then subtracting the results. This way we obtain the missing number. This solution may easily result in integer overflow, which is undefined behavior in C++ for signed integers. A wiser solution consists in xor-ing the elements of both arrays and then xor-ing the results.

Xor is a bitwise operation – it does not “need” new bits – and so it never overflows. To realize how this solution works, remember how xor behaves:
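a ^ a = 0
a ^ 0 = a
a ^ b = b ^ a              // commutative
(a ^ b) ^ c = a ^ (b ^ c)  // associative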

Suppose that “a” is the result of xor-ing all the elements but the missing one – basically it’s the result of xor-ing “actual”. Call the missing number “b”. This means that xor-ing “a” with the missing element “b” results in xor-ing together the elements of the “baseline” array. Call that total “c”. We have all the information to find the missing value since “a” ^ “c” is “b”, which is just the missing number. Here is the corresponding succinct C++ code:
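A sketch:

auto a = accumulate(begin(actual), end(actual), 0, bit_xor<>{});
auto c = accumulate(begin(baseline), end(baseline), 0, bit_xor<>{});
auto b = a ^ c;   // the missing number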

Let’s go back to the initial challenge. We can do even better.

To realize how, it’s important to get into the habit of thinking in terms of iterators rather than containers. Since standard algorithms work on ranges (pairs of iterators), we don’t need to store input elements into the vector at all:
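Something like:

cout << accumulate(
    next(istream_iterator<int>(cin)),  // skip N, the first value of the input
    istream_iterator<int>(),
    0);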

Advancing by one – using next – is a licit action since the problem rigorously describes what the input looks like. This snippet solves the challenge in a single line, in O(n) time and O(1) space. That’s pretty good. It’s also our first optimization (actually not required) since our solution dropped to O(1) space – using std::vector was O(n).

That’s an example of what I named “standard reasoning” in the introduction of this series. Thinking in terms of standard things like iterators – objects that decouple algorithms from containers – is convenient and it should become a habit. Although it seems counterintuitive, from our perspective as C++ coders thinking in terms of iterators is not possible without knowing containers. For example we’ll never use std::find on std::map; instead, we’ll call the member function std::map.find(), and the reason is that we know how std::map works. In a future post I’ll show you other examples of this subtle topic.

Our solution leads to ranges naturally:
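With Eric Niebler’s range-v3 as the reference implementation, it might look like this:

cout << ranges::accumulate(ranges::istream<int>(cin) | view::tail, 0);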

view::tail takes all the elements starting from the second (again, I skipped the input length), and ranges::istream is a convenient function which generates a range from an input stream (istream_range). If we had needed to skip more elements at the beginning, we would have used view::drop, which removes the first N elements from the front of a source range.

The iterator-based and range-based versions look very similar, however – as I said in the introduction of this series – iterators are not easy to compose whereas ranges are composable by design. In the future we’ll see examples of solutions that look extremely different because of this fact.

In Competitive Programming these single-pass algorithms are really useful. The STL provides several single-pass algorithms for accomplishing tasks like finding an element in a range, counting occurrences, verifying some conditions, etc. We’ll see other applications in this series.

In the next post we’ll meet another essential container – std::string – and we’ll see other algorithms.

Recap for the pragmatic C++ competitive coder:

  • Don’t reinvent containers whenever standard ones fit your needs:
    • Dynamic array? (e.g. int*) Consider std::vector
    • Static array? (e.g. int[]) Use std::array
  • Prefer standard algorithms to hand-made for loops:
    • often more efficient, more correct and consistent,
    • more maintainable (a “language in the language”)
    • use standard function objects when possible
    • use std::accumulate to combine a range of things together
    • If customizing an algorithm results in complicated code, write a loop instead
  • Think in terms of standard things:
    • iterators separate algorithms from containers
    • understand containers member functions

In the last two years I have turned into a Competitive Programming aficionado, mostly active on HackerRank, less on TopCoder and CodingGame. These websites enable coders not only to compete with others, but also to practice by solving several categorized exercises. After you write your program into the browser, they run and test it against some prepared test-cases. In all probability you can use your favorite language because they support pretty much everything, but I consider spending time with other languages really precious as well.

When I started I just wanted to challenge myself and push my brain beyond its limits. I really appreciated that facing certain kinds of problems was a good excuse to explore new areas of CS and learn something new. Even if at that time it was not my target, competitive programming is a great way to prepare for technical job interviews.

Speaking in general, there are several reasons to get involved in Competitive Programming. I found a very nice article about that here. I quote the point that was the most important for me when I started: it’s guaranteed brain exercise: “On any given day, what you’re asked to do at work may or may not be challenging, or you may be exercising one of the many non-technical skills required of software developers. By practicing regularly with competitive programming problems, you can ensure that the coding part of your brain receives a regular workout. It’s like one of those brain training apps, but for real skills.”

Regardless of your programming language, spend some time on competitive coding.

Why C++ in Competitive Programming?

 

If you solve a challenge on websites like HackerRank, you may peek into other people’s solutions. Looking at C++ solutions, I have found a lot of “C++ people” using C++ mostly as C. Many people don’t consider (don’t know?) what the standard offers. On the other hand, I find it natural to face challenges with modern C++ by my side, coding as simply as possible. For example, I often ask myself: may I use an algorithm instead of this for loop? This attitude is worthwhile here as it has been for years in my daily job.

I realize the word modern is overloaded: in the last few years we have all probably heard the expression modern C++ everywhere and sometimes it looks like a buzzword. I mean using the full standard toolset of C++, without external libraries, nor overcomplicated abstractions. Mainly because in competitive programming you don’t have libraries (sometimes you don’t have the standard library either) and you cannot lose time with too much generalization. Many times the same results fit real-life development.

It’s worth noting that I’m not going to explain things like “how to make a linked list”. Excellent algorithms & data structures courses/books/series have been developed for that purpose. Rather, expect to see how I used a forward list to solve a problem, where I found a multiset necessary, or when lower_bound saved my life.

I have told some friends about this series and they suspected that C++ was not the right language for talking about Competitive Programming. Actually I had a look at the top-ranked people on websites like TopCoder and HackerRank and C++ was always there (together with Java and Python, mainly). I found another indicative example on Google Code Jam: more of the higher-performing contestants code in C++.

I’m not surprised at all.

Certainly C and C++ are considered the lingua franca of algorithms and data structures, but I think the main reason is the control C++ offers and its independence from any single paradigm: it does not force one “way of doing” on you when solving a challenge.

My friend Alessandro Vergani has another sensible idea: traditionally C and C++ have a poor standard library compared to newer languages like Java and C#, and for this reason C/C++ programmers are at ease with writing everything they need from scratch – and often too much. This means they are fluent with coding things that in other languages are taken for granted.

Moreover, I think since lots of participants are students, C++ is very common as a “first language”.

 

Challenges are about trade-off

 

My purpose is solving challenges without reinventing the wheel. This means if I can use standard things to make all the test cases pass, I’ll do it. Obviously, I’ll justify my decisions and I’ll discuss a bit. For example, what about this snippet for pretty printing a non-zero unsigned 32-bit integer N into its binary representation:
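A sketch of what I mean:

// prints the binary representation of a non-zero unsigned N
string binary;
for (; N > 0; N >>= 1)
    binary += char('0' + (N & 1));
reverse(begin(binary), end(binary));
cout << binary;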

It’s simple, but it’s not as efficient as it could be if written in other ways.

Does it suffice to solve the problem?

Maybe.

If so, I would be happy with this simple solution. If not, the challenge is asking me to find better ways to solve the problem.

Challenges are about trade-offs. The requirements and the constraints of the problem matter a lot. They are a rough approximation of a real problem. Often you can exploit the constraints of the problem and take a shortcut. This may mean optimizations, simplifications, etc. Very often you have to exploit the constraints of the problem to solve it.

Many times these shortcuts are not so far from real life needs.

 

Standard reasoning

 

How many times have you heard commands like “use standard algorithms” or “use a container”? With luck, we all know the classical reasons for doing that: efficiency, correctness, maintainability. Now there is another fact we could care about: the future of our language.

Let me articulate.

Probably the biggest limitation of the STL is the lack of composability. One of the most mind-changing additions to C++17 will be Eric Niebler’s ranges. Ranges are basically a shift from iterators to a superior abstraction. Ranges enable programmers to write fluent and – hopefully – efficient algorithms.

Coding with ranges produces “pipelined” transformations. For example, to generate an infinite list of integers starting at 1, consider only even values, square them, take the first 10, and sum them:
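With range-v3 syntax, it might look like this:

auto total = ranges::accumulate(
    view::ints(1)                                        // 1, 2, 3, ...
    | view::filter([](int i) { return i % 2 == 0; })     // keep even values
    | view::transform([](int i) { return i * i; })       // square them
    | view::take(10), 0);                                // sum the first 10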

It’s not my intention to talk about ranges in depth and I am sure you know them better than me (if not, give them a try and refer to the documentation). My point is: Eric’s proposal includes standard combinators like transform and take, and mutating ones like sort and unique. They are essentially the counterparts of those in the STL. Thinking in terms of standard blocks helps embrace ranges effectively. If you are used to employing standard functions, you are already confident with several range views and algorithms.

When I can, I’ll try to show my solutions by using ranges as well. Clearly ranges are not part of the standard yet, so these solutions are not intended to run on competitive programming sites. I’ll use Clang on Visual Studio 2015 for my attempts.

 

Standard flexibility

 

Many people don’t employ standard algorithms massively. For some of them the reason is that algorithms are often closer to math than to programming – even if programming is math…

Consider the simple task of finding the minimum difference in a sequence of ascending sorted numbers. For example, in this sequence:

[10, 20, 30, 100, 200, 300, 1000]

10 is the minimum absolute difference (e.g. 20-10).

This is an explicit solution:
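A sketch (elems being the sorted vector<int> of the example):

int minDiff = numeric_limits<int>::max();
for (size_t i = 0; i + 1 < elems.size(); ++i)
    minDiff = min(minDiff, elems[i + 1] - elems[i]);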

That solves the problem in a single pass.

The same problem written in Python:

minDiff = min([r - l for l, r in zip(elems, elems[1:])])

That is completely different because it employs a functional block that C++ misses: zip. Basically it combines elements pair-wise from N sequences (two in this case).

Look more closely at the C++ solution.

It’s basically an accumulation, right? In each iteration we calculate the difference between two adjacent elements and then we update the global minimum. But it’s not a std::accumulate because that function cannot operate on two sequences at the same time – at least without special iterators.

Any idea?

I am sure you found it. Your standard friend is inner_product:
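A sketch:

auto minDiff = inner_product(
    begin(elems), prev(end(elems)),   // [begin, end-1]
    next(begin(elems)),               // [begin+1, end]
    numeric_limits<int>::max(),       // the identity of min
    [](int m, int diff) { return min(m, diff); },   // replaces sum
    [](int l, int r) { return abs(r - l); });       // replaces product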

We’ll turn back to inner_product later in this series because it’s powerful, but just to recap: the simplest overload of inner_product returns the result of accumulating a starting value with the inner products of the pairs formed by the elements of two ranges. That is:

init += range1[0]*range2[0] + range1[1]*range2[1] + ... + range1[N-1]*range2[N-1]

There is more: both sum (+) and multiplication (*) can be customized. In my example, I replaced accumulation with min and product with absolute difference. I also set the initial value to the maximum possible int value – in math I would set it to infinity, the identity of the minimum operation (e.g. min(infinity, value) is value). Note also that by passing [begin, end-1] and [begin+1, end] I basically obtained these “pairs”:

range[0], range[1]
range[1], range[2]
...
range[N-2], range[N-1]

inner_product has a name closer to math than to CS, but it is actually a general function that hides rich flexibility. When I say flexibility I am not saying composability. Flexibility means that the same function can be adapted to different scenarios. Standard functions generally exhibit great flexibility.

But flexibility can lead to misuses…

Suppose after we calculate the minimum difference we want to output the pairs of elements with that difference. Recalling the first example:

[10, 20, 30, 100, 200, 300, 1000]

The minimum difference is 10 and the pairs of elements differing by this value are:

10, 20
20, 30

So basically we need to iterate over adjacent elements and print them if their difference is 10:

[20-10, 30-20, 100-30, ...]

Is there any standard algorithm for this purpose?

std::transform is really close, but it does not really do what we need, mainly for two reasons: first, it does not guarantee in-order application of the custom function – or in-order iteration – and second, the transformation is N to N, that is, each element in the source gets mapped into another thing.

Ideally we need something like a zip and a copy_if, or a copy_if and a special iterator type wrapping two adjacent elements at the same time – things that exist, for example, in boost. For this reason, I think a for loop is the best choice in “bare” C++:
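(A sketch; printMinDiffPairs is a name of mine, and minDiff is assumed to hold the value computed before.)

#include <cstddef>
#include <iostream>
#include <vector>

void printMinDiffPairs(const std::vector<int>& elems, int minDiff)
{
    for (std::size_t i = 0; i + 1 < elems.size(); ++i)
    {
        if (elems[i + 1] - elems[i] == minDiff)
            std::cout << elems[i] << ", " << elems[i + 1] << "\n";
    }
}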

Looking at this code, some C++ hackers may force std::equal to do the same thing, getting some sugar:
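(Reconstructed as a sketch – don’t do this in real code: the “predicate” prints as a side effect and always returns true, so the pair-wise scan never stops.)

#include <algorithm>
#include <iostream>
#include <vector>

void printMinDiffPairs(const std::vector<int>& elems, int minDiff)
{
    std::equal(std::begin(elems), std::end(elems) - 1, std::begin(elems) + 1,
               [minDiff](int l, int r) {
                   if (r - l == minDiff)
                       std::cout << l << ", " << r << "\n";
                   return true; // always "equal", so the whole sequence is visited
               });
}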

The sugar is given by not accessing the vector and not writing the for loop explicitly.

The problem is that the snippet is unclear and hackish: std::equal has not been designed to do such things. What a programmer expects is just a pair-wise comparison, possibly using a custom predicate. You know, don’t fall into the common “if all you have is a hammer everything looks like a nail”. Just say no. The more you practice with standard algorithms, the more readily they will pop up when they actually match. Pretty much automatically.

Flexibility comes with responsibility. Standard algorithms constitute well-known patterns. They help understand the code because other people have clear expectations from them. Hacking standard algorithms can lead to disasters.

If you miss a block, at least two approaches are viable: if you don’t need to reuse it, go with a local solution. If reuse is important (and generally it is), then write a function designed to be reused. This requires some care, and keeping in mind how standard C++ algorithms are designed can be a good starting point.

Since “some care” requires time, in competitive programming this is not always feasible. For this reason people tend to go with custom solutions, even for summing the elements of an array. In this series I have more time and there is nothing to prevent me from elaborating and showing some solutions based on simple functions that C++ misses. Pragmatic competitive programmers generally keep such functions and snippets on hand. Sometimes the standard already provides them, and it’s just a matter of practice.

Although standard algorithms are really basic, they support rich flexibility, as I said. This means you can roll your own special iterator and solve the problem in a single pass – imagine also that ostream supports operator<< for pairs:
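(The gist is missing here too; the sketch below gives the idea. adj_it and elem_pair are hypothetical names of mine, with a small pair struct standing in for “imagine ostream supports operator<< for pairs”.)

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

struct elem_pair { int first, second; };

std::ostream& operator<<(std::ostream& os, const elem_pair& p)
{
    return os << p.first << ", " << p.second;
}

// input iterator over the pairs of adjacent elements of a vector<int>
class adj_it
{
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = elem_pair;
    using difference_type = std::ptrdiff_t;
    using pointer = const elem_pair*;
    using reference = elem_pair;

    explicit adj_it(std::vector<int>::const_iterator it) : m_it(it) {}
    elem_pair operator*() const { return { *m_it, *(m_it + 1) }; }
    adj_it& operator++() { ++m_it; return *this; }
    adj_it operator++(int) { adj_it tmp = *this; ++m_it; return tmp; }
    bool operator==(const adj_it& other) const { return m_it == other.m_it; }
    bool operator!=(const adj_it& other) const { return m_it != other.m_it; }
private:
    std::vector<int>::const_iterator m_it;
};

int main()
{
    std::vector<int> elems { 10, 20, 30, 100, 200, 300, 1000 };
    const int minDiff = 10;
    // single pass over the adjacent pairs, copying the matching ones to cout
    std::copy_if(adj_it(elems.begin()), adj_it(elems.end() - 1),
                 std::ostream_iterator<elem_pair>(std::cout, "\n"),
                 [minDiff](const elem_pair& p) { return p.second - p.first == minDiff; });
}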

Is it worth it? There is not a single answer to this question. It’s really contextual: it depends on where you work, whether your codebase is complicated enough to need these abstractions, etc. Whenever you introduce a new block (or use one from a library – e.g. adj_it), it should be learnt by you and – hopefully – by whoever works on your codebase.

Ranges should help a lot in these scenarios because, in addition to rich flexibility, they have strong composability. Let me turn the “partial C++17 support” on and show you my range-based solution to the initial problem:
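(A sketch on range-v3 again; views::zip_with and views::drop are range-v3 names and may be spelled differently depending on the version.)

#include <range/v3/all.hpp>
#include <vector>

int minDiff(const std::vector<int>& elems)
{
    using namespace ranges;
    // pair each element with its successor, map each pair to a difference,
    // then take the minimum
    return min(views::zip_with([](int l, int r) { return r - l; },
                               elems, elems | views::drop(1)));
}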

It seems completely different from the inner_product one, but what really changes is the way of thinking about the algorithm. This solution is probably closer to Python than to C++, isn’t it? That’s because more blocks become available and composition plays an important role, bringing our way of thinking closer to other paradigms. Again, this is an act of responsibility: choosing between – say – a for loop and a range-based solution can be disputable.

Are the inner_product and the ranges-based solutions hard to understand? Definitely they are for programmers who don’t know standard idioms. It’s the same for map.insert, which does not insert at all if the element is already there. Is std::map misleading? I know people who spend more time dumping on standard facilities than it would take them to learn those facilities properly. Mastering language patterns is the key, and in the next posts I’ll show some patterns involving standard algorithms and containers.

In this series, first I’ll consider pre-C++17 solutions and whenever I can I’ll outline possible C++17 answers.

An invitation about ranges: please post comments with your suggestions/corrections/opinions on my solutions, because I think use cases are important both for Eric and for the whole C++ ecosystem.

What’s next

I don’t have a precise idea for the other posts yet. It’s possible I’ll discuss the usage of a certain standard algorithm in different scenarios, or how to deal with some kind of problems, or how to use a container to solve certain challenges, etc.

Definitely, since this series is about competitive programming, I’ll spend some posts on peculiar aspects of challenges, like paying attention to the constraints of the problem or choosing the proper data type for the solution.

The next post will be kind of preliminary to the others. It’s basically about input and output.

Final note about code snippets

You probably noticed I used gists to render code snippets in this post. For the first time I needed to completely bypass WP: yesterday this post was broken because of the snippets, for no apparent reason. I tried to fix them but after one hour of attempts I gave up. WP is the enemy of programming bloggers. I know, I should move to Jekyll or something better, but WP still has some advantages. For this reason I’m experimenting with this “embedding Gist” feature. The procedure is a bit longer (create gist, paste code, give a sensible name, save, copy the URL into the post) but I think I would spend the same time fixing WP troubles, so it’s ok for now.

A couple of weeks ago I found a simple bug in the dusty corners of a production repository I usually work on. A certain feature was rarely used and it seemed to be covered by a test… Actually it was not, and some years ago a guy, feeling guarded by this test, changed the code and broke the feature, but nobody complained. Recently a user tried to resume this old feature and it didn’t work as expected.

I don’t want to bother you with all the details of the story, but to give you a bit of context: there was an old piece of code reading a small file through C FILE handles. Some years ago that piece of code was migrated to C++ streams (since the files to read were really small and simple) and a silly bug was introduced. Since this bug was really simple, I wondered whether it was caused by inattention or by ignorance, so I had a chat with the programmer who committed the change and discovered it was the latter. The fix was really easy.

Some days later I discussed this problem with some friends and realized they were unaware of it too. So I decided to write a short post about this story; maybe it is useful to other coders.

Imagine you are using streams to read some data from the standard input. This is what the input looks like:

number
some words in a line
number
some words in a line
...

And then imagine the following code reading that input:

int num; string line;
while ( (cin >> num) && getline(cin, line) )
; // something

Did you spot any problems?

If not, don’t worry, it’s a bit subtle.

Consider the invisible characters contained in the input stream:

10'\n'
some words'\n'
...'\n'

Actually this is not formally true on Windows, but in general you have a LF char at the end of each line. Let’s follow the code flow:

  • cin >> num correctly reads the int, stopping at (in the language of streams: “detecting but not consuming”) ‘\n’
  • getline(cin, line) now reads the next line until it encounters a line separator (‘\n’ by default). But ‘\n’ is still in the stream buffer, so getline returns immediately, storing nothing in line.
  • Again cin >> num is evaluated, but this time it fails because the stream is not fed with an int. failbit is then set and the loop terminates.
  • The user complains because the feature does not work as he expects. Ok, sorry this is not part of the code flow…

We just experienced a difference between operator>> and getline: the first skips any leading whitespace (actually any separator – according to the locale in use) before performing the read operation, whereas the second does not.

Basically, it has to do with the difference between formatted and unformatted input functions. Stream operators (like operator>> for strings) belong to the former category, getline to the latter. In short – among other differences – formatted input functions skip leading separators (e.g. whitespaces, LF) by default, unformatted functions do not.

The long story is: both formatted and unformatted functions create basic_istream<CharT>::sentry objects to prepare the input stream for I/O (e.g. checking the validity of the stream). One of the operations performed in the sentry constructor is skipping leading whitespace. To decide whether to skip or not, it uses two pieces of information:

  • a bool parameter passed to the constructor, which is false by default. Don’t get confused: it’s false when you want the sentry object to skip whitespace (in fact, it’s generally called noskipws – e.g. _Noskip on Visual Studio).
  • ios_base::skipws flag (set or not on the stream object).

If _Noskip is false and ios_base::skipws is true, then leading whitespace will be skipped.

I am sure you already imagine the rest of the story: when a formatted function creates a sentry, the parameter is left to its default value (false), and since cin’s ios_base::skipws is true, operations like cin >> i work as expected even if some whitespace stands in front of the int value. Conversely, unformatted functions create sentry instances by explicitly passing true to the constructor. For this reason the lone leading ‘\n’ is not discarded.
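To make the mechanics concrete, here is a rough sketch of the two shapes – not the real library code:

#include <istream>
#include <string>

// shape of a formatted read: the sentry skips leading whitespace by default
template <typename T>
std::istream& formatted_read(std::istream& is, T& value)
{
    std::istream::sentry s(is); // noskipws defaults to false -> whitespace is skipped
    if (s)
    {
        // ...extract value...
    }
    return is;
}

// shape of an unformatted read: the sentry is told not to skip anything
std::istream& unformatted_getline(std::istream& is, std::string& line)
{
    std::istream::sentry s(is, true); // true = noskipws -> nothing is skipped
    if (s)
    {
        // ...read characters until the delimiter...
    }
    return is;
}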

[note]

Beware: something about formatted/unformatted functions changed between C++98 and C++11, in particular about istream& operator>>(streambuf*). In C++98 it was a formatted operation; now it is unformatted.

[/note]

Why does getline preserve leading separators? I think because it’s a kind of raw read operation. Note that if the delimiter is found, it is extracted and discarded (i.e. it is not stored, and the next input operation will begin after it). This is important, because it enables code like this to work as expected:

stringstream ss("the first line\nthe second line)"
while (getline(ss, line)) 
{ ... // line does not contain '\n' at the end

How did we fix this issue?

The simplest thing you can do is just:

while ( (cin >> num >> std::ws) && getline(cin, line) )
;

The left hand side reads the int and then skips leading separators from the stream. std::ws is an input manipulator created for this purpose.

A bunch of other solutions are possible. For example the one using ignore:

while ( (cin >> num).ignore(numeric_limits<streamsize>::max(), '\n') && std::getline(cin, line))

Here we discard everything left on the line, that is, until either count characters are discarded, the delimiter (specified by the second parameter) is found, or the end of the stream is reached.

Not only is the former solution simple and effective enough, but it also prevents oversights like:

10'\n'
'\n'
some words

Above, the user left an empty line after the first number. The std::ws solution does not break, whereas the ignore one does. On the other hand, the std::ws solution does not preserve leading whitespace, if you need it. But that was not our case (you can imagine the final code looked a bit more defensive than the one I showed here, but the main observations hold).

One can also develop a proxy object to allow code like this:

cin >> num >> std::ws >> as_line >> line;

as_line may also embody the std::ws part:

cin >> num >> as_line >> line;

It’s not hard to code such machinery. For example:

struct lines_proxy
{
	istream& operator()(string& s)
	{
		return getline(is >> std::ws, s);
	}

	istream& is;
};

struct line_t {} as_line;

lines_proxy operator>>(istream& is, line_t)
{
	return{ is };
}

istream& operator>>(lines_proxy p, string& s)
{
	return p(s);
}

...

while (cin >> num >> as_line >> line)

Just for fun.

It’s over for today. The leading actor of our story was a really silly problem but the lesson learned was interesting: even if streams were designed as a “formatted” abstraction on top of I/O buffers, unformatted operations are still there. Mixing formatted and unformatted operations should be done with care.

Some general-purpose algorithms accept a range (e.g. two iterators) and an additional value. For example std::iota fills the specified range with sequentially increasing values, starting with a certain value V and repetitively evaluating ++V:


vector<int> aVector (5);
iota ( begin(aVector), end(aVector), 1);
// aVector is {1, 2, 3, 4, 5}

This algorithm is really simple, but how is it possible to operate on the initial value in a quite generic way? Suppose we want to reuse iota to fill the range with increasing squared values, e.g.: {1, 4, 9, 16, 25}.

Since iota accepts nothing but a value in the range, we cannot pass, say, a lambda. We have to feed it with a special value or something like that. A mimesis object (explained in Advanced C++ Metaprogramming, by Davide Di Gennaro) is something we may take inspiration from. Actually, according to Davide’s definition, a mimesis for a type T is a unary predicate that behaves like an instance of T. It generally implements operator==(const T&), operator!=(const T&) and operator “cast to T”. In other words if M is a mimesis for type T, then the fundamental property for M is:


M<T> m;
assert(m == static_cast<T>(m));

A simple example is a mimesis that identifies positive numbers:


template <typename scalar_t>
struct positive
{
 bool operator==(const scalar_t& x) const
 {
    return x>0;
 }

 operator scalar_t() const
 {
    return 1; // an arbitrary positive number
 }
};

template <typename scalar_t>
inline bool operator==(const scalar_t& x, const positive<scalar_t>& p)
{
    return p == x;
}

...
double arr[] = {-1.0, 2.5, 5.0};
auto it = std::find(arr, arr + 3, positive<double>()); // 2.5

With mimesis, we could unify algorithms (e.g. in the previous example we don’t need find_if anymore). You’re right, a mimesis takes more effort than a lambda, but it’s worthwhile, especially for generalizing functions that take a special value as an argument (I suggest you read, for example, the “End of range” section in Davide’s book).

Ok, this was a very short intro. Back to our problem: How to use mimesis to generalize iota? Let’s have a look at a possible implementation of iota:

template<class ForwardIterator, class T>
void iota(ForwardIterator first, ForwardIterator last, T value)
{
    while(first != last) {
        *first++ = value;
        ++value; // pre-increment operator
    }
}

Suppose value is a mimesis that initially stores a certain int (say 1). The mimesis has to be able to perform two operations:

  • pre-increment (for the ++value line)
  • cast to int (for the *first++ = value line)

My idea is to create a generic mimesis “holder” for iota that receives a function performing the user’s operation, whose result will be cast to T (but you can extend it, for example, by also providing a pre-increment function):


template<typename T, typename MainFn>
struct iota_mimesis_t : MainFn
{
 template<typename TT, typename FF>
 iota_mimesis_t(TT&& aValue, FF&& aMainFn)
    : MainFn(forward<FF>(aMainFn)), value(forward<TT>(aValue))
 {}

iota_mimesis_t& operator++()
 {
    ++value; // can be generalized
    return *this;
 }

 operator T()
 {
    return MainFn::operator()(value);
 }

 T value;
};

template<typename T, typename F>
iota_mimesis_t<T,F> iota_mimesis(T&& value, F&& fn)
{
 return iota_mimesis_t<T,F>(forward<T>(value), forward<F>(fn));
}

...

vector<int> v (5);
iota ( begin(v), end(v), iota_mimesis(1, [](int anInt){ return anInt*anInt; }));
// v is {1, 4, 9, 16, 25}

The key point here is to grasp the internals of iota (which are easy). Different algorithms could be more complicated, but it’s often worth striving to generalize some stuff. Mimesis can be a precious ally for dealing with “value-only” algorithms (think of legacy code too), and mixing them with modern lambdas can generalize even further.

This succinct post is just an introduction. If you spot “lambda-unable” algorithms, try to make them “lambda-able” with mimesis and be generic!

I think one of the most attractive features of C++11 is lambdas. They simplify and encourage the usage of STL algorithms more than before, and they (may) increase programmers’ productivity. Lambdas combine the benefits of function pointers and function objects: like function objects, lambdas are flexible and can maintain state, but unlike function objects, their compact syntax doesn’t require a class definition.

The syntax is simple:

auto aLambda = [ ] ( const string& name ) { cout << "hello " << name << endl; };
aLambda("Marco"); // prints "hello Marco"

The code above defines a lambda with no return value, receiving a const string& parameter. What about the “[ ]”? That is the capture specification and it tells the compiler we’re creating a lambda expression. As you know, inside the square brackets you can “capture” variables from the outside (the scope where the lambda is created). C++ provides two ways of capturing: by copy and by reference. For example:

string name = "Marco";
auto anotherLambda = [name] { cout << "hello " << name << endl; };
anotherLambda(); // prints "hello Marco"

This way the string is copied into the state of anotherLambda, so it stays alive until anotherLambda goes out of scope (note: you can omit the parentheses if the lambda has no parameters). Differently:

string name = "Marco";
auto anotherLambda = [&name] () { cout << "hello " << name << endl; };
anotherLambda(); // prints "hello Marco"

The only difference is the way we capture the string name: this time we do it by reference, so no copy is involved and the behavior is like passing a variable by reference. Obviously, if name is destroyed before the lambda is executed (or just before name is used): boom!

After this introduction, in this post I’m going to discuss an issue with capturing I encountered a few days ago at work: what if I want to capture by moving an object, instead of either copying or referencing it? Consider this plausible scenario:

function<void()> CreateLambda()
{
   vector<HugeObject> hugeObj;
   // ...preparation of hugeObj...

   auto toReturn = [hugeObj] { ...operate on hugeObj... };
   return toReturn;
}

This fragment of code prepares a vector of HugeObject (e.g. expensive to copy) and returns a lambda which uses this vector (the vector is captured by copy because it goes out of scope when the lambda is returned). Can we do better?

Yes, of course we can!” – I heard. “We can use a shared_ptr to reference-count the vector and avoid copying it“:

function<void()> CreateLambda()
{
   shared_ptr<vector<HugeObject>> hugeObj(new vector<HugeObject>());
   // ...preparation of hugeObj...

   auto toReturn = [hugeObj] { ...operate on hugeObj via shared_ptr... };
   return toReturn;
}

I honestly don’t like the use of shared_ptr here, but it should work well. The subtle (possible) issue with this attempt is about style and clarity: why is the ownership shared? Why can’t I treat hugeObj as a temporary to move “inside” the lambda? I think that using a sharing mechanism here is like a hack to fill a gap in the language. I don’t want the lambda to share hugeObj with the outside; I’d like to “prevent” this:

function<void()> CreateLambda()
{
   shared_ptr<vector<HugeObject>> hugeObj(new vector<HugeObject>());
   // ...preparation of hugeObj...

   auto toReturn = [hugeObj] { ...operate on hugeObj via shared_ptr... };
   (*hugeObj)[0] = HugeObject(...); // can alter lambda's behavior
   return toReturn;
}

I need a sort of “capture-by-move”, so:

  1. I create the vector
  2. I “inject” it in the lambda (treating the vector like a temporary)
  3. (outside) the vector will be in a “valid but unspecified state” (what standard says about moved objects)
  4. nothing from the outside can alter the lambda’s vector

Since we are on the subject, getting rid of the shared_ptr syntax (I repeat: here) would be nice!

To emulate a move capturing we can employ a wrapper that:

  • receives an rvalue reference to our to-move object
  • maintains this object
  • when copied, performs a move operation of the internal object

Here is a possible implementation:

#ifndef _MOVE_ON_COPY_
#define _MOVE_ON_COPY_

template<typename T>
struct move_on_copy
{
   move_on_copy(T&& aValue) : value(move(aValue)) {}
   move_on_copy(const move_on_copy& other) : value(move(other.value)) {}

   mutable T value;

private:

   move_on_copy& operator=(move_on_copy&& aValue) = delete; // not needed here
   move_on_copy& operator=(const move_on_copy& aValue) = delete; // not needed here

};

template<typename T>
move_on_copy<T> make_move_on_copy(T&& aValue)
{
   return move_on_copy<T>(move(aValue));
}
#endif // _MOVE_ON_COPY_

In this first version we use the wrapper this way:

vector<HugeObject> hugeObj;
// ...
auto moved = make_move_on_copy(move(hugeObj));
auto toExec = [moved] { ...operate on moved.value... };
// hugeObj here is in a "valid but unspecified state"

The move_on_copy wrapper works, but it is not complete yet. To refine it, a couple of comments are needed. The first is about “usability”: the only aim of this class is to “replace” capture-by-copy with capture-by-move, nothing else. Now, capture-by-move makes sense only when we operate on rvalues and movable objects, so is the following code conceptually correct?

// due to universal referencing, T is deduced as a const lvalue reference, so no copy/move will be involved in move_on_copy's ctor
const vector<HugeObject> hugeObj;
auto moved = make_move_on_copy(hugeObj);
auto toExec = [moved] { ...operate on moved.value... };
// hugeObj here is the same as before

Not only is it useless, it is also confusing. So, let’s require our users to pass only rvalues:

template<typename T>
auto make_move_on_copy(T&& aValue)
     -> typename enable_if<is_rvalue_reference<decltype(aValue)>::value, move_on_copy<T>>::type
{
   return move_on_copy<T>(move(aValue));
}

We “enable” this function only if aValue is an rvalue reference; to do this we make use of a couple of type traits. Strangely this code does not compile on Visual Studio 2010, so, if you use that compiler, settle for:

template<typename T>
move_on_copy<T> make_move_on_copy(T&& aValue)
{
   static_assert(is_rvalue_reference<decltype(aValue)>::value, "parameter should be an rvalue");
   return move_on_copy<T>(move(aValue));
}

You can also enforce the requirement about move-constructibility by using other traits such as is_move_constructible; here I have not done it.

The second note is about compliance. Is the following code syntactically clear?

vector<HugeObject> hugeObj;
auto moved = make_move_on_copy(move(hugeObj));
auto toExec = [moved]
  {
     moved.value[0] = HugeObject(...); // does this conform to standard lambda syntax?
  };

What aroused my suspicions was the syntax of lambda expressions: if you copy-capture an object, the only way to access its non-const members (aka: make changes) is to declare the lambda mutable. This is because a function object should produce the same result every time it is called. If we want to support this requirement then we have to make a little change:

template<typename T>
struct move_on_copy
{
   move_on_copy(T&& aValue) : value(move(aValue)) {}
   move_on_copy(const move_on_copy& other) : value(move(other.value)) {}

   T& Value()
   {
      return value;
   }

   const T& Value() const
   {
      return value;
   }

private:
   mutable T value;
   move_on_copy& operator=(move_on_copy&& aValue) = delete; // not needed here
   move_on_copy& operator=(const move_on_copy& aValue) = delete; // not needed here
};

And:

vector<HugeObject> hugeObj;
auto moved = make_move_on_copy(move(hugeObj));
// auto toExec = [moved]  { moved.Value()[0] = HugeObject(...); }; // ERROR
auto toExec = [moved] () mutable { moved.Value()[0] = HugeObject(...); }; // OK
auto toPrint = [moved]  { cout << moved.Value()[0] << endl; }; // OK (read-only access)

If you want to play a bit, I posted a trivial example on ideone.

Personally, I’m doubtful this is the best way to capture expensive-to-copy objects. What I mean is that working with rvalues masked by lvalues can be a little harder to understand, and then maintaining the code can be painful. If the language supported a syntax like:

HugeObject obj;
auto lambda = [move(obj)] { ... };
// obj was moved, it is clear without need to look at its type

It would be simpler to understand that obj will be in an unspecified state after the lambda creation statement. Conversely, the move_on_copy wrapper requires the programmer to look at obj’s type (or name) to realize it was moved and some magic happened:

HugeObject obj;
auto moved_obj = make_move_on_copy(move(obj)); // this name helps but it is not enough
auto lambda = [moved_obj] { ... };
// obj was moved, but you have to read at least two lines to realize it

[Edit]

As Dan Haffey pointed out (thanks for this) “the move_on_copy wrapper introduces another problem: The resulting lambda objects have the same weakness as auto_ptr. They’re still copyable, even though the copy has move semantics. So you’re in for trouble if you subsequently pass the lambda by value”. So, as I said just before, you have to be aware some magic happens under the hood. In my specific case, the move_on_copy_wrapper works well because I don’t copy the resulting lambda object.

Another important issue is: what semantics do we expect when a function object that performed a capture-by-move gets copied? If you used capture-by-move, presumably you didn’t want to pay for a copy, so why copy the function? The copy should be forbidden by design, so you can employ the approach suggested by jrb, but I think the best solution would be having the support of the language. Maybe in a next standard?

Since other approaches have been proposed, I’d like to share with you my final note about this topic. I propose a sort of recipe/idiom that is (I think) easy to use in existing codebases. My idea is to use my move_on_copy only with a new function wrapper that I called mfunction (movable function). Differently from other posts I read, I suggest avoiding rewriting a totally new type (which may break your codebase) and instead inheriting from std::function. From an OO standpoint this is not perfectly consistent, because I’m going to violate (a bit) the is-a principle. In fact, my new type will be non-copyable (differently from std::function).

Anyhow, my implementation is quite simple:

template<typename T>
struct mfunction;

template<class ReturnType, typename... ParamType>
struct mfunction<ReturnType(ParamType...)> : public std::function<ReturnType(ParamType...)>
{
  typedef std::function<ReturnType(ParamType...)> FnType;

  mfunction() : FnType()
  {}

  template<typename T>
  explicit mfunction(T&& fn) : FnType(std::forward<T>(fn))
  {}

  mfunction(mfunction&& other) : FnType(move(static_cast<FnType&&>(other)))
  {}

  mfunction& operator=(mfunction&& other)
  {
    FnType::operator=(move(static_cast<FnType&&>(other)));
    return *this;
  }

  mfunction(const mfunction&) = delete;
  mfunction& operator=(const mfunction&) = delete;
};

In my opinion, inheriting from std::function avoids reinventing the wheel and allows you to use mfunction where a std::function& is needed. My code is on ideone, as usual.

[/Edit]

Make the choice you like the most, without forgetting readability and comprehensibility. Sometimes a shared_ptr suffices, even if it’s perhaps not the most fitting tool.

Lambdas are cool and easy to start working with. They are very useful in plenty of contexts, from STL algorithms to concurrency stuff. When capturing by reference is not possible and capturing by copy is not feasible, you can consider this “capture by move” idea – and remember that the language always offers a chance to bypass its shortcomings.

Sometimes we want some code to be executed at (or just before) the end of a function, even if an exception is thrown. Suppose we deal with a C API that requires an initialization call and a termination call, something like this:


void use_C_api()
{
   Init_Lib();
   code_that_could_throw();
   Terminate_Lib();
}

The first thing a C++ programmer could think about is to encapsulate the library usage in a wrapper and use RAII (Resource Acquisition is Initialization). For example:

class LibWrapper
{
public:
 LibWrapper()
 {
    Init_Lib();
 }

 ~LibWrapper()
 {
    Terminate_Lib();
 }

 void Call_code_that_could_throw()
 {
    code_that_could_throw();
 }
};

...

void use_C_api()
{
   LibWrapper wrapper; // Init here
   wrapper.Call_code_that_could_throw();
} // Terminate here (also if an exception occurs)

In C++ the only code guaranteed to be executed after an exception is thrown is the destructors of objects residing on the stack, so this is our case.

Ok cool, but what if we just want to execute some code at the end of a function? Do we have to write classes? What if we need something like Java’s finally or Go’s defer? Sometimes this kind of construct suffices.

RAII is perfect for our purpose, but we need to create a sort of “finalizer” on the fly, passing it some code. Thanks to C++11 we can use lambdas to be generic enough. So, this is my idea:

template <typename F>
struct finalizer
{
   template<typename T>
   finalizer(T&& f) : m_f(std::forward<T>(f)) {}

   ~finalizer()
   {
      m_f();
   }
private:
   F m_f;
};

template <typename F>
finalizer<F> defer(F&& f)
{
   return finalizer<F>(std::forward<F>(f));
}

...

auto atEnd = defer( []{cout << "I'm printed out at the end of the scope!" << endl;} );

The code is trivial: the finalizer uses a template to accept any kind of “callable object” (lambdas, functions, functors, etc). I created a simple factory function to avoid spelling out the template parameter explicitly (the compiler will do the work for us).

But wait… I spot at least one possible problem in this mechanism, do you?

We’re relying on our compiler, aren’t we? What happens if our compiler doesn’t use RVO (Return Value Optimization)? We’ll be in a proper jam! We get a copy and then two destructions, which leads to a couple of calls to our at-end code!

Is it clear? If not, try to change the factory this way and you’ll get the same effect:


template <typename F>
finalizer<F> defer_ops(F&& f)
{
   auto f1 = finalizer<F>(std::forward<F>(f));
   return f1;
}

Now our code is called twice (first when the copied object is destroyed, then at the end of the calling code’s scope). A simple way to “fix” this problem is to add a move constructor and a bool flag:

template <typename F>
struct finalizer
{
template<typename T>
finalizer(T&& f) : m_f(std::forward<T>(f)), m_isMoved(false) {}
finalizer(finalizer&& other) : m_f(std::move(other.m_f)), m_isMoved(other.m_isMoved)
{
   other.m_isMoved = true; // sort of "resource stealing" here
}
~finalizer()
{
  if (!m_isMoved)
  {
    m_f();
  }
}
private:
 F m_f;
 bool m_isMoved;
};

I don’t like this fix, but it’s not possible to copy-assign a lambda expression to another one (e.g. a no-op one – the idea would be to “reset” the moved finalizer’s m_f to an “empty” lambda). In fact, the standard says “The closure type associated with a lambda-expression has a deleted copy assignment operator”.

If you’re unlucky and your compiler does not support C++11 move semantics, then change your compiler! Obviously I’m joking. Let’s make another (even uglier) variant:

template <typename F>
struct finalizer
{
  // Please change it to a ctor taking a universal reference as soon as possible (when C++11 is available)
  finalizer(F f) : m_f(f), m_isMoved(false) {}

  // Please change it to a move ctor as soon as possible (when C++11 is available)
  finalizer(const finalizer& other) : m_f(other.m_f), m_isMoved(false)
  {
     other.m_isMoved = true; // sort of "resource stealing" here
  }
  ~finalizer()
  {
    if (!m_isMoved)
    {
      m_f();
    }
  }
  private:
   F m_f;
   // Please remove this mutable as soon as possible (when C++11 is available)
   mutable bool m_isMoved;
};

The key here is to replace the move constructor with the copy constructor and to add the mutable qualifier to our m_isMoved flag. Why? To be standard (the standard copy constructor receives a const&) and clear (it’s a design choice, though a nasty one).

To sum up: we started from elegant and simple code and we sank into a worse, patched version. This regression may be caused by compilers that poorly support modern C++. In five minutes we wrote simple code that worked and solved our problem. More time was spent working around compiler deficiencies.

RVO is a silly example because I think almost all C++ compilers support it, but sometimes you may need to disable it for some reason. Your code has to be prepared and must keep on working. So be aware that design decisions are often made not only because of the “core business” but also because of your tools and facilities, which are part of your priorities just because they support your production. The more bad decisions are made because of infrastructure, the more you’re probably in trouble!

So, don’t forget to keep the barrel clean!

[Edit] Thanks to tr3w for letting me know some typos [/Edit]

Designing software systems is hard because it constantly asks you to choose, and in program design, just as in life, choice is hard. Choices can be combined, which confers on design an evil multiplicity. Writing generic code is about giving the user the capability to choose the combination she likes the most and, possibly, the faculty to expand/customize the initial codebase.

C++’s vector is a good example: it lets you choose what type of objects you are going to store and, optionally, what allocator you want to use. The allocator is optional because vector provides a default parameter. Default choices are prominent when your component admits simplified/general usage.

This post deals with fighting against evil multiplicity using a flexible and generative approach. I’m going to build a step-by-step example to support the discussion.

Consider a trivial logger class:

class Logger
{
public:
     Logger(ostream& stream) : m_stream(stream) {}

     void Log(const string& message)
     {
          m_stream << message;
     }

private:
     ostream& m_stream;
};

I have already made some decisions: construction (provide a reference to an ostream), message type (string), what to log (log everything), how to log (just use operator<<). Improving genericity by logging every type supporting operator<< is quite simple:

template <typename MessageType>
void Log(const MessageType& message)
{
     m_stream << message;
}

Templates come to the rescue. Now, what if we want to choose what to log? Suppose we want to filter log entries depending on a certain severity (e.g. info, error, debug). Log method could look like this:

template <typename MessageType>
void Log(const MessageType& message, const Severity& severity)
{
     if (...severity...)
          m_stream << message;
}

How to choose what to log here? An approach could be injection:

class Logger
{
public:
     Logger(ostream& stream, Filter& filter) : m_stream(stream), m_filter(filter) {}

     template <typename MessageType>
     void Log(const MessageType& message, const Severity& severity)
     {
          if (m_filter.CanPrint(severity))
               m_stream << message;
     }
private:
     ostream& m_stream;
     Filter& m_filter;
};

Consider an example of usage:

ErrorFilter filter;
Logger logger = Logger (cout, filter);
logger.Log("Hello, this is an info and won't be logged!", Info());
logger.Log("This error will be logged!", Error());

What I don’t like:

  • Creating severities every time I have to log,
  • Creating a filter and worrying about its ownership,
  • Making static decisions and using them “dynamically”.

Consider the first issue. To avoid creating severity objects every time Log method is called, we can employ templates, again:

template <typename SeverityType, typename MessageType>
void Log(const MessageType& message)
{
     if (m_filter.CanPrint<SeverityType>())
          m_stream << message;
}

I swapped SeverityType and MessageType so I can omit the second template parameter and let the compiler deduce it on its own:

logger.Log<Error>("this is an error!");

The second and third issues are strictly related: first I created a filter, then I injected it into the Logger. I consider these operations dynamic, meaning that the binding between Logger and Filter happens after the program starts. How can we improve this? How can we benefit from static binding here?

An elegant way to encapsulate “static” decisions is built on policies. The mechanics of policies consist of a combination of templates with multiple inheritance. This technique is purposely conceived to support flexible code generation by combining a small number of “primitive devices”. It’s easier to apply a filter policy to our example than to bore you with abstract concepts:

template <typename FilterPolicy>
class Logger : public FilterPolicy
{
     // just for readability (a sort of "specification")
     using FilterPolicy::Filter;
public:
     Logger(ostream& stream) : m_stream(stream) {}

     template <typename SeverityType, typename MessageType>
     void Log(const MessageType& message)
     {
          // inherited from FilterPolicy
          if (Filter<SeverityType>())
               m_stream << message;
     }
private:
     ostream& m_stream;
};

A class that uses policies — a host class — is a template with many template parameters, each parameter being a policy. The host class “indirects” parts of its functionality through its policies and acts as a receptacle that combines several policies in a coherent aggregate.

Now we can use our Logger class easily:

// typedef Logger<ErrorFilter> LoggerErrorFilter; -> use typedef if you need readability
Logger<ErrorFilter> logger(cout);
logger.Log<Error>("this will be logged!");
logger.Log<Info>("this won't be logged!");

Possible implementations of filter policies:

// Severity types
struct Info {};

struct Error {};

struct Generic {};

struct ErrorFilter
{
     // pass nothing by default
     template<typename Severity>
     bool Filter() { return false; }
};

// Specialization -> Let pass errors
template<>
bool ErrorFilter::Filter<Error>() { return true; }

Relying on template specializations you can quickly create new filters. Play – mildly – with macros if you are lazy:

// Generate a Filter by Name
#define CREATE_FILTER(Name)\
 struct Name \
 {\
 template<typename Severity> bool Filter() {return false;}\
 };

// [SUPPORT] Create a specialization
#define ADD_SEVERITY(FilterName, Severity)\
 template<> bool FilterName::Filter<Severity>() { return true; }

// One-Severity Filter
#define CREATE_FILTER_1(Name, Severity)\
 CREATE_FILTER(Name)\
 ADD_SEVERITY(Name, Severity)

// Two-Severities Filter
#define CREATE_FILTER_2(Name, Severity1, Severity2)\
 CREATE_FILTER_1(Name, Severity1)\
 ADD_SEVERITY(Name, Severity2)

// ... add 3,4,... versions if you need

// User's code

CREATE_FILTER_2(ErrorInfoFilter, Error, Info)

...

Logger<ErrorInfoFilter> logger(cout);
logger.Log<Error>("this will be logged!");
logger.Log<Info>("this will be logged too!");

This is just a possible implementation. In a future post I’ll show you other code based on tuples, with no macros nor specializations. By the way, you can add new filters with complicated logic by creating new policies that observe FilterPolicy’s specification (just the Filter function) – see the sketch below. Think of the specification as the methods the host class expects to use. This differs from classical object orientation, where you need to observe a typed interface; here this constraint is blunter because a policy only needs to observe a sort of “signature expectation”.
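For instance, a hand-written policy observing the same specification (NoInfoFilter is a hypothetical example of mine):

// let everything pass except Info
struct NoInfoFilter
{
     template<typename Severity>
     bool Filter() { return true; }
};

template<>
bool NoInfoFilter::Filter<Info>() { return false; }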

Another responsibility we can confine in a policy regards outputting the message:

struct OStreamPolicy
{
   OStreamPolicy(ostream& stream) : m_stream(stream) {}

   template<typename Message>
   void Out(const Message& message)
   {
       m_stream << message << endl;
   }

private:
   ostream& m_stream;
};

// very fake class!
struct DBPolicy
{
   DBPolicy(const string& credentials) : m_db(credentials) {}

   template<typename Message>
   void Out(const Message& message)
   {
       m_db.Insert(message);
   }

private:
   LogDB m_db;
};

template <typename FilterPolicy, typename OutPolicy>
class Logger : public FilterPolicy, public OutPolicy
{
     using FilterPolicy::Filter;
     using OutPolicy::Out;
public:
     template<typename ARG1>
     Logger(ARG1&& arg1) : OutPolicy(forward<ARG1>(arg1)) {}

     template <typename SeverityType, typename MessageType>
     void Log(const MessageType& message)
     {
          // inherited from FilterPolicy
          if (Filter<SeverityType>())
               // inherited from OutPolicy
               Out(message);
     }
};

...

Logger<ErrorFilter, OStreamPolicy> logger(cout);

Logger<ErrorFilter, DBPolicy> db_logger("POLICY-BASED");

I used perfect forwarding to enforce genericity. To improve this design you can, for example, employ a variadic constructor (generic and variable number of arguments). A radical approach could completely eliminate construction parameters:

struct CoutPolicy
{
   template<typename Message>
   void Out(const Message& message)
   {
      cout << message << endl;
   }
};

Choose the design most convenient for you, but don’t be too fanatical about static decisions (for example, you may need to read a log file from a network source, and this can’t be determined at compile time!). Adding other out policies is easier than adding filter ones!

The power of policies comes from their ability to mix and match. A policy-based class can accommodate very many behaviors by combining the simpler behaviors that its policies implement. This effectively makes policies a good weapon for fighting against the evil multiplicity of design.

Using policy classes, you can customize not only behavior but also structure. This important feature takes policy-based design beyond the simple type genericity that’s specific to container classes.

The last thing I’m going to show you is about enriched behavior: a policy can provide supplemental functionality that propagates through the host class due to public inheritance. Furthermore, the host class can implement enriched functionality that uses the optional functionality of a policy. If the optional functionality is not present, the host class still compiles successfully, provided the enriched functionality is not used.

For example, consider an OutPolicy such as:

struct SpecialOutPolicy
{
   template<typename Message>
   void Out(const Message& message)
   {
      // not relevant
   }

   template<typename Message>
   void SpecialOut(const Message& message)
   {
      // another logging method, not "required" by the host class "specification"
   }
};

You inherit SpecialOut by instantiating a Logger templatized on SpecialOutPolicy. This feature can be particularly useful when you need to control which operations can be executed, according to static (aka by-design) constraints. Trivial example: suppose you have to interact with an I/O device through a certain wrapper, specifying an appropriate opening mode (e.g. text read, text write, binary read, binary write). An old-style implementation could be:

class IODeviceWrapper
{
     IODeviceWrapper(int mode) // lots of #defines to avoid numbers
     { ...  }
     <SomeType> Read(<SomeParameters>) { ... }
     <SomeType> Write(<SomeParameters>) { ... }
     ...
};

What if I instantiate the device using a read mode and then attempt to write? It shouldn’t be permitted, by design. We can enforce this constraint by employing policies:

template<typename Mode>
class IODeviceWrapper : public Mode
{
   ...
};

struct ReadMode
{
    <SomeType> Read(<SomeParameters>) { ... }
};

struct WriteMode
{
    <SomeType> Write(<SomeParameters>) { ... }
};

struct ReadWriteMode : public ReadMode, public WriteMode {};

...

IODeviceWrapper<ReadMode> wrapper;
wrapper.Read(...); // ok
wrapper.Write(...); // does not compile

IODeviceWrapper<ReadWriteMode> anotherWrapper;
anotherWrapper.Read(...); // ok
anotherWrapper.Write(...); // now ok

Such techniques can enforce not only customization but also design constraints. Thanks to their flexibility, policies are great allies if you are writing a library or a set of APIs. Furthermore, policy-based design is open-ended: the library should expose the specifications from which these policies are built, so the client can build her own.

If you want to study this topic in depth, I recommend Modern C++ Design by Alexandrescu. In addition to an excellent introduction to policy-based design, you can find real examples of usage applied to design patterns (e.g. command, visitor). Although this book was written more than 10 years ago, you can learn:

  • non-trivial examples of policy-based classes,
  • template metaprogramming techniques,
  • C++ & modern implementations of Design Patterns,
  • implementation ideas of some C++11 concepts.

Policy-based design requires you to change your point of view about software decisions. Consider not only dynamic binding but also static (or compile-time) binding. A more powerful approach lies in mixing static binding with dynamic binding, but this is another story!

A template is a way to generate code. C++ supports generic programming through the template mechanism, which allows defining classes and functions parameterized over a set of types, functions or constants. Templates, together with other C++ features, constitute a Turing-complete (comparable to a queue automaton), compile-time sublanguage of C++.

For example:

template <typename One, typename Two>
struct Pair
{
 One first;
 Two second;
};

Providing arguments to a template instantiates (at compile-time) the template with those arguments:

Pair<float,int> myNumericalPair;
 // One = float
 // Two = int

This produces something like this:

struct Pair
{
 float first;
 int second;
};

Remember that template classes (and functions) are not generated if not used, so the following code is correct (but clearly immoral!):


template <typename T>
struct MyTemplateClass {
	void functionThatWorksWithInt() {
		cout << "Just a ping on cout!" << endl;
	}

	void functionThatDoesNotWorkWithInt() {
		member.function(); // int does not have "nested functions"!
	}

	T member;
};
//...
MyTemplateClass<int> mtc; // ok
mtc.functionThatWorksWithInt(); // ok

In the code above, the function functionThatDoesNotWorkWithInt is never called through mtc, so the code is correct! This is a caveat: be careful. I’m going to deal with this problem at the end of this post.

Another important feature is template specialization: given a template with one or more parameters, we can provide special versions of the template where some of the parameters are fixed. There are two types of specialization: full (when all the arguments are specialized) and partial (when the arguments have a particular structure):

/* Primary template */
template<typename T>  class Set
{
    // e.g. Use a binary tree
};

/* Full specialization */
template <> class Set<char>
{
    // e.g. Use a bit vector
};

/* Partial specialization */
template <typename T>  class Set<T*>
{
    // e.g Use a hash table
};

When we have more than one parameter, we can specialize a subset of the types:

/* general version */
template<typename T, typename U> class A { ... };

/* specialization on T */
template<typename U> class A<int,U> { ... };

/* specialization on both */
template<> class A<int,int> { ... };

A<double, double> a;  // uses general version
A<int, double> b;     // uses first specialization
A<int, int> c;        // uses second specialization

Partial specialization does not apply to functions:

template <typename T, typename U>  T fun(U obj) {...}

// the following is illegal
template <typename U>  void fun<void, U>(U obj) {...}

// the following is legal (overloading)
template <typename T> T fun(Window obj) {...}

The parameters of a template can be constant numbers (they are known at compile-time) instead of types (and we can also specialize on certain numbers):

template<unsigned n>
struct Num {
     static const unsigned num = n;
};

Num<5> x;
Num<8> y;

cout << x.num << endl; // prints 5
cout << y.num << endl; // prints 8

We have all the key concepts to talk about Metaprogramming: a metaprogram is a program that produces or manipulates constructs of a target language. Specifically, a template metaprogram is a C++ program that uses templates to generate customized C++ code at compile-time.

“Why would I ever want to do this?”, you rebut.

TMP has a great number of benefits. The most obvious reason to rely on a metaprogram is that by doing as much work as possible before the resulting program starts, we get faster (but sometimes “bigger”) programs. A subtler but perhaps more important argument for using a metaprogram is that the result of the computation can interact more deeply with the target language. For example, the brace-enclosed actions in a YACC grammar can contain arbitrary C/C++ code to be executed as part of the generated parser. That’s only possible because the actions are processed during grammar compilation and passed on to the target C/C++ compiler.

Instead of doing computation at runtime or compile time, we could just do it by hand. A good metaprogram can outstrip this by enabling more expressive and reusable code, narrowing human errors and improving the code maintenance process.

Ok, I can believe you, but when is metaprogramming appropriate? In their excellent book C++ Template Metaprogramming: Concepts, Tools and Techniques from Boost and Beyond, Abrahams and Gurtovoy say that if any three of the following conditions apply, then a metaprogrammed solution can be appropriate:

  • You want the code to be expressed in terms of the abstractions of the problem domain.
  • You would otherwise have to write a great deal of boilerplate implementation code.
  • You need to choose component implementations based on the properties of their type parameters.
  • You want to take advantage of valuable properties of generic programming in C++ such as static type checking and behavioral customization, without loss of efficiency.
  • You want to do it all within the C++ language, without an external tool or custom source code generator.

Stop talking, let’s write some code! I’ll show you a simple metaprogram. Note that the code generation engine embedded in the C++ compiler can actually perform computation by using recursion – to learn TMP, the functional metaprogrammer in you should pipe up!

Here is how to compute the factorial:

template<int n>
struct Fact {
    enum { value = n * Fact<n - 1>::value };
};

template <>
struct Fact<0> {
    enum { value = 1 };
};

cout << Fact<5>::value << endl;

The key trick is to combine specialization with integral parameters. The specialization “stops” the recursion (the base case).

TMP is a programming world of its own (as I said, C++ TMP is a programming language, just like C++), so there exist several techniques and usages.

If you want to deeply learn TMP, I suggest you to read a handful of books:

  • C++ Template Metaprogramming: Concepts, Tools and Techniques from Boost and Beyond – it employs boost metaprogramming library (MPL) – a very good and widespread library.
  • Modern C++ Design: Generic Programming and Design Patterns Applied (by Andrei Alexandrescu) – it employs Loki (TMP library written by Alexandrescu). This book is a must-have.
  • C++ Templates: The Complete Guide (by Vandevoorde and Josuttis) – a very good book, it provides complete and accurate information on using templates.

Let’s write other metaprograms!

Sometimes you need to detect whether two generic types A and B are in a relationship of convertibility, say A is convertible to B. We know that sizeof() is a compile-time mechanism of C/C++ to compute the size of a type (since it is done at compile-time, its argument is not actually executed at run-time; only its type is evaluated). For example, if the argument contains a function call, only its return type is checked:

int myIntFunction();
sizeof(myIntFunction()); // equivalent to sizeof(int)

We can define two types of different sizes:

typedef char OneByte;
class TwoBytes { char dummy[2]; };

Now take a breath and read:

template<typename T, typename U>
class Conversion {
     typedef char OneByte;
     class TwoBytes { char dummy[2]; };
     static OneByte Test(U);
     static TwoBytes Test(...);
     static T MakeT();
public:
     enum { exists = sizeof(Test(MakeT())) == sizeof(OneByte) };
};

What’s happened?

Keep calm.

We have two template parameters: T and U. There is a simple MakeT() function whose return type is T. We also have two Test() functions: the first takes one of the template arguments (U) and the other takes everything (…).

Intuitively, MakeT() produces – only in the eyes of sizeof, since nothing actually runs – a value of type T. If T is convertible to U, the compiler picks Test(U), whose return type is OneByte. Then sizeof(Test(MakeT())) equals sizeof(OneByte), the left-hand side of the last expression matches the right-hand side, and the enum exists is 1 (aka true).

The key point is to understand whether the compiler can automatically convert T to U. If so, it will use Test(U); otherwise, Test(…). Test(U) and Test(…) have return types of different sizes, so the final equality expresses whether the conversion takes place.
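For example (hypothetical checks):

cout << Conversion<double, int>::exists << endl; // prints 1: double converts to int
cout << Conversion<char*, int>::exists << endl;  // prints 0: no implicit conversion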

In the wake of the factorial example, here is another recursive metaprogram. The idea is to compute the sum of the squares of the first n natural numbers. Let’s start with the metafunction (a class template acting as a function) that computes the square of a number. It’s quite simple, but remember that it has to compute values at compile-time:

template<int n>
struct Square
{
	enum { RET = n*n };
};

Ok, that was easy. Now we have to use this metafunction inside another metafunction! We have to employ this metafunction as a template parameter. The syntax is a little bit more complicated but don’t be scared:

template <int n, template<int> class F>
class MyClassWithMetafunction { ... };

This means that MyClassWithMetafunction receives an int (n) and a class (F) that receives an int. No more. Here is the remaining code:

// metafunction as "template parameter"
template <int n, template<int> class F>
struct Accumulate
{
	enum { RET = Accumulate<n - 1, F>::RET + F<n>::RET };
};

template <template<int> class F>
struct Accumulate<0, F>
{
	enum { RET = F<0>::RET };
};

Accumulate<0,F> is the usual base case that stops the recursion. At each step, Accumulate sums the result of the metafunction applied to the current n and goes on with the recursion. Wake up the functional programmer in you – he’ll understand better!
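Putting the pieces together (a hypothetical check, using the Square metafunction above):

cout << Accumulate<3, Square>::RET << endl; // prints 14 (0 + 1 + 4 + 9)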

Are you fed up with TMP?! Let me show you the last example and I stop. I promise!

Sometimes we would like to know if a type has a certain function. Generally this is called interface checking. This (also) deals with the problem I pointed out at the beginning, talking about int and mysterious code that compiles even though ints don’t have nested functions (just because those functions were not called)! Suppose we have a class MyClonable that has a Clone() function:

class MyClonable {
public:
	MyClonable* Clone() const { ... }
};

We want to write a metaprogram to verify whether a type has this function. To do this, a good idea is to separate the concept of checking by confining it into a separate class.

template <typename T>
class HasClone
{
public:
	HasClone() { void (*p)() = Constraints; }

	static void Constraints()
	{
		T* (T::*test)() const = &T::Clone;
	}

};

template <typename T>
class InterfaceCheck : HasClone<T> {};

InterfaceCheck<MyClonable> c; // if MyClonable didn't have Clone then this would be a compilation error

First thing to understand: we use a “base” class (HasClone<T>) to encapsulate the checking; when we construct the derived class we execute the base constructor, which declares a pointer to the Constraints function. Setting up this little inheritance is also convenient to force the compiler to always instantiate HasClone<T> and then check the “constraints” it brings along.

Constraints does the trick (still at compile-time): it declares a pointer to the Clone member function of the template parameter (T), but if this function does not exist, all the evil spirits will jump out to catch you, because you’ll have a compilation error. Don’t get confused by all those pointer definitions; the code is easier than you think!

I hope this post can be useful to intrigue you about metaprogramming. I heartily recommend you write your own code, starting with simple examples (the factorial is a good start). Then you can read one of the books I mentioned before. And don’t forget to wake up the functional programmer in you!