Posts Tagged ‘C++17’

string_view odi et amo

Posted: January 3, 2017 in Programming Recipes
Tags: ,

string_view-like wrappers have been successfully used in C++ codebases for years, made possible by libraries like boost::string_ref. I think all of you know that string_view has joined the C++ standard library since C++17.

Technically, basic_string_view is an object that can refer to a constant contiguous sequence of char-like objects with the first element of the sequence at position zero. The standard library provides several typedefs for standard character types and std::string_view is simply an alias for:

basic_string_view<char>

For simplicity, I’ll just refer to string_view for the rest of the post but what I’m going to discuss is valid for the other aliases as well.

You can imagine string_view as a smart const char* which provides any const member function of std::string as well as a few handy utilities to reduce its span. You cannot enlarge a string_view until you reassign it. Other languages (e.g. Go) have similar constructs that permit to grow the range as well as to participate in the ownership of such range. Although string_view does not, the power of such simple wrapper is huge, though.

The applications of string_view are many and it’s relatively simple to let string_view join your codebase. For years, I’ve been using a proprietary implementation of string_view dated back to the 90s and then improved on the base of boost::string_ref and recently on std::string_view. If you start today, it’s very likely you can adoperate your compiler’s string_view implementation (e.g. latest Visual Studio 2017 RC, clang and GCC support it), you can grab an implementation from the web or you can just use boost::string_ref or another library (e.g. Google’s, folly).

One can think that using string_view is as simple as using std::string with the only difference that string_view does not take the ownership of the char sequence and cannot change its content. That’s not completely true. Adoperating string_view requires you to pay attention to a few other traps that I’m going to describe later on. Before starting, let me show you a couple of simple examples.

Generally speaking, string_view is a good friend when we need to do text processing (e.g. parsing, comparing, searching), but first of all, string_view is an adapter: it allows different string types to be adapted into a std::string-like container. This means that string_view provides iterator support and STL naming conventions (e.g. size, empty). To create a string_view, we only require a null-terminated const char* or both a const char* and a length. Note that in the latter case we don’t need the char sequence to be null-terminated.

Suppose now that our codebase hosts many different string types but we want to write only one function doing a certain task on constant strings. Can string_view help? It can, if the string types manage a contiguous sequence of characters and also provide (read) access to it. Examples:

Then we may write only one function for our task:

ReturnType readonly_on_string_function(string_view sv); // only one implementation

Into readonly_on_string_function we can exploit the whole set of const functions of std::string. Just this simple capability is priceless. You know what I mean if you use more than three string types into your codebase 🙂

To show you other string_view functionalities, let me consider the problem of splitting a string. This problem can be tackled in many ways (e.g. iterator-based, range-based, etc) but let me keep things simple:

The worst things of this function are (imho):

  • we create a new string for each token (this possibly ends up with dynamic allocation);
  • we can split only std::string and no other types.

Since string_view provides every const function of string, let’s try simply replacing string with string_view:

Not only is the code still valid, but also potentially less demanding because we just allocate 8/16 bytes (respectively on 32 and 64 bit platforms – a pointer and a length) for each token.

Now, let’s use some utilities to shrink the span. Suppose I get a string from some proprietary UI framework control, providing its own string representation:

auto name = uiControl.GetText();

Then imagine we want to remove all the whitespaces from the start and the end of such string (we want to trim). We can do it without changing the string itself, just by using string_view:

remove_prefix moves the start of the view forward by n characters, remove_suffix does the opposite. Edge cases have been handled succinctly.

Now we have a string_view containing only the “good” part of the string. At this point, let me end with a bang: we’ll use the sanitized string to query a map without allocating extra memory for the key. How? Thanks to heterogeneous lookup of associative containers:

That’s possible because less<> is a transparent comparator and string_view can be implicitly constructed from std::string (thus, we don’t need to write operator< between std::string and std::string_view). That’s powerful.

It should be clear that string_view can be dramatically helpful to your daily job and I think it’s quite useless to show you other examples to support this fact. Rather, let me discuss a few common pitfalls I have met in the last years and how to cope with them.

#1: “losing sight of the string”

The first error I have encountered many times is storing string_view as a member variable and forgetting that it will not participate in the ownership of the char sequence:

Suppose that Parse is never called with a temporary (moreover, we can enforce that assumption just by deleting such overload), this code is still fine because the caller of Parse has also ‘current’ in scope. Then some time later, a programmer that is not very familiar with string_view (or who is simply heedless) puts the following error in the code:

‘someProcessing’ is a temporary string and then StatefulParser will very likely refer to garbage.

So, string_view (as well as span, array_view, etc) is often not recommended as a data member. However, I think that string_view as data member sometimes is useful and in these scenarios we need to be prudent, just like using references and pointers as data members.

#2: replacing const string& with string_view

string_view seems a drop-in replacement of const std::string& because it provides the whole set of std::string‘s const functions and also because it’s a view (reference). So, the general rule you hear pretty much everywhere (especially nowdays that string_view has officially joined the C++ standard) is “whenever you see const string&, just replace it with string_view“.

So let’s do that:

void I_dont_know_how_string_will_be_used_but_i_am_cool(const string& s);

We turn into:

void I_dont_know_how_string_will_be_used_but_i_am_cool(string_view s);

As users of this function, we are now permitted to pass whatever valid string_view, aren’t we?

As writers of this function, we may have now serious problems.

We have introduced a subtle change to our interface that breaks a sort of guarantee that we had before:  null-termination. string_view does not require (and then does not necessarily handle) a null-terminated sequence. On the other hand, string guarantees to get one back – with c_str().

Maybe you don’t need that feature, in this case the rest of the interface should be ok. Otherwise, if you are lucky, your code simply stops compiling because you are using c_str() somewhere in the code. Else, you are using data(), and the code continues compiling just fine because string_view provides data() as well.

This is not a syntactic detail. What should be clear is that the interface of ‘I_dont_know_how_string_will_be_used_but_i_am_cool’ is not seamlessly changed because now the user can just pass in a not null-terminated sequence of characters:

string something = "hello world";
I_dont_know_how_string_will_be_used_but_i_am_cool(string_view{something.data(), 5}); // hello

Suppose at some point you call a C-function expecting a null-terminated string (it’s common), then you call .data() on string_view. What you obtain is “hello world\0” instead of what the user expected (“hello”). In this case, you maybe only get a logical error, because the \0 is at the end of the string. In this other case you are not so lucky:

char buff[] = {'h', 'e', 'l', 'l', 'o'};
I_dont_know_how_string_will_be_used_but_i_am_cool(string_view{buff, 5});

Even if uncommon (generally string_view refers to real strings, that are always null-terminated), that’s even worse, isn’t it?

In general, string_view “relaxes” (does not have) that requirement on null-termination (it’s just a wrapper on const char*). Imagine that the DNA, the identity, of string_view is made of both the pointer to the sequence of characters and the number of referred characters (the length of the span). On the other hand, since string::c_str() guarantees that the returned sequence of characters is null-terminated, you can think that the identity of a string is just what c_str() returns – the length is a redundant information (e.g. computable by strlen(str.c_str())).

To conclude this point, replacing const string& with string_view is safe as far as you don’t expect a null-terminated string – if you are using c_str() then you can figure that out at compile time because the code simply not compile, otherwise you are possibly in trouble.

Since we are on the subject: replacing const string& with string_view has also another (minor) consequence because string_view involves some work, that is copying a pointer and a length. The latter is an extra, compared to const string&. That’s just theory. In practice you measure when in doubt.

#3: string = string_view::data() + string_view::size()

From the previous point, it should be evident that wherever you need to create a string from a string_view you have to use both data() and size(), and not only data(). You have to use the DNA of string_view. I have reviewed this error many times:

string_view sv = ...;
string s = sv.data(); // possibly UB

It does not work in general, for the same reasons I have just showed you (e.g. this constructor of std::string requires a null-terminated sequence of characters).

From C++17 you can just use one of string’s constructors:

string s { sv };

Before C++17, we have to use data() + size():

string s { sv.data(), sv.size() };

Clearly, as for std::string, you have to do the same for other string types. E.g.:

CString cstr { sv.data(), sv.size() };
#4: numerical conversions

Although C and C++ provide many functions to perform conversions between a number and a string/C-string (and viceversa), none supports a range of characters (e.g. begin + end, or begin + length). Moreover, every C/C++ conversion function expects the input string to be null-terminated. These facts lead to the conclusion that it does not exist any function able to convert a string_view into a number out of the box. We can use some C/C++ functions, but we have limitations. I’ll show you some in this section.

For instance, using atoi or C++11 functions we fall into traps or undefined behavior:

So, how to properly convert a string_view into a number? Many ways exist, generally motivated by different requirements and compromises. For the rest of this section I’ll refer only to int conversions because the end of the story is similar for other numeric types.

Sometimes, although it seems counterintuitive, to fulfill the null-termination requirement we can create an intermdiate std::string (or char array):

Actually, having a std::string we can rely on any C and C++ conversion function. Such intermdiate step of copying into a std::string is sometimes affordable because certain numeric types – like int – have a small number of maximum digits (e.g. int is 11). As far as the char sequence really contains one of such little data, the resulting std::string will be created without allocating dynamic memory thanks to SSO (Small String Optimization). Clearly, that shortcut does not hold for bigger numeric types and in general is not portable.

Other fragile solutions I encountered were based on sscanf and friends:

In some cases this code does not behave how we expect – e.g. when the converted value overflows and when the sequence contains leading whitespaces. Although I don’t recommend this approach, compared to the previous one, it only allocates a fixed amount of characters (e.g. 24) on the stack.

In many other cases, the approach is strictly based on how string_view is employed. This means that we have to make some assumptions. For example, suppose we write a parser for urls where we assume that each token is separated by ‘/’. Since atoi and strtol stops on the last character interpreted, if the whole url is both well-formed and stored into a null-terminated string (assumptions/preconditions) we can use such functions quite safely:

Basically, we assumed that the character past the end of any string_view is either a delimiter or the null-terminator. Pragmatically, many times we can make such assumptions, even if they distance our solution from genericity.

So, I encountered code like that:

In this example we use strtol to read an int and then we return the rest of the string_view. We basically try to “consume”  an int from the beginning of the string_view.

Note that C and C++ conversion functions have more or less relaxed policies on errors (mainly for performance reasons). For instance, if the conversion cannot be performed, strtol returns 0 and if the representation overflows, it sets errno to ERANGE. Instead, in the latter case the return value of atoi is undefined. What I really mean is that if you decide to use such functions then you are going to accept the consequences of their limitations. So, just pay attention to such limitations and take actions if needed. For example, a more defensive version of the previous code is:

The fact that it makes sense to check against the null-terminator (if (*entrPtr != 0)) is the fundamental assumption we made here. Generally such assumption is easy to make. Scenarios like this, instead:

string whole = "12345";
parse_int ( {whole.data(), 3}, i );

are still not covered, because the length of the string_view is not taken into account. For this, we have at least three options: create and use an intermdiate std::string (or use a std::stringstream – however only std::string benefits the SSO), improve the sscanf-based solution that somehow uses such information, or write a conversion function manually. It’s quite clear that C++ lacks a set of simple functions to convert char ranges to numbers easily, efficiently and with a robust error handling.

Actually, I think the most elegant, robust and generic solution is based on boost::spirit:

However, if you don’t already depend on boost, it’s quite inconvenient to do just for converting strings into numbers.

We have a happy ending, though. Finally, C++17 fills this gap by introducing elementary string conversion functions:

This new function will just convert the given range of characters into an integer. It is locale-independent, non-allocating, and non-throwing. Only a small subset of parsing policies used by other libraries (such as sscanf) is provided. This is intended to allow the fastest possible implementation. Clearly, overloads for other numeric types are provided by the standard.

To be thorough, here is an example of the opposite operation, using to_chars:

Both to_chars and from_chars return a minimal output which contains an error flag and a pointer to the first character at which the parsing stopped (e.g. something like what is written into endPtr in the strtol example).

Are you already looking forward to putting your hands on them?!

 

Here is wrap-up of the main points we covered in this post:

  • string_view is a smart const char*: an object that refers to a constant sequence of characters, keeps track of its length and provides any const function of std::string;
  • just like a reference or a pointer, you have to pay attention to storing string_view as a member variable;
  • string_view’s DNA is both the char sequence and the length:
    • the pointed sequence of characters is not necessarily null-terminated (e.g. c_str() does not exist);
    • whenever you need to copy the content of a string_view into a string(-like container), you have to use both;
  • bear in mind that replacing const string& with string_view implies the user can start passing not null-terminated strings into your functions (just ask yourself if that makes sense);
  • To convert a string_view into a number:
    • pre-C++17: use boost::spirit if you can, agree to compromises and use C/C++ functions with their limitations, or roll some utilities yourself;
    • since C++17: use from_chars.
  • string_view is already available in:
    • Microsoft Visual Studio 2017 RC
    • clang HEAD 4.0 (or in 3.8, under the experimental include folder)
    • gcc HEAD 7.0

This post concludes my introduction to C++ containers. We’ll meet other standard data structures such as lists, queues and heaps when needed along the way.

Some posts ago, I anticipated that understanding containers is crucial for adoperating standard algorithms effectively. In a few words, the reason is that each container has some special features or it’s particularly indicated for some scenarios. On the other hand, algorithms work only in terms of iterators, which completely hide this fact. That’s great for writing generalized code, but it also merits attention because for exploiting a particular property of a container, you generally have to choose the best option yourself.

The only “specialization” that algorithms (may) do is in terms of iterators. Iterators are grouped in categories, which basically distinguish how iterators can be moved. For instance, consider std::advance that moves an iterator by N positions. On random-access iterators, std::advance just adds an offset (e.g. it += N), that is a constant operation. On the other hand, on forward iterators (basically they can advance only one step at a time) std::advance is obliged to call operator++ N times, a linear operation.

Choosing – at compile time – different implementations depending on the nature of the iterators is a typical C++ idiom which works well in many situations (this technique is an application of tag dispatching, a classical C++ metaprogramming idiom). However, to exploit the (internal) characteristics of a container, we have to know how the container works, which (particular) member functions it provides and the differences between using the generic standard algorithm X and the specialized member function X.

As an example, I mentioned std::find and std::map::find. What’s the difference between them? Shortly, std::find is an idiomatic linear search over a range of elements. It basically goes on until either the target value or the end of the range is reached. Instead, std::map::find…Wait, I am not a spoiler! As usual, let me start this discussion through a challenge:

Some days ago I gave my birthday party  and I invited some friends. I want to know which name is the most common among my friends. Or, given a sequence of words, I want to know which one occurs the most.

In this trivial exercise we need a way to count occurrences of words. For example:

matteo riccardo james guido matteo luca gerri paolo riccardo matteo

matteo occurs three times, riccardo twice, the others once. We print matteo.

Imagine to count the words by incrementing a counter for each of them. Incrementing a counter should be a fast operation. Finally, we’ll just print the string corresponding to the greatest counter.

The most common data structures to do this task is generally known as associative array: basically, a collection of unique elements – for some definition of “uniqueness”, which – at least – provides fast lookup time. The most common type of associative container is a map (or dictionary): a collection of key-value pairs, such that each possible key appears just once. The name “map” resembles the concept of function in mathematics: a relation between a domain (keys) and a codomain (values), where each element of the domain is related (mapped) to exactly one element of the codomain.

Designing maps is a classic problem of Computer Science, because inserting, removing and looking up these correspondences should be fast. Associative containers are designed to be especially efficient in accessing its elements by key, as opposed to sequence containers which are more efficient in accessing elements by position. The most straightforward and elementary associative container you can imagine is actually an array, where keys coincide with indexes. Suppose we want to count the most frequent character of a string of lowercase letters:

freq contains the frequencies of each lowercase letters (0 is a, 1 is b, and so on). freq[c – ‘a’] results in the distance between the char c and the first letter of the alphabet (‘a’), so is the corresponding index in the freq array (we already saw this idiom in the previous post). To get the most frequent char we just retrieve the iterator (a pointer, here) to the element with highest frequency (std::max_element returns such iterator), then we calculate the distance from the beginning of freq and finally we transform this index back to the corresponding letter.

Note that in this case lookup costs O(1). Although an array shows many limitations (e.g. cannot be enlarged, keys are just numbers lying in a certain range), we’ll see later in this series that (not only) in competitive programming these “frequency tables” are extremely precious.

A plain array does not help with our challenge: how to map instances of std::string?

In Computer Science many approaches to the “dictionary problem” exist, but the most common fall into a couple of implementations: hashing and sorting. With hashing, the idea is to – roughly speaking – map keys to integral values that index a table (array). The trio “insert, lookup, remove” has average constant time, and linear in the worst case. Clearly this depends on several factors, but explaining hash tables is beyond the target of this post.

The other common implementation keeps elements in order to exploit binary search for locating an element in logarithmic time. Often trees (commonly self-balancing binary search trees) are employed to maintain this ordering relation among the elements and also for having logarithmic performance on insert, lookup and removal operations.

The C++ STL provides both the hash-based (from C++11) and the sort-based implementations, providing also variants for non-unique keys (multi). From now I’ll refer to sort-based implementation as tree-based because this is the data structure used by the major C++ standard library implementers.

There is more: STL provides two kinds of associative containers: maps and sets. A map implements a dictionary – collection of key-value pairs. A set is a container with unique keys. We’ll discover that they provide pretty much the same operations and that under the hood they share a common implementation (clearly, either hash-based or tree-based). Also, a hash container and a tree container can be used almost interchangeably (from an interface point of view). For this reason, I’ll focus on the most used associative container: a tree-based map. We’ll discuss later about some general differences.

Please, give a warm welcome to the most famous C++ associative container: std::map. It’s a sorted associative container that contains key-value pairs with unique keys. Here is a list of useful facts about std::map:

  • Search, removal, and insertion operations have logarithmic time complexity;
  • elements are kept in order, according to a customizable comparator – part of the type and specified as a template argument (std::less by default – actually the type is different since C++17, read on for more details);
  • iterators are bidirectional (pay attention that increments/decrements by 1 are “amortized” constant, whereas by N are linear);
  • each map element is an instance of std::pair<const Key, Value>.

The latter point means that we are not permitted to change keys (because it would imply reordering). Eventually, you can get the entry, remove it from the map, update the key, and then reinsert it.

Ordered associative containers use only a single comparison function, that establishes the concept of equivalence: Equivalence is based on the relative ordering of object values in a sorted range.
Two objects have equivalent values with respect to the sort order used by an associative container c if neither precedes the other in c’s sort order:

In the general case, the comparison function for an associative container isn’t operator< or even std::less, it’s a user-defined predicate (available through std::key_comp member function).

An important observation: in C++, every time you have to provide a “less” comparison function, the standard assumes you implement a strict weak ordering.

Let’s use std::map to solve the challenge:

How it works: as far as we read a string we increment the corresponding counter by 1. map::operator[] returns a reference to the value that is mapped to a key that is equivalent to a given key, performing an insertion if such key does not already exist. At the end of the loop, freq is basically a histogram of words: each word is associated with the number of times it occurs. Then we just need to iterate on the histogram to figure out which word occurs the most. We use std::max_element: a one-pass standard algorithm that returns the greatest element of a range, according to some comparison function (that is std::less be default, a standard function object which – unless specialized – invokes operator< on the objects to compare).

Given that map entries are pairs, we don’t use pair’s operator< because it compares lexicographically (it compares the first elements and only if they are equivalent, compares the second elements). For instance:

"marco", 1
"andrea", 5

according to pair::operator<, “marco” is greater than “andrea” then it will result the max_element. Instead, we have to consider only the second value of the pairs. Thus we use:

If your compiler does not support generic lambdas (auto parameters), explicitly declare const pair<const string, int>&. const string is not fussiness: if you only type string, you get an extra subtle copy that converts from pair<const string, int> to pair<string, int>. Bear in mind that entries of map are pair<const K, V>.

Suppose now we have an extra requirement: if two or more names occur the most, print the lexicographically least. Got it?

matteo riccardo matteo luca riccardo

In this case, “matteo” and “riccardo” occur twice, but we print “matteo” because lexicographically lower than “riccardo”.

How to accommodate our solution to fit this extra requirement? There’s an interesting effect of using a sorted container: when we forward iterate over the elements, we go from lexicographically lower strings to lexicographically greater strings. This property combined with max_element automatically supports this extra requirement. In fact, using max_element, if more than one element in the range is equivalent to the greatest element, it returns the iterator to the first such element. Since the first such element is (lexicographically) the lowest, we already fullfill the new requirement!

Guess if we want to print the lexicographically greatest string…it’s just the matter of iterating backward! Having clear in mind these properties is a great thing. In this series we’ll meet many others.

Let’s continue our journey through std::map. Suppose that part of another challenge is to make our own contacts application. We are given some operations to perform. Two of them consist in adding and finding. For the add operation, we will have to add a new contact if it does not exist, or to update it otherwise. For the find operation, we will have to print the number of contacts who have a name starting with that partial name. For example, suppose our list contains:

marco, matteo, luca

find(“ma”) will result in 2 (marco and matteo).

The best data structure for this kind of task is probably a trie, but the pragmatic competitive programmer knows that in several cases std::map suffices. We can take advantage of the fact that a map keeps things in order. The challenge is also an excuse to show you how to insert into a std::map, since there are different ways to achieve this task.

We have two cases:

  1. insert/update an entry
  2. count names starting with some prefix

Our map looks like:

map<string, string> contacts; // let's suppose the contact number to be a string as well

The first one has been discussed a lot in many books and blogs, so much that C++17 contains an idiomatic function insert_or_assign. In a few words, to efficiently insert or assign into a map, we first look for the contact in the structure and in case of a match we update the corresponding entry; otherwise we insert it.

This is the simplest way to do that:

contacts[toInsertName] = toInsertNumber;

You may ask: why in C++17 do we bother with a specific function for this one-liner? Because we are C++. Because that one-liner is succinct, but it hides a subtle cost when the entry is not in the map: default construction + assignment.

As we have seen, contacts[toInsertName] performs a lookup and returns either the corresponding entry or a brand-new one. In the latter case a default construction happens. Then, = toInsertNumber does an assignment into the just created string. Is that expensive? Maybe not in this case, but it may be in general, and this kind of things matters in C++.

Here is more enlightening example: suppose we have a cache implemented with std::map:

You don’t want to update anything if key is already there. Rather, you first look for the value corresponding to key and only if it’s not there you invoke the lambda to calculate it. Can you solve it by using operator[]? Maybe (it depends on the value type), but it’s not effective nor even efficient. Often std::map novices come up with this code:

map::find locates the element with key equivalent to a certain key, or map::end if such element does not exist. map::emplace inserts a new element into the container constructed in-place with the given args if there is no element with the key in the container. emplace returns a pair<iterator, bool> consisting of an iterator to the inserted element, or the already-existing element if no insertion happened, and a bool denoting whether the insertion took place. True for “insertion”, false for “no insertion”.

This code implements what I call the infamous double-lookup anti-pattern. Basically, both find and emplace search the map. It would be better to – somehow – take advantage of the result of the first lookup to eventually insert a new entry into the map. Is that possible?

Sure.

This is the idea: if the key we look for is not in the map, the position it should be inserted in is exactly the position where the first element that does not compare less than the key is located. Let me explain. Consider this sorted list of numbers:

1 2 3 5 6 7 8

If we want to insert 4, where should it be inserted? At the position of the first number that does not compare less than 4. In other words, the first element greater or equal to 4. That is 5.

This is nothing more than what mathematics defines as lower boundstd::map provides lower_bound, and for consistency the STL defines std::lower_bound to perform a similar operation on sorted ranges. As a generic algorithm, lower_bound is a binary search.

Here is what the idiom looks like:

Since lower_bound returns the first element that does not compare less than key, it can be key itself or not. The former case is handled by the right hand side of the if condition: data.key_comp() returns the comparator used to order the elements (operator< by default). Since two equal elements do not compare less, this check has to be false. Otherwise, key is less than lb->first because lb points to one element past key (or to the end, if such element does not exist). Makes sense?

emplace_hint is like emplace, however it also takes an extra iterator to “suggest” where the new element should be constructed an placed into the tree. If correct, the hint makes the insertion O(1). map::insert has an overload taking this extra iterator too, resulting in a similar behavior (but remember that insert takes an already built pair).

A bit of simplification for pragmatic competitive programmers is when you do not use custom comparators: generally you may use operator== for checking equality:

if (lb != end(data) && key==lb->first)

Ha, C++17 has this insert_or_assign that should search the map and use Value’s operator= in case the entry has to be updated, or insert it otherwise (move semantics is handled as well). There is also another reason why insert_or_assign is important, but I’ll spend a few words about that leater, when unordered_map will be introduced.

Since I introduced lower_bound, I must say there is also an upper_bound: it locates the first element which compares greater than the searched one. For example:

1 2 3 4 5

upper_bound(3) locates 4 (position 3). What’s the point? Let’s turn our list into:

1 2 2 2 3 4

lower_bound(2) locates…2 (position 1), whereas upper_bound(2) results in position 4 (element 3). Combining lower_bound(2) with upper_bound(2) we find the range of elements equivalent to 2! Range is in C++ speaking (upper_bound(2) is one-past the last 2). This is extremely useful in multimap and multiset and indeed a function called equal_range which returns the combination of lower and upper bounds exists. equal_range is provided by all the associative containers (in unordered_* is only for interface interchangeability reason) and by the STL as an algorithm on sorted sequences – std::equal_range.

We’ll see applications of these algorithms in the future.

So, what about our challenge? Suppose it’s ok to use operator[] for inserting/updating elements – in this case string constructor + operator= are not a problem. We need to count how many elements start with a certain prefix. Is that easy? Sure, we have a sorted container, remember? Listen my idea: If we call lower_bound(P) we get either end, the first element equal to P or …suspense… the first element starting with P. Since lower_bound returns the position to the first element which does not compare less than P, the first element that looks like P$something is what we get if such element exists.

Now what? I’m sure you already did this step in your mind. We just iterate until we find either the end or the first element that does not start with P. From the previous post we already know how to verify if a string begins as another:

We are paying both a prefix comparison and a linear iteration from lb (write it as O(|P|*K), where |P| is the length of the prefix P, and K is the number of strings starting with P). Advanced data structures that more efficiently deal with these – possible – problems exist, but they are beyond the scope of this post. In a moment I’ll do another observation about this code.

I realized that the post is becoming longer than I imagined, so let me recap what we have met so far:

  • How to insert:
    • operator[] + operator=;
    • infamous double-lookup anti-pattern (e.g. find + insert);
    • efficient “get or insert”: lower_bound + emplace_hint/insert;
    • efficient “get or assign”/”insert or assign”: lower_bound + Value’s operator=;
    • C++17 insert_or_assign.
  • How to lookup:
    • find (aka: exact match);
    • lower_bound/upper_bound (aka: tell me more information about what I’m looking for);
    • operator[] (aka: give me always one instance – eventually new, if can be default-constructed);
    • bonus: at (aka: throw an exception if the element is not there);
    • bonus: count (aka: how many elements equivalent to K exist? 0 or 1 on non-multi containers).
  • Taking advantage of sorting. For instance:
    • we combined max_element’s “stability” – hold the first max found – with map’s order to get the max element that is also the lexicographically least (or the greatest, by iterating backward);
    • we learnt how to locate and iterate on elements which start with a given prefix.

To erase an element, you just call map::erase(first, last), or erase(iterator), or erase(key). More interesting is how to implement erase_if, an idiom simplified by C++11 because now erase returns the iterator to the last removed element. This idiom can be applied to every associative container:

 

Design choices

 

So, we still have an open question, right? What’s the difference between std::find and map::find?

You know, std::find is a linear search on a range. Now, I hope you understood that map::find is a logarithmic search and it uses a notion of equivalence, instead of equality to search elements.

Actually, there is a bit more.

Let’s raise the bar: what’s the difference between std::lower_bound and map::lower_bound? First of all: is it possible to use std::lower_bound with map::iterator? std::lower_bound requires simple forward iterators thus the answer is yes. So what?

Basically, std::lower_bound – just like all the other algorithms – works only in terms of iterators; on the other hand map::lower_bound – just like all the other map’s algorithms – exploits map’s internal details, performing better. For example, std::lower_bound uses std::advance to move iterators and you know that advancing a map::iterator results in linear time. Instead, map::lower_bound does a tree traversal (an internal detail), avoiding such overhead.

Although exceptions exist, the rule of thumb is: if a container provides its own version of a standard algorithm, don’t use the standard one.

I tell you a story about this rule. Remember that the comparator is part of the static interface of an associative container (it’s part of the type), unlike what happens in other languages like C# where the comparator is decoupled from the type and can be dynamically injected:

Dictionary<int, string> dict = new Dictionary<int, strint>(StringComparer.OrdinalIgnoreCase);

Some time ago I had a discussion with a collegue about this aspect of C++: he was using a map<string, SomeValueType> to store some information, but he was using it only for case-insensitive searches by calling std::find (the linear one). That code made my hair stand on end: “why not using a case-insensitive comparator and making this choice part of the map type?” – I asked. He complained: “C++ is breaking incapsulation:  I won’t commit my map to a specific comparator. My clients mustn’t know how elements are sorted”.

At first blush I got annoyed, but after a couple of emails I understood his point (it was about the architecture of the system he designed, rather than about C++ itself). At the end of a quite long  – and certainly interesting – discussion, I come up with a solution to – more or less – save both ways: I introduced a new type which inherited from std::map and allowed to inject a comparator at construction time, like in C# (using less<> by default). I don’t recommend this approach (for example, because the comparator can’t be inlined and every comparison costs a virtual call – it uses std::function under the hood), but at least we are not linearly searching the map anymore…

This story is just to say: use containers effectively. Don’t go against the language. std::map is not for linear search, as std::vector is not for push elements at front.

I’d like mentioning a fact about the cache. std::map is generally considered a cache-unfriendly container. Ok we can use allocators, but try to understand me: by default we are just pushing tree nodes into the heap, moving through indirections, etc. On the other hand, we are all generally happy with contiguous storage, like vectors or arrays, aren’t we? Is that possible to easily design cache-friendly associative containers? It is, when the most common operation is lookup. After all, what about using binary search on a sorted vector? That’s the basic idea. Libraries like boost (flat_map) provide this kind of container.

As my friend Davide Di Gennaro pointed out, given the triplet of operations (insert, lookup, erase), the best complexity you get for a general-purpose usage is O(logN, logN, logN). However, you can amortize one operation, sacrificing the others. For example, if you do many lookups but a few insertions, flat_map performs O(N, logN, N), but the middle factor is much lower (e.g. it’s cache-friendly).

Consider this example: we want to improve our algorithm to count our contact names which start with a certain prefix. This time, we use a sorted vector and std::lower_bound to find the first string starting with the prefix P. In the previous code we just iterated through the elements until a mismatch was found (a string not starting with P).

This time, we try thinking differently: say we have found the position of the first string starting with P. Call that position “lb” (lower bound). Now, it should be clear that we must find the next string not starting with P. Define this string to be the first greater than lb, provided that “greater” means “not starting with P”. At this point, do you remember which algorithm finds the first element greater than another, in a sorted sequence?

std::upper_bound.

So we can employ upper_bound, using a particular predicate, and we expect a logarithmic complexity. What will this predicate look like? Suppose we count “ma” prefixes. Strings starting with “ma” are all equivalent, aren’t they? So, we can use a predicate which compares only “ma” (P) with the current string. When the current string starts with P then it’s equivalent to P and the predicate will return false. Otherwise, it will return true. After all, starting the search from lower_bound’s result, we can get either ma$something or $different-string:

Some clarifications: the first parameter of the lambda is always the value (or a conversion) of the upper bound to search for in the range (P, in our case). This is a guarantee to remember. The lambda returns false only when the current string is not starting with P (s1, inside the lambda body). std::upper_bound will handle the rest.

Why didn’t we use this approach directly on std::map? As I said at the beginning of this section, standard algorithms works in terms of iterators. Using std::upper_bound on std::map would result in logN * N. That additional N factor is caused by advancing iterators, that is linear on map::iterators. On the other hand, sorted vector provides random access iterators and so the final cost of counting prefixes is O (|P|*logN), given that we have sacrificed insert and removal operations (at most, linear).

 

Recent additions

 

C++14 and C++17 add new powerful tools to associative containers:

  • Heterogeneous lookup: you are no longer required to pass the exact same object type as the key or element in member functions such as find and lower_bound. Instead, you can pass any type for which an overloaded operator< is defined that enables comparison to the key type. Heterogenous lookup is enabled on an opt-in basis when you specify the std::less<> or std::greater<> “diamond functor” comparator when declaring the container variable, like: map<SomeKey, SomeValue, less<>>. See here for an example. This works only for sorted associative containers.
    This feature is also kwnown by some people as “transparent comparators”, because comparators that “support” this feature have to define a type is_transparent = std::true_type. This is basically required for backwards-compatibility with existing code (see for example here for a more detailed explanation). A terrific usage of this feature is, for example, searching on a map<string, Val> by passing a const char* (no conversion to string will be performed).
  • try_emplace and insert_or_assign, as an improvement of the insertion interface for unique-keys maps (more details here).
  • Ordered By Default: Mostly for both design and ABI compatibility reasons, ordered associative containers now specify as a default compare functor the new std::default_orderer_t, (that behaves like std::less – more details here).
  • Splicing maps and sets: (following words by Herb Sutter) you will now be able to directly move internal nodes from one node-based container directly into another container of the same type (differing at most in the comparator template parameter), either one node at a time or the whole container. Why is that important? Because it guarantees no memory allocation overhead, no copying of keys or values, and even no exceptions if the container’s comparison function doesn’t throw. The mechanics are provided by new functions .extract and .move, and corresponding new .insert overloads (approved proposal here).
  • Structure bindings: we should be able to iterate on maps this way:

 

A few words about unordered_map

 

We end this long post by spending some words about unordered associative containers. I don’t show you multi* containers because they are more or less the same as the corresponding non-multi ones. Clearly, they support multiple instances of the same key and, as I said, equal_range plays an important role for lookups. I’ll probably spend more time on multi containers when needed in future challenges.

After this section we’ll see a final example using unordered_set.

As std::map does, std::unordered_map contains key-value pairs with unique keys. Unlike std::map, internally the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its key. This allows fast access to individual elements, since once the hash is computed, it refers to the exact bucket the element is placed into. For this reason, search, insertion, and removal of elements have average constant-time complexity. Due to the nature of hash, it’s hard/impossible to know in advance how many collisions you will get with your hash function. This can add an element of unpredictability to the performance of a hash table. For this reason we use terms like “average”, “amortized” or “statistically” constant-time when referring to operations of a hash container.

This is not a discussion on hash tables, so let me introduce some C++ things:

  • STL provides a default std::hash template to calculate hash of standard types;
  • std::hash can be eventually specialized for your types (or you can specify your own functor);
  • when a collision happens, an “equality” functor is used to determine if two elements in the same bucket are different (std::equal_to by default);
  • it’s possible to iterate through the elements of a bucket;
  • some hash-specific functions are provided (like load_factor, rehash and reserve);
  • unordered_map provides almost all the the functions of std::map.

The latter point simplify our life to interchange std::map with std::unordered_map. There are two important things to say about this fact: 1) lower_bound and upper_bound are not provided, however equal_range is; 2) passing hints to unordered_ containers insertion functions is not really useful – actually it is to make the insertion “exit quickly”.

We know that on ordered associative containers, conditional insertion with lower_bound is the best way of performing an “insert or update”. So what? How can we say that ordered/unordered containers are more or less interchangeable if we miss lower_bound/upper_bound? We may apply equal_range:

This idiom is equivalent to the one using lower_bound (both semantically and from a performance point of view), plus it works on unordered maps.

Note that in C++17, try_emplace and insert_or_assign will dramatically improve the usability of unordered associative containers that will efficiently handle the case when we need to first perform a lookup and eventually insert a new element if that element is not present (first of all, the hash value won’t be recalculated). That’s the real value of such additions: not only using insert_or_assign is more efficient, but also clearer and truly interchangeable.

 

Tree or Hash?

 

There are some general rules/facts to take into account when in doubt between tree-based or hash-based containers. They are general, this means – as always – that when really in doubt you can start with one, profile your code, change to the other (again, thanks to interface compatibility), profile again, and compare.

By the way, here are some facts for the pragmatic competitive coder:

  • on average, hash-based lookup is faster;
  • generally, hash-based containers occupy more memory (e.g. big array) than tree-based ones (e.g. nodes and pointers);
  • tree-based containers keep elements in order, is that feature important? (e.g. prefix lookups, get top elements, etc);

 

An application of unordered_set

 

Since I introduced std::unordered_map…let’s do a challenge involving unordered_set 🙂 Jokes apart, this long post hosted mostly maps, I’d like concluding with set, a really helpful and minimal associative container that we will meet again in the future.

That’s the challenge:

Given N unique integers, count the number of pairs of integers whose difference is K.

For example, for this sequence:

1 5 3 4 2

With K = 2, the answer is 3 (we have three pairs whose difference is 2: 4-2, 3-1, 5-3).

The trivial solution to this problem has quadratic time complexity: we enumerate all the pairs and we increment a counter when the difference is K.

The way to tackle this problem is to convert the problem space from one in which we consider pairs to a search for a single value. The i-th element contributes to the final count only if A + K is in the sequence. For instance:

1 5 3 4 2

With K = 2. 1 contributes because 1 + 2 = 3 is in the list. Likewise, 3 is fine because 3 + 2 = 5 is in the list. And the same for 2, because we spot 2 + 2 = 4.

We can then store the input into an unordered_set (on average, constant time lookup), iterate on the elements and for each value A search for A + K:

Some sugar: std::count_if makes it clear that we are counting how many elements satisfy a predicate. Our predicate is true when currElem + K exists in the set: we use unordered_set::count(A) to get the number of elements equal to A (either 0 or 1 since we use a non-multi set). As an idiom, on non-multi associative containers, container::count(Key) gives 1 (say, true) if Key exists, 0 (say, false) otherwise.

On average, this solution has linear time complexity.

Another approach that uses no extra space and that involves sorting exists. Can you find it?

That’s enough for now.

Recap for the pragmatic C++ competitive coder:

  • Don’t reinvent containers whenever standard ones fit your needs. Consider STL associative containers:
    • std::map, std::set, std::multimap and std::multiset are sorted, generally implemented as self-balancing binary-search trees;
    • std::unordered_map, std::unordered_setstd::unordered_multimap and std::unordered_multiset are not sorted, imlemented as hash tables;
  • Understand idioms to adoperate STL associative containers effectively:
    • Does an element with key equivalent to K exist? Use count(K);
    • Where is the element with key equivalent to K? Use find(K);
    • Where should the element with key equivalent to K be inserted? Use lower_bound(K) on sorted containers;
    • Insert a new element: use emplace, insert;
    • Insert a new element, knowing where: use emplace_hint, insert (only sorted containers take advices effectively,  unordered ones are presumptuous!);
    • Insert or update an element: operator[] + operator=, (C++17) insert_or_assign, equal_range + hint insertion (this is also for interface compatibility between ordered/unordered), lower_bound + hint insertion (only on sorted containers);
    • Get the element corresponding to key equivalent to K, or default if not present: use operator[K];
    • Get the element corresponding to key equivalent to K, or exception if not present: use at(K);
    • Erase elements: use erase(K), erase(first, last), erase(it).
  • Understand the difference between containers member functions and STL generic algorithms. For example:
    • std::find and $associative_container::find do different searches, using different criteria;
    • std::lower_bound and $sorted_container::lower_bound do the same, but the former performs worse than the member function because the latter works in terms of the container internal details and its structure.
  • Exploit standard algorithms properties. For example:
    • std::max_element and std::min_element are “stable”: max/min returned is always the first one found.
  • Prefer standard algorithms to hand-made for loops:
    • std::max_element/min_element, to get the first biggest/smallest element in a range;
    • std::count/count_if, to count how many elements satisfy specific criteria;
    • std::find/find_if, to find the first element which satisfies specific criteria;
    • std::lower_bound, std::upper_bound, std::equal_range, to find the “bounds” of an element within a sorted range.

I apologize if I didn’t publish any new posts in May but I was busy with the organization of the Italian C++ Conference 2016. If you feel like reading some lines about this event, check out this post.

In this installment of “C++ in Competitive Programming” we’ll meet a fundamental data structure in Computer Science, one that manages a sequence of characters, using some encoding: a stringAs usual, let me introduce this topic through a simple challenge:

A palindrome is a word, phrase, number, or other sequence of characters which reads the same backward and forward. Given a string print “YES” if it is palindrome, “NO” otherwise. The string contains only alphanumeric characters.

For example:

abacaba

is palindrome; whereas:

abc

is not.

We need a type representing a sequence of characters and in C++ we have std::string, the main string datatype since 1998 (corresponding to the first version of the ISO C++ standard – known as C++98). Under the hood, imagine std::string as a pointer to a null-terminated (‘\0’-terminated) char array. Here is a list of useful facts about std::string:

  • std::string generalizes how sequences of characters are manipulated and stored;
  • roughly speaking, it manages its content in a similar way std::vector does (e.g. growing automatically when required);
  • apart from reserve, resize, push_back, etc. std::string provides typical string operations like substr, compare, etc;
  • it’s independant on the type of encoding used: all its members, as well as its iterators, will still operate in terms of bytes;
  • implementers of std::string generally embeds a small array in the string object itself to manage short strings.

The latter point is referred as the Small String Optimization and it means that short strings (generally 15/22 chars) are allocated directly in the string itself and don’t require a separate allocation (thanks to this guy on reddit who pointed out I was wrong here). Have a look at this table which shows the maximum size that each library implementation stores inside the std::basic_string.

The problem description states that the input is in the form:

string-till-end-of-stream

Thus we are allowed to simply read the string this way:

Usually in CC a string is preceded by its length:

N string-of-len-N

In this case we don’t need to read N at all:

That will skip every character until the first whitespace and then will read the string that follows the whitespace into S.

Now let’s face with the problem of determining if S is palindrome or not.

It should be easy to understand that S is palindrome if reverse(S) is equal to S. So let’s start coding such a naive solution:

As you can see, we access characters as we do with an array. As std::vector, std::string makes it possible to use also iterators and member functions to access individual characters. In the last line we applied operator== to verify if “char-by-char S is strictly equal to tmp”. We could also use string::compare() member function:

compare() returns a signed integral indicating the relation between the strings: 0 means they compare equal; a positive value means the string is lexicographically greater than the passed string; a negative value means the string is lexicographically lesser than the passed string. This function supports also comparison with offsets: suppose we want to check if a string is prefix of another (that is, a string starts with a certain prefix). This is the most effective way to do that:

Bear this trick in mind.

compare() supports also overloads with C-strings, preventing implicit conversions to std::string.

Now let’s turn back to reversing the string. Actually we don’t need to write a for loop manually because reversing a range is a common function already provided by the STL. Including <algorithm> we  get the algorithms library that defines functions for a variety of purposes (e.g. searching, sorting, counting, manipulating) that operate on ranges of elements. To reverse in-place a sequence we adoperate std::reverse:

Since we don’t want to modify the original string, we can use std::reverse_copy to copy the elements from the source range to the destination range, in reverse order:

Here – as for std::vector – we have two options: creating an empty string, reserving enough space and then pushing letters back, or creating a properly-sized and zero-initialized string and then assigning every single char. Since char is a cheap data type, the latter option is generally faster (basically because push_back does some branching to check if the character to add fits the already initialized sequence). For this reason I prefer filling a string this way. As I pointed out in the previous post, a reader from reddit suggested to use this approach also for std::vector<int> and honestly I agree. Larger types may have a different story. Anyway, as usual I suggest you to profile your code when in doubt. For Competitive Programming challenges this kind of finess makes no difference.

This solution is a bit more efficient than the previous one because it’s only two passes (one pass for reverse_copy and another one for operator==). We have also got rid of the explicit loop. What it really makes this solution bad is the usage of that extra string. If we turn back to the initial for loop, it should be clear that we just need to check if each pair of characters that would be swapped are the same:

S = abacaba
S[0] == S[6]
S[1] == S[5]
S[2] == S[4]
S[3] == S[3]

That is, with i from 0 to N/2 we check that:

S[i] == S[N-i-1]

Applying this idea:

Ok, this solution seems better. Can we do even better? Sure. Let’s meet another standard function.

Some algorithms operate on two ranges at a time: std::equal belongs to this family. A function that checks if two ranges are strictly the same:

This function returns true if the elements in both ranges match. It works by making a pair-wise comparison, from left to right. By default the comparison operator is operator==. For example:

string c1 = "ABCA", c2="ABDA";
'A' == 'A' // yes, go on
'B' == 'B' // yes, go on
'C' == 'D' // no, stop

The comparison can be customized by passing a custom predicate as last parameter.

Now, consider the problem of checking if a string is palindrome. Our loop compares the first and the last character, the second and the second last character, and so on. If it finds a mismatch, it stops. It’s basically std::equal applied to S in one direction and to reverse(S) in the other. We just need to adapt the second range to go from the end of S to the beginning. That’s a job for reverse iterators:

Reverse iterators goes backward from the end of a range. Incrementing a reverse iterator means “going backward”.

There is only one thing left: this solutions now makes N steps, wheareas only N/2 are really needed. We perform redundant checks. For instance:

abacaba
[0, 6]: a
[1, 5]: b
[2, 4]: a
[3, 3]: c // middle point
[4, 2]: a (already checked [2, 4])
[5, 1]: b (already checked [1, 5])
[6, 0]: a (already checked [0, 6])

Taking this consideration into account we get:

std::next returns a copy of the input iterator advanced by N positions (this version does not require random access iterators).

We finally have a one-liner, single-pass and constant space solution.

I apologize if it took a while to get here: not only I introduced some string functions, but I also incrementally approached to the problem showing how simple operations can be written in terms of standard algorithms. This process is precious since it helps to get familiar with the standard. Sometimes it does not make sense to struggle to find an algorithm that fits a piece of code, other times it does. The more you use the standard, the more it will be easy for you to identify these scenarios pretty much automatically.

Don’t worry, in future posts I’ll skip trivial steps.

Now let me raise the bar, just a little bit.

Palindrome index

 

In Competitive Programming many variations of this topic exist. This is an example:

Given a string, determine the index of the character whose removal will make the string a palindrome. Suppose a solution always exists.

For example:

acacba

if we remove the second last character (‘b’), we get a palindrome

This problem – known as palindrome index – can be solved by introducing another useful algorithm that actually works like std::equal but it also returns the first mismatch, if any, instead of a bool. Guess the name of this algorithm…yeah, it’s called std::mismatch.

This problem is quite easy to solve:

  • locate the first char that breaks the “palindromeness” – call that position (mismatch)
  • check if the string was palindrome if the m-th char was removed
  • if so, the palindrome index is m, otherwise it is N – m – 1 (basically, the char “on the other side”)

Since this solution can be implemented in many ways I take advantage of this opportunity to introduce other string operations. First of all, here is how std::mismatch works:

You see that ‘C’ and ‘X’ results in a mismatch. mism is a std::pairmism.first is S1.begin() + 2 and mism.second is S2.begin() + 2. Basically, they point to ‘C’ in the first string and to ‘X’ in the second. Suppose now we need to find this “palindrome index”. Consider as an example:

mism.first points to ‘c’ and mism.second points to ‘b’. Since we know a solution always exists, either of these chars makes S not palindrome. To determine which one, we need to check if S without the mismatch point mism was palindrome or not. For this check, we create a new string from the concatenation of two substrings of S:

  • From the beginning to mism-1, and
  • From mism+1 to the end

Although I don’t like this solution so much, I have browsed others (not only written in C++) on HackerRank and this resulted the most popular. So let me show you my translation into C++ code:

Let me introduce you substr()S.substr(pos, count) returns the substring of S that starts at character position pos and spans count chars (or until the end of the string, whichever comes first) – S[pos, pos + count). If pos + count extends past the end of the string, or if count is std::string::npos, the returned substring is [pos, size()). For example:

substr results in “ell”.

It’s now evident that toCheck consists in the concatenation of S from 0 to diffIdx-1 and from diffIdx + 1 to the end:

acacba -> diffIdx = 1 
toCheck = a + acba = aacba

Only for completeness, another (possibly more efficient) way to obtain toCheck consists in adoperating std::copy:

This solutions works and passes all tests, but I find it annoying to use extra space here.

Suppose we are free to modify our original string: it’s easier (and possibly faster) to remove the candidate iterator by using string::erase:

This avoids both creating extra (sub)strings and allocating extra memory (note that the internal sequence of characters can be relocated into the same block). The final part of the algorithm is similar:

The final cost of this solution is linear.

Now, what if we cannot change the string?

A solution consists in checking two substrings separately. Basically we just need to exclude the mismatch point and then check if the resulting string is palindrome, then we check:

  • From the beginning of S to the mismatch point (excluded) with the corresponding chars on the other side;
  • From one-past the mismatch point to the half of S with the corresponding chars on the other side.

Actually, the first check is already performed when we call mismatch, so we don’t need to repeat it.

To code the second check, just remember the second string goes from diffIt + 1 to the half of the string. So, we just need to correctly advance the iterators:

Let’s see this snippet in detail: next(diffIt) is just diffIt + 1. begin(S) + S.size()/2 is just the half of S. The third iterator, rbegin(S) + diffIdx, is the starting point of the string on the other side. Here is the complete solution:

If you followed my reasoning about positions then it’s just the matter of applying standard algorithms with some care for iterators.

You may complain this code seems tricky, so let me rewrite it in terms of explicit loops:

In the STL-based solution we clearly need to think in terms of iterators. The mismatch part is trivial (actually it could be replace with a call to std::mismatch, as in the STL-based solution), but the calls to std::equal are a little bit more difficult to understand. At the same time, it should be evident that std::equal checks that two ranges are identical. Nothing more. Also, if we replace std::string with another data structure that provides iterators, our code won’t change. Our algorithm is decoupled from the structure of the data.

On the other hand, in the for-based approach the logic is completely hidden inside the iterations and the final check. Moreover, this code depends on the random-access nature of the string.

Judge yourself.

 

Conversions

 

This short section is dedicated to conversions between strings and numeric types. I will start by saying that, in terms of speed, the following functions can be beated given certain assumptions or in particular scenarios. For example, you maybe remember Alexandrescu’s talk about optimization (and here is a descriptive post) where he shows some improvements on string/int conversions. In CC the functions I’ll introduce are generally ok. It can happen that in uncommon challenges it’s required you to take some shortcuts to optimize a conversion, mainly because the domain has some particularities. I’ll talk about domain and constraints in the future.

The STL provides several functions for performing conversions between strings and numeric types. Conversions from numbers to string can be easily obtained since C++11 through a set of new overloads:

A disadvantage of this approach is that we pay a new instance of std::string every time we invoke to_string. Sometimes – especially when many conversions are needed – this approach is cheaper:

Or use vector<char> for allocating the string dynamically.

char_needed is the maximum number of chars needed to represent an int32. This value is basically:

From C++17 we’ll have string_span to easily wrap this array into a string-like object:

Moreover, from C++17 we’ll have string::data() as non-const member, so we’ll be able to write directly into a std::string:

In CC sprintf is good enough, even if sprintf_s (or another secure version) is preferred.

Anyhow, prefer using std::to_string if the challenge allows that.

Conversions in the other direction are a bit more confusing because we have both C++11 functions and C functions. Let me start just by saying that rolling a simple algorithm to convert a string into an unsigned integer is easy, pretty much elegant and interesting to know about:

To convert to an int32 we just need to handle the minus sign:

nxt – ‘0’ is an idiom: if digit is a char in [0-9], digit – ‘0’ results in the corresponding integral value. E.g.:

'1' - '0' = 1 (int)

The inverse operation is simply char(cDigit + ‘0’). E.g.:

char(1 + '0') = '1' (char)

In C++ (as in C) adding an int to a char will result in an int value: for this reason a cast back to char is needed.

With these snippets we are just moving through the ASCII table. ‘1’ – ‘0’ represents how far ‘1’ is from ‘0’, that is 1 position. 1 + ‘0’ is one position after ‘0’, that is ‘1’. With this idea in mind we can easily perform trivial lowercase to uppercase conversions:

// only works if c is lowercase
char(c - 'a' + 'A')

And viceversa:

// only works if c is uppercase
char(c - 'A' + 'a')

Anyhow, as one guy commented on reddit, the ASCII table is designed in such a way just flipping one bit is needed to get the same results:

// assuming c is a letter
char toLower(char c) { return c | 0x20; }
char toUpper(char c) { return c & ~0x20; }

But remember that the C++ standard (from C, in <cctype>already provides functions to convert characters to upper/lower case, to check if one character is upper/lower case, digit, alpha, etc. See here for more details.

In CC, these tricks should be kept on hand. For example, this challenge requires to implement a simple version of the Caesar cipher:

Given a string S of digits [0-9], change S by shifting each digit by K positions to the right.

For example 1289 with K = 3 results in 4512.

We can easily solve this task by applying the tricks we have just learned:

Note I used a range-based for loop even if a standard algorithm could help solve this problem. I don’t introduce it yet though.

Now, let’s see some standard functions to convert strings to numbers. Since C++11 we have ‘sto’ functions (and for unsigned values and floating points) which convert a std::string/std::wstring into numeric values (they support also different basis). Being STL functions, they throw exceptions if something goes wrong: an std::invalid_argument is thrown if no conversion could be performed, std::out_of_range is thrown if the converted value would fall out of the range of the result type or if the underlying function (std::strtol or std::strtoll) sets errno to ERANGE. For example:

This family of functions optionally outputs the number of processed characters:

On the other hand, C functions don’t throw exceptions, instead they return zeros in case of errors. For instance:

That’s enough for now.

Recap for the pragmatic C++ competitive coder:

  • Don’t reinvent containers whenever standard ones fit your needs:
    • Consider std::string to handle a sequence of characters
      • std::string::compare indicates the relation between strings
      • std::string::substr creates substrings
  • Prefer standard algorithms to hand-made for loops:
    • std::copy, to copy the elements in a range to another
    • std::reverse, to reverse the order of the elements in a range
    • std::reverse_copy, to copy the elements in a range to another, but in reverse order
    • std::equal, to know if two ranges are the same
    • std::mismatch, to locate the first mismatch point of two ranges, if any
  • Understand options for converting strings to numbers, and viceversa:
    • std::to_string, to convert numeric values into strings (a new instance of std::string is returned)
    • std::array (std::string in C++17) + C’s sprintf (or equivalent – e.g. secure _s version) when reusing the same space is important
    • std::sto* functions to translate strings into numeric values (remember they throw exceptions)
    • C’s atoi & friends when throwing exceptions is not permitted/feasible
    • Rememeber tricks to convert char digits into ints and viceversa:
      • digitAsChar – ‘0’ = corresponding int
      • char ( digitAsInt + ‘0’ ) = corresponding char