This post concludes my introduction to C++ containers. We’ll meet other standard data structures such as lists, queues and heaps when needed along the way.

Some posts ago, I anticipated that understanding containers is crucial for adoperating standard algorithms effectively. In a few words, the reason is that each container has some special features or it’s particularly indicated for some scenarios. On the other hand, **algorithms work only in terms of iterators**, which completely hide this fact. That’s great for writing generalized code, but it also merits attention because for exploiting a particular property of a container, you generally have to choose the best option yourself.

The only “specialization” that algorithms (may) do is in terms of iterators. Iterators are grouped in categories, which basically distinguish **how iterators can be moved**. For instance, consider std::advance that moves an iterator by N positions. On *random-access *iterators, *std::advance* just adds an offset (e.g. it += N), that is a **constant** operation. On the other hand, on *forward iterators* (basically they can advance only one step at a time)* std::advance* is obliged to call *operator++* N times, a **linear** operation.

Choosing – at compile time – different implementations depending on the nature of the iterators is a typical C++ idiom which works well in many situations (this technique is an application of **tag dispatching**, a classical C++ metaprogramming idiom). However, to exploit the (internal) characteristics of a container, we have to know how the container works, which (particular) member functions it provides and the differences between using the generic standard algorithm X and the specialized member function X.

As an example, I mentioned **std::find** and **std::map::find**. What’s the difference between them? Shortly, *std::find* is an idiomatic **linear search over a range of elements**. It basically goes on until either the target value or the end of the range is reached. Instead, *std::map::find*…Wait, I am not a spoiler! As usual, let me start this discussion through a challenge:

*Some days ago I gave my birthday party and I invited some friends. I want to know which name is the most common among my friends. Or, given a sequence of words, I want to know which one occurs the most.*

In this trivial exercise we need a way to count occurrences of words. For example:

matteo riccardo james guido matteo luca gerri paolo riccardo matteo

*matteo* occurs three times, *riccardo* twice, the others once. We print *matteo*.

Imagine to count the words by incrementing a counter for each of them. Incrementing a counter should be a fast operation. Finally, we’ll just print the string corresponding to the greatest counter.

The most common data structures to do this task is generally known as **associative array**: basically, a collection of unique elements – for some definition of “uniqueness”, which – at least – provides fast lookup time. The most common type of associative container is a **map** (or *dictionary*): a collection of key-value pairs, such that each possible key appears just once. The name “map” resembles the concept of **function** in mathematics: a relation between a domain (keys) and a codomain (values), where each element of the domain is related (*mapped*) to exactly one element of the codomain.

Designing maps is a classic problem of Computer Science, because inserting, removing and looking up these correspondences should be fast. Associative containers are designed to be especially efficient in **accessing** its elements **by key**, as opposed to sequence containers which are more efficient in **accessing** elements **by** **position. **The most straightforward and elementary associative container you can imagine is actually an **array**, where **keys coincide with indexes**. Suppose we want to count the most frequent character of a string of lowercase letters:

*freq* contains the frequencies of each lowercase letters (0 is a, 1 is b, and so on). *freq[c – ‘a’]* results in the distance between the char c and the first letter of the alphabet (‘a’), so is the corresponding index in the freq array (we already saw this idiom in the previous post). To get the most frequent char we just retrieve the iterator (a pointer, here) to the element with highest frequency (std::max_element returns such iterator), then we calculate the distance from the beginning of *freq* and finally we transform this index back to the corresponding letter.

Note that in this case lookup costs O(1). Although an array shows many limitations (e.g. cannot be enlarged, keys are just numbers lying in a certain range), we’ll see later in this series that (not only) in competitive programming these “frequency tables” are extremely precious.

A plain array does not help with our challenge: how to map instances of *std::string*?

In Computer Science many approaches to the “dictionary problem” exist, but the most common fall into a couple of implementations: **hashing** and **sorting**. With hashing, the idea is to – roughly speaking – map keys to integral values that index a table (array). The trio “insert, lookup, remove” has average constant time, and linear in the worst case. Clearly this depends on several factors, but explaining hash tables is beyond the target of this post.

The other common implementation **keeps elements in order** to exploit binary search for locating an element in logarithmic time. Often trees (commonly self-balancing binary search trees) are employed to maintain this ordering relation among the elements and also for having logarithmic performance on insert, lookup and removal operations.

The C++ STL provides both the hash-based (from C++11) and the sort-based implementations, providing also variants for non-unique keys (*multi*). From now I’ll refer to sort-based implementation as tree-based because this is the data structure used by the major C++ standard library implementers.

There is more: STL provides two kinds of associative containers: **maps** and **sets**. A **map** implements a dictionary – collection of key-value pairs. A **set** is a container with unique keys. We’ll discover that they provide pretty much the same operations and that under the hood they share a common implementation (clearly, either hash-based or tree-based). Also, a hash container and a tree container can be used **almost** interchangeably (from an interface point of view). For this reason, I’ll focus on the most used associative container: a tree-based map. We’ll discuss later about some general differences.

Please, give a warm welcome to the most famous C++ associative container: std::map. It’s a sorted associative container that contains key-value pairs with **unique** keys. Here is a list of useful facts about *std::map*:

- Search, removal, and insertion operations have
**logarithmic**time complexity; - elements are kept
**in order**, according to a customizable comparator – part of the type and specified as a template argument (*std::less*by default – actually the type is different since C++17, read on for more details); - iterators are
**bidirectional**(pay attention that increments/decrements by 1 are “amortized” constant, whereas by N are**linear**); - each map element is an instance of
**std::pair<const Key, Value>**.

The latter point means that we are **not permitted to change keys** (because it would imply reordering). Eventually, you can get the entry, remove it from the map, update the key, and then reinsert it.

Ordered associative containers use only a single comparison function, that establishes the concept of **equivalence: **Equivalence is based on the relative ordering of object values in a sorted range.

Two objects have equivalent values with respect to the sort order used by an associative container c if neither **precedes** the other in c’s sort order:

In the general case, the comparison function for an associative container isn’t **operator<** or even **std::less**, it’s a user-defined predicate (available through **std::key_comp** member function).

An important observation: in C++, every time you have to provide a “less” comparison function, the standard assumes you implement a strict weak ordering.

Let’s use *std::map* to solve the challenge:

How it works: as far as we read a string we increment the corresponding counter by 1. **map::operator[]** returns a reference to the value that is mapped to a key that is **equivalent** to a given key, performing an insertion if such key does not already exist. At the end of the loop, freq is basically a *histogram* of words: each word is associated with the number of times it occurs. Then we just need to iterate on the histogram to figure out which word occurs the most. We use **std::max_element:** a one-pass standard algorithm that returns the greatest element of a range, according to some comparison function (that is std::less be default, a standard function object which – unless specialized – invokes operator< on the objects to compare).

Given that map entries are pairs, we don’t use pair’s *operator< *because it compares lexicographically (it compares the first elements and **only if** they are equivalent, compares the second elements). For instance:

"marco", 1 "andrea", 5

according to* pair::operator<*, “marco” is greater than “andrea” then it will result the max_element. Instead, we have to consider only the second value of the pairs. Thus we use:

If your compiler does not support generic lambdas (*auto* parameters), explicitly declare *const pair<const string, int>&*. **const string** is not fussiness: if you only type *string*, you get an extra subtle copy that converts from *pair< const string, int>* to

*pair<string, int>*. Bear in mind that entries of map are

*pair<*.

**const**K, V>Suppose now we have an extra requirement: *if two or more names occur the most, print the lexicographically least*. Got it?

matteo riccardo matteo luca riccardo

In this case, “matteo” and “riccardo” occur twice, but we print “matteo” because **lexicographically lower** than “riccardo”.

How to accommodate our solution to fit this extra requirement? There’s an interesting effect of using a sorted container: when we forward iterate over the elements, we go from lexicographically *lower* strings to lexicographically *greater* strings. This property combined with *max_element* automatically supports this extra requirement. In fact, using *max_element*, *if more than one element in the range is equivalent to the greatest element, it returns the iterator to the first such element*. Since the first such element is (lexicographically) the lowest, we already fullfill the new requirement!

Guess if we want to print the lexicographically **greatest** string…it’s just the matter of iterating *backward*! Having clear in mind these properties is a great thing. In this series we’ll meet many others.

Let’s continue our journey through *std::map*. Suppose that part of another challenge is to make our own contacts application. We are given some operations to perform. Two of them consist in adding and finding. For the add operation, we will have to add a new contact if it does not exist, or to update it otherwise. For the find operation, we will have to print the number of contacts who have a name starting with that **partial** name. For example, suppose our list contains:

marco, matteo, luca

**find(“ma”)** will result in 2 (**ma**rco and **ma**tteo).

The best data structure for this kind of task is probably a trie, but the pragmatic competitive programmer knows that in several cases *std::map* suffices. We can take advantage of the fact that a map keeps things in order. The challenge is also an excuse to show you** how to insert into a std::map**, since there are different ways to achieve this task.

We have two cases:

- insert/update an entry
- count names starting with some prefix

Our map looks like:

map<string, string> contacts; // let's suppose the contact number to be a string as well

The first one has been discussed a lot in many books and blogs, so much that C++17 contains an idiomatic function insert_or_assign. In a few words, to efficiently insert or assign into a map, we first look for the contact in the structure and in case of a match we update the corresponding entry; otherwise we insert it.

This is the simplest way to do that:

contacts[toInsertName] = toInsertNumber;

You may ask: why in C++17 do we bother with a specific function for this one-liner? Because we are C++. Because that one-liner is succinct, but it hides a subtle cost when the entry is not in the map: **default construction + assignment**.

As we have seen, *contacts[toInsertName]* performs a lookup and returns either the corresponding entry or a brand-new one. In the latter case a default construction happens. Then, *= toInsertNumber* does an assignment into the just created string. Is that expensive? Maybe not in this case, but it may be in general, and this kind of things matters in C++.

Here is more enlightening example: suppose we have a cache implemented with *std::map*:

You don’t want to update anything if key is already there. Rather, you first look for the value corresponding to key and only if it’s not there you invoke the lambda to calculate it. Can you solve it by using *operator[]*? Maybe (it depends on the value type), but it’s not effective nor even efficient. Often *std::map* novices come up with this code:

**map::find** locates the element with key equivalent to a certain key, or *map::end* if such element does not exist. **map::emplace** inserts a new element into the container constructed in-place with the given args if there is no element with the key in the container. emplace returns a *pair<iterator, bool>* consisting of an iterator to the inserted element, or the already-existing element if no insertion happened, and a bool denoting whether the insertion took place. True for “insertion”, false for “no insertion”.

This code implements what I call **the infamous** **double-lookup anti-pattern**. Basically, both find and emplace search the map. It would be better to – somehow – take advantage of the result of the first lookup to eventually insert a new entry into the map. Is that possible?

Sure.

This is the idea: if the key we look for is not in the map, the position it should be inserted in is exactly the position where the first element that does not compare less than the key is located. Let me explain. Consider this sorted list of numbers:

1 2 3 5 6 7 8

If we want to insert 4, where should it be inserted? At the position of the first number that does not compare less than 4. In other words, the first element greater or equal to 4. That is 5.

This is nothing more than what mathematics defines as **lower bound**. *std::map* provides lower_bound, and for consistency the STL defines std::lower_bound to perform a similar operation on **sorted ranges.** As a generic algorithm, *lower_bound* is a **binary search**.

Here is what the idiom looks like:

Since *lower_bound* returns the first element that does not compare less than key, it can be key itself or not. The former case is handled by the right hand side of the if condition: *data.key_comp()* returns the comparator used to order the elements (*operator<* by default). Since **two equal elements do not compare less**, **this check has to be false**. Otherwise, key is less than *lb->first* because lb points to one element **past** key (or to the end, if such element does not exist). Makes sense?

*emplace_hint* is like emplace, however it also takes an extra iterator to “suggest” where the new element should be constructed an placed into the tree. If correct, the hint makes the insertion O(1). *map::insert* has an overload taking this extra iterator too, resulting in a similar behavior (but remember that insert takes an already built **pair**).

A bit of simplification for pragmatic competitive programmers is when you do not use custom comparators: generally you may use operator== for checking equality:

if (lb != end(data) && key==lb->first)

Ha, C++17 has this insert_or_assign that should search the map and use Value’s operator= in case the entry has to be updated, or insert it otherwise (move semantics is handled as well). There is also another reason why *insert_or_assign* is important, but I’ll spend a few words about that leater, when *unordered_map* will be introduced.

Since I introduced *lower_bound*, I must say there is also an upper_bound: it locates the first element which compares **greater** than the searched one. For example:

1 2 3 4 5

**upper_bound(3)** locates 4 (position 3). What’s the point? Let’s turn our list into:

1 2 2 2 3 4

**lower_bound(2)** locates…2 (position 1), whereas **upper_bound(2)** results in position 4 (element 3). **Combining** lower_bound(2) with upper_bound(2) **we find the range of elements** **equivalent** to 2! Range is in C++ speaking (upper_bound(2) is one-past the last 2). This is extremely useful in *multimap* and *multiset* and indeed a function called **equal_range** which returns the combination of lower and upper bounds exists. *equal_range* is provided by all the associative containers (in unordered_* is only for interface interchangeability reason) and by the STL as an algorithm on sorted sequences – std::equal_range.

We’ll see applications of these algorithms in the future.

So, what about our challenge? Suppose it’s ok to use *operator[]* for inserting/updating elements – in this case string constructor + operator= are not a problem. We need to count how many elements start with a certain prefix. Is that easy? Sure, we have a sorted container, remember? Listen my idea: If we call lower_bound(P) we get either end, the first element equal to P or …suspense… the first element **starting** with P. Since *lower_bound* returns the position to the first element which does not compare less than P, the first element that looks like P$something is what we get if such element exists.

Now what? I’m sure you already did this step in your mind. We just iterate until we find either the end or the first element that *does not start* with P. From the previous post we already know how to verify if a string begins as another:

We are paying both a prefix comparison and a linear iteration from lb (write it as *O(|P|*K)*, where *|P|* is the length of the prefix P, and *K* is the number of strings starting with P). Advanced data structures that more efficiently deal with these – possible – problems exist, but they are beyond the scope of this post. In a moment I’ll do another observation about this code.

I realized that the post is becoming longer than I imagined, so let me recap what we have met so far:

- How to insert:
*operator[]*+*operator=;*- infamous double-lookup anti-pattern (e.g.
*find*+*insert*); - efficient “get or insert”:
*lower_bound*+*emplace_hint*/*insert*; - efficient “get or assign”/”insert or assign”:
*lower_bound + Value’s operator=;* - C++17
*insert_or_assign*.

- How to lookup:
*find*(aka: exact match);*lower_bound/upper_bound*(aka: tell me more information about what I’m looking for);*operator[]*(aka: give me always one instance – eventually new, if can be default-constructed);- bonus:
*at*(aka: throw an exception if the element is not there); - bonus:
*count*(aka: how many elements equivalent to K exist? 0 or 1 on non-multi containers).

- Taking advantage of sorting. For instance:
- we combined max_element’s “stability” – hold the first max found – with map’s order to get the max element that is also the lexicographically least (or the greatest, by iterating backward);
- we learnt how to locate and iterate on elements which start with a given prefix.

**To erase an element**, you just call **map::erase(first, last)**, or **erase(iterator)**, or **erase(key)**. More interesting is how to implement *erase_if*, an idiom simplified by C++11 because now erase returns the iterator to the last removed element. This idiom can be applied to every associative container:

#### Design choices

So, we still have an open question, right? **What’s the difference between std::find and map::find?**

You know, *std::find* is a linear search on a range. Now, I hope you understood that *map::find* is a logarithmic search and it uses a notion of **equivalence**, instead of **equality** to search elements.

Actually, there is a bit more.

Let’s raise the bar: **what’s the difference between std::lower_bound and map::lower_bound?** First of all: is it possible to use

*std::lower_bound*with

*map::iterator*?

*std::lower_bound*requires simple forward iterators thus the answer is yes. So what?

Basically, *std::lower_bound* – just like all the other algorithms – works only in terms of iterators; on the other hand *map::lower_bound* – just like all the other map’s algorithms – exploits map’s internal details, performing better. For example, *std::lower_bound* uses *std::advance* to move iterators and you know that advancing a *map::iterator* results in **linear** time. Instead, *map::lower_bound* does a tree traversal (an internal detail), avoiding such overhead.

Although exceptions exist, the rule of thumb is: **if a container provides its own version of a standard algorithm, don’t use the standard one**.

I tell you a story about this rule. Remember that the comparator is part of the static interface of an associative container (it’s part of the type), unlike what happens in other languages like C# where the comparator is decoupled from the type and can be dynamically injected:

Dictionary<int, string> dict = new Dictionary<int, strint>(StringComparer.OrdinalIgnoreCase);

Some time ago I had a discussion with a collegue about this aspect of C++: he was using a *map<string, SomeValueType>* to store some information, but he was using it only for case-insensitive searches by calling *std::find* (the linear one). That code made my hair stand on end: “why not using a case-insensitive comparator and making this choice part of the map type?” – I asked. He complained: “C++ is breaking incapsulation: I won’t commit my map to a specific comparator. My clients mustn’t know how elements are sorted”.

At first blush I got annoyed, but after a couple of emails I understood his point (it was about the architecture of the system he designed, rather than about C++ itself). At the end of a quite long – and certainly interesting – discussion, I come up with a solution to – more or less – save both ways: I introduced a new type which inherited from std::map and allowed to inject a comparator at construction time, like in C# (using *less<>* by default). I don’t recommend this approach (for example, because the comparator can’t be inlined and every comparison costs a virtual call – it uses *std::function* under the hood), but at least we are not linearly searching the map anymore…

This story is just to say: **use containers effectively**. Don’t go against the language. *std::map* is not for linear search, as *std::vector* is not for push elements at front.

I’d like mentioning a fact about the cache. *std::map* is generally considered a cache-unfriendly container. Ok we can use allocators, but try to understand me: by default we are just pushing tree nodes into the heap, moving through indirections, etc. On the other hand, we are all generally happy with contiguous storage, like vectors or arrays, aren’t we? Is that possible to easily design cache-friendly associative containers? It is, when the most common operation is lookup. After all, what about **using binary search on a sorted vector**? That’s the basic idea. Libraries like boost (flat_map) provide this kind of container.

As my friend *Davide Di Gennaro* pointed out, given the triplet of operations *(insert, lookup, erase)*, the best complexity you get for a general-purpose usage is *O(logN, logN, logN)*. However, you can amortize one operation, sacrificing the others. For example, if you do many lookups but a few insertions, *flat_map* performs *O(N, logN, N)*, but the middle factor is much lower (e.g. it’s cache-friendly).

Consider this example: we want to improve our algorithm to count our contact names which start with a certain prefix. This time, we use a **sorted vector** and **std::lower_bound** to find the first string starting with the prefix P. In the previous code we just iterated through the elements until a mismatch was found (a string not starting with P).

This time, we try thinking differently: say we have found the position of the first string starting with P. Call that position “lb” (lower bound). Now, it should be clear that we must find the next string not starting with P. Define this string to be the first **greater** than lb, provided that “greater” means “not starting with P”. At this point, do you remember which algorithm finds the first element greater than another, in a sorted sequence?

*std::upper_bound.*

So we can employ *upper_bound*, using a particular predicate, and we expect a logarithmic complexity. What will this predicate look like? Suppose we count “ma” prefixes. Strings starting with “ma” are all **equivalent**, aren’t they? So, we can use a predicate which compares only “ma” (P) with the current string. When the current string starts with P then it’s equivalent to P and the predicate will return false. Otherwise, it will return true. After all, starting the search from lower_bound’s result, we can get either *ma$something* or *$different-string:*

Some clarifications: the first parameter of the lambda is **always** the value (or a conversion) of the upper bound to search for in the range (P, in our case). This is a guarantee to remember. The lambda returns false only when the current string is not starting with P (s1, inside the lambda body). *std::upper_bound* will handle the rest.

Why didn’t we use this approach directly on *std::map*? As I said at the beginning of this section, standard algorithms works in terms of iterators. Using *std::upper_bound* on *std::map* would result in *logN * N*. That additional N factor is caused by advancing iterators, that is linear on map::iterators. On the other hand, sorted vector provides random access iterators and so the final cost of counting prefixes is O (|P|*logN), given that we have sacrificed insert and removal operations (at most, linear).

#### Recent additions

C++14 and C++17 add new powerful tools to associative containers:

**Heterogeneous lookup**: you are no longer required to pass the exact same object type as the key or element in member functions such as*find*and*lower_bound*. Instead, you can pass any type for which an overloaded*operator<*is defined that enables comparison to the key type. Heterogenous lookup is enabled on an opt-in basis when you specify the**std::less<>**or**std::greater<>**“diamond functor” comparator when declaring the container variable, like:*map<SomeKey, SomeValue, less<>>*. See here for an example. This works only for sorted associative containers.

This feature is also kwnown by some people as “transparent comparators”, because comparators that “support” this feature have to define a type**is_transparent = std::true_type**. This is basically required for backwards-compatibility with existing code (see for example here for a more detailed explanation). A terrific usage of this feature is, for example, searching on a map<string, Val> by passing a*const char**(no conversion to string will be performed).

**try_emplace**and**insert_or_assign**, as an improvement of the insertion interface for unique-keys maps (more details here).

**Ordered By Default**: Mostly for both design and ABI compatibility reasons, ordered associative containers now specify as a default compare functor the new**std::default_orderer_t**, (that behaves like**std::less**– more details here).

**Splicing maps and sets**: (following words by Herb Sutter) you will now be able to directly move internal nodes from one node-based container directly into another container of the same type (differing at most in the comparator template parameter), either one node at a time or the whole container. Why is that important? Because it guarantees no memory allocation overhead, no copying of keys or values, and even no exceptions if the container’s comparison function doesn’t throw. The mechanics are provided by new functions*.extract*and*.move*, and corresponding new*.insert*overloads (approved proposal here).

**Structure bindings**: we should be able to iterate on maps this way:

#### A few words about unordered_map

We end this long post by spending some words about **unordered** associative containers. I don’t show you multi* containers because they are more or less the same as the corresponding non-multi ones. Clearly, they support multiple instances of the same key and, as I said, *equal_range* plays an important role for lookups. I’ll probably spend more time on multi containers when needed in future challenges.

After this section we’ll see a final example using *unordered_set*.

As *std::map* does, **std::unordered_map** contains key-value pairs with unique keys. Unlike *std::map*, internally the elements **are not sorted** in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the **hash** of its key. This allows fast access to individual elements, since once the hash is computed, it refers to the exact bucket the element is placed into. For this reason, search, insertion, and removal of elements have **average** **constant-time complexity**. Due to the nature of hash, it’s hard/impossible to know in advance how many collisions you will get with your hash function. This can add an element of unpredictability to the performance of a hash table. For this reason we use terms like “average”, “amortized” or “statistically” constant-time when referring to operations of a hash container.

This is not a discussion on hash tables, so let me introduce some C++ things:

- STL provides a default
**std::hash**template to calculate hash of standard types; *std::hash*can be eventually specialized for your types (or you can specify your own functor);- when a collision happens, an “equality” functor is used to determine if two elements in the same bucket are different (
**std::equal_to**by default); - it’s possible to iterate through the elements of a bucket;
- some hash-specific functions are provided (like
*load_factor*,*rehash*and*reserve*); *unordered_map*provides**almost all**the the functions of std::map.

The latter point simplify our life to interchange *std::map* with *std::unordered_map*. There are two important things to say about this fact: 1) *lower_bound* and *upper_bound* are not provided, however *equal_range* is; 2) **passing hints to unordered_ containers insertion functions is not really useful** – actually it is to make the insertion “exit quickly”.

We know that on ordered associative containers, conditional insertion with *lower_bound* is the best way of performing an “insert or update”. So what? How can we say that ordered/unordered containers are more or less interchangeable if we miss *lower_bound*/*upper_bound*? We may apply *equal_range*:

This idiom is equivalent to the one using lower_bound (both semantically and from a performance point of view), plus it works on unordered maps.

Note that in C++17, **try_emplace** and **insert_or_assign** will dramatically improve the usability of unordered associative containers that will efficiently handle the case when we need to first perform a lookup and eventually insert a new element if that element is not present (first of all, the hash value won’t be recalculated). That’s the real value of such additions: not only using *insert_or_assign* is more efficient, but also clearer and truly interchangeable.

#### Tree or Hash?

There are some general rules/facts to take into account when in doubt between tree-based or hash-based containers. They are general, this means – as always – that when **really** in doubt you can start with one, profile your code, change to the other (again, thanks to interface compatibility), profile again, and compare.

By the way, here are some facts for the pragmatic competitive coder:

- on average,
**hash-based**lookup is**faster**; - generally,
**hash-based**containers occupy**more memory**(e.g. big array) than tree-based ones (e.g. nodes and pointers); **tree-based**containers keep elements in**order**, is that feature important? (e.g. prefix lookups, get top elements, etc);

#### An application of unordered_set

Since I introduced **std::unordered_map**…let’s do a challenge involving **unordered_set** 🙂 Jokes apart, this long post hosted mostly maps, I’d like concluding with set, a really helpful and minimal associative container that we will meet again in the future.

That’s the challenge:

*Given N unique integers, count the number of pairs of integers whose difference is K.*

For example, for this sequence:

1 5 3 4 2

With K = 2, the answer is 3 (we have three pairs whose difference is 2: 4-2, 3-1, 5-3).

The trivial solution to this problem has **quadratic** time complexity: we enumerate all the pairs and we increment a counter when the difference is K.

The way to tackle this problem is to convert the problem space from one in which we consider pairs to a search for a single value. The i-th element *A *contributes to the final count only if *A + K* is in the sequence. For instance:

1 5 3 4 2

With K = 2. 1 contributes because 1 + 2 = 3 is in the list. Likewise, 3 is fine because 3 + 2 = 5 is in the list. And the same for 2, because we spot 2 + 2 = 4.

We can then store the input into an *unordered_set* (on average, constant time lookup), iterate on the elements and for each value *A* search for *A + K*:

Some sugar: std::count_if makes it clear that we are counting how many elements satisfy a predicate. Our predicate is true when **currElem + K exists in the set**: we use **unordered_set::count(A)** to get the number of elements equal to A (either 0 or 1 since we use a non-multi set). As an idiom, on non-multi associative containers, *container::count(Key)* gives 1 (say, true) if Key exists, 0 (say, false) otherwise.

On average, this solution has linear time complexity.

Another approach that uses no extra space and that involves sorting exists. Can you find it?

That’s enough for now.

Recap for the pragmatic C++ competitive coder:

- Don’t reinvent containers whenever standard ones fit your needs. Consider STL associative containers:
**std::map**,**std::set**,**std::multimap**and**std::multiset**are sorted, generally implemented as self-balancing binary-search trees;**std::unordered_map**,**std::unordered_set**,**std::unordered_multimap**and**std::unordered_multiset**are not sorted, imlemented as hash tables;

- Understand idioms to adoperate STL associative containers effectively:
- Does an element with key equivalent to K exist? Use
**count(K)**; - Where is the element with key equivalent to K? Use
**find(K)**; - Where should the element with key equivalent to K be inserted? Use
**lower_bound(K)**on sorted containers; - Insert a new element: use
**emplace**,**insert**; - Insert a new element, knowing where: use
**emplace_hint**,**insert**(only sorted containers take advices effectively, unordered ones are presumptuous!); - Insert or update an element:
**operator[] + operator=**, (C++17)**insert_or_assign**,**equal_range + hint insertion**(this is also for interface compatibility between ordered/unordered),**lower_bound + hint insertion**(only on sorted containers); - Get the element corresponding to key equivalent to K, or default if not present: use
**operator[K]**; - Get the element corresponding to key equivalent to K, or exception if not present: use
**at(K)**; - Erase elements: use
**erase(K)**,**erase(first, last)**,**erase(it).**

- Does an element with key equivalent to K exist? Use
- Understand the difference between containers member functions and STL generic algorithms. For example:
**std::find**anddo different searches, using different criteria;*$associative_container::*find**std::lower_bound**anddo the same, but the former performs worse than the member function because the latter works in terms of the container internal details and its structure.*$sorted_container**::*lower_bound

- Exploit standard algorithms properties. For example:
**std::max_element**and**std::min_element**are “stable”: max/min returned is always the**first**one found.

*Prefer*standard algorithms to hand-made for loops:**std::max_element/min_element**, to get the**first**biggest/smallest element in a range;**std::count/count_if**, to count how many elements satisfy specific criteria;**std::find/find_if**, to find the first element which satisfies specific criteria;**std::lower_bound**,**std::upper_bound**,**std::equal_range**, to find the “bounds” of an element within a sorted range.

Great article – I have saved it for reference and will recommend it to others.

One small thing which I don’t think you mentioned but ties into the sorted containers is the ‘ordered by default’ proposal by Alisdair Meredith that was accepted into C++17. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0181r0.html

It provides a solution where there are types you want to store in sorted container but do not have a natural operator <, like a GPS position or a complex number. However, you can define an arbitrary but natural sorting order – it's just not "less than".

Hi David, thanks a lot for your comment, I really appreciate your feedback.

About the “ordered by default” proposal (actually it’s been approved for C++17) I just mentioned it shortly in the section “Recent additions”.

>Another approach that uses no extra space and that involves sorting exists. Can you find it?

auto sortedVec = values; // copy

std::sort(sortedVec.begin(), sortedVec.end());

size_t matches = {};

for (const auto& element : sortedVec)

{

K key = element + difference;

auto match = std::lower_bound(sortedVec.begin(), sortedVec.end(), key);

if (match != sortedVec.end() && *match == key)

++matches;

}

Roughly 2-3x faster than the unordered_set soultion, and could easily paralleled.

Correct Lukas (actually, for this challenge you don’t need to copy the vector).

I’m not sure about the factor you say, it depends on many things. For example under 1 million of elements both the implementations take the same time to find the correct solution.

About the parallelization, we can easily parallelize the other solution too (std::accumulate is just a reduction – in C++17 we could use std::reduce, that is similar to accumulate, except out of order).

Using my test setup on my hardware using Clang 3.7 the result was reproducible both in debug as well as in release configurations and fairly consistent. Here an image of the release result, with int as key type, the values container holds a set of unique random values in range 0-testSize*3: http://imgur.com/dXUh82i

Makes sense, thanks for sharing!

Please, give also a try on HackerRank:

https://www.hackerrank.com/challenges/pairs

Here I see identical times (environment details here: https://www.hackerrank.com/environment).

It would be nice if std::unordered_map supported heterogeneous lookup from std::string_view and const char*.

Yes, it would be really useful.

Basically nobody made such a proposal.

[…] C++ in Competitive Programming: associative containers […]

better use std::equal_range: really tests for equivalence as it should, and not for equality. more benefits of the below solution: key value is only used in one place. range is adapted on each run, and not searched from the beginning over and over again. when key reaches end of values, search is halted, because there are no more possible values to be found anyways, could save many iterations if offset is a big value.

std::sort(values.begin(), values.end());

size_t matches = 0u;

auto iter = values.cbegin();

while ()

{

const auto match = std::equal_range(iter, values.cend(), *iter + offset);

if (match.first != match.second)

++matches;

else if ( match.first = values.cend() )

break;

++iter;

}

damm, put in a typo… the line should read:

else if ( match.second = values.cend() )

meaning if the found upper bound is end of container there are no more greater values than the last and we can stop searching