## C++ in Competitive Programming: warmup

Posted: April 14, 2016 in Competitive Programming

One of the first challenges in HackerRank’s “Warmup” section is probably the “Hello World” of learning arrays in any language: calculating the sum of a sequence of elements. Although this exercise is trivial, I’ll tackle it to break the ice and show you a few concepts that lay the groundwork for more complicated challenges.

I’m assuming you are already familiar with concepts like iterators, containers and algorithms. Most of the time I’ll give hints for using these C++ tools effectively in Competitive Programming.

Here is the problem specification: You are given an array of integers of size N. Can you find the sum of the elements in the array? It’s guaranteed the sum won’t overflow the int32 representation.

First of all, we need an “array of size N”, where N is given at runtime. The C++ STL (Standard Template Library) provides many useful and cleverly designed data structures (containers) that we don’t need to reinvent. Sometimes more complicated challenges require us to write containers from scratch: advanced exercises call for less common data structures that are too specialized to be included in the STL. We’ll deal with some examples in the future.

That is not the case here. The primary lesson of this post is: don’t reinvent the wheel. Many times standard containers fit your needs, especially the simplest one: std::vector, basically a dynamic sequence of contiguous elements.

For the purpose of this basic post, here is a list of important things to remember about std::vector:

• it’s guaranteed to store elements contiguously, so our cache will love it;
• elements can be accessed through iterators, using offsets on regular pointers to elements, using the subscript operator (e.g. v[index]) and with convenient member functions (e.g. at, front, back);
• it manages its size automatically: it can grow as needed. The real capacity of the vector is usually different from its length (size, in STL parlance);
• enlarging that capacity can be done explicitly with the reserve member function, which is the standard way to politely tell the vector: “get ready to accommodate N elements”;
• adding a new element at the end of the vector (push_back/emplace_back) does not cause reallocation as long as the internal storage can accommodate the extra element (that is: vector.size() + 1 <= vector.capacity());
• on the other hand, adding (not overwriting) an entry at any other position requires shifting the elements that follow it (possibly within the same block of memory, if the capacity allows), since contiguity has to be preserved;
• the previous point means that inserting an element at the end is generally faster than inserting it at any other position (for this reason std::vector provides push_back, emplace_back and pop_back member functions);
• knowing the number of elements to store in advance is information that can be exploited by applying the synergic duo reserve + push_back (or emplace_back).

The last point leads to an important pattern: inserting at the end is O(1) as long as the vector capacity can accommodate the extra element, that is vector.size() + 1 <= vector.capacity(). You may ask: why not enlarge the vector first and then just assign values? We can do that by calling resize:

resize enlarges the vector up to N elements. The new elements must be initialized to some value, or to the default one, as in this case. This additional work does not matter in this challenge, but in general initialization may cause some overhead (read, for example, these thoughts by Thomas Young). As a reader pointed out on reddit, push_back hides branching logic that has a cost of its own. For this reason he suggests that two sequential passes over the data (which is contiguous) may be faster. I think this can be true, especially for small data, but the classical recommendation is to profile your code when such questions arise. In my opinion, getting into the habit of using reserve + *_back is better and potentially faster in the general case.

The heart of the matter is: need a dynamic array? Consider std::vector. In competitive programming std::vector is, 99% of the time, the best replacement for a dynamic C-like array (e.g. T* or T**). The remaining 1% is due to more advanced challenges requiring us to design different kinds of dynamic arrays that break some of std::vector’s guarantees to gain some domain-specific performance. Replacing std::vector with custom optimized containers is more common in real-life code (to get an idea, have a look for example here, here and here).

If N were given at compile time, a static array could be better (as long as N is small, say less than one million, otherwise we get a stack overflow). For this purpose, std::array is our friend: basically a richer replacement for T[]. “Richer replacement” means that std::array is the STL adaptation of a C array. It provides the member functions we generally find in STL containers, like .size(), .at(), .begin()/.end(). std::array combines the performance and accessibility of a C-style array with the benefits of a standard container. Just use it.
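As a small sketch of what “richer replacement” means in practice, assuming a size known at compile time (the constant kSize, its value and the function name are arbitrary choices for the example):

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Assumed for illustration: the size is fixed at compile time.
constexpr std::size_t kSize = 5;

int sum_fixed_size()
{
    std::array<int, kSize> values{{1, 2, 3, 4, 5}};   // no dynamic allocation

    values.at(0) = 10;   // bounds-checked access, throws std::out_of_range on a bad index
    values[1] = 20;      // unchecked access, just like a C array

    // begin()/end() make the array usable with any standard algorithm
    return std::accumulate(values.begin(), values.end(), 0);
}
```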

Since much information is stated in the problem’s requirements, we’ll see that static-sized arrays are extraordinarily useful in competitive programming. In the future I’ll spend some time on this topic.

Now, let’s look at my snippet again: can we do better? Of course we can (from my previous post):

At this point we have the vector filled and we need to compute the sum of the elements. A hand-made for loop could do that:
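Such a loop might look like the following sketch (the function name sum_with_loop is made up for illustration):

```cpp
#include <vector>

// Hypothetical helper: sums the elements with a plain range-based for loop.
int sum_with_loop(const std::vector<int>& v)
{
    int sum = 0;
    for (int element : v)
        sum += element;
    return sum;
}
```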

Can we do better?

Sure, by using the first numeric algorithm of this series: ladies and gentlemen, please welcome std::accumulate:
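A minimal sketch of the same sum written with the algorithm (the wrapper name sum_with_accumulate is only there to make the example self-contained):

```cpp
#include <numeric>
#include <vector>

// Hypothetical wrapper around the three-parameter overload of std::accumulate.
int sum_with_accumulate(const std::vector<int>& v)
{
    // 0 is the initial value; its type (int) is also the type of the result
    return std::accumulate(v.begin(), v.end(), 0);
}
```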

One of the most important loops in programming is one that adds a range of things together. This abstraction is known as reduction or fold. In C++, reduction is mimicked by std::accumulate. Basically, it accumulates elements from left to right by applying a binary operation:
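To make the left-to-right order visible, here is a sketch that builds a string describing how the calls nest instead of computing a number. The function name show_left_fold and the "init" placeholder are invented for the example.

```cpp
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: renders the nesting produced by a left fold.
std::string show_left_fold(const std::vector<int>& v)
{
    return std::accumulate(v.begin(), v.end(), std::string("init"),
        [](const std::string& accumulated, int current) {
            return "(" + accumulated + " + " + std::to_string(current) + ")";
        });
}
```

For the sequence 1, 2, 3 this yields "(((init + 1) + 2) + 3)": the initial value is combined with the first element, then that result with the second, and so on.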

accumulate with three parameters uses operator+ as the binary operation.

std::accumulate guarantees:

• the order of evaluation is left to right (known also as left fold), and
• the time complexity is linear in the length of the range, and
• if the range is empty, the initial value is returned (that’s why we have to provide one).

The reduction function appears in this idiomatic form:
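As a sketch of that form, the lambda takes the value accumulated so far (of the result type) and the current element (of the element type), and returns the new accumulated value. In this made-up example the result type is double and the element type is int:

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: arithmetic mean of a vector of ints (0.0 if empty).
double average(const std::vector<int>& v)
{
    if (v.empty())
        return 0.0;

    // ResultType = double (deduced from the initial value 0.0), ElementType = int
    const double total = std::accumulate(v.begin(), v.end(), 0.0,
        [](double accumulated, int current) { return accumulated + current; });

    return total / static_cast<double>(v.size());
}
```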

So the result type may be different from the underlying type of the range (ElementType). For example, given a vector of const char*, here is a simple way to calculate the length of the longest string by using std::accumulate (credits to Davide Di Gennaro for having suggested this example):
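A possible way to write it, as a sketch (the function name longest_length is an assumption of this example):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <numeric>
#include <vector>

// Hypothetical helper: length of the longest C string in `words` (0 if empty).
std::size_t longest_length(const std::vector<const char*>& words)
{
    // The range holds const char*, while the result is a std::size_t
    return std::accumulate(words.begin(), words.end(), std::size_t{0},
        [](std::size_t longest, const char* word) {
            return std::max(longest, std::strlen(word));
        });
}
```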

To accumulate from the right (known as right fold) we just use reverse iterators (rbegin and rend instead of begin and end).

Right fold makes some difference – for example – when a non-associative function (e.g. subtraction) is used.
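Here is a sketch comparing the two folds with subtraction. One detail is an assumption of this example rather than something stated above: std::accumulate always passes the accumulated value as the first argument, so to get the textbook right fold the lambda swaps the operands by hand.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Left fold: ((init - v0) - v1) - v2 ...
int left_fold_minus(const std::vector<int>& v)
{
    return std::accumulate(v.begin(), v.end(), 0, std::minus<>{});
}

// Right fold: v0 - (v1 - (v2 - init)), walking the range backwards
int right_fold_minus(const std::vector<int>& v)
{
    return std::accumulate(v.rbegin(), v.rend(), 0,
        [](int accumulated, int current) { return current - accumulated; });
}
```

For the sequence 1, 2, 3 the left fold gives ((0 - 1) - 2) - 3 = -6, while the right fold gives 1 - (2 - (3 - 0)) = 2.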

In functional programming fold is very generic and can be used to implement other operations. In this great article, Paul Keir describes how to get equivalent results in C++ by adapting std::accumulate.

Does std::accumulate have any pitfalls? There exist cases where a+=b is better than a = a + b (the latter is what std::accumulate does in the for loop). Although hacks are doable, I think if you fall into such scenarios, a for loop would be the simplest and the most effective option.
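String concatenation is a typical case. Before C++20, std::accumulate computes acc = acc + element, which for std::string builds a new temporary at every step, whereas += appends in place. A plain loop, sketched below with a made-up function name, sidesteps the issue:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: concatenates all the strings in `parts`.
std::string join_with_loop(const std::vector<std::string>& parts)
{
    std::string result;
    for (const std::string& part : parts)
        result += part;            // appends in place, no temporary string per step
    return result;
}
```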

Here is another example of using std::accumulate to multiply the elements of a sequence:
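A sketch of such a product (the wrapper name is hypothetical; note that the initial value is 1, the identity for multiplication):

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: product of all the elements (1 for an empty range).
int product(const std::vector<int>& v)
{
    return std::accumulate(v.begin(), v.end(), 1, std::multiplies<>{});
}
```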

std::multiplies<> is a standard function object (find others here).

Using standard function objects makes the usage of algorithms more succinct. For example, the problem of finding the missing number from an array of integers states: given an array of N integers called “baseline” and another array of N-1 integers called “actual”, find the number that exists in “baseline” but not in “actual”. Duplicates may exist. (This problem is a generalization of the “find the missing number” problem, where the first array is actually a range from 0 to N and a clever solution is to apply Gauss’ famous formula N(N+1)/2 and subtract the sum of the elements of “actual” from it.) An example:

The missing number is 2.

A simple linear solution is to calculate the sum of each sequence and then subtract one result from the other. This way we obtain the missing number. This solution may easily result in signed integer overflow, which is undefined behavior in C++. A wiser solution is to xor the elements of each array and then xor the two results.
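Before moving to xor, here is a sketch of the sum-based approach. To reduce the overflow risk it accumulates into long long, which is a choice of this example and not part of the problem statement; the function name and the sample values in the note below are made up as well.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: difference between the two sums.
long long missing_by_sum(const std::vector<int>& baseline,
                         const std::vector<int>& actual)
{
    // The initial value 0LL makes std::accumulate sum in long long, not int
    const long long baseline_sum = std::accumulate(baseline.begin(), baseline.end(), 0LL);
    const long long actual_sum = std::accumulate(actual.begin(), actual.end(), 0LL);
    return baseline_sum - actual_sum;
}
```

With baseline = {1, 2, 2, 3, 5} and actual = {5, 1, 3, 2} the sums are 13 and 11, so the function returns 2.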

Xor is a bitwise operation: it does not “need” new bits, so it never overflows. To realize how this solution works, remember how xor works:
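The properties that matter here can be written down as compile-time checks. The concrete numbers are arbitrary; any other integers would do.

```cpp
// Each check is evaluated by the compiler; the file compiles only if all hold.
static_assert((5 ^ 5) == 0, "a value xor-ed with itself gives 0");
static_assert((5 ^ 0) == 5, "0 is the identity element of xor");
static_assert((3 ^ 5) == (5 ^ 3), "xor is commutative");
static_assert(((3 ^ 5) ^ 6) == (3 ^ (5 ^ 6)), "xor is associative");
static_assert(((3 ^ 5) ^ 5) == 3, "xor-ing twice with the same value cancels out");
```

Together these mean that, in a chain of xors, every value appearing an even number of times disappears regardless of the order.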

Suppose that “a” is the result of xor-ing all the elements but the missing one; basically it’s the result of xor-ing “actual”. Call the missing number “b”. Xor-ing “a” with “b” therefore gives the xor of all the elements in the “baseline” array. Call that total “c”. We have all the information we need to find the missing value: since a ^ b = c, xor-ing both sides with “a” gives b = a ^ c, which is just the missing number. Here is the corresponding succinct C++ code:
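A sketch of that idea with std::accumulate and the standard function object std::bit_xor (the function name find_missing and the sample values in the note below are made up for illustration):

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the element of `baseline` missing from `actual`.
int find_missing(const std::vector<int>& baseline, const std::vector<int>& actual)
{
    // c: xor of every element of baseline
    const int c = std::accumulate(baseline.begin(), baseline.end(), 0, std::bit_xor<>{});
    // a: xor of every element of actual (everything except the missing one)
    const int a = std::accumulate(actual.begin(), actual.end(), 0, std::bit_xor<>{});
    return a ^ c;                  // b = a ^ c
}
```

With baseline = {1, 2, 2, 3, 5} and actual = {5, 1, 3, 2}, c is 7 and a is 5, so the result is 2.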

Let’s go back to the initial challenge. We can do even better.

To realize how, it’s important to get into the habit of thinking in terms of iterators rather than containers. Since standard algorithms work on ranges (pairs of iterators), we don’t need to store input elements into the vector at all:

Advancing by one – using next – is a legitimate action since the problem rigorously describes what the input looks like. This snippet solves the challenge in a single line, in O(n) time and O(1) space. That’s pretty good. It’s also our first optimization (actually not required), since our solution dropped to O(1) space, whereas using std::vector was O(n).

That’s an example of what I named “standard reasoning” in the introduction of this series. Thinking in terms of standard things like iterators (objects that decouple algorithms from containers) is convenient and should become a habit. Although it seems counterintuitive, from our perspective as C++ coders thinking in terms of iterators is not possible without knowing containers. For example, we’ll never use std::find on a std::map; instead we’ll call the member function std::map::find, and the reason is that we know how std::map works. In a future post I’ll show you other examples about this delicate topic.
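To illustrate the std::map case, here is a sketch with two lookups that return the same answer. The map contents, key type and function names are assumptions of the example. The generic algorithm visits the entries one by one, while the member function exploits the ordered structure of the map and takes logarithmic time.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>

// Generic algorithm: linear scan over the (key, value) pairs.
bool has_key_generic(const std::map<std::string, int>& scores, const std::string& name)
{
    return std::find_if(scores.begin(), scores.end(),
        [&name](const std::pair<const std::string, int>& entry) {
            return entry.first == name;
        }) != scores.end();
}

// Member function: logarithmic lookup that knows how the map is organized.
bool has_key_member(const std::map<std::string, int>& scores, const std::string& name)
{
    return scores.find(name) != scores.end();
}
```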

Our solution leads to ranges naturally:

view::tail takes all the elements starting from the second (again, I skipped the input length), and ranges::istream is a convenient function which generates a range from an input stream (istream_range). If we had needed to skip more elements at the beginning, we would have used view::drop, which removes the first N elements from the front of a source range.

The iterator-based and range-based versions of the solution look very similar; however, as I said in the introduction of this series, iterators are not easy to compose whereas ranges are composable by design. In the future we’ll see examples of solutions that look extremely different because of this fact.

In Competitive Programming these single-pass algorithms are really useful. The STL provides several single-pass algorithms for accomplishing tasks like finding an element in a range, counting occurrences, verifying some conditions, etc. We’ll see other applications in this series.
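A few of these algorithms, sketched on a vector of ints (the wrapper names and the predicates are arbitrary choices for the example):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Finding an element in a range
bool has_value(const std::vector<int>& v, int value)
{
    return std::find(v.begin(), v.end(), value) != v.end();
}

// Counting the elements that satisfy a predicate
std::ptrdiff_t count_even(const std::vector<int>& v)
{
    return std::count_if(v.begin(), v.end(), [](int x) { return x % 2 == 0; });
}

// Verifying a condition on every element (true for an empty range)
bool all_positive(const std::vector<int>& v)
{
    return std::all_of(v.begin(), v.end(), [](int x) { return x > 0; });
}
```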

In the next post we’ll meet another essential container – std::string – and we’ll see other algorithms.

Recap for the pragmatic C++ competitive coder:

• Don’t reinvent containers whenever standard ones fit your needs:
  • Dynamic array? (e.g. int*) Consider std::vector
  • Static array? (e.g. int[]) Use std::array
• Prefer standard algorithms to hand-made for loops:
  • often more efficient, more correct and consistent,
  • more maintainable (a “language in the language”)
  • use standard function objects when possible
  • use std::accumulate to combine a range of things together
  • if customizing an algorithm results in complicated code, write a loop instead
• Think in terms of standard things:
  • iterators separate algorithms from containers
  • understand containers’ member functions