C++ in Competitive Programming: intro

Posted: February 15, 2016 in Competitive Programming
Tags: , ,

In the last two years I have turned into a Competitive Programming aficionado, mostly active on HackerRank, less on TopCoder and CodingGame. This kind of websites enable coders not only to compete with others, but also to practice by solving several categorized exercises. After you write your program into the browser, they run and test it against some prepared test-cases. In all probability you can use your favorite language because they support pretty much everything, but I consider really precious spending time with other languages as well.

When I started I just wanted to challenge myself and push my brain beyond its limits. I really appreciated that facing with some kind of problems was a good excuse to explore new areas of CS and learn something new. Even if at that time it was not my target, competitive programming is a great way to prepare for technical job interviews.

Speaking in general, there are several reasons to get involved in Competitive Programming. I found a very nice article about that here. I quote the point that was the most important for me when I started: it’s guaranteed brain excercise: “On any given day, what you’re asked to do at work may or may not be challenging, or you may be exercising one of the many non-technical skills required of software developers. By practicing regularly with competitive programming problems, you can ensure that the coding part of your brain receives a regular workout. It’s like one of those brain training apps, but for real skills.”

Regardless of your programming language, spend some time on competitive coding.

Why C++ in Competitive Programming?

 

If you solve on websites like HackerRank a challenge then you may peek into other people solutions. Looking at C++ solutions, I have found a lot of “C++ people” using C++ mostly as C. Many people don’t consider (don’t know?) what the standard offers. On the other hand, I find it natural to face challenges with modern C++ by my side, coding as simply as possible. For example, often I ask myself: may I use an algorithm instead of this for loop? This attitude is worthwhile here as it has been for years in my daily job.

I realize the word modern is overloaded: in the last years we all have probably heard the expression modern C++ everywhere and sometimes it looked like a buzzword. I mean using the full standard toolset of C++, without external libraries, nor over complicated abstractions. Mainly because in competitive programming you don’t have libraries (sometimes you don’t have the standard library neither) and you cannot lose time with too much generalization. Many times the same results fit real life development.

It’s worth noting that I’m not going to explain things like “how to make a linked list”. Excellent algorithms & ds courses/books/series have been developed for this purpose. Rather, expect to see how I used a forward list to solve a problem, where I found necessary a multiset, or when lower_bound saved my life.

I have told to some friends about this series and they suspected that C++ was not the right language for talking about Competitive Programming. Actually I had a look at the top ranked people on websites like TopCoder and HackerRank and C++ was always there (together with Java and Python, mainly). I found another indicative example on Google Code Jam: more of the higher-performing contestants code in C++.

I’m not surprised at all.

Certainly C and C++ are considered the lingua franca of algorithms and data-structures, but I think the main reason is the control C++ offers and its independence of the paradigm:  it does not force you a single “way of doing” to solve a challenge.

My friend Alessandro Vergani has another sensible idea: traditionally C and C++ have a poor standard library compared to newer languages like Java and C#, for this reason C/C++ programmers are at ease with writing everything they need from scratch – and often too much. This means they are fluent with coding things that in other languages are took for granted.

Moreover, I think since lots of participants are students, C++ is very common as a “first language”.

 

Challenges are about trade-off

 

My purpose is solving challenges without reinventing the wheel. This means if I can use standard things to make all the test cases pass, I’ll do it. Obviously, I’ll justify my decisions and I’ll discuss a bit. For example, what about this snippet for pretty printing a non-zero unsigned 32-bit integer N into its binary representation:

auto asBinary = bitset<32>(N).to_string();
asBinary.erase(0, asBinary.find_first_not_of('0')); // erasing leading zeros, if any
cout << asBinary;

It’s simple, but it’s not as efficient as it can be if written in other ways.

Does it suffice to solve the problem?

Maybe.

If so, I would be happy with this simple solution. If not, the challenge is asking me to find better ways to solve the problem.

Challenges are about trade-off. The requirements and the constraints of the problem matter a lot. They are a rough approximation of a real problem. Often you can exploit the constraints of the problem and take a shortcut. This may signify optimizations, or simplification, etc. Very often you have to exploit the constraints of the problem to solve it.

Many times these shortcuts are not so far from real life needs.

 

Standard reasoning

 

How many times you heard commands like “use standard algorithms” or “use a container”. With luck, we all know the classical resons for doing that: efficiency, correctness, maintainability. Now there is another fact we could care about: the future of our language.

Let me articulate.

Probably the biggest limitation of the STL is the lack of composability. One of the most mind-changing addition to C++17 will be Eric Niebler’s ranges. Ranges are basically a shift from iterators to a superior abstraction. Ranges enable programmers to write fluent and – hopefully – efficient algorithms.

Coding with ranges produces “pipelined” transformations. For example, to generate an infinite list of integers starting at 1, consider only even values, square them, take the first 10, and sum them:

int sum = accumulate(view::ints(1)
| view::remove_if([](int i){return i % 2 == 1;})
| view::transform([](int i){return i*i;})
| view::take(10), 0);

It’s not my intention to talk about ranges and I am sure you rule them better than me (if not, give it a try and refer to the documentation). My point is: Eric’s proposal includes standard combinators like transform and take, and mutating ones like sort and unique. They are essentially the counterparts of those in the STL. Thinking in terms of standard blocks help embrace ranges effectively. If you are used to employ standard functions, you are already confident with several range views and algorithms.

When I can, I’ll try to show my solutions by using ranges as well. Clearly ranges are not part of the standard yet then these solutions are not intended to run on competitive programming sites yet. I’ll adoperate Clang on Visual Studio 2015 for my attempts.

 

Standard flexibility

 

Many people don’t employ standard algorithms massively. For some of them the reason is that algorithms are often closer to math than to programming – even if programming is math…

Consider the simple task of finding the minimum difference in a sequence of ascending sorted numbers. For example, in this sequence:

[10, 20, 30, 100, 200, 300, 1000]

10 is the minimum absolute difference (e.g. 20-10).

This is an explicit solution:

vector<int> elems = ...;
auto minDiff = numeric_limits<int>::max();
for (auto i=0; i<elems.size()-1; ++i)
{
minDiff = min(elems[i+1]-elems[i], minDiff);
}

That solves the problem in a single pass.

The same problem written in Python:

minDiff = min([r - l for l, r in zip(elems, elems[1:])])

That is completely different because it employes a functional block that C++ misses: zip. Basically it combines elements pair-wise from N sequences (two in this case).

Look more closely at the C++ solution.

It’s basically an accumulation, right? In each iteration we calculate the difference between two adjacent elements and then we update the global minimum. But it’s not a std::accumulate because that function cannot operate on two sequences at the same time – at least without special iterators.

Any idea?

I am sure you found it. Your standard friend is inner_product:

auto minDiff = inner_product(v.begin(), v.end()-1, v.begin()+1, numeric_limits<int>::max(),
[](int l, int r){ return min(l, r); }, // "sum"
[](int l, int r) { return r-l; } // "product"
);

We’ll turn back to inner_product later in this series because it’s powerful, but just to recap: the simplest overload of inner_product returns the result of accumulating a starting value with the inner products of the pairs formed by the elements of two ranges. That is:

init += range1[0]*range2[0] + range1[1]*range2[1] + ... + range1[N-1]*range2[N-1]

There is more: both sum (+) and multiplication (*) can be customized. In my example, I replaced accumulation with min and product with absolute difference. I also set the initial value to the maximum possible int value – in math I would set it to infinity, the identity of the minimum operation (e.g. min(infinity, value) is value). Note also that by passing [begin, end-1] and [begin+1, end] I basically obtained these “pairs”:

range[0], range[1]
range[1], range[2]
...
range[N-2], range[N-1]

inner_product has a name closer to math than to CS, but it is actually a general function, that hides a rich flexibility. When I say flexibility I am not saying composability. Flexibility means that the same function can be adapted to different scenarios. Standard functions generally have a great flexibility.

But flexibility can lead to misuses…

Suppose after we calculate the minimum difference we want to output the pairs of elements with that difference. Recalling the first example:

[10, 20, 30, 100, 200, 300, 1000]

The minimum difference is 10 and the pairs of elements differing by this value are:

10, 20
20, 30

So basically we need to iterate over adjacent elements and print them if their difference is 10:

[20-10, 30-20, 100-30, ...]

Is there any standard algorithm for this purpose?

std::transform is really close, but it does not really perform what we need mainly for two reasons: first of all, it does not guarantee in-order application of the custom function – or in-order iteration – and second, transformation is N to N, that is each element in the source gets mapped into another thing.

Ideally we need something like a zip and a copy_if or copy_if and a special iterator type wrapping two adjacent elements at the same time – things that are, for example, in boost. For this reason, I think a for loop is the best choice in “bare” C++:

for (auto i=0; i<v.size()-1; ++i)
{
if (v[i+1] - v[i] == minDiff)
cout << v[i] << "," << v[i+1] << "\n";
}

Looking at this code, some C++ hackers may force std::equal to do the same thing, getting some sugar:

std::equal(begin(v), end(v)-1, begin(v)+1, [&](int l, int r) {
if (r-l == minDiff) cout << l << "," << r << "\n";
return true; // what?
});

The sugar is given by not accessing the vector and not writing the for loop explicitly.

The problem is that the snippet is unclear and hackish: std::equal has not been designed to do such things. What a programmer expects is just a pair-wise comparison, eventually using a custom predicate. You know, don’t fall into the common “if all you have is a hammer everything looks like a nail”. Just say no. The more you practice with standard algorithms, the more they will pop up in case they match. Pretty much automatically.

Flexibility comes with responsibility. Standard algorithms constitute well-known patterns. They help understand the code because other people have clear expectations from them. Hacking standard algorithms can lead to disasters.

If you miss a block, at least two ways are doable: if you don’t need to reuse it, go with a local solution. If reuse is important (and generally it is) then write a function designed to be reused. This requires some care, and keeping in mind how standard C++ algorithms are designed can be a good starting point.

Since “some care” requires time, in competitive programming this is not always feasible. For this reason people tend to go with custom solutions, even for summing the elements of an array. In this series I have more time and there is nothing to prevent me to articulate and to show some solutions based on simple functions that C++ misses. Pragmatic competitive programmers generally keep such functions and snippets on hand. Sometimes the standard already provides such functions, and it’s just the matter of practicing.

Although standard algorithms are really basic, they support a rich flexibility, as I said. This means you can roll your own special iterator and solve the problem in a single pass – imagine also ostream supports operator<< for pairs:

copy_if(adj_it(begin(v)), adj_it(end(v)), ostream_iterator<pair<int, int>>(cout, '\n') [&](const auto& p){
return p.second - p.first == minDiff;
});

Is it worth? There is not a single answer to this question. It’s really contextual, it depends on where you work at, if your codebase is too much complicated to need these abstractions, etc. Whenever you introduce (or use from a library) a new block (e.g. adj_it), it should be learnt by you and – hopefully – who works on your codebase.

Ranges should help a lot in these scenarios because in addition to a rich flexibility they have a strong composability. Let me turn the “partial C++17 support” on and show you my range-based solution to the initial problem:

auto pairs = view::zip(vi, view::tail(vi));
auto minDiff = ranges::min(pairs | view::transform([](auto p) { return p.second - p.first; }));
for (const auto& p : pairs | view::remove_if([&](auto p) { return p.second - p.first != minDiff; }))
cout << p.first << ", " << p.second << "\n";

It seems completely different from the inner_product one, but what it really changes is the way of thinking the algorithm. This solution is probably closer to Python than to C++, isn’t it? That’s because more blocks get available and composition plays an important role. This makes our way of thinking the algorithm closer to other paradigms. Again, this is an act of responsibility: choosing between – say – a for loop and a range-based solution can be disputable.

Are the inner_product and the ranges based solutions hard to understand? Definitely they are for programmers who don’t know standard idioms. It’s the same for map.insert that does not insert at all if the element is already there. Is std::map misleading? I know people who spend more time at dumping on standard things than the time they would need to learn standard things properly. Mastering language patterns is the key, and in the next posts I’ll show some patterns involving standard algorithms and containers.

In this series, first I’ll consider pre-C++17 solutions and whenever I can I’ll outline possible C++17 answers.

An invitation about ranges: please, post comments with your suggestions/corrections/opinions on my solutions because I think use cases are important both for Eric and for the whole C++ ecosystem.

 

What’s next

 

I don’t have a precise idea for the other posts yet. It’s possible I’ll discuss the usage of a certain standard algorithm in different scenarios, or how to deal with some kind of problems, or how to use a container to solve certain challenges, etc.

Definitely, since this series is about competitive programming, I’ll spend some posts on some peculiar aspects of challenges like paying attention to the constraints of the problem, or choosing the proper data type for the solution.

The next post will be kind of preliminary to the others. It’s basically about input and output.

Final note about code snippets

You probably noticed I used gist to render code snippets in this post. For the first time I needed to completely bypass WP: yesterday this post was broken because of snippets, with no apparent reason. I tried to fix them but after one hour of attempts I gave up. WP is the enemy of programming bloggers. I know, I should move to Jekyll or something better, but WP still has some advantages. For this reason I’m experimenting this “embedding Gist feature”. The procedure is a bit longer (create gist, paste code, give a sensible name, save, copy the URL into the post) but I think I would spend the same time at fixing WP troubles, so it’s ok for now.

Comments
  1. travor says:

    What’s wrong with the std::adjacent_difference? 😉
    And that’s is a part of a problem.
    The problem with a standard library is – somebody have to memorize it, because looking the things up helps only if you know already about them (adjacent_diff). It’s a trade off – either you/your company have to do more coding, but the code is specialized for your tasks or we all have to play memory.

    P.S. Sorry for my English. I’m not a native speaker/writer :).

    • Marco Arena says:

      The problem with adjacent_difference in the example not only is about iterating twice the vector (first for calculating the differences, second for obtaining the least value), but also about taking extra care for the first element (that is actually equal to the first in the original vector).
      In the next posts I’ll show patterns that will help to understand quickly if certain algorithm is already in the STL. Yes, it’s a tradeoff, but it is the same trade-off as knowing how standard things work (e.g. std::map).

      • adjacent_difference is made for this!
        And you don’t have to do two passes, nor take extra care of first element.
        Just have to do it slightly differently. http://coliru.stacked-crooked.com/a/edfe0d181c5f8a3d

        But thanks for showing various other methods, and the whole series 🙂

      • Marco Arena says:

        I think you should use adjacent_difference when you need to reuse the intermediate results and for this reason adjacent_difference outputs into a destination range. In this scenario we just need a zip + reduce, that is a job for inner_product. In this series we’ll try to understand when an algorithm fits better than another, and this is an example I’ll discuss a bit more in the future, so thanks for commenting 🙂

  2. […] C++ in Competitive Programming: intro […]

  3. Pete Barber says:

    Enjoying this series. In fact joined Hacker Rank. Are ranges (I’m guessing the Eric Niebler ones) available on HR. Any boost libs? The articles are interesting both from a CC and C++ perspective.

    • Marco Arena says:

      Thanks Pete!
      Ranges and boost are not available on CC websites. If ranges will be adopted in C++ at some point, I think they will be included in GCC, that is the compiler usually hosted on HR and other CC websites.

      • Pete Barber says:

        So when you mentioned ranges this was in a general C++ context rather than actually using them in CC?

      • Marco Arena says:

        It was because at some point they will be in the standard and compilers will ship a proper implementation. In the next posts I’ll try to understand how to use ranges to solve some challenges and I’ll post my solutions. I think this is useful also to start playing with the future of C++.

  4. […] C++ in Competitive Programming: intro […]

  5. […] C++ in Competitive Programming: intro […]

Leave a comment