Kadane in next-gen C++

Posted: June 5, 2019 in Programming Recipes

“How many moves ahead could you calculate?”

“Just one, the best one”.

The legendary answer by José Capablanca – a world chess champion of the last century – points to a commonly known fact: chess champions win by being better at recognizing patterns that emerge during the game. They remember meaningful chess positions better than beginners do. However, experts are not significantly better than non-experts at remembering random positions.

Patterns serve as a kind of shorthand that’s easier to remember than a meaningless configuration of pieces that could not occur in a real game.

This fact does not apply only to chess. Our brain works by constantly recognizing, learning and refining patterns in the world.

The reason is efficiency: the brain applies this optimization to ignore some of the possible choices we face in every situation. Thus, experts get better results while thinking less, not more.

You probably know another category of experts who are usually good at recognizing patterns: programmers.

We, programmers, get better results by thinking in patterns. We decompose complex problems into combinations of simpler patterns. However, many problems cannot be solved with known patterns. Here is where creativity kicks in.

However, creativity is not only the capacity to solve unknown problems with new ideas; it’s also the capacity to reinterpret known problems in new and inspiring ways. It’s the art of creating new associations. As Jules Henri Poincaré once said, creativity is “the ability to unite pre-existing elements in new combinations that are useful”. In programming, “pre-existing elements” are commonly called patterns.

That’s the spirit of this article. I will revisit a classical problem from another perspective.

The article is a bit verbose because the style is didactic: I spend some time explaining the example problem and the solution. If you already know the Maximum Subarray Problem, you can skip the following section. Even if you already know Kadane’s algorithm, it’s worth reading the dedicated section anyway, because I get to the solution from a slightly different point of view than the canonical one.

The Maximum Subarray Problem

Let me introduce one protagonist of the story, the famous “Maximum Subarray” problem, whose linear solution was designed by Joseph “Jay” Kadane in the last century.

Here is a simple formulation of the problem:

Given an array of numbers, find a contiguous subarray with the largest sum.

We are just interested in the value of the largest sum (not the boundaries of the subarray).

Example:

[-3,1,-3,4,-1,2,1,-5,4]

Max subarray: [4,-1,2,1]. Sum: 6.

Clearly, the problem is interesting when the array contains negative numbers (otherwise the maximum subarray is the whole array itself).

The literature around this problem is abundant and it has been discussed in many variants. We’ll focus only on Kadane’s algorithm, the most famous (and most efficient) solution.

The theory behind Kadane’s algorithm is not straightforward and is beyond the scope of this didactic post. The algorithm belongs to an area of computer science called Dynamic Programming, one of the hardest problem-solving techniques – and one of the most efficient as well. The basic principle of Dynamic Programming consists of breaking a complex problem into simpler sub-problems and caching their results.

For example, the task of calculating “1+1+1+1” can be broken down this way: first we calculate “1+1=2”, then we add “1” to get “3” and finally we add “1” to get “4”. Implicitly, we have “cached” the intermediate results, since we did not start from scratch every time – e.g. to calculate (1+1)+1 we started from “1+1=2”. A more complex example is Fibonacci: each number is calculated from the famous formula fibo(n) = fibo(n-1) + fibo(n-2).

For example:

fibo(4) = fibo(3) + fibo(2)

= fibo(2) + fibo(1) + fibo(2)

= fibo(1) + fibo(0) + fibo(1) + fibo(2)

= fibo(1) + fibo(0) + fibo(1) + fibo(1) + fibo(0)

The sub-problems are called “overlapping” since we solve the same sub-problem multiple times (fibo(2) is called twice, fibo(1) three times and fibo(0) twice). However, the main characteristic of Dynamic Programming is that we do not recalculate the sub-problems we have already solved. Instead, we “cache” them. Without stepping into further details, there exist two opposite approaches, each with a corresponding caching strategy: Top-Down and Bottom-Up. Roughly speaking, the former is recursive, the latter is iterative. In both we maintain a map of already solved sub-problems. More formally, in the Top-Down approach the storing strategy is called memoization, whereas in the Bottom-Up one it is called tabulation.
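For contrast, here is a sketch of the naive recursive Fibonacci, with no caching at all, where the overlapping sub-problems are recomputed over and over:

int fibo_naive(int n)
{
   if(n < 2)
      return n;

   // fibo_naive(n-2) is recomputed from scratch inside fibo_naive(n-1)
   return fibo_naive(n-1) + fibo_naive(n-2);
}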

In Bottom-Up we go incrementally through all the sub-problems and reuse the previous results. For instance:

// assuming table is an int array of size n+1, with n >= 1
table[0] = 0;
table[1] = 1;

for(auto i=2; i<=n; ++i)
   table[i] = table[i-1] + table[i-2];
return table[n];

On the other hand, Top-Down involves recursion:

// suppose memo has size n+1
int fibo(std::vector<int>& memo, int n) 
{
   if(n < 2)
      return n;

   if(memo[n] != 0)
      return memo[n];

   memo[n] = fibo(memo, n-1) + fibo(memo, n-2);
   return memo[n];
}
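
A quick usage sketch of the Top-Down version (the names are from the snippet above):

std::vector<int> memo(10 + 1, 0); // n = 10
auto result = fibo(memo, 10);     // 55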

Now that we have a bit of background, let’s quickly meet Kadane’s algorithm.

Kadane’s algorithm

Kadane’s algorithm falls into the Bottom-Up category of Dynamic Programming: it works by first calculating a solution for every sub-problem and then using the final “table” of results in some way. In Fibonacci, we use the table by just taking its last element. In Kadane we do something else. Let’s see.

My explanation of the algorithm is a bit different from the popular ones.

First of all, we should understand how the table is filled. Unlike Fibonacci, Kadane’s table[i] does not contain the solution of the problem at index i. It’s a “partial” result instead. Thus, we call such a table “partial_maxsubarray”.

partial_maxsubarray[i] represents a partial solution on the subarray ending at the ith index and including the ith element. The last condition is the reason why the result is “partial”. Indeed, the final solution might not include the ith element.

Let’s see what this means in practice:

[-3,1,-3,4,-1,2,1,-5,4]

partial_maxsubarray[0] means solving the problem only on [-3], including -3.
partial_maxsubarray[1] is only on [-3, 1], including 1.
partial_maxsubarray[2] is only on [-3, 1, -3], including -3.
partial_maxsubarray[3] is only on [-3, 1, -3, 4], including 4.
partial_maxsubarray[4] is only on [-3, 1, -3, 4, -1], including -1.
partial_maxsubarray[5] is only on [-3, 1, -3, 4, -1, 2], including 2.
partial_maxsubarray[6] is only on [-3, 1, -3, 4, -1, 2, 1], including 1.
partial_maxsubarray[7] is only on [-3, 1, -3, 4, -1, 2, 1, -5], including -5.
partial_maxsubarray[8] is only on [-3, 1, -3, 4, -1, 2, 1, -5, 4], including 4.

For each index i, the ith element will be included in the partial_maxsubarray calculation. We have only one degree of freedom: we can change where to start.

Consider for example partial_maxsubarray[2]. If the main problem were on [-3, 1, -3], the solution would be 1 (and the subarray would be [1]). However, partial_maxsubarray[2] is -2 (and the subarray is [1, -3]), because of the invariant: the element at index 2 must be included.

Another example is partial_maxsubarray[4], which is not 4 as you might expect, but 3 (the subarray is [4, -1]).

How to calculate partial_maxsubarray[i]?

Let’s do it by induction. First of all, partial_maxsubarray[0] is clearly equal to the first element:

partial_maxsubarray[0] = -3

Then, to calculate the next index (1), we note that we have only one “degree of freedom”: since we must include the element at index 1 anyway, we can either extend the current subarray by one (getting [-3, 1]) or start a new subarray from position 1. Let me list the two options:

  1. extend the current subarray, getting [-3, 1], or
  2. start a new subarray from the current index, getting [1].

The choice is really straightforward: we choose the subarray with the largest sum! Thus, we choose the second option (partial_maxsubarray[1] = 1).

To calculate partial_maxsubarray[2]:

  1. extend the current subarray, getting [1, -3] (sum -2), or
  2. start a new subarray, getting [-3] (sum -3)

Clearly, the former is better (partial_maxsubarray[2] = -2).

Again, partial_maxsubarray[3]:

  1. extend the current subarray, getting [1, -3, 4] (sum 2), or
  2. start a new subarray, getting [4] (sum 4)

The latter is larger (partial_maxsubarray[3] = 4).

Do you see the calculation underneath?

For each index, we calculate partial_maxsubarray[i] this way:

partial_maxsubarray[i] = max(partial_maxsubarray[i-1] + v[i], v[i])

At each step i, we decide whether to start a new subarray from i or to extend the current subarray by one on the right.

Once we have filled partial_maxsubarray, do you see how to use it to calculate the solution to the main problem?

Let’s recall how we calculated partial_maxsubarray[2]:

partial_maxsubarray[0] = -3
partial_maxsubarray[1] = 1

partial_maxsubarray[2] = max(partial_maxsubarray[1] + v[2], v[2])

Since v[2] is -3, we ended up with -2. Thus, partial_maxsubarray[1] is larger than partial_maxsubarray[2].

Running the algorithm on the remaining indexes we get:

partial_maxsubarray[3] = 4
partial_maxsubarray[4] = 3
partial_maxsubarray[5] = 5
partial_maxsubarray[6] = 6
partial_maxsubarray[7] = 1
partial_maxsubarray[8] = 5

It turns out that partial_maxsubarray[6] has the largest value. This means there is a subarray ending at index 6 having the largest sum.

Thus, the solution to the main problem is simply calculating the maximum of partial_maxsubarray.

Let’s write down the algorithm:

int kadane(const vector<int>& v)
{
    vector<int> partial_maxsubarray(v.size());
    partial_maxsubarray[0] = v[0];
    for (auto i = 1u; i<v.size(); ++i) 
    {
      partial_maxsubarray[i] = std::max(partial_maxsubarray[i-1] + v[i], v[i]);
    }

    return *max_element(begin(partial_maxsubarray), end(partial_maxsubarray));
}

If you already knew this problem, you have probably noticed that this is not the canonical way to write Kadane’s algorithm. First of all, this version uses an extra array (partial_maxsubarray) that does not appear at all in the classical version. Moreover, this version does two iterations instead of just one (the for loop first and then max_element).

“Marco, are you kidding me?” – Your subconscious speaks loudly.

Stay with me, you won’t regret it.

Let me solve the two issues and guide you to the canonical form.

To remove the support array, we need to merge the two iterations into one. We would kill two birds with one stone.

We can easily remove the second iteration (max_element) by calculating the maximum along the way:

int kadane(const vector<int>& v)
{
    vector<int> partial_maxsubarray(v.size());
    partial_maxsubarray[0] = v[0];

    auto maxSum = partial_maxsubarray[0];
    for (auto i = 1u; i<v.size(); ++i) 
    {       
        partial_maxsubarray[i] = std::max(partial_maxsubarray[i-1] + v[i], v[i]);
        maxSum = max(maxSum, partial_maxsubarray[i]);
    }   
    return maxSum;
}

After all, a maximum is just a forward accumulation – it never goes back.
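Indeed, the running maximum itself could be written as a fold – a sketch with std::accumulate, reusing the partial_maxsubarray of the previous snippet:

auto maxSum = std::accumulate(std::next(begin(partial_maxsubarray)), end(partial_maxsubarray),
                              partial_maxsubarray[0],
                              [](int best, int x) { return std::max(best, x); });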

Removing the extra array can be done by observing that we do not really use it entirely: we only need the previous element. After all, even in Fibonacci we only need the last two elements to calculate the current one (indeed, removing the table in Fibonacci is even easier). Thus, we can replace the support array with a simple variable:

int kadane(const vector<int>& v)
{
    int partialSubarraySum, maxSum;
    partialSubarraySum = maxSum = v[0];
    for (auto i = 1u; i<v.size(); ++i) 
    {       
        partialSubarraySum = max(partialSubarraySum + v[i], v[i]);
        maxSum = max(maxSum, partialSubarraySum);
    }   
    return maxSum;
}

The code above probably looks more familiar to readers who already knew Kadane’s algorithm, doesn’t it?
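
Just to double check, a quick test on the example array from the beginning (a minimal sketch):

std::vector<int> v{-3, 1, -3, 4, -1, 2, 1, -5, 4};
assert(kadane(v) == 6); // the maximum subarray is [4, -1, 2, 1]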

Now, let’s have some fun.

Emerging patterns

Like most people, the first time I saw Kadane’s algorithm it was in the canonical form. At the time, I didn’t notice anything particular. It was 2008 and I was at university.

Many years passed and I met the problem again in 2016. In recent years, I have been regularly practicing with coding challenges to develop my ability to “think in patterns”. By “pattern” I simply mean a “standard solution to a standard problem, with some degree of customization”. For example, “sorting an array of data” or “filtering out a list” are patterns. Implementations of many patterns are provided in programming languages’ standard libraries.

I tend to consider every C++ standard algorithm a pattern. For example, std::copy_if and std::accumulate are, for me, two patterns. Some algorithms are actually much more general in programming: for example, std::accumulate is usually known as fold or reduce. I have talked about that in a previous post. On the other hand, something like std::move_backward is really C++ idiomatic.
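
For instance, “filtering out a list” maps directly to std::copy_if – a minimal sketch, with a hypothetical vector v of ints:

std::vector<int> evens;
std::copy_if(begin(v), end(v), std::back_inserter(evens),
             [](int x) { return x % 2 == 0; });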

Thinking in patterns is valuable for many reasons.

First of all, as I mentioned at the beginning of this article, our brain is designed to work this way. Cognitive scientists call “the box” our own state of the art, our own model of the world that enables us to ignore alternatives. Clearly, the box has pros and cons. Constantly thinking inside the box works as long as we deal with known problems. Thinking outside the box is required to solve new challenges. This is creativity.

When I think of creativity, I think of cats: they can be coaxed but they don’t usually come when called. We can only create conditions which foster creativity. However, one thing we can intentionally influence is training our own brain in pattern recognition and application. To some extent, this is like “extending our own box”. This is what I have been doing over the last few years.

Another benefit of thinking in patterns is expressivity. Most of the time, patterns enable people to express algorithms fluently and regardless of the specific programming language they use. Patterns are more declarative than imperative. Some patterns are even understandable to non-programmers. For example, if I ask my father to “sort the yogurt jars by expiration date and take the first one”, that’s an easy task for him.

So, in 2016 something incredible happened. When I met Kadane’s algorithm again, my brain automatically recognized two patterns in the canonical form. After visualizing the patterns in my mind, I broke the canonical form down into two main parts. This is why I first showed you this version of the algorithm:

int kadane(const vector<int>& v)
{
    vector<int> partial_maxsubarray(v.size());

    partial_maxsubarray[0] = v[0];
    for (auto i = 1u; i<v.size(); ++i) 
    {
      partial_maxsubarray[i] = std::max(partial_maxsubarray[i-1] + v[i], v[i]);
    }

    return *max_element(begin(partial_maxsubarray), end(partial_maxsubarray));
}

The second pattern is clearly maximum (that is a special kind of reduce, after all).

What is the first one?

Someone might think of reduce, but it is not: a reduce (fold) collapses the whole sequence into a single value, whereas here we need every intermediate result.
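
For instance, folding [1, 2, 3, 4] with std::accumulate collapses everything into a single number:

auto sum = std::accumulate(begin(v), end(v), 0); // 10 – the intermediate results 1, 3, 6 are not kept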

The pattern is prefix sum. Prefix sum is a programming pattern that calculates the running sum of a sequence:

array = [1, 2, 3, 4]
psum  = [1, 3, 6, 10]

How does that pattern emerge from Kadane’s algorithm?

Well, here “sum” is not really an addition; it’s a different update function, which emerges from the loop:

thisSum = std::max(previousSum + vi, vi);

Imagine calling this line of code for every element vi of v.

In C++, prefix sum is implemented by std::partial_sum, which also accepts a custom binary operation. The first element of the output is just v[0].
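
For example, the vanilla running sum shown before – a minimal sketch:

std::vector<int> v{1, 2, 3, 4};
std::vector<int> psum(v.size());
std::partial_sum(begin(v), end(v), begin(psum)); // psum = {1, 3, 6, 10}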

Here is what the code looks like with partial_sum:

int kadane(const vector<int>& v)
{
    vector<int> partial_maxsubarray(v.size());

    partial_sum(begin(v), end(v), begin(partial_maxsubarray), [](auto psumUpHere, auto vi){
        return max(psumUpHere + vi, vi);
    });
    
    return *max_element(begin(partial_maxsubarray), end(partial_maxsubarray));
}

When I ran this code and got a green bar, I felt very proud of myself. I spent very little effort. First of all, my brain recognized the pattern from the hardest version of the code (the canonical form). My brain popped this insight from my unconscious into my conscious reasoning. Then I did an intermediate step by arranging the code in two main parts (the cumulative iteration and then the maximum calculation). Finally, I applied partial_sum confidently.

You might think this is useless. I think this is a great exercise for the brain.

There is something more.

Since C++17, the code is easy to run in parallel:

int kadane(const vector<int>& v)
{
    vector<int> partial_maxsubarray(v.size());

    inclusive_scan(execution::par, begin(v), end(v), begin(partial_maxsubarray), [](auto psumUpHere, auto vi){
        return max(psumUpHere + vi, vi);
    });

    return *max_element(execution::par, begin(partial_maxsubarray), end(partial_maxsubarray));
}

inclusive_scan is like partial_sum but it supports parallel execution (note that, unlike partial_sum, it requires the binary operation to be associative – more on this in the comments below).

Beauty is free

Some time ago I read a short post titled “Beauty is free” that I can no longer find. The author showed that an algorithm coded with raw loops had the same execution time as the same algorithm written with STL algorithms.

Compared to the canonical form, our “pattern-ized” alternative does two scans and uses an extra array. It’s clear that beauty is not free at all!

The reason why I am writing this article now, and not in 2016, is that I have finally found some time to try my solution with range-v3. The result – in my opinion – is simply beautiful. Check it out:

int kadane(const vector<int>& v)
{
   return ranges::max(view::partial_sum(v, [](auto s, auto vi) { return std::max(s+vi, vi); }));
}

view::partial_sum is a lazy view, meaning that it applies the function to the ith and (i-1)th elements only when invoked. Thus, the code does only one scan. Moreover, the support array has vanished.

Running a few performance tests with clang -O3, it seems that the optimizer does a better job on this code than on the canonical form. On the other hand, the code does not outperform the canonical one on GCC. As I expected, running the range-based code in debug is about 10 times slower. I have not tried Visual Studio. My tests were not rigorous, so please take my claims with a grain of salt.

I would like to inspire you to take action. Practice is fundamental.

A common question people ask me is “how can I practice?”. This deserves a dedicated post. Let me just tell you that competitive programming websites are a great source of self-contained and verifiable challenges, but they are not enough. You should creatively use real-world problems to practice.

Ranges are the next generation of the STL. Ranges are the next generation of C++.

However, if you want to learn how to use ranges, you have to know and apply STL patterns first.

Ranges are beyond the scope of this article. The documentation is here. A few good posts here and here. Help is needed as it was in 2011 to popularize C++11.

I hope to blog again on some extraordinary patterns and how to use them. For now, I hope you have enjoyed the journey through a new interpretation of Kadane’s algorithm.


Some of the scientific notions in this article come from The Eureka Factor.

Comments
  1. Hi Marco, it’s a pleasure reading your posts. I wrote a little test program to compare the performance of all the proposed solutions. I used only g++ 9.1 with its Intel TBB based implementation of C++17 parallel algorithms. Surprisingly, the parallel version produces a different result. Have you experienced the same issue?

    • Marco Arena says:

      Hi Gennaro,
      thanks for your feedback!
      I have not tried the parallel STL on GCC. The only explanation I can find is that parallel versions have a non-deterministic execution order. In theory, since the function used in inclusive_scan is both associative and commutative, I expect consistent results.

      I can be wrong though!

      • Hi Marco,
        thanks for your reply. I think the result produced by the parallel version differs from the others because, by breaking down the original sequence into multiple pieces, all parallel tasks do a local scan and produce local maximums. In the final step the biggest is chosen. This maximum may differ from the one produced by a sequential version that scans the array as a whole.

      • Marco Arena says:

        This means that the function used in the partial_sum is not associative. I am not able to prove it, though…
