Reversing words of a string with ranges -#thatsarotate

Posted: June 23, 2021 in Competitive Programming, Programming Recipes
Tags: , , , ,

In this post I share the story of a “C++ aha-moment”, hence I use the hashtag #thatsarotate (resumed last Saturday at the Italian C++ Conference 2021 to honor our keynote speaker Sean Parent). Why not sharing your own stories of “C++ aha-moments” adding the same hashtag to your tweets, posts, or whatever? It would be really appreciated.

A classical coding puzzle states: “reverse the order of the words of a string”.

For example:

reverse words in this string

Should become:

string this in words reverse

This problem could be solved either in-place or using additional storage.

Think about it for a while, if you like.

A common in-place solution consists in reversing the string as-is and then reversing the individual words:

reverse words in this string

=> 

gnirts siht ni sdrow esrever

=>

string this in words reverse

Alternative solutions exist (also variants of the same puzzle with some additional requirements on whitespaces), yet my focus here is not discussing about them. Rather, I would like to show you a possible interpretation of the idea with C++ ranges. Well, not really C++20 ranges but range-v3 instead (hoping that most of the missing pieces will be merged into C++23).

A simple C++23 solution

When I applied ranges on this problem for the first time, I tried the following solution but range-v3 misled me, so I abandoned the idea. A reader commented on Reddit that this one works:

auto output = input | reverse | split(' ') | transform(reverse);

The code above has actually a caveat: it “consumes” all the delimiters. On the other hand, the solution(s) I am going to show you work for all kinds of spacing and delimiters.

Actually, as another reader commented on Reddit, the code above exploits a library feature of C++23 and it’s not currently supported by most of the available compilers.

I would like to thank the people who commented on Reddit and digged into such details. I have learned something totally new.

Why not just combining split and reverse?

We might intuitively think of jotting down a solution like this:

std::string input = "reverse words in this string";
reverse(input); // in-place reverse
auto reversed = input | views::split(' ') | views::reverse;

Unfortunately, that would not compile: split and reverse are not composable and that’s a known design decision. Basically, split produces at most a forward range, but reverse requires at least a bidirectional range. Read more details on this nicely tailored article by Walletfox.

Apropos, did you know we have 25% discount until the end of July on Walletfox’s awesome manual about ranges? Just drop this coupon code on the payment form: ITALIANCPP25. I have personally practiced ranges with this book and it was definitely worth it.

When I tried working around the split/reverse problem to solve this puzzle, I got stuck for a while:

std::string input = "reverse words in this string";
reverse(input); // gnirts siht ni sdrow esrever
// now what?

The problem remained unsolved for some time… Suddenly, when I was doing something totally unrelated, I got an insight.

“That’s a rotate!”

Well, it’s not really a std::rotate, but just a C++ aha-moment!

Let me drive you towards my intuition.

What’s in a word, after all?

Let’s get back to this code:

std::string input = "reverse words in this string";
reverse(input); // gnirts siht ni sdrow esrever

The continuation of the solution is based on the following intuition: “words” are made of characters of the same “kind”. That is:

  • alphabetical characters only
  • non-alphabetical characters only (e.g. whitespaces)

Applying this idea, the example string could be divided into these chunks:

["gnirts", " ", "siht", " ", "ni", " ", "sdrow", " ", "esrever"]

Reversing the individual sub-ranges here would be straightforward (and reversing sub-ranges containing whitespaces doesn’t really matter).

To create such a range, we could think of what this operation is really doing. It iterates over the elements one by one, and it puts together all those sharing the same “alphabeticalness”. Anytime it finds one which does not, it makes a new “chunk”. In contrast to split, this one does not really “eats” characters. Every chunk might be reversed afterwards since, intuitively, it just references a portion of the data.

For example:

gnirts siht ni sdrow esrever
^
 ^
  ^
   ^
    ^
     ^
      ^ <-- the first chunk ends here!

Checking if any two characters have the same “alphabeticalness” is quite straightforward:

[](auto c1, auto c2) {
    return isalpha(c1) == isalpha(c2);
};

Note that you might replace isalpha with isalnum to add numbers to the party.

As a reader commented, to be more pedantic, we should explicitly use unsigned char as arguments:

[](unsigned char c1, unsigned char c2) {
    return isalpha(c1) == isalpha(c2);
};

However, since the domain of the string only for this problem contains letters and whitespaces only, the representation is always safe.

In addition, we have a comment by Alessandro “Loghorn” Vergani on Twitter:

isalpha returns an int: 0 if the character is not alphabetical, any other number if it is. So, your comparison could fail even if two consecutive chars are both alphabeticals.

So we might change the predicate this way:

[](auto c1, auto c2) {
    return isalpha(c1) && isalpha(c2);
}

Since all the compilers I tried work with the first version of the predicate, I will leave it for the rest of the article. Bear in mind the clever comment by Alessandro, though.

That function returns true only if c1 and c2 are both alphabetical or non-alphabetical. Mixing alphabetical with non-alphabetical returns false, and this is expected.

Now what? Do we have any views that can help here group the range of characters by that predicate?

Well, this is a sort of auto-answering question 🙂

That’s a group_by!

std::string input = "reverse words in this string";
reverse(input); // gnirts siht ni sdrow esrever
auto output = input | views::group_by([](auto c1, auto c2) {
    return isalpha(c1) == isalpha(c2);
}) | ...

group_by is one of those patterns that are relatively new for C++. I mean that group_by does not imitate any of the existing STL algorithms. In contrast to others like, for example, views::partial_sum that expresses the same intentions as std::partial_sum.

To be precise, the resulting range will look like this:

[ ['g', 'n', 'i', 'r', 't', 's'], [' '], ['s', 'i', 'h', 't'] ... ]

group_by is a very powerful tool: it takes a binary predicate, (T, T) -> bool, and invokes this predicate on consecutive elements, starting a new group when that predicate returns false. Actually, group_by takes a shortcut: the left hand side of the check is always the first element of the consecutive range of elements. Here, it starts from the first element (g) and finds the first character for which the predicate does not evaluate to true:

gnirts siht ni sdrow esrever
^^
^ ^
^  ^
^   ^
^    ^
^     ^ <- found!

When the predicate checks g and ' ' , it returns false and then group_by yields a new group (sub-range). Then it finds the predicate is false again on ' ' and s, leading to a sub-range consisting only of a single ' '. Next, it finds 'siht' , and so on.

Moreover, we could catch sight of group_by if we implemented the solution of the puzzle in a more classical C++ way:

std::reverse(begin(s), end(s));
auto head = begin(s);
while (head != end(s))
{
	auto chunkIt = std::find_if_not(head, end(s), [=](auto i) {
		return isalpha(*head) == isalpha(i);
	});
	std::reverse(head, chunkIt);
	head = chunkIt;
}

Indeed, group_by (just the idea, not the view), could be implemented as follows (roughly tested):

auto head = begin(rng);
while (head != end(rng))
{
	auto chunkIt = std::find_if_not(head, end(rng), [=](auto i) {
		return pred(*head, i);
	});
	//... the sub-range is [head, chunkIt) ...
	head = chunkIt;
}

After all, the common pattern emerging from the two snippets above could be considered a form of the two-pointer technique.

Then, we need to reverse each group. Since we need to do an operation for each element of the range, we could call views::for_each for rescue:

std::string input = "reverse words in this string";
reverse(input); // gnirts siht ni sdrow esrever
auto output = 
 input | views::group_by([](auto c1, auto c2) {
            return isalpha(c1) == isalpha(c2);
         })  // [['g', 'n', 'i', 'r', 't', 's'], [' '], ['s', 'i', 'h', 't'], ...]
       | views::for_each(views::reverse_fn{}); // ['s', 't', 'r', 'i', 'n', 'g', ' ', ...]

That’s it! But…

The laziness is strong with this one

Feeling happy came after I compiled and ran the snippet above for the first time. However, just after pressing the launch button, I realized something more: the first string reverse could be done lazily:

std::string input = "reverse words in this string";
auto output = views::reverse(input) 
       | views::group_by([](auto c1, auto c2) {
            return isalpha(c1) == isalpha(c2);
         })  // [['g', 'n', 'i', 'r', 't', 's'], [' '], ['s', 'i', 'h', 't'], ...]
       | views::for_each(views::reverse_fn{}); // ['s', 't', 'r', 'i', 'n', 'g', ' ', ...]

std::cout << output;

Let’s see what is going on here:

  • for_each will apply reverse_fn on one element and “pulls” such an element from the block behind,
    • then group_by is asked for one element. One element here means one sub-range, made of consecutive elements taken until the predicate evaluates to false,
      • such elements are taken from the step behind that is reverse that takes elements from input starting from the back: g, n, i, r, t, s, ' ', s, i, ...
    • group_by then can make the first sub-range [g, n, i, r, t, s]. That sub-range is given back to for_each,
  • finally, for_each applies reverse_fn on what group_by returned, getting back the letters s, t, r, i, n, g
  • Bear in mind that for_each joins all the results together and thus the final range is flattened ([s, t, r, i, n, g, , t, h, i, s, , i, n, ...]). Try replacing for_each with transform and spot the differences!

This leads to the end of the story.

However, If you take a closer look, you will see that the range-based solution can’t really replace the in-place solution. Well, this educational post aimed just to reinterpret the idea with ranges, explaining a possible approach. This is not necessarily a replacement.

But I could push a bit more and show you that thinking range-fully could be useful for solving the problem in-place too. We are not so far from it, after all.

Hopefully, this could be educational too.

A down-to-earth in-place implementation

Let me start from here:

std::string input = "reverse words in this string";
reverse(input); // gnirts siht ni sdrow esrever

At this point, even though this affirmation does not make any sense at all, I would like to call actions::reverse (or ranges::reverse) on each of the sub-ranges produced by group_by.

Actually, my desire is more realistic if we see every sub-range as a slice of the original string. Each slice begins where the previous one ends. The very first slice begins at 0, obviously. You can imagine to use views::slice to take such portions [lower bound, upper bound) of the input range:

gnirts siht ni sdrow esrever
0     6

views::slice(input, 0, 6) => gnirts
views::slice(input, 6, 7) => ' '
...

Reversing each slice is easy:

actions::reverse(views::slice(input, 0, 6));
actions::reverse(views::slice(input, 6, 7));
...

The problem could be turned into the problem of producing the range of each slice bounds.

Think about it for a while.

To do so, first of all, we transform each sub-range produced by group_by to its length:

auto groups = input | views::group_by([](auto c1, auto c2) {
        return isalpha(c1) == isalpha(c2);
});
auto bounds = groups | views::transform([](auto g) { return size(g); })
                     | ...

For example:

gnirts siht ni sdrow esrever
[ ['g', 'n', 'i', 'r', 't', 's'], [' '], ['s', 'i', 'h', 't'] ... ]
=> [6, 1, 4, 1, 2, 1, 5, 1, 7]

Now look: the first slice gnirts goes from 0 to 6 (exclusive), the second one (only containing the first whitespace) goes to 6 to 6 + 1 (exclusive), the third one (siht) goes from 6 + 1 to 6 + 1 + 4 (exclusive),…do you see the pattern?

That’s a partial_sum!

auto groups = input | views::group_by([](auto c1, auto c2) {
        return isalpha(c1) == isalpha(c2);
});
auto bounds = groups | views::transform([](auto g) { return size(g); })
                     | views::partial_sum;

This leads to:

gnirts siht ni sdrow esrever
[ ['g', 'n', 'i', 'r', 't', 's'], [' '], ['s', 'i', 'h', 't'] ... ]
=> [6, 1, 4,  1,  2,  1,  5,  1,  7]
=> [6, 7, 11, 12, 14, 15, 20, 21, 27]

The very first 0 is missing. Let’s push it at front:

auto groups = input | views::group_by([](auto c1, auto c2) {
        return isalpha(c1) == isalpha(c2);
});
auto bounds = views::concat(
               views::single(0),
               groups | views::transform([](auto g) { return size(g); }))
              ) | views::partial_sum;

Easy peasy:

gnirts siht ni sdrow esrever
[ ['g', 'n', 'i', 'r', 't', 's'], [' '], ['s', 'i', 'h', 't'] ... ]
=> [0, 6, 1, 4,  1,  2,  1,  5,  1,  7]
=> [0, 6, 7, 11, 12, 14, 15, 20, 21, 27]

We are almost there. Now we need to turn this range of numbers into [lower bound, upper bound) sub-ranges. Basically, we need to create the range of 2-element sliding windows on top of the current range. After making such windows, we could create the corresponding string slices and apply actions::reverse on each. Here is the expected range of windows:

gnirts siht ni sdrow esrever
[ ['g', 'n', 'i', 'r', 't', 's'], [' '], ['s', 'i', 'h', 't'] ... ]
=> [0, 6, 1, 4,  1,  2,  1,  5, 1, 7]
=> [[0, 6], [6, 7], [7, 11], [11, 12], [12, 14], [14, 15], [15, 20], [20, 21], [21, 27]]

Indeed, views::slice(input, 0, 6) yields exactly ['g', 'n', 'i', 'r', 't', 's'], views::slice(input, 6, 7) yields [' '], and so on.

One possible way to get this range consists in using views::sliding(2):

auto bounds = views::concat(
                views::single(0),
                groups | views::transform([](auto g) { return size(g); })
              ) | views::partial_sum
                | views::sliding(2);

Then we can iterate this range and reverse each slice of the string:

for (auto window : bounds)
    actions::reverse(views::slice(input, *begin(window), *++begin(window)));

std::cout << input;

Here is the full code:

std::string input = "reverse words in this string";
reverse(input); // gnirts siht ni sdrow esrever

auto groups = input | views::group_by([](auto c1, auto c2) {
        return isalpha(c1) == isalpha(c2);
});

auto bounds = views::concat(
                views::single(0),
                groups | views::transform([](auto g) { return size(g); })
              ) | views::partial_sum
                | views::sliding(2);

for (auto window : bounds)
    actions::reverse(views::slice(input, *begin(window), *++begin(window)));

std::cout << input;

Another option to obtain the windows is using views::zip:

auto bounds = views::concat(
                views::single(0),
                groups | views::transform([](auto g) { return size(g); })
              ) | views::partial_sum;
    
for (auto [lb, ub] : views::zip(bounds, views::drop(bounds, 1)))
	actions::reverse(views::slice(input, lb, ub));

std::cout << input;

Note that the latter is causing bounds to be “evaluated” twice (one is because of views::drop).

After reading this post, Ruzena Gürkaynak (the author of Fully Functional C++ with Range-v3) made a great suggestion: we might zip with views::exclusive_scan instead of inserting 0 at front:

auto lengths = groups | views::transform([](auto g) { return size(g); });

auto intervals = views::zip( lengths | views::exclusive_scan(0),
                             lengths | views::partial_sum);
    
for (auto [lb, ub] : intervals)
	actions::reverse(views::slice(input, lb, ub));

std::cout << input;

That’s it folks! I won’t go further discussing but I would be very glad to hear your thoughts on this. How can we improve the snippets above? Which problems do you see here? Let’s start a conversation in the comments section below.

This is an educational post and it doesn’t claim to replace the original solution. It’s just a tale about having fun with ranges and patterns. I hope you enjoyed the journey and many thanks for getting here!

What about sharing your own #thatsarotate stories?

Comments
  1. kobica says:

    super interesting post! And totally agree on the Walletfox shoutout ! totally worth it.
    thanks again for such a nice post!

  2. My name says:

    You fell into a common trap with `[](auto c1, auto c2) {
    return isalpha(c1) == isalpha(c2);
    };`. The problem is that `std::isalpha` takes an integer argument that’s either `EOF` or an _unsigned_ version of a character. So instead of `auto` parameters, we need to be specific: `[](unsigned char c1, unsigned char c2)`.

    • Marco Arena says:

      That’s true! Thanks for commenting. Well, letters and whitespaces are the only admitted values for this problem so the cast should be unnecessary.
      Actually, I was discussing with a friend that probably the safest and most effective thing is using the overload provided in locale.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s