C++ in Competitive Programming: string basics

Posted: June 5, 2016 in Competitive Programming
Tags: C++, C++11, C++17, string

I apologize if I didn’t publish any new posts in May but I was busy with the organization of the Italian C++ Conference 2016. If you feel like reading some lines about this event, check out this post.

In this installment of “C++ in Competitive Programming” we’ll meet a fundamental data structure in Computer Science, one that manages a sequence of characters, using some encoding: a string. As usual, let me introduce this topic through a simple challenge:

A palindrome is a word, phrase, number, or other sequence of characters which reads the same backward and forward. Given a string print “YES” if it is palindrome, “NO” otherwise. The string contains only alphanumeric characters.

For example:

abacaba

is palindrome; whereas:

abc

is not.

We need a type representing a sequence of characters and in C++ we have std::string, the main string datatype since 1998 (corresponding to the first version of the ISO C++ standard – known as C++98). Under the hood, imagine std::string as a pointer to a null-terminated (‘\0’-terminated) char array. Here is a list of useful facts about std::string:

std::string generalizes how sequences of characters are manipulated and stored;
roughly speaking, it manages its content in a similar way std::vector does (e.g. growing automatically when required);
apart from reserve, resize, push_back, etc. std::string provides typical string operations like substr, compare, etc;
it’s independant on the type of encoding used: all its members, as well as its iterators, will still operate in terms of bytes;
implementers of std::string generally embeds a small array in the string object itself to manage short strings.

The latter point is referred as the Small String Optimization and it means that short strings (generally 15/22 chars) are allocated directly in the string itself and don’t require a separate allocation (thanks to this guy on reddit who pointed out I was wrong here). Have a look at this table which shows the maximum size that each library implementation stores inside the std::basic_string.

The problem description states that the input is in the form:

string-till-end-of-stream

Thus we are allowed to simply read the string this way:

	string S;
	cin >> S;

view raw Reading a string.cpp hosted with ❤ by GitHub

Usually in CC a string is preceded by its length:

N string-of-len-N

In this case we don’t need to read N at all:

	cin.ignore(numeric_limits<streamsize>::max(), ' ') >> S; // N will be ignored
	// or, by including <climits>
	cin.ignore(INT_MAX, ' ') >> S;

view raw Reading a string skipping its size.cpp hosted with ❤ by GitHub

That will skip every character until the first whitespace and then will read the string that follows the whitespace into S.

Now let’s face with the problem of determining if S is palindrome or not.

It should be easy to understand that S is palindrome if reverse(S) is equal to S. So let’s start coding such a naive solution:

	// reverse S into tmp
	auto tmp = S;
	for (auto i=0u; i<tmp.size()/2; ++i)
	{
	std::swap(tmp[i], tmp[tmp.size()-i-1]);
	}
	cout << (S==tmp ? "YES" : "NO");

view raw The worst way to check if a string is palindrome.cpp hosted with ❤ by GitHub

As you can see, we access characters as we do with an array. As std::vector, std::string makes it possible to use also iterators and member functions to access individual characters. In the last line we applied operator== to verify if “char-by-char S is strictly equal to tmp”. We could also use string::compare() member function:

auto areEqual = S.compare(tmp) == 0;

view raw Checking if a string compares equal to another.cpp hosted with ❤ by GitHub

compare() returns a signed integral indicating the relation between the strings: 0 means they compare equal; a positive value means the string is lexicographically greater than the passed string; a negative value means the string is lexicographically lesser than the passed string. This function supports also comparison with offsets: suppose we want to check if a string is prefix of another (that is, a string starts with a certain prefix). This is the most effective way to do that:

auto isPrefix = S.compare(0, prefix.size(), prefix) == 0;

view raw Checking if 'S' starts with 'prefix'.cpp hosted with ❤ by GitHub

Bear this trick in mind.

compare() supports also overloads with C-strings, preventing implicit conversions to std::string.

Now let’s turn back to reversing the string. Actually we don’t need to write a for loop manually because reversing a range is a common function already provided by the STL. Including <algorithm> we get the algorithms library that defines functions for a variety of purposes (e.g. searching, sorting, counting, manipulating) that operate on ranges of elements. To reverse in-place a sequence we adoperate std::reverse:

reverse(begin(S), end(S)); // in-place

view raw Reversing a string in-place.cpp hosted with ❤ by GitHub

Since we don’t want to modify the original string, we can use std::reverse_copy to copy the elements from the source range to the destination range, in reverse order:

	string tmp(S.size(), 0); // fill constructor
	reverse_copy(begin(S), end(S), begin(tmp));

view raw Copying a string in reverse order.cpp hosted with ❤ by GitHub

Here – as for std::vector – we have two options: creating an empty string, reserving enough space and then pushing letters back, or creating a properly-sized and zero-initialized string and then assigning every single char. Since char is a cheap data type, the latter option is generally faster (basically because push_back does some branching to check if the character to add fits the already initialized sequence). For this reason I prefer filling a string this way. As I pointed out in the previous post, a reader from reddit suggested to use this approach also for std::vector<int> and honestly I agree. Larger types may have a different story. Anyway, as usual I suggest you to profile your code when in doubt. For Competitive Programming challenges this kind of finess makes no difference.

This solution is a bit more efficient than the previous one because it’s only two passes (one pass for reverse_copy and another one for operator==). We have also got rid of the explicit loop. What it really makes this solution bad is the usage of that extra string. If we turn back to the initial for loop, it should be clear that we just need to check if each pair of characters that would be swapped are the same:

S = abacaba
S[0] == S[6]
S[1] == S[5]
S[2] == S[4]
S[3] == S[3]

That is, with i from 0 to N/2 we check that:

S[i] == S[N-i-1]

Applying this idea:

	bool isPalindrome = true;
	for (auto i=0u; isPalindrome && i<str.size()/2; ++i)
	{
	isPalindrome = (str[i] == str[str.size()-i-1])
	}
	cout << (isPalindrome ? "YES" : "NO");

view raw Checking if a string is palindrome (hand-made loop).cpp hosted with ❤ by GitHub

Ok, this solution seems better. Can we do even better? Sure. Let’s meet another standard function.

Some algorithms operate on two ranges at a time: std::equal belongs to this family. A function that checks if two ranges are strictly the same:

std::equal(begin(c1), end(c1), begin(c2), end(c2)); // 4 arguments overload since C++14

view raw Example of std::equal.cpp hosted with ❤ by GitHub

This function returns true if the elements in both ranges match. It works by making a pair-wise comparison, from left to right. By default the comparison operator is operator==. For example:

string c1 = "ABCA", c2="ABDA";
'A' == 'A' // yes, go on
'B' == 'B' // yes, go on
'C' == 'D' // no, stop

The comparison can be customized by passing a custom predicate as last parameter.

Now, consider the problem of checking if a string is palindrome. Our loop compares the first and the last character, the second and the second last character, and so on. If it finds a mismatch, it stops. It’s basically std::equal applied to S in one direction and to reverse(S) in the other. We just need to adapt the second range to go from the end of S to the beginning. That’s a job for reverse iterators:

auto isPalindrome = equal(begin(S), end(S), rbegin(S));

view raw Checking if a string is palindrome (std::equal).cpp hosted with ❤ by GitHub

Reverse iterators goes backward from the end of a range. Incrementing a reverse iterator means “going backward”.

There is only one thing left: this solutions now makes N steps, wheareas only N/2 are really needed. We perform redundant checks. For instance:

abacaba
[0, 6]: a
[1, 5]: b
[2, 4]: a
[3, 3]: c // middle point
[4, 2]: a (already checked [2, 4])
[5, 1]: b (already checked [1, 5])
[6, 0]: a (already checked [0, 6])

Taking this consideration into account we get:

	auto isPalindrome = equal(begin(S), begin(S) + S.size()/2, rbegin(S));
	// or, similarly
	auto isPalindrome = equal(begin(S), std::next(begin(S), S.size()/2), rbegin(S));

view raw Checking if a string is palindrome.cpp hosted with ❤ by GitHub

std::next returns a copy of the input iterator advanced by N positions (this version does not require random access iterators).

We finally have a one-liner, single-pass and constant space solution.

I apologize if it took a while to get here: not only I introduced some string functions, but I also incrementally approached to the problem showing how simple operations can be written in terms of standard algorithms. This process is precious since it helps to get familiar with the standard. Sometimes it does not make sense to struggle to find an algorithm that fits a piece of code, other times it does. The more you use the standard, the more it will be easy for you to identify these scenarios pretty much automatically.

Don’t worry, in future posts I’ll skip trivial steps.

Now let me raise the bar, just a little bit.

Palindrome index

In Competitive Programming many variations of this topic exist. This is an example:

Given a string, determine the index of the character whose removal will make the string a palindrome. Suppose a solution always exists.

For example:

acacba

if we remove the second last character (‘b’), we get a palindrome

This problem – known as palindrome index – can be solved by introducing another useful algorithm that actually works like std::equal but it also returns the first mismatch, if any, instead of a bool. Guess the name of this algorithm…yeah, it’s called std::mismatch.

This problem is quite easy to solve:

locate the first char that breaks the “palindromeness” – call that position m (mismatch)
check if the string was palindrome if the m-th char was removed
if so, the palindrome index is m, otherwise it is N – m – 1 (basically, the char “on the other side”)

Since this solution can be implemented in many ways I take advantage of this opportunity to introduce other string operations. First of all, here is how std::mismatch works:

	string S1 = "ABCD", S2 = "ABXD";
	auto mism = std::mismatch(begin(S1), end(S1), begin(S2));

view raw Example of std::mismatch.cpp hosted with ❤ by GitHub

You see that ‘C’ and ‘X’ results in a mismatch. mism is a std::pair: mism.first is S1.begin() + 2 and mism.second is S2.begin() + 2. Basically, they point to ‘C’ in the first string and to ‘X’ in the second. Suppose now we need to find this “palindrome index”. Consider as an example:

	string S = "acacba";
	auto mism = mismatch(begin(S), begin(S) + S.size()/2, rbegin(S));

view raw Using std::mismatch to find the palindrome index.cpp hosted with ❤ by GitHub

mism.first points to ‘c’ and mism.second points to ‘b’. Since we know a solution always exists, either of these chars makes S not palindrome. To determine which one, we need to check if S without the mismatch point mism was palindrome or not. For this check, we create a new string from the concatenation of two substrings of S:

From the beginning to mism-1, and
From mism+1 to the end

Although I don’t like this solution so much, I have browsed others (not only written in C++) on HackerRank and this resulted the most popular. So let me show you my translation into C++ code:

	const auto diffIt = mismatch(begin(S), begin(S) + S.size()/2, rbegin(S)).first; // first iterator
	const auto diffIdx = distance(begin(S), diffIt); // 0-based index of the mismatch
	const auto toCheck = S.substr(0, diffIdx) + S.substr(diffIdx + 1);
	if (equal(begin(toCheck), begin(toCheck) + toCheck.size()/2, rbegin(toCheck))) // is palindrome?
	cout << diffIdx << endl;
	else
	cout << S.size() - diffIdx - 1;

view raw Finding the palindrome index ('substr' version).cpp hosted with ❤ by GitHub

Let me introduce you substr(): S.substr(pos, count) returns the substring of S that starts at character position pos and spans count chars (or until the end of the string, whichever comes first) – S[pos, pos + count). If pos + count extends past the end of the string, or if count is std::string::npos, the returned substring is [pos, size()). For example:

	string S = "hello";
	auto substr = S.substr(1, 3); // ell

view raw Example of substr.cpp hosted with ❤ by GitHub

substr results in “ell”.

It’s now evident that toCheck consists in the concatenation of S from 0 to diffIdx-1 and from diffIdx + 1 to the end:

acacba -> diffIdx = 1 
toCheck = a + acba = aacba

Only for completeness, another (possibly more efficient) way to obtain toCheck consists in adoperating std::copy:

	string toCheck(S.size()-1, 0); // all but one
	auto newBegin = copy(begin(S), diffIt, begin(toCheck));
	copy(next(diffIt), end(S), newBegin);

	// or, by inlining the second call to copy:

	string toCheck(S.size()-1); // all but one
	copy(next(diffIt), end(S), copy(begin(S), diffIt, begin(toCheck)));

view raw Creating the substring by using copy.cpp hosted with ❤ by GitHub

This solutions works and passes all tests, but I find it annoying to use extra space here.

Suppose we are free to modify our original string: it’s easier (and possibly faster) to remove the candidate iterator by using string::erase:

S.erase(diffIt);

view raw Using string::erase.cpp hosted with ❤ by GitHub

This avoids both creating extra (sub)strings and allocating extra memory (note that the internal sequence of characters can be relocated into the same block). The final part of the algorithm is similar:

	if (equal(begin(S), begin(S) + S.size()/2, rbegin(S))) // is palindrome?
	cout << diffIdx << endl;
	else
	cout << S.size() - diffIdx - 1 << endl;

view raw Final part of the 'erase' solution.cpp hosted with ❤ by GitHub

The final cost of this solution is linear.

Now, what if we cannot change the string?

A solution consists in checking two substrings separately. Basically we just need to exclude the mismatch point and then check if the resulting string is palindrome, then we check:

From the beginning of S to the mismatch point (excluded) with the corresponding chars on the other side;
From one-past the mismatch point to the half of S with the corresponding chars on the other side.

Actually, the first check is already performed when we call mismatch, so we don’t need to repeat it.

To code the second check, just remember the second string goes from diffIt + 1 to the half of the string. So, we just need to correctly advance the iterators:

equal(next(diffIt), begin(S) + S.size()/2, rbegin(S) + diffIdx);

view raw Checking the second slice of the string.cpp hosted with ❤ by GitHub

Let’s see this snippet in detail: next(diffIt) is just diffIt + 1. begin(S) + S.size()/2 is just the half of S. The third iterator, rbegin(S) + diffIdx, is the starting point of the string on the other side. Here is the complete solution:

	const auto mid = next(begin(S), S.size()/2); // we'll reuse it
	const auto diffIt = mismatch(begin(S), mid, rbegin(S)).first;
	const auto diffIdx = distance(begin(S), diffIt); // diffIt as 0-based position

	if (equal(next(diffIt), mid, next(rbegin(S), diffIdx)))
	cout << diffIdx << endl;
	else
	cout << S.size() - diffIdx - 1 << endl;

view raw Finding the palindrome index.cpp hosted with ❤ by GitHub

If you followed my reasoning about positions then it’s just the matter of applying standard algorithms with some care for iterators.

You may complain this code seems tricky, so let me rewrite it in terms of explicit loops:

	auto mpoint = 0u;
	for (j = S.size()-1; mpoint < S.size() / 2; ++mpoint, --j)
	{
	if (S[mpoint] != S[j])
	break;
	}
	auto ii = mpoint+1;
	for (auto j = S.size() - 1; ii < S.size() / 2; ++ii)
	{
	if (S[ii] != S[j--])
	break;
	}

	if (ii == S.size() / 2)
	cout << mpoint << endl;
	else
	cout << S.size() - mpoint - 1 << endl;

view raw Finding the palindrome index ('hand-made loops' version).cpp hosted with ❤ by GitHub

In the STL-based solution we clearly need to think in terms of iterators. The mismatch part is trivial (actually it could be replace with a call to std::mismatch, as in the STL-based solution), but the calls to std::equal are a little bit more difficult to understand. At the same time, it should be evident that std::equal checks that two ranges are identical. Nothing more. Also, if we replace std::string with another data structure that provides iterators, our code won’t change. Our algorithm is decoupled from the structure of the data.

On the other hand, in the for-based approach the logic is completely hidden inside the iterations and the final check. Moreover, this code depends on the random-access nature of the string.

Judge yourself.

Conversions

This short section is dedicated to conversions between strings and numeric types. I will start by saying that, in terms of speed, the following functions can be beated given certain assumptions or in particular scenarios. For example, you maybe remember Alexandrescu’s talk about optimization (and here is a descriptive post) where he shows some improvements on string/int conversions. In CC the functions I’ll introduce are generally ok. It can happen that in uncommon challenges it’s required you to take some shortcuts to optimize a conversion, mainly because the domain has some particularities. I’ll talk about domain and constraints in the future.

The STL provides several functions for performing conversions between strings and numeric types. Conversions from numbers to string can be easily obtained since C++11 through a set of new overloads:

	std::to_string(numeric_type)

	// For example:
	auto piStr = to_string(3.14); // double
	auto houndredStr = to_string(100); // int

view raw Using std::to_string.cpp hosted with ❤ by GitHub

A disadvantage of this approach is that we pay a new instance of std::string every time we invoke to_string. Sometimes – especially when many conversions are needed – this approach is cheaper:

	array<char, char_needed> res{}; // initialized to '\0'
	sprintf(res.data(), "%d", intNumber);

view raw Converting an integer into a char array.cpp hosted with ❤ by GitHub

Or use vector<char> for allocating the string dynamically.

char_needed is the maximum number of chars needed to represent an int32. This value is basically:

constexpr auto char_needed = std::numeric_limits<int>::digits10 + 2;

view raw Maximum number of chars needed to represent an int32.cpp hosted with ❤ by GitHub

From C++17 we’ll have string_span to easily wrap this array into a string-like object:

	string_span<> ss(res.data(), char_needed);
	sprintf(ss.data(), "%d", intNumber);

view raw Wrapping a char array into a string_span (C++17).cpp hosted with ❤ by GitHub

Moreover, from C++17 we’ll have string::data() as non-const member, so we’ll be able to write directly into a std::string:

	string res(char_needed, 0);
	sprintf(res.data(), "%d", intNumber);

view raw Writing into a std::string by exploiting C++17 non-const string::data().cpp hosted with ❤ by GitHub

In CC sprintf is good enough, even if sprintf_s (or another secure version) is preferred.

Anyhow, prefer using std::to_string if the challenge allows that.

Conversions in the other direction are a bit more confusing because we have both C++11 functions and C functions. Let me start just by saying that rolling a simple algorithm to convert a string into an unsigned integer is easy, pretty much elegant and interesting to know about:

	const string number = "1234";
	auto toUInt = accumulate(begin(number), end(number), 0u, [](unsigned curr, char nxt){
	return curr * 10u + nxt - '0';
	});

view raw Hand-made string to unsigned int.cpp hosted with ❤ by GitHub

To convert to an int32 we just need to handle the minus sign:

	auto mult = number[0] == '-' ? -1 : 1;
	auto realBegin = mult > 0 ? begin(number) : next(begin(number));
	auto toInt = mult * accumulate(realBegin, end(number), 0, [](int curr, char nxt){
	return curr * 10 + nxt - '0';
	});

view raw Hand-made string to int.cpp hosted with ❤ by GitHub

nxt – ‘0’ is an idiom: if digit is a char in [0-9], digit – ‘0’ results in the corresponding integral value. E.g.:

'1' - '0' = 1 (int)

The inverse operation is simply char(cDigit + ‘0’). E.g.:

char(1 + '0') = '1' (char)

In C++ (as in C) adding an int to a char will result in an int value: for this reason a cast back to char is needed.

With these snippets we are just moving through the ASCII table. ‘1’ – ‘0’ represents how far ‘1’ is from ‘0’, that is 1 position. 1 + ‘0’ is one position after ‘0’, that is ‘1’. With this idea in mind we can easily perform trivial lowercase to uppercase conversions:

// only works if c is lowercase
char(c - 'a' + 'A')

And viceversa:

// only works if c is uppercase
char(c - 'A' + 'a')

Anyhow, as one guy commented on reddit, the ASCII table is designed in such a way just flipping one bit is needed to get the same results:

// assuming c is a letter
char toLower(char c) { return c | 0x20; }
char toUpper(char c) { return c & ~0x20; }

But remember that the C++ standard (from C, in <cctype>) already provides functions to convert characters to upper/lower case, to check if one character is upper/lower case, digit, alpha, etc. See here for more details.

In CC, these tricks should be kept on hand. For example, this challenge requires to implement a simple version of the Caesar cipher:

Given a string S of digits [0-9], change S by shifting each digit by K positions to the right.

For example 1289 with K = 3 results in 4512.

We can easily solve this task by applying the tricks we have just learned:

	string s; // to encrypt
	for (auto& c : s)
	{
	c = char((c-'0' + K)%10 + '0');
	}

view raw Caesar cipher on digits.cpp hosted with ❤ by GitHub

Note I used a range-based for loop even if a standard algorithm could help solve this problem. I don’t introduce it yet though.

Now, let’s see some standard functions to convert strings to numbers. Since C++11 we have ‘sto’ functions (and for unsigned values and floating points) which convert a std::string/std::wstring into numeric values (they support also different basis). Being STL functions, they throw exceptions if something goes wrong: an std::invalid_argument is thrown if no conversion could be performed, std::out_of_range is thrown if the converted value would fall out of the range of the result type or if the underlying function (std::strtol or std::strtoll) sets errno to ERANGE. For example:

	auto toInt = stoi("10");
	auto toDbl = stod("10.123");

view raw Example of stoi and stod.cpp hosted with ❤ by GitHub

This family of functions optionally outputs the number of processed characters:

	size_t howMany = 0u;
	auto num = stoi("10abc", &howMany); // 10, howMany = 2

view raw Getting the number of processed chars by stoi.cpp hosted with ❤ by GitHub

On the other hand, C functions don’t throw exceptions, instead they return zeros in case of errors. For instance:

auto i = atoi("abc"); // 0

view raw Failing atoi.cpp hosted with ❤ by GitHub

That’s enough for now.

Recap for the pragmatic C++ competitive coder:

Don’t reinvent containers whenever standard ones fit your needs:
- Consider std::string to handle a sequence of characters
  - std::string::compare indicates the relation between strings
  - std::string::substr creates substrings
Prefer standard algorithms to hand-made for loops:
- std::copy, to copy the elements in a range to another
- std::reverse, to reverse the order of the elements in a range
- std::reverse_copy, to copy the elements in a range to another, but in reverse order
- std::equal, to know if two ranges are the same
- std::mismatch, to locate the first mismatch point of two ranges, if any
Understand options for converting strings to numbers, and viceversa:
- std::to_string, to convert numeric values into strings (a new instance of std::string is returned)
- std::array (std::string in C++17) + C’s sprintf (or equivalent – e.g. secure _s version) when reusing the same space is important
- std::sto* functions to translate strings into numeric values (remember they throw exceptions)
- C’s atoi & friends when throwing exceptions is not permitted/feasible
- Rememeber tricks to convert char digits into ints and viceversa:
  - digitAsChar – ‘0’ = corresponding int
  - char ( digitAsInt + ‘0’ ) = corresponding char

Comments

The Week in C++ 2016.23: June 4th-10th | Utah C++ Users Group says:

June 22, 2016 at 4:38 pm

[…] C++ in Competitive Programming: string basics (HT: ISO C++ Blog) […]

Reply
C++ in Competitive Programming: associative containers | Growing up says:

July 20, 2016 at 7:26 am

[…] C++ in Competitive Programming: string basics […]

Reply
bertu says:

May 10, 2018 at 12:58 pm

Is
https://gist.githubusercontent.com/ilpropheta/eaf5ace120b255d3966f2370a6287056/raw/074bae15371fc71e3b90d41bdd7321b17881c5fc/Reversing%20a%20string%20in-place.cpp
correct?
reverse(begin(S), end(S), begin(S));

The signature of std::reverse looks to have only first and last iterators

Reply
- Marco Arena says:
  
  May 12, 2018 at 9:29 am
  
  It is! Thanks a lot for your correction, I’ve just updated the gist.
  
  Reply

Growing up

LATEST POSTS

CATEGORIES

Ingredients

Search