Just be aware of std::size and static C-strings

Posted: April 25, 2018 in Programming Recipes

C++17 added the non-member functions std::size, std::empty and std::data. They are little gems for generic programming. They serve the same purpose as std::begin and the rest of the family: member functions cannot be called on C-arrays (e.g. arr.begin() or arr.size()), and free functions allow for more generic programming because they can be added later for classes you cannot modify.

This post is just a note about using std::size and std::empty on static (statically sized) C-strings. Maybe it’s a trivial thing, but I have seen more than one person other than me fall into this “trap”, so I think it’s worth sharing.

To make it short, some time ago I was working on a generic function to compare strings under a certain logic that is not important to know. In an ideal world I would have used std::string_view, but I couldn’t mainly for backwards-compatibility. I could, instead, put a couple of template parameters. Imagine this simplified signature:


template<typename T1, typename T2>
bool compare(const T1& str1, const T2& str2);

Internally, I was using std::size, std::empty and std::data to implement my logic. To be fair, these were custom implementations of the standard ones (exhibiting exactly the same behavior), because at that time C++17 was not yet available on my compiler and our company’s C++ library had provided such functions for a long time.

compare could work on std::string, std::string_view (if available) and static C-strings (e.g. “hello”). While setting up some unit tests, I found something I was not expecting. Suppose that compare on two equal strings returns true, as in a normal string comparison:


EXPECT_TRUE(compare(string("hello"), "hello"));

This expectation failed at runtime.

Internally, at some point, compare was using std::size. The following is true:


std::size(string("hello")) != std::size("hello");

The reason is trivial: “hello” is just a statically sized array of 6 characters, that is, 5 letters plus the null terminator. When called on it, std::size simply returns the real size of the array, which clearly includes the null terminator.
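A minimal check of the sizes involved, assuming a C++17 compiler. The helper name check_string_size is made up for illustration; everything else is standard library:

```cpp
#include <cassert>
#include <iterator> // std::size
#include <string>

// "hello" has type const char[6]: five letters plus the null terminator.
static_assert(sizeof("hello") == 6);
static_assert(std::size("hello") == 6);

// Hypothetical helper, for illustration only.
inline void check_string_size()
{
    // std::string reports its length without the terminator.
    assert(std::size(std::string("hello")) == 5);
    assert(std::size(std::string("hello")) != std::size("hello"));
}
```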

As expected, std::empty follows std::size:


EXPECT_TRUE(std::empty("")); // ko: the array holds one element, '\0'

EXPECT_TRUE(std::empty(string(""))); // ok

EXPECT_TRUE(std::empty(string_view(""))); // ok

Don’t get me wrong, I’m not fueling an argument: the standard is correct. I’m just saying we have to be pragmatic and handle this subtlety. I just care about traps my colleagues and I can fall into. Everyone I showed the failing expectations above to got confused. They worried about consistency.

If std::size is the “vocabulary function” to get the length of anything, I think it should be easy and special-case-free. We use std::size because we want to be generic and handling special cases is the first enemy of genericity. I think we all agree that std::size on null-terminated strings (any kind) should behave as strlen.

Anyway, it’s even possible that we don’t want to get back the length of the null-terminated string (e.g. suppose we have an empty string buffer and we want to know how many chars are available), so the most correct and generic implementation of std::size is the standard one.
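To illustrate the buffer case, here is a small sketch. The buffer name and its 64-char capacity are made up for the example: std::size reports how many chars the array can hold, while strlen reports the length of the string currently stored in it.

```cpp
#include <cassert>
#include <cstring>  // std::strlen
#include <iterator> // std::size

// Hypothetical helper: a 64-char buffer that currently holds an empty string.
inline void check_buffer()
{
    char buffer[64] = "";
    assert(std::size(buffer) == 64);  // capacity of the array
    assert(std::strlen(buffer) == 0); // length of the stored string
}
```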

Back to the compare function, I had two options:

  1. Work around this special case locally (or just don’t care),
  2. Use something else (possibly on top of std::size and std::empty).

Option 1 is “local”: we only handle that subtlety for this particular case (e.g. the compare function). Alas, the next use of std::size/empty possibly comes with the same trap.

Option 2 is quite intrusive although it can be implemented succinctly:


namespace mylib
{
   using std::size; // "publish" ordinary std::size
   // on char arrays
   template<size_t N>
   constexpr auto size(const char(&)[N]) noexcept
   {
      return N-1;
   }

   // other overloads...(e.g. wchar_t)
}

You can even overload on const char* by wrapping strlen (or such). This implementation is not constexpr, though. As I said before: we cannot generally assume that the size of an array of N chars is N – 1, even if it’s null-terminated.
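A rough sketch of the const char* variant, kept in a separate, hypothetical namespace (mylib_ptr) so that it does not interfere with the array overload shown earlier. It assumes the pointer refers to a valid null-terminated string:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>  // std::strlen
#include <iterator> // std::size
#include <string>

namespace mylib_ptr // hypothetical namespace, separate from mylib
{
   using std::size; // "publish" ordinary std::size

   // Not constexpr: std::strlen runs at run time.
   inline std::size_t size(const char* str) noexcept
   {
      return std::strlen(str);
   }
}

// Hypothetical helper, for illustration only.
inline void check_pointer_size()
{
   const char* p = "hello";
   assert(mylib_ptr::size(p) == 5);                    // strlen overload
   assert(mylib_ptr::size(std::string("hello")) == 5); // std::size
}
```

One thing to watch, as far as I understand overload resolution: for a string literal, a non-template const char* overload is preferred over an array-reference template, so when both live in the same namespace it is the strlen version that gets called.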

mylib::empty is similar.

EXPECT_EQ(5, mylib::size("hello"));  // uses overload
EXPECT_EQ(5, mylib::size(string("hello"))); // uses std::size
EXPECT_EQ(3, (mylib::size(vector<int>{1,2,3}))); // uses std::size
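For completeness, mylib::empty could look like the sketch below. It follows the same pattern as mylib::size and carries the same assumption, namely that the char array holds exactly one null-terminated string:

```cpp
#include <cstddef>
#include <iterator>    // std::empty
#include <string_view>

namespace mylib
{
   using std::empty; // "publish" ordinary std::empty

   // on char arrays: empty when only the null terminator is present
   // (assumes the array holds exactly one null-terminated string)
   template<std::size_t N>
   constexpr bool empty(const char(&)[N]) noexcept
   {
      return N == 1;
   }
}

static_assert(mylib::empty(""));                   // uses overload
static_assert(!mylib::empty("hello"));             // uses overload
static_assert(mylib::empty(std::string_view(""))); // uses std::empty
```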

Clearly, string_view would solve most of the issues (and it has constexpr support too), but I think you have understood my point.
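For reference, when string_view is available, wrapping the literal is enough to get the expected numbers. A minimal sketch, assuming C++17:

```cpp
#include <iterator>    // std::size, std::empty
#include <string_view>

using namespace std::string_view_literals;

// A string_view over a literal does not count the null terminator.
static_assert(std::size("hello"sv) == 5);
static_assert(std::size(std::string_view("hello")) == 5);
static_assert(std::empty(""sv));
```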

[Edit] Many people did not get my point. In particular, some have fixated on the example itself instead of getting the sense of the post. They just suggested string_view for solving this particular problem. I said that string_view would help a lot here, however I wrote a few times throughout this post that string_view was not viable.

My point is just: be aware of the null terminator when using generic functions like std::size, std::empty, std::begin, etc., because the null terminator is extra information that such functions don’t know about. That’s it. Just take action as you need.

Another simple example is converting a sequence into a vector of its underlying type. We don’t want to store the null terminator for char arrays. In this example we don’t even need std::size, just std::begin and std::end (thanks to C++17 class template argument deduction):

template<typename T>
auto to_vector(const T& seq)
{
  return vector(begin(seq), end(seq));
}

Clearly, this exhibits the same issue discussed before, requiring extra logic/specialization for char arrays.
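One possible shape for that extra logic is an overload for char arrays that skips the last element. This is only a sketch: the overload and the check_to_vector helper are hypothetical, and the overload assumes the array is exactly one null-terminated string.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator> // std::begin, std::end
#include <vector>

// Generic version, as above.
template<typename T>
auto to_vector(const T& seq)
{
   return std::vector(std::begin(seq), std::end(seq));
}

// Hypothetical overload for char arrays: the last element (assumed to be
// the null terminator) is not copied.
template<std::size_t N>
auto to_vector(const char (&str)[N])
{
   return std::vector<char>(str, str + N - 1);
}

// Hypothetical helper, for illustration only.
inline void check_to_vector()
{
   assert(to_vector("abc").size() == 3);                     // 'a','b','c'
   assert(to_vector(std::vector<int>{1, 2, 3}).size() == 3); // generic path
}
```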

I’ll stop here; my intent was just to let you know about this fact. Use this information as you like.

 

Conclusions

TL;DR: Just know how std::size and std::empty work on static C-strings.

  • static C-strings are null-terminated arrays of characters (size = number of chars + 1),
  • std::size and std::empty on arrays simply give the total number of elements,
  • be aware of the information above when using std::size and std::empty on static C-strings,
  • it’s quite easy to wrap std::size and std::empty for handling strings differently,
  • string_view could be helpful.
Comments
  1. Benjamin Buch says:

    You cannot distinguish between a C-string and a C-array of chars because these are two different semantics with the same syntax.

    As for size() I think it would be a better idea to have a generic length() function to measure string lengths. A different term for the semantic meaning of strings. C-strings are implemented as arrays, but they are semantically used as strings not arrays! Same syntax, different semantic. Note string and string_view both have a length() member function.

    If you want to check whether you have an empty string, compare its length with 0. This would fix the empty() problem. Note that empty is a container term: semantically a string cannot be “empty”, it can only have no length. string and string_view are also C++ containers, so for them the two notions coincide, but not for C-strings and C-arrays.

    This is in my opinion the solution for algorithms that process on strings.

    If you want to use a C-array (with the semantic meaning of a string) in an algorithm that expects C++ containers, you must make the semantic clear by wrapping it explicitly in a string_view. A std::string_view (as well as a std::string) is a C++ container and has the semantic meaning of a string, so it will work as expected.

    • Marco Arena says:

      We all agree that string_view is preferred here. My point was just be aware of using std::size with char arrays since, as you said, they could be used with two different semantics.
