Learn how to capture by move

Posted: November 1, 2012 in Programming Recipes

I think one of the most attractive features of C++11 is lambdas. They simplify and encourage the use of STL algorithms more than before, and they may increase programmers’ productivity. Lambdas combine the benefits of function pointers and function objects: like function objects, lambdas are flexible and can maintain state, but unlike function objects, their compact syntax doesn’t require a class definition.

The syntax is simple:

auto aLambda = [ ] ( const string& name ) { cout << "hello " << name << endl; };
aLambda("Marco"); // prints "hello Marco"

The code above defines a lambda with no return value that receives a const string& parameter. What about the “[ ]”? That is the capture specification, and it tells the compiler we’re creating a lambda expression. As you know, inside the square brackets you can “capture” variables from the outside (the scope where the lambda is created). C++11 provides two ways of capturing: by copy and by reference. For example:

string name = "Marco";
auto anotherLambda = [name] { cout << "hello " << name << endl; };
anotherLambda(); // prints "hello Marco"

This way the string is copied into the state of anotherLambda, so the copy stays alive until anotherLambda goes out of scope (note: you can omit the parentheses if the lambda has no parameters). By contrast:

string name = "Marco";
auto anotherLambda = [&name] () { cout << "hello " << name << endl; };
anotherLambda(); // prints "hello Marco"

The only difference is the way we capture the string name: this time we capture it by reference, so no copy is involved and the behavior is like passing a variable by reference. Obviously, if name is destroyed before the lambda is executed (or, more precisely, before the lambda uses name): boom!
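To make the difference concrete, here is a minimal sketch (the function names are mine, purely for illustration) in which the captured variable is changed after the lambda is created but before it is called:

```cpp
#include <string>

// Capture by copy: the lambda keeps its own snapshot of name.
std::string greet_by_copy_then_change()
{
    std::string name = "Marco";
    auto byCopy = [name] { return "hello " + name; };
    name = "Dan";            // not seen by byCopy
    return byCopy();
}

// Capture by reference: the lambda reads the live variable.
std::string greet_by_ref_then_change()
{
    std::string name = "Marco";
    auto byRef = [&name] { return "hello " + name; };
    name = "Dan";            // seen by byRef
    return byRef();
}
```

The first function yields "hello Marco", the second "hello Dan". In both cases name is still alive when the lambda runs, so the by-reference version is safe here.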

After this introduction, in this post I’m going to discuss a capturing issue I ran into a few days ago at work: what if I want to capture an object by moving it, instead of either copying or referencing it? Consider this plausible scenario:

function<void()> CreateLambda()
{
   vector<HugeObject> hugeObj;
   // ...preparation of hugeObj...

   auto toReturn = [hugeObj] { ...operate on hugeObj... };
   return toReturn;
}

This fragment of code prepares a vector of HugeObject (i.e. objects that are expensive to copy) and returns a lambda which uses this vector (the vector is captured by copy because it goes out of scope when the lambda is returned). Can we do better?
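Before answering, a rough way to see the cost is to count the copies. The sketch below uses a hypothetical CountedObject in place of HugeObject; the exact number of copies depends on the compiler and library, so only the lower bound is meaningful:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for HugeObject: it only counts how many times it is copied.
struct CountedObject
{
    static int copies;
    CountedObject() {}
    CountedObject(const CountedObject&) { ++copies; }
    CountedObject& operator=(const CountedObject&) { ++copies; return *this; }
};
int CountedObject::copies = 0;

// Same shape as CreateLambda above, with a 3-element vector captured by copy.
std::function<std::size_t()> CreateCopyingLambda()
{
    std::vector<CountedObject> hugeObj(3);
    CountedObject::copies = 0;   // ignore anything done while preparing the vector

    auto toReturn = [hugeObj] { return hugeObj.size(); };   // copies all 3 elements
    return toReturn;   // converting to std::function may copy the closure again
}

// Returns how many element copies it took to create the lambda.
int copies_needed_to_create_lambda()
{
    std::function<std::size_t()> f = CreateCopyingLambda();
    f();
    return CountedObject::copies;
}
```

With three elements, copies_needed_to_create_lambda() reports at least three copies: one per element for the capture alone.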

“Yes, of course we can!”, I heard. “We can use a shared_ptr to reference-count the vector and avoid copying it”:

function<void()> CreateLambda()
{
   shared_ptr<vector<HugeObject>> hugeObj(new vector<HugeObject>());
   // ...preparation of hugeObj...

   auto toReturn = [hugeObj] { ...operate on hugeObj via shared_ptr... };
   return toReturn;
}

I honestly don’t like the use of shared_ptr here, but this should work well. The subtle drawback of this attempt is about style and clarity: why is the ownership shared? Why can’t I treat hugeObj as a temporary to move “inside” the lambda? I think that using a sharing mechanism here is a hack to fill a gap in the language. I don’t want the lambda to share hugeObj with the outside; I’d like to “prevent” this:

function<void()> CreateLambda()
{
   shared_ptr<vector<HugeObject>> hugeObj(new vector<HugeObject>());
   // ...preparation of hugeObj...

   auto toReturn = [hugeObj] { ...operate on hugeObj via shared_ptr... };
   (*hugeObj)[0] = HugeObject(...); // can alter lambda's behavior
   return toReturn;
}
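The effect of that extra line can be checked with a small self-contained sketch. Here vector<int> replaces vector<HugeObject> and CreateSharedLambda is a name I made up for the example:

```cpp
#include <functional>
#include <memory>
#include <vector>

std::function<int()> CreateSharedLambda()
{
    auto values = std::make_shared<std::vector<int>>(3, 1);   // {1, 1, 1}

    // The lambda copies the shared_ptr, not the vector: both refer to the same data.
    std::function<int()> firstValue = [values] { return (*values)[0]; };

    (*values)[0] = 42;   // done "outside", yet it changes what the lambda returns
    return firstValue;
}
```

Calling the returned function gives 42 rather than 1, because the state is shared and not owned by the lambda.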

I need a sort of “capture-by-move”, so:

  1. I create the vector
  2. I “inject” it in the lambda (treating the vector like a temporary)
  3. (outside) the vector will be in a “valid but unspecified state” (what the standard says about moved-from objects)
  4. nothing from the outside can alter the lambda’s vector

While we are on the subject, getting rid of the shared_ptr syntax (I repeat: here) would be nice!

To emulate capture by move we can employ a wrapper that:

  • receives an rvalue reference to our to-move object
  • maintains this object
  • when copied, performs a move operation of the internal object

Here is a possible implementation:

#ifndef MOVE_ON_COPY_H
#define MOVE_ON_COPY_H

#include <utility> // std::move

// Wrapper whose copy constructor moves the wrapped object instead of copying it
template<typename T>
struct move_on_copy
{
   move_on_copy(T&& aValue) : value(std::move(aValue)) {}
   // "copying" steals the value from other (value is mutable, so this works on a const source)
   move_on_copy(const move_on_copy& other) : value(std::move(other.value)) {}

   mutable T value;

private:

   move_on_copy& operator=(move_on_copy&& aValue) = delete; // not needed here
   move_on_copy& operator=(const move_on_copy& aValue) = delete; // not needed here

};

template<typename T>
move_on_copy<T> make_move_on_copy(T&& aValue)
{
   return move_on_copy<T>(std::move(aValue));
}
#endif // MOVE_ON_COPY_H

In this first version we use the wrapper this way:

vector<HugeObject> hugeObj;
// ...
auto moved = make_move_on_copy(move(hugeObj));
auto toExec = [moved] { ...operate on moved.value... };
// hugeObj here is in a "valid but unspecified state"
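Putting the pieces together, a compact end-to-end sketch could look like this. The wrapper is repeated (without the deleted assignment operators) so the snippet stands alone; vector<int> replaces vector<HugeObject>, and CreateMovingLambda is a name chosen for the example:

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

template<typename T>
struct move_on_copy
{
    move_on_copy(T&& aValue) : value(std::move(aValue)) {}
    move_on_copy(const move_on_copy& other) : value(std::move(other.value)) {}
    mutable T value;
};

template<typename T>
move_on_copy<T> make_move_on_copy(T&& aValue)
{
    return move_on_copy<T>(std::move(aValue));
}

std::function<std::size_t()> CreateMovingLambda()
{
    std::vector<int> hugeObj(1000, 7);

    // Each "copy" of moved hands the vector over, so the elements are never duplicated.
    auto moved = make_move_on_copy(std::move(hugeObj));
    return [moved] { return moved.value.size(); };
}
```

The returned function reports 1000 elements: the vector travelled from hugeObj into the wrapper, then into the closure, and finally into the std::function.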

The move_on_copy wrapper works, but it is not complete yet. To refine it, a couple of comments are needed. The first is about “usability”: the only aim of this class is to “replace” capture-by-copy with capture-by-move, nothing else. Now, capture by move makes sense only when we operate on rvalues and movable objects, so is the following code conceptually correct?

// T is deduced as const vector<HugeObject>& (universal reference), so no copy/move is involved in move_on_copy's ctor
const vector<HugeObject> hugeObj;
auto moved = make_move_on_copy(hugeObj);
auto toExec = [moved] { ...operate on moved.value... };
// hugeObj here is the same as before

Not only is it useless, it is also confusing. So, let’s require our users to pass only rvalues:

template<typename T>
auto make_move_on_copy(T&& aValue)
     -> typename enable_if<is_rvalue_reference<decltype(aValue)>::value, move_on_copy<T>>::type
{
   return move_on_copy<T>(move(aValue));
}

We “enable” this function only if aValue is an rvalue reference; to do this we make use of a couple of type traits (enable_if and is_rvalue_reference). Strangely, this code does not compile on Visual Studio 2010, so, if you use it, try to settle for:

template<typename T>
move_on_copy<T> make_move_on_copy(T&& aValue)
{
   static_assert(is_rvalue_reference<decltype(aValue)>::value, "parameter should be an rvalue");
   return move_on_copy<T>(move(aValue));
}

You can also enforce the move-constructibility requirement by using other traits such as is_move_constructible; I have not implemented it here.
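For completeness, one way such a check might look is sketched below. The name make_checked_move_on_copy is hypothetical, and the wrapper is repeated in reduced form so the snippet compiles on its own. Keep in mind that is_move_constructible is also true for types that are merely copy-constructible, so this is a sanity check and not a guarantee that a cheap move exists:

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

template<typename T>
struct move_on_copy
{
    move_on_copy(T&& aValue) : value(std::move(aValue)) {}
    move_on_copy(const move_on_copy& other) : value(std::move(other.value)) {}
    mutable T value;
};

template<typename T>
move_on_copy<T> make_checked_move_on_copy(T&& aValue)
{
    static_assert(std::is_rvalue_reference<T&&>::value, "parameter should be an rvalue");
    static_assert(std::is_move_constructible<T>::value, "T should be move-constructible");
    return move_on_copy<T>(std::move(aValue));
}

// Wraps a 5-element vector and reports the size seen through the wrapper.
std::size_t wrap_and_measure()
{
    std::vector<int> source(5, 1);
    auto wrapped = make_checked_move_on_copy(std::move(source));
    return wrapped.value.size();
}
```

Passing an lvalue (for example make_checked_move_on_copy(source)) is rejected at compile time.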

The second note is about compliance. Is the following code syntactically clear?

vector<HugeObject> hugeObj;
auto moved = make_move_on_copy(move(hugeObj));
auto toExec = [moved]
  {
     moved.value[0] = HugeObject(...); // does it conform to standard lambda syntax?
  };

What aroused my suspicions was the syntax of lambda expressions: if you copy-capture an object, the only way to access its non-const members (aka: make changes) is to declare the lambda mutable, because by default the closure’s function call operator is const. The rationale is that a function object should produce the same result every time it is called. Our wrapper sidesteps this rule, since value is a public mutable member. If we want to honor this requirement, we have to make a little change:

template<typename T>
struct move_on_copy
{
   move_on_copy(T&& aValue) : value(move(aValue)) {}
   move_on_copy(const move_on_copy& other) : value(move(other.value)) {}

   T& Value()
   {
      return value;
   }

   const T& Value() const
   {
      return value;
   }

private:
   mutable T value;
   move_on_copy& operator=(move_on_copy&& aValue) = delete; // not needed here
   move_on_copy& operator=(const move_on_copy& aValue) = delete; // not needed here
};

And:

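vector<HugeObject> hugeObj;
auto moved = make_move_on_copy(move(hugeObj));
// auto toExec = [moved] { moved.Value()[0] = HugeObject(...); }; // ERROR: Value() returns const T& here
auto toExec = [moved] () mutable { moved.Value()[0] = HugeObject(...); }; // OK: mutable lambda
// a non-mutable lambda can still read; use one capture or the other, since each capture moves from moved:
// auto toRead = [moved] { cout << moved.Value()[0] << endl; }; // OK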
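A self-contained sketch of the accessor-based version, assuming vector<int> in place of vector<HugeObject> and a function name of my own choosing, could be:

```cpp
#include <utility>
#include <vector>

template<typename T>
class move_on_copy
{
public:
    move_on_copy(T&& aValue) : value(std::move(aValue)) {}
    move_on_copy(const move_on_copy& other) : value(std::move(other.value)) {}

    T& Value() { return value; }               // chosen inside mutable lambdas
    const T& Value() const { return value; }   // chosen inside ordinary lambdas

private:
    mutable T value;   // still mutable, but only so that the copy constructor can move
};

int modify_inside_mutable_lambda()
{
    std::vector<int> hugeObj(3, 1);
    move_on_copy<std::vector<int>> moved(std::move(hugeObj));

    // Without "mutable" the assignment below would not compile.
    auto toExec = [moved] () mutable -> int
    {
        moved.Value()[0] = 10;
        return moved.Value()[0];
    };
    return toExec();
}
```

The function returns 10: the write goes through the non-const Value() overload, which is reachable only because the lambda is declared mutable.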

If you want to play a bit, I posted a trivial example on ideone.

Personally, I doubt this is the best way to capture expensive-to-copy objects. What I mean is that working with rvalues masked by lvalues can be a little harder to understand, and then maintaining the code can be painful. If the language supported a syntax like:

HugeObject obj;
auto lambda = [move(obj)] { ... };
// obj was moved, it is clear without need to look at its type

It would be simpler to understand that obj will be in an unspecified state after the lambda creation statement. Conversely, the move_on_copy wrapper requires the programmer to look at obj’s type (or name) to realize it was moved and some magic happened:

HugeObject obj;
auto moved_obj = make_move_on_copy(move(obj)); // this name helps but it is not enough
auto lambda = [moved_obj] { ... };
// obj was moved, but you have to read at least two lines to realize it

[Edit]

As Dan Haffey pointed out (thanks for this), “the move_on_copy wrapper introduces another problem: The resulting lambda objects have the same weakness as auto_ptr. They’re still copyable, even though the copy has move semantics. So you’re in for trouble if you subsequently pass the lambda by value”. So, as I said just before, you have to be aware that some magic happens under the hood. In my specific case, the move_on_copy wrapper works well because I don’t copy the resulting lambda object.
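The pitfall is easy to reproduce. In this sketch (first version of the wrapper, vector<int> as payload, function name invented for the example) an innocent-looking copy of the lambda empties the original:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

template<typename T>
struct move_on_copy
{
    move_on_copy(T&& aValue) : value(std::move(aValue)) {}
    move_on_copy(const move_on_copy& other) : value(std::move(other.value)) {}
    mutable T value;
};

template<typename T>
move_on_copy<T> make_move_on_copy(T&& aValue)
{
    return move_on_copy<T>(std::move(aValue));
}

// Returns the sizes seen by the original lambda and by its copy, in that order.
std::pair<std::size_t, std::size_t> sizes_after_copying_lambda()
{
    std::vector<int> hugeObj(3, 1);
    auto moved = make_move_on_copy(std::move(hugeObj));

    auto original = [moved] { return moved.value.size(); };
    auto copy = original;   // looks like a copy, but moves the vector out of original

    return std::make_pair(original(), copy());
}
```

The copy sees the three elements, while the original is left with a moved-from vector (empty in practice). This is the auto_ptr-like behavior Dan describes.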

Another important issue is: what semantics do we expect when a function that performed a capture-by-move gets copied? If you used capture-by-move, then presumably you didn’t want to pay for a copy, so why copy functions? The copy should be forbidden by design, so you can employ the approach suggested by jrb (see the comments), but I think the best solution would be support from the language. Maybe in a future standard?
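For readers coming to this post later: C++14 did add language support for this, in the form of init-captures (generalized lambda capture). A minimal sketch, assuming a C++14 compiler and using vector<int> as the payload:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

std::size_t size_via_init_capture()
{
    std::vector<int> hugeObj(1000, 7);

    // C++14 init-capture: data is a closure member initialized by moving hugeObj.
    auto lambda = [data = std::move(hugeObj)] { return data.size(); };

    return lambda();
}
```

The move is visible right in the capture list, so no wrapper is needed. Note that a closure capturing a move-only type this way cannot be stored in a std::function, which requires a copyable target.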

Since other approaches have been proposed, I’d like to share my final note about this topic. I propose a sort of recipe/idiom that is (I think) easy to use in existing codebases. My idea is to use my move_on_copy only with a new function wrapper that I called mfunction (movable function). Unlike other posts I read, I suggest not writing a totally new type (that may break your codebase) but instead inheriting from std::function. From an OO standpoint this is not perfectly consistent, because I’m going to violate (a bit) the is-a principle. In fact, my new type will be non-copyable (unlike std::function).

Anyhow, my implementation is quite simple:

template<typename T>
struct mfunction;

template<class ReturnType, typename... ParamType>
struct mfunction<ReturnType(ParamType...)> : public std::function<ReturnType(ParamType...)>
{
  typedef std::function<ReturnType(ParamType...)> FnType;

  mfunction() : FnType()
  {}

  template<typename T>
  explicit mfunction(T&& fn) : FnType(std::forward<T>(fn))
  {}

  mfunction(mfunction&& other) : FnType(move(static_cast<FnType&&>(other)))
  {}

  mfunction& operator=(mfunction&& other)
  {
    FnType::operator=(move(static_cast<FnType&&>(other)));
    return *this;
  }

  mfunction(const mfunction&) = delete;
  mfunction& operator=(const mfunction&) = delete;
};

In my opinion, inheriting from std::function avoids reinventing the wheel and allows you to use mfunction where a std::function& is needed. My code is on ideone, as usual.
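As a quick check of the intended semantics, the sketch below repeats mfunction (with std:: qualifications so it compiles on its own) and verifies that it can be moved but not copied from a const source. The static_asserts and the call_after_move helper are mine, added for illustration:

```cpp
#include <functional>
#include <type_traits>
#include <utility>

template<typename T>
struct mfunction;

template<class ReturnType, typename... ParamType>
struct mfunction<ReturnType(ParamType...)> : public std::function<ReturnType(ParamType...)>
{
    typedef std::function<ReturnType(ParamType...)> FnType;

    mfunction() : FnType() {}

    template<typename T>
    explicit mfunction(T&& fn) : FnType(std::forward<T>(fn)) {}

    mfunction(mfunction&& other) : FnType(std::move(static_cast<FnType&&>(other))) {}

    mfunction& operator=(mfunction&& other)
    {
        FnType::operator=(std::move(static_cast<FnType&&>(other)));
        return *this;
    }

    mfunction(const mfunction&) = delete;
    mfunction& operator=(const mfunction&) = delete;
};

static_assert(!std::is_copy_constructible<mfunction<int(int)>>::value,
              "copying is disabled by design");
static_assert(std::is_move_constructible<mfunction<int(int)>>::value,
              "moving is allowed");

// Builds an mfunction, moves it into another one and calls the destination.
int call_after_move()
{
    mfunction<int(int)> doubler([](int x) { return 2 * x; });
    mfunction<int(int)> target(std::move(doubler));
    return target(21);
}
```

call_after_move() returns 42. After the move, doubler should not be called again, exactly as with a moved-from std::function.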

[/Edit]

Make the choice you like the most, without forgetting readability and comprehensibility. Sometimes a shared_ptr suffices, even if it may not be the most appropriate tool.

Lambdas are cool and it’s easy to start working with them. They are very useful in plenty of contexts, from STL algorithms to concurrency. When capturing by reference is not possible and capturing by copy is not feasible, you can consider this “capture by move” idea, and remember that the language always offers a chance to work around its shortcomings.

Comments
  1. sutton bcn says:

    Hello there, You’ve performed an excellent job. I will definitely digg it and personally recommend to my friends. I’m sure they will be benefited from this web site.

  2. Ghita says:

    I liked it also. A lot.

  3. Dan Haffey says:

    The move_on_copy wrapper introduces another problem: The resulting lambda objects have the same weakness as auto_ptr. They’re still copyable, even though the copy has move semantics. So you’re in for trouble if you subsequently pass the lambda by value: http://ideone.com/YG37md

    • Marco Arena says:

      Good point Dan. Maybe this is another case in which a shared_ptr wins. Being a sort of hack, this approach is not always suitable. As usual, software is all about tradeoffs. Thanks for your comment.

  4. I’ve not tested this, but I think you could express your rvalues-only restriction on make_move_on_copy more clearly/directly with std::remove_reference instead of enable_if. With some luck, it might even work on VS2010!

    i.e.
    template
    move_on_copy<typename std::remove_reference::type>
    make_move_on_copy(typename std::remove_reference::type&& aValue) {…}

    • Attempting to fix the code w/html entities:

      template <typename T>
      move_on_copy<typename std::remove_reference<T>::type>
      make_move_on_copy(typename std::remove_reference<T>::type&& aValue) {…}

      • Marco Arena says:

        Hi Jeff, thanks for your suggestion. I just think you have to add an extra level of indirection to maintain the same interface and let the compiler deduce the template argument T. To be short, this won’t compile:


        vector<int> aVector = {1,2,3};
        auto vect_wrapper = make_move_on_copy(move(aVector)); // could not deduce template argument for T
        auto vecto_wrapper = make_move_on_copy<vector<int>>(move(aVector)); // now ok

        Something like that should work:


        template<typename T>
        move_on_copy_wrapper<typename std::remove_reference<T>::type> make_move_on_copy_internal(typename std::remove_reference<T>::type&& aValue)
        {
        return move_on_copy_wrapper<typename std::remove_reference<T>::type>(std::move(aValue));
        }

        template<typename T>
        move_on_copy_wrapper<typename std::remove_reference<T>::type> make_move_on_copy(T&& aValue)
        {
        return make_move_on_copy_internal<T>(std::forward<T>(aValue));
        }

        Does it make sense?

      • Re: comment @ November 30, 2012 at 11:48 pm (I don’t seem to be able to actually reply to said post)

        Marco – thanks for the reply, and doh, it didn’t occur to me that the change would prevent type deduction.

        Fixing the type by adding the extra level makes perfect sense, although the extra code takes away any readability advantage it had over enable_if.
        Still, it ought to produce better error messages. In gcc47, I get “could not convert Foo to Foo&&” instead of “move_on_copy was not declared in this scope”

        I made a quick example to test it out this time, and added a refinement of yours that moves the remove_reference stuff around to simplify it a bit. It’s at https://gist.github.com/4183939; I’ve put the full error messages in there as comments for comparison

      • Marco Arena says:

        Thanks Jeff! I think you could write something about this topic; it is not obvious how to require an rvalue reference parameter in a templated function. My approach works but yours is clearer and simpler to read!

  5. jrb says:

    Here is another alternative with a slightly different syntax that accomplishes the same thing, but with more safety and clarity at the expense of a little more verbosity.

    http://jrb-programming.blogspot.com/2012/11/another-alternative-to-lambda-move.html

    • Marco Arena says:

      Your solution is interesting but I think it’s harder to maintain and extend. What about employing it in an existing codebase, full of std::function and lambda expressions? Probably, as I said, a shared_ptr wins (for both behavior and clarity; the latter, as you know, is pivotal when you work on a big and complex codebase). Another important point is: what semantics do we expect when a function that performed a “capture-by-move” gets copied? If you used “capture-by-move”, then presumably you didn’t want to pay for a copy, so why copy functions? You’re right, the copy should not be possible by design. So you have at least three solutions:

      • write another wrapper, to make non-copyable functions (as you did) – paying compatibility with existent code (you introduced a new type)
      • use a different approach (e.g. a shared_ptr) – paying nothing but sharing ownership, when it is not necessary
      • use the move_on_copy_wrapper judiciously – paying compatibility with semantics of std::function (we have to remember not to copy a function that performed a capture-by-move – or, at least, we should know what happens under the hood)

      The best solution could be having the support of the language.
      Thanks for your code and for your comment!

  6. A Cowherd says:

    Your `mfunction` is probably a bad idea. It inherits from `std::function`… but `std::function`s are often passed around by value (rather than by reference). Passing an `mfunction` to a callee that expects a `std::function` will cause slicing (i.e., constructing a new `std::function` using `std::function`’s copy constructor), which is probably not what the programmer intended. And unfortunately I don’t think there’s any way for `mfunction` to override or disable the “mfunction-to-std::function” conversion.
