In-Depth Analysis of the C++ Object Model: Move Constructors

minimini, 04-18

Introduction

One of the most important features of the C++11 standard is support for moving objects. To make this possible, the standard introduced a new kind of reference, the rvalue reference, which binds only to rvalues: temporaries that are about to be destroyed, or objects explicitly cast with std::move. After an object has been moved from, it must remain in a state in which it can be safely destroyed, and the program should not rely on its value afterwards. Destroying the source object must also have no effect on the object that received its contents. Move semantics can reduce the cost of transferring a large object, such as a container, to roughly that of copying a few pointers.

This has led to various rumors, such as compilers replacing copy operations with move operations for efficiency gains, and even claims that recompiling old C++98 code with a C++11 compiler improves performance without changing a single line of code. Is this true? Let's delve into the details and find out. After reading this article, you will have the answers.
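As a minimal sketch of the ideas above: the first function shows what an rvalue reference can and cannot bind to, and the second shows that moving a std::vector hands over its heap buffer instead of copying the elements. The function names bind_demo and move_steals_buffer are hypothetical helpers chosen for illustration only.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: shows which expressions an rvalue reference accepts.
int bind_demo() {
    int x = 1;
    // int&& bad = x;         // error: an rvalue reference cannot bind to an lvalue
    int&& r1 = 2;             // binds to a temporary
    int&& r2 = std::move(x);  // binds after an explicit cast to an rvalue
    return r1 + r2;           // r2 still refers to x, so this is 2 + 1
}

// Hypothetical helper: moves src into a new vector and reports whether the
// destination ended up owning the very same heap buffer.
bool move_steals_buffer(std::vector<int>& src) {
    assert(!src.empty());
    const int* old_buffer = src.data();
    std::vector<int> dst = std::move(src);  // std::move only casts; the move constructor does the work
    return dst.data() == old_buffer;
}
```

After the call, the vector passed to move_steals_buffer is still a valid object that can be destroyed or assigned to, but its previous contents should no longer be relied on.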

Support for Object Movement

The new standard added move constructors and move assignment operators to support moving objects. The two are similar, so they are discussed together. Is it true that if a class does not define a move constructor, the compiler will generate one for it? Let's analyze this with actual code. Since a move constructor takes an rvalue reference as its first parameter, we can use the std::move function from the standard library to obtain one. std::move is essentially a type cast that turns an lvalue into an rvalue reference; it does not move anything by itself. Let's see whether the compiler synthesizes a move constructor for the following code:

#include <utility>

class Object {
    int a;
};

int main() {
    Object d;
    Object d1 = std::move(d);
    
    return 0;
}

When compiled into assembly code, it shows:

main:						# @main
    push    rbp
    mov     rbp, rsp
    mov     dword ptr [rbp - 4], 0
    mov     eax, dword ptr [rbp - 8]
    mov     dword ptr [rbp - 16], eax
    xor     eax, eax
    pop     rbp
    ret

In fact, the compiler did not emit a move constructor, or any constructor for that matter. The class still has an implicitly declared move constructor, but it is trivial, so the compiler can simply copy the object member by member without generating a function for it. Lines 5 and 6 of the assembly code above copy the content of object d (stored on the stack at [rbp - 8]) into the eax register, and then from eax into object d1 (stored on the stack at [rbp - 16]).
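The same observation can be checked at compile time with the standard type traits. This is only a sketch: the traits confirm that Object is move constructible and that its move constructor is trivial, which is why no function appears in the assembly.

```cpp
#include <type_traits>

class Object {
    int a;
};

// The class can be move constructed...
static_assert(std::is_move_constructible<Object>::value,
              "Object can be constructed from an rvalue");
// ...but the move constructor is trivial, so it amounts to a plain copy
// of the bytes and no constructor function has to be emitted.
static_assert(std::is_trivially_move_constructible<Object>::value,
              "Object's move constructor is trivial");
```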

When Will a Move Constructor Be Synthesized?

The compiler synthesizes a move constructor only in the following cases:

The class does not declare a copy constructor, copy assignment operator, move assignment operator, or destructor; and one of the following holds:
The class has a member of class type, and that member's class defines a move constructor; or
A base class defines a move constructor; or
The class declares or inherits at least one virtual function; or
The inheritance chain contains a virtual base class.
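The cases listed above can be sketched with a few small types. All type names here are hypothetical. Note that std::is_move_constructible alone does not prove that a move constructor exists, because a copy constructor also accepts rvalues; the non-triviality check is what indicates that a real function has to be generated.

```cpp
#include <string>
#include <type_traits>

// A member of class type that defines a move constructor.
struct WithMovableMember {
    std::string s;
    int a;
};

// A base class that defines a move constructor.
struct BaseWithMove {
    BaseWithMove() = default;
    BaseWithMove(BaseWithMove&&) noexcept {}
};
struct DerivedFromMovable : BaseWithMove {};

// At least one virtual function. A virtual destructor is deliberately not
// declared here: declaring any destructor would suppress the move constructor.
struct WithVirtual {
    virtual void f() {}
};

// A virtual base class in the inheritance chain.
struct VBase {};
struct WithVirtualBase : virtual VBase {};

// Helper trait: movable, and the move constructor is not trivial.
template <class T>
struct has_nontrivial_move
    : std::integral_constant<bool,
          std::is_move_constructible<T>::value &&
          !std::is_trivially_move_constructible<T>::value> {};

static_assert(has_nontrivial_move<WithMovableMember>::value, "class-type member");
static_assert(has_nontrivial_move<DerivedFromMovable>::value, "base class");
static_assert(has_nontrivial_move<WithVirtual>::value, "virtual function");
static_assert(has_nontrivial_move<WithVirtualBase>::value, "virtual base");
```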

Suppose we add a std::string member to the Object class in the C++ code above. std::string is the standard library's string class, and it defines a move constructor. With #include <string> added at the top of the file, the definition of the Object class becomes:

class Object {
    std::string s;
    int a;
};

If we compile this into assembly code, the output becomes much longer: the compiler now generates not only the move constructor for the Object class but also its default constructor and destructor. The assembly code for the main function is as follows:

main:							# @main
    push    rbp
    mov     rbp, rsp
    sub     rsp, 96
    mov     dword ptr [rbp - 4], 0
    lea     rdi, [rbp - 48]
    call    Object::Object() [base object constructor]
    lea     rdi, [rbp - 88]
    lea     rsi, [rbp - 48]
    call    Object::Object(Object&&) [base object constructor]
    mov     dword ptr [rbp - 4], 0
    lea     rdi, [rbp - 88]
    call    Object::~Object() [base object destructor]
    lea     rdi, [rbp - 48]
    call    Object::~Object() [base object destructor]
    mov     eax, dword ptr [rbp - 4]
    add     rsp, 96
    pop     rbp
    ret

Line 7 of the assembly code above calls the default constructor of the Object class. Because std::string has a default constructor that must be called, the compiler synthesizes one for Object, as analyzed in another article. Line 10 calls the Object class's move constructor, which in turn calls the move constructor of std::string. From this we can infer that the compiler synthesizes a move constructor here because the move constructor of a class-type member has to be called. The third scenario (a base class with a move constructor) is similar. In the fourth and fifth scenarios the compiler has to set the virtual table pointer, so a move constructor is generated as well. These scenarios mirror the synthesis mechanism of the copy constructor, as analyzed in the article "Compiler Behavior Behind the Copy Constructor", so they are not repeated here.
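To make the member-wise behavior concrete, here is a hand-written move constructor that is roughly equivalent to what the compiler synthesizes for Object. It is a sketch, not the compiler's actual output; the members are made public and the helper move_and_read is hypothetical, both only to keep the example short.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Object {
public:
    Object() : a(0) {}
    // Roughly what the synthesized version does: move each member in
    // declaration order. The std::string member is moved through its own
    // move constructor; the int is simply copied.
    Object(Object&& other) noexcept
        : s(std::move(other.s)), a(other.a) {}

    std::string s;
    int a;
};

// Hypothetical helper: fills an Object, moves it, and returns the moved text.
std::string move_and_read() {
    Object d;
    d.s = "a string long enough to be stored on the heap in common implementations";
    d.a = 42;
    Object d1 = std::move(d);  // calls Object(Object&&)
    assert(d1.a == 42);
    return d1.s;
}
```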

Situations Where the Compiler Suppresses Synthesis of Move Constructor

Although the timing for synthesizing a move constructor is similar to that for a copy constructor, the conditions for synthesizing a move constructor are much more stringent. In the following scenarios, the synthesis of the move constructor will be suppressed, and the compiler will not synthesize one.

If a class declares any of a copy constructor, copy assignment operator, move assignment operator, or destructor, the compiler will not synthesize a move constructor.

There is a guiding principle called the Rule of Three, which essentially states: if you define one of a copy constructor, copy assignment operator, or destructor, you should define all three. The reasoning is that a class which needs a custom copy operation is usually managing a resource, such as allocated memory. If the resource needs handling in the copy constructor, it also needs handling in the copy assignment operator, and vice versa, and the destructor has to release it. So defining any one of these functions signals that the class's resources need special handling. A move constructor synthesized by the compiler might therefore not do what you want, and could even break the program's logic and introduce bugs. For this reason the compiler does not synthesize one.

By the same reasoning, if a destructor is defined, the compiler should not generate a copy constructor or copy assignment operator either. However, the C++98 standard left a "bug": after a destructor is defined, the compiler still synthesizes a copy constructor and copy assignment operator when needed. For compatibility with C++98, the C++11 standard still allows this, although it marks the behavior as deprecated. For move constructors and move assignment operators there is no legacy code to protect, so the C++11 standard explicitly states that once a destructor is declared, the compiler no longer synthesizes them.
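The effect of a user-declared destructor can be observed with a small counting experiment. This is a sketch under stated assumptions: Tracker, Plain, WithDtor, Counts, and move_once are hypothetical names, and the global counters exist only to record which constructor ran.

```cpp
#include <cassert>
#include <utility>

int g_copies = 0;
int g_moves = 0;

// Member type that records whether it was copied or moved.
struct Tracker {
    Tracker() = default;
    Tracker(const Tracker&) { ++g_copies; }
    Tracker(Tracker&&) noexcept { ++g_moves; }
};

// No special members declared: a move constructor is synthesized.
struct Plain {
    Tracker t;
};

// Declaring a destructor suppresses the move constructor. The copy
// constructor is still generated, so "moving" silently falls back to it.
struct WithDtor {
    Tracker t;
    ~WithDtor() {}
};

struct Counts {
    int copies;
    int moves;
};

// Constructs one T from an rvalue of T and reports what the member saw.
template <class T>
Counts move_once() {
    g_copies = g_moves = 0;
    T a;
    T b = std::move(a);
    (void)b;
    return Counts{g_copies, g_moves};
}
```

Calling move_once<Plain>() reports one move and no copies, while move_once<WithDtor>() reports one copy and no moves, even though the call site uses std::move in both cases.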

If your class declares none of these functions and its members are movable, the compiler synthesizes a move constructor and move assignment operator for it. If this is exactly what you intend, it is still recommended to declare them explicitly with =default. Here is why. Suppose a class holds a container with a large amount of data and does not define a copy constructor, destructor, and so on. The compiler synthesizes a move constructor, so moving the object is very efficient. Then one day a requirement arrives to log the object's construction and destruction, so you add a constructor and a destructor. After recompiling, you find that the program runs much more slowly, in bad cases by orders of magnitude. The root cause is that once you declare the destructor, the compiler no longer synthesizes the move constructor, and every move is silently replaced by a copy. Declaring the move operations explicitly is therefore a good habit. We do not need to write their bodies; =default asks the compiler to generate them. Note that explicitly declaring a move operation causes the implicit copy operations to be defined as deleted, so if the class should remain copyable, declare the copy constructor and copy assignment operator with =default as well.
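The recommendation above can be sketched as follows. Logged is a hypothetical class that has gained a logging destructor; because all copy and move operations are declared with =default, moving it still moves its member. Tracker, the counters, and moves_for_logged are illustration-only names.

```cpp
#include <cassert>
#include <utility>

int g_copies = 0;
int g_moves = 0;

// Member type that records whether it was copied or moved.
struct Tracker {
    Tracker() = default;
    Tracker(const Tracker&) { ++g_copies; }
    Tracker(Tracker&&) noexcept { ++g_moves; }
    Tracker& operator=(const Tracker&) { ++g_copies; return *this; }
    Tracker& operator=(Tracker&&) noexcept { ++g_moves; return *this; }
};

struct Logged {
    Tracker t;

    Logged() = default;
    ~Logged() { /* imagine a log statement here */ }

    // Declaring the destructor would suppress the move operations,
    // so they are requested explicitly.
    Logged(Logged&&) = default;
    Logged& operator=(Logged&&) = default;

    // Declaring move operations deletes the implicit copy operations,
    // so those are requested explicitly too.
    Logged(const Logged&) = default;
    Logged& operator=(const Logged&) = default;
};

// Returns the number of member moves caused by moving one Logged object.
int moves_for_logged() {
    g_copies = g_moves = 0;
    Logged a;
    Logged b = std::move(a);
    (void)b;
    assert(g_copies == 0);
    return g_moves;
}
```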

If a class has a class-type member or a base class whose move constructor or move assignment operator is deleted (=delete) or inaccessible (for example, declared private), then the corresponding move constructor or move assignment operator of the class itself is defined as deleted.

Consider the following example:

#include <utility>
#include <string>

class Base {
public:
    Base() = default;
    Base(Base&& rhs) = delete;
    int b;
};

class Object {
public:
    Base b;
    std::string s;
    int a;
};

int main() {
    Object d;
    Object d1 = std::move(d);	// This line does not compile.
    
    return 0;
}

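In the example above, Base declares its move constructor as deleted, which also causes its implicit copy constructor to be defined as deleted. As a result, the compiler generates neither a move constructor nor a copy constructor for Object, so the code on line 20 (the marked line in main) does not compile: there is no copy constructor or move constructor available to call.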
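The same conclusion can be confirmed without triggering a compile error by asking the type traits. This is a sketch that repeats the two classes from the example and checks them at compile time.

```cpp
#include <string>
#include <type_traits>

class Base {
public:
    Base() = default;
    Base(Base&& rhs) = delete;  // also suppresses the implicit copy constructor
    int b;
};

class Object {
public:
    Base b;
    std::string s;
    int a;
};

// Base can be neither moved nor copied.
static_assert(!std::is_move_constructible<Base>::value, "Base is not movable");
static_assert(!std::is_copy_constructible<Base>::value, "Base is not copyable");

// Object inherits both restrictions through its Base member,
// yet it can still be default constructed.
static_assert(!std::is_move_constructible<Object>::value, "Object is not movable");
static_assert(!std::is_copy_constructible<Object>::value, "Object is not copyable");
static_assert(std::is_default_constructible<Object>::value, "Object d; still works");
```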

If a class’s destructor is defined as deleted or inaccessible, then the move constructor of this class is defined as deleted.

Situations Where Move Operations Do Not Enhance Efficiency

In some cases, even if a move constructor or move assignment operator is correctly synthesized or defined by the programmer, the program does not achieve the expected increase in running efficiency, as in the following scenarios:

No Move Operation

Suppose a class has a move constructor (synthesized or user-defined), and at the same time, the class has a class-type member that stores a large amount of data. However, this member’s class definition does not define a move constructor, so it can only be copied, not moved. When performing a move operation on an object, it will actually recursively perform move calls on each member of the object, matching the operation suitable for this member, i.e., if the member is movable, then the move operation is performed, and if it is not movable, then the copy operation is performed. So, in reality, this member’s copy constructor will be called.
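A minimal sketch of this fallback, using hypothetical names: CopyOnly defines only a copy constructor, and Wrapper has a synthesized move constructor. Moving a Wrapper moves its std::string member but copies its CopyOnly member.

```cpp
#include <cassert>
#include <string>
#include <utility>

int g_copies = 0;

// Defines a copy constructor only, so no move constructor is synthesized.
// An rvalue of this type is therefore handled by the copy constructor.
struct CopyOnly {
    CopyOnly() = default;
    CopyOnly(const CopyOnly&) { ++g_copies; }
};

// No special members declared: the compiler synthesizes a move constructor
// that picks the best available operation for each member.
struct Wrapper {
    std::string name;  // moved
    CopyOnly payload;  // copied, because that is all CopyOnly offers
};

// Returns how many times the payload was copied while moving one Wrapper.
int copies_when_moving() {
    g_copies = 0;
    Wrapper a;
    a.name = "example";
    Wrapper b = std::move(a);
    assert(b.name == "example");
    return g_copies;
}
```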

Another scenario involves the std::array container, a container type introduced by the C++11 standard that is functionally equivalent to a built-in array. Other container types store their data on the heap and hold a pointer to it, so moving such a container only requires copying the pointer and then setting the source pointer to null. The data of a std::array, however, is stored inside the object itself. Even if the element type provides move operations, each element still has to be moved one by one, which takes linear time.
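The difference can be measured by counting element moves. In this sketch Elem, g_moves, and the two functions are hypothetical names; the element count of 8 is arbitrary.

```cpp
#include <array>
#include <utility>
#include <vector>

int g_moves = 0;

// Element type that counts how often it is move constructed.
struct Elem {
    Elem() = default;
    Elem(const Elem&) = default;
    Elem(Elem&&) noexcept { ++g_moves; }
};

// std::array keeps its elements inside the object, so moving the array
// has to move every element individually.
int moves_for_array() {
    g_moves = 0;
    std::array<Elem, 8> a;
    std::array<Elem, 8> b = std::move(a);
    (void)b;
    return g_moves;
}

// std::vector keeps its elements on the heap, so moving the vector only
// transfers ownership of the buffer and touches no element.
int moves_for_vector() {
    std::vector<Elem> a(8);
    g_moves = 0;
    std::vector<Elem> b = std::move(a);
    (void)b;
    return g_moves;
}
```

With these definitions, moves_for_array() returns 8 and moves_for_vector() returns 0.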

Low Efficiency of Movement

The std::string class is often implemented with a technique called small string optimization (SSO). With SSO, small strings (for example, up to 15 characters in some implementations; the exact limit varies) are stored directly in a buffer inside the string object, and only longer strings are stored on the heap. SSO is used because most strings in real applications are short, and storing them inline avoids frequent memory allocation and deallocation. For a string in SSO mode, moving is no faster than copying, because the move operation has to copy the characters anyway.
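Whether a particular string is stored inline can be probed with a small helper. This is a sketch, not a portable guarantee: stored_inline and long_string_move_reuses_buffer are hypothetical functions, and the result for short strings depends on the standard library implementation, so only the long-string behavior is relied on here.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical helper: reports whether the character data lives inside the
// std::string object itself (as with SSO) rather than in a heap block.
bool stored_inline(const std::string& s) {
    const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(&s);
    const std::uintptr_t end = begin + sizeof(std::string);
    const std::uintptr_t data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= begin && data < end;
}

// Hypothetical helper: for a long (heap-allocated) string, the move is
// expected to hand over the existing buffer instead of copying it.
bool long_string_move_reuses_buffer() {
    std::string src(1000, 'x');
    const char* old_data = src.data();
    std::string dst = std::move(src);
    return dst.data() == old_data;
}
```

On implementations that use SSO, stored_inline(std::string("hi")) is expected to return true, which means a move of that string has nothing to steal and must copy the bytes.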

Move Operation Not Called

Even if a class's move operation is much more efficient than its copy operation, the move operation may still not be called, with the copy operation used instead, which leads to lower efficiency than expected. For example, the standard library's vector container provides a push_back interface for adding an element to the container. When the container's capacity is exhausted, it has to allocate a larger memory block, transfer the elements from the old block to the new one, and destroy the old elements. The vector implementation must provide the strong exception guarantee here: if an exception is thrown during this process, the vector's state must be the same as before the call. When transferring elements, the vector uses the element type's move constructor only if that constructor is declared noexcept. If the move constructor might throw and the type can be copied, the vector copies the elements instead (standard library implementations typically express this with std::move_if_noexcept).

Imagine that an exception is thrown halfway through moving the elements, and the move stops immediately. At that point half of the elements are in the new block and half are in the old one, and the vector cannot return to its original state. Copying has no such problem. If something goes wrong while copying, only the elements already constructed in the new block and the newly allocated memory need to be released, and the vector's state remains unchanged.

Therefore, if your type's move constructor is not declared noexcept, the vector is forced to call the copy operation instead of the move operation during reallocation, even if moving is much more efficient than copying. When you define your own move constructor or move assignment operator, make sure it cannot throw and mark it explicitly with noexcept.
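The effect of noexcept on vector reallocation can be observed with a counting experiment. This is a sketch under stated assumptions: NoexceptMove, ThrowingMove, Counts, and grow_once are hypothetical names, and the vector is filled to its capacity first so that the next insertion is certain to reallocate.

```cpp
#include <type_traits>
#include <vector>

int g_copies = 0;
int g_moves = 0;

struct NoexceptMove {
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++g_copies; }
    NoexceptMove(NoexceptMove&&) noexcept { ++g_moves; }
};

struct ThrowingMove {
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) { ++g_copies; }
    ThrowingMove(ThrowingMove&&) { ++g_moves; }  // not declared noexcept
};

static_assert(std::is_nothrow_move_constructible<NoexceptMove>::value, "");
static_assert(!std::is_nothrow_move_constructible<ThrowingMove>::value, "");

struct Counts {
    int copies;
    int moves;
    int relocated;  // number of elements that had to change blocks
};

// Fills a vector to capacity, then adds one more element to force a
// reallocation and reports how the existing elements were transferred.
template <class T>
Counts grow_once() {
    std::vector<T> v;
    v.emplace_back();
    while (v.size() < v.capacity()) {
        v.emplace_back();
    }
    const int relocated = static_cast<int>(v.size());
    g_copies = g_moves = 0;
    v.emplace_back();  // constructed in place; the old elements are relocated
    return Counts{g_copies, g_moves, relocated};
}
```

For NoexceptMove every relocated element is moved, while for ThrowingMove every relocated element is copied, even though both types offer a move constructor.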

Title of this article: In-Depth Analysis of the C++ Object Model: Move Constructors
Author: minimini
Original link: https://www.xxmjw.com/post/28.html
Unless otherwise specified, all content is original. Please indicate when reprinting.
