A look at LLVM Advanced Data Types and trivially copyable types

A few bugs have been lurking in the LLVM Bugzilla for a long time, namely #39427 and #35978. Both are related to a custom implementation of the is_trivially_copyable type trait, and both have a bad impact on the Application Binary Interface (ABI) of LLVM libraries. In this article, I will take a closer look at these issues and describe potential workarounds.

The LLVM compiler infrastructure relies on several Advanced Data Types (ADT) to provide different speed/size trade-offs than the containers from the Standard Template Library (STL). Additionally, this ADT library provides features from future standard versions, but implemented in the C++ version (currently C++11) that LLVM supports as a code base. Finally, these ADTs must be compatible with the compiler requirements of the LLVM code base; basically, GCC version >= 4.8 and Clang version >= 3.1. (If you are interested in LLVM ADTs, Chandler Carruth did a nice talk on the subject at CppCon 2016.)

Among these data types is the llvm::SmallVector type, an alternative to std::vector that uses in-place storage if the array contains at most N elements, and heap storage otherwise. Interestingly, llvm::SmallVector has a specialization for when T is known to be trivially copyable, which allows fewer and faster data movements when pruning, copying, or moving data from one container to another. The specialization basically looks like this:

template<class T>
class SmallVectorBase {
   ... ;
};
template<class T>
class SmallVector : public SmallVectorBase<T> {
   ... ;
};
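To make the idea of a trait-driven specialization more concrete, here is a minimal sketch. It is not the LLVM implementation: the names CopyHelper, Plain, and Counted are hypothetical, and the sketch assumes a toolchain where std::is_trivially_copyable is available. A defaulted bool template parameter is computed from the trait, and the `true` specialization replaces element-wise copies with a single memcpy.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Generic path (hypothetical helper): copy elements one by one through
// the copy assignment operator of T.
template <class T, bool IsTriviallyCopyable = std::is_trivially_copyable<T>::value>
struct CopyHelper {
    static constexpr bool uses_memcpy = false;
    static void copy(const T *first, const T *last, T *dest) {
        for (; first != last; ++first, ++dest)
            *dest = *first;
    }
};

// Specialization selected when the trait evaluates to true: the bytes can be
// copied in bulk because T is trivially copyable.
template <class T>
struct CopyHelper<T, true> {
    static constexpr bool uses_memcpy = true;
    static void copy(const T *first, const T *last, T *dest) {
        if (first != last)
            std::memcpy(dest, first,
                        static_cast<std::size_t>(last - first) * sizeof(T));
    }
};

// Two example element types (hypothetical).
struct Plain { int Value; };
struct Counted {
    int Value;
    Counted(const Counted &Other) : Value(Other.Value) {}
    Counted &operator=(const Counted &Other) { Value = Other.Value; return *this; }
};
```

With these definitions, CopyHelper<Plain> resolves to the memcpy specialization while CopyHelper<Counted> keeps the generic loop. Note that the value of the bool parameter is part of the type, which matters for the rest of this article.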

Unfortunately, std::is_trivially_copyable is not supported by older versions of GCC, so the LLVM code base used to provide its own version in this (simplified) form:

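template<class T>
struct is_trivially_copyable {
    static constexpr bool value =
    #if defined(__GNUC__) && __GNUC__ >= 5
        // GCC 5 and later: defer to the standard trait.
        std::is_trivially_copyable<T>::value
    #else
        // Older compilers: conservatively treat every class type as non-trivial.
        !std::is_class<T>::value
    #endif
    ;
};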

There’s an inherent problem in that implementation, and it’s not a validity issue. Consider the following compilation units:

// lib.cpp
#include <llvm/ADT/SmallVector.h>
struct DataType {
    struct SomeRandomType { int Value;};
    llvm::SmallVector<SomeRandomType> Data;
};
    
DataType Global;

// user.cpp
#include <llvm/ADT/SmallVector.h>
struct DataType {
    struct SomeRandomType { int Value;};
    llvm::SmallVector<SomeRandomType> Data;
};
   
extern DataType Global;
DataType Local;

What happens if lib.cpp gets compiled with GCC 4.9 and user.cpp gets compiled with GCC 5.1? A quick look at the symbol table (e.g., through nm -C) shows that lib.o defines the symbol llvm::SmallVectorTemplateBase<DataType::SomeRandomType, false>::SmallVectorTemplateBase(unsigned long), whereas user.o defines the symbol llvm::SmallVectorTemplateBase<DataType::SomeRandomType, true>::SmallVectorTemplateBase(unsigned long). This mismatch can lead to various errors, from types with the same name but different layouts to link errors.
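The disagreement between the two compilers can be reproduced with a single modern compiler by evaluating both branches of the trait side by side. The sketch below assumes std::is_trivially_copyable is available; the names fallback_says, standard_says, and Base are hypothetical and only stand in for the real class hierarchy.

```cpp
#include <type_traits>

struct SomeRandomType { int Value; };

// Value produced by the fallback branch used for older GCC versions.
constexpr bool fallback_says = !std::is_class<SomeRandomType>::value;

// Value produced by the branch that defers to the standard trait.
constexpr bool standard_says = std::is_trivially_copyable<SomeRandomType>::value;

// Hypothetical base class parameterized by the trait value, as a stand-in
// for a container base that specializes on trivial copyability.
template <class T, bool Trivial>
struct Base {};

// The two branches name two distinct types, hence two distinct sets of symbols.
constexpr bool same_base =
    std::is_same<Base<SomeRandomType, fallback_says>,
                 Base<SomeRandomType, standard_says>>::value;
```

Here fallback_says is false because SomeRandomType is a class type, standard_says is true, and same_base is therefore false: each translation unit would instantiate a different base class for the same source-level type.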

This type of scenario may happen with binary distributions, where the compiler used to compile system libraries and the compiler used to compile user code may differ, and it’s one possible instance of an ABI error. Avoiding such errors is one of the software packager’s tasks.

Workarounds

As a workaround, llvm::is_trivially_copyable is specialized for various types to enforce the expected property, even if the trait implementation says the opposite. This is tedious to maintain, and error-prone (e.g., as of C++11, a pair of int is not trivially copyable; see https://godbolt.org/z/184QEc).
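The per-type workaround can be sketched as follows. This is an illustration only: the sketch namespace and the Point and Other types are hypothetical, and the primary template simply mirrors the conservative fallback branch shown earlier.

```cpp
#include <type_traits>

namespace sketch {
// Conservative primary template: class types are never reported as
// trivially copyable, whatever their actual properties.
template <class T>
struct is_trivially_copyable
    : std::integral_constant<bool, !std::is_class<T>::value> {};
} // namespace sketch

// Hypothetical user types.
struct Point { int X; int Y; };
struct Other { int Z; };

namespace sketch {
// Manual override: someone has to remember to add one of these for every
// type that should take the fast path.
template <>
struct is_trivially_copyable<Point> : std::true_type {};
} // namespace sketch
```

Point now reports true, while Other, which is just as trivially copyable in practice, still reports false because nobody wrote a specialization for it. That gap is what makes the approach tedious and error-prone.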

What is the solution then? Avoid the per-compiler-version implementation of llvm::is_trivially_copyable and provide a generic one. This is not an easy task, as the trait is generally implemented as a compiler built-in (namely, __is_trivially_copyable for clang). Fortunately, there is a way out, but to understand it, we need to understand what it means to be trivially copyable. A trivially copyable type satisfies the following properties:

  • Every copy constructor is trivial or deleted
  • Every move constructor is trivial or deleted
  • Every copy assignment operator is trivial or deleted
  • Every move assignment operator is trivial or deleted
  • The destructor is trivial and not deleted

Starting with C++17, there’s also the requirement that at least one copy/move constructor or copy/move assignment operator must be non-deleted, but the LLVM code base is not affected by this (yet).
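A few small types make these properties tangible. The sketch below uses the standard trait, so it assumes a toolchain that provides std::is_trivially_copyable; all type names are hypothetical.

```cpp
#include <type_traits>

// Every special member is implicit and trivial.
struct Plain { int Value; };

// User-provided copy constructor: no longer trivial.
struct UserCopy {
    UserCopy(const UserCopy &) {}
};

// User-provided copy assignment operator: no longer trivial.
struct UserAssign {
    UserAssign &operator=(const UserAssign &) { return *this; }
};

// User-provided destructor: no longer trivial.
struct UserDtor {
    ~UserDtor() {}
};
```

Only Plain is trivially copyable; each of the other three types breaks exactly one item of the list above.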

Checking whether a constructor or assignment operator has been deleted can typically be achieved through the “substitution failure is not an error” (SFINAE) idiom, as in:

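#include <type_traits>
#include <utility>

template <class T>
struct is_copy_assignable {
    // Selected only if assigning a const F& to an F& is well-formed.
    template <class F>
    static auto get(F*) -> decltype(std::declval<F &>() = std::declval<const F &>(), std::true_type{});
    // Fallback picked when the overload above fails to substitute.
    static std::false_type get(...);
    static constexpr bool value = decltype(get((T*)nullptr))::value;
};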
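The same pattern extends to move assignment, which the final trait also needs to check. The following is a sketch under that assumption; the is_move_assignable_sketch name and the two example types are hypothetical.

```cpp
#include <type_traits>
#include <utility>

template <class T>
struct is_move_assignable_sketch {
    // Selected only if assigning an rvalue F to an F& is well-formed.
    template <class F>
    static auto get(F *)
        -> decltype(std::declval<F &>() = std::declval<F &&>(), std::true_type{});
    // Fallback picked when the overload above fails to substitute.
    static std::false_type get(...);
    static constexpr bool value = decltype(get((T *)nullptr))::value;
};

// Implicit, trivial move assignment.
struct Movable { int Value; };

// Explicitly deleted move assignment.
struct NoMoveAssign {
    NoMoveAssign &operator=(NoMoveAssign &&) = delete;
};
```

For Movable the first overload is viable and the trait yields true. For NoMoveAssign the assignment expression selects a deleted function, substitution fails, and the variadic fallback yields false.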

Checking whether the implementation is the trivial, compiler-generated one is slightly trickier. Fortunately, starting with C++11:

If a union contains a non-static data member with a non-trivial special
member function (copy/move constructor, copy/move assignment, or
destructor), that function is deleted by default in the union and needs to
be defined explicitly by the programmer.

This means that instantiating is_copy_assignable for the following type:

template <class T>
union trivial_helper {
    T t;
};

tells us whether the associated type is trivially copy assignable: if the union is still copy assignable, the copy assignment operator of T must be trivial.
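Here is a small sketch of the union trick in action. To keep it short, it swaps in std::is_copy_assignable from the standard library as the detector instead of a hand-written one; has_trivial_copy_assign, Plain, and UserAssign are hypothetical names.

```cpp
#include <type_traits>

// The union's copy assignment is implicitly deleted as soon as the copy
// assignment of T is non-trivial.
template <class T>
union trivial_helper {
    T t;
};

// Hypothetical wrapper: true only if the union is still copy assignable.
template <class T>
struct has_trivial_copy_assign
    : std::is_copy_assignable<trivial_helper<T>> {};

// Implicit, trivial copy assignment.
struct Plain { int Value; };

// User-provided, hence non-trivial, copy assignment.
struct UserAssign {
    UserAssign &operator=(const UserAssign &) { return *this; }
};
```

UserAssign is copy assignable on its own, but wrapping it in the union deletes the union's copy assignment, so has_trivial_copy_assign reports false. Plain passes both checks.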

Once this tooling is set up, the trivially copyable trait can be summed up as:

static constexpr bool value =
  has_trivial_destructor<T> &&
  (has_deleted_move_assign<T> || has_trivial_move_assign<T>) &&
  (has_deleted_move_constructor<T> || has_trivial_move_constructor<T>) &&
  (has_deleted_copy_assign<T> || has_trivial_copy_assign<T>) &&
  (has_deleted_copy_constructor<T> || has_trivial_copy_constructor<T>);
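Putting the pieces together, a complete trait might look like the sketch below. It is an approximation rather than the exact LLVM code: it lives in a hypothetical sketch namespace, relies on the standard is_*_constructible and is_*_assignable traits as detectors, treats "not constructible/assignable" as "deleted", and only handles object types (no references or void).

```cpp
#include <type_traits>

namespace sketch {

// Each special member of the union is deleted whenever the matching
// special member of T is non-trivial.
template <class T>
union trivial_helper {
    T t;
};

template <class T>
class is_trivially_copyable {
    using helper = trivial_helper<T>;

    static constexpr bool has_trivial_copy_constructor =
        std::is_copy_constructible<helper>::value;
    static constexpr bool has_deleted_copy_constructor =
        !std::is_copy_constructible<T>::value;

    static constexpr bool has_trivial_move_constructor =
        std::is_move_constructible<helper>::value;
    static constexpr bool has_deleted_move_constructor =
        !std::is_move_constructible<T>::value;

    static constexpr bool has_trivial_copy_assign =
        std::is_copy_assignable<helper>::value;
    static constexpr bool has_deleted_copy_assign =
        !std::is_copy_assignable<T>::value;

    static constexpr bool has_trivial_move_assign =
        std::is_move_assignable<helper>::value;
    static constexpr bool has_deleted_move_assign =
        !std::is_move_assignable<T>::value;

    static constexpr bool has_trivial_destructor =
        std::is_destructible<helper>::value;

public:
    static constexpr bool value =
        has_trivial_destructor &&
        (has_deleted_move_assign || has_trivial_move_assign) &&
        (has_deleted_move_constructor || has_trivial_move_constructor) &&
        (has_deleted_copy_assign || has_trivial_copy_assign) &&
        (has_deleted_copy_constructor || has_trivial_copy_constructor);
};

} // namespace sketch

// Example types (hypothetical).
struct Plain { int Value; };
struct UserCopy { UserCopy(const UserCopy &) {} };
struct UserDtor { ~UserDtor() {} };
```

None of the detectors depends on a compiler-version check, so every compiler that implements the C++11 union rules should compute the same value for a given type.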

To verify the consistency of the implementation with respect to the std one, the following guarded static assertion can be added:

#ifdef HAVE_STD_IS_TRIVIALLY_COPYABLE
  static_assert(value == std::is_trivially_copyable<T>::value,
                "inconsistent behavior between llvm:: and std:: implementation of is_trivially_copyable");
#endif

In the end, we get a fix for the ABI instability of the llvm::SmallVector implementation, hurray! As a sad (or happy, depending on your perspective) note, LLVM is getting close to requiring GCC 5.1 or higher, which makes this whole exploration obsolete.
