Python wheels, AI/ML, and ABI compatibility
Python has become a popular programming language in the AI/ML world. Projects like TensorFlow and PyTorch have Python bindings as the primary interface used by data scientists to write machine learning code. However, distributing AI/ML-related Python packages and ensuring application binary interface (ABI) compatibility between various Python packages and system libraries presents a unique set of challenges.
The manylinux standard (e.g., manylinux2014) for Python wheels provides a practical solution to these challenges, but it also introduces new challenges that the Python community and developers need to consider. Before we delve into these additional challenges, we’ll briefly look at the Python ecosystem for packaging and distribution.
Wheels, AI/ML, and ABIs
Python packages are installed using the
pip command, which downloads the package from pypi.org.
pip install <package-name>
These packages can be of two types:
- Pure Python wheels, which may or may not be targeted to a specific Python version
- Extension wheels, which use native code written in C/C++
All AI/ML Python packages are extension wheels that use native operating system libraries. Compiled Python extension modules built on one distribution may not work on other distributions, or even on different machines running the same distribution with different system libraries installed. This is because the compiled binaries have a record of the ABI they rely on, such as relocations, symbols and versions, size of global data symbols, etc. At runtime, if the ABI does not match, the loader may raise an error. An example of a missing symbol with a version would look like this:
/lib64/libfoo.so.1: version `FOO_1.2' not found (required by ./app)
AI/ML project maintainers need to build different Python packages for Windows, macOS X, and Linux distributions. The precompiled binaries are packaged in a wheel format with the
.whl file extension. A wheel is a zip file that can be interpreted as a Python library.
The file name contains specific tags, which are used by the
pip command to determine the Python version and operating system that match the system on which the AI/ML library is installed. The wheel also contains the layout of a Python project as it should be installed on the system. To avoid the need for users to compile these packages the project maintainers build and upload platform-specific wheels for Windows, macOS, and Linux on pypi.org.
Here are some examples of wheels for Linux and non-Linux distros:
tensorflow-2.0.0-cp27-cp27m-macosx_10_11_x86_64.whl tensorflow-2.0.0-cp35-cp35m-win_amd64.whl tensorflow-2.0.0-cp36-cp36m-manylinux1_x86_64.whl tensorflow-2.0.0-cp37-cp37m-manylinux2010_x86_64.whl
AI/ML project maintainers who want to distribute the Python library with native code for Linux distros have the difficult task of ensuring ABI compatibility. The compiled code needs to run on a wide variety of Linux distributions.
Fortunately, there is a way to make a binary compatible with most (though not all) Linux distributions. To do this, you need to build a binary and use an ABI baseline that is older than any distribution you want to support. The expectation is that the newer distributions will keep the ABI guarantees; that way, you’ll be able to run your binary on newer distributions as long as they provide the ABI baseline. Eventually, the ABI baselines will change in an incompatible way, and that may be a technical requirement for moving the baseline forward. There are other non-technical requirements for moving the ABI baseline forward, and they revolve around distribution lifecycles.
The manylinux platform tag is a way to make your Python libraries that are compatible with most Linux distributions. Python’s manylinux defines an ABI baseline and targets the baseline by building on an old version of a distribution. To achieve maximum compatibility, it uses the longest-supported freely distributable version of Linux: CentOS.
The first manylinux platform tag called manylinux1 uses CentOS 5. The second iteration called manylinux2010 uses CentOS 6. The latest specification manylinux2014 is a result of Red Hat, other vendors, and the Python community moving the manylinux specification ahead to use CentOS 7/UBI 7 and support more architectures.
To make the life of AI/ML Python project maintainers easier, the Python community provides a prebuilt manylinux build container, which can be used to build project wheels, listed here:
centos5 Image - quay.io/pypa/manylinux1_x86_64 centos6 Image - quay.io/pypa/manylinux2010_x86_64 ubi7 Image - quay.io/pypa/manylinux2014_x86_64(coming soon)
For AI/ML Python project users the
pip command is very important. The
pip command will install the appropriate wheel file based on wheel tags and also based on the manylinux platform tag of the wheel which matches the system. For example, a manylinux2014 wheel will not install on Red Hat Enterprise Linux (RHEL) 6 because it doesn’t have the system library versions specified in the manylinux2014 specification. Pip will install manylinux2010 wheels on RHEL 6 and manylinux2014 wheels on RHEL 7.
AI/ML Python project users have to ensure they update
pip command regularly before they update to the next AI/ML Python project version. If users are using containers, then the latest
pip command should be available in the container.
Although the manylinux standard has helped deliver reliable and stable extension wheels, it does introduce two additional challenges:
At some point, the reference platforms for the ABI baselines will have end-of-life. The Python community must actively track the end-of-life support and CVEs for different system libraries used by the project and potentially move project maintainers to the next available manylinux platform tag. Note: The EOL for CentOS 6 is November 30, 2020. The EOL for CentOS 7 is June 30, 2024.
Lastly, project maintainers should ensure that they build wheels for all the manylinux platform tags or at least the wheels of the most recent specifications. This will give users the most options for installation.
- Hardware vendor support
Almost all AI/ML Python projects have some form of hardware accelerator support, such as CUDA (NVIDIA), ROCm (AMD), Intel MKL. The hardware vendors might not support all versions of the toolchain and project maintainers should pick a baseline toolchain (gcc, binutils, glibc) and set their wheels to a certain manylinux platform tags that match. Some projects might need to support a variety of architectures including Intel/AMD (i686, x86_64), Arm (aarch64, armhfp), IBM POWER (ppc64, ppc64le), or IBM Z Series (s390x). Regression tests on different architectures are essential to catch compatibility issues. See the Red Hat Enterprise Linux ABI compatibility guides for RHEL 7 and RHEL 8.
The Python community must follow the lifecycle of the reference software that is used to target the ABI baselines and plan accordingly. Python developers must carefully match system tooling or developer tooling to the hardware vendor software requirements. Solving both of these is a difficult but ultimately rewarding challenge.