Red Hat Enterprise Linux 8.2 brings faster Python 3.8 run speeds

June 25, 2020
Tomas Orsava, Victor Stinner, Petr Viktorin
Related topics:
C, C#, C++, Linux, Python
Related products:
Red Hat Enterprise Linux

    The Python interpreter shipped with Red Hat Enterprise Linux (RHEL) 8 is version 3.6, which was released in 2016. While Red Hat is committed to supporting the Python 3.6 interpreter for the lifetime of Red Hat Enterprise Linux 8, it is becoming a bit old for some use cases.

    For developers who need the new Python features—and who can live with the inevitable compatibility-breaking changes—Red Hat Enterprise Linux 8.2 also includes Python 3.8. Besides providing new features, packaging Python 3.8 with RHEL 8.2 allows us to release performance and packaging improvements more quickly than we could in the rock-solid python3 module.

    This article focuses on one specific performance improvement in the python38 package. As we'll explain, Python 3.8 is built with the GNU Compiler Collection (GCC)'s -fno-semantic-interposition flag. Enabling this flag disables semantic interposition, which can increase run speed by as much as 30%.

    Note: The python38 package joins other Python interpreters shipped in RHEL 8.2, including the python2 and python3 packages (which we described in a previous article, Python in RHEL 8). You can install Python 3.8 alongside the other Python interpreters so that it won't interfere with the existing Python stack.

    Where have I seen this before?

    Writing this article feels like taking credit for others' achievements. So, let us set this straight: The performance improvements we're discussing are others' achievements. As RHEL packagers, our role is similar to that of a gallery curator, rather than a painter: It is not our job to create features, but to seek out the best ones from the upstream Python project and combine them into a pleasing experience for developers after they've gone through review, integration, and testing in Fedora.

    Note that we do have "painter" roles on the team. But just as fresh paint does not belong in an exhibition hall, original contributions go to the broader community first and only appear in RHEL when they're well-tested (that is, somewhat boring and obvious).

    The discussions leading to the change we describe in this article include an initial naïve proposal by Red Hat's Python maintainers, a critique, a better idea by C expert Jan Kratochvil, and refining that idea. All of this back-and-forth happened openly on the Fedora development mailing list, with input from both Red Hatters and the wider community.

    Disabling semantic interposition in Python 3.8

    As we've mentioned, the most significant performance improvement in our RHEL 8.2 python38 package comes from building with GCC's -fno-semantic-interposition flag enabled. It increases run speed by as much as 30%, with little change to the semantics.

    How is that possible? There are a few layers to it, so let us explain.

    Python's C API

    All of Python's functionality is exposed in its extensive C API. A large part of Python's success comes from the C API, which makes it possible to extend and embed Python. Extensions are modules written in a language like C, which can provide functionality to Python programs. A classic example is NumPy, a library written in languages like C and Fortran that manipulates Python objects. Embedding means using Python from within a larger application. Applications like Blender or GIMP embed Python to allow scripting.

    Python (or more correctly, CPython, the reference implementation of the Python language) uses the C API internally: Every attribute access goes through a call to the PyObject_GetAttr function, every addition is a call to PyNumber_Add, and so on.

    Python's dynamic library

    Python can be built in two modes: static, where all code lives in the Python executable, or shared, where the Python executable is linked to its dynamic library called libpython. In Red Hat Enterprise Linux, Python is built in shared mode, because applications that embed Python, like Blender, use the Python C API of libpython.

    The python3.8 command is a minimalist example of embedding: It only calls the Py_BytesMain() function:

    #include <Python.h>

    /* Hand the whole command line to libpython, which does all the work. */
    int
    main(int argc, char **argv)
    {
        return Py_BytesMain(argc, argv);
    }
    

    All the code lives in libpython. For example, on RHEL 8.2, the size of /usr/bin/python3.8 is just around 8 KiB, whereas the size of the /usr/lib64/libpython3.8.so.1.0 library is around 3.6 MiB.

    Semantic interposition

    When executing a program, the dynamic loader allows you to override any symbol (such as a function) of the dynamic libraries that will be used in the program. You implement the override by setting the LD_PRELOAD environment variable to a shared library that defines the replacement symbol. This technique is called ELF symbol interposition, and it's enabled by default in GCC.

    Note: In Clang, semantic interposition is disabled by default.

    This feature is commonly used, among other things, to trace memory allocation (by overriding the libc malloc and free functions) or to change a single application's clocks (by overriding the libc time function). Semantic interposition is implemented using a procedure linkage table (PLT). Any function that can be overridden with LD_PRELOAD is looked up in a table before it is called.
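
    As a rough illustration of the clock-override case mentioned above, here is a minimal sketch of an interposer library. The file name fake_time.c and the frozen timestamp are assumptions made for this example, not something taken from the Python packaging work.

```c
/* fake_time.c: hypothetical LD_PRELOAD interposer (illustrative only). */
#include <time.h>

/* Arbitrary fixed timestamp chosen for the example: 2020-06-25 00:00:00 UTC. */
#define FROZEN_TIME ((time_t)1593043200)

/*
 * Same name and signature as the libc function. When this library is
 * preloaded, the dynamic loader resolves calls to time() that go through
 * the PLT to this definition instead of the one in libc.
 */
time_t
time(time_t *tloc)
{
    if (tloc != NULL) {
        *tloc = FROZEN_TIME;
    }
    return FROZEN_TIME;
}
```

    Assuming the file is saved as fake_time.c, it could be built with gcc -shared -fPIC fake_time.c -o libfake_time.so and activated by setting LD_PRELOAD=./libfake_time.so when starting a program. Only dynamically linked programs that actually call time() are affected; a program that reads the clock through another function, such as clock_gettime(), would need that function overridden instead.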

    Python calls libpython functions from other libpython functions. To respect semantic interposition, all of these calls must be looked up in the PLT. While this activity does introduce some overhead, the slowdown is negligible compared to the time spent in the called functions.

    Note: Python uses the tracemalloc module to trace memory allocations.

    LTO and function inlining

    In recent years, GCC has enhanced link-time optimization (LTO) to produce even more efficient code. One common optimization is to inline function calls, which means replacing a function call with a copy of the function's code. Once a function call is inlined, the compiler can go even further in terms of optimizations.

    However, it is not possible to inline functions that are looked up in the PLT. If the function can be swapped out entirely using LD_PRELOAD, the compiler cannot apply assumptions and optimizations based on what that function does.

    GCC 5.3 introduced the -fno-semantic-interposition flag, which disables semantic interposition. With this flag, functions in libpython that call other libpython functions don't have to go through the PLT indirection anymore. As a result, they can be inlined and optimized with LTO.
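
    The effect of the flag can be observed on a much smaller library than libpython. The sketch below is a hypothetical two-function library (the file name libdemo.c and both function names are invented for this example) in which one exported function calls another, mirroring the way libpython functions call each other.

```c
/* libdemo.c: hypothetical shared library with one internal call. */

/* Exported helper: non-zero codes count as errors. */
int
demo_is_error(int code)
{
    return code != 0;
}

/*
 * Exported caller. In a default -fPIC shared-library build, GCC keeps
 * the call to demo_is_error() as a real call through the PLT, because
 * another library could interpose that symbol at load time.
 */
int
demo_check(int code)
{
    return demo_is_error(code) ? -1 : 0;
}
```

    One way to compare the two builds is to compile with gcc -O2 -fPIC -shared libdemo.c -o libdemo.so, then again with -fno-semantic-interposition added, and inspect demo_check in the output of objdump -d libdemo.so each time. The first build should show a call to demo_is_error@plt; in the second, GCC typically inlines the helper so that no call remains. Because both functions live in one source file, LTO is not needed for this small demonstration.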

    So, that's what we did. We enabled the -fno-semantic-interposition flag in Python 3.8.

    Drawbacks of -fno-semantic-interposition

    The main drawback of building Python with -fno-semantic-interposition enabled is that we can no longer override libpython functions using LD_PRELOAD. However, the impact is limited to libpython. It is still possible, for example, to override malloc/free from libc to trace memory allocations.

    Even so, this is still an incompatibility: We do not know if developers are using LD_PRELOAD with Python on RHEL 8 in a way that would break with -fno-semantic-interposition. That is why we only enabled the change in the new Python 3.8, while Python 3.6, the default python3, continues to work as before.

    Performance comparison

    To see the -fno-semantic-interposition optimization in practice, let's take a look at the _Py_CheckFunctionResult() function. This function is used by Python to check whether a C function either returned a result (is not NULL) or raised an exception.

    Here is the simplified C code:

    PyObject*
    PyErr_Occurred(void)
    {
        PyThreadState *tstate = _PyRuntime.gilstate.tstate_current;
        return tstate->curexc_type;
    }
    
    PyObject*
    _Py_CheckFunctionResult(PyObject *callable, PyObject *result,
                            const char *where)
    {
        int err_occurred = (PyErr_Occurred() != NULL);
        ...
    }
    

    Assembly code with semantic interposition enabled

    Let's first take a look at Python 3.6 in Red Hat Enterprise Linux 7, which has not been built with -fno-semantic-interposition. Here is an extract of the assembly code (read by GDB's disassemble command):

    Dump of assembler code for function _Py_CheckFunctionResult:
    (...)
    callq  0x7ffff7913d50 <PyErr_Occurred@plt>
    (...)
    

    As you can see, _Py_CheckFunctionResult() calls PyErr_Occurred(), and the call has to go through a PLT indirection.

    Assembly code with semantic interposition disabled

    Now let's look at an extract of the same assembly code after disabling semantic interposition:

    Dump of assembler code for function _Py_CheckFunctionResult:
    (...)
    mov 0x40f7fe(%rip),%rcx # rcx = &_PyRuntime
    mov 0x558(%rcx),%rsi    # rsi = tstate = _PyRuntime.gilstate.tstate_current
    (...)
    mov 0x58(%rsi),%rdi     # rdi = tstate->curexc_type
    (...)
    

    In this case, GCC inlined the PyErr_Occurred() function call. As a result, _Py_CheckFunctionResult() gets the tstate directly from _PyRuntime, and then it directly reads its member tstate->curexc_type. There is no function call and no PLT indirection, which makes the code faster.

    Note: In more complex situations, the GCC compiler is free to optimize the inlined function even more, according to the context in which it is called.
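
    To make the transformation easier to picture at the source level, here is a simplified, self-contained model in C. All struct, variable, and function names (demo_tstate, demo_rt, demo_err_occurred, and so on) are hypothetical stand-ins rather than CPython's real definitions, and the second caller is written by hand to show roughly what the optimizer is allowed to produce once the call is no longer interposable.

```c
#include <stddef.h>

/* Hypothetical stand-ins for CPython's thread state and runtime structs. */
struct demo_tstate {
    void *curexc_type;              /* non-NULL while an exception is set */
};

struct demo_runtime {
    struct demo_tstate *tstate_current;
};

static struct demo_tstate demo_main_tstate = { NULL };
struct demo_runtime demo_rt = { &demo_main_tstate };

/* Exported helper, analogous in shape to PyErr_Occurred(). */
void *
demo_err_occurred(void)
{
    return demo_rt.tstate_current->curexc_type;
}

/* Caller that goes through the helper: one function call per check. */
int
demo_check_with_call(void)
{
    return demo_err_occurred() != NULL;
}

/* Hand-inlined equivalent: two loads and a comparison, no call at all. */
int
demo_check_inlined(void)
{
    return demo_rt.tstate_current->curexc_type != NULL;
}
```

    Both callers return the same result. The difference is only in how much work the processor does to get there, which is the saving the assembly extracts above illustrate.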

    Try it for yourself!

    In this article, we focused on one specific improvement on the performance side, leaving new features to the upstream documents What's New In Python 3.7 and What's New In Python 3.8. If you are intrigued by the new compiler performance possibilities in Python 3.8, grab the python38 package from the Red Hat Enterprise Linux 8 repository and try it out. We hope you will enjoy the run speed-up, as well as a host of other new features that you will discover for yourself.

    Last updated: February 5, 2024
