As Python continues to play a central role in fields as diverse as web development, data science, and artificial intelligence and machine learning (AI/ML), performance optimizations in the language are critical to meeting the demands of modern applications and workloads. To address the growing needs of users and customers and to remove bottlenecks across the Red Hat Enterprise Linux (RHEL) ecosystem, Red Hat constantly analyzes data and tests new features, aiming to provide an enhanced experience while maintaining the stability guarantees that RHEL is known and trusted for.
New performance optimizations in RHEL 9.5 and more
With the release of RHEL 9.5, Python 3.9 (the default system Python version) and the alternative stacks of Python 3.11 and Python 3.12 were compiled with GCC's -O3 flag. This flag enables a higher optimization level during compilation, activating more aggressive optimizations aimed at improving performance, such as loop splitting and peeling, predictive commoning, and more, at the cost of a potential binary size increase.
At the same time, we have applied these performance optimizations to the Python stacks in Red Hat Enterprise Linux 8.10, as well as to CentOS Stream 10, from which Red Hat Enterprise Linux 10 will be derived.
Testing and benchmarking
Because Fedora and Red Hat Enterprise Linux use GCC's -O2 flag by default, we first tested the optimizations in Fedora, verifying the performance gains across the interpreter and various third-party modules with a minimal binary size increase of 1.16%, and then backported the changes to RHEL. This update brings our downstream distributions in line with upstream Python, which has been using the -O3 optimization flag for some time. Aligning with the upstream approach ensures that Fedora, RHEL and their derivatives take full advantage of the performance gains that have been available in the wider Python community.
Depending on the workload, this change can result in faster performance for both the Python interpreter and applications built on top of it. Python C extensions built using the updated Python interpreters will inherit the -O3 flag and benefit from the change as well.
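One way to check which optimization level a given interpreter was built with, and therefore which flags C extensions built against it are likely to inherit, is to read the build-time configuration through the standard sysconfig module. The following is a minimal sketch, not an official Red Hat tool: the helper name effective_opt_level is hypothetical, and the exact set of variables and their contents varies by platform and build.

```python
import sysconfig
from typing import Optional


def effective_opt_level(flags: str) -> Optional[str]:
    """Return the last -O<level> token in a compiler flag string, or None.

    Hypothetical helper for illustration. GCC honours the last -O option
    given on the command line, so the last match is the effective one.
    """
    level = None
    for token in flags.split():
        if token.startswith("-O"):
            level = token
    return level


# These variables are recorded when the interpreter is built; any of them
# may be missing (None) on some platforms, so fall back to an empty string.
for name in ("OPT", "CFLAGS", "PY_CFLAGS"):
    value = sysconfig.get_config_var(name) or ""
    print(f"{name}: effective optimization flag = {effective_opt_level(value)}")
```

Running the same script under each installed interpreter (for example python3.9, python3.11 and python3.12) gives a quick comparison of how each stack was compiled.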
In a testing environment, even a seemingly basic operation demonstrates the reduced overhead and faster execution times enabled by the new optimizations. Consider a simple loop measured with the timeit module, which runs the statement over many iterations and reports the best of several repetitions. For example:
python3.12 -m timeit 'for i in range(1000): j=i'
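The same measurement can be scripted through the timeit module's Python API, which makes it easier to repeat on several interpreters and collect the numbers. This is a minimal sketch: the function name best_usec_per_loop and the number and repeat values are arbitrary choices for illustration, and the result depends entirely on the machine and the interpreter build.

```python
import timeit


def best_usec_per_loop(stmt: str, number: int = 1000, repeat: int = 5) -> float:
    """Time `stmt` and return the best per-execution time in microseconds.

    timeit.repeat() runs `stmt` `number` times per measurement and takes
    `repeat` measurements; the minimum is the least noisy estimate.
    """
    timings = timeit.repeat(stmt=stmt, number=number, repeat=repeat)
    return min(timings) / number * 1e6


# Same statement as the command-line example above.
result = best_usec_per_loop("for i in range(1000): j = i")
print(f"best of 5: {result:.2f} usec per loop")
```

Taking the minimum rather than the mean follows the convention of the timeit command line, since slower runs usually reflect interference from other processes rather than the code being measured.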
However, the true impact of the changes becomes clearer when running more comprehensive benchmarking tools such as pyperformance, the Phoronix Test Suite and the PyTorch benchmarks, which show consistent improvements across a wide range of tests.
Enhancing AI/ML workloads
Overall, the adoption of the -O3 flag in Red Hat Enterprise Linux is set to deliver faster Python applications throughout the entire RHEL ecosystem. This will not only enhance the performance of general-purpose Python applications but also provide a significant boost for AI/ML workloads. Python is the dominant language for machine learning and artificial intelligence, and popular frameworks such as TensorFlow, PyTorch, and InstructLab rely heavily on both Python code and Python C extensions. These workloads, which often involve heavy computation and large datasets, stand to benefit particularly from the optimizations, which can lead to faster training times, quicker inference and more efficient resource usage.
Red Hat’s ongoing commitment to quality, stability, and performance
As the Python Maintenance team at Red Hat, we continuously test and investigate new features and potential performance improvements, and resolve bugs and security issues across the Python landscape, while collaborating with upstream projects to ensure that the latest innovations are available across the Fedora, RHEL and CentOS Stream ecosystems.
We encourage Python developers to begin testing their applications with these new optimizations to experience the benefits firsthand. Stay tuned for future updates as we continue to improve the Python stack and optimize performance for a wide range of workloads.