In Red Hat Enterprise Linux (RHEL) 9, we upgraded the instruction set architecture (ISA) baseline to the x86-64-v2 microarchitecture level. For RHEL 10, we are exploring whether we can go a step further, to the x86-64-v3 level.
New CPU capabilities in x86-64-v3
The x86-64-v3 microarchitecture level primarily benefits numerical applications (for example, in data science) that do not include specialized implementations for modern CPU microarchitectures. Overall, the move to x86-64-v3 brings the following improvements:
- The AVX and AVX2 instruction sets increase the width of the vector registers from 128 bits to 256 bits and add many new vector operations.
- Fused multiply-add (FMA) instructions are supported. They combine multiplication and addition into a single operation that rounds only once, computing the intermediate product as if with infinite precision.
- The VEX encoding adds variants of many existing instructions that take a separate destination operand (as a second or third operand). With these variants, input operands are no longer overwritten. This reduces the need for explicit move instructions, which in turn increases code density and decreases instruction cache pressure.
- Many bit manipulation operations have been added for scalar registers, including parallel bit deposit and bit extract.
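To make the last bullet concrete, here is a minimal portable sketch of what parallel bit extract and parallel bit deposit compute. The function names `pext_ref` and `pdep_ref` are hypothetical and chosen for illustration only. On a CPU with BMI2, the same results are available from single instructions through the `_pext_u64` and `_pdep_u64` intrinsics in `<immintrin.h>`.

```c
#include <stdint.h>

/* Reference for parallel bit extract: gather the bits of src that are
 * selected by mask and pack them into the low-order bits of the result.
 * pext_ref is a hypothetical name used only in this sketch. */
uint64_t pext_ref(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* lowest set bit of mask */
        if (src & lowest)
            result |= out_bit;
        mask &= mask - 1;                     /* clear that bit */
    }
    return result;
}

/* Reference for parallel bit deposit: scatter the low-order bits of src
 * to the bit positions that are set in mask.
 * pdep_ref is a hypothetical name used only in this sketch. */
uint64_t pdep_ref(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t in_bit = 1; mask != 0; in_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* lowest set bit of mask */
        if (src & in_bit)
            result |= lowest;
        mask &= mask - 1;                     /* clear that bit */
    }
    return result;
}
```

For example, extracting the low nibble of each byte from 0xABCD with the mask 0x0F0F yields 0xBD, and depositing 0xBD under the same mask yields 0x0B0D. The loops run once per set mask bit, which is what the hardware instructions replace with a single operation.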
The vector instruction enhancements look particularly promising because, starting with version 12, GCC enables auto-vectorization at -O2 (the default optimization level for RHEL).
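As an illustration, the following loop is the kind of plain scalar code that an auto-vectorizer can target. The function name `saxpy` and its signature are assumptions made for this sketch. Whether GCC actually vectorizes a given loop at -O2 depends on its cost model, so treat this as a candidate rather than a guarantee. Building with `-O2 -march=x86-64-v3 -fopt-info-vec` asks GCC to report which loops it vectorized.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * The restrict qualifiers promise the compiler that x and y do not
 * overlap, which removes one common obstacle to vectorization.
 * With an x86-64-v3 baseline, the compiler is free to use 256-bit
 * vectors and fused multiply-add here; it is not required to. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

No intrinsics or dispatching logic appear in the source. Any speedup comes entirely from the baseline the compiler is allowed to assume.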
The x86-64-v3 level was first implemented in Intel’s Haswell CPU generation (2013). AMD implemented x86-64-v3 support with the Excavator microarchitecture (2015). Intel’s Atom product line added x86-64-v3 support with the Gracemont microarchitecture (2021), but Intel has continued to release Atom CPUs without AVX support after that (Parker Ridge in 2022, and an Elkhart Lake variant in 2023).
While some ISA enhancements can be used in isolated code blocks using manually crafted run-time dispatching logic, these additional ISA capabilities impact instruction selection across the entire program (including scalar code), in many cases without explicit use of compiler built-in functions. This means that run-time selection between code that uses these ISA enhancements and code that does not is not feasible, and adopting x86-64-v3 will exclude some systems from being able to run RHEL 10, just as the choice of x86-64-v2 for RHEL 9 excluded some systems.
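For comparison, this is roughly what manually crafted run-time dispatching for an isolated code block looks like. It is a minimal sketch assuming GCC or Clang on x86-64. The names `sum_scalar`, `sum_avx2`, and `select_sum` are hypothetical. Only the one function compiled with the `target("avx2")` attribute may use AVX2. The rest of the program is still built for the lower baseline, which is the limitation described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline version: compiled for whatever ISA level the build targets. */
int64_t sum_scalar(const int32_t *v, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

#if defined(__GNUC__) && defined(__x86_64__)
/* Same source, but the compiler may use AVX2 inside this function only. */
__attribute__((target("avx2")))
int64_t sum_avx2(const int32_t *v, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
#endif

typedef int64_t (*sum_fn)(const int32_t *, size_t);

/* Picks an implementation based on what the running CPU reports. */
sum_fn select_sum(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_scalar;
}
```

A caller would store the result of `select_sum()` once and call through the pointer afterwards. Every function that should benefit needs its own pair of variants and its own selection logic, which does not scale to instruction selection across an entire program.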
Our analysis suggests that the environments that are in scope for RHEL 10 are ready for x86-64-v3. Hypervisors may need reconfiguration to pass through x86-64-v3 CPU capabilities. Some hypervisors have more restrictive defaults, typically for enabling migration to different CPUs whose capabilities are not a perfect match for the current host. Such configuration changes were already required for the x86-64-v2 transition in RHEL 9. Reconfiguring hypervisors will also result in a performance improvement for previous releases of RHEL, where the function selection (e.g., for C string functions) is often not optimal for the host CPU due to masked CPU capabilities. We plan to improve diagnostics for these scenarios.
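A quick way to see whether a hypervisor is masking the relevant capabilities is to inspect the CPUID feature bits visible inside the guest. The following is a minimal sketch, not the diagnostics mentioned above. The function and macro names are hypothetical. It checks only the features that x86-64-v3 adds on top of x86-64-v2, and it does not verify through XGETBV that the operating system has enabled the AVX register state.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX */
#define CPUID1_ECX_FMA     (1u << 12)
#define CPUID1_ECX_MOVBE   (1u << 22)
#define CPUID1_ECX_OSXSAVE (1u << 27)
#define CPUID1_ECX_AVX     (1u << 28)
#define CPUID1_ECX_F16C    (1u << 29)
/* CPUID leaf 7, subleaf 0, EBX */
#define CPUID7_EBX_BMI1    (1u << 3)
#define CPUID7_EBX_AVX2    (1u << 5)
#define CPUID7_EBX_BMI2    (1u << 8)
/* CPUID leaf 0x80000001, ECX */
#define CPUIDX_ECX_LZCNT   (1u << 5)

/* Pure check on raw register values, so it can be tested with
 * synthetic inputs. Hypothetical helper for this sketch. */
bool cpuid_bits_meet_v3(uint32_t leaf1_ecx, uint32_t leaf7_ebx,
                        uint32_t ext1_ecx)
{
    const uint32_t need1 = CPUID1_ECX_FMA | CPUID1_ECX_MOVBE |
                           CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX |
                           CPUID1_ECX_F16C;
    const uint32_t need7 = CPUID7_EBX_BMI1 | CPUID7_EBX_AVX2 |
                           CPUID7_EBX_BMI2;
    const uint32_t needx = CPUIDX_ECX_LZCNT;

    return (leaf1_ecx & need1) == need1 &&
           (leaf7_ebx & need7) == need7 &&
           (ext1_ecx & needx) == needx;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Reads the bits from the CPU this code is running on (GCC or Clang). */
bool host_cpuid_meets_v3(void)
{
    unsigned int eax, ebx, ecx, edx;
    uint32_t l1 = 0, l7 = 0, lx = 0;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        l1 = ecx;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        l7 = ebx;
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        lx = ecx;
    return cpuid_bits_meet_v3(l1, l7, lx);
}
#endif
```

If `host_cpuid_meets_v3()` returns false inside a guest while the physical host is Haswell-class or newer, the hypervisor's CPU model is a likely cause.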
Verifying performance improvements
The CentOS ISA SIG has produced rebuilds of CentOS 9 with x86-64-v2 and x86-64-v3 baselines, after upgrading the system compiler to GCC 12. As mentioned above, GCC 11, the RHEL 9 system compiler, does not enable auto-vectorization at -O2, which is why we switched to GCC 12 for these rebuilds. For this experiment, GCC 12 is still reasonably close to GCC 11 in terms of bug-for-bug C++ compatibility, so that few packages needed fixing before they could be rebuilt. We hope that these builds can be used to show performance improvements for key packages and workloads.
Even if we cannot show performance improvements for software included in RHEL, it may still make sense to go ahead with the switch. The reason is that if RHEL 10 requires the x86-64-v3 baseline, independent software vendors (ISVs) will be able to rely on it, too. This reduces maintenance cost for some ISVs because they no longer need to maintain (and test) AVX and non-AVX code paths in their manually tuned software.
What about x86-64-v4?
We do not think that x86-64-v4 is useful for a general-purpose operating system today. Intel’s current generation of efficiency cores does not support x86-64-v4. The Tiger Lake generation of CPUs supported AVX-512 in the client segment, but earlier and later client generations lacked support. Furthermore, Intel has announced a variant of AVX-512 called AVX10 that has a reduced vector width of 256 bits and does not provide the 512-bit vectors required by x86-64-v4.
AMD offers x86-64-v4 across their Zen 4 CPUs, but the switch to that CPU generation will still be relatively recent when RHEL 10 is scheduled to come out. Given that we need to build a single operating system image for all supported x86-64 CPUs, we need to pick a lower baseline than x86-64-v4.
While our plan of record may change based on further findings, we are excited about the prospect that RHEL 10 will move to the x86-64-v3 baseline. You can check out your own software today by rebuilding it with -march=x86-64-v3 and testing it against the x86-64-v3 package builds from the CentOS ISA SIG. We welcome your feedback on the CentOS devel mailing list.
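When rebuilding, it can help to confirm that the baseline flag actually reached the compiler. The sketch below relies on the feature macros that GCC and Clang predefine when the corresponding instruction sets are enabled. The function name `built_for_v3_features` is hypothetical, and the macro list covers only the features that x86-64-v3 adds over x86-64-v2.

```c
/* Returns 1 if this translation unit was compiled with all of the
 * x86-64-v3 feature additions enabled (for example through
 * -march=x86-64-v3), and 0 otherwise.
 * Hypothetical helper for this sketch. */
int built_for_v3_features(void)
{
#if defined(__AVX__) && defined(__AVX2__) && defined(__BMI__) && \
    defined(__BMI2__) && defined(__F16C__) && defined(__FMA__) && \
    defined(__LZCNT__) && defined(__MOVBE__)
    return 1;
#else
    return 0;
#endif
}
```

This is a build-time property only. It says nothing about the CPU the binary later runs on, so a result of 1 means the program may fault on hardware below the x86-64-v3 level.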