Instruction-level Multithreading to improve processor utilization

May 4, 2016
William Cohen
Related products:
Developer Toolset

    No one wants the hardware in their computer sitting idle - we all want to get as much useful work out of our hardware as possible. Mechanisms such as caches and branch prediction have been incorporated into processors to minimize the amount of processor idle time caused by memory accesses and changes in program flow; however, these mechanisms are not perfect.

    There are still times when the processor could be idle waiting for data or computational results to become available. These delays are relatively short: generally less than a few hundred clock cycles, and typically around ten. A software context switch by the operating system to another runnable task takes on the order of hundreds of cycles. Thus, the overhead is too large for the operating system to switch to another runnable task to hide these short periods of idleness.

    One approach to getting better utilization is to have the physical processor support multiple logical processors. If one logical processor has to wait for some result, the physical processor can switch to processing instructions from other logical processors, keeping the hardware busy doing useful work.

    The hardware cost of supporting multiple logical processors is relatively modest. Supporting two logical processors on one physical core of the Pentium 4 was estimated to take 5% of the die space while providing up to a 30% improvement in performance.

    The idea of multiplexing the processor hardware between multiple logical processors has been around for quite some time. The 1960s CDC 6600 peripheral processor implemented ten logical processors sharing the same processor hardware to hide the latency of memory accesses. Thus, the CDC 6600 peripheral processor effectively got ten times the processing performance with a little added hardware to store the state information for each of the logical processors.

    The main advantage of multithreading is that it fills the previously idle delays due to operation latency with useful work. A program that has just a single thread of execution is not going to get any performance improvement from hardware that supports multithreading, and the individual threads are not any faster due to the multithreading. However, the aggregate throughput of the multiple threads is higher because the hardware is better utilized.
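
A rough way to see this is with a toy model, sketched below in Python. It assumes each thread alternates between a fixed number of compute cycles and a fixed number of stall cycles, and that the hardware overlaps one thread's stalls with another thread's work perfectly; real processors fall short of this. The function name and the cycle counts are illustrative assumptions, not measurements.

```python
def core_utilization(compute_cycles: int, stall_cycles: int, threads: int) -> float:
    """Idealized fraction of cycles the physical core spends on useful work.

    Assumption: every thread repeats `compute_cycles` of work followed by
    `stall_cycles` of waiting, and stalls overlap perfectly with the other
    threads' work. Utilization can never exceed 1.0.
    """
    if compute_cycles <= 0 or stall_cycles < 0 or threads < 1:
        raise ValueError("need compute_cycles > 0, stall_cycles >= 0, threads >= 1")
    single_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * single_thread)


if __name__ == "__main__":
    # Hypothetical workload: 30 cycles of work, then a 10-cycle stall.
    for n in (1, 2):
        print(n, "thread(s):", core_utilization(30, 10, n))
```

In this model, utilization rises from 0.75 with one thread to 1.0 with two, yet each thread still needs at least 40 cycles per iteration, so no individual thread gets faster.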

    Improved performance is not a given with instruction-level multithreading, since resources are shared between the logical processors. If one logical processor uses 100% of a resource, running additional threads on other logical processors will not improve utilization past 100%. One example of such a shared resource is the memory interface to RAM. If one thread can use all the memory interface bandwidth, adding more threads running on the other logical processors will simply split that bandwidth between them, not providing any improvement in aggregate throughput.
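
The saturation effect can be sketched with a small calculation. The Python below assumes every thread demands the same bandwidth and that a saturated memory interface is split evenly between threads; the function names and the 20 GB/s capacity are hypothetical values chosen for illustration.

```python
def aggregate_bandwidth(per_thread_demand_gbs: float, threads: int,
                        capacity_gbs: float) -> float:
    """Total bandwidth delivered, capped by the shared memory interface."""
    if per_thread_demand_gbs < 0 or threads < 1 or capacity_gbs <= 0:
        raise ValueError("need demand >= 0, threads >= 1, capacity > 0")
    return min(per_thread_demand_gbs * threads, capacity_gbs)


def per_thread_bandwidth(per_thread_demand_gbs: float, threads: int,
                         capacity_gbs: float) -> float:
    """Bandwidth each thread receives, assuming an even split."""
    total = aggregate_bandwidth(per_thread_demand_gbs, threads, capacity_gbs)
    return total / threads


if __name__ == "__main__":
    capacity = 20.0  # hypothetical memory interface capacity in GB/s
    for demand in (5.0, 20.0):
        for n in (1, 2):
            print(f"demand={demand} threads={n} "
                  f"aggregate={aggregate_bandwidth(demand, n, capacity)} "
                  f"per_thread={per_thread_bandwidth(demand, n, capacity)}")
```

With a per-thread demand of 20 GB/s, the second thread leaves the aggregate at 20 GB/s and halves each thread's share. With a demand of 5 GB/s, the aggregate doubles because the interface still has headroom.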

    In some cases, splitting the resources between multiple logical processors can hurt performance. Having additional logical processors may cause excessive cache misses because the combined working sets of the logical processors exceed the size of the cache. Thus, rather than getting a performance improvement with multiple logical processors, the physical processor might spend a greater portion of its time waiting for cache misses to be satisfied.

    Whether multithreading helps depends heavily on the application and the hardware implementation. You should benchmark performance both with and without multithreading enabled to determine whether it helps. To improve the probability that your application benefits from multithreading:

    • Expose the concurrency in the code so there are multiple threads that can be assigned to separate logical processors.
    • Ensure that the data working sets fit into the reduced cache space that results from the cache being shared between multiple threads. For example, if the processor has four physical cores, each handling two logical processors, and a last-level cache of 8 MB, there could be eight threads, each with approximately 1 MB of cache space. If the code assumes that each thread has 2 MB of cache space (the share per physical core), this could lead to excessive cache evictions and memory accesses.
    • Be careful with spin-lock operations on multithreaded processors. Naive implementations of spin-lock operations can cause the spin-wait loop to slow down other threads sharing the physical processor, because the spin-wait is a very tight loop that competes for the shared execution resources.
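
The cache arithmetic in the second item above can be checked with a short Python sketch. It assumes the last-level cache is divided evenly among all running threads, which is a simplification since real caches are not partitioned this way; the function names are illustrative.

```python
def cache_per_thread_mb(llc_mb: float, physical_cores: int,
                        threads_per_core: int) -> float:
    """Approximate last-level cache share per thread, assuming an even split."""
    if llc_mb <= 0 or physical_cores < 1 or threads_per_core < 1:
        raise ValueError("need llc_mb > 0, physical_cores >= 1, threads_per_core >= 1")
    return llc_mb / (physical_cores * threads_per_core)


def working_set_fits(working_set_mb: float, llc_mb: float,
                     physical_cores: int, threads_per_core: int) -> bool:
    """True if one thread's working set fits in its approximate cache share."""
    share = cache_per_thread_mb(llc_mb, physical_cores, threads_per_core)
    return working_set_mb <= share


if __name__ == "__main__":
    # The example from the text: 4 cores, 8 MB last-level cache.
    for threads_per_core in (1, 2):
        share = cache_per_thread_mb(8, 4, threads_per_core)
        fits = working_set_fits(2, 8, 4, threads_per_core)
        print(f"{threads_per_core} thread(s) per core: "
              f"{share} MB per thread, 2 MB working set fits: {fits}")
```

A 2 MB working set fits when each core runs one thread, but not when each core runs two threads and the share drops to about 1 MB.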

    Investigating multithreading further

    The Intel optimization manual has an entire chapter about optimizing code for multicore and multithreading on Intel processors, which includes suggestions on how to address some of these issues:

    • Intel 64 and IA-32 Architectures Optimization Reference Manual
    Last updated: February 23, 2024
