Skip to main content
Redhat Developers  Logo
  • AI

    Get started with AI

    • Red Hat AI
      Accelerate the development and deployment of enterprise AI solutions.
    • AI learning hub
      Explore learning materials and tools, organized by task.
    • AI interactive demos
      Click through scenarios with Red Hat AI, including training LLMs and more.
    • AI/ML learning paths
      Expand your OpenShift AI knowledge using these learning resources.
    • AI quickstarts
      Focused AI use cases designed for fast deployment on Red Hat AI platforms.
    • No-cost AI training
      Foundational Red Hat AI training.

    Featured resources

    • OpenShift AI learning
    • Open source AI for developers
    • AI product application development
    • Open source-powered AI/ML for hybrid cloud
    • AI and Node.js cheat sheet

    Red Hat AI Factory with NVIDIA

    • Red Hat AI Factory with NVIDIA is a co-engineered, enterprise-grade AI solution for building, deploying, and managing AI at scale across hybrid cloud environments.
    • Explore the solution
  • Learn

    Self-guided

    • Documentation
      Find answers, get step-by-step guidance, and learn how to use Red Hat products.
    • Learning paths
      Explore curated walkthroughs for common development tasks.
    • Guided learning
      Receive custom learning paths powered by our AI assistant.
    • See all learning

    Hands-on

    • Developer Sandbox
      Spin up Red Hat's products and technologies without setup or configuration.
    • Interactive labs
      Learn by doing in these hands-on, browser-based experiences.
    • Interactive demos
      Click through product features in these guided tours.

    Browse by topic

    • AI/ML
    • Automation
    • Java
    • Kubernetes
    • Linux
    • See all topics

    Training & certifications

    • Courses and exams
    • Certifications
    • Skills assessments
    • Red Hat Academy
    • Learning subscription
    • Explore training
  • Build

    Get started

    • Red Hat build of Podman Desktop
      A downloadable, local development hub to experiment with our products and builds.
    • Developer Sandbox
      Spin up Red Hat's products and technologies without setup or configuration.

    Download products

    • Access product downloads to start building and testing right away.
    • Red Hat Enterprise Linux
    • Red Hat AI
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Featured

    • Red Hat build of OpenJDK
    • Red Hat JBoss Enterprise Application Platform
    • Red Hat OpenShift Dev Spaces
    • Red Hat Developer Toolset

    References

    • E-books
    • Documentation
    • Cheat sheets
    • Architecture center
  • Community

    Get involved

    • Events
    • Live AI events
    • Red Hat Summit
    • Red Hat Accelerators
    • Community discussions

    Follow along

    • Articles & blogs
    • Developer newsletter
    • Videos
    • Github

    Get help

    • Customer service
    • Customer support
    • Regional contacts
    • Find a partner

    Join the Red Hat Developer program

    • Download Red Hat products and project builds, access support documentation, learning content, and more.
    • Explore the benefits

Examining Huge Pages or Transparent Huge Pages performance

March 10, 2014
William Cohen
Related topics:
Linux
Related products:
Red Hat Enterprise Linux

    All modern processors use page-based mechanisms to translate the user-space processes virtual addresses into physical addresses for RAM. The pages are commonly 4KB in size and the processor can hold a limited number of virtual-to-physical address mappings in the Translation Lookaside Buffers (TLB). The number TLB entries ranges from tens to hundreds of mappings. This limits a processor to a few
    megabytes of memory it can address without changing the TLB entries. When a virtual-to-physical address mapping is not in the TLB the processor must do an expensive computation to generate a new virtual-to-physical address mapping.

    To increase the amount of memory the processor can address without performing the expensive TLB updates many processors allow larger page sizes to be used. On x86_64 processors huge pages are 2MB, 512 times larger than regular 4KB pages. In ideal situations huge pages can decrease the overhead of the TLB updates (misses). However, huge page use can increase memory pressure, add latency for minor pages faults, and add overhead when splitting huge pages or coalescing normal sized pages into huge pages.

    There are two mechanisms available for huge pages in Linux: the hugepages and Transparent Huge Pages (THP). Explicit configuration is required for the original hugepages mechanism. The newer transparent hugepage (THP) mechanism will automatically use larger pages for dynamically allocated memory in Red Hat Enterprise Linux 6.

    To determine whether the newer Transparent HugePages (THP) or the older HugePages mechanism are being used, look at the output of /proc/meminfo as below:

    $ cat /proc/meminfo|grep Huge
    AnonHugePages:   3049472 kB
    HugePages_Total:       0
    HugePages_Free:        0
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:       2048 kB

    The AnonHugePages entry lists the number of pages that the newer Transparent Huge Page mechanism currently has in use. For this machine there are 309472kB, 1489 huge pages each 2048kB in size.

    In this case there are zero pages in the pool of the older hugepage mechanism as shown by HugePages_Total of 0. The HugePages_Free shows how many pages are still available for allocation, which is going to be less than or equal to HugePages_Total. The number of HugePages in use can be computed as HugePages_Total-HugePagesFree. For more information about the configuration of HugePages see Tuning and Optimizing Red Hat Enterprise Linux for Oracle 9i and 10g Databases.

    Determining whether page fault latency is due to huge pages use

    Huge page use can reduce the number of TLB updates required to access large regions of memory and reducing the overall cost of TLB updates but increase costs and latency for other operations. When a user-space application is given a range of addresses for a memory allocation the assignment of a physical page is deferred until the first time the page is accessed. To prevent information leakage from the previous user of the page the kernel writes zeros in the entire page. For a 4096 byte page this is a relatively short operation and will only take a couple of microseconds. The x86 hugepages are 2MB in size, 512 times larger than the normal page. Thus, the operation may take hundreds of microseconds and impact the operation of latency sensitive code. Below is a simple SystemTap command line script to show which applications have huge pages zeroed out and how long those operations take. It will run until cntl-c is pressed.

    stap  -e 'global huge_clear probe kernel.function("clear_huge_page").return {
      huge_clear [execname(), pid()] <<< (gettimeofday_us() - @entry(gettimeofday_us()))}'

    Below is the a run of the above SystemTap clear huge page script. The script will output a list sorted from the executable name and process with the most huge page clears to the least. The @count is the number of times that process encountered a huge page clear operation. Following that information is time statistics displayed in microseconds of wall clock time. The @min and the @max are the minimum and the maximum time respectively to clear out a page. The @sum is the total wall clock time. In the example below the ld process 17050 took a total 1924 microseconds to clear out huge pages and on average those page clears took 128 microseconds.

    #  stap  -e 'global huge_clear probe kernel.function("clear_huge_page").return {
      huge_clear [execname(), pid()] <<< (gettimeofday_us() - @entry(gettimeofday_us()))}'
    ^CChuge_clear["ld",17050] @count=15 @min=114 @max=148 @sum=1924 @avg=128
    huge_clear["ld",27996] @count=13 @min=121 @max=160 @sum=1674 @avg=128
    huge_clear["ld",19595] @count=11 @min=86 @max=181 @sum=1251 @avg=113
    huge_clear["cc1",22840] @count=6 @min=108 @max=180 @sum=862 @avg=143
    huge_clear["ld",15640] @count=5 @min=160 @max=599 @sum=1274 @avg=254
    huge_clear["ld",27733] @count=4 @min=95 @max=145 @sum=443 @avg=110
    huge_clear["cc1",24455] @count=4 @min=103 @max=159 @sum=535 @avg=133
    huge_clear["cc1",20431] @count=3 @min=112 @max=172 @sum=408 @avg=136
    huge_clear["cc1",21906] @count=3 @min=125 @max=159 @sum=431 @avg=143

    The system may attempt to save memory by using the same physical page for multiple processes. When one of the processes attempts to modify the contents of the page a new copy needs to be made of the page. The Copy-On-Write (COW) operation for the huge page can be observed with a script very similar to the one watching for huge pages to be zeroed out. Below is the script to watch for Copy-On-Write for huge pages and it will output data in a similar format.

    stap  -e 'global huge_cow probe kernel.function("copy_user_huge_page").return {
      huge_cow [execname(), pid()] <<< (gettimeofday_us() - @entry(gettimeofday_us()))}'

    Determining whether huge page split and collapse operations are affecting performance

    Because some portions of the kernel code only work with normal-sized pages the kernel may convert a huge page into a set of normal-sized pages using a split operation. One can identify if split operations are occurring with the following systemtap script:

    stap -e 'probe kernel.function("split_huge_page") { printf("%s: %s(%d)n", pp(), execname(), pid());}'

    Below is an example run of the script showing which processes are performing split huge page operations. In this case the same virtualized guest machine (qemu-system-x86_64) has some huge pages splits.

    # stap -e 'probe kernel.function("split_huge_page") { printf("%s: %s(%d)n", pp(), execname(), pid());}'
    kernel.function("split_huge_page@include/linux/huge_mm.h:103"): qemu-system-x86(9473)
    kernel.function("split_huge_page@include/linux/huge_mm.h:103"): qemu-system-x86(9473)
    kernel.function("split_huge_page@include/linux/huge_mm.h:103"): plugin-containe(16582)
    kernel.function("split_huge_page@include/linux/huge_mm.h:103"): StreamT~ns #697(2942)

    The inverse of the huge page split operation is the huge page collapse operation that converts a set of normal-sized pages into a single huge page. It is desirable to have a range of addresses need fewer TLB entries, but the conversion process is expensive because the system needs to find a candidate set of pages to group together and then copy all the memory from the possibly scattered normal-sized pages into a single huge page. The khugepaged kernel thread searches for candidates pages to collapse into a single huge page. Even if khugepaged is not successful converting normal-sized pages into huge pages it may still be taking processor time to search for candidate pages. You can see if the khugepaged kernel thread is taking a significant amount of processor time with:

    top -p `pidof khugepaged`

    If you want to see when the huge page collapse operations occur, the following will note each time khugepaged is able to collapse normal-sized pages into huge pages:

    stap -e 'probe kernel.function("collapse_huge_page") {  printf("%-25s: %s (%d) collapse_huge_pagen", tz_ctime(gettimeofday_s()), execname(), pid())}'

    The above one line script will generate output like the following:

    $ stap -e 'probe kernel.function("collapse_huge_page") {  printf("%-25s: %s (%d) collapse_huge_pagen", ctime(gettimeofday_s()), execname(), pid())}'
    Mon Oct 21 15:12:44 2013 : khugepaged (88) collapse_huge_page
    Mon Oct 21 15:13:44 2013 : khugepaged (88) collapse_huge_page
    Mon Oct 21 15:13:54 2013 : khugepaged (88) collapse_huge_page
    Mon Oct 21 15:14:54 2013 : khugepaged (88) collapse_huge_page
    Mon Oct 21 15:15:04 2013 : khugepaged (88) collapse_huge_page

    References

    • http://kvm.et.redhat.com/wiki/images/9/9e/2010-forum-thp.pdf
    • http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=blob_plain;f=Documentation/vm/transhuge.txt;hb=HEAD
    • Red Hat Enterprise Linux 6 SystemTap Beginners Guide Introduction to SystemTap
    Last updated: January 10, 2023

    Recent Posts

    • Trusted software factory: Building trust in the agentic AI era

    • Build a zero trust AI pipeline with OpenShift and RHEL CVMs

    • Red Hat Hardened Images: Top 5 benefits for software developers

    • How EvalHub manages two-layer Kubernetes control planes

    • Tekton joins the CNCF as an incubating project

    Red Hat Developers logo LinkedIn YouTube Twitter Facebook

    Platforms

    • Red Hat AI
    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Build

    • Developer Sandbox
    • Developer tools
    • Interactive tutorials
    • API catalog

    Quicklinks

    • Learning resources
    • E-books
    • Cheat sheets
    • Blog
    • Events
    • Newsletter

    Communicate

    • About us
    • Contact sales
    • Find a partner
    • Report a website issue
    • Site status dashboard
    • Report a security problem

    RED HAT DEVELOPER

    Build here. Go anywhere.

    We serve the builders. The problem solvers who create careers with code.

    Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

    Sign me up

    Red Hat legal and privacy links

    • About Red Hat
    • Jobs
    • Events
    • Locations
    • Contact Red Hat
    • Red Hat Blog
    • Inclusion at Red Hat
    • Cool Stuff Store
    • Red Hat Summit
    © 2026 Red Hat

    Red Hat legal and privacy links

    • Privacy statement
    • Terms of use
    • All policies and guidelines
    • Digital accessibility

    Chat Support

    Please log in with your Red Hat account to access chat support.