Skip to main content
Redhat Developers  Logo
  • AI

    Get started with AI

    • Red Hat AI
      Accelerate the development and deployment of enterprise AI solutions.
    • AI learning hub
      Explore learning materials and tools, organized by task.
    • AI interactive demos
      Click through scenarios with Red Hat AI, including training LLMs and more.
    • AI/ML learning paths
      Expand your OpenShift AI knowledge using these learning resources.
    • AI quickstarts
      Focused AI use cases designed for fast deployment on Red Hat AI platforms.
    • No-cost AI training
      Foundational Red Hat AI training.

    Featured resources

    • OpenShift AI learning
    • Open source AI for developers
    • AI product application development
    • Open source-powered AI/ML for hybrid cloud
    • AI and Node.js cheat sheet

    Red Hat AI Factory with NVIDIA

    • Red Hat AI Factory with NVIDIA is a co-engineered, enterprise-grade AI solution for building, deploying, and managing AI at scale across hybrid cloud environments.
    • Explore the solution
  • Learn

    Self-guided

    • Documentation
      Find answers, get step-by-step guidance, and learn how to use Red Hat products.
    • Learning paths
      Explore curated walkthroughs for common development tasks.
    • Guided learning
      Receive custom learning paths powered by our AI assistant.
    • See all learning

    Hands-on

    • Developer Sandbox
      Spin up Red Hat's products and technologies without setup or configuration.
    • Interactive labs
      Learn by doing in these hands-on, browser-based experiences.
    • Interactive demos
      Click through product features in these guided tours.

    Browse by topic

    • AI/ML
    • Automation
    • Java
    • Kubernetes
    • Linux
    • See all topics

    Training & certifications

    • Courses and exams
    • Certifications
    • Skills assessments
    • Red Hat Academy
    • Learning subscription
    • Explore training
  • Build

    Get started

    • Red Hat build of Podman Desktop
      A downloadable, local development hub to experiment with our products and builds.
    • Developer Sandbox
      Spin up Red Hat's products and technologies without setup or configuration.

    Download products

    • Access product downloads to start building and testing right away.
    • Red Hat Enterprise Linux
    • Red Hat AI
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Featured

    • Red Hat build of OpenJDK
    • Red Hat JBoss Enterprise Application Platform
    • Red Hat OpenShift Dev Spaces
    • Red Hat Developer Toolset

    References

    • E-books
    • Documentation
    • Cheat sheets
    • Architecture center
  • Community

    Get involved

    • Events
    • Live AI events
    • Red Hat Summit
    • Red Hat Accelerators
    • Community discussions

    Follow along

    • Articles & blogs
    • Developer newsletter
    • Videos
    • Github

    Get help

    • Customer service
    • Customer support
    • Regional contacts
    • Find a partner

    Join the Red Hat Developer program

    • Download Red Hat products and project builds, access support documentation, learning content, and more.
    • Explore the benefits

Optimizing Quay/Clair: Database profiling results

March 19, 2024
Vishnu Challa
Related topics:
ContainersDatabasesKubernetes
Related products:
Red Hat Quay

    Welcome to the second part of our exploration. In this continuation from our previous article, we will delve deeper into the results of our database profiling efforts and discuss strategies for further optimizing overall application performance.

    Profiling results before optimization for Clair database

    To gain insights into the inner workings of the Clair database, we conducted profiling using the pgBadger tool, specifically designed for PostgreSQL. This tool extracts information from the database logs and generates a comprehensive HTML report for result analysis. Through profiling, we identified that the recurrent spikes are attributed to bulk updates executed on the database, specifically around 10,000 updates by the updater, a sub-component of Clair that is responsible for maintaining an updated view containing the latest list of vulnerabilities. These vulnerabilities are sourced from trusted remote entities referred to as matchers. Here are a few key observations from the profiling process:

    • Examining the Entity-Relationship model of the database, we observed that the tables listed below have the highest number of rows and undergo intensive operations during the occurrences of those updates on the database.
    table_name                | total_size      |  row_count
    --------------------------|-----------------|-----------------
    enrichment                | 89145344        |   131558
    uo_enrich                 | 26337280        |   263187
    uo_vuln                   | 1271250944      |   9.402272e+06
    vuln                      | 9406947328      |   6.381969e+06
    • Furthermore, during instances of a spike in the database activity, our observation revealed that 99% of the queries were initiated by libvuln, the application name of updater. See Figure 1.
    Alt text
    Figure 1: Pie chart above illustrating that the majority of queries are generated by libvuln, i.e. the updater. This data was captured during the periodic spikes in activity.

    For a more in-depth investigation, we examined the most time-consuming query templates, and the following (Figure 2) are the top four along with their corresponding statistics.

    Alt text
    Figure 2: Top 4 query templates with their corresponding statistics which are the reason for intensive activity on the database during periodic spikes.

    Another noteworthy observation is that the aforementioned time-consuming queries are executed on tables housing the highest number of rows in the attached Entity-Relationship model. These tables are entirely isolated from the rest in the Entity-Relationship diagram. This is shown in Figure 3 below.

    Alt text
    Figure 3: Updater tables Entity-Relationship diagram in the Clair database.

    And finally below (Figure 4 and 5) are a few performance metrics of the database due to these periodic spikes.

    Alt text
    Figure 4: Average queries execution time.
    Figure 4: Average queries execution time.
    Alt text
    Figure 5: Amount of queries and their respective execution time-range statistics.
    Figure 5: Amount of queries and their respective execution time-range statistics.

    Approach 1

    Considering all the observations mentioned earlier, our initial approach was to consider potentially migrating the sizable tables, which are segregated in the Entity-Relationship model and execute resource-intensive queries, to a column-store database. In traditional relational databases, queries scan row by row, making the process time-consuming and resource-intensive, especially when dealing with billions of rows. In contrast, a column-store database allows the implementation of partitioning, indexing, and sharding, supporting parallel queries and accelerating results in a distributed setting. Additionally, since these intensive queries run periodically (every 6 hours), it aligns well with the Online Analytical Processing (OLAP) use case, for which column-store databases are particularly suitable.

    However, maintaining an additional database poses operational challenges. Alternatively, flattening all the data introduces trade-offs, including data duplication and increased maintenance. Consequently, we opted to explore improvement opportunities within our existing database, the settings for which are described below in the subsequent section.

    Approach 2 

    While investigating other improvement opportunities, we noticed that Clair executed a lot of individual INSERT/UPDATE/DELETE queries on top of their database using the github.com/jackc/pgx/v4 GO library. It's commonly assumed that sending multiple queries as a batch would result in automatic optimized execution by the library. To verify if this was correct, we have transformed one of most time consuming query templates into a single bulk query string and fed it as an input to the library to see if there are any performance improvements. Following are the POCs on this idea.

    • GitHub issue: https://github.com/quay/claircore/issues/995 
    • GitHub PR: https://github.com/quay/claircore/pull/996 

    Profiling results after optimization for Clair database

    After applying these changes, we noticed a significant performance improvement in the application. Below are the screenshots for those metrics (Figure 6 and Figure 7).

    Alt text
    Figure 6: Average queries execution time.

    Average query execution time got reduced from 250 ms to 2.5 ms which is a 100 fold difference.

    Alt text
    Figure 7: Amount of queries and their respective execution time-range statistics.

    The amount of queries got reduced by half which saves a lot of resources spent on opening and closing database connections.

    Also, here are some of the performance metrics of Clair app and database (Figures 8-10).

    Alt text
    Figure 8: Disk IOPS for Clair database got reduced from 3K to 2K which is also a good improvement.
    Alt text
    Figure 9: Clair database peak CPU usage got reduced to 3 cores which previously used to be 14.3 Cores.
    Alt text
    Figure 10: Clair app peak CPU usage got reduced to 0.35 cores which previously used to be 2.5 Cores.
    Figure 10: Clair app peak CPU usage got reduced to 0.35 cores, which previously used to be 2.5 cores.

    After this optimization, we can see a significant improvement in the database’s disk IOPS and CPU activity which also got reflected at the application CPU level, leaving more room for the incoming requests. This indicates that by implementing all the suggested optimizations, we can efficiently handle and fulfill additional requests using the same existing resources, resulting in cost savings. Feel free to take a look at the POC links referenced above to learn more about the code changes.

    Further digging into Clair application profiling

    For a deeper dive into application-level details, we conducted application-level profiling for Clair utilizing a tool called pyroscope. The following flame graphs (Figure 11 and Figure 12) depict the performance of the Clair application.

    Alt text
    Figure 11: Clair app CPU profile which indicates 68.35% usage by libvuln i.e. updater.
    Alt text
    Figure 12: Clair app memory profile which indicates 97.89% usage by libvuln i.e. updater.
    Figure 12: Clair app memory profile which indicates 97.89% usage by libvuln i.e. updater.

    This strongly confirms that updater functions involving database interactions are the most time-consuming, emphasizing the need for optimizations in this specific area.

    Profiling Quay

    Database profiling 

    We also performed database profiling for Quay to assess the database performance, both in standalone conditions and under load. The average query time remains below 2 milliseconds in both the conditions, indicating that the database activity is entirely driven by incoming requests to the application. From the database standpoint, there is nothing unusual or anomalous .

    Application profiling

    Currently, Quay lacks an integrated application profiling tool. Integrating a continuous profiling tool, such as pyroscope, would introduce a 2-5% overhead, representing a trade-off. To enhance the overall application, Quay recently underwent a database migration from MySQL to Aurora Postgres. Additional details on this migration can be found in the blog How We Moved Quay.io's Database From MySQL to Postgres And Lived To Tell About I.

    Learnings throughout the process

    From the above attached screenshots showcasing the outcome after query optimization, here are a few points relevant to the Clair database.

    Reasons why bulk queries are better on databases

    • Reduced overhead: Reduces the overhead of opening/closing transactions for individual queries, as a single SQL statement acts on multiple rows. The significant reduction in the number of queries, halved after the optimization in Clair, provides clear evidence of this improvement.
    • Minimized disk I/O: Bulk operations allow DB to act in sequential manner, resulting in minimized disk I/O operations. Sequential operations are much faster than random ones, as the disk arm movement is minimized. Reduction in the number of Clair Disk IOPS after the optimization proves this statement.
    • Few index updates: Bulk operations temporarily disable/defer index updates, which saves time and resources that would have been spent updating the indexes for each individual row, and the indexes are rebuilt once bulk operations are completed. The average query execution time which got drastically reduced after the POC pull request in Clair supports this statement.
    • Reduced locking overhead: Few locks are acquired since all the operations are done in a single transaction, reducing resource contention with better concurrency and performance which is again visible in the attached screenshots above.
    • Improved WAL utilization: Write-Ahead logging (WAL) is a mechanism used by databases to provide data consistency and durability. Bulk operations have better WAL utilization as data is written in larger blocks, reducing its overhead. For more details please take a look at the “temporary files” section in the below attached results.

    For more information on the above reasons, feel free to look at the below hosted web pages:

    • Profiling results before bulk insert
    • Profiling results after bulk insert

    Disregarding the outliers in the results attached above, when considering the overall average outcomes, all the factors should align with the numerical data.

    Never trust an inbuilt library blindly

    For executing queries on the Clair database, we utilize the github.com/jackc/pgx/v4 GO library. It's commonly assumed that sending multiple queries as a batch would result in automatic and optimized execution by the library. However, our analysis above revealed this assumption to be incorrect. Therefore, it's advisable to craft optimized queries directly within the query string and provide them as input to the language library responsible for database communication.

    What’s next?

    We're currently working on making it easy for the Quay/Clair dev teams to run regular performance tests with the new tooling and providing a simple way for them to check and understand the results. See Figures 13 and 14 below for examples of this.

    Alt text
    Figure 13: Integrating Quay performance testing into a continuous performance testing framework and visualizing it on a dashboard constructed using React and Patternfly components. This integration will offer an overview of higher-level metrics, presenting aggregated data.
    Alt text
    Figure 14: Grafana dashboard designed for conducting run-to-run comparisons, offering valuable insights for analysis for both developer and QE perspectives.

    Our goal is to provide the Quay development and QE teams with an easy way to review results, and we will continue to make improvements in the future. Currently, we use measures like throughput, latencies, and HTTP status codes to compare performance between two runs with the same setup. This makes it easier for the Quay quality engineering (QE) team to test before each release. We're looking to expand this approach by including more scenarios to help the Quay development and QE teams get ready for pre-production or pre-release situations.

    In conclusion

    This write-up was an effort at providing insights into our investments in performance and scale testing of Quay/Clair. The performance and scale team is dedicated to ongoing learning and facilitating improvements to Quay/Clair to enhance stability and scalability to the fullest extent, in collaboration with development and quality engineering teams.

    We will continue to invest on adding more load testing scenarios to further improvise on the test setup we have as of today. This includes adding more endpoints and metrics to analyze their performance results. When it comes to result analysis, currently the Quay QE team plots all the graphs in an Excel sheet, which is a very tedious task. Continuous Performance Testing (CPT) Dashboard is our ongoing effort to move away from excel sheets and have all the key visualizations at one place along with Grafana dashboard links which will provide a developer or QE with more zoom-in view at the results.

    Last updated: March 21, 2024

    Related Posts

    • Using Quay.io to find vulnerabilities in your container images

    • 3 ways to install a database with Helm charts

    • Connect to an external PostgreSQL database using SSL/TLS for Red Hat's single sign-on technology

    • How to deploy a MSSQL database using Ansible Vault

    • Containerize .NET for Red Hat OpenShift: Linux containers and .NET Core

    • Leveraging Kubernetes and OpenShift for automated performance tests (part 1)

    Recent Posts

    • Debugging image mode with Red Hat OpenShift 4.20: A practical guide

    • EvalHub: Because "looks good to me" isn't a benchmark

    • SQL Server HA on RHEL: Meet Pacemaker HA Agent v2 (tech preview)

    • Deploy with confidence: Continuous integration and continuous delivery for agentic AI

    • Every layer counts: Defense in depth for AI agents with Red Hat AI

    What’s up next?

    This learning path will show you how to create a Quarkus application that uses the Micrometer library to expose metrics, and tie it into Prometheus for monitoring and alerts.

    Start the activity
    Red Hat Developers logo LinkedIn YouTube Twitter Facebook

    Platforms

    • Red Hat AI
    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Build

    • Developer Sandbox
    • Developer tools
    • Interactive tutorials
    • API catalog

    Quicklinks

    • Learning resources
    • E-books
    • Cheat sheets
    • Blog
    • Events
    • Newsletter

    Communicate

    • About us
    • Contact sales
    • Find a partner
    • Report a website issue
    • Site status dashboard
    • Report a security problem

    RED HAT DEVELOPER

    Build here. Go anywhere.

    We serve the builders. The problem solvers who create careers with code.

    Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

    Sign me up

    Red Hat legal and privacy links

    • About Red Hat
    • Jobs
    • Events
    • Locations
    • Contact Red Hat
    • Red Hat Blog
    • Inclusion at Red Hat
    • Cool Stuff Store
    • Red Hat Summit
    © 2026 Red Hat

    Red Hat legal and privacy links

    • Privacy statement
    • Terms of use
    • All policies and guidelines
    • Digital accessibility

    Chat Support

    Please log in with your Red Hat account to access chat support.