
Optimizing Quay/Clair: Database profiling results

March 19, 2024
Vishnu Challa
Related topics:
ContainersDatabasesKubernetes
Related products:
Red Hat Quay


    Welcome to the second part of our exploration. In this continuation of our previous article, we delve deeper into the results of our database profiling efforts and discuss strategies for further optimizing overall application performance.

    Profiling results before optimization for Clair database

    To gain insight into the inner workings of the Clair database, we profiled it using pgBadger, a tool designed specifically for PostgreSQL. It extracts information from the database logs and generates a comprehensive HTML report for analysis. Through profiling, we identified that the recurrent spikes are caused by bulk updates executed on the database, around 10,000 updates issued by the updater, a sub-component of Clair responsible for maintaining an up-to-date view of the latest vulnerabilities. These vulnerabilities are sourced from trusted remote entities referred to as matchers. Here are a few key observations from the profiling process:

    • Examining the Entity-Relationship model of the database, we observed that the tables listed below have the highest row counts and undergo intensive operations during those update runs.
    table_name   | total_size (bytes) | row_count
    -------------|--------------------|----------
    enrichment   |         89,145,344 |   131,558
    uo_enrich    |         26,337,280 |   263,187
    uo_vuln      |      1,271,250,944 | 9,402,272
    vuln         |      9,406,947,328 | 6,381,969
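A summary like the table above can be produced with a PostgreSQL catalog query along the following lines. This is an illustrative sketch, not necessarily the exact statement used during profiling; note that reltuples is an estimate maintained by VACUUM/ANALYZE, not an exact count.

```go
package main

import "fmt"

// sizeQuery sketches a PostgreSQL statement that reports per-table
// on-disk size (in bytes) and an estimated row count for ordinary
// tables in the public schema. Illustrative only; reltuples is an
// estimate refreshed by VACUUM/ANALYZE rather than an exact count.
const sizeQuery = `
SELECT relname                     AS table_name,
       pg_total_relation_size(oid) AS total_size,
       reltuples::bigint           AS row_count
FROM pg_class
WHERE relkind = 'r'
  AND relnamespace = 'public'::regnamespace
ORDER BY total_size DESC;`

func main() {
	fmt.Println(sizeQuery)
}
```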
    • Furthermore, during spikes in database activity, we observed that 99% of the queries were initiated by libvuln, the application name of the updater. See Figure 1.
    Figure 1: Pie chart showing that the majority of queries are generated by libvuln, i.e., the updater. This data was captured during the periodic spikes in activity.

    For a more in-depth investigation, we examined the most time-consuming query templates; Figure 2 shows the top four along with their corresponding statistics.

    Figure 2: The top four query templates, with their corresponding statistics, responsible for the intensive database activity during periodic spikes.

    Another noteworthy observation is that these time-consuming queries run against the tables with the highest row counts in the Entity-Relationship model, and those tables are entirely isolated from the rest of the Entity-Relationship diagram, as shown in Figure 3 below.

    Figure 3: Updater tables Entity-Relationship diagram in the Clair database.

    Finally, Figures 4 and 5 show a few database performance metrics during these periodic spikes.

    Figure 4: Average query execution time.
    Figure 5: Number of queries and their respective execution time-range statistics.

    Approach 1

    Considering all the observations mentioned earlier, our initial idea was to migrate the sizable tables, which are segregated in the Entity-Relationship model and receive the resource-intensive queries, to a column-store database. In traditional row-oriented relational databases, queries scan row by row, making the process time-consuming and resource-intensive, especially when dealing with billions of rows. In contrast, a column-store database allows the implementation of partitioning, indexing, and sharding, supporting parallel queries and accelerating results in a distributed setting. Additionally, since these intensive queries run periodically (every 6 hours), the workload aligns well with the Online Analytical Processing (OLAP) use case, for which column-store databases are particularly suitable.

    However, maintaining an additional database poses operational challenges. Alternatively, flattening all the data introduces trade-offs, including data duplication and increased maintenance. Consequently, we opted to explore improvement opportunities within our existing database, as described in the subsequent section.

    Approach 2 

    While investigating other improvement opportunities, we noticed that Clair executed a large number of individual INSERT/UPDATE/DELETE queries against its database using the github.com/jackc/pgx/v4 Go library. It's commonly assumed that sending multiple queries as a batch results in automatic, optimized execution by the library. To verify this, we transformed one of the most time-consuming query templates into a single bulk query string and fed it as input to the library to check for performance improvements. The following proof-of-concept (POC) links track this idea:

    • GitHub issue: https://github.com/quay/claircore/issues/995 
    • GitHub PR: https://github.com/quay/claircore/pull/996 
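To illustrate the idea (a hypothetical sketch, not the actual claircore change from the PR above), many individual parameterized INSERTs can be collapsed into one multi-row statement before it is handed to the driver:

```go
package main

import (
	"fmt"
	"strings"
)

// buildBulkInsert collapses many single-row INSERTs into one multi-row
// statement with numbered placeholders ($1, $2, ...), so the driver sends
// a single query instead of one round trip per row. Hypothetical helper
// for illustration, not code from claircore.
func buildBulkInsert(table string, cols []string, rows [][]any) (string, []any) {
	var b strings.Builder
	args := make([]any, 0, len(rows)*len(cols))
	fmt.Fprintf(&b, "INSERT INTO %s (%s) VALUES ", table, strings.Join(cols, ", "))
	for i, row := range rows {
		if i > 0 {
			b.WriteString(", ")
		}
		ph := make([]string, len(cols))
		for j := range cols {
			ph[j] = fmt.Sprintf("$%d", len(args)+1)
			args = append(args, row[j])
		}
		fmt.Fprintf(&b, "(%s)", strings.Join(ph, ", "))
	}
	return b.String(), args
}

func main() {
	sql, args := buildBulkInsert("vuln",
		[]string{"name", "severity"},
		[][]any{{"CVE-2024-0001", "High"}, {"CVE-2024-0002", "Low"}})
	fmt.Println(sql)      // INSERT INTO vuln (name, severity) VALUES ($1, $2), ($3, $4)
	fmt.Println(len(args)) // 4
}
```

Sending one statement with all the placeholders, instead of one statement per row, removes the per-query round trips and transaction overhead that the profiling above attributed to the updater.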

    Profiling results after optimization for Clair database

    After applying these changes, we noticed a significant performance improvement in the application. Below are the screenshots for those metrics (Figure 6 and Figure 7).

    Figure 6: Average query execution time.

    The average query execution time dropped from 250 ms to 2.5 ms, a 100-fold improvement.

    Figure 7: Number of queries and their respective execution time-range statistics.

    The number of queries was halved, saving a significant amount of the resources spent on opening and closing database connections.

    Also, here are some performance metrics of the Clair application and database (Figures 8-10).

    Figure 8: Disk IOPS for the Clair database dropped from 3K to 2K, another good improvement.
    Figure 9: Clair database peak CPU usage dropped from 14.3 cores to 3 cores.
    Figure 10: Clair app peak CPU usage dropped from 2.5 cores to 0.35 cores.

    After this optimization, we see a significant improvement in the database's disk IOPS and CPU activity, which is also reflected at the application CPU level, leaving more headroom for incoming requests. This indicates that by implementing all the suggested optimizations, we can efficiently handle additional requests with the same existing resources, resulting in cost savings. Feel free to take a look at the POC links referenced above to learn more about the code changes.

    Further digging into Clair application profiling

    For a deeper dive, we conducted application-level profiling for Clair using a tool called Pyroscope. The flame graphs in Figures 11 and 12 depict the performance of the Clair application.

    Figure 11: Clair app CPU profile, indicating 68.35% usage by libvuln, i.e., the updater.
    Figure 12: Clair app memory profile, indicating 97.89% usage by libvuln, i.e., the updater.

    This strongly confirms that updater functions involving database interactions are the most time-consuming, emphasizing the need for optimizations in this specific area.

    Profiling Quay

    Database profiling 

    We also performed database profiling for Quay to assess its performance, both in standalone conditions and under load. The average query time remains below 2 milliseconds in both conditions, indicating that the database activity is entirely driven by incoming requests to the application. From the database standpoint, there is nothing unusual or anomalous.

    Application profiling

    Currently, Quay lacks an integrated application profiling tool. Integrating a continuous profiling tool, such as Pyroscope, would introduce a 2-5% overhead, a trade-off to consider. To enhance the overall application, Quay recently migrated its database from MySQL to Aurora PostgreSQL. Additional details on this migration can be found in the blog post How We Moved Quay.io's Database From MySQL to Postgres and Lived to Tell About It.

    Learnings throughout the process

    Based on the screenshots above showcasing the outcome of the query optimization, here are a few takeaways relevant to the Clair database.

    Reasons why bulk queries are better on databases

    • Reduced overhead: A single SQL statement acting on multiple rows avoids the cost of opening and closing a transaction for each individual query. The number of queries being halved after the optimization in Clair is clear evidence of this improvement.
    • Minimized disk I/O: Bulk operations let the database write sequentially, minimizing disk I/O. Sequential operations are much faster than random ones because disk seek movement is minimized; the reduction in Clair's disk IOPS after the optimization supports this.
    • Fewer index updates: With bulk operations, index updates can be deferred and applied once per statement rather than once per row, saving the time and resources that would have been spent maintaining indexes for each individual insert. The drastic reduction in average query execution time after the POC pull request in Clair supports this.
    • Reduced locking overhead: Fewer locks are acquired since all operations run in a single transaction, reducing resource contention and improving concurrency and performance, as visible in the screenshots above.
    • Improved WAL utilization: Write-Ahead Logging (WAL) is the mechanism databases use to provide data consistency and durability. Bulk operations use the WAL more efficiently because data is written in larger blocks, reducing overhead. For more details, see the "temporary files" section in the results linked below.
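One practical corollary of the points above: a single multi-row statement cannot grow without bound, because the PostgreSQL wire protocol caps a statement at 65,535 bind parameters. Bulk writers therefore typically chunk their input; the helper below is a hypothetical sketch of that idea, not code from claircore.

```go
package main

import "fmt"

// chunkRows splits a row set so that each generated bulk statement stays
// within PostgreSQL's wire-protocol limit of 65535 bind parameters.
// Hypothetical helper for illustration; colsPerRow must be >= 1.
func chunkRows(rows [][]any, colsPerRow int) [][][]any {
	maxRows := 65535 / colsPerRow
	var chunks [][][]any
	for len(rows) > maxRows {
		chunks = append(chunks, rows[:maxRows])
		rows = rows[maxRows:]
	}
	if len(rows) > 0 {
		chunks = append(chunks, rows)
	}
	return chunks
}

func main() {
	rows := make([][]any, 70000) // e.g., 70,000 single-column rows
	chunks := chunkRows(rows, 1)
	fmt.Println(len(chunks), len(chunks[0]), len(chunks[1])) // 2 65535 4465
}
```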

    For more information on the above reasons, see the hosted web pages below:

    • Profiling results before bulk insert
    • Profiling results after bulk insert

    Setting aside the outliers in the results linked above, the overall averages align with the factors listed above.

    Never trust an inbuilt library blindly

    For executing queries on the Clair database, we utilize the github.com/jackc/pgx/v4 Go library. It's commonly assumed that sending multiple queries as a batch results in automatic, optimized execution by the library. However, our analysis above revealed this assumption to be incorrect. Therefore, it's advisable to craft optimized queries directly within the query string and provide them as input to the language library responsible for database communication.

    What’s next?

    We're currently working on making it easy for the Quay/Clair development teams to run regular performance tests with the new tooling, and on providing a simple way for them to check and understand the results. See Figures 13 and 14 for examples.

    Figure 13: Integrating Quay performance testing into a continuous performance testing framework, visualized on a dashboard built with React and PatternFly components. This integration offers an overview of higher-level metrics, presenting aggregated data.
    Figure 14: Grafana dashboard designed for run-to-run comparisons, offering valuable insights from both developer and QE perspectives.

    Our goal is to provide the Quay development and QE teams with an easy way to review results, and we will continue to make improvements in the future. Currently, we use measures like throughput, latencies, and HTTP status codes to compare performance between two runs with the same setup. This makes it easier for the Quay quality engineering (QE) team to test before each release. We're looking to expand this approach by including more scenarios to help the Quay development and QE teams get ready for pre-production or pre-release situations.

    In conclusion

    This write-up aimed to provide insight into our investment in performance and scale testing of Quay/Clair. The performance and scale team is dedicated to ongoing learning and to facilitating improvements that enhance the stability and scalability of Quay/Clair to the fullest extent, in collaboration with the development and quality engineering teams.

    We will continue to invest in adding more load-testing scenarios to further improve our current test setup. This includes adding more endpoints and metrics to analyze. As for result analysis, the Quay QE team currently plots all the graphs in an Excel sheet, which is a very tedious task. The Continuous Performance Testing (CPT) dashboard is our ongoing effort to move away from Excel sheets and gather all the key visualizations in one place, along with Grafana dashboard links that give developers and QE a zoomed-in view of the results.

    Last updated: March 21, 2024
