Skip to main content
Redhat Developers  Logo
  • Products

    Platforms

    • Red Hat Enterprise Linux
      Red Hat Enterprise Linux Icon
    • Red Hat AI
      Red Hat AI
    • Red Hat OpenShift
      Openshift icon
    • Red Hat Ansible Automation Platform
      Ansible icon
    • View All Red Hat Products

    Featured

    • Red Hat build of OpenJDK
    • Red Hat Developer Hub
    • Red Hat JBoss Enterprise Application Platform
    • Red Hat OpenShift Dev Spaces
    • Red Hat OpenShift Local
    • Red Hat Developer Sandbox

      Try Red Hat products and technologies without setup or configuration fees for 30 days with this shared Openshift and Kubernetes cluster.
    • Try at no cost
  • Technologies

    Featured

    • AI/ML
      AI/ML Icon
    • Linux
      Linux Icon
    • Kubernetes
      Cloud icon
    • Automation
      Automation Icon showing arrows moving in a circle around a gear
    • View All Technologies
    • Programming Languages & Frameworks

      • Java
      • Python
      • JavaScript
    • System Design & Architecture

      • Red Hat architecture and design patterns
      • Microservices
      • Event-Driven Architecture
      • Databases
    • Developer Productivity

      • Developer productivity
      • Developer Tools
      • GitOps
    • Automated Data Processing

      • AI/ML
      • Data Science
      • Apache Kafka on Kubernetes
    • Platform Engineering

      • DevOps
      • DevSecOps
      • Ansible automation for applications and services
    • Secure Development & Architectures

      • Security
      • Secure coding
  • Learn

    Featured

    • Kubernetes & Cloud Native
      Openshift icon
    • Linux
      Rhel icon
    • Automation
      Ansible cloud icon
    • AI/ML
      AI/ML Icon
    • View All Learning Resources

    E-Books

    • GitOps Cookbook
    • Podman in Action
    • Kubernetes Operators
    • The Path to GitOps
    • View All E-books

    Cheat Sheets

    • Linux Commands
    • Bash Commands
    • Git
    • systemd Commands
    • View All Cheat Sheets

    Documentation

    • Product Documentation
    • API Catalog
    • Legacy Documentation
  • Developer Sandbox

    Developer Sandbox

    • Access Red Hat’s products and technologies without setup or configuration, and start developing quicker than ever before with our new, no-cost sandbox environments.
    • Explore Developer Sandbox

    Featured Developer Sandbox activities

    • Get started with your Developer Sandbox
    • OpenShift virtualization and application modernization using the Developer Sandbox
    • Explore all Developer Sandbox activities

    Ready to start developing apps?

    • Try at no cost
  • Blog
  • Events
  • Videos

How to find vulnerabilities across millions of Quay.io images

June 20, 2024
Bill Dettelback
Related topics:
ContainersKubernetesSecure Coding
Related products:
Red Hat OpenShiftRed Hat Quay

Share:

    Red Hat Quay.io was the first private container registry on the internet and began serving images back in 2013. Today, it hosts millions of images for hundreds of thousands of critical customer workloads, including Red Hat’s own product catalog.

    One of the marquee features of Quay.io is free security scanning for any image that we host. This happens automatically as soon as you push an image to Quay.io. Not only does it scan your image at upload time, it also provides an up-to-date view of your vulnerabilities, no matter how long ago you pushed it. There is no need to ask Quay.io to rescan your image—that information is available whenever you request it. 

    This is critical, as container images typically only gather more and more vulnerabilities as they age. An old container image isn’t always a bad container image (especially if it still powers your application), but more than likely, plenty of vulnerabilities have been discovered in its libraries and base OS since you first built it. Figure 1 shows a sample security report with hundreds of vulnerabilities detected.

    alt text
    Figure 1: The Security Report tab for a container image on Quay.io.

    The Security Report on Quay.io provides this up to date view of what vulnerabilities we’ve discovered in your image. It also reports on recent vulnerabilities; this is important, as every day hundreds of new vulnerabilities are documented.

    But how does all that work? And how can we provide this information so quickly for literally any image out of millions?

    Introducing Clair

    Clair is the open source project that powers the vulnerability detection behind Quay.io. Clair was designed to inspect container image contents and then rapidly match those contents against known vulnerability information. "Scanning" all of Quay.io in real time would be impractical due to the sheer number of images and vulnerabilities, but Clair can give real-time vulnerability results in an efficient manner by taking advantage of a few guarantees container images give us.

    • Container images are immutable. This means that once Clair has inspected an image’s contents, it is guaranteed that those contents will never change. Of course, a user might push new versions of an image, but from Clair’s perspective each of those images are distinct (identified by a manifest digest). So once Clair has gone through an image’s contents, it never needs to go through them again.

    • Container images are made of individual layers of content. Each layer is a single tarball that can be uniquely identified via an SHA value. This makes the inspection process simpler because we can now index each layer independently. This greatly reduces the problem space, as we can build a data structure called an IndexReport which contains all of the package and library data found in all the layers an image contains, as shown in Figure 2.

    alt text
    Figure 2: Content addressable layers within a container image.

    Lastly, image layers are usually shared across images themselves. This further reduces our problem space when inspecting an image. Chances are we’ve already seen some (or most) of the layers already (Figure 3). Since we know they are immutable and uniquely identifiable, we can avoid re-indexing them. This helps greatly when a development team’s CI pipelines are producing repeated copies of an application and only the top most layers are changing. Clair can devote time to indexing only the layers it hasn’t yet seen.

    alt text
    Figure 3: Common layers are typically shared across similar container images.

    Knowing these facts about container images, Clair breaks vulnerability reporting into three parts.

    Indexing

    When Clair is presented with an image to "scan," it first indexes the image contents. Indexing is the act of looking through all of the image layers in well-known areas for OS packages, languages, libraries, and such. Once an image has been successfully indexed, Clair stores an IndexReport for that manifest ID. The IndexReport is itself like a little database that can explain what’s inside the image. Requesting an IndexReport for an image is idempotent. If Clair is presented with a manifest that it has already indexed, it will simply return the existing IndexReport.

    Matching

    When Clair is asked to produce a list of vulnerabilities about an image (via its manifest digest), it performs a matching operation. Clair is configured with several matchers that each know how to compare entries in an IndexReport with known vulnerability sources. The end result of all matchers is a VulnReport structure that contains the vulnerability data that Quay.io shows in its UI. Requesting a VulnReport is not idempotent, as the set of possible vulnerabilities may have changed since the last time a VulnReport was requested for that manifest ID.

    Updating

    Because the universe of known vulnerabilities continues to grow, Clair must continuously check to keep its databases up to date. The updaters work asynchronously every few hours to see what new vulnerabilities have been published and add them to Clair’s database. This ensures Clair can give up-to-date answers at all times. Clair behind Quay.io is configured with the following vulnerability data sources:

    • Alpine Linux
    • AWS Linux
    • Debian Linux
    • Oracle Linux
    • VMWare Photon Linux
    • Red Hat Enterprise Linux
    • Red Hat Ecosystem Catalog
    • SUSE Linux
    • Ubuntu Linux
    • Open Source Vulnerability
    • NVD CVSS (this is primarily used to get severity ratings when a CVE does not have one already)

    Putting it all together

    Clair operates as a set of indexing and matching services behind Quay.io. Let’s walk through what happens behind the scenes when a user pushes an image to Quay.io, as shown in Figure 4.

    alt text
    Figure 4: A user pushing an image to Quay.io and Clair indexing the image.
    1. A user pushes a very old Redis image to Quay.io. The Quay.io API receives the request and takes the uploaded layers and manifest.
    2. Once the push is successfully completed, the manifest becomes available to the security worker that runs periodically to find newly pushed images.
    3. The security worker passes the manifest to Clair’s indexer service and requests an IndexReport be created. If Clair has never seen this manifest then the image layers are pulled from Quay.io and indexed, creating an IndexReport. If Clair has already seen this image (e.g., the same manifest was submitted by two different security worker instances) then Clair does nothing.

    Now let’s see what happens when a user goes to the Security Report page on Quay.io for an image (Figure 5).

    alt text
    Figure 5: Viewing the Security Report tab on Quay.io for the recently pushed image.
    1. When the user opens the Security Report page, Quay.io sends the manifest digest for that image to Clair’s matcher service. 
    2. The matcher contacts the indexer service to see if an IndexReport is available. If an IndexReport is not available (for example, the image has just been pushed seconds earlier and the indexer is still working) then the user is told that the scan results are not yet available. Otherwise, the IndexReport is provided back to the matcher and its contents are compared against the lists of known vulnerabilities for each artifact type.
    3. Any identified vulnerabilities are recorded in a VulnReport structure which is then handed back to Quay.io. This forms the basis of the SecurityReport page.
    4. While all of this is happening, the updaters are working asynchronously in the background to periodically pull vulnerability data into the matcher database.

    Making it scale

    Clair was designed so that it can be deployed in numerous configurations. This is because the performance profiles of indexing and matching are quite different. Indexing tends to be very I/O intensive, as it needs to not only scan the full image contents but it also employs considerable amounts of temporary disk space while working. Matching tends to be very memory intensive as it needs to hold IndexReports and vulnerability data from the database for many simultaneous requests. To make sure Clair can scale to meet as many different scenarios as possible, deployments can leverage three different patterns. See Figure 6.

    alt text
    Figure 6: Various deployment methods for Clair.

    For the simplest deployment, indexing, matching, and updating can all be deployed as a single process with a single database instance. If one side of the database load is starting to affect overall performance, the indexer and matcher databases can be split out into separate instances. If there are CPU or memory limitations, it is possible to split the indexer away from the matcher and updater processes as well.

    On Quay.io we are running the indexer as a 40-pod StatefulSet and the matcher as a 10-pod Deployment, each with their own dedicated Postgres instances. This gives us good performance with the ability to increase capacity either at the pod or database level as push traffic on Quay.io increases.

    Measuring performance

    For Quay.io customers, the two most important metrics regarding security are how fast initial results are available after a push and how fast current results appear when the Security Report page is viewed. These equate directly to the time Clair takes to produce an IndexReport and the time to produce a VulnReport.

    Roughly speaking, the Clair behind Quay.io can produce an IndexReport between 3 to 4 seconds. It is impossible to get a precise value, as the time needed to generate an IndexReport is directly related to the size of the image pushed. For tiny (sub GiB) images, Clair can produce the IndexReport in milliseconds however larger images can take substantially longer to scan. Measured in "real time," this means the "queued" indicator on a freshly pushed image typically clears in about a minute or so (based on other factors within Quay.io). See Figure 7.

    alt text
    Figure 7: Graph showing latency times for producing an IndexReport on Quay.io.

    The time needed to compute a VulnReport is more consistent compared to IndexReport generation. Generally speaking we hand these back to Quay.io within a second or less. Interestingly, we observe that most VulnReports are produced either within 500 milliseconds or around 1 second. The exact reason for this is something that we’re continuing to investigate. These latencies allow Quay.io to populate its Security tab within a reasonable response time for someone sitting behind a web browser. See Figure 8.

    alt text
    Figure 8: Graph showing latency times for producing a VulnReport on quay.io.

    Conclusion

    Tracking and reporting on vulnerabilities across a large container registry like Quay.io requires careful architectural choices. Quay.io provides an experience where all container images appear to be continuously scanned no matter how long ago they were pushed. This is something that simply cannot be done cost effectively in a naive manner. A more clever solution is required. Clair balances the tasks of understanding (indexing) and reporting (matching) across two services that work together to efficiently produce security information on any image in Quay.io. With a modular architecture, we can scale up (or down) either side of the problem space as traffic fluctuates on Quay.io. This architecture also helps future-proof us as new vulnerability sources and use cases for Clair are found.

    Related Posts

    • Using Quay.io to find vulnerabilities in your container images

    • Optimizing Quay/Clair: Profiling, performance, and efficiency

    • Optimizing Quay/Clair: Database profiling results

    • How to transition from Docker to Podman

    • Remote container development with VS Code and Podman

    • A beginner's guide to Python containers

    Recent Posts

    • What's New in OpenShift GitOps 1.18

    • Beyond a single cluster with OpenShift Service Mesh 3

    • Kubernetes MCP server: AI-powered cluster management

    • Unlocking the power of OpenShift Service Mesh 3

    • Run DialoGPT-small on OpenShift AI for internal model testing

    What’s up next?

    In this activity, discover how you can go from initial app idea to prototype code in as little as five minutes. You'll scaffold a Quarkus application, build a container image locally using Podman Desktop, and then see how to install, run, and test the application in the Developer Sandbox from the command line.

    Start the activity
    Red Hat Developers logo LinkedIn YouTube Twitter Facebook

    Platforms

    • Red Hat AI
    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Build

    • Developer Sandbox
    • Developer Tools
    • Interactive Tutorials
    • API Catalog

    Quicklinks

    • Learning Resources
    • E-books
    • Cheat Sheets
    • Blog
    • Events
    • Newsletter

    Communicate

    • About us
    • Contact sales
    • Find a partner
    • Report a website issue
    • Site Status Dashboard
    • Report a security problem

    RED HAT DEVELOPER

    Build here. Go anywhere.

    We serve the builders. The problem solvers who create careers with code.

    Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

    Sign me up

    Red Hat legal and privacy links

    • About Red Hat
    • Jobs
    • Events
    • Locations
    • Contact Red Hat
    • Red Hat Blog
    • Inclusion at Red Hat
    • Cool Stuff Store
    • Red Hat Summit
    © 2025 Red Hat

    Red Hat legal and privacy links

    • Privacy statement
    • Terms of use
    • All policies and guidelines
    • Digital accessibility

    Report a website issue