
Continuous learning in Project Thoth using Kafka and Argo

April 26, 2021
Kevin Postlethwait
Related topics:
Event-driven, Python
Related products:
Red Hat Enterprise Linux

    Project Thoth provides Python programmers with information about support for packages they use, dependencies, performance, and security. Right now it focuses on pre-built binary packages hosted on the Python Package Index (PyPI) and other Python indexes. Thoth gathers metrics such as the following:

    • Solvers indicate whether a package can be installed on a particular runtime environment, such as Red Hat Enterprise Linux 8 running Python 3.6.
    • Security indicators turn up vulnerabilities and provide security advice by optimizing a software stack to minimize our computed security vulnerability score.
    • Project meta-information investigates project maintenance status and development process behavior that affects the overall project.
    • Amun and Dependency Monkey look for code quality issues or performance problems across packages.

    Thoth's main role is to advise programmers about software stacks based on the requirements they specify. The thoth-adviser component then produces a locked software stack.
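A locked software stack pins every package, direct and transitive, to an exact version with content hashes, so that an installation is reproducible. A Pipfile.lock-style fragment gives the idea (package names, versions, and hashes here are illustrative; the exact output format depends on the integration used):

```json
{
  "default": {
    "requests": {
      "version": "==2.25.1",
      "hashes": ["sha256:..."]
    },
    "urllib3": {
      "version": "==1.26.4",
      "hashes": ["sha256:..."]
    }
  }
}
```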

    This article shows the tools and workflows that let Thoth intelligently respond to programmer requests when it can't find the relevant packages or related information.

    How Thoth updates its knowledge of packages

    In an ideal world, Thoth would have absolute knowledge of all versions of all Python packages. In reality, users often request advice for a version or package that Thoth has not yet seen. Figure 1 shows the number of new versions released daily: PyPI alone gains 500 to 2,000 new package versions per day, making it unlikely that Thoth's knowledge will ever be complete.

    Figure 1: Python package version releases published to PyPI per day from Oct. 27 to Nov. 2, 2020.

    Thoth is designed to learn from its failures to find packages. When programmers request packages that Thoth doesn't know about, it schedules solvers to add them. The next section describes how Thoth uses messages and investigators to implement continuous learning, adding knowledge of new packages and versions to its database.

    Events and messages for missing packages

    Using a messaging/event platform, Thoth generates an event for each failure to find a package. These events are sent to Kafka, a highly scalable messaging platform maintained by the Apache Software Foundation. From there, a consumer picks them up and schedules workflows in Argo, a container-native workflow engine, to discover the missing package.

    thoth-messaging acts as a layer over the Confluent Kafka (confluent-kafka-python) package to define Thoth-specific messages and simplify creating producers and consumers. Confluent's backing gives confidence in the package's long-term availability. confluent-kafka-python, in turn, wraps librdkafka, a widely used C client library.
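The shape of a Thoth-specific message can be pictured roughly as follows. The class, field, and topic names here are illustrative, not the library's actual API; the real thoth-messaging builds on confluent-kafka-python to publish payloads like this to a Kafka topic:

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical envelope for a "package not found" event. The real
# thoth-messaging schemas differ; this only illustrates how
# Thoth-specific messages map to Kafka records.
@dataclass
class UnresolvedPackageEvent:
    package_name: str
    package_version: Optional[str]  # None when no version was resolvable
    index_url: str
    topic: str = "thoth.unresolved-package"  # illustrative topic name

    def to_kafka_value(self) -> bytes:
        """Serialize to the bytes a Kafka producer would send."""
        return json.dumps(asdict(self)).encode("utf-8")

event = UnresolvedPackageEvent(
    package_name="tensorflow",
    package_version="2.4.0",
    index_url="https://pypi.org/simple",
)
payload = event.to_kafka_value()
# A confluent-kafka producer would then publish it with something like:
#   producer.produce(event.topic, value=payload)
```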

    Investigators and workflows

    The core of continuous learning in Thoth is thoth-investigator, a Kafka message consumer that handles all message subscriptions sent through Confluent Kafka by the thoth-messaging library. The logic behind each consumer can be as simple as a remote function call to schedule a workflow; it can also involve more complex logic that transforms message contents or opens issues and pull requests on different Git services.

    By deploying thoth-investigator in a single namespace, Thoth relies on one component that has access to the other namespaces. This reduces the number of role bindings that would otherwise be needed to give each component access to each namespace.
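The investigator's core can be sketched as a dispatch table from message type to handler. All names here are illustrative stand-ins; the real thoth-investigator registers a handler per thoth-messaging message class, and those handlers schedule Argo workflows or open pull requests rather than returning strings:

```python
# Illustrative handlers: each records what it would schedule.
def schedule_solver(msg: dict) -> str:
    return f"schedule solver for {msg['package_name']}"

def schedule_security_indicators(msg: dict) -> str:
    return f"schedule SI workflow for {msg['package_name']}"

# Dispatch table: message type name -> handler.
HANDLERS = {
    "UnresolvedPackageMessage": schedule_solver,
    "SolvedPackageMessage": schedule_security_indicators,
}

def consume(message_type: str, payload: dict) -> str:
    """Route one consumed Kafka message to its registered handler."""
    handler = HANDLERS.get(message_type)
    if handler is None:
        raise ValueError(f"no handler registered for {message_type}")
    return handler(payload)

action = consume("UnresolvedPackageMessage", {"package_name": "pandas"})
```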

    Continuous learning

    This section describes two common failures that cause Thoth's indicators to look for new information.

    An adviser fails because it lacks the knowledge needed to provide advice

    When a user requests advice, an adviser workflow is triggered depending on the integration used to interact with Thoth (see Thoth integrations). In this example, we'll use Kebechet, the GitHub app integration. When the workflow ends, Thoth provides advice to the programmer in a form specific to the integration: in this case, a check run shown on a GitHub pull request.

    When Thoth fails because knowledge is missing, the logs indicate which package is missing. Using the workflow shown in Figure 2, Thoth discovers the missing information and generates the advice to return to the programmer.

    Figure 2. The workflow when an adviser has to discover missing information.

    A simplified view of the workflow follows.

    1. The adviser workflow sends an UnresolvedPackageMessage message to thoth-investigator.
    2. thoth-investigator consumes the event messages and schedules solvers to learn the missing information.
    3. During the solver workflow, the investigator receives a SolvedPackageMessage message indicating that it should schedule the next workflows (e.g., security indicators).
    4. The solver workflow sends AdviserReRunMessages, which contain the information the investigator needs to reschedule the advice that failed.
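The loop above can be simulated end to end in a few lines. The in-memory set stands in for Thoth's database, and the function calls stand in for the Kafka messages and Argo workflows (all names are illustrative):

```python
# Toy knowledge base standing in for Thoth's database.
known_packages = {("flask", "2.0.1")}

def advise(package: str, version: str):
    """Return advice, or None when knowledge is missing (step 1)."""
    if (package, version) not in known_packages:
        return None
    return f"locked stack including {package}=={version}"

def solver(package: str, version: str) -> None:
    """Steps 2-3: a solver workflow learns about the missing package."""
    known_packages.add((package, version))

# The first attempt fails for an unknown version ...
assert advise("flask", "2.1.0") is None
# ... so the investigator schedules a solver (UnresolvedPackageMessage),
solver("flask", "2.1.0")
# ... and the rescheduled adviser run succeeds (step 4).
advice = advise("flask", "2.1.0")
```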

    Thoth's security indicator workflow fails because a package or source distribution is missing

    Thoth generates alerts if it has not performed security indicator (SI) analysis or if a new package becomes available. The investigator consumes these messages and starts new SI workflows. When a package's source code is available to Thoth, the system runs the SIs and stores the generated data. However, sometimes PyPI has only binary package releases available. Without a source distribution, Thoth cannot do static code analysis.

    In such cases, the system sends a message back to the investigator, which sets a flag in the database to indicate that security information is missing. Thoth stores these errors so that workflows fail only once.

    Similarly, the investigator updates the corresponding flag in Thoth's database after receiving a MissingVersionMessage message indicating that a package version has gone missing. Thoth will no longer use this package version when it gives advice.
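The fail-only-once behavior amounts to checking a flag before scheduling another workflow. The flag values and in-memory storage below are illustrative; Thoth keeps these flags in its database:

```python
# Illustrative per-package flags standing in for database columns.
si_flags = {}  # (package, version) -> "missing-source"

def maybe_schedule_si(package: str, version: str) -> str:
    """Schedule an SI workflow unless a prior run flagged the package."""
    if si_flags.get((package, version)) == "missing-source":
        return "skipped: no source distribution available"
    return f"scheduled SI workflow for {package}=={version}"

def on_missing_source_message(package: str, version: str) -> None:
    """Handler for the 'source distribution missing' message: set the flag."""
    si_flags[(package, version)] = "missing-source"

first = maybe_schedule_si("somepkg", "1.0")   # runs, then fails in the workflow
on_missing_source_message("somepkg", "1.0")   # the failure is recorded
second = maybe_schedule_si("somepkg", "1.0")  # not retried
```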

    Figure 3 shows the workflow for missing security information.

    Figure 3. The workflow to handle missing security information.

    Conclusion

    With a constantly evolving supply of information, providing guarantees to users is difficult. Thoth aggregates information as needed through event-driven learning by using event streams (in Kafka) to trigger complex container workflows (in Argo). Both technologies are highly extensible, so new features are easy to add.

    Last updated: August 11, 2023
