Continuous learning in Project Thoth using Kafka and Argo

April 26, 2021
Kevin Postlethwait
Related topics: Event-Driven, Python
Related products: Red Hat Enterprise Linux

    Project Thoth provides Python programmers with information about support for packages they use, dependencies, performance, and security. Right now it focuses on pre-built binary packages hosted on the Python Package Index (PyPI) and other Python indexes. Thoth gathers metrics such as the following:

    • Solvers indicate whether a package can be installed on a particular runtime environment, such as Red Hat Enterprise Linux 8 running Python 3.6.
    • Security indicators turn up vulnerabilities and provide security advice by optimizing a software stack to minimize our computed security vulnerability score.
    • Project meta-information investigates project maintenance status and development process behavior that affects the overall project.
    • Amun and Dependency Monkey look for code quality issues or performance problems across packages.

    Thoth's main role is to advise programmers about different software stacks based on requirements specified by the programmer. The component thoth-adviser then produces a locked software stack.

    This article shows the tools and workflows that let Thoth intelligently respond to programmer requests when it can't find the relevant packages or related information.

    How Thoth updates its knowledge of packages

    In an ideal world, Thoth would have absolute knowledge of all versions of all Python packages. But in reality, users often request advice for a version or package that Thoth has not seen. Figure 1 shows the number of new versions released daily. PyPI alone grows by 500 to 2,000 packages per day; this makes it unlikely that Thoth will have perfect knowledge.

    Figure 1. Python package version releases published to PyPI per day from Oct. 27 to Nov. 2, 2020.

    Thoth is trained to learn from its failures to find packages. When programmers request packages that Thoth doesn't know about, it schedules solvers to add them. The next section describes how Thoth uses messages and investigators to implement continuous learning, adding knowledge of new packages and versions to its database.

    Events and messages for missing packages

    Thoth generates an event for each failure to find a package. These events are sent to Kafka, a highly scalable messaging platform maintained by the Apache Software Foundation. From there, a consumer picks them up and uses Argo, a container-native workflow engine for Kubernetes, to run workflows that try to discover the missing package.

    thoth-messaging is a layer over the Confluent Kafka client for Python (confluent-kafka-python). It defines Thoth-specific messages and simplifies the creation of producers and consumers. Because Confluent supports the client, its long-term availability is reasonably assured. The client, in turn, wraps librdkafka, a widely used C client library for Kafka.
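    As a rough illustration of what such a messaging layer does, the sketch below defines a typed message and converts it to and from the bytes that a Kafka producer would send. The class name, field names, and topic name are hypothetical and are not taken from thoth-messaging; only the standard library is used so the example runs on its own.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical topic name, chosen for illustration only.
TOPIC = "example.unresolved-package"


@dataclass(frozen=True)
class UnresolvedPackage:
    """Hypothetical payload describing a package Thoth could not find."""

    package_name: str
    package_version: Optional[str] = None
    index_url: Optional[str] = None


def encode(message: UnresolvedPackage) -> bytes:
    """Serialize a message to UTF-8 JSON bytes, ready to hand to a producer."""
    return json.dumps(asdict(message), sort_keys=True).encode("utf-8")


def decode(raw: bytes) -> UnresolvedPackage:
    """Rebuild the typed message on the consumer side."""
    return UnresolvedPackage(**json.loads(raw.decode("utf-8")))


message = UnresolvedPackage("numpy", "1.20.0", "https://pypi.org/simple")
payload = encode(message)
# With confluent-kafka-python installed, the bytes could be passed to
# confluent_kafka.Producer.produce(TOPIC, value=payload).
```

    Keeping serialization in one place means producers and consumers agree on the message shape without each component handling raw JSON.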

    Investigators and workflows

    The core of continuous learning in Thoth is thoth-investigator, a Kafka message consumer that handles all message subscriptions sent through Confluent Kafka by the thoth-messaging library. The logic behind each consumer can be as simple as a remote function call to schedule a workflow; it can also involve more complex logic that transforms message contents or opens issues and pull requests on different Git services.
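    The "one handler per message type" idea can be sketched as a dispatch table. This is a minimal illustration, not thoth-investigator's actual code: the handler functions, the `scheduled` list standing in for a workflow scheduler, and the payload keys are all assumptions.

```python
from typing import Callable, Dict, List

# Stand-in for a workflow scheduler. A real deployment would submit Argo
# workflows here; this list and the handlers below are hypothetical.
scheduled: List[str] = []


def schedule_solver(contents: dict) -> None:
    """Simple handler: one 'remote call' that schedules a solver workflow."""
    scheduled.append(
        f"solver:{contents['package_name']}=={contents['package_version']}"
    )


def schedule_security_indicators(contents: dict) -> None:
    """Handler for a solved package: schedule the follow-up analysis."""
    scheduled.append(
        f"si:{contents['package_name']}=={contents['package_version']}"
    )


# Map each message type name to the function that handles it.
HANDLERS: Dict[str, Callable[[dict], None]] = {
    "UnresolvedPackageMessage": schedule_solver,
    "SolvedPackageMessage": schedule_security_indicators,
}


def dispatch(message_type: str, contents: dict) -> bool:
    """Route a consumed message to its handler; return False if unhandled."""
    handler = HANDLERS.get(message_type)
    if handler is None:
        return False
    handler(contents)
    return True
```

    Adding support for a new message type then amounts to writing one handler and registering it in the table.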

    By deploying thoth-investigator in one namespace, Thoth can rely on a single component that has access to the other namespaces. This reduces the need for role bindings that would otherwise be required for different components to access different namespaces.

    Continuous learning

    This section describes two common failures that prompt Thoth's investigator to look for new information.

    An adviser fails because it lacks the knowledge needed to provide advice

    When a user requests advice, an adviser workflow is triggered depending on the integration used to interact with Thoth (see Thoth integrations). In this example, we'll use Kebechet, the GitHub app integration. When the workflow ends, Thoth provides advice to the programmer in the form specific to the integration: in this case, a check run shown in a GitHub pull request such as this example.

    When Thoth fails because knowledge is missing, the logs indicate which package is missing. Using the workflow shown in Figure 2, Thoth discovers the missing information and generates the advice to return to the programmer.

    Figure 2. The workflow when an adviser has to discover missing information.

    A simplified view of the workflow follows.

    1. The adviser workflow sends an UnresolvedPackageMessage message to thoth-investigator.
    2. thoth-investigator consumes the event messages and schedules solvers to learn the missing information.
    3. During the solver workflow, the investigator receives a SolvedPackageMessage message indicating that it should schedule the next workflows (for example, security indicators).
    4. The solver workflow sends an AdviserReRunMessage message, which contains the information the investigator needs to reschedule the advice that failed.
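    The steps above can be simulated with an in-memory queue standing in for Kafka. This is only a sketch of the message flow: the function name, the payload fields, and the log strings are hypothetical, and no real workflows are scheduled.

```python
from collections import deque
from typing import Deque, List, Tuple


def run_learning_loop(package_name: str, package_version: str) -> List[str]:
    """Replay the message flow for one missing package and log each action."""
    body = {"package_name": package_name, "package_version": package_version}
    # Step 1: the adviser workflow reports the package it could not resolve.
    queue: Deque[Tuple[str, dict]] = deque([("UnresolvedPackageMessage", body)])
    actions: List[str] = []

    while queue:
        kind, contents = queue.popleft()
        if kind == "UnresolvedPackageMessage":
            # Step 2: the investigator schedules a solver for the package.
            actions.append("schedule solver")
            # Steps 3 and 4: the solver workflow emits follow-up messages.
            queue.append(("SolvedPackageMessage", contents))
            queue.append(("AdviserReRunMessage", contents))
        elif kind == "SolvedPackageMessage":
            actions.append("schedule security indicators")
        elif kind == "AdviserReRunMessage":
            actions.append("reschedule adviser")
    return actions


print(run_learning_loop("numpy", "1.20.0"))
# ['schedule solver', 'schedule security indicators', 'reschedule adviser']
```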

    Thoth's security indicator workflow fails because a package or source distribution is missing

    Thoth generates alerts if it has not performed security indicator (SI) analysis or if a new package becomes available. The investigator consumes these messages and starts new SI workflows. When a package's source code is available to Thoth, the system runs the SIs and stores the generated data. However, sometimes PyPI has only binary package releases available. Without a source distribution, Thoth cannot do static code analysis.

    In such cases, the system sends a message back to the investigator, which sets a flag in the database to indicate that security information is missing. Thoth stores these errors so that workflows fail only once.

    Similarly, the investigator updates the corresponding flag in Thoth's database after receiving a MissingVersionMessage message indicating that a package version has gone missing. Thoth will no longer use this package version when it gives advice.
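    The flag handling described here can be sketched with an in-memory store. The class, flag, and method names below are hypothetical and do not reflect Thoth's actual database schema; the point is only that a recorded failure prevents the same workflow from being scheduled and failing again, and that a missing version is excluded from advice.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class PackageVersionRecord:
    """Hypothetical per-version flags kept in the knowledge base."""

    si_error: bool = False  # security indicators could not run (no source)
    missing: bool = False   # the version disappeared from the index


class KnowledgeBase:
    """In-memory stand-in for the database the investigator updates."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], PackageVersionRecord] = {}

    def _record(self, name: str, version: str) -> PackageVersionRecord:
        return self._records.setdefault((name, version), PackageVersionRecord())

    def mark_si_error(self, name: str, version: str) -> None:
        """Remember that security analysis failed for this version."""
        self._record(name, version).si_error = True

    def mark_missing(self, name: str, version: str) -> None:
        """Remember that this version is no longer available."""
        self._record(name, version).missing = True

    def needs_si_workflow(self, name: str, version: str) -> bool:
        """Schedule SI only if it has not already failed for this version."""
        record = self._record(name, version)
        return not record.si_error and not record.missing

    def usable_for_advice(self, name: str, version: str) -> bool:
        """A missing version is excluded from advice."""
        return not self._record(name, version).missing
```

    With this shape, the investigator would check `needs_si_workflow` before scheduling, so a binary-only release triggers at most one failed security indicator run.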

    Figure 3 shows the workflow for missing security information.

    Figure 3. The workflow to handle missing security information.

    Conclusion

    With a constantly evolving supply of information, providing guarantees to users is difficult. Thoth aggregates information as needed through event-driven learning by using event streams (in Kafka) to trigger complex container workflows (in Argo). Both technologies are highly extensible, so new features are easy to add.

    Last updated: August 11, 2023
