Build and extend containerized applications with Project Thoth

November 25, 2021
Fridolin Pokorny, Harshad Reddy Nalla, Francesco Murdaca
Related topics:
Artificial intelligence, Containers, Data Science, Kubernetes, Python
Related products:
Red Hat OpenShift

    Container technologies have created a de facto industry standard for developing, deploying, and shipping applications. Containers make it possible to provide more maintainable and self-sustaining runnable units that can be directly managed using cluster orchestrators such as Kubernetes and Red Hat OpenShift.

    This article is for developers interested in using intelligent package management to control the quality of container images and provide more robust containerized runtime environments. Our discussion is based on Project Thoth for Python, one of the world's most popular programming languages. The ideas we present can be generalized to other language ecosystems.

    Thoth and Python packaging standards

    One of our previous articles discussed tools that allow installations of Python modules following the packaging standards provided by the Python Packaging Authority (PyPA). We'll continue that focus in this article.

    Note: Anaconda is another packaging solution for Python, but it creates environments that don't conform to PyPA standards, so we won't discuss Anaconda in this article.

    Tools such as pip, Pipenv, and Poetry tend to resolve application stacks to the latest library versions available at resolution time (respecting specified version ranges), taking into account the runtime environment they run in. Project Thoth offers more flexibility, proposing packages that meet the developer's quality, security, and performance criteria.

    Because Python is a language of choice for data scientists, a very common environment for data preprocessing, data analysis, and data exploration is the Jupyter notebook. In a previous article, we described an extension called jupyterlab-requirements that integrates with the tooling we discuss in this article. The extension helps generate reproducible installations inside notebooks and can consume recommendations from Thoth’s recommender system.

    A smarter way to analyze container images and predictable stacks

    As we've mentioned, container technologies create de facto application standards. Anyone can download prepared container images from container image registries, such as Quay.io, and run the application after some minimal setup. One example of a publicly available image is the Jupyter image, which can be used to spawn a Jupyter notebook environment. In such a case, the image is pulled and run in a cluster or locally, depending on the image and the developer’s use case.

    Container images bundle content that is required to run the application. Project Thoth offers container image analyses that introspect what is present in the container image. Notably, it can extract:

    • Information about the operating system
    • Information about RPM packages that are present in the container image
    • Python packages that are present in the container image and their locations, if multiple virtual environments are available
    • Python interpreters and their available versions
    • Information about the application binary interface (ABI) provided by the image
    • Container image metadata as extracted by Skopeo
    • Information about other libraries, such as the CUDA version (GPU software) available

    This information is automatically extracted from container images, ready to be explored by developers as well as consumed by the cloud-based Python resolver, which offers recommendations based on the content available in container images. The container image analysis is run in an OpenShift cluster and the results are computed using the package-extract component.
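
    To get a feel for the kind of facts listed above, the following minimal sketch collects a small subset of them for the interpreter it runs in, using only the Python standard library. It is not the package-extract component and does not inspect RPM packages, the ABI, or CUDA; the function name describe_environment and the shape of the returned dictionary are assumptions made for illustration.

```python
import platform
import sys
from importlib import metadata


def describe_environment() -> dict:
    """Collect a small, local subset of the facts listed above.

    Hypothetical helper: it inspects the running interpreter, not a
    container image, and is unrelated to Thoth's package-extract.
    """
    packages = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
            version = dist.metadata.get("Version")
        except Exception:
            continue  # skip distributions whose metadata cannot be read
        if name and version:
            packages[name.lower()] = version
    return {
        "os": {"system": platform.system(), "release": platform.release()},
        "python": {
            "implementation": platform.python_implementation(),
            "version": platform.python_version(),
            "executable": sys.executable,
        },
        "machine": platform.machine(),
        "python_packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    report = describe_environment()
    print(report["python"]["version"], len(report["python_packages"]), "packages")
```

    Running the script inside a container prints the interpreter version and the number of Python distributions visible to that interpreter, which is a rough local analogue of two of the bullet points above.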

    Container images for data science

    Thoth additionally provides a set of container images that were identified as suitable for Python developers or data scientists:

    • ps-ip is for images suitable for image processing.
    • ps-cv is for images designed for computer vision.
    • ps-nlp is for images dedicated to natural language processing.

    The project makes it easier for developers to create a containerized environment for running applications without needing to fix dependency issues or provide missing content for the environment.

    Building container images with artificial intelligence

    Project Thoth is associated with Red Hat's Artificial Intelligence Center of Excellence (AICoE) and tightly integrates with AICoE's other tools. AICoE-CI is a service that builds container images using Tekton pipelines under the hood. Once a build is done, the resulting container image is sent to Thoth for analysis. If a container image build fails, AICoE-CI automatically reports the failure to the Thoth backend together with build logs capturing information about the failure. Figure 1 shows how the recommender system gathers information about container images built in AICoE-CI.

    Figure 1. How Thoth gathers information from container image builds done in AICoE-CI.

    Thoth uses the combined build information to provide better recommendations for using the container images produced. If developers are running their applications in noncontainer environments, Thoth can offer guidance on software that doesn't have the issues seen in AICoE-CI during container image builds.

    Note: Built container images can be tested using Thoth Dependency Monkey.

    Thoth recommendations for containerized applications

    Open source resolvers, such as pip, Pipenv, and Poetry, resolve Python software packages inside the environments where the resolvers run. The resolution process can be additionally adjusted using environment markers. Thoth’s cloud resolver goes a step further in this area, serving developers who build container images by accounting for runtime environment information even outside the Python packaging standards.
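
    As a rough illustration of what environment markers can express, the sketch below builds the values of several PEP 508 marker variables from the standard library. The function name marker_environment is an assumption for this example; the marker names themselves come from PEP 508.

```python
import os
import platform
import sys


def marker_environment() -> dict:
    """Return values for several PEP 508 environment marker variables."""
    return {
        "os_name": os.name,
        "sys_platform": sys.platform,
        "platform_system": platform.system(),
        "platform_machine": platform.machine(),
        # python_version is "major.minor", python_full_version adds the patch level
        "python_version": ".".join(platform.python_version_tuple()[:2]),
        "python_full_version": platform.python_version(),
        "implementation_name": sys.implementation.name,
    }


if __name__ == "__main__":
    for name, value in marker_environment().items():
        print(f"{name} = {value!r}")
```

    None of these variables describes RPM packages, CUDA, or other content of a container image, which is the kind of runtime information outside the packaging standards that the paragraph above refers to.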

    The resolver considers the results of container image analyses listed earlier, along with available hardware, to guide the resolution process and come up with the best configuration for a given application. Figure 2 shows how the recommender system (the Thoth resolver implemented in a component called adviser) uses the gathered information.

    Figure 2. How the Thoth recommender system uses the gathered information.

    If no container image is used, Thoth’s resolver falls back to the standard resolution process compatible with the Python packaging standards. In both cases, Thoth’s resolution process additionally offers developers guidance about the software stack in use, such as by adjusting environment variables to make sure the environment is correctly set up.

    The recommendation engine uses centralized knowledge about Python software packages as well as software and hardware environments. This knowledge guides the resolution process to satisfy the application's needs. Together with Thoth prescriptions, the container image analyses and post-processed container image build logs provide valuable guidance on all the building blocks of a containerized application (Figure 3).

    Figure 3. Thoth's guidance covers all the building blocks of a containerized application.

    Use cases for Thoth's cloud resolver and prescriptions

    One problem fixed by Thoth’s cloud resolver was an issue reported in the flask-openid package. This package was no longer installable into environments with a recent Setuptools release that dropped 2to3 support. To avoid trying to install flask-openid into environments that have such a Setuptools release, Thoth provides a prescription that checks which Setuptools version is shipped in the container image in use. The cloud resolver automatically avoids resolving flask-openid versions that would cause installation failures and looks for another resolution path.
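
    The logic of such a version-based rule can be sketched in a few lines of Python. This is not how Thoth prescriptions are written or evaluated; it only illustrates the idea of skipping candidates that cannot build in a given image. The cutoff at Setuptools major version 58, the package name example-pkg, the requires_2to3 flag, and all function names are assumptions made for this example.

```python
from typing import Dict, List, Optional

# Assumption for illustration: Setuptools releases from major version 58
# onward no longer support 2to3.
FIRST_SETUPTOOLS_MAJOR_WITHOUT_2TO3 = 58


def major(version: str) -> int:
    """Return the leading numeric component of a dotted version string."""
    return int(version.split(".")[0])


def image_lacks_2to3(image_packages: Dict[str, str]) -> bool:
    """True when the image ships a Setuptools release without 2to3 support."""
    setuptools = image_packages.get("setuptools")
    return (
        setuptools is not None
        and major(setuptools) >= FIRST_SETUPTOOLS_MAJOR_WITHOUT_2TO3
    )


def pick_candidate(
    candidates: List[dict], image_packages: Dict[str, str]
) -> Optional[dict]:
    """Return the first candidate that can be installed in the image.

    Candidates are assumed to be ordered by preference.
    """
    blocked = image_lacks_2to3(image_packages)
    for candidate in candidates:
        if candidate["requires_2to3"] and blocked:
            continue  # installation would fail in this image, try the next one
        return candidate
    return None


if __name__ == "__main__":
    candidates = [
        {"name": "example-pkg", "version": "1.0", "requires_2to3": True},
        {"name": "example-pkg", "version": "0.9", "requires_2to3": False},
    ]
    print(pick_candidate(candidates, {"setuptools": "58.0.4"}))
```

    With the image contents shown in the usage example, the first candidate is skipped and the second one is returned, which mirrors the "another resolution path" behavior described above.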

    Another Thoth prescription declares a requirement for the Git RPM package to be present in the container image in order for the GitPython package to operate. If the base container image used to build the application does not offer Git, the resolver again tries to find another resolution path so that the resulting container image will work.
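
    A rule that ties a Python package to operating system packages can be sketched the same way. The mapping RPM_REQUIREMENTS and the helper names below are hypothetical and do not reflect the prescription format; only the GitPython and Git relationship comes from the text above.

```python
from typing import Dict, List, Set

# Hypothetical mapping from a Python package to the RPM packages it needs
# at run time. Only the GitPython entry is taken from the article.
RPM_REQUIREMENTS: Dict[str, Set[str]] = {"gitpython": {"git"}}


def missing_rpms(python_package: str, image_rpms: Set[str]) -> Set[str]:
    """Return the RPM packages a Python package needs but the image lacks."""
    required = RPM_REQUIREMENTS.get(python_package.lower(), set())
    return required - image_rpms


def usable_packages(wanted: List[str], image_rpms: Set[str]) -> List[str]:
    """Keep only the Python packages whose RPM requirements are satisfied."""
    return [name for name in wanted if not missing_rpms(name, image_rpms)]


if __name__ == "__main__":
    print(missing_rpms("GitPython", {"bash"}))
    print(usable_packages(["GitPython"], {"bash", "git"}))
```

    A resolver applying such a rule would drop GitPython from consideration for a base image without Git and continue with the remaining resolution paths.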

    Another use case is for developers or data scientists using opencv-python or PyTorch in their environment. In that case, Thoth recommends using a pre-built container image with a computer vision stack built from the ps-cv repository.

    Resolving to multiple container images

    With the widespread adoption of containers, applications can be split across multiple container images. Each image forms a separate unit, and the units communicate with each other via a specified protocol. To make sure a resolution process can target multiple container images at the same time, the resolver offers labeled requests to the resolution engine. The resolution still takes place for each container image individually, but it keeps a shared context. Within this context, labels specify how the resolution process should operate so that the resolution to multiple containers is done properly and meets the desired criteria (for example, ensuring the proper operation of a communication layer made up of multiple packages that form an application dependency subgraph).

    Extending already available container images

    Yet another specific use case is extending prebuilt container images. An example is a TensorFlow container image used for model training. If a developer wants to extend the container image, let’s say by installing TensorBoard to visualize the trained model, the developer can ask Thoth for an advisory. If the base container image is supplied, Thoth can adjust the resolution process based on already existing Python packages that are available, and pick the most appropriate TensorBoard package that will work inside the container image.
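
    The idea of adjusting resolution to packages that already exist in a base image can be illustrated with a simplified sketch. It is not Thoth's adviser: the package name base-lib, the candidate metadata, and the version ranges are made up for this example, and versions are compared as plain dotted integers rather than with full Python version semantics.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    """Turn a dotted numeric version such as "2.6.0" into (2, 6, 0)."""
    return tuple(int(part) for part in version.split("."))


def fits(candidate: dict, installed: Dict[str, str]) -> bool:
    """Check a candidate's version ranges against packages in the image.

    Each range is (inclusive lower bound, exclusive upper bound).
    Dependencies that are not installed yet impose no constraint here.
    """
    for dep, (low, high) in candidate["requires"].items():
        present = installed.get(dep)
        if present is None:
            continue
        if not (parse(low) <= parse(present) < parse(high)):
            return False
    return True


def pick_extension(candidates: List[dict], installed: Dict[str, str]) -> Optional[str]:
    """Return the newest candidate version compatible with the image."""
    compatible = [c for c in candidates if fits(c, installed)]
    if not compatible:
        return None
    return max(compatible, key=lambda c: parse(c["version"]))["version"]


if __name__ == "__main__":
    # Made-up metadata for a hypothetical add-on package.
    candidates = [
        {"version": "2.7.0", "requires": {"base-lib": ("2.7", "2.8")}},
        {"version": "2.6.0", "requires": {"base-lib": ("2.6", "2.7")}},
    ]
    print(pick_extension(candidates, {"base-lib": "2.6.0"}))
```

    A resolver that ignores the base image would pick the newest candidate; taking the installed packages into account selects the release that matches what the image already ships.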

    Feel free to browse the open source database available at our prescriptions repository to find more recommendations for open source Python software packages, including some recommendations not solely dedicated to container images.

    Helping the Python community create healthy applications

    As part of Project Thoth, we are accumulating knowledge about Python packages to help Python developers create healthy and secure applications. We suggest you analyze some of your container images using Thoth. You can submit an analysis request to Thoth’s endpoints, which will analyze your container image. See an example container image analysis result for the quay.io/thoth-station/ps-cv-pytorch:v0.1.2 container image. (Note that the file size is 7.4 MB.)

    To follow updates in the project, please subscribe to the Thoth Station YouTube channel or follow us on Twitter at @ThothStation.

    Last updated: September 20, 2023
