Prevent Python dependency confusion attacks with Thoth

December 21, 2021
Fridolin Pokorny
Related topics: Containers, Data Science, Linux, Kubernetes, Python, Security
Related products: Red Hat OpenShift, Red Hat Enterprise Linux

    Python became popular as a casual scripting language but has since expanded into the corporate space, where it is used for data science and machine learning applications, among others. Because Python is a high-level programming language, developers often use it to quickly prototype applications. Python native extensions make it easy to optimize any computation-intensive parts of the application using a lower-level programming language like C or C++.

    For applications that need to scale, we can use Python Source-to-Image tooling (S2I) to convert a Python application into a container image. That image can then be orchestrated and scaled using cluster orchestrators such as Kubernetes or Red Hat OpenShift. All of these features together provide a convenient platform for solving problems using Python-based solutions that scale, are maintainable, and are easily extensible.

    The main source of open source Python packages is the Python Package Index (PyPI), a community-based project. As of this writing, PyPI hosts more than 3 million releases, and the number of releases available continues to grow exponentially. PyPI's growth is an indicator of Python's popularity worldwide.

    However, Python's community-driven dependency resolvers were not designed for corporate environments, and that has led to dependency management issues and vulnerabilities in the Python ecosystem. This article describes some of the risks involved in resolving Python dependencies and introduces Project Thoth's tools for avoiding them.

    Dependency management in Python

    The Python package installer, pip, is a popular tool for resolving Python application dependencies. Unfortunately, pip does not provide a way to manage lock files for application dependencies. Pip resolves dependencies to the latest versions that satisfy the requirements at the given point in time, so the result depends heavily on when the resolution process was triggered. Dependency problems such as underpinning (allowing too wide a range of versions) frequently introduce issues to the Python application stack.
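    To make the time dependence concrete, here is a minimal sketch of how a loosely specified requirement such as foo>=1.0.0 resolves. It assumes a hypothetical package foo with made-up upload dates and plain dotted version numbers, and it uses a simplified "newest release published so far" rule rather than pip's full PEP 440 logic.

```python
from datetime import date

# Hypothetical release history of a package "foo": (version, upload date).
RELEASES = [
    ("1.0.0", date(2021, 1, 10)),
    ("1.1.0", date(2021, 6, 2)),
    ("2.0.0", date(2021, 11, 30)),  # imagine this release breaks the API
]


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as "1.10.0" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def resolve_latest(minimum: str, as_of: date) -> str:
    """Pick the newest release >= minimum that was already published on `as_of`.

    This mimics an unpinned requirement such as ``foo>=1.0.0``: the answer
    depends on which releases exist at the moment resolution runs.
    """
    candidates = [
        version
        for version, uploaded in RELEASES
        if uploaded <= as_of and parse_version(version) >= parse_version(minimum)
    ]
    if not candidates:
        raise LookupError(f"no release of foo >= {minimum} available on {as_of}")
    return max(candidates, key=parse_version)


# The same requirement resolves differently depending on the day it is run.
print(resolve_latest("1.0.0", date(2021, 7, 1)))   # 1.1.0
print(resolve_latest("1.0.0", date(2021, 12, 15)))  # 2.0.0
```

    A lock file records the concrete outcome (for example, foo==1.1.0) so that later installations reproduce it instead of repeating this time-dependent step.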

    To address lock file management issues, the Python community developed tools such as pip-tools, Pipenv, and Poetry. (Our article introducing micropipenv includes an overview of these projects.)

    The Python Package Index is the primary index consulted by pip. In some cases, applications need libraries from other Python package indexes. For these, pip provides the --index-url and --extra-index-url options. There are two main reasons you might need to install dependencies from Python package sources other than PyPI:

    • Installing specific builds of packages whose features cannot be expressed using wheel tags, or that do not meet manylinux standards; e.g., the AVX2-enabled builds of TensorFlow hosted on the Python package index of the Artificial Intelligence Center of Excellence (AICoE).
    • Installing packages that should not be hosted on PyPI, such as packages specific to one company or patched versions of libraries used only for testing.

    Why Python is vulnerable to dependency confusion attacks

    The pip options --index-url and --extra-index-url provide a way to specify alternate Python package indexes for resolving and installing Python packages. The first option, --index-url, specifies the main Python package index for resolving Python packages, and defaults to PyPI. When you need additional package indexes, you can include the --extra-index-url option as many times as needed. However, pip does not give the main index priority over the extra ones: it collects candidate releases from all configured indexes and installs the best match for the requirement, which is usually the highest matching version.

    Thus, although you can point pip at several indexes, you cannot specify which index a given package should come from. Moreover, the index configuration applies to transitive dependencies introduced by direct dependencies as well.

    To restrict which artifacts get installed, application developers can manage requirements with hashes, which pip checks during installation to make sure that only the expected release artifacts are accepted. This solution is unintuitive and error-prone, however. Although we encourage keeping hashes in lock files for integrity checks, they should be managed automatically using the appropriate tools.
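    The idea behind hash checking fits in a few lines: the lock file records a digest for each expected artifact, and the installer rejects anything whose digest differs. This is a simplified model rather than pip's implementation, and the artifact bytes, file name, and LOCKED_HASHES mapping are hypothetical. In pip itself, the digests are written in the requirements file as --hash=sha256:<digest> options on each pinned requirement.

```python
import hashlib

# Hypothetical artifact contents standing in for downloaded wheel files.
TRUSTED_WHEEL = b"contents of foo-1.1.0 built by our company"
IMPOSTOR_WHEEL = b"contents of foo-1.1.0 uploaded to a public index by someone else"

# A lock file maps each pinned artifact to the SHA-256 digests recorded at lock time.
LOCKED_HASHES = {
    "foo-1.1.0-py3-none-any.whl": {hashlib.sha256(TRUSTED_WHEEL).hexdigest()},
}


def verify_artifact(filename: str, data: bytes) -> None:
    """Raise ValueError if the downloaded bytes do not match a digest pinned for `filename`."""
    allowed = LOCKED_HASHES.get(filename)
    if allowed is None:
        raise ValueError(f"{filename} is not listed in the lock file")
    digest = hashlib.sha256(data).hexdigest()
    if digest not in allowed:
        raise ValueError(f"hash mismatch for {filename}: got sha256:{digest}")


verify_artifact("foo-1.1.0-py3-none-any.whl", TRUSTED_WHEEL)  # passes silently
try:
    verify_artifact("foo-1.1.0-py3-none-any.whl", IMPOSTOR_WHEEL)
except ValueError as err:
    print("rejected:", err)
```

    Because the digest identifies the exact bytes, a same-named release served by another index is rejected even when its version number matches.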

    Now, let’s imagine a dependency named foo that a company uses from a private package index. Suppose a different package with the same name is hosted on PyPI. An unexpected glitch, such as a temporary network issue when querying the company's private package index, could lead the application to install the foo package from PyPI in default setups. In the worst case, the package published on PyPI might be a malicious alternative that reveals company secrets to an attacker.
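    The scenario can be modeled in a few lines. This is a deliberately simplified sketch, not pip's actual resolver: the private index URL, the package contents, and the reachable set (standing in for a network glitch) are hypothetical, versions are plain dotted numbers, and the model simply pools candidates from every index that responds and takes the highest version.

```python
# Hypothetical private index; https://pypi.org/simple is PyPI's simple index.
PRIVATE = "https://pypi.corp.example.com/simple"
PUBLIC = "https://pypi.org/simple"

# Hypothetical index contents: index URL -> {package name: [versions]}.
INDEXES = {
    PRIVATE: {"foo": ["1.2.0"]},
    PUBLIC: {"foo": ["0.0.1"]},  # unrelated package that happens to share the name
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def resolve(package: str, indexes: dict, reachable: set) -> tuple[str, str]:
    """Return (version, index) of the highest version found on the reachable indexes."""
    candidates = [
        (version, index)
        for index, packages in indexes.items()
        if index in reachable
        for version in packages.get(package, [])
    ]
    if not candidates:
        raise LookupError(f"{package} not found on any reachable index")
    return max(candidates, key=lambda candidate: parse_version(candidate[0]))


print(resolve("foo", INDEXES, {PRIVATE, PUBLIC}))  # the private 1.2.0 build
print(resolve("foo", INDEXES, {PUBLIC}))           # private index down: the public 0.0.1
```

    Nothing in the request changes between the two calls; only the availability of the private index does, and that alone decides which foo gets installed.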

    This issue also applies to pip-tools, Pipenv, and Poetry. Pipenv provides a way to configure a Python package index for a specific package, but it does not enforce the specified configuration. All the mentioned dependency resolution tools treat multiple supplied Python package indexes as mirrors, that is, as interchangeable sources for any package.

    Using Thoth to resolve dependency confusion

    Thoth is a project sponsored by Red Hat that takes a fresh look at the complex needs of Python applications and moves the resolution process to the cloud. Naturally, being cloud-based has its advantages and disadvantages depending on how the tool is used.

    Because Thoth moves dependency resolution to the cloud, a central authority can resolve application requirements. This central authority can be configured with fine-grained control over which application dependencies go into desired environments. For instance, you could handle dependencies in test environments and production environments differently.

    Thoth's resolver pre-aggregates information about Python packages from various Python package indexes. This way, the resolver can monitor Python packages published on PyPI, on the AICoE-specific TensorFlow index, on a corporate Pulp Python index, on the PyTorch CUDA 11.1 index, and on builds for CPU use, which the PyTorch community provides for specific cases. Moreover, the cloud-based resolver inspects the published packages for security vulnerabilities (see PyPA’s Python Packaging Advisory Database) to further guide a secure resolution process.
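    A rough sketch of the pre-aggregation idea: merge release listings from several indexes into one table and annotate each release with any advisories whose affected range contains it. Everything here is hypothetical, including the index URLs, the advisory identifier, and the simplified introduced/fixed bounds; it does not reflect how Thoth stores this data internally.

```python
from collections import defaultdict

# Hypothetical release listings fetched from several indexes.
LISTINGS = {
    "https://pypi.org/simple": {"foo": ["1.0.0", "1.1.0", "1.2.0"]},
    "https://pulp.corp.example.com/simple": {"foo": ["1.1.0"], "bar": ["0.3.0"]},
}

# Hypothetical advisory: versions >= introduced and < fixed are affected.
ADVISORIES = {
    "foo": [{"id": "EXAMPLE-2021-0001", "introduced": "1.0.0", "fixed": "1.2.0"}],
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def aggregate(listings: dict[str, dict[str, list[str]]]) -> dict[tuple[str, str], set[str]]:
    """Build one table: (package, version) -> set of indexes that serve that release."""
    table: dict[tuple[str, str], set[str]] = defaultdict(set)
    for index, packages in listings.items():
        for package, versions in packages.items():
            for version in versions:
                table[(package, version)].add(index)
    return table


def vulnerabilities(package: str, version: str) -> list[str]:
    """Return the ids of advisories whose affected range contains this release."""
    v = parse_version(version)
    return [
        advisory["id"]
        for advisory in ADVISORIES.get(package, [])
        if parse_version(advisory["introduced"]) <= v < parse_version(advisory["fixed"])
    ]


for (package, version), indexes in sorted(aggregate(LISTINGS).items()):
    print(package, version, sorted(indexes), vulnerabilities(package, version))
```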

    Note: Please contact the Thoth team if you wish to register your own Python package index with Thoth.

    Solver rules in Thoth

    A central authority can be configured to allow or block packages or specific package releases that are hosted on the Python package indexes. These rules are called solver rules and are maintained by a Thoth operator.

    Note: See Configuring solver rules in the Thoth documentation for more about this topic. Also check out our YouTube video demonstrating solver rules.

    Solver rules let the Thoth operator specify which Python packages or specific releases can be considered during the resolution process, taking into account the Python package indexes registered when a request is made to the cloud-based resolver. You can also use solver rules to block the analysis of packages that are considered too old, are no longer supported, or simply don't adhere to company policies.
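    Conceptually, a solver rule acts as a filter applied before resolution. The sketch below models a hypothetical block rule that can target a whole package, releases older than a given version, or releases from one index. The BlockRule class and its fields are illustrative assumptions, not Thoth's configuration format, which is described in the Thoth documentation.

```python
from dataclasses import dataclass
from typing import Optional


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


@dataclass(frozen=True)
class BlockRule:
    """Hypothetical rule: block a package, optionally only old releases or one index."""

    package: str
    below: Optional[str] = None      # if set, block only versions older than this
    index_url: Optional[str] = None  # if set, block only releases from this index

    def blocks(self, package: str, version: str, index_url: str) -> bool:
        if package != self.package:
            return False
        if self.index_url is not None and index_url != self.index_url:
            return False
        if self.below is not None and parse_version(version) >= parse_version(self.below):
            return False
        return True


RULES = [
    BlockRule("foo", below="1.1.0"),                         # too old, no longer supported
    BlockRule("bar", index_url="https://pypi.org/simple"),  # policy: internal build only
]


def allowed(package: str, version: str, index_url: str) -> bool:
    """A release may enter resolution only if no rule blocks it."""
    return not any(rule.blocks(package, version, index_url) for rule in RULES)


print(allowed("foo", "1.0.0", "https://pypi.org/simple"))               # False
print(allowed("foo", "1.1.0", "https://pypi.org/simple"))               # True
print(allowed("bar", "0.3.0", "https://pypi.org/simple"))               # False
print(allowed("bar", "0.3.0", "https://pulp.corp.example.com/simple"))  # True
```

    A rule with neither below nor index_url set blocks every release of its package, which covers the "simply don't adhere to company policies" case.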

    Note: Report issues with open source Python packages to help us create new solver rules.

    Strict index configuration

    Another feature in Thoth is strict Python package index configuration. By default, the recommendation engine considers all the packages published on the indexes it monitors and uses a reinforcement learning algorithm to come up with the set of packages it considers most appropriate. However, in some situations, Thoth users want to suppress this behavior and explicitly configure which Python package indexes their packages are consumed from.
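    The difference from treating indexes as interchangeable mirrors can be sketched as follows. In this hypothetical strict configuration, each package is bound to exactly one index, candidates from every other index are discarded, and a missing release is an error rather than a reason to fall back. The STRICT_INDEXES mapping and pick_strict function are illustrative only and do not reflect Thoth's actual configuration syntax.

```python
# Hypothetical strict configuration: every package is bound to exactly one index.
STRICT_INDEXES = {
    "foo": "https://pulp.corp.example.com/simple",
    "bar": "https://pypi.org/simple",
}

# Hypothetical candidates gathered from all monitored indexes: (package, version, index).
CANDIDATES = [
    ("foo", "1.2.0", "https://pulp.corp.example.com/simple"),
    ("foo", "99.0.0", "https://pypi.org/simple"),  # same-named package on a public index
    ("bar", "0.3.0", "https://pypi.org/simple"),
]


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def pick_strict(package: str) -> tuple[str, str]:
    """Return (version, index) using only the index configured for `package`.

    There is no fallback: if the configured index has no release, resolution fails
    instead of silently taking a same-named package from another index.
    """
    if package not in STRICT_INDEXES:
        raise LookupError(f"no index configured for {package}")
    index_url = STRICT_INDEXES[package]
    versions = [v for name, v, idx in CANDIDATES if name == package and idx == index_url]
    if not versions:
        raise LookupError(f"{package} not found on {index_url}")
    return max(versions, key=parse_version), index_url


print(pick_strict("foo"))  # ('1.2.0', 'https://pulp.corp.example.com/simple')
```

    Failing loudly is the point of this design: the public foo 99.0.0 is never considered, whatever its version number.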

    Note: If you are interested in the strict index configuration, please browse the documentation and watch our video demonstration.

    Prescriptions

    Thoth also supports a mechanism called prescriptions that provides additional, detailed guidelines for package resolution. Prescriptions are analogous to manifests in Kubernetes and OpenShift. A manifest lists the desired state of the cluster, and the machinery behind the cluster orchestrator tries to create and maintain the desired state. Similarly, prescriptions provide a declarative way to specify the resolution process for the particular dependencies and Python package indexes used.

    Note: See the prescriptions section in the Thoth documentation for more about this feature. You can also browse Thoth's prescriptions repository for prescriptions available for open source Python projects. See our article about prescriptions for more insight into this concept.

    Thoth's reinforcement learning algorithm searches for a solution that satisfies application requirements, taking prescriptions into account. This lets users adjust the resolution process in whatever manner they need. Adjustments are made using labeled requests to the resolver, which can then pick the prescriptions, written in YAML files, whose criteria match the request. For example, a prescription can make the resolver consume all packages solely from one package index (such as a Python package index hosted using Pulp) whose packages Thoth users consider trusted.
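    A minimal Python model of label-based selection might look like this. The label keys, prescription names, and allowed_index field are hypothetical and do not follow the real prescription YAML schema; when several matching prescriptions restrict indexes, this sketch simply allows the union of those indexes.

```python
# Hypothetical, simplified prescriptions (real Thoth prescriptions are YAML files).
PRESCRIPTIONS = [
    {
        "name": "trusted-pulp-only",
        "match_labels": {"team": "payments", "env": "production"},
        "allowed_index": "https://pulp.corp.example.com/simple",
    },
    {
        "name": "testing-extras",
        "match_labels": {"env": "test"},
        "allowed_index": None,  # matches, but adds no index restriction
    },
]

# Hypothetical candidates: (package, version, index).
CANDIDATES = [
    ("foo", "1.2.0", "https://pulp.corp.example.com/simple"),
    ("foo", "1.3.0", "https://pypi.org/simple"),
]


def matching_prescriptions(request_labels: dict[str, str]) -> list[dict]:
    """Select prescriptions whose labels all appear, with equal values, in the request."""
    return [
        p for p in PRESCRIPTIONS
        if all(request_labels.get(key) == value for key, value in p["match_labels"].items())
    ]


def filter_candidates(request_labels: dict[str, str]) -> list[tuple[str, str, str]]:
    """Drop candidates from indexes that the matching prescriptions do not allow."""
    allowed = {
        p["allowed_index"] for p in matching_prescriptions(request_labels) if p["allowed_index"]
    }
    if not allowed:
        return list(CANDIDATES)  # no restriction requested
    return [c for c in CANDIDATES if c[2] in allowed]


print(filter_candidates({"team": "payments", "env": "production"}))
# [('foo', '1.2.0', 'https://pulp.corp.example.com/simple')]
print(len(filter_candidates({"env": "test"})))  # 2
```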

    About Project Thoth

    As part of Project Thoth, we are accumulating knowledge to help Python developers create healthy applications. If you would like to follow project updates, please subscribe to our YouTube channel or follow us on the @ThothStation Twitter handle.

    Last updated: September 20, 2023
