Why you should use Fromager to build your Python dependency trees from source

Fromager: A Tool for Building Python Dependency Trees from Source

June 25, 2026
Lalatendu Mohanty, Rohan Devasthale
Related topics:
Application development and delivery
Related products:
Red Hat Enterprise Linux

    If you've ever tried to rebuild an entire Python dependency tree from source, you know it's not a simple process. Packages depend on other packages, which have build-time dependencies, which also have their own dependencies, and so on. It's easy to get the order wrong, which eventually causes the build to fail. Fromager solves this problem with two key artifacts: build-order.json and graph.json. Together, these artifacts turn complex Python dependency trees into a reproducible, auditable build pipeline.

    But there are already projects and tools that build Python wheels, so the obvious question is why we need Fromager. In this blog post, we delve into the motivations behind Fromager and explain its unique role in the ecosystem.

    The trust problem

    When you run pip install numpy, a pre-built binary lands on your machine, and it works. That's the happy path, and for most development workflows, it's fine. But there are environments where the happy path is not enough.

    A pre-built Python wheel is a binary artifact compiled by someone else's CI system, from source code you may not have reviewed, with build tools you didn't choose. If you work in an environment where you need to audit every binary you deploy (for example, financial services, government, defense, regulated AI), then deploying such a wheel is a non-starter. And after a string of supply chain attacks on Python packages (typosquatting, compromised maintainer accounts, malicious build hooks), this isn't just a theoretical concern anymore.

    To maintain a trustworthy environment, you must be able to prove that every binary in your environment was built from auditable source code. Along with auditability, you also need absolute reproducibility: the ability to build the exact same set of wheels again tomorrow. This is what Fromager does. It rebuilds entire Python dependency trees from source and produces two artifacts that make the process transparent: build-order.json and graph.json.

    But security is only one piece. Organizations also end up building from source for other reasons:

    • Custom hardware: If you need to link against a specific accelerator SDK (for example, a particular CUDA or ROCm version), or you're targeting an architecture nobody publishes wheels for, you have to build from source with the right flags.
    • Regulatory compliance: Some frameworks require provenance: a documented chain from source code to deployed binary. A wheel downloaded from PyPI has no such chain.
    • Reproducibility across platforms: When you manage multiple variants (x86_64, aarch64, CPU-only, CUDA), you need the same packages built consistently across all of them. Grabbing whatever wheels happen to exist on PyPI doesn't give you that.

    These requirements show up especially in the AI/ML ecosystem, where dependency trees are large, native code is everywhere, and hardware diversity is becoming the norm.

    Why pip install --no-binary :all: doesn't work

    The pip tool can build from source. Pass --no-binary :all: and it compiles every package instead of downloading wheels. But pip doesn't solve the bootstrapping problem.

    For example, to build numpy from source, pip needs setuptools. To build setuptools from source, pip needs setuptools. This is a circular dependency! To break this cycle, pip downloads pre-built wheels for build tools from PyPI. It must do this because its architecture assumes that build tools are already available as binaries.

    If your requirement is that every binary must be built from source, then pip cannot satisfy it. The build tools themselves are a gap. This bootstrapping challenge is well known in the Python packaging community: it has been discussed since 2020 and remains an open problem.

    Also, there's no way to customize build flags per package. You can set global environment variables, but you can't, for instance, build numpy with CUDA support and scipy with OpenBLAS in the same run. There's no patching mechanism if upstream source doesn't compile on your platform. There's no cross-compilation support. And there's no way to manage multiple platform variants (CPU, CUDA, ROCm) from a single build pipeline.

    Common alternatives have similar issues. The uv tool has the same limitations as pip, and conda doesn't build from source at all. Bazel can auto-generate targets for PyPI dependencies, but it primarily consumes pre-built wheels; the ability to build from source sdists with native extensions is still an open feature request. Bazel is designed for monorepos where you control your own code, not for bootstrapping an upstream ecosystem of packages entirely from source.

    Why not build everything from source?

    Nix and Gentoo do build everything from source, including build tools. They solve the bootstrapping problem. But they solve it by requiring a manually written build specification for every package (a Nix derivation or a Gentoo ebuild).

    Spack, widely used in HPC and scientific computing, takes the same approach: Every package needs a handwritten package.py recipe that specifies how to fetch, configure, and build it.

    For a Python dependency tree containing hundreds of packages that change frequently, writing and maintaining a build recipe for literally every package just doesn't scale.

    What Fromager does

    Here's the thing: Python already has a standardized way for packages to declare build requirements: PEP 517 and pyproject.toml. Every well-maintained Python package already says what it needs to build. The information is there. It just needs a tool that can use it, all the way down.

    To demonstrate how Fromager's bootstrap process works, let's look at stevedore, a library for managing dynamic plugins maintained by the OpenStack community.

    fromager bootstrap stevedore

    This single command kicks off a multi-step orchestration:

    1. Resolves stevedore to a specific version
    2. Discovers that stevedore needs setuptools and pbr to build
    3. Discovers that stevedore needs pbr at install time
    4. Discovers that pbr itself needs setuptools to build
    5. Builds setuptools first (it has no dependencies)
    6. Builds pbr next (its dependency setuptools is now available)
    7. Builds stevedore last (both setuptools and pbr are available)

    In the end, Fromager generates two files in the working directory: build-order.json and graph.json.

    The recipe: build-order.json

    The artifact build-order.json is a simple ordered list, a recipe that defines the order in which the packages must be built:

    [
      {
        "req": "setuptools>=40.8.0",
        "constraint": "",
        "dist": "setuptools",
        "version": "75.8.2",
        "prebuilt": false,
        "source_url": "https://pypi.org/...",
        "source_url_type": "sdist"
      },
      {
        "req": "pbr>=1.0",
        "constraint": "",
        "dist": "pbr",
        "version": "6.1.1",
        "prebuilt": false,
        "source_url": "https://pypi.org/...",
        "source_url_type": "sdist"
      },
      {
        "req": "stevedore",
        "constraint": "",
        "dist": "stevedore",
        "version": "5.4.1",
        "prebuilt": false,
        "source_url": "https://pypi.org/...",
        "source_url_type": "sdist"
      }
    ]

    The sequence within this file is critical. The first entry is setuptools because both pbr and stevedore need it to build. Next is pbr because stevedore depends on it. Each entry records exactly where the source came from (source_url), what type of source it was (source_url_type), and whether the package was built from source or used as a pre-built binary (prebuilt). You can commit build-order.json to version control, where it serves as your auditable record of what went into the build.
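    Because build-order.json is plain JSON, a basic audit takes only a few lines. The sketch below uses a reduced copy of the entries shown above (source URLs omitted) and a hypothetical helper, audit_build_order, that lists the pinned versions in build order and flags anything marked prebuilt. It relies only on the field names visible in the sample and is not part of Fromager.

```python
import json

# Reduced copy of the build-order.json sample above (source URLs omitted).
BUILD_ORDER_JSON = """
[
  {"req": "setuptools>=40.8.0", "dist": "setuptools", "version": "75.8.2",
   "prebuilt": false, "source_url_type": "sdist"},
  {"req": "pbr>=1.0", "dist": "pbr", "version": "6.1.1",
   "prebuilt": false, "source_url_type": "sdist"},
  {"req": "stevedore", "dist": "stevedore", "version": "5.4.1",
   "prebuilt": false, "source_url_type": "sdist"}
]
"""


def audit_build_order(entries):
    """Return (pins, prebuilt): exact pins in build order, plus the names
    of any entries that were not built from source."""
    pins = [f"{entry['dist']}=={entry['version']}" for entry in entries]
    prebuilt = [entry["dist"] for entry in entries if entry.get("prebuilt")]
    return pins, prebuilt


entries = json.loads(BUILD_ORDER_JSON)
pins, prebuilt = audit_build_order(entries)
print("build order:", pins)
print("not built from source:", prebuilt)
```

    In a real pipeline you would load the file with json.load() and could fail a CI job whenever the second list is not empty.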

    The complete map: graph.json

    While build-order.json tells you what to build and when, graph.json tells you why. It captures every relationship between every package:

    {
      "": {
        "download_url": "",
        "pre_built": false,
        "version": "0",
        "canonicalized_name": "",
        "edges": [
          {
            "key": "stevedore==5.4.1",
            "req_type": "toplevel",
            "req": "stevedore"
          }
        ]
      },
      "stevedore==5.4.1": {
        "download_url": "https://pypi.org/...",
        "pre_built": false,
        "version": "5.4.1",
        "canonicalized_name": "stevedore",
        "edges": [
          {
            "key": "pbr==6.1.1",
            "req_type": "install",
            "req": "pbr!=2.1.0,>=2.0.0"
          },
          {
            "key": "setuptools==75.8.2",
            "req_type": "build-system",
            "req": "setuptools>=40.8.0"
          }
        ]
      },
      "pbr==6.1.1": {
        "download_url": "https://pypi.org/...",
        "pre_built": false,
        "version": "6.1.1",
        "canonicalized_name": "pbr",
        "edges": [
          {
            "key": "setuptools==75.8.2",
            "req_type": "build-system",
            "req": "setuptools>=40.8.0"
          }
        ]
      },
      "setuptools==75.8.2": {
        "download_url": "https://pypi.org/...",
        "pre_built": false,
        "version": "75.8.2",
        "canonicalized_name": "setuptools",
        "edges": []
      }
    }
    

    The graph records a req_type for every edge, such as toplevel, install, and build-system. These types matter because they define the nature of each dependency relationship. The stevedore package needs pbr at install time because pbr is a runtime dependency, but it needs setuptools only at build time. The setuptools package has no dependencies in this graph. It's a leaf node, the foundation everything else rests upon. The root node, keyed by an empty string, represents your original request. From there, you can trace every path through the tree to understand exactly how each package ended up in your build. Figure 1 shows a visual representation of graph.json.

    Figure 1: Visual representation of graph.json for the stevedore Python library
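    Tracing why a package is in the build amounts to walking edges from the root node. The sketch below does that over a reduced copy of the graph.json sample (only canonicalized_name and edges are kept). The function name explain is hypothetical, and the code is an illustration of the idea rather than a Fromager API.

```python
# Reduced copy of the graph.json sample: only the fields this sketch needs.
GRAPH = {
    "": {"canonicalized_name": "", "edges": [
        {"key": "stevedore==5.4.1", "req_type": "toplevel", "req": "stevedore"},
    ]},
    "stevedore==5.4.1": {"canonicalized_name": "stevedore", "edges": [
        {"key": "pbr==6.1.1", "req_type": "install",
         "req": "pbr!=2.1.0,>=2.0.0"},
        {"key": "setuptools==75.8.2", "req_type": "build-system",
         "req": "setuptools>=40.8.0"},
    ]},
    "pbr==6.1.1": {"canonicalized_name": "pbr", "edges": [
        {"key": "setuptools==75.8.2", "req_type": "build-system",
         "req": "setuptools>=40.8.0"},
    ]},
    "setuptools==75.8.2": {"canonicalized_name": "setuptools", "edges": []},
}


def explain(graph, target, key="", trail=()):
    """Yield every path from node `key` to `target` as a tuple of
    (req_type, node_key) hops. The root node is the empty string."""
    if key == target:
        yield trail
        return
    for edge in graph[key]["edges"]:
        # Skip nodes already on this trail so a cyclic graph cannot loop.
        if any(edge["key"] == seen for _, seen in trail):
            continue
        hop = (edge["req_type"], edge["key"])
        yield from explain(graph, target, edge["key"], trail + (hop,))


paths = list(explain(GRAPH, "setuptools==75.8.2"))
for path in paths:
    print(" -> ".join(f"[{req_type}] {node}" for req_type, node in path))
```

    For the sample graph this finds two routes to setuptools: one through pbr and one directly from stevedore, both ending in a build-system edge.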

    Fromager takes a list of top-level packages you want and discovers, resolves, and builds the entire dependency tree from source, including all build tools, all the way down to the bottom.

    Key features of Fromager

    Fromager works entirely within the Python ecosystem. It reads pyproject.toml, calls PEP 517 hooks, consumes packages from PyPI, and produces standard wheels.

    Along with that, it provides several advantages.

    Fromager distinguishes build dependencies from install dependencies

    The dependency graph has two fundamentally different classifications of dependencies:

    • Build dependencies: Packages needed to compile the software (setuptools, Cython, wheel). These must be fully built and installed before the package that needs them can be compiled.
    • Install dependencies: Packages needed to use the software at runtime (numpy, requests). These are discovered after the build phase completes, by reading metadata from the resulting wheel or sdist.

    Build dependencies must be processed depth-first and compiled immediately. Install dependencies can be deferred. Fromager tracks this because different categories of build requirements must be installed in a specific sequence (you need the build system before you can call PEP 517 hooks to discover what else is needed).

    Fromager discovers dependencies automatically

    Fromager doesn't require manually written build specifications. For each package, it:

    1. Reads pyproject.toml to find build system requirements
    2. Installs those into an isolated build environment
    3. Calls PEP 517's get_requires_for_build_wheel() hook to discover additional build requirements
    4. Recursively applies the same process to every dependency it discovers

    The build order emerges from the traversal itself, an iterative depth-first loop over an explicit stack, rather than from a human writing it down.
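    To make the traversal concrete, here is a minimal sketch of a depth-first loop over an explicit stack. It assumes two hard-coded tables (BUILD_REQUIRES and INSTALL_REQUIRES) in place of the real pyproject.toml reads and PEP 517 hook calls, and it ignores version resolution entirely. The names are hypothetical, and Fromager's actual implementation is more involved.

```python
# Hypothetical stand-ins for what pyproject.toml and the PEP 517 hooks report.
BUILD_REQUIRES = {
    "stevedore": ["setuptools", "pbr"],
    "pbr": ["setuptools"],
    "setuptools": [],
}
INSTALL_REQUIRES = {
    "stevedore": ["pbr"],
    "pbr": [],
    "setuptools": [],
}


def bootstrap_order(toplevel):
    """Return a build order in which every package comes after its build
    dependencies. Install dependencies are queued after the build."""
    order = []            # packages in the order they get "built"
    done = set()          # packages already built
    in_progress = set()   # packages waiting for their build dependencies
    stack = [(toplevel, False)]
    while stack:
        name, deps_handled = stack.pop()
        if name in done:
            continue
        if deps_handled:
            # All build dependencies are built, so build this package now.
            order.append(name)
            done.add(name)
            in_progress.discard(name)
            # Install dependencies are known only after the build.
            for dep in reversed(INSTALL_REQUIRES[name]):
                stack.append((dep, False))
            continue
        if name in in_progress:
            raise ValueError(f"dependency cycle detected at {name}")
        in_progress.add(name)
        # Revisit this package once its build dependencies are handled.
        stack.append((name, True))
        for dep in reversed(BUILD_REQUIRES[name]):
            stack.append((dep, False))
    return order


order = bootstrap_order("stevedore")
print(order)
```

    With these tables the loop builds setuptools, then pbr, then stevedore, which matches the order in the build-order.json sample.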

    Fromager is customizable without forking

    This is one of the biggest strengths of Fromager. When you are building hundreds of packages, you inevitably come across projects that don't follow the standard repository structure or other Python packaging standards, which would normally break the build. For example, a package may need a specific environment variable, a patch to its build system, or a pinned version that differs from what PyPI advertises.

    Fromager handles this with a layered override system. Common adjustments (environment variables, patches, version pins) go in a YAML settings file that's specific to the package and doesn't require any coding.

    While standard Python packaging currently lacks awareness of accelerators, Fromager was specifically designed to fill this void.

    A --variant flag lets you build for different targets (CPU, CUDA, ROCm) using the same pipeline with different settings per variant. For packages that don't follow Python packaging standards at all, you can write a plugin to handle the edge case.

    The system evolves over time. As common patterns emerge across packages, they get promoted into Fromager itself, so what started as a plugin becomes a YAML setting.

    Fromager separates discovery from building

    This is where the supply chain security story comes together. Fromager can split the build process into two stages separated by a data-only boundary:

    Stage 1 (the discovery phase) resolves versions, queries package indexes, and runs PEP 517 hooks. Because those hooks are arbitrary Python code defined by each package's build backend, this stage executes untrusted upstream code and requires network access. It produces:

    • graph.json: Every package, version, and typed dependency edge
    • build-order.json: A topologically sorted build sequence
    • Requirements: Cached requirement files for each package
    • Source: Downloaded source archives

    Stage 2 (the build phase) compiles packages using only Stage 1 artifacts. It uses cached requirement files instead of re-running discovery hooks, and builds from pre-downloaded sources. On Linux, when network isolation is enabled, build commands run inside network namespaces (unshare --net) with no outbound connectivity. This deterministic approach allows builds to function in air-gapped environments without PyPI access. It ensures that the same packages are built from identical sources in the exact same order every time, with the build-order.json file serving as the formal contract between discovery and execution.

    Between the two stages, you can inspect everything, diff graph.json to see what changed, review it in a pull request, or transfer the artifacts to an air-gapped system and build with no network access at all. No compilation happens until you're satisfied with the plan.

    fromager build-sequence build-order.json
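    The review step between the two stages can be partly automated. The sketch below compares two graph.json snapshots and reports packages that were added, removed, or re-pinned. It reads only the canonicalized_name and version fields shown in the sample earlier. The newer setuptools version and the flit-core entry are hypothetical data, and diff_graphs is an illustrative helper rather than a Fromager command.

```python
def pinned_versions(graph):
    """Map each package name to the set of versions present in the graph,
    skipping the root node (the empty-string key)."""
    pins = {}
    for key, node in graph.items():
        if key == "":
            continue
        pins.setdefault(node["canonicalized_name"], set()).add(node["version"])
    return pins


def diff_graphs(old, new):
    """Return (added, removed, changed) between two graph.json snapshots."""
    old_pins, new_pins = pinned_versions(old), pinned_versions(new)
    added = sorted(new_pins.keys() - old_pins.keys())
    removed = sorted(old_pins.keys() - new_pins.keys())
    changed = {
        name: (sorted(old_pins[name]), sorted(new_pins[name]))
        for name in sorted(old_pins.keys() & new_pins.keys())
        if old_pins[name] != new_pins[name]
    }
    return added, removed, changed


# Reduced snapshots; the entries in NEW_GRAPH that differ are hypothetical.
OLD_GRAPH = {
    "": {"canonicalized_name": "", "version": "0"},
    "pbr==6.1.1": {"canonicalized_name": "pbr", "version": "6.1.1"},
    "setuptools==75.8.2": {"canonicalized_name": "setuptools",
                           "version": "75.8.2"},
}
NEW_GRAPH = {
    "": {"canonicalized_name": "", "version": "0"},
    "pbr==6.1.1": {"canonicalized_name": "pbr", "version": "6.1.1"},
    "setuptools==76.0.0": {"canonicalized_name": "setuptools",
                           "version": "76.0.0"},
    "flit-core==3.12.0": {"canonicalized_name": "flit-core",
                          "version": "3.12.0"},
}

added, removed, changed = diff_graphs(OLD_GRAPH, NEW_GRAPH)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

    In practice the two snapshots would come from json.load() on the committed graph.json and the one produced by a fresh discovery run, and the output could be posted to the pull request under review.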

    Fromager scales

    For large dependency trees, serial building is slow. Fromager can schedule concurrent builds using topology-aware parallelism: packages that don't depend on each other for building can be compiled simultaneously. Resource-intensive packages like PyTorch can be marked for exclusive builds so they don't run in parallel with other compilations.
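    The idea behind topology-aware parallelism can be sketched with graphlib.TopologicalSorter from the Python standard library. The example groups packages into batches whose members have no unbuilt build dependencies, so each batch could be compiled concurrently. The BUILD_DEPS table is illustrative (flit-core is a hypothetical extra package with no build dependencies), and this is not how Fromager's scheduler is actually implemented.

```python
from graphlib import TopologicalSorter

# Package -> set of packages that must be built first (illustrative data).
BUILD_DEPS = {
    "stevedore": {"setuptools", "pbr"},
    "pbr": {"setuptools"},
    "setuptools": set(),
    "flit-core": set(),  # hypothetical package with no build dependencies
}


def parallel_batches(deps):
    """Group packages into batches; everything inside one batch is
    independent and could be built at the same time."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        # Mark the whole batch as built so that its dependents unlock.
        sorter.done(*ready)
    return batches


batches = parallel_batches(BUILD_DEPS)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch}")
```

    A real scheduler would start each package as soon as its own dependencies finish instead of waiting for a whole batch, and it would need extra logic for exclusive builds.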

    When onboarding new packages, you don't know how many will fail to build from source. Fromager's test mode continues after failures by substituting a pre-built binary for any package that fails, so downstream packages can still be built. At the end, it produces a JSON report of every failure classified by type, giving you a clear map of what still needs fixing.

    Try Fromager

    Most Python packaging tools treat dependency resolution as a solved problem and defer it to pip, but pip resolves and installs in one pass, using pre-built wheels from PyPI. Standard tools like pip fall short when you require full builds from source for security audits, isolated network environments, or specialized platform targets.

    When you can see every dependency, trace every relationship, and replay every build, you move from "it works on my machine" to the ability to prove what exactly is in your environment. For teams that need that level of assurance, Fromager turns Python's complex packaging into something transparent and reproducible.

    Fromager is an actively developed open source project.

    • GitHub: python-wheel-build/fromager
    • Documentation: fromager.readthedocs.io
