Why some agentic AI developers are moving code from Python to Rust

September 15, 2025
Louis Imershein
Related topics:
Artificial intelligence, Programming languages & frameworks, Python, Rust
Related products:
Red Hat AI

    If you’re looking to build agentic AI solutions that perform well, one area that doesn’t get discussed nearly often enough is the core runtime environment. For AI developers, the term runtime is almost always synonymous with Python.

    But as agent-oriented designs take on more performance-critical roles, I’m speaking with more and more developers who are questioning whether Python is the right long-term solution. That question has led them to use Rust for some of their projects, and you might want to consider it as well.

    Python's dominance in AI

    The evolution of machine learning among data scientists led to Python becoming the lingua franca of AI. A simple core language with a large ecosystem of libraries well suited for AI tasks (Hugging Face, LangChain, PyTorch, TensorFlow, and so on) means we can spin up a proof-of-concept agent in an afternoon. 

    But there’s a critical question we need to ask as we move from a single cool demo to a production system running hundreds of agents: How does it scale? 

    For CPU-bound tasks in Python, the answer isn’t great. The reason is the Global Interpreter Lock (GIL), and it’s the bottleneck that will force you to rethink your architecture. This is where Rust comes in, offering a path to build concurrent, scalable agentic systems.

    The scaling problem: From 5 to 500 agents

    An agentic framework is inherently concurrent. At any moment, you can have multiple agents performing different tasks: one making an API call, another processing a large text file, and a third running a simulation.

    With 5 agents, you can often get by. The tasks are spread out, and the performance hiccups aren't critical.

    With 500 agents, every inefficiency is magnified. If each agent's "thinking" process is a CPU-bound task, a Python-based system will grind to a halt. The GIL ensures that no matter how many CPU cores you have, only one agent can "think" at a time. It’s like having a 16-lane highway that narrows down to a single-lane bridge.

    Multiprocessing in Python, combined with message passing, is the typical workaround for the GIL, but thread-based programming has several advantages, a big one being reduced complexity. Additionally, as you scale, threads incur less overhead, since there is no per-process instance creation and much less context switching.
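
    For reference, here is a minimal sketch of that multiprocessing workaround. The busy_work function and its inputs are illustrative stand-ins, not code from this article's benchmarks:

    import time
    from multiprocessing import Pool
    
    # Illustrative stand-in for an agent's CPU-bound "thinking" step
    def busy_work(n):
        return sum(i * i for i in range(n))
    
    if __name__ == "__main__":
        start = time.perf_counter()
        with Pool(processes=2) as pool:
            # Each worker runs in its own process with its own GIL,
            # so the two tasks genuinely execute in parallel
            pool.map(busy_work, [2_000_000, 2_000_000])
        print(f"Two processes took: {time.perf_counter() - start:.4f} seconds")

    The trade-off is exactly the overhead described above: every worker is a separate interpreter process, and results must be serialized and passed between processes rather than shared in memory.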

    Python's GIL bottleneck: A practical demonstration

    Let’s demonstrate this with a common CPU-bound task. We'll have our "agents" perform a heavy computation (summing prime numbers) using both a single thread and multiple threads. I’ve provided examples so you can try it out yourself.

    Python: Hitting the wall

    Because this task is CPU-bound, the GIL prevents the threads from running in parallel. The multi-threaded version is actually slightly slower due to the overhead of managing the threads. Here’s some code to copy into a file we’ll call cpu_perf.py:

    import time
    import threading
    
    # A simple, CPU-intensive task to simulate an agent "thinking"
    def sum_primes(start, end):
        total = 0
        for num in range(start, end):
            is_prime = True
            for i in range(2, int(num**0.5) + 1):
                if num % i == 0:
                    is_prime = False
                    break
            if is_prime:
                total += num
        # This print is just for verification, can be removed for pure speed tests
        # print(f"Sum for range {start}-{end}: {total}")
    
    LIMIT = 200000
    MIDPOINT = LIMIT // 2
    
    # --- Single-threaded version (1 agent) ---
    start_time = time.perf_counter()
    sum_primes(2, LIMIT)
    end_time = time.perf_counter()
    print(f"Single-threaded (1 agent) time: {end_time - start_time:.4f} seconds\n")
    
    # --- Multi-threaded version (2 agents) ---
    thread1 = threading.Thread(target=sum_primes, args=(2, MIDPOINT))
    thread2 = threading.Thread(target=sum_primes, args=(MIDPOINT, LIMIT))
    
    start_time = time.perf_counter()
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    end_time = time.perf_counter()
    print(f"Multi-threaded (2 agents) time:  {end_time - start_time:.4f} seconds")

    When I run this on my Fedora system with:

    python cpu_perf.py

    I see the following output:

    Single-threaded (1 agent) time: 0.1408 seconds
    
    Multi-threaded (2 agents) time:  0.1520 seconds

    Splitting the work into two "agents" took even longer than just running one. This is the GIL bottleneck in action.

    Rust: Concurrency delivers parallelism

    If you’re new to Rust, start by setting up a Rust project with cargo. On Fedora, install the toolchain and create the project like this:

    sudo dnf install rust cargo
    cargo new cpu_perf

    You’ll see the output:

    Creating binary (application) `cpu_perf` package
    note: see more `Cargo.toml` keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

    Rust will create a directory called cpu_perf containing a Cargo.toml file and a src directory with a stub main.rs inside. The only file we need to edit is src/main.rs; leave the Cargo.toml at the project root alone, since the cargo add command below will update it for us.
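
    The generated layout (ignoring the git metadata cargo new also creates) looks like this:

    cpu_perf/
    ├── Cargo.toml
    └── src/
        └── main.rs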

    Place the following in the file src/main.rs:

    use rayon::prelude::*;
    use std::time::Instant;
    
    // A function to check if a single number is prime
    fn is_prime(n: u32) -> bool {
        if n <= 1 { return false; }
        for i in 2..=(n as f64).sqrt() as u32 {
            if n % i == 0 {
                return false;
            }
        }
        true
    }
    
    const LIMIT: u32 = 200_000;
    
    fn main() {
        // --- Single-threaded version (for comparison) ---
        let start = Instant::now();
        let single_core_sum: u64 = (2..LIMIT).filter(|&n| is_prime(n)).map(|n| n as u64).sum();
        let duration = start.elapsed();
        println!("Sum: {}", single_core_sum);
        println!("Single-threaded time: {:.4?} seconds\n", duration.as_secs_f64());
    
        // --- Multi-threaded version with Rayon ---
        let start = Instant::now();
        
        // Rayon's `into_par_iter` automatically distributes the work across all available CPU cores
        let multi_core_sum: u64 = (2..LIMIT).into_par_iter().filter(|&n| is_prime(n)).map(|n| n as u64).sum();
        
        let duration = start.elapsed();
        println!("Sum: {}", multi_core_sum);
        println!("Multi-threaded time with Rayon: {:.4?} seconds", duration.as_secs_f64());
    }

    The following commands, when executed from the cpu_perf project directory, will bring in the Rayon crate and compile and run our Rust code:

    cargo add rayon
    cargo run --release

    The resulting output will look something like this:

       Compiling crossbeam-utils v0.8.21
       Compiling rayon-core v1.13.0
       Compiling either v1.15.0
       Compiling crossbeam-epoch v0.9.18
       Compiling crossbeam-deque v0.8.6
       Compiling rayon v1.11.0
       Compiling cpu_perf v0.1.0 (/home/limershe/cpu_perf)
        Finished `release` profile [optimized] target(s) in 2.11s
         Running `target/release/cpu_perf`
    Sum: 1709600813
    Single-threaded time: 0.0107 seconds
    
    Sum: 1709600813
    Multi-threaded time with Rayon: 0.0025 seconds

    As you can see, in the Rust case adding threads actually pays off: the Rayon version runs roughly four times faster, because Rayon distributes the work across every available CPU core instead of serializing it behind a lock. Now imagine this scenario with 500 agents on a 64-core machine. The Rust version would continue to scale, while the Python version would not.

    Why GPU isn't the bottleneck (it's the CPU and network)

    So how did Python get so popular for AI workloads in the first place if it’s that much slower at multi-threaded work? 

    It turns out that the GIL isn’t an issue for GPU-intensive tasks. When a Python thread dispatches work to a GPU (via CUDA, for example), the underlying extension releases the GIL while the GPU computes, which allows other threads to run on the CPU.

    The real bottlenecks for Python agents are:

    • CPU-bound tasks: As demonstrated above, any "thinking" or data processing done by the agent is severely limited by the GIL.
    • Network I/O: Python supports asynchronous I/O through the standard library's asyncio module, and awaiting non-blocking calls is especially common for network work. While threading also works well for I/O in Python, a large number of concurrent network requests can still be managed more efficiently in Rust, thanks to its lower overhead, its more robust async runtime (Tokio), and easy-to-use libraries built on top of it, such as Actix Web and Reqwest. A short asyncio sketch follows this list.
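
    Here is a minimal asyncio sketch of that non-blocking pattern. The one-second asyncio.sleep is an illustrative stand-in for a real network call:

    import asyncio
    import time
    
    async def simulate_network_request():
        # asyncio.sleep yields control to the event loop while "waiting",
        # standing in for a real non-blocking network call
        await asyncio.sleep(1)
    
    async def main():
        start = time.perf_counter()
        # Run both simulated requests concurrently on a single thread
        await asyncio.gather(simulate_network_request(),
                             simulate_network_request())
        print(f"Async I/O took: {time.perf_counter() - start:.4f} seconds")
    
    asyncio.run(main())

    Both requests complete in about one second of wall-clock time on a single thread, because the event loop switches between them while each one waits.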

    A practical I/O example in Python

    This example shows that threading in Python is effective for I/O-bound tasks because the GIL is released during the wait. To see the effects of this, create the file io_perf.py and run it on your system:

    import time
    import threading
    
    def simulate_network_request():
        """Simulates a 1-second network delay."""
        time.sleep(1)
    
    # --- Single-threaded I/O ---
    start_time = time.perf_counter()
    simulate_network_request()
    simulate_network_request()
    end_time = time.perf_counter()
    print(f"Single-threaded I/O took: {end_time - start_time:.4f} seconds")
    
    # --- Multi-threaded I/O ---
    thread1 = threading.Thread(target=simulate_network_request)
    thread2 = threading.Thread(target=simulate_network_request)
    
    start_time = time.perf_counter()
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    end_time = time.perf_counter()
    print(f"Multi-threaded I/O took:  {end_time - start_time:.4f} seconds")

    When I run this on my Fedora system with:

    python io_perf.py

    I see the following output:

    Single-threaded I/O took: 2.0006 seconds
    Multi-threaded I/O took:  1.0008 seconds

    As we anticipated, the multi-threaded version is about twice as fast. Remember, though, that a real agent does both I/O and CPU work. Rust resolves the performance problem because it can deliver parallelism for both.

    The pragmatic path: A hybrid approach

    For most developers I’ve spoken with who are adopting Rust, the solution isn’t to abandon Python entirely. Instead, they use the two environments together. A practical approach is to first prototype in Python, using its rich ecosystem to build and test agent logic. Then identify bottlenecks by profiling your application to find the most resource-intensive parts. Finally, rewrite those critical components in Rust and expose them as native Python modules using tools like PyO3, a Rust library that provides bindings for the Python interpreter, letting you build modules in Rust that Python applications can consume. A sketch of the Python side of that hand-off follows.
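
    As an illustration only, the Python side of such a hybrid might look like the following. The agent_kernels module and its sum_primes function are hypothetical names invented for this sketch, not a real package:

    import time
    
    try:
        # Hypothetical PyO3-built extension module; the name and API
        # are assumptions for illustration, not a published package
        from agent_kernels import sum_primes
    except ImportError:
        # Pure-Python fallback so the prototype still runs everywhere
        def sum_primes(start, end):
            total = 0
            for num in range(start, end):
                if num > 1 and all(num % i for i in range(2, int(num**0.5) + 1)):
                    total += num
            return total
    
    start = time.perf_counter()
    print(sum_primes(2, 200_000))
    print(f"Took: {time.perf_counter() - start:.4f} seconds")

    The try/except fallback keeps the Python prototype runnable everywhere, while deployments that ship the compiled Rust extension pick up the speedup automatically.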

    While it’s certainly possible to use Rust-native AI crates, the hybrid model gives you the best of both worlds: the development speed (and proven AI module support) of Python, and the execution speed of Rust. This lets you build AI agents that are not only intelligent but also highly performant and ready to scale.
