
Your AI agents, evolved: Modernize Llama Stack agents by migrating to the Responses API

December 9, 2025
J William Murdock, Linda Alexander
Related topics:
APIs, Artificial intelligence, Python
Related products:
Red Hat AI

    The world of AI agent development moves fast. Sometimes, that means the tools we rely on evolve in ways that require us to rethink our implementations. If you built agents using Llama Stack's original Agent APIs, you've likely heard they're being deprecated in favor of the more powerful, OpenAI-compatible Responses API. What does that mean for your existing code? How do you make the transition without rebuilding from scratch?

    This post walks you through practical migration strategies, from simple rewrites to sophisticated emulation layers. You will learn how to keep your agents working while you access the new capabilities in the Responses API. Whether you maintain a production system or want to update your agent architecture, you can use these methods to balance effort with benefit.

    Why migrate to the Responses API? Understanding the shift

    The original Llama Stack Agent APIs required you to explicitly manage every aspect of the agent lifecycle: creating tool groups, defining agents, establishing sessions, and orchestrating turns. Those APIs also locked you into a relatively narrow concept of an agent and were poorly suited to more complex agentic reasoning tasks. Other providers had similar constructs, such as OpenAI's now-deprecated Assistants API. The industry has since shifted toward APIs like OpenAI's Responses API and Anthropic's Messages API, in which the server provides agentic operations and the client application manages the abstraction layers.

    Llama Stack has adopted the Responses API because of its well-earned popularity as a powerful tool for agentic reasoning. This API lets you avoid juggling multiple tool endpoints and identifiers. You make one API call that handles tool discovery, execution planning, and response synthesis automatically. The Responses API supports advanced patterns, such as multi-step reasoning and automatic tool chaining, that previously required extensive custom orchestration with the legacy APIs.

    Deprecation doesn't mean your agents stop working immediately. The client-side Agent APIs still function, so you have time to plan your migration. You might not need to migrate now, but you might want to consider it if you plan to add more advanced capabilities to your applications.

    Seeing agent modernization in action

    This post references concrete implementations and working code examples. To keep this article readable, the detailed implementations, including runnable examples and step-by-step explanations, are in a companion notebook. This lets you review how the code for each migration strategy works while this post focuses on the concepts and decision factors.

    Follow along with our companion Python Notebook for complete, runnable examples of every migration strategy discussed here.

    The simple approach: Agents in the Python client

    The old server APIs for creating and managing agents are deprecated and will be removed, but the Agent class in the Llama Stack Python client now uses the updated server APIs. It doesn't match the documented behavior of the old Agent APIs exactly, but it is a decent approximation. If you want the simplest migration path, using those structures might make sense; a minimal usage sketch follows the list below. See the Legacy Agent API Example section in the companion notebook for code that works with the latest Python client. This approach works well when:

    • You are using the Llama Stack Python client.
    • The existing implementation meets your needs well, so you don't need to extend it.
    • You don't intend to add much more advanced functionality in the future.
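
    Here is a minimal sketch of that path, assuming the current llama-stack-client Python package and a locally running Llama Stack server. The base URL and model name are placeholders, and details such as method signatures can differ across client versions, so treat this as an approximation rather than the notebook's exact code:

    ```python
    from llama_stack_client import LlamaStackClient, Agent

    # Placeholder server URL and model; adjust for your deployment.
    client = LlamaStackClient(base_url="http://localhost:8321")

    # The client-side Agent class now calls the updated server APIs internally.
    agent = Agent(
        client,
        model="meta-llama/Llama-3.1-8B-Instruct",
        instructions="You are a helpful assistant.",
    )

    session_id = agent.create_session("demo-session")
    turn = agent.create_turn(
        messages=[{"role": "user", "content": "What national parks are in Utah?"}],
        session_id=session_id,
        stream=False,
    )
    print(turn.output_message.content)
    ```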

    The direct approach: Rewriting with Responses API

    For many applications, the best migration strategy is to rewrite your agent calls using the OpenAI-compatible Responses API directly. The transformation can be surprisingly straightforward, and you can get the same behaviors with simpler, more maintainable code.

    When you use legacy APIs, you typically create agent configurations, establish sessions, and manage turns through multiple API calls. Each step requires careful orchestration and state management. With the Responses API, you can replace this verbose multi-call pattern with a single, elegant API call. The complexity of tool discovery, execution planning, and context management happens automatically. For follow-up questions, you pass the previous response ID to maintain conversation context.
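
    As a concrete illustration, here is a minimal sketch using the OpenAI Python SDK pointed at a Llama Stack server's OpenAI-compatible endpoint. The base URL, model name, and MCP tool server are placeholders, not values from the companion notebook:

    ```python
    from openai import OpenAI

    # Any OpenAI-compatible client works; the base URL below is a placeholder
    # for a Llama Stack server's OpenAI-compatible endpoint.
    client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

    # One call replaces agent creation, session setup, and turn orchestration.
    response = client.responses.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
        input="What national parks are in Utah?",
        tools=[{                                    # hypothetical MCP tool server
            "type": "mcp",
            "server_label": "parks",
            "server_url": "http://localhost:8000/mcp",
        }],
    )
    print(response.output_text)

    # Follow-up turns pass the previous response ID to keep context.
    follow_up = client.responses.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        input="Which of those is best to visit in winter?",
        previous_response_id=response.id,
    )
    print(follow_up.output_text)
    ```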

    See the companion notebook's Legacy Agent API Example and Equivalent Responses API Example sections for complete code comparisons that demonstrate this dramatic simplification.

    This direct rewrite approach works best when:

    • You want to take advantage of the Responses API's full capabilities immediately.
    • You don't really need a data structure to contain your agentic reasoning capability.
    • You want to start adopting a more modern approach to agentic reasoning based on agentic functions instead of agent objects.

    The compatibility bridge: Emulating legacy APIs

    A complete rewrite is not always practical. You might have extensive test suites built around the legacy API structure, or your application's architecture might depend deeply on the separation between agent creation and execution. As noted earlier, the Agent class in the Llama Stack Python client is the easiest way to do this, but a more flexible approach is to build and control your own approximation of the legacy APIs while using newer APIs such as Responses internally.

    Our companion notebook demonstrates this with a LegacyAgent class that emulates the original API structure. The emulator stores agent configurations when created, manages session state, and translates legacy turn creation calls into Responses API calls while maintaining the interface your existing code expects.

    With this emulation layer, your existing code continues to work unchanged while benefiting from the Responses API's improved orchestration. The notebook's Emulating the Legacy Agent API section provides a complete implementation you can adapt to your needs; a condensed sketch follows the list below. This approach is particularly valuable when:

    • You have extensive legacy code that would be expensive to rewrite.
    • Multiple teams or systems depend on the current API structure.
    • You want to migrate gradually while maintaining backward compatibility.
    • You want your own implementation of your agent APIs so you can extend and enhance them over time.
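
    The shape of such an emulator, condensed to its essentials, might look like the following. This is a sketch of the idea rather than the notebook's implementation; the method names and configuration format mirror the legacy surface only approximately:

    ```python
    import uuid

    class LegacyAgent:
        """Hypothetical emulator: keeps a legacy-style Agent surface
        (create_session / create_turn) but calls the Responses API inside."""

        def __init__(self, client, agent_config):
            # Store the agent configuration, as the old server-side API did.
            self.client = client        # OpenAI-compatible client
            self.config = agent_config  # e.g. {"model": ..., "instructions": ..., "tools": [...]}
            self._sessions = {}         # session_id -> last response ID

        def create_session(self, session_name):
            session_id = f"{session_name}-{uuid.uuid4().hex[:8]}"
            self._sessions[session_id] = None
            return session_id

        def create_turn(self, messages, session_id):
            # Translate a legacy turn into one Responses API call, threading
            # the previous response ID to preserve conversation context.
            response = self.client.responses.create(
                model=self.config["model"],
                instructions=self.config.get("instructions", ""),
                input=messages,
                tools=self.config.get("tools", []),
                previous_response_id=self._sessions[session_id],
            )
            self._sessions[session_id] = response.id
            return response
    ```

    Because you own this class, you can extend it over time, for example with logging or approval hooks, without waiting on upstream API changes.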

    A pragmatic middle ground: Simplified agent classes

    Between complete rewrites and full emulation lies a practical compromise: creating simplified agent classes that capture the essence of your agent patterns without strict API compatibility. This approach lets you modernize incrementally while keeping familiar abstractions.

    The notebook's Adopting a simpler agent class section shows how to create a SimpleExampleAgent class that wraps the Responses API while maintaining the conceptual model of agents and turns. The agent stores its configuration and conversation state, and allows you to create sessions that handle the complexity of multiple-turn conversations internally.
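
    A hypothetical version of that shape, stripped of legacy compatibility concerns, could be as small as this; the class and method names here are illustrative, not the notebook's:

    ```python
    class SimpleAgent:
        """Illustrative simplified agent: stores configuration once and
        hands out sessions that manage multi-turn state internally."""

        def __init__(self, client, model, instructions, tools=None):
            self.client = client        # OpenAI-compatible client
            self.model = model
            self.instructions = instructions
            self.tools = tools or []

        def new_session(self):
            return SimpleSession(self)


    class SimpleSession:
        """One conversation; threads previous_response_id between turns."""

        def __init__(self, agent):
            self.agent = agent
            self._last_id = None

        def turn(self, user_message):
            response = self.agent.client.responses.create(
                model=self.agent.model,
                instructions=self.agent.instructions,
                input=user_message,
                tools=self.agent.tools,
                previous_response_id=self._last_id,
            )
            self._last_id = response.id
            return response.output_text
    ```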

    This pattern provides a clean migration path when:

    • Your legacy code is not so extensive that a modest rewrite is infeasible, but you still want to keep your existing structure mostly intact.
    • You want your own implementation of your agent object so you can extend and enhance it over time.
    • You would prefer to start from a more elegant object structure than the emulator approach above provides.

    Advanced patterns: Beyond basic migration

    Migrating to the OpenAI-compatible Responses API involves more than maintaining existing functionality. It is an opportunity to implement more sophisticated agent patterns that were cumbersome with the legacy APIs.

    Human-in-the-loop tool approval

    The Responses API makes it straightforward to implement tool approval workflows where agents must get permission before executing sensitive operations. The notebook's Human-in-the-loop tool approval section demonstrates a SimpleExampleAgentWithApproval class that intercepts tool calls, presents them to users for review, and only proceeds with execution after receiving explicit approval.

    This pattern is essential for production systems where certain actions, such as database modifications or external API calls, require human oversight. The legacy APIs made this cumbersome to implement, but with the Responses API, it becomes a natural extension of the standard flow.
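
    In outline, the approval loop can be written directly against the Responses API's function-calling flow. This sketch assumes client-executed function tools and a console prompt for approval; tool_impls is a hypothetical mapping from tool names to local Python callables:

    ```python
    import json

    def run_with_approval(client, model, user_input, tools, tool_impls):
        """Pause on each function call and ask a human before executing it."""
        response = client.responses.create(model=model, input=user_input, tools=tools)
        while True:
            calls = [item for item in response.output if item.type == "function_call"]
            if not calls:
                return response.output_text  # no more tool calls; final answer
            outputs = []
            for call in calls:
                prompt = f"Allow tool call {call.name}({call.arguments})? [y/N] "
                if input(prompt).strip().lower() == "y":
                    result = tool_impls[call.name](**json.loads(call.arguments))
                else:
                    result = "The user declined this tool call."
                outputs.append({
                    "type": "function_call_output",
                    "call_id": call.call_id,
                    "output": str(result),
                })
            # Send the (approved or declined) results back for the next step.
            response = client.responses.create(
                model=model,
                previous_response_id=response.id,
                input=outputs,
                tools=tools,
            )
    ```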

    Model safety

    The Model Safety section in the notebook presents options for using "guardrail" models: models that detect problematic language such as profanity or endorsement of criminal behavior. Such models can be useful for screening problematic user requests, model responses, or both.

    The notebook provides some simple example classes that illustrate how to wrap guardrail capabilities as agent objects. There are many other good sources of information about model guardrail technology, such as Meta Llama Guard, NVIDIA NeMo Guardrails, and Guardrails AI.
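
    As one hedged illustration, a guardrail check can be a plain pre-flight call to a safety-tuned model before the main agent call runs, reusing the OpenAI-compatible client from the earlier sketch. The guard model name is a placeholder, and Llama Guard-style output formats vary, so adapt the parsing to the model you deploy:

    ```python
    def flagged_by_guardrail(client, text, guard_model="meta-llama/Llama-Guard-3-8B"):
        """Ask a guardrail model to classify text; Llama Guard-style models
        typically begin their reply with 'safe' or 'unsafe'."""
        result = client.chat.completions.create(
            model=guard_model,  # placeholder guard model
            messages=[{"role": "user", "content": text}],
        )
        return result.choices[0].message.content.strip().lower().startswith("unsafe")

    # Gate the main agent call on the guardrail verdict.
    question = "How do I pick a lock?"
    if flagged_by_guardrail(client, question):
        print("Request declined by safety policy.")
    else:
        response = client.responses.create(
            model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
            input=question,
        )
        print(response.output_text)
    ```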

    Reasoning and acting

    The notebook's ReAct section discusses the ReAct agent object in the Llama Stack Python client, its purpose, and how you can use it. It also explains why you might avoid it with newer models that have similar built-in capabilities. The notebook provides alternatives that use the Responses API, including a ReactAgent class that combines the capabilities of the Responses API with ReAct innovations.

    Authored multi-step workflows

    For complex tasks that benefit from explicit orchestration, the Responses API enables elegant multi-step workflows. The notebook's Authored multi-step agentic flows section shows how breaking a complex query into focused steps dramatically improves accuracy.

    For example, when searching for events at national parks, instead of asking the agent to do everything at once, you can:

    1. Retrieve a list of parks in a specific state.
    2. Use the list to systematically query events at each park.
    3. Synthesize the results into a comprehensive summary.

    Breaking complex tasks into focused, manageable steps like this improves accuracy, and it would have required extensive custom code with the legacy APIs.
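
    A sketch of that three-step flow, reusing the client from the earlier sketches, with a placeholder model and a commented-out placeholder tool:

    ```python
    MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model

    # Step 1: get a clean list of parks to iterate over.
    parks = client.responses.create(
        model=MODEL,
        input="List the national parks in Utah, one per line, names only.",
    )

    # Step 2: query events at each park with a focused prompt.
    summaries = []
    for park in parks.output_text.splitlines():
        park = park.strip()
        if not park:
            continue
        step = client.responses.create(
            model=MODEL,
            input=f"What events are happening at {park} this month?",
            # tools=[...],  # e.g., a hypothetical MCP events server
        )
        summaries.append(f"{park}: {step.output_text}")

    # Step 3: synthesize the focused answers into one summary.
    final = client.responses.create(
        model=MODEL,
        input="Summarize these park event listings:\n" + "\n".join(summaries),
    )
    print(final.output_text)
    ```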

    The notebook shows how to do this by calling the Responses API directly, but third-party frameworks like LangGraph and CrewAI are also useful for building more complex agentic workflows. You can implement your applications with those frameworks and then connect them to a Llama Stack server. The LangGraph + Llama Stack example agent and the CrewAI + Llama Stack example agent demonstrate this capability.

    Multi-process architectures

    The notebook's Multi-process architectures section discusses considerations for the legacy APIs, which stored agent configuration on the server. In contrast, the Responses API relies on the client to manage the configuration. This adds extra challenges for multi-process applications, where creating and consuming an agent configuration might happen in different processes.
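
    One simple way to bridge that gap is to persist the configuration yourself in shared storage so any process can reconstruct the same agent behavior. This sketch uses a JSON file, though a database or key-value store works the same way; the model name is a placeholder and the client is the OpenAI-compatible client from the earlier sketches:

    ```python
    import json

    # Process A: persist the configuration the server used to hold.
    config = {
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
        "instructions": "You are a national parks assistant.",
        "tools": [],
    }
    with open("agent_config.json", "w") as f:
        json.dump(config, f)

    # Process B: reload the configuration and apply it per request.
    with open("agent_config.json") as f:
        config = json.load(f)
    response = client.responses.create(
        model=config["model"],
        instructions=config["instructions"],
        input="What parks are near Moab?",
        tools=config["tools"],
    )
    print(response.output_text)
    ```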

    Your agents, evolved

    Migration from Llama Stack Agent APIs to the OpenAI-compatible Responses API is more than a technical upgrade. It is an opportunity to simplify your code, improve agent capabilities, and position your system for future innovations. Whether you choose a complete rewrite, build compatibility layers, or redesign your architecture, the Responses API provides a stable and maintainable foundation for agent development.

    The deprecation of the legacy APIs isn't a disruption; it's an evolution. Your agents will work better, with less code, fewer edge cases, and more sophisticated reasoning capabilities. The migration strategies covered in this post ensure you can transition at your own pace without sacrificing your existing investments.

    Ready to modernize your agents? Explore the complete examples in our companion notebook to see these migration strategies in action. Start with one agent, choose the approach that fits your needs, and experience firsthand how the Responses API transforms agent development from complex orchestration to elegant simplicity.

