Cedric Clyburn (@cedricclyburn), Senior Developer Advocate at Red Hat, is an enthusiastic software technologist with a background in Kubernetes, DevOps, and container tools. He has experience speaking at and organizing conferences including DevNexus, WeAreDevelopers, The Linux Foundation, KCD NYC, and more. Cedric loves all things open source and works to make developers' lives easier! He is based in New York.
Explore the benefits of using Kubernetes, Context7, and GitHub MCP servers to diagnose issues, access up-to-date documentation, and interact with repositories.
AI agents are where things get exciting! In this episode of The Llama Stack Tutorial, we dive into Agentic AI with Llama Stack, showing you how to give your LLM real-world capabilities like searching the web, pulling in data, and connecting to external APIs. You'll learn how agents are built from models, instructions, tools, and safety shields, and see live demos of the Agentic API, running local models, and extending functionality with Model Context Protocol (MCP) servers. Join Senior Developer Advocate Cedric Clyburn as we learn all things Llama Stack! Next episode: guardrails, evals, and more!
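The "model + instructions + tools" pattern the episode covers can be sketched in a few lines of plain Python. This is an illustrative toy, not the Llama Stack Agentic API itself: the model and tool here are hypothetical stand-ins showing how one agent turn dispatches a tool call and folds the result into the answer.

```python
# Toy agent loop: a model decides between answering directly or calling a tool.
def web_search(query: str) -> str:
    """Stand-in tool: a real agent would call a search API or an MCP server."""
    return f"Top result for '{query}'"

TOOLS = {"web_search": web_search}

def toy_model(prompt: str) -> dict:
    """Stand-in model: requests a tool when the question needs fresh data."""
    if "latest" in prompt.lower():
        return {"tool": "web_search", "args": {"query": prompt}}
    return {"answer": f"From training data: {prompt}"}

def run_agent(instructions: str, user_msg: str) -> str:
    """One agent turn: model -> optional tool call -> final answer."""
    step = toy_model(f"{instructions}\n{user_msg}")
    if "tool" in step:
        result = TOOLS[step["tool"]](**step["args"])
        return f"(via {step['tool']}) {result}"
    return step["answer"]

print(run_agent("You are a helpful assistant.", "What is the latest Llama release?"))
```

In the real Agentic API the same roles are played by a served model, your system instructions, registered tools or MCP servers, and safety shields around input and output.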
Building AI apps is one thing, but making them chat with your documents is next-level. In Part 3 of the Llama Stack Tutorial, we dive into Retrieval Augmented Generation (RAG), a pattern that lets your LLM reference external knowledge it wasn't trained on. Using the open-source Llama Stack project from Meta, you'll learn how to:
- Spin up a local Llama Stack server with Podman
- Create and ingest documents into a vector database
- Build a RAG agent that selectively retrieves context from your data
- Chat with real docs like PDFs, invoices, or project files, using Agentic RAG
By the end, you'll see how RAG brings your unique data into AI workflows, and how Llama Stack makes it easy to scale from local dev to production on Kubernetes.
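The core RAG step, retrieving the most relevant chunk and prepending it to the prompt, can be sketched without any infrastructure. This toy uses bag-of-words vectors and cosine similarity as a stand-in for a real embedding model and vector database (which is what the Llama Stack setup in the video provides); the documents are invented examples.

```python
# Minimal RAG retrieval: "embed" documents, find the best match for a
# query, and build an augmented prompt for the LLM.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Bag-of-words stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

DOCS = [
    "Invoice 1042: total due 300 USD, payable within 30 days.",
    "Project plan: migrate the service to Kubernetes in Q3.",
]
INDEX = [(doc, embed(doc)) for doc in DOCS]  # the "vector database"

def retrieve(query: str) -> str:
    """Return the stored document most similar to the query."""
    return max(INDEX, key=lambda pair: cosine(embed(query), pair[1]))[0]

def build_prompt(query: str) -> str:
    """Augment the user question with retrieved context before inference."""
    return f"Context: {retrieve(query)}\n\nQuestion: {query}"

print(build_prompt("What is the total due on the invoice?"))
```

An agentic RAG setup adds one refinement: the agent decides *whether* to retrieve at all, instead of always stuffing context into every prompt.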
Building AI applications is more than just running a model: you need a consistent way to connect inference, agents, storage, and safety features across different environments. That's where Llama Stack comes in. In this second episode of The Llama Stack Tutorial Series, Cedric (Developer Advocate @ Red Hat) walks through how to:
- Run Llama 3.2 (3B) locally and connect it to Llama Stack
- Use the Llama Stack server as the backbone for your AI applications
- Call REST APIs for inference, agents, vector databases, guardrails, and telemetry
- Test out a Python app that talks to Llama Stack for inference
By the end of the series, you'll see how Llama Stack gives developers a modular API layer that makes it easy to start building enterprise-ready generative AI applications, from local testing all the way to production. In the next episode, we'll use Llama Stack to chat with your own data (PDFs, websites, and images) with local models.
🔗 Explore More
Llama Stack GitHub: https://github.com/meta-llama/llama-stack
Docs: https://llama-stack.readthedocs.io
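A Python app talking to the Llama Stack server over REST can be sketched with just the standard library. This is a hedged sketch, not a definitive client: the base URL (8321 is a commonly used default port), the route name, the model ID, and the response field names are assumptions—check the Llama Stack docs for your server version.

```python
# Sketch of a REST chat-completion call to a local Llama Stack server.
# Port, route, model ID, and response shape are assumptions; verify
# against your server's docs before relying on them.
import json
import urllib.request

BASE_URL = "http://localhost:8321"  # assumed local Llama Stack server

def chat_request(model_id: str, user_msg: str) -> dict:
    """Build the JSON body for a chat-completion call."""
    return {
        "model_id": model_id,
        "messages": [{"role": "user", "content": user_msg}],
    }

def chat(model_id: str, user_msg: str) -> str:
    body = json.dumps(chat_request(model_id, user_msg)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/v1/inference/chat-completion",  # route name assumed
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["completion_message"]["content"]

if __name__ == "__main__":
    print(chat("meta-llama/Llama-3.2-3B-Instruct", "Say hello in one word."))
```

The same server exposes sibling routes for agents, vector databases, guardrails, and telemetry, which is what makes it the single backbone for the app.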
AI applications are moving fast—but building them at scale is hard. Local prototypes often don’t translate to production, and every environment seems to require a different setup. Llama Stack, an open-source framework from Meta, was created to bring consistency and modularity to generative AI applications. In this first episode of The Llama Stack Tutorial Series, Cedric (Developer Advocate @ Red Hat) explains what Llama Stack is, why it’s being compared to Kubernetes for the AI world, key building blocks, and future episodes that'll dive into real-world use cases with Llama Stack.
Explore More
Llama Stack Tutorial (what we'll be following during the series): https://rh-aiservices-bu.github.io/llama-stack-tutorial
Llama Stack GitHub: https://github.com/meta-llama/llama-stack
Docs: https://llama-stack.readthedocs.io
llm-d optimizes LLM inference at scale with disaggregated prefill/decode, smart caching, and Kubernetes-native architecture for production environments.
Learn about Podman AI Lab and how you can start using it today for testing and building AI-enabled applications. As an extension for Podman Desktop, the container and cloud-native tool for application developers and administrators, the AI Lab is your one-stop shop for popular generative AI use cases like summarizers, chatbots, and RAG applications. In addition, from the model catalog you can easily download and start AI models as local services on your machine. We'll cover all this and more, so be sure to try out Podman AI Lab today!
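Once a catalog model is running as a local service, it exposes an OpenAI-compatible chat endpoint on your machine. A minimal sketch of calling it, assuming the port shown in the AI Lab UI (the port below is a placeholder—AI Lab assigns one when you start the service):

```python
# Sketch of querying a model served locally by Podman AI Lab via its
# OpenAI-compatible endpoint. Replace the port with the one AI Lab shows.
import json
import urllib.request

ENDPOINT = "http://localhost:35000/v1/chat/completions"  # port is a placeholder

def build_payload(prompt: str) -> dict:
    """OpenAI-style chat body accepted by the local inference service."""
    return {"messages": [{"role": "user", "content": prompt}]}

def ask(prompt: str) -> str:
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Summarize what a RAG application does in one sentence."))
```

Because the endpoint follows the OpenAI wire format, most existing OpenAI client code can be pointed at the local service with only a base-URL change.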