How platform engineering accelerates enterprise AI adoption
Explore how platform engineering, OpenShift, and Developer Hub create a governed, repeatable, and scalable foundation for enterprise AI.
Building AI applications is more than just running a model: you need a consistent way to connect inference, agents, storage, and safety features across different environments. That's where Llama Stack comes in. In this second episode of The Llama Stack Tutorial Series, Cedric (Developer Advocate @ Red Hat) walks through how to:

- Run Llama 3.2 (3B) locally and connect it to Llama Stack
- Use the Llama Stack server as the backbone for your AI applications
- Call REST APIs for inference, agents, vector databases, guardrails, and telemetry
- Test out a Python app that talks to Llama Stack for inference (see the sketch after this description)

By the end of the series, you'll see how Llama Stack gives developers a modular API layer that makes it easy to start building enterprise-ready generative AI applications, from local testing all the way to production. In the next episode, we'll use Llama Stack to chat with your own data (PDFs, websites, and images) with local models.

🔗 Explore more:
- Llama Stack GitHub: https://github.com/meta-llama/llama-stack
- Docs: https://llama-stack.readthedocs.io
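As a taste of the Python app shown in the episode, here's a minimal sketch of calling Llama Stack's inference API with the llama-stack-client library. The base URL, port, and model id are assumptions; match them to however your local server and Llama 3.2 (3B) model are registered.

```python
# pip install llama-stack-client
from llama_stack_client import LlamaStackClient

# Assumed: a Llama Stack server running locally on port 8321
# with a Llama 3.2 3B Instruct model registered under this id.
client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.completion_message.content)
```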
Learn how to optimize PyTorch code with minimal effort using torch.compile, a just-in-time compiler that generates optimized kernels automatically.
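To make the idea concrete, here is a minimal sketch of wrapping a function with torch.compile (PyTorch 2.x); the function and tensor shapes are illustrative only.

```python
import torch
import torch.nn.functional as F

# A small pointwise function that torch.compile can fuse into
# fewer, faster kernels.
def gelu_scale(x: torch.Tensor) -> torch.Tensor:
    return F.gelu(x) * 2.0

# torch.compile returns a wrapped callable; compilation happens
# just-in-time on the first call, and later calls reuse the
# generated kernels.
compiled_gelu_scale = torch.compile(gelu_scale)

x = torch.randn(1024, 1024)
out = compiled_gelu_scale(x)  # first call triggers compilation
```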
Learn how a pattern engine plus a small LLM can perform production-grade failure analysis on low-cost hardware, slashing inference costs by over 99%.
AI applications are moving fast, but building them at scale is hard. Local prototypes often don't translate to production, and every environment seems to require a different setup. Llama Stack, an open-source framework from Meta, was created to bring consistency and modularity to generative AI applications. In this first episode of The Llama Stack Tutorial Series, Cedric (Developer Advocate @ Red Hat) explains what Llama Stack is, why it's being compared to Kubernetes for the AI world, its key building blocks, and the future episodes that will dive into real-world use cases with Llama Stack.

Explore more:
- Llama Stack Tutorial (what we'll be following during the series): https://rh-aiservices-bu.github.io/llama-stack-tutorial
- Llama Stack GitHub: https://github.com/meta-llama/llama-stack
- Docs: https://llama-stack.readthedocs.io
The rise of large language models (LLMs) has opened up exciting possibilities for developers looking to build intelligent applications. However, the process of adapting these models to specific use cases can be difficult, requiring deep expertise and substantial resources. In this talk, we'll introduce you to InstructLab, an open-source project that aims to make LLM tuning accessible to developers and data scientists of all skill levels, on consumer-grade hardware.
In this video, we'll explore how InstructLab's innovative approach combines collaborative knowledge curation, efficient data generation, and instruction training to enable developers to refine foundation models for specific use cases. Through a live demonstration, you'll learn how IBM Research has partnered with Red Hat to simplify the process of enhancing LLMs with new knowledge and skills for targeted applications. Join us to explore how InstructLab is making LLM tuning more accessible, empowering developers to harness the power of AI in their projects.
Headed to Devoxx Belgium? Visit the Red Hat Developer booth on-site to speak to our expert technologists.
Red Hat empowers developers. Wherever you are, whoever you are, it's your innovations that drive us to go bigger and build better, but we know there's only so much one developer can do. That's why it's our mission to bring you together, to create a community where you can learn new skills, get inspired, and create incredible ideas. We are here to empower you.
Welcome to Red Hat. Red Hat Developer brings developers together to learn from each other and create more extraordinary things, faster. We serve the builders: those who solve problems and create their careers with code. We chart a course for you, giving your career a path and your work purpose. We share what we know to help you solve problems once, build momentum together, and make the world better for all.
As GPU demand grows, idle time gets expensive. Learn how to efficiently manage AI workloads on OpenShift AI with Kueue and the custom metrics autoscaler.
Learn how to implement Llama Stack's built-in guardrails with Python, helping to improve the safety and performance of your LLM applications.
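For orientation, here's a rough sketch of what calling a built-in safety shield can look like with the llama-stack-client Python library; the shield id and server address are assumptions that depend on what you have registered on your Llama Stack server.

```python
from llama_stack_client import LlamaStackClient

# Assumed: a local Llama Stack server with a Llama Guard shield
# registered under the id below.
client = LlamaStackClient(base_url="http://localhost:8321")

result = client.safety.run_shield(
    shield_id="meta-llama/Llama-Guard-3-8B",
    messages=[{"role": "user", "content": "How do I pick a lock?"}],
    params={},
)

# A violation object is returned when the shield flags the input.
if result.violation:
    print("Blocked:", result.violation.user_message)
else:
    print("Input passed the guardrail.")
```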
LLM Compressor 0.7.0 brings Hadamard transforms for better accuracy, mixed-precision FP4/FP8, and calibration-free block quantization for efficient compression.
Gain familiarity with LLM tools by learning how to easily create them with Apache Camel. In this lab, you'll create an agent that uses live feeds to assist clients.
Explore AI integration in this DevNation Day Santiago 2024 deep dive.
Llama Stack offers an alternative to the OpenAI Responses API, enabling multi-step agents, RAG, and tool use on your own infrastructure with any model.
See how a custom MCP client for Docling transformed unstructured data into usable content, reducing document prep time by over 80%.
Go beyond performance and accuracy. This guide for technical practitioners details how to build trust, transparency, and safety into your AI workflows.
The Agent2Agent (A2A) protocol is an open standard enabling seamless communication between AI agents. Here are the key things to know before getting started.
llm-d optimizes LLM inference at scale with disaggregated prefill/decode, smart caching, and a Kubernetes-native architecture for production environments.
Learn how to optimize LLMs like Granite 3.3 for better performance and efficiency on a single server by using open source tools like LLM Compressor and vLLM.
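As a rough sketch of the LLM Compressor workflow (not necessarily the article's exact recipe), here's a one-shot GPTQ quantization of a Granite model to W4A16. The model id, dataset, and hyperparameters are illustrative, and on older llmcompressor releases oneshot is imported from llmcompressor.transformers instead.

```python
# pip install llmcompressor
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# Illustrative recipe: 4-bit weights, 16-bit activations, GPTQ,
# skipping the output head.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="ibm-granite/granite-3.3-8b-instruct",  # assumed model id
    dataset="open_platypus",                      # small calibration set
    recipe=recipe,
    output_dir="granite-3.3-8b-instruct-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
# The compressed checkpoint in output_dir can then be served with vLLM.
```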
Explore the evolving LLM post-training datasets, the various formats, and transformation process from structured datasets into token sequences.
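To illustrate the last step of that transformation, here's a small example that renders one structured chat sample into the token sequence a post-training pipeline would feed the model; the checkpoint is an assumption, and any chat-tuned tokenizer that ships a chat template would work.

```python
from transformers import AutoTokenizer

# Assumed checkpoint; substitute any model whose tokenizer
# includes a chat template.
tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.3-2b-instruct")

sample = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
]

# The template turns the structured sample into the flat string
# the model is actually trained on, then into token ids.
text = tokenizer.apply_chat_template(sample, tokenize=False)
ids = tokenizer.apply_chat_template(sample, tokenize=True)
print(text)       # templated string with role markers
print(ids[:20])   # first token ids of the training sequence
```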
Learn how to establish communication with vLLM using Apple's Foundation framework and Alamofire to handle low-level HTTP interactions in macOS and iOS applications.
Build a simple agentic AI server with the Model Context Protocol (MCP). This guide shows you how to create a Python-based weather tool.
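As a flavor of what the guide builds, here's a minimal sketch of a Python MCP server exposing one weather tool via the official mcp SDK's FastMCP helper; the tool returns canned data so the example stays self-contained.

```python
# pip install "mcp[cli]"
from mcp.server.fastmcp import FastMCP

# Name the server; MCP clients see this identifier.
mcp = FastMCP("weather")

@mcp.tool()
def get_forecast(city: str) -> str:
    """Return a short (stubbed) forecast for a city."""
    # A real tool would call a weather API here.
    return f"Forecast for {city}: sunny, 22°C."

if __name__ == "__main__":
    # The stdio transport lets an MCP-capable client launch this
    # server as a subprocess and exchange messages over stdin/stdout.
    mcp.run(transport="stdio")
```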
This article demonstrates how to implement GPU-based autoscaling on the Red Hat OpenShift platform using custom metrics autoscaler (KEDA) and Prometheus.
Learn how vLLM outperforms Ollama in high-performance production deployments, delivering significantly higher throughput and lower latency.