Red Hat AI

Red Hat empowers developers. Wherever you are, whoever you are, it's your innovations that drive us to go bigger and build better, but we know there's only so much one developer can do. That's why it's our mission to bring you together, to create a community where you can learn new skills, get inspired, and create incredible ideas. We are here to empower you.

Red Hat Developer brings developers together to learn from each other and create more extraordinary things, faster. We serve the builders: those who solve problems and create their careers with code. We chart a course for you, giving your career a path and your work purpose. We share what we know to help you solve problems once, build momentum together, and make the world better for all.

As GPU demand grows, idle time gets expensive. Learn how to efficiently manage AI workloads on OpenShift AI with Kueue and the custom metrics autoscaler.

Learn how to implement Llama Stack's built-in guardrails with Python, helping to improve the safety and performance of your LLM applications.

LLM Compressor 0.7.0 release recap

Dipika Sikka and 3 others

August 25, 2025

LLM Compressor 0.7.0 brings Hadamard transforms for better accuracy, mixed-precision FP4/FP8, and calibration-free block quantization for efficient compression.

Explore AI integration in this DevNation Day Santiago 2024 deep dive.

Llama Stack offers an alternative to the OpenAI Responses API, enabling multi-step agents, RAG, and tool use on your own infrastructure with any model.

See how a custom MCP client for Docling transformed unstructured data into usable content, reducing document prep time by over 80%.

Go beyond performance and accuracy. This guide for technical practitioners details how to build trust, transparency, and safety into your AI workflows.

The Agent2Agent (A2A) protocol is an open standard enabling seamless communication between AI agents. Here are the key things to know before getting started.

llm-d optimizes LLM inference at scale with disaggregated prefill/decode, smart caching, and Kubernetes-native architecture for production environments.

Learn how to optimize LLMs like Granite 3.3 for better performance and efficiency on a single server by using open source tools like LLM Compressor and vLLM.

Explore how LLM post-training datasets are evolving, the formats they use, and how structured datasets are transformed into token sequences.

Learn how to establish communication with vLLM using Apple Foundation and Alamofire for low-level HTTP interactions in macOS and iOS applications.

Build a simple agentic AI server with the Model Context Protocol (MCP). This guide shows you how to create a Python-based weather tool.

Learn how vLLM outperforms Ollama in high-performance production deployments, delivering significantly higher throughput and lower latency.

Learn how to perform large-scale, distributed batch inference on Red Hat OpenShift AI using the CodeFlare SDK with Ray Data and vLLM.

Enterprise-grade artificial intelligence and machine learning (AI/ML) for

This tutorial shows you how to use the Llama Stack API to implement retrieval-augmented generation for an AI application built with Python.

Headed to JavaZone 2025? Visit the Red Hat Developer booth on-site to speak with our expert technologists.

Tackle the AI/ML lifecycle with OpenShift AI. This guide helps you build adaptable, production-ready MLOps workflows, from data preparation to live inference.

Learn how to use the CodeFlare SDK to submit RayJobs to a remote Ray cluster in OpenShift AI.

Learn how PagedAttention solves the memory waste problem of traditional LLM systems by breaking the KV cache into small, on-demand blocks.

Learn how to deploy the Open Platform for Enterprise AI (OPEA) ChatQnA application on OpenShift with AMD Instinct hardware.

Learn about the advantages of prompt chaining and the ReAct framework compared to simpler agent architectures for complex tasks.

Red Hat empowers Developers

Welcome to Red Hat Developer

Optimize GPU utilization with Kueue and KEDA

Implement AI safeguards with Python and Llama Stack

LLM Compressor 0.7.0 release recap

Integrate your AI services and cloud-native applications into a single platform

Your agent, your rules: A deep dive into the Responses API with Llama Stack

How I built an agentic application for Docling with MCP

Building trustworthy AI: A developer's guide to production-ready systems

How to enhance Agent2Agent (A2A) security

Getting started with llm-d for distributed AI inference

Optimizing generative AI models with quantization

Introduction to supervised fine-tuning dataset formats

Integrate vLLM inference on macOS/iOS with Alamofire and Apple Foundation

How to build a simple agentic AI server with MCP

Ollama vs. vLLM: A deep dive into performance benchmarking

Batch inference on OpenShift AI with Ray Data, vLLM, and CodeFlare

Open source-powered AI/ML for the hybrid cloud

Retrieval-augmented generation with Llama Stack and Python

Red Hat at JavaZone 2025

From raw data to model serving with OpenShift AI

Submit remote RayJobs to a Ray cluster with the CodeFlare SDK

How PagedAttention resolves memory waste of LLM systems

Deploy ChatQnA on OpenShift with AMD Instinct

ReAct vs. naive prompt chaining on Llama Stack
