Red Hat AI

How to deploy OpenShift AI & Service Mesh 3 on one cluster
Learn how to overcome compatibility challenges when deploying OpenShift AI and OpenShift Service Mesh 3 on one cluster.

Exploring Llama Stack with Python: Tool calling and agents
Harness Llama Stack with Python for LLM development. Explore tool calling, agents, and Model Context Protocol (MCP) for versatile integrations.

AI meets containers: My first step into Podman AI Lab
This beginner's guide to Podman AI Lab walks through setting up Podman Desktop, installing the AI Lab extension, and launching your first RAG chatbot.

Ollama or vLLM? How to choose the right LLM serving tool for your use case
Ollama makes it easy for developers to get started with local model experimentation, while vLLM provides a path to reliable, efficient, and scalable deployment.

How to build a Model-as-a-Service platform
Learn how to build a Model-as-a-Service platform with this simple demo. (Part 3 of 4)

Our top 10 articles of 2025 (so far)
Catch up on the most popular articles published on Red Hat Developer this year. Get insights on Linux, AI, Argo CD, virtualization, GCC 15, and more.

Supercharging AI isolation: microVMs with RamaLama & libkrun
Learn how RamaLama's integration with libkrun and microVMs enhances AI model isolation, security, and resource efficiency for deployments.

Fly Eagle(3) fly: Faster inference with vLLM & speculative decoding
Boost inference performance by up to 2.5X with vLLM's Eagle 3 speculative decoding integration. Discover how in this blog post.

Red Hat at WAD 2025 | Labs

Why Models-as-a-Service architecture is ideal for AI models
Explore the architecture of a Models-as-a-Service (MaaS) platform and how enterprises can create a secure and scalable environment for AI models. (Part 2 of 4)

Integrate vLLM inference on macOS/iOS using OpenAI APIs
Discover how to communicate with vLLM using the OpenAI spec as implemented by the SwiftOpenAI and MacPaw/OpenAI open source projects.

The hidden cost of large language models
Discover how model compression slashes LLM deployment costs for technical practitioners, covering quantization, pruning, distillation, and speculative decoding.

6 benefits of Models-as-a-Service for enterprises
This article introduces Models-as-a-Service (MaaS) for enterprises, outlining the challenges, benefits, key technologies, and workflows. (Part 1 of 4)

GuideLLM: Evaluate LLM deployments for real-world inference
Learn how to evaluate the performance of your LLM deployments with the open source GuideLLM toolkit to optimize cost, reliability, and user experience.

Unleashing multimodal magic with RamaLama
RamaLama's new multimodal feature integrates vision-language models with containers. Discover how it helps developers download and serve multimodal AI models.

Integrate Red Hat AI Inference Server & LangChain in agentic workflows
Integrate Red Hat AI Inference Server with LangChain to build agentic document processing workflows. This article presents a use case and Python code.

Repo at Red Hat Summit 2025
Explore Red Hat Summit 2025 with Dan Russo and Repo, the Red Hat Developer mascot!

Axolotl meets LLM Compressor: Fast, sparse, open
Discover how to deploy compressed, fine-tuned models for efficient inference with the new Axolotl and LLM Compressor integration.

How to run vLLM on CPUs with OpenShift for GPU-free inference
Learn how to run vLLM on CPUs with OpenShift using Kubernetes APIs and dive into performance experiments for LLM benchmarking in this beginner-friendly guide.

How Kafka improves agentic AI
Discover why Kafka is the foundation behind modular, scalable, and controllable AI automation.

How to use service mesh to improve AI model security
Learn how to secure, observe, and control AI models at scale without code changes to simplify zero-trust deployments by using service mesh.

How to implement observability with Node.js and Llama Stack
Enhance your Node.js AI applications with distributed tracing. Discover how to use Jaeger and OpenTelemetry for insights into Llama Stack interactions.

Speech-to-text with Whisper and Red Hat AI Inference Server
Learn how to deploy a Whisper model on Red Hat AI Inference Server within a RHEL 9 environment using Podman containers and NVIDIA GPUs for speech recognition.

Integrate vLLM inference on macOS/iOS with Llama Stack APIs
Learn to build a chatbot leveraging vLLM for generative AI inference. This guide provides source code and steps to connect to a Llama Stack Swift SDK server.

Optimize model serving at the edge with RawDeployment mode
Deploy AI at the edge with Red Hat OpenShift AI. Learn to set up OpenShift AI, configure storage, train models, and serve using KServe's RawDeployment.
