Red Hat AI Inference Server

Featured image for vLLM inference article.
Article

Run Gemma 4 with Red Hat AI on Day 0: A step-by-step guide

Saša Zelenović +4

Learn how to deploy and experiment with Gemma 4, the latest open model family from Google DeepMind. This guide covers text, image, and video inputs, the Mixture-of-Experts architecture, and more. Get started with Red Hat AI Inference Server today.

ai-ml
Article

Vibes, specs, skills, and agents: The four pillars of AI coding

Rich Naszcyniec

Explore the four pillars of AI coding: vibes, specs, skills, and agents, and learn how they can improve code quality and reduce the encoding/decoding gap. Discover the benefits of a spec-driven approach and the importance of modular specs and skills in achieving harmony.

Featured image for vLLM inference article.
Article

Integrate Claude Code with Red Hat AI Inference Server on OpenShift

Alexander Barbosa Ayala

Learn how to integrate Anthropic's Claude Code, an agentic coding tool, with Red Hat AI Inference Server on OpenShift. Keep the inference process private on your own infrastructure while retaining the full Claude Code workflow.

Featured image for Red Hat OpenShift AI.
Article

Run Model-as-a-Service for multiple LLMs on OpenShift

Vladimir Belousov

Learn how to deploy multiple large language models (LLMs) behind a single OpenAI-compatible endpoint on OpenShift using a Model-as-a-Service (MaaS) approach. This guide demonstrates how to build an intelligent routing infrastructure that dynamically inspects the request payload and directs traffic based on the specified model field, reducing GPU waste and simplifying application logic.
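The routing idea described above can be sketched in a few lines of plain Python: parse the incoming OpenAI-style request body, read its "model" field, and pick the matching backend. The model names and backend URLs below are illustrative assumptions, not the article's actual configuration.

```python
import json

# Hypothetical model-to-backend map; names and URLs are illustrative only.
MODEL_BACKENDS = {
    "granite-3.1-8b-instruct": "http://granite-predictor:8000",
    "mistral-7b-instruct": "http://mistral-predictor:8000",
}

def route_request(raw_body: bytes) -> str:
    """Return the backend base URL for an OpenAI-style chat request.

    Mirrors the routing pattern: inspect the JSON payload, read the
    "model" field, and direct traffic to the matching model server.
    """
    payload = json.loads(raw_body)
    model = payload.get("model")
    if model not in MODEL_BACKENDS:
        raise ValueError(f"unknown model: {model!r}")
    return MODEL_BACKENDS[model]
```

In a real deployment this logic would live in a gateway or Envoy filter in front of the model servers, so applications keep a single endpoint and only vary the "model" field.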

Featured image for Red Hat OpenShift AI.
Article

Hybrid loan-decisioning with OpenShift AI and Vertex AI

Harshil Sabhnani

Discover a practical solution pattern for building a modern financial application that makes loan decisions using multiple machine learning systems deployed across hybrid environments.

LLM Compressor v0.10.0 is here
Article

LLM Compressor v0.10: Faster compression with distributed GPTQ

Kyle Sayers +2

LLM Compressor v0.10 introduces Distributed Data Parallel (DDP) support for faster compression, along with improved memory management and advanced quantization formats, making model compression workflows more efficient for large language models.

Red Hat AI
Article

Configure NVIDIA Blackwell GPUs for Red Hat AI workloads

Erwan Gallen +4

Learn how to enable the NVIDIA RTX PRO 4500 Blackwell Server Edition on Red Hat AI for compact, power-efficient AI deployments. This hardware offers strong inference performance without adding unnecessary operational complexity for Red Hat AI users.

Featured image for vLLM inference article.
Article

5 steps to triage vLLM performance

David Whyte-Gray +3

Learn how to improve the performance of your vLLM deployments with a diagnostic workflow that isolates latency issues and server saturation. Discover the key metrics to monitor and techniques to alleviate memory pressure.
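A first step when separating latency issues from server saturation is Little's law, which relates throughput and latency to the number of requests in flight. The sketch below is a generic back-of-envelope check, not the article's diagnostic workflow; the 0.9 threshold is an illustrative assumption.

```python
def effective_concurrency(throughput_rps: float, avg_latency_s: float) -> float:
    """Little's law: requests in flight = arrival rate x time in system."""
    return throughput_rps * avg_latency_s

def is_saturated(throughput_rps: float, avg_latency_s: float,
                 max_concurrency: int) -> bool:
    # Illustrative heuristic: once effective concurrency approaches the
    # server's concurrency limit, extra load adds queueing delay rather
    # than throughput, so latency climbs while throughput plateaus.
    return effective_concurrency(throughput_rps, avg_latency_s) >= 0.9 * max_concurrency
```

For example, a server doing 10 requests/s at 2 s average latency is holding about 20 requests in flight; if its batching limit is near 20, it is saturated and needs more capacity, not lower-latency tuning.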

Red Hat AI
Article

Estimate GPU memory for LLM fine-tuning with Red Hat AI

Mohib Azam

Learn how to estimate memory requirements for your LLM fine-tuning experiments using Red Hat Training Hub's memory_estimator.py API. This guide covers the memory components, adjusting training setups for specific GPU specifications, and using the memory estimator in your code. Streamline your model fine-tuning process with runtime estimates and automated hyperparameter suggestions.
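Independent of any particular tool, the core of such an estimate is a standard back-of-envelope rule: full fine-tuning with Adam in mixed precision needs roughly 16 bytes per parameter (2 for weights, 2 for gradients, and about 12 for the fp32 master copy plus the two Adam moments), before activations. The function below is a generic sketch of that rule; it is not the memory_estimator.py API.

```python
def estimate_finetune_memory_gb(num_params: float,
                                bytes_weights: int = 2,    # bf16 weights
                                bytes_grads: int = 2,      # bf16 gradients
                                bytes_optimizer: int = 12  # fp32 master copy + Adam m and v
                                ) -> float:
    """Back-of-envelope GPU memory for full fine-tuning with Adam,
    excluding activations and framework overhead."""
    total_bytes = num_params * (bytes_weights + bytes_grads + bytes_optimizer)
    return total_bytes / 1024**3
```

By this rule a 7B-parameter model needs on the order of 104 GB for optimizer state, gradients, and weights alone, which is why techniques like LoRA or optimizer sharding are typically required on single-GPU setups.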

Featured image for Red Hat OpenShift AI.
Article

Serve and benchmark Prithvi models with vLLM on OpenShift

Michele Gazzetti +3

Learn how to deploy and test an Earth and space model inference service on Red Hat AI Inference Server and Red Hat OpenShift AI. This article includes two self-contained activities: one deploying Prithvi using a traditional Deployment object, and the other serving the model using KServe while observing Knative autoscaling.

ai-ml
Article

Optimize PyTorch training with the autograd engine

Vishal Goyal

Understand the PyTorch autograd engine internals to debug gradient flows. Learn about computational graphs, saved tensors, and performance optimization techniques.
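The concepts the article covers can be illustrated with a toy scalar autograd engine: each operation builds a graph node, saves the input values its backward pass will need (PyTorch's "saved tensors"), and records a backward function (analogous to grad_fn). This is a deliberate simplification for intuition, not PyTorch's actual implementation.

```python
class Scalar:
    """Toy scalar with reverse-mode autodiff."""
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self._parents = parents      # edges of the computational graph
        self._backward_fn = None     # analogous to grad_fn in PyTorch

    def __mul__(self, other):
        out = Scalar(self.value * other.value, parents=(self, other))
        def backward_fn(grad_out):
            # The closure keeps self.value and other.value alive,
            # like autograd's saved tensors.
            self.grad += grad_out * other.value
            other.grad += grad_out * self.value
        out._backward_fn = backward_fn
        return out

    def __add__(self, other):
        out = Scalar(self.value + other.value, parents=(self, other))
        def backward_fn(grad_out):
            self.grad += grad_out
            other.grad += grad_out
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically order the graph, then propagate gradients back.
        order, seen = [], set()
        def visit(node):
            if node not in seen:
                seen.add(node)
                for p in node._parents:
                    visit(p)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn:
                node._backward_fn(node.grad)
```

For z = x*y + x with x = 3 and y = 4, calling z.backward() yields dz/dx = y + 1 = 5 and dz/dy = x = 3, accumulated into .grad just as PyTorch does.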

Featured image for vLLM inference article.
Article

Practical strategies for vLLM performance tuning

Trevor Royer

Optimize vLLM performance with practical tuning tips. Learn how to use GuideLLM for benchmarking, adjust GPU ratios, and maximize KV cache to improve throughput.
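The "maximize KV cache" advice comes down to arithmetic: the memory vLLM is allowed to use (its --gpu-memory-utilization fraction) minus the model weights is the KV cache budget, and each token costs 2 (K and V) x layers x KV heads x head dimension x dtype bytes. The calculator below is a rough sketch of that relationship; the example model shape is an assumption, and real deployments also reserve memory for activations and CUDA graphs.

```python
def kv_cache_tokens(gpu_mem_gb: float,
                    gpu_memory_utilization: float,  # vLLM's --gpu-memory-utilization
                    model_weights_gb: float,
                    num_layers: int,
                    num_kv_heads: int,
                    head_dim: int,
                    dtype_bytes: int = 2) -> int:
    """Rough count of tokens the KV cache can hold: memory vLLM may
    use, minus weights, divided by the per-token KV size."""
    budget_gb = gpu_mem_gb * gpu_memory_utilization - model_weights_gb
    per_token_bytes = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return int(budget_gb * 1024**3 / per_token_bytes)
```

With an assumed 8B-class model (32 layers, 8 KV heads, head dimension 128, fp16 cache) on an 80 GB GPU at 0.9 utilization with 16 GB of weights, the cache holds roughly 459K tokens; raising utilization or shrinking weights (e.g., via quantization) directly buys more concurrent context.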

Backstage learning path components feature image
Page

How to learn AI with Red Hat

Whether you're just getting started with artificial intelligence or looking to deepen your knowledge, our hands-on tutorials will help you unlock the potential of AI while leveraging Red Hat's enterprise-grade solutions.

Better front-end Developer Experience
Article

Our top articles for developers in 2025

Colleen Lobner

Take a look back at Red Hat Developer's most popular articles of 2025, covering AI coding practices, agentic systems, advanced Linux networking, and more.