AI inference

Article

Running AI inference on Rebellions ATOM NPU with Red Hat AI

Erwan Gallen and two co-authors

Learn how to deploy and serve large language models (LLMs) on Rebellions ATOM NPUs using Red Hat OpenShift AI and a certified vLLM container image on the Red Hat AI Inference Server. This post walks through setting up the joint Red Hat and Rebellions solution: installing the Node Feature Discovery operator and the Rebellions NPU operator, creating the ATOM hardware profile in OpenShift AI, and creating the vLLM RBLN ServingRuntime.

Article

Build an enterprise RAG system with OGX

Abdelhamid Soliman

Learn how to transform a simple chatbot into an enterprise RAG application by applying metadata filtering, hybrid search, and neural reranking using the OGX framework in Red Hat OpenShift AI.

Article

How to route external and local LLMs with Models-as-a-Service

Edward Arthur Quarm Jnr

Discover how the Models-as-a-Service (MaaS) capability in Red Hat OpenShift AI 3.4 streamlines AI inference by acting as an integrated AI gateway within the platform. It provides centralized governance and routes requests to both self-hosted models and external providers.

Article

Combining KServe and llm-d for optimized generative AI inference

Ran Pollak and one co-author

Learn how to combine KServe and llm-d to optimize generative AI inference, improve performance, and reduce infrastructure costs. This article demonstrates the integration architecture and provides practical guidance for AI platform teams.