Red Hat AI

Article

Intelligent inference scheduling with llm-d on Red Hat AI

Edoardo Vacchi +1

Learn how llm-d routes each inference request to the GPU that already has the relevant data cached, cutting time to first token and doubling throughput without changing hardware. Discover how Red Hat's stack packages this into a single Kubernetes resource.

Red Hat AI
Article

Bring your own evaluation framework to EvalHub

William Caban Babilonia +2

Learn how to onboard a custom evaluation framework into EvalHub using one class, one method, and a container image. This guide covers the contract, data structures, and a complete minimal adapter.

Red Hat AI
Article

Understanding evaluation collections in EvalHub

William Caban Babilonia +2

Learn how to read an existing system collection, understand its threshold logic, and build your own collection that encodes your actual measurement strategy with thresholds that mean something.

Article

Speculators v0.5.0: DFlash support and online training

Helen Zhao +2

Speculators v0.5.0 introduces DFlash support, enabling single-pass draft token generation with block diffusion for more efficient speculative decoding workflows. The release also adds unified online and offline training through vLLM’s native hidden states extraction system, improving training flexibility, version stability, and production readiness.

ai-ml
Article

Evaluation-driven development with EvalHub

William Caban Babilonia +1

Learn how evaluation-driven development (EDD) turns AI optimization from an art into an engineering discipline with EvalHub.

Article

Build an enterprise RAG system with OGX

Abdelhamid Soliman

Learn how to transform a simple chatbot into an enterprise RAG application by applying metadata filtering, hybrid search, and neural reranking using the OGX framework in Red Hat OpenShift AI.

Article

How Kagenti ADK simplifies production AI agent management

Legare Kerrison

Learn how Kagenti ADK, an open source toolkit, handles the complexities of managing production AI agents. It aligns with the Linux Foundation's Agent2Agent (A2A) protocol and provides a set of runtime services for easier deployment and operation.

Article

Beyond the next token: Why diffusion LLMs are changing the game

Alon Kellner +1

This article discusses the benefits of diffusion LLMs, a revolutionary approach to language models that offers a dynamic tradeoff between accuracy and performance. It covers the architecture, evolution, and real-world statistics of this technology, with examples of open source models such as LLaDA 2.X and Mercury 2.

Article

Combining KServe and llm-d for optimized generative AI inference

Ran Pollak +1

Learn how to combine KServe and llm-d to optimize generative AI inference, improve performance, and reduce infrastructure costs. This article demonstrates the integration architecture and provides practical guidance for AI platform teams.