Build a multi-agent supervisor pattern on Red Hat AI
Learn how to isolate AI agents using the supervisor pattern and OpenShell sandboxes. Protect credentials and limit blast radius during incident response.
Explore a demo of serving a multimodal model (Qwen3-Omni) with vLLM-Omni on a single hardware accelerator.
Learn how to optimize deployment of vLLM for various traffic shapes, including high-concurrency chat, long-context RAG, high-throughput batch, and distributed AI-grid.
Learn about the three optimization levers for distributed AI inference: prefill/decode disaggregation, KV cache strategy, and speculative decoding.
Learn how Red Hat's SastAI initiative, in collaboration with NVIDIA, automates false positive identification in static application security testing (SAST) using generative AI. By employing an agentic, multi-stage research workflow, SastAI reduces noise and improves triage efficiency. Discover the pattern harvesting methodology that strengthens SastAI's knowledge and reasoning.
Learn how to connect the EvalHub runtime to internal or external model servers using service account tokens, API keys, or custom certificates.
Learn about the five-dimensional design space in modern LLM serving, including tensor, pipeline, expert, data, and context parallelism.
Look inside Red Hat AI Inference on Amazon EKS to understand its core architectural components and Kubernetes resources.
Discover how to use EvalHub and OCI persistence to make your AI evaluation results immutable, content-addressable, and fully auditable.
Explore the mechanics of gradient synchronization in PyTorch distributed training, focusing on MPI primitives like All-Reduce and core techniques like pipeline parallelism, tensor parallelism, and sharded data parallelism.
Learn how speculative decoding can improve the performance of large language models (LLMs) in production by using a small, fast model to generate tokens speculatively and a large model to verify them.
Learn how to use the EvalHub CLI to automate AI evaluations in your CI/CD pipelines. Install the SDK, configure profiles, and set up a production gate.
Learn how llm-d routes each inference request to the GPU that already has the relevant data cached, cutting time-to-first-token and doubling throughput without changing hardware. Discover how Red Hat's stack packages this into a single Kubernetes resource.
Learn how to onboard a custom evaluation framework into EvalHub using one class, one method, and a container image. This guide covers the contract, data structures, and a complete minimal adapter.
Headed to WeAreDevelopers World Congress Europe 2026? Visit the Red Hat Developer booth on-site to speak to our expert technologists.
Learn how to read an existing system collection, understand its threshold logic, and build your own collection that encodes your actual measurement strategy with thresholds that mean something.
Speculators v0.5.0 introduces DFlash support, enabling single-pass draft token generation with block diffusion for more efficient speculative decoding workflows. The release also adds unified online and offline training through vLLM’s native hidden states extraction system, improving training flexibility, version stability, and production readiness.
Red Hat and DeepLearning.AI have released a free hands-on course on the full LLM
Learn how to use Red Hat OpenShift AI's reusable components to build modular AI pipelines, speed up development, and focus on what differentiates your applications.
Learn how evaluation-driven development (EDD) turns AI optimization from an art into an engineering discipline with EvalHub.
Learn about LogAn, an open source tool designed to overcome the limitations of using LLMs to analyze massive volumes of production logs.
A backend that depends on Llama Stack, or on any rapidly evolving upstream project, faces a version-drift problem. Explore our no-cost solution that provides early warnings.
Learn how an expert red-teamed infrastructure using Red Hat AI, OpenClaw, and abliterated models on Red Hat OpenShift on IBM Cloud.
Learn how to transform a simple chatbot into an enterprise RAG application by applying metadata filtering, hybrid search, and neural reranking using the OGX framework in Red Hat OpenShift AI.
Learn how to prevent GPU waste and financial loss by implementing just-in-time (JIT) checkpointing with Kubeflow Training SDK on OpenShift AI.