
Profiling vLLM Inference Server with GPU acceleration on RHEL
Get a comprehensive guide to profiling a vLLM inference server on a Red Hat Enterprise Linux system equipped with NVIDIA GPUs.
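As a rough illustration of what such profiling can look like, the minimal sketch below wraps a short vLLM generation run in PyTorch's built-in profiler and prints the most expensive GPU operations. It assumes vLLM and a CUDA-capable NVIDIA GPU are available on the RHEL host; the model name facebook/opt-125m is only a small placeholder, and the guide itself may rely on different tooling, such as NVIDIA Nsight Systems or vLLM's own profiling hooks.

```python
# Minimal sketch (not necessarily the guide's exact method): profile a short
# vLLM generation run with torch.profiler on an NVIDIA GPU.
from torch.profiler import profile, ProfilerActivity
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")      # small placeholder model
params = SamplingParams(max_tokens=32)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    llm.generate(["Profiling vLLM on RHEL with an NVIDIA GPU"], params)

# Show the operations that spent the most time on the GPU.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```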
Maximize the return on your investment in GPU hardware by pairing it with the appropriate network infrastructure for high-performance distributed training on OpenShift.
This learning path explores running AI models, specifically large language models (LLMs).
Learn how to use the OpenShift dynamic accelerator slicer with NVIDIA MIG to split large GPUs into smaller, just-in-time resources.
Move larger models from code to production faster with an enterprise-grade
Learn how to set up insights-mcp, a lightweight, self-hosted MCP server that lets you connect LLM-based agents to your existing workflows.
Learn how to integrate incident detection with OpenShift Lightspeed, the AI-powered virtual assistant for Red Hat OpenShift.
Learn how to scale machine learning operations (MLOps) with an assembly line approach using configuration-driven pipelines, versioned artifacts, and GitOps.
Explore key updates in Red Hat Ansible Automation Platform 2.6, including the self-service automation portal and Ansible Lightspeed intelligent assistant.
The LLM Compressor 0.8.0 release introduces quantization workflow enhancements, extended support for Qwen3 models, and improved accuracy recovery.
Learn how llm-d's KV cache aware routing reduces latency and improves throughput by directing requests to pods that already hold relevant context in GPU memory.
Learn how to deploy LLMs like Qwen3-Coder-30B-A3B-Instruct on less infrastructure using Red Hat AI Inference Server's LLM Compressor and OpenShift AI.
DeepSeek-V3.2-Exp delivers major long-context efficiency gains with Day 0 support in vLLM, and deploys easily on the latest leading hardware and Red Hat AI platforms.
Implement cost-effective LLM serving on OpenShift AI with this step-by-step guide to configuring KServe's Serverless mode for vLLM autoscaling.
Learn how to deploy Model Context Protocol (MCP) servers on OpenShift using ToolHive, a Kubernetes-native utility that simplifies MCP server management.
Welcome back to Red Hat Dan on Tech, where Senior Distinguished Engineer Dan Walsh dives deep on all things technical, from his expertise in container technologies with tools like Podman and Buildah, to runtimes, Kubernetes, AI, and SELinux! In this episode, Eric Curtin joins to discuss Sorcery AI, a new AI code review tool that helps find bugs, review PRs, and much more!
See how vLLM’s throughput and latency compare to llama.cpp's and discover which tool is right for your specific deployment needs on enterprise-grade hardware.
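For a sense of how such a comparison might be measured, here is a minimal throughput sketch on the vLLM side. It assumes vLLM is installed with a GPU available, uses facebook/opt-125m purely as a placeholder, and is not the benchmark methodology used in the article (llama.cpp would be timed separately with its own tooling).

```python
# Rough vLLM throughput sketch (illustrative only, not the article's benchmark).
import time
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")      # placeholder model
params = SamplingParams(max_tokens=128)
prompts = ["Summarize the benefits of model quantization."] * 8

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests to estimate tokens/second.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} generated tokens/sec over {len(prompts)} prompts")
```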
Deploy DialoGPT-small on OpenShift AI for internal model testing, with step-by-step instructions for setting up runtime, model storage, and inference services.
Walk through how to set up KServe autoscaling by leveraging the power of vLLM, KEDA, and the custom metrics autoscaler operator in Open Data Hub.
See how to use Cursor AI to migrate a Bash test suite to Python, including how to replace functions and create a new pytest suite.
Discover how llama.cpp API remoting brings AI inference to native speed on macOS, closing the gap between API remoting and native performance.
Learn how to leverage kernel live patching to keep your infrastructure updated and minimize the amount of manual work required.
AI agents are where things get exciting! In this episode of The Llama Stack Tutorial, we'll dive into Agentic AI with Llama Stack—showing you how to give your LLM real-world capabilities like searching the web, pulling in data, and connecting to external APIs. You'll learn how agents are built with models, instructions, tools, and safety shields, and see live demos of using the Agentic API, running local models, and extending functionality with Model Context Protocol (MCP) servers. Join Senior Developer Advocate Cedric Clyburn as we learn all things Llama Stack! Next episode? Guardrails, evals, and more!
Learn how to install Red Hat OpenShift AI to enable an on-premise inference service for Ansible Lightspeed in this step-by-step guide.
Discover the benefits of using Rust for building concurrent, scalable agentic systems, and learn how it addresses the GIL bottleneck in Python.
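As a hypothetical illustration of the GIL bottleneck that article refers to, the short Python sketch below shows CPU-bound work gaining little from threads, which is the kind of limitation a Rust-based agent runtime avoids.

```python
# Illustration of the GIL: CPU-bound work barely speeds up with Python threads,
# because only one thread executes Python bytecode at a time.
import time
from concurrent.futures import ThreadPoolExecutor

def busy(n: int) -> int:
    return sum(i * i for i in range(n))

N = 2_000_000

start = time.perf_counter()
for _ in range(4):
    busy(N)
serial = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(busy, [N] * 4))
threaded = time.perf_counter() - start

print(f"serial: {serial:.2f}s, 4 threads: {threaded:.2f}s (similar under the GIL)")
```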