Reduce LLM benchmarking costs with oversaturation detection
Oversaturation in LLM benchmarking can lead to wasted machine time and skewed performance metrics. Find out how one Red Hat team tackled the challenge.
Learn how to automatically transfer AI model metadata managed by OpenShift AI into Red Hat Developer Hub’s Software Catalog.
Integrate Red Hat OpenShift Lightspeed with a locally served large language model (LLM) for enhanced assistance within the OpenShift environment.
Explore the benefits of using Kubernetes, Context7, and GitHub MCP servers to diagnose issues, access up-to-date documentation, and interact with repositories.
Dive into LLM post-training methods, from supervised fine-tuning and continual learning to parameter-efficient and reinforcement learning approaches.
Learn how to deploy LLMs on Red Hat OpenShift AI for Ansible Lightspeed, enabling on-premises inference and optimizing resource utilization.
Discover the advantages of vLLM, an open source inference server that speeds up generative AI applications by making better use of GPU memory.
Celebrate our mascot Repo's first birthday with us as we look back on the events that shaped Red Hat Developer and the open source community from the past year.
Learn how to deploy multimodal AI models on edge devices using the RamaLama CLI, from pulling your first vision language model (VLM) to serving it via an API.
Discover SDG Hub, an open framework for building, composing, and scaling synthetic data pipelines for large language models.
Learn about the 5 common stages of the inference workflow, from initial setup to edge deployment, and how AI accelerator needs shift throughout the process.
Learn how to implement spec coding, a structured approach to AI-assisted development that combines human expertise with AI efficiency.
Get a comprehensive guide to profiling a vLLM inference server on a Red Hat Enterprise Linux system equipped with NVIDIA GPUs.
This learning path explores running AI models, specifically large language
Learn how to scale machine learning operations (MLOps) with an assembly line approach using configuration-driven pipelines, versioned artifacts, and GitOps.
The LLM Compressor 0.8.0 release introduces quantization workflow enhancements, extended support for Qwen3 models, and improved accuracy recovery.
Learn how llm-d's KV cache aware routing reduces latency and improves throughput by directing requests to pods that already hold relevant context in GPU memory.
Learn how to deploy LLMs like Qwen3-Coder-30B-A3B-Instruct on less infrastructure using Red Hat AI Inference Server's LLM Compressor and OpenShift AI.
DeepSeek-V3.2-Exp offers major long-context efficiency gains, with Day 0 support in vLLM and easy deployment on the latest hardware and Red Hat AI platforms.
Implement cost-effective LLM serving on OpenShift AI with this step-by-step guide to configuring KServe's Serverless mode for vLLM autoscaling.
Learn how to deploy Model Context Protocol (MCP) servers on OpenShift using ToolHive, a Kubernetes-native utility that simplifies MCP server management.
See how vLLM’s throughput and latency compare to llama.cpp's and discover which tool is right for your specific deployment needs on enterprise-grade hardware.
Integrate the Kubernetes MCP server with OpenShift and VS Code to give AI assistants a safe, intelligent way to interact with your clusters.