Improve vLLM Semantic Router accuracy with fine-tuning
Learn how we fine-tuned the vLLM Semantic Router's embedding model to reduce misrouting rates and improve routing accuracy in enterprise deployments.
Learn how our team implemented CI/CD pipelines for the it-self-service-agent AI quickstart and the benefits of using CI/CD for agentic systems.
Explore the spending transaction monitor AI quickstart, demonstrating agentic AI for intelligent financial monitoring on enterprise-grade infrastructure. Lower the barrier to entry for AI experimentation and refine your AI strategy.
Learn how to build reliable AI agents with our 8-stage evaluation framework. We explore DeepEval, multi-turn testing, and CI/CD integration for Red Hat AI.
Discover a practical solution pattern for building a modern financial application that makes loan decisions using multiple machine learning systems deployed across hybrid environments.
Learn how to run high-performance computing workloads managed by Slurm within a containerized OpenShift environment using the Slinky operator.
Learn how the Responses API in Llama Stack automates complex tool calling while maintaining granular control over conversation flow for AI agents. Discover the benefits and implementation details.
Learn how to estimate memory requirements for your LLM fine-tuning experiments using Red Hat Training Hub's memory_estimator.py API. This guide covers the memory components, adjusting training setups for specific GPU specifications, and using the memory estimator in your code. Streamline your model fine-tuning process with runtime estimates and automated hyperparameter suggestions.
Learn how to deploy and test an Earth and space model inference service on Red Hat AI Inference Server and Red Hat OpenShift AI. This article includes two self-contained activities, one deploying Prithvi using a traditional Deployment object and another serving the model using KServe and observing Knative scaling.
Understand the PyTorch autograd engine internals to debug gradient flows. Learn about computational graphs, saved tensors, and performance optimization techniques.
Optimize vLLM performance with practical tuning tips. Learn how to use GuideLLM for benchmarking, adjust GPU ratios, and maximize KV cache to improve throughput.
Learn how to fine-tune AI pipelines in Red Hat OpenShift AI 3.3. Use Kubeflow Trainer and modular components for reproducible, production-grade model tuning.
Explore big versus small prompting in AI agents. Learn how Red Hat's AI quickstart balances model capability, token costs, and task focus using LangGraph.
Deploy an enterprise-ready RAG chatbot using OpenShift AI. This quickstart automates provisioning of components like vector databases and ingestion pipelines.
Discover the self-service agent AI quickstart for automating IT processes on Red Hat OpenShift AI. Deploy, integrate with Slack and ServiceNow, and more.
Learn how to fine-tune a RAG model using Feast and Kubeflow Trainer. This guide covers preprocessing and scaling training on Red Hat OpenShift AI.
Learn how to implement retrieval-augmented generation (RAG) with Feast on Red Hat OpenShift AI to create highly efficient and intelligent retrieval systems.
Most log lines are noise. Learn how semantic anomaly detection filters out repetitive patterns—even repetitive errors—to surface the genuinely unusual events.
Optimize AI scheduling. Discover 3 workflows to automate RayCluster lifecycles using KubeRay and Kueue on Red Hat OpenShift AI 3.
Learn how to share an NVIDIA GPU with an OpenShift Local instance to run containerized workloads that require GPU acceleration without a dedicated server.
Learn how we built a simple, rules-based algorithm to detect oversaturation in LLM performance benchmarks, reducing costs by more than a factor of 2.
Enhance your Python AI applications with distributed tracing. Discover how to use Jaeger and OpenTelemetry for insights into Llama Stack interactions.
Learn how to implement Llama Stack's built-in guardrails with Python, helping to improve the safety and performance of your LLM applications.
Enterprise-grade artificial intelligence and machine learning (AI/ML) for