Christopher Nuland

Christopher Nuland's contributions

Learn how we fine-tuned the vLLM Semantic Router's embedding model to reduce misrouting rates and improve routing accuracy in enterprise deployments.

Learn how to set up vLLM Semantic Router locally with two models: a quantized Qwen3-Coder-Next running on Apple Silicon, and Google's Gemini 2.5 Pro as the cloud fallback. This router can significantly reduce token costs by routing common requests to a less expensive model.

Article

Introduction to distributed inference with llm-d

Christopher Nuland +1

November 21, 2025

Learn how the llm-d project enables distributed, efficient, and scalable LLM serving across Kubernetes clusters.

Learn how llm-d's KV cache aware routing reduces latency and improves throughput by directing requests to pods that already hold relevant context in GPU memory.

Improve vLLM Semantic Router accuracy with fine-tuning

Getting started with the vLLM Semantic Router project's Athena release: Optimize your tokens for agentic AI

Introduction to distributed inference with llm-d

Master KV cache aware routing with llm-d for efficient AI inference


RED HAT DEVELOPER
