Christopher Nuland
Christopher Nuland's contributions
Article
Improve vLLM Semantic Router accuracy with fine-tuning
Christopher Nuland
Learn how we fine-tuned the vLLM Semantic Router's embedding model to reduce misrouting rates and improve routing accuracy in enterprise deployments.
Article
Getting started with the vLLM Semantic Router project's Athena release: Optimize your tokens for agentic AI
Christopher Nuland
Learn how to set up vLLM Semantic Router locally with two models: a quantized Qwen3-Coder-Next running on Apple Silicon, and Google's Gemini 2.5 Pro as the cloud fallback. This router can significantly reduce token costs by routing common requests to a less expensive model.
Article
Introduction to distributed inference with llm-d
Christopher Nuland
+1
Learn how the llm-d project is revolutionizing LLM inference by enabling distributed, efficient, and scalable model serving across Kubernetes clusters.
Article
Master KV cache aware routing with llm-d for efficient AI inference
Christopher Nuland
+1
Learn how llm-d's KV cache-aware routing reduces latency and improves throughput by directing requests to pods that already hold relevant context in GPU memory.