Christopher Nuland
Christopher Nuland's contributions
Article
Getting started with the vLLM Semantic Router project's Athena release: Optimize your tokens for agentic AI
Christopher Nuland
Learn how to set up vLLM Semantic Router locally with two models: a quantized Qwen3-Coder-Next running on Apple Silicon, and Google's Gemini 2.5 Pro as the cloud fallback. This router can significantly reduce token costs by routing common requests to a less expensive model.
Article
Introduction to distributed inference with llm-d
Christopher Nuland and one co-author
Learn how the llm-d project is revolutionizing LLM inference by enabling distributed, efficient, and scalable model serving across Kubernetes clusters.
Article
Master KV cache aware routing with llm-d for efficient AI inference
Christopher Nuland and one co-author
Learn how llm-d's KV cache aware routing reduces latency and improves throughput by directing requests to pods that already hold relevant context in GPU memory.