Lucas Wilkinson
Lucas Wilkinson's contributions

Article
How we optimized vLLM for DeepSeek-R1
Michael Goin and 4 others
Explore inference performance improvements that help vLLM serve DeepSeek AI models more efficiently in this technical deep dive.

Article
Introducing Machete, a mixed-input GEMM kernel optimized for NVIDIA Hopper GPUs
Lucas Wilkinson
Machete, Neural Magic’s optimized kernel for NVIDIA Hopper GPUs, achieves 4x memory savings and faster LLM inference with mixed-input quantization in vLLM.

Article
How Marlin pushes the boundaries of mixed-precision LLM inference
Michael Goin and 1 other
Learn about Marlin, a mixed-precision matrix multiplication kernel that delivers a 4x speedup with FP16xINT4 computations for batch sizes up to 32.