Lucas Wilkinson

Lucas Wilkinson's contributions

How we optimized vLLM for DeepSeek-R1

Michael Goin +4

March 19, 2025

Explore inference performance improvements that help vLLM serve DeepSeek AI models more efficiently in this technical deep dive.

Machete, Neural Magic’s optimized kernel for NVIDIA Hopper GPUs, achieves 4x memory savings and faster LLM inference with mixed-input quantization in vLLM.

Learn about Marlin, a mixed-precision matrix multiplication kernel that delivers 4x speedup with FP16xINT4 computations for batch sizes up to 32.
