Michael Goin
Michael Goin's contributions
Article
Structured outputs in vLLM: Guiding AI responses
Michael Goin and 2 others
Learn how to control the responses of vLLM-served models with structured outputs. Discover how to define choice lists, JSON schemas, regex patterns, and more.
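For a quick taste of what the article covers, here is a minimal sketch of guided decoding against a vLLM OpenAI-compatible server; the port, model name, and choice list are placeholder assumptions, and a JSON schema or regular expression can be supplied the same way via guided_json or guided_regex.

from openai import OpenAI

# Assumes a vLLM server is already running, e.g. `vllm serve Qwen/Qwen2.5-7B-Instruct`.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Constrain the model to answer from a fixed choice list.
completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}],
    extra_body={"guided_choice": ["positive", "negative"]},
)
print(completion.choices[0].message.content)
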
Article
LLM Compressor: Optimize LLMs for low-latency deployments
Kyle Sayers and 3 others
LLM Compressor bridges the gap between model training and efficient deployment via quantization and sparsity, enabling cost-effective, low-latency inference.
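As a rough sketch of the workflow (not the article's exact recipe), one-shot W4A16 quantization with LLM Compressor looks roughly like the following; the model, calibration dataset, and output directory are placeholders, and import paths can differ between releases.

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot  # newer releases also expose llmcompressor.oneshot

# Quantize all Linear layers to 4-bit weights with 16-bit activations (W4A16).
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
    dataset="open_platypus",                     # placeholder calibration data
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-v1.0-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)

The saved checkpoint can then be loaded directly by vLLM for low-latency serving.
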
Article
How we optimized vLLM for DeepSeek-R1
Michael Goin and 4 others
Explore inference performance improvements that help vLLM serve DeepSeek AI models more efficiently in this technical deep dive.
Article
vLLM V1: Accelerating multimodal inference for large language models
Michael Goin and 3 others
Explore how vLLM's new multimodal AI inference capabilities enhance performance, scalability, and flexibility across diverse hardware platforms.
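For flavor, a small sketch of multimodal inference through vLLM's offline LLM API follows; the model name, prompt template, and image file are placeholder assumptions.

from vllm import LLM, SamplingParams
from PIL import Image

# Placeholder model; any vLLM-supported vision-language model works similarly.
llm = LLM(model="llava-hf/llava-1.5-7b-hf")

image = Image.open("cat.jpg")  # placeholder local image

outputs = llm.generate(
    {
        "prompt": "USER: <image>\nWhat is shown in this image? ASSISTANT:",
        "multi_modal_data": {"image": image},
    },
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
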
Article
Distributed inference with vLLM
Michael Goin
Explore how distributed inference works within vLLM in this recap of Neural Magic's vLLM Office Hours with Michael Goin and Murali Andoorveedu, a vLLM committer from CentML.
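As a minimal sketch of the idea, tensor parallelism in vLLM is enabled with a single argument; the model and GPU count below are placeholders, and pipeline parallelism can be layered on top for multi-node setups.

from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model
    tensor_parallel_size=4,                     # shard each layer's weights across 4 GPUs
)

outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
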
Article
vLLM brings FP8 inference to the open source community
Michael Goin and 5 others
Explore the integration of FP8 in vLLM. Learn how to achieve up to a 2x reduction in latency on NVIDIA GPUs with minimal accuracy degradation.
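As a sketch only (the article covers the details), dynamic FP8 quantization can be turned on with a single flag; the model below is a placeholder, and FP8-capable hardware such as NVIDIA Hopper or Ada GPUs is assumed.

from vllm import LLM, SamplingParams

# Quantize weights (and activations dynamically) to FP8 at load time.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", quantization="fp8")

outputs = llm.generate(["FP8 inference is useful because"],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
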
Article
How Marlin pushes the boundaries of mixed-precision LLM inference
Michael Goin and 1 other
Learn about Marlin, a mixed-precision matrix multiplication kernel that delivers a 4x speedup with FP16xINT4 computations for batch sizes up to 32.
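A back-of-the-envelope sketch of why roughly 4x is attainable at small batch sizes: the GEMM is dominated by reading the weight matrix, and INT4 weights move a quarter of the bytes of FP16 weights. The layer shape below is an arbitrary placeholder.

# Approximate memory traffic for one FP16 vs. FP16xINT4 linear layer.
batch, in_features, out_features = 16, 4096, 4096

fp16_weight_bytes = in_features * out_features * 2           # 2 bytes per FP16 weight
int4_weight_bytes = in_features * out_features // 2           # 0.5 bytes per INT4 weight
activation_bytes = batch * (in_features + out_features) * 2   # FP16 activations in/out

ratio = (fp16_weight_bytes + activation_bytes) / (int4_weight_bytes + activation_bytes)
print(f"approximate memory-traffic reduction: {ratio:.2f}x")  # ~3.9x for these shapes
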
Article
Sparse fine-tuning for accelerating large language models with DeepSparse
Robert Shaw and 1 other
Sparse fine-tuning in combination with sparsity-aware inference software, like DeepSparse, unlocks ubiquitous CPU hardware as a deployment target for LLM inference.
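As a loose sketch of the deployment side, running a sparse, quantized LLM on CPU looks roughly like this; the SparseZoo stub is a placeholder and the pipeline API details may differ between DeepSparse releases.

from deepsparse import TextGeneration

# Placeholder SparseZoo stub for a pruned and quantized model.
pipeline = TextGeneration(model="zoo:mpt-7b-dolly_mpt_pretrain-pruned50_quantized")

result = pipeline(prompt="Write a haiku about sparse neural networks.")
print(result.generations[0].text)
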
