Lucas Wilkinson
Lucas Wilkinson's contributions
Article
Run Gemma 4 with Red Hat AI on Day 0: A step-by-step guide
Saša Zelenović and 4 others
Learn how to deploy and experiment with Gemma 4, the latest open model family from Google DeepMind. This guide covers text, image, and video input, the Mixture-of-Experts architecture, and more. Get started with Red Hat AI Inference Server today.
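Red Hat AI Inference Server is built on vLLM, so a quick local experiment can use vLLM's offline Python API. A minimal sketch follows; the model ID is a placeholder assumption, not the checkpoint name from the guide.

# Minimal sketch: offline text generation with vLLM, which underpins
# Red Hat AI Inference Server. The model ID below is a hypothetical
# placeholder -- substitute the Gemma checkpoint referenced in the article.
from vllm import LLM, SamplingParams

llm = LLM(model="google/gemma-example-it")  # hypothetical checkpoint ID
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize what a Mixture-of-Experts model is."], params)
print(outputs[0].outputs[0].text)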
Article
How we optimized vLLM for DeepSeek-R1
Michael Goin and 4 others
Explore inference performance improvements that help vLLM serve DeepSeek AI models more efficiently in this technical deep dive.
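Because these optimizations land in vLLM's standard serving path, a DeepSeek-R1 deployment can be exercised with any OpenAI-compatible client. A minimal sketch, assuming a vLLM (or Red Hat AI Inference Server) instance is already serving the model on localhost:8000:

# Minimal sketch: query a locally running vLLM OpenAI-compatible server.
# Assumes the server was started separately with the DeepSeek-R1 model
# loaded and is listening on localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Explain multi-head latent attention briefly."}],
    max_tokens=256,
)
print(resp.choices[0].message.content)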
Article
Introducing Machete, a mixed-input GEMM kernel optimized for NVIDIA Hopper GPUs
Lucas Wilkinson
Machete, Neural Magic’s optimized kernel for NVIDIA Hopper GPUs, achieves 4x memory savings and faster LLM inference with mixed-input quantization in vLLM.
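Machete is not called directly; it is one of the kernels vLLM can dispatch to when running 4-bit-weight, 16-bit-activation (w4a16) checkpoints on Hopper GPUs. A minimal sketch, assuming a w4a16-quantized checkpoint; the exact model ID and whether Machete is selected depend on your hardware and vLLM version.

# Minimal sketch: load a w4a16 (mixed-input) quantized model in vLLM.
# On Hopper GPUs, recent vLLM versions can route these GEMMs to Machete.
# The model ID below is an assumed example of a w4a16 checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16")
out = llm.generate(["What does mixed-input GEMM mean?"],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)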
Article
How Marlin pushes the boundaries of mixed-precision LLM inference
Michael Goin and 1 other
Learn about Marlin, a mixed-precision matrix multiplication kernel that delivers 4x speedup with FP16xINT4 computations for batch sizes up to 32.
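In vLLM, GPTQ-style 4-bit checkpoints can run their matrix multiplications through a Marlin-backed kernel; it is usually picked automatically, but it can also be requested explicitly via the quantization argument. A minimal sketch, assuming a 4-bit GPTQ checkpoint (the model ID is an example, not taken from the article):

# Minimal sketch: request the Marlin-backed GPTQ kernel explicitly in vLLM.
# The checkpoint below is an assumed example of a 4-bit GPTQ model;
# support for "gptq_marlin" depends on the installed vLLM version and GPU.
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Llama-2-7B-Chat-GPTQ", quantization="gptq_marlin")
out = llm.generate(["Why does 4-bit weight quantization speed up decoding?"],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)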