Mark Kurtz's contributions

Deployment-ready reasoning with quantized DeepSeek-R1 models
Explore new open source quantized reasoning models based on the DeepSeek-R1-Distill suite that deliver near-perfect accuracy recovery and faster inference.

2:4 Sparse Llama: Smaller models for efficient GPU inference
Discover Sparse Llama: A 50% pruned, GPU-optimized Llama 3.1 model with 2:4 sparsity, enabling faster, cost-effective inference without sacrificing accuracy.

Multimodal model quantization support through LLM Compressor
Explore multimodal model quantization in LLM Compressor, a unified library for optimizing models for deployment with vLLM.

Compressed Granite 3.1: Powerful performance in a small package
Open-sourced on Hugging Face, deployment-ready with vLLM, and extensible using LLM Compressor.

2:4 Sparse Llama FP8: SOTA performance for NVIDIA Hopper GPUs
Combining 2:4 sparsity with FP8 quantization, Sparse Llama delivers state-of-the-art inference performance on NVIDIA Hopper GPUs while advancing AI efficiency.

We ran over half a million evaluations on quantized LLMs—here's what we found
Across more than 500,000 evaluations, quantized LLMs achieved near-full accuracy with minimal trade-offs, making them efficient, high-performance options for AI model deployment.

LLM Compressor is here: Faster inference with vLLM
Discover LLM Compressor, a unified library for creating accurate compressed models for cheaper and faster inference with vLLM.
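
As a taste of the workflow the post covers, here is a minimal sketch of one-shot quantization with LLM Compressor. The model ID, calibration dataset, and recipe settings are illustrative assumptions rather than details from the post, and import paths may differ across library versions:

```python
# Minimal LLM Compressor sketch: one-shot W4A16 quantization of a
# Hugging Face model for serving with vLLM. The concrete choices below
# (model, dataset, scheme) are illustrative assumptions.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# Quantize all Linear layers to 4-bit weights; keep lm_head in full precision.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # example model choice
    dataset="open_platypus",                      # example calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="Meta-Llama-3-8B-Instruct-W4A16",  # load this directory with vLLM
)
```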

Deploy Llama 3 8B with vLLM
Llama 3's advancements, particularly at the 8-billion-parameter scale, make capable AI more accessible and efficient to deploy.
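
For reference, a minimal sketch of serving Llama 3 8B with vLLM's offline Python API; the model ID, prompt, and sampling settings are illustrative choices, not details from the post:

```python
# Minimal vLLM sketch: load Llama 3 8B Instruct and generate offline.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # fetched from Hugging Face
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Why do 8B-parameter models hit a deployment sweet spot?"], params)
for out in outputs:
    print(out.outputs[0].text)
```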