Mark Kurtz's contributions
Multimodal model quantization support through LLM Compressor
Kyle Sayers and 3 others
Explore multimodal model quantization in LLM Compressor, a unified library for optimizing models for deployment with vLLM.
Compressed Granite 3.1: Powerful performance in a small package
Shubhra Pandit and 2 others
Compressed Granite 3.1 models are open-sourced on Hugging Face, deployment-ready with vLLM, and extensible using LLM Compressor.
2:4 Sparse Llama FP8: SOTA performance for NVIDIA Hopper GPUs
Alexandre Marques and 5 others
Advancing AI efficiency is more critical than ever, and sparsity has proven to be a cornerstone in this pursuit.
We ran over half a million evaluations on quantized LLMs—here's what we found
Eldar Kurtić and 3 others
Across more than 500,000 evaluations, quantized LLMs achieve near-full accuracy with minimal trade-offs, making them efficient, high-performance options for AI model deployment.
LLM Compressor is here: Faster inference with vLLM
Robert Shaw and 3 others
Discover LLM Compressor, a unified library for creating accurate compressed models for cheaper and faster inference with vLLM.
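As a taste of the workflow this article introduces, here is a minimal sketch of one-shot quantization with LLM Compressor. The model ID, calibration dataset, and hyperparameters are illustrative, and exact import paths can shift between llmcompressor releases:

```python
# Minimal sketch: one-shot W4A16 (GPTQ) quantization with LLM Compressor.
# Model, dataset, and sample counts are illustrative placeholders.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# Quantize all Linear layers to 4-bit weights; keep the lm_head in full precision.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset="open_platypus",             # calibration data for GPTQ
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="Meta-Llama-3-8B-Instruct-W4A16",
)
```

The saved checkpoint uses the compressed-tensors format, so vLLM can load the output directory directly for inference.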
Deploy Llama 3 8B with vLLM
Mark Kurtz
Llama 3's advancements, particularly at 8 billion parameters, make AI more accessible and efficient.
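For readers who want to try the deployment this article covers, a minimal offline-inference sketch with vLLM looks like the following; the prompt and sampling settings are illustrative, and the gated Llama 3 weights require an accepted license on Hugging Face:

```python
# Minimal sketch: offline batch inference on Llama 3 8B Instruct with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate(["Summarize the benefits of 8B-parameter models."], params)
for out in outputs:
    print(out.outputs[0].text)
```

For production-style serving, vLLM also ships an OpenAI-compatible HTTP server (`python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-8B-Instruct`).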
How well do quantized models handle long-context tasks?
Eldar Kurtić and 3 others
4-bit and 8-bit quantized LLMs excel in long-context tasks, retaining over 99% accuracy across 4K to 64K sequence lengths.