Dan Alistarh

Dan Alistarh's contributions

Discover a new combinatorial approach to decoding AI’s hidden logic, exploring how neural networks truly compute and reason.

Explore new open source quantized reasoning models based on the DeepSeek-R1-Distill suite that deliver near-perfect accuracy and inference speed improvements.

Discover Sparse Llama: A 50% pruned, GPU-optimized Llama 3.1 model with 2:4 sparsity, enabling faster, cost-effective inference without sacrificing accuracy.

Advancing AI efficiency is more critical than ever, and sparsity has proven to be a cornerstone in this pursuit.

Across more than 500K evaluations, quantized LLMs retain near-full accuracy with minimal trade-offs, providing efficient, high-performance solutions for AI model deployment.

4-bit and 8-bit quantized LLMs excel in long-context tasks, retaining over 99% accuracy across 4K to 64K sequence lengths.

Cracking the code: How neural networks might actually “think”

Deployment-ready reasoning with quantized DeepSeek-R1 models

2:4 Sparse Llama: Smaller models for efficient GPU inference

2:4 Sparse Llama FP8: SOTA performance for NVIDIA Hopper GPUs

We ran over half a million evaluations on quantized LLMs—here's what we found

How well do quantized models handle long-context tasks?
