Red Hat Developer Blog

Here's our most recent blog content. Explore our featured monthly resource as well as our most recently published items. Don't miss the chance to learn more about our contributors.


Article: Launching Sparse FP8 models and kernels

Advancing AI efficiency is more critical than ever, and sparsity has proven...

Article: LLM evaluation

Quantized LLMs achieve near-full accuracy with minimal trade-offs after 500K+...

Article: Machete

Machete, Neural Magic’s optimized kernel for NVIDIA Hopper GPUs, achieves...

Article: LLM Compressor

Discover LLM Compressor, a unified library for creating accurate compressed...
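As a rough illustration of the kind of workflow the library supports (not taken from the article above; the model name is a placeholder and the exact import paths and scheme names may differ between LLM Compressor versions), a one-shot FP8 quantization run might look like this:

```python
# Minimal sketch of one-shot FP8 quantization with LLM Compressor.
# Model ID and scheme are illustrative; consult the library's docs for
# the exact API of the version you install.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Dynamic FP8 quantization of the Linear layers; no calibration data needed.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

oneshot(model=model, recipe=recipe)

# Save the compressed checkpoint so it can be served (for example, with vLLM).
save_dir = "Meta-Llama-3-8B-Instruct-FP8-Dynamic"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
```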

Article: vLLM FP8 inference

Explore the integration of FP8 in vLLM. Learn how to receive up to a 2x...
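As a minimal sketch (not drawn from the article itself; the model names are placeholders), FP8 can be used in vLLM either by loading a checkpoint that was already quantized to FP8 or by asking vLLM to quantize a 16-bit model on the fly:

```python
# Minimal sketch of FP8 inference with vLLM's offline API.
# The model names below are placeholders; substitute checkpoints you have access to.
from vllm import LLM, SamplingParams

# Option 1: load a checkpoint already quantized to FP8 (e.g., with LLM Compressor).
llm = LLM(model="neuralmagic/Meta-Llama-3-8B-Instruct-FP8")

# Option 2: quantize a 16-bit checkpoint to FP8 at load time.
# llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", quantization="fp8")

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["What does FP8 quantization change for inference?"], params)
print(outputs[0].outputs[0].text)
```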

Article: Deploy Llama 3 8B with vLLM
Jun 18, 2024
Mark Kurtz

Llama 3's advancements, particularly at 8 billion parameters, make AI more...

Article: LLM inference with Marlin

Learn about Marlin, a mixed-precision matrix multiplication kernel that...

Article: How Well Do Quantized Models Handle Long-Context Tasks?

4-bit and 8-bit quantized LLMs excel in long-context tasks, retaining over...

Article: AI/ML

Sparse fine-tuning in combination with sparsity-aware inference software,...