Red Hat Developer Blog
Here's our most recent blog content, including our featured monthly resource and our latest published items. Don't miss the chance to learn more about our contributors.

Discover Sparse Llama: A 50% pruned, GPU-optimized Llama 3.1 model with 2:4...

Explore how vLLM's new multimodal AI inference capabilities enhance...

Learn about an efficient inference scaling method that can improve your...

Explore multimodal model quantization in LLM Compressor, a unified library...

Progress in small LLM reasoning: Our Qwen-32B model, using particle...

On reproducing R1-like reasoning in small LLMs: LIMO dataset ineffective for...

Explore how distributed inference works within vLLM in this recap of Neural...

An update on reproducing R1-like reasoning in small LLMs: Granite models show...

Open-sourced on Hugging Face, deployment-ready with vLLM, and extensible...