Red Hat Developer Blog

Here's our most recent blog content. Explore our featured monthly resource as well as our most recently published items. Don't miss the chance to learn more about our contributors.


Article: Launching Sparse FP8 models and kernels

Advancing AI efficiency is more critical than ever, and sparsity has proven...

Article: LLM evaluation

Quantized LLMs achieve near-full accuracy with minimal trade-offs after 500K+...

Article: Machete

Machete, Neural Magic’s optimized kernel for NVIDIA Hopper GPUs, achieves...

Article: LLM Compressor

Discover LLM Compressor, a unified library for creating accurate compressed...
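As a rough illustration of the kind of workflow the library supports (not taken from the article above; the model name is a placeholder and the exact import paths and scheme names may differ between LLM Compressor versions), a one-shot FP8 quantization run might look like this:

```python
# Minimal sketch of one-shot FP8 quantization with LLM Compressor.
# Model ID and scheme are illustrative; consult the library's docs for
# the exact API of the version you install.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Dynamic FP8 quantization of the Linear layers; no calibration data needed.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

oneshot(model=model, recipe=recipe)

# Save the compressed checkpoint so it can be served (for example, with vLLM).
save_dir = "Meta-Llama-3-8B-Instruct-FP8-Dynamic"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
```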

Article: vLLM FP8 inference

Explore the integration of FP8 in vLLM. Learn how to receive up to a 2x...
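As a minimal sketch (not drawn from the article itself; the model names are placeholders), FP8 can be used in vLLM either by loading a checkpoint that was already quantized to FP8 or by asking vLLM to quantize a 16-bit model on the fly:

```python
# Minimal sketch of FP8 inference with vLLM's offline API.
# The model names below are placeholders; substitute checkpoints you have access to.
from vllm import LLM, SamplingParams

# Option 1: load a checkpoint already quantized to FP8 (e.g., with LLM Compressor).
llm = LLM(model="neuralmagic/Meta-Llama-3-8B-Instruct-FP8")

# Option 2: quantize a 16-bit checkpoint to FP8 at load time.
# llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", quantization="fp8")

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["What does FP8 quantization change for inference?"], params)
print(outputs[0].outputs[0].text)
```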

Article: Deploy Llama 3 8B with vLLM
Jun 18, 2024
Mark Kurtz

Llama 3's advancements, particularly at 8 billion parameters, make AI more...

Article: LLM inference with Marlin

Learn about Marlin, a mixed-precision matrix multiplication kernel that...

Article: How Well Do Quantized Models Handle Long-Context Tasks?

4-bit and 8-bit quantized LLMs excel in long-context tasks, retaining over...

Article: AI/ML

Sparse fine-tuning in combination with sparsity-aware inference software,...