Alexandre Marques

Alexandre Marques's contributions

Learn about NVFP4, a 4-bit floating-point format for high-performance inference on modern GPUs that can deliver near-baseline accuracy at large scale.

Speculators standardizes speculative decoding for large language models, with a unified Hugging Face format, vLLM integration, and more.

Boost inference performance by up to 2.5X with vLLM's Eagle 3 speculative decoding integration. Discover how in this blog post.

Discover how to deploy compressed, fine-tuned models for efficient inference with the new Axolotl and LLM Compressor integration.

Learn how quantized vision-language models enable faster inference, lower costs, and scalable AI deployment without compromising capability.

Explore new open source quantized reasoning models based on the DeepSeek-R1-Distill suite that deliver near-perfect accuracy and inference speed improvements.

Discover Sparse Llama: A 50% pruned, GPU-optimized Llama 3.1 model with 2:4 sparsity, enabling faster, cost-effective inference without sacrificing accuracy.

Open-sourced on Hugging Face, deployment-ready with vLLM, and extensible using LLM Compressor.

Report a website issue

Red Hat Developer Sandbox

Programming languages & frameworks

System design & architecture

Developer experience

Automated data processing

Platform engineering

Secure development & architectures

E-books

Cheat sheets

Documentation

Alexandre Marques

Alexandre Marques's contributions

Accelerating large language models with NVFP4 quantization

Speculators: Standardized, production-ready speculative decoding

Fly Eagle(3) fly: Faster inference with vLLM & speculative decoding

Axolotl meets LLM Compressor: Fast, sparse, open

Enable 3.5 times faster vision language models with quantization

Deployment-ready reasoning with quantized DeepSeek-R1 models

2:4 Sparse Llama: Smaller models for efficient GPU inference

Compressed Granite 3.1: Powerful performance in a small package

Platforms

Build

Quicklinks

Communicate

RED HAT DEVELOPER

Red Hat legal and privacy links

Red Hat legal and privacy links

Report a website issue