Alexandre Marques

Alexandre Marques's contributions


Open-sourced on Hugging Face, deployment-ready with vLLM, and extensible using LLM Compressor.

Launching Sparse FP8 models and kernels

Advancing AI efficiency is more critical than ever, and sparsity has proven to be a cornerstone in this pursuit.


Across more than 500K evaluations, quantized LLMs achieve near-full accuracy with minimal trade-offs, offering efficient, high-performance options for AI model deployment.

How Well Do Quantized Models Handle Long-Context Tasks?

4-bit and 8-bit quantized LLMs excel in long-context tasks, retaining over 99% accuracy across 4K to 64K sequence lengths.