Red Hat AI

How Well Do Quantized Models Handle Long-Context Tasks?

4-bit and 8-bit quantized LLMs perform well on long-context tasks, retaining over 99% of their baseline accuracy across sequence lengths from 4K to 64K.

Sparse fine-tuning, combined with sparsity-aware inference software such as DeepSparse, makes ubiquitous CPU hardware a viable deployment target for LLM inference.

Compress large language models (LLMs) with SparseGPT to make your machine learning inference fast and efficient. Prune in a single shot, without retraining, and with minimal accuracy loss.