Helen Zhao
Helen Zhao's contributions
Article
Speculators v0.5.0: DFlash support and online training
Helen Zhao
+2
Speculators v0.5.0 introduces DFlash support, enabling single-pass draft token generation with block diffusion for more efficient speculative decoding workflows. The release also adds unified online and offline training through vLLM’s native hidden states extraction system, improving training flexibility, version stability, and production readiness.
Article
Speculators: Standardized, production-ready speculative decoding
Alexandre Marques
+7
Speculators standardizes speculative decoding for large language models, with a unified Hugging Face format, vLLM integration, and more.
Article
LLM Compressor 0.7.0 release recap
Dipika Sikka
+3
LLM Compressor 0.7.0 brings Hadamard transforms for better accuracy, mixed-precision FP4/FP8, and calibration-free block quantization for efficient compression.