Harshith Umesh

Harshith Umesh's contributions

Article

Performance improvements with speculative decoding in vLLM for gpt-oss

Harshith Umesh

Learn how speculative decoding in vLLM can significantly increase throughput without altering a model's output quality, resulting in 19% cost savings at scale for enterprise AI. This post benchmarks gpt-oss-120B with Eagle3 speculative decoding on vLLM and demonstrates consistent throughput and latency improvements across varying concurrency levels, datasets, tensor-parallelism settings, and draft-token budgets.