Harshith Umesh's contributions
Article
Performance improvements with speculative decoding in vLLM for gpt-oss
Learn how speculative decoding in vLLM can significantly increase throughput without altering a model's output quality, resulting in 19% cost savings at scale for enterprise AI. This post benchmarks gpt-oss-120B with Eagle3 speculative decoding on vLLM and demonstrates consistent throughput and latency improvements across varying concurrency levels, datasets, tensor-parallelism settings, and draft-token budgets. (A configuration sketch follows this list.)
Article
How to deploy and benchmark vLLM with GuideLLM on Kubernetes
Learn how to deploy and benchmark the inference performance of vLLM on OpenShift, Red Hat's Kubernetes platform, using GuideLLM, a specialized performance benchmarking tool. (An invocation sketch follows this list.)
Article
vLLM or llama.cpp: Choosing the right LLM inference engine for your use case
See how vLLM's throughput and latency compare to llama.cpp's, and discover which tool is right for your specific deployment needs on enterprise-grade hardware.
Article
Ollama vs. vLLM: A deep dive into performance benchmarking
Learn how vLLM outperforms Ollama in high-performance production deployments, delivering significantly higher throughput and lower latency.
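As a companion to the speculative decoding post above, here is a minimal sketch of serving gpt-oss-120B with an Eagle3 draft model through vLLM's offline Python API. It assumes a recent vLLM release that exposes the `speculative_config` parameter; the draft-model path, tensor-parallel size, and draft-token budget below are illustrative placeholders, not values taken from the post.

```python
# Minimal sketch: gpt-oss-120B with Eagle3 speculative decoding in vLLM.
# Assumptions: a recent vLLM release exposing `speculative_config`;
# "path/to/eagle3-draft" is a hypothetical draft-model checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(
    model="openai/gpt-oss-120b",           # target model benchmarked in the post
    tensor_parallel_size=4,                # one example TP setting to vary
    speculative_config={
        "method": "eagle3",                # Eagle3 draft-model speculation
        "model": "path/to/eagle3-draft",   # hypothetical draft checkpoint
        "num_speculative_tokens": 3,       # draft-token budget per step
    },
)

outputs = llm.generate(
    ["Explain speculative decoding in one paragraph."],
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```

Because the target model verifies every token the draft model proposes, output quality matches baseline decoding; only throughput and latency change, which is why the benchmarks vary concurrency, tensor parallelism, and the draft-token budget.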
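Likewise, for the GuideLLM post, here is a hedged sketch of driving a benchmark sweep against a running vLLM endpoint. Flag names reflect common GuideLLM usage and may differ across versions; the endpoint URL and workload shape are assumptions, not values from the article.

```python
# Hedged sketch: run a GuideLLM benchmark sweep against a vLLM server.
# Assumptions: `pip install guidellm`, and vLLM serving an OpenAI-compatible
# endpoint at http://localhost:8000; flag names may vary by GuideLLM version.
import subprocess

subprocess.run(
    [
        "guidellm", "benchmark",
        "--target", "http://localhost:8000",              # vLLM endpoint
        "--rate-type", "sweep",                           # sweep request rates
        "--max-seconds", "60",                            # cap each run's duration
        "--data", "prompt_tokens=256,output_tokens=128",  # synthetic workload shape
    ],
    check=True,  # raise if the benchmark run fails
)
```

On Kubernetes or OpenShift, the same command can run as a Job pointed at the vLLM Service's cluster address, which is the deployment pattern the article walks through.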