Yuan Tang

Yuan Tang's contributions

Combining KServe and llm-d for optimized generative AI inference

Learn how to combine KServe and llm-d to optimize generative AI inference, improve performance, and reduce infrastructure costs. This article demonstrates the integration architecture and provides practical guidance for AI platform teams.

Why vLLM is the best choice for AI inference today

Discover the advantages of vLLM, an open source inference server that speeds up generative AI applications by making better use of GPU memory.

Empower conversational AI at scale with KServe

Discover the benefits of KServe, a highly scalable machine learning deployment tool for Kubernetes.


RED HAT DEVELOPER
