Yuan Tang
Yuan Tang's contributions
Article
Designing distributed AI inference: Core concepts and scaling dimensions
Fatih E. Nar
+3
Learn about the five-dimensional design space in modern LLM serving, including tensor, pipeline, expert, data, and context parallelism.
Article
Combining KServe and llm-d for optimized generative AI inference
Ran Pollak
+1
Learn how to combine KServe and llm-d to optimize generative AI inference, improve performance, and reduce infrastructure costs. This article demonstrates the integration architecture and provides practical guidance for AI platform teams.
Article
Why vLLM is the best choice for AI inference today
Fatih E. Nar
+4
Discover the advantages of vLLM, an open source inference server that speeds up generative AI applications by making better use of GPU memory.
Article
Empower conversational AI at scale with KServe
Saurabh Agarwal
+3
Discover the benefits of KServe, a highly scalable machine learning deployment tool for Kubernetes.