Addie Stevens's contributions
Article
Building domain-specific LLMs with synthetic data and SDG Hub
Shivchander Sudalairaj +2
Use the open source SDG Hub to quickly create custom synthetic data pipelines. Train and evaluate your models faster and more efficiently.
Article
Introduction to distributed inference with llm-d
Christopher Nuland +1
Learn how the llm-d project enables distributed, efficient, and scalable LLM model serving across Kubernetes clusters.
Article
Post-training methods for language models
Mustafa Eyceoz +1
Dive into LLM post-training methods, from supervised fine-tuning and continual learning to parameter-efficient and reinforcement learning approaches.
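The article surveys several approaches; as one concrete illustration of the parameter-efficient family, here is a minimal LoRA setup sketched with Hugging Face's peft library. The model choice and hyperparameters below are illustrative, not taken from the article.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a small base model (illustrative choice, not from the article).
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# LoRA freezes the base weights and injects small trainable low-rank
# adapters into selected projection layers.
config = LoraConfig(
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # prints the small trainable fraction
```

From here the wrapped model trains with any standard Hugging Face training loop; only the adapter weights receive gradients.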
Article
vLLM with torch.compile: Efficient LLM inference on PyTorch
Luka Govedič +5
Learn how to optimize PyTorch code with minimal effort using torch.compile, a just-in-time compiler that generates optimized kernels automatically.
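As a minimal illustration of the pattern the article covers, wrapping a model in torch.compile is a one-line change; the toy model and shapes below are placeholders.

```python
import torch
import torch.nn as nn

# Toy model; torch.compile accepts any nn.Module or plain Python function.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Wrap the model; optimized kernels are generated just in time on the
# first call and reused on later calls with compatible input shapes.
compiled_model = torch.compile(model)

x = torch.randn(32, 128)
out = compiled_model(x)  # first call triggers tracing and compilation
```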
Article
Ollama or vLLM? How to choose the right LLM serving tool for your use case
Addie Stevens +2
Ollama makes it easy for developers to get started with local model experimentation, while vLLM provides a path to reliable, efficient, and scalable deployment.
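One practical consequence of this split: both tools expose an OpenAI-compatible HTTP API, so client code can stay the same while you swap the serving backend. A hedged sketch, assuming default local ports (11434 for Ollama, 8000 for vLLM) and the openai Python client; the model name is an example.

```python
from openai import OpenAI

# Both servers speak the OpenAI API, so only the base_url (and the model
# name registered with the server) differs between backends.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
# For a vLLM server, you would instead point at, e.g.:
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="llama3.2",  # example model tag as registered with the server
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)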
Article
GuideLLM: Evaluate LLM deployments for real-world inference
Jenny Yi +2
Learn how to evaluate the performance of your LLM deployments with the open source GuideLLM toolkit to optimize cost, reliability, and user experience.
Article
Structured outputs in vLLM: Guiding AI responses
Michael Goin +2
Learn how to control the outputs of models served with vLLM using structured outputs. Discover how to define choice lists, JSON schemas, regex patterns, and more.
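As a sketch of the simplest case, a choice list: vLLM's OpenAI-compatible server accepts guided-decoding fields via the client's extra_body. The endpoint and model name below are assumptions for a local deployment, and the exact field names can vary across vLLM versions.

```python
from openai import OpenAI

# Point the OpenAI client at a running vLLM server (assumed local endpoint).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whichever model the server loaded
    messages=[{"role": "user", "content": "Classify the sentiment: 'Great docs!'"}],
    # vLLM-specific extension: constrain decoding to one of the listed choices.
    extra_body={"guided_choice": ["positive", "negative", "neutral"]},
)
print(resp.choices[0].message.content)  # one of the three choices
```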
Article
LLM Compressor: Optimize LLMs for low-latency deployments
Kyle Sayers +3
LLM Compressor bridges the gap between model training and efficient deployment via quantization and sparsity, enabling cost-effective, low-latency inference.
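A minimal sketch following the pattern in llm-compressor's published quick-start: apply one-shot GPTQ quantization to a small model. Module paths and argument names may differ across library versions, and the model and calibration dataset choices are illustrative.

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# Quantize all Linear layers to 4-bit weights (W4A16), leaving lm_head intact.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative small model
    dataset="open_platypus",                     # calibration data
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-v1.0-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

The saved checkpoint can then be loaded by vLLM for low-latency serving.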