Artificial intelligence

Article

How we optimized vLLM for DeepSeek-R1

Michael Goin +4

Explore inference performance improvements that help vLLM serve DeepSeek AI models more efficiently in this technical deep dive.

Article

Granite, LIMO, and small LLM reasoning

Akash Srivastava +8

On reproducing R1-like reasoning in small LLMs: the LIMO dataset proves ineffective for Llama and Granite models, while synthetic data generation shows promise, though fine-tuning is tricky.

Article

How particle filtering makes small LLMs think big

Akash Srivastava +8

An update on reproducing R1-like reasoning in small LLMs: Granite models show big gains with particle filtering, outperforming GPT-4o on benchmarks.

Article

Distributed inference with vLLM

Michael Goin

Explore how distributed inference works within vLLM in this recap of Neural Magic's vLLM Office Hours with Michael Goin and Murali Andoorveedu, a vLLM committer from CentML.

Video

Docling: Efficient document processing for AI workflows

Cedric Clyburn +1

Want to use your personal or organizational data in AI workflows, but it's stuck in PDFs and other document formats? Docling is here to help. It’s an open-source tool from IBM Research that converts files like PDF and DOCX into easy-to-use Markdown and JSON while preserving their structure. In this video, join developer advocate Cedric Clyburn to see how it works. We'll walk through a demo that uses LlamaIndex for a question-answering app and share some interesting details and benchmarks. Let’s dig in and see how Docling can make working with your data much easier for RAG, fine-tuning models, and more.

Video

Enhancing Generative AI with InstructLab for Accessible Model Fine-Tuning

Legare Kerrison +2

The rise of large language models (LLMs) has opened up exciting possibilities for developers looking to build intelligent applications. However, adapting these models to specific use cases can be difficult, requiring deep expertise and substantial resources. In this talk, we'll introduce you to InstructLab, an open-source project that aims to make LLM tuning accessible to developers and data scientists of all skill levels, on consumer-grade hardware. We'll explore how InstructLab's approach combines collaborative knowledge curation, efficient data generation, and instruction training to enable developers to refine foundation models for specific use cases. Through a live demonstration, you’ll learn how IBM Research has partnered with Red Hat to simplify the process of enhancing LLMs with new knowledge and skills for targeted applications. Join us to see how InstructLab helps developers bring AI into their projects.

Video

Red Hat Dan on Tech: Episode 12 - How libkrun enhances AI, Security, Containers, Virtualization, and more

Daniel Walsh +1

Welcome back to Red Hat Dan on Tech, where Senior Distinguished Engineer Dan Walsh dives deep on all things technical, from his expertise in container technologies with tools like Podman and Buildah, to runtimes, Kubernetes, AI, and SELinux! In this episode, senior principal software engineer Sergio Lopez Pascual joins for a deep dive into libkrun and krunkit, covering how to get the most out of VM technology with containers, and much more!