
Deployment-ready reasoning with quantized DeepSeek-R1 models
Explore new open source quantized reasoning models based on the DeepSeek-R1-Distill suite that deliver near-perfect accuracy and inference speed improvements.
Discover Sparse Llama: A 50% pruned, GPU-optimized Llama 3.1 model with 2:4 sparsity, enabling faster, cost-effective inference without sacrificing accuracy.
Explore how vLLM's new multimodal AI inference capabilities enhance performance, scalability, and flexibility across diverse hardware platforms.
Learn about an efficient inference scaling method that can improve your model's reasoning ability and performance at runtime while saving on compute costs.
Learn how to securely integrate Microsoft Azure OpenAI Service with Red Hat OpenShift Lightspeed using temporary child credentials.
Discover how the RamaLama open source project can help isolate AI models for testing and experimenting.
Explore multimodal model quantization in LLM Compressor, a unified library for optimizing models for deployment with vLLM.
Progress in small LLM reasoning: Our Qwen-32B model, using particle filtering, now surpasses o1-preview on Math500.
A summary of all the Node.js AI posts on the Parasol application.
On reproducing R1-like reasoning in small LLMs: LIMO dataset ineffective for Llama/Granite; synthetic data generation shows promise but fine-tuning is tricky.
An update on reproducing R1-like reasoning in small LLMs: Granite models show big gains with particle filtering, outperforming GPT-4o on benchmarks.
Explore how distributed inference works within vLLM in this recap of Neural Magic's vLLM Office Hours with Michael Goin and Murali Andoorveedu, a vLLM committer from CentML.
Discover how NVIDIA MIG technology on Red Hat OpenShift AI enhances GPU resource utilization.
Want to use your personal or organizational data in AI workflows, but it's stuck in PDFs and other document formats? Docling is here to help. It’s an open-source tool from IBM Research that converts files like PDFs and DOCX into easy-to-use Markdown and JSON while keeping everything structured. In this video, join developer advocate Cedric Clyburn to see how it works. We'll walk through a demo using LlamaIndex for a question-answering app, and share some interesting details and benchmarks. Let’s dig in and see how Docling can make working with your data so much easier for RAG, fine-tuning models, and more.
The rise of large language models (LLMs) has opened up exciting possibilities for developers looking to build intelligent applications. However, the process of adapting these models to specific use cases can be difficult, requiring deep expertise and substantial resources. In this talk, we'll introduce you to InstructLab, an open-source project that aims to make LLM tuning accessible to developers and data scientists of all skill levels, on consumer-grade hardware. We'll explore how InstructLab's innovative approach combines collaborative knowledge curation, efficient data generation, and instruction training to enable developers to refine foundation models for specific use cases. Through a live demonstration, you’ll learn how IBM Research has partnered with Red Hat to simplify the process of enhancing LLMs with new knowledge and skills for targeted applications. Join us to explore how InstructLab is making LLM tuning more accessible, empowering developers to harness the power of AI in their projects.
Welcome back to Red Hat Dan on Tech, where Senior Distinguished Engineer Dan Walsh dives deep on all things technical, from his expertise in container technologies with tools like Podman and Buildah, to runtimes, Kubernetes, AI, and SELinux! In this episode, senior principal software engineer Sergio Lopez Pascual joins for a deep dive into Libkrun and Krunkit, getting the most out of VM technology with containers, and much more!
Learn how to run distributed AI training on Red Hat OpenShift using RoCE with
Open-sourced on Hugging Face, deployment-ready with vLLM, and extensible using LLM Compressor.
Learn how to build a ModelCar container image and deploy it with OpenShift AI.
Let's take a look at how to effectively integrate generative AI into an existing application through the InstructLab project, an open-source methodology and community to make LLM tuning accessible to all! Learn about the project and how InstructLab can help to train a model on domain-specific skills and knowledge, then how Podman AI Lab allows developers to easily set up an environment for model serving and AI-enabled application development.
The Konveyor community has developed "Konveyor AI" (Kai), a tool that uses Generative AI to accelerate application modernization. Kai integrates large language models with static code analysis to facilitate code modifications within a developer's IDE, helping transition to technologies like Quarkus efficiently. This video provides a short introduction and demo showcasing the migration of the Java EE "coolstore" application to Quarkus using Konveyor AI.
In this episode, Senior Distinguished Engineer Dan Walsh discusses tips and tricks for writing SELinux policies and how you can use containers to your advantage.
Welcome to the new Red Hat Dan on Tech, where Senior Distinguished Engineer Dan Walsh dives deep on all things technical, from his expertise in container technologies with tools like Podman and Buildah, to runtimes, Kubernetes, AI, and SELinux! This weekly series will bring in guests from around the industry to highlight innovation and things you should know, and new episodes will be released right here, on the Red Hat Developer channel, each and every Wednesday at 9am EST! Stay tuned, and see you in the next episode!
Kickstart your generative AI application development journey with Podman AI Lab, an open-source extension for Podman Desktop to build applications with LLMs on a local environment. The Podman AI Lab helps to make AI more accessible and approachable, providing recipes for example use cases with generative AI, curated models sourced from Hugging Face, model serving with integrated code snippets, and a playground environment to test and adjust model performance. Learn more on Red Hat Developer https://developers.redhat.com/product... and download Podman Desktop today to get started!
Welcome back to Red Hat Dan on Tech, where Senior Distinguished Engineer Dan Walsh dives deep on all things technical, from his expertise in container technologies with tools like Podman and Buildah, to runtimes, Kubernetes, AI, and SELinux! Let's talk about Podman and containers when it comes to systemd, and how technologies like Quadlet abstract the complexities of running containers under systemd, featuring Principal Software Engineer Ygal Blum.