
Simplify AI data integration with RamaLama and RAG
Explore how RamaLama makes it easier to share data with AI models using retrieval-augmented generation (RAG), a technique for enhancing large language models.
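As a taste of the workflow, RamaLama serves models behind a local, OpenAI-compatible endpoint, so a RAG-augmented model can be queried with any standard client. Here is a minimal sketch assuming a model is already running via `ramalama serve`; the port and model name are illustrative, not prescribed by the article.

```python
# Minimal sketch: query a model served locally by RamaLama
# (`ramalama serve <model>`), which exposes an OpenAI-compatible API.
# Port and model name below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local RamaLama endpoint
    api_key="none",                       # local server; no real key needed
)

response = client.chat.completions.create(
    model="granite",  # placeholder model name
    messages=[
        {"role": "user", "content": "What do the indexed documents say about RAG?"}
    ],
)
print(response.choices[0].message.content)
```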
Learning the naming conventions of large language models (LLMs) helps users select the right model for their needs.
Explore how to run tools with Node.js using Llama Stack's completions API, agent API, and support for in-line tools, local MCP tools, and remote MCP tools.
Learn how quantized vision-language models enable faster inference, lower costs, and scalable AI deployment without compromising capability.
This article demonstrates how to run the Microsoft TRELLIS AI workload using Podman on RHEL to generate 3D assets.
This article demonstrates how to fine-tune LLMs with the Kubeflow Training Operator on Red Hat OpenShift AI.
Learn how to integrate NVIDIA NIM with Red Hat OpenShift AI to create and deliver AI-enabled applications at scale.
Enable hardware-enabled networking for containerized workloads using Red Hat OpenShift, BlueField DPUs, and NVIDIA DOCA Platform Framework.
Explore inference performance improvements that help vLLM serve DeepSeek AI models more efficiently in this technical deep dive.
Podman AI Lab, which integrates with Podman Desktop, provides everything you need to start developing Node.js applications that leverage large language models.
Explore new open source quantized reasoning models based on the DeepSeek-R1-Distill suite that deliver near-perfect accuracy and inference speed improvements.
Discover Sparse Llama: A 50% pruned, GPU-optimized Llama 3.1 model with 2:4 sparsity, enabling faster, cost-effective inference without sacrificing accuracy.
Explore how vLLM's new multimodal AI inference capabilities enhance performance, scalability, and flexibility across diverse hardware platforms.
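To make the multimodal piece concrete, below is a minimal sketch of offline image-plus-text inference with vLLM. The model name and prompt template are illustrative; check a given vision-language model's card for the exact format it expects.

```python
# Minimal sketch: multimodal (image + text) inference with vLLM.
# Model name, image path, and prompt template are illustrative.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="llava-hf/llava-1.5-7b-hf")  # example vision-language model
image = Image.open("chart.png")              # placeholder input image

outputs = llm.generate(
    {
        "prompt": "USER: <image>\nDescribe this image. ASSISTANT:",
        "multi_modal_data": {"image": image},
    },
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```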
Learn how to securely integrate Microsoft Azure OpenAI Service with Red Hat OpenShift Lightspeed using temporary child credentials.
Discover how the RamaLama open source project can help isolate AI models for testing and experimenting.
Explore multimodal model quantization in LLM Compressor, a unified library for optimizing models for deployment with vLLM.
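For context, LLM Compressor drives quantization through declarative recipes applied in a one-shot pass; multimodal models typically extend the ignore list so vision components stay in full precision. The sketch below shows the general shape of that workflow; the model ID, dataset, and scheme are placeholders, not the article's exact recipe.

```python
# Minimal sketch of LLM Compressor's recipe-driven, one-shot
# quantization flow. Model ID, dataset, and scheme are illustrative.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

recipe = GPTQModifier(
    targets="Linear",    # quantize the linear layers
    scheme="W4A16",      # 4-bit weights, 16-bit activations
    ignore=["lm_head"],  # keep the output head in full precision
)

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example model
    dataset="open_platypus",                     # calibration data
    recipe=recipe,
    output_dir="TinyLlama-1.1B-W4A16",
)
```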
A summary of all the Node.js AI posts about the Parasol application.
Explore how distributed inference works within vLLM in this recap of Neural Magic's vLLM Office Hours with Michael Goin and Murali Andoorveedu, a vLLM committer from CentML.
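As a quick illustration of the setup discussed in the session, vLLM shards a model across GPUs with a single tensor-parallelism argument. The model name and GPU count below are placeholders.

```python
# Minimal sketch: tensor-parallel (multi-GPU) inference in vLLM.
# Model name and GPU count are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model
    tensor_parallel_size=4,                    # shard across 4 GPUs
)

outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```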
Discover how NVIDIA MIG technology on Red Hat OpenShift AI enhances GPU resource utilization.
Want to use your personal or organizational data in AI workflows, but it's stuck in PDFs and other document formats? Docling is here to help. It's an open source tool from IBM Research that converts files like PDFs and DOCX into easy-to-use Markdown and JSON while keeping everything structured. In this video, developer advocate Cedric Clyburn shows how it works. We'll walk through a demo using LlamaIndex for a question-answering app and share some interesting details and benchmarks. Let's dig in and see how Docling can make working with your data much easier for RAG, fine-tuning models, and more.
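A minimal sketch of the conversion step shown in the video (the input filename is a placeholder):

```python
# Minimal sketch: convert a PDF to structured Markdown with Docling.
# Input path is a placeholder.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")  # also handles DOCX, HTML, etc.

print(result.document.export_to_markdown())  # structured Markdown output
```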
The rise of large language models (LLMs) has opened up exciting possibilities for developers looking to build intelligent applications. However, adapting these models to specific use cases can be difficult, requiring deep expertise and substantial resources. In this talk, we'll introduce you to InstructLab, an open source project that aims to make LLM tuning accessible to developers and data scientists of all skill levels, on consumer-grade hardware.

We'll explore how InstructLab's innovative approach combines collaborative knowledge curation, efficient data generation, and instruction tuning to enable developers to refine foundation models for specific use cases. Through a live demonstration, you'll learn how IBM Research has partnered with Red Hat to simplify the process of enhancing LLMs with new knowledge and skills for targeted applications. Join us to explore how InstructLab is making LLM tuning more accessible, empowering developers to harness the power of AI in their projects.