ReAct vs. naive prompt chaining on Llama Stack
Learn about the advantages of prompt chaining and the ReAct framework compared to simpler agent architectures for complex tasks.
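To make the contrast concrete, here is a minimal sketch of naive prompt chaining, assuming the `llama-stack-client` Python package, a Llama Stack server listening at `http://localhost:8321`, and `meta-llama/Llama-3.1-8B-Instruct` registered as a model (the URL, model ID, and prompts are placeholders for your own setup). Each step's output is pasted into the next prompt, so the chain cannot call tools, branch, or recover from a bad intermediate answer:

```python
# Naive prompt chaining: fixed calls, output of one step pasted into the next.
# Assumes a running Llama Stack server and the llama-stack-client package;
# the base URL and model ID below are placeholders for your environment.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"

def ask(prompt: str) -> str:
    """One stateless LLM call; the chain cannot branch, retry, or call tools."""
    response = client.inference.chat_completion(
        model_id=MODEL_ID,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.completion_message.content

# Step 1: gather information. Step 2: feed it forward unconditionally.
facts = ask("List key considerations when sizing a GPU cluster for LLM inference.")
answer = ask(f"Using only these considerations, write a two-sentence summary:\n{facts}")
print(answer)
```

A ReAct agent replaces this fixed pipeline with a loop in which the model interleaves reasoning steps with tool calls and observations, deciding at each turn whether to gather more information or produce a final answer. A wrong intermediate result can therefore be noticed and corrected instead of silently propagated to the next step.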