Batch inference on OpenShift AI with Ray Data, vLLM, and CodeFlare
Learn how to perform large-scale, distributed batch inference on Red Hat OpenShift AI using the CodeFlare SDK with Ray Data and vLLM.
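For orientation, here is a minimal sketch of the pattern this tutorial covers: Ray Data shards prompts across GPU workers, each hosting its own vLLM engine. The model name, dataset paths, and column names below are placeholders, not values from the tutorial.

```python
import ray
from vllm import LLM, SamplingParams

class VLLMPredictor:
    """One instance per GPU worker; each holds a dedicated vLLM engine."""

    def __init__(self):
        self.llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
        self.params = SamplingParams(temperature=0.2, max_tokens=128)

    def __call__(self, batch):
        outputs = self.llm.generate(list(batch["prompt"]), self.params)
        batch["generated_text"] = [o.outputs[0].text for o in outputs]
        return batch

# Read prompts, fan them out across 4 GPU-backed replicas, and write results.
ds = ray.data.read_parquet("s3://my-bucket/prompts/")  # expects a "prompt" column
ds = ds.map_batches(VLLMPredictor, concurrency=4, num_gpus=1, batch_size=64)
ds.write_parquet("s3://my-bucket/results/")
```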
Enterprise-grade artificial intelligence and machine learning (AI/ML).
This tutorial shows you how to use the Llama Stack API to implement retrieval-augmented generation for an AI application built with Python.
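As a taste of what the tutorial builds, here is a hedged sketch using the llama-stack-client Python package, assuming a Llama Stack server at localhost:8321; the vector DB ID, embedding model, and document content are illustrative placeholders.

```python
from llama_stack_client import LlamaStackClient, RAGDocument

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a vector store for document chunks (names are placeholders).
client.vector_dbs.register(
    vector_db_id="my_docs",
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)

# Ingest a document; the server chunks and embeds it.
client.tool_runtime.rag_tool.insert(
    documents=[RAGDocument(document_id="doc-1",
                           content="OpenShift AI supports distributed training.",
                           mime_type="text/plain", metadata={})],
    vector_db_id="my_docs",
    chunk_size_in_tokens=256,
)

# Retrieve context for a question.
result = client.tool_runtime.rag_tool.query(
    content="What does OpenShift AI support?", vector_db_ids=["my_docs"]
)
print(result.content)
```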
Headed to JavaZone 2025? Visit the Red Hat Developer booth on-site to speak to our expert technologists.
Tackle the AI/ML lifecycle with OpenShift AI. This guide helps you build adaptable, production-ready MLOps workflows, from data preparation to live inference.
This video demonstrates scaling an object with a custom metrics autoscaler, depicting the example described in this article: https://developers.redhat.com/articles/2025/06/13/boost-ai-efficiency-gpu-autoscaling-using-keda-openshift.
Learn how to use the CodeFlare SDK to submit RayJobs to a remote Ray cluster in OpenShift AI.
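In outline, the flow looks like the following sketch, assuming the codeflare_sdk Cluster API and Ray's standard job submission client; the token, server URL, namespace, and entrypoint are placeholders.

```python
from codeflare_sdk import TokenAuthentication, Cluster, ClusterConfiguration
from ray.job_submission import JobSubmissionClient

# Authenticate to the OpenShift cluster (placeholder credentials).
auth = TokenAuthentication(token="sha256~...", server="https://api.example.com:6443")
auth.login()

# Bring up (or connect to) a CodeFlare-managed Ray cluster.
cluster = Cluster(ClusterConfiguration(name="raytest",
                                       namespace="my-project",
                                       num_workers=2))
cluster.up()
cluster.wait_ready()

# Submit the job through the Ray dashboard endpoint.
client = JobSubmissionClient(cluster.cluster_dashboard_uri())
job_id = client.submit_job(
    entrypoint="python batch_job.py",
    runtime_env={"working_dir": "./", "pip": ["vllm"]},
)
print(client.get_job_status(job_id))
```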
Learn how PagedAttention solves the memory waste problem of traditional LLM serving systems by breaking the KV cache into small, on-demand blocks.
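The core idea can be illustrated with a toy allocator (a conceptual sketch, not vLLM's actual implementation): instead of reserving a maximum-length KV cache per request up front, fixed-size blocks are handed out only as tokens arrive.

```python
BLOCK_SIZE = 16  # tokens per cache block (illustrative)

class BlockManager:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # pool of physical block IDs
        self.tables = {}                     # seq_id -> list of block IDs

    def ensure_capacity(self, seq_id, num_tokens):
        """Allocate just enough blocks for num_tokens tokens, on demand."""
        table = self.tables.setdefault(seq_id, [])
        needed = -(-num_tokens // BLOCK_SIZE)  # ceiling division
        while len(table) < needed:
            table.append(self.free.pop())

    def release(self, seq_id):
        # Finished sequences return their blocks to the shared pool.
        self.free.extend(self.tables.pop(seq_id, []))

mgr = BlockManager(num_blocks=1024)
mgr.ensure_capacity("req-1", num_tokens=40)  # 3 blocks, not a max-length reservation
print(mgr.tables["req-1"])
```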
Dive into the Q2’25 edition of the Camel integration quarterly digest.
Learn how to deploy the Open Platform for Enterprise AI (OPEA) ChatQnA application on OpenShift with AMD Instinct hardware.
Learn about the advantages of prompt chaining and the ReAct framework compared to simpler agent architectures for complex tasks.
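To make the comparison concrete, here is a toy sketch of prompt chaining, where `call_llm` is a stub standing in for any chat-completion client:

```python
def call_llm(prompt: str) -> str:
    # Stub: replace with a real model call (e.g., an OpenAI-compatible endpoint).
    return f"<model output for: {prompt.splitlines()[0][:48]}>"

def answer_with_chain(document: str, question: str) -> str:
    # Step 1: extract only the facts relevant to the question.
    facts = call_llm(f"List facts from this document relevant to '{question}':\n{document}")
    # Step 2: answer from the distilled facts, keeping the final prompt small
    # and focused -- the main advantage over one monolithic prompt.
    return call_llm(f"Using these facts:\n{facts}\nAnswer: {question}")

print(answer_with_chain("OpenShift AI bundles Ray and vLLM.",
                        "What does OpenShift AI bundle?"))
```

ReAct goes one step further: instead of a fixed chain, the model interleaves Thought, Action, and Observation steps in a loop, deciding at each turn whether to call a tool or produce the final answer.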
Discover how Bunsen tracks and analyzes large and busy upstream projects.
Discover the comprehensive security and scalability measures for a Models-as-a-Service (MaaS) platform in an enterprise environment.
Learn how to overcome compatibility challenges when deploying OpenShift AI and OpenShift Service Mesh 3 on one cluster.
Harness Llama Stack with Python for LLM development. Explore tool calling, agents, and Model Context Protocol (MCP) for versatile integrations.
This beginner's guide to Podman AI Lab walks through setting up Podman Desktop, installing the AI Lab extension, and launching your first RAG chatbot.
Get started with AI in Node.js. This cheat sheet covers selecting models, using servers like Ollama, and client libraries like LangChain.js for AI integration.
Ollama makes it easy for developers to get started with local model experimentation, while vLLM provides a path to reliable, efficient, and scalable deployment.
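The difference in posture shows up even in how you call them. The sketch below assumes default ports (Ollama on 11434, vLLM's OpenAI-compatible server on 8000) and placeholder model names, with both servers already running.

```python
import requests

# Local experimentation with Ollama's native API:
r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "llama3.2", "prompt": "Hello", "stream": False})
print(r.json()["response"])

# Production-style serving with vLLM's OpenAI-compatible endpoint:
r = requests.post("http://localhost:8000/v1/completions",
                  json={"model": "meta-llama/Llama-3.1-8B-Instruct",
                        "prompt": "Hello", "max_tokens": 64})
print(r.json()["choices"][0]["text"])
```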
Learn how to build a Model-as-a-Service platform with this simple demo. (Part 3 of 4)
Catch up on the most popular articles published on Red Hat Developer this year. Get insights on Linux, AI, Argo CD, virtualization, GCC 15, and more.
Discover the new features and tools in RHEL 10.
Learn how RamaLama's integration with libkrun and microVMs enhances AI model isolation, security, and resource efficiency for deployments.
Boost inference performance by up to 2.5X with vLLM's Eagle 3 speculative decoding integration. Discover how in this blog post.
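If you want to experiment, recent vLLM releases expose speculative decoding through a `speculative_config` argument; the exact keys vary by version and the draft-model ID below is an assumption, so check the post and the vLLM docs for your release.

```python
from vllm import LLM, SamplingParams

# Hedged sketch: config keys follow recent vLLM releases and may differ by
# version; the EAGLE-3 draft model ID is an assumption, not from the post.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    speculative_config={
        "method": "eagle3",
        "model": "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B",  # assumed draft model
        "num_speculative_tokens": 5,
    },
)
out = llm.generate(["Explain speculative decoding in one sentence."],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```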
Explore the architecture of a Models-as-a-Service (MaaS) platform and how enterprises can create a secure and scalable environment for AI models. (Part 2 of 4)
Download the Red Hat Enterprise Linux 10 cheat sheet for a quick reference guide to essential commands, image building, and system management with RHEL.