
Integrate your AI services and cloud-native applications into a single platform.
Explore AI integration in this DevNation Day Santiago 2024 deep dive.
Llama Stack offers an alternative to the OpenAI Responses API, enabling multi-step agents, RAG, and tool use on your own infrastructure with any model.
See how a custom MCP client for Docling transformed unstructured data into usable content, reducing document prep time by over 80%.
Go beyond performance and accuracy. This guide for technical practitioners details how to build trust, transparency, and safety into your AI workflows.
The Agent2Agent (A2A) protocol is an open standard enabling seamless communication between AI agents. Here are the key things to know before getting started.
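One concrete piece worth seeing first is discovery: an A2A agent publishes a JSON agent card at a well-known path describing its skills and endpoint. Below is a minimal sketch, assuming the spec's well-known location; the host and the exact card fields used here are illustrative.

```python
# A minimal A2A discovery sketch: fetch an agent's card from the well-known
# path defined by the A2A spec. agent.example.com is a placeholder host.
import requests

card = requests.get("https://agent.example.com/.well-known/agent.json").json()
print(card["name"])                                       # human-readable agent name
print([skill["id"] for skill in card.get("skills", [])])  # advertised skills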
llm-d optimizes LLM inference at scale with disaggregated prefill/decode, smart caching, and Kubernetes-native architecture for production environments.
Learn how to optimize LLMs like Granite 3.3 for better performance and efficiency on a single server by using open source tools like LLM Compressor and vLLM.
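For a flavor of that workflow, here is a minimal one-shot quantization sketch, assuming the llmcompressor package; the imports and arguments follow its documented examples but can shift between releases, and the model and calibration dataset are placeholder choices.

```python
# A sketch of one-shot W4A16 quantization with LLM Compressor; the checkpoint
# and calibration dataset are illustrative, and argument names may vary by
# version. The compressed output directory can then be served with vLLM.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="ibm-granite/granite-3.3-2b-instruct",
    dataset="open_platypus",              # calibration samples
    recipe=recipe,
    output_dir="granite-3.3-w4a16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```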
Explore the evolving landscape of LLM post-training datasets, their various formats, and the process of transforming structured datasets into token sequences.
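To make that last transformation step concrete, the sketch below renders one structured chat sample into token IDs via a Hugging Face chat template; the checkpoint name is an illustrative assumption.

```python
# A sketch of the dataset-to-tokens step: one structured chat sample is
# rendered through the model's chat template and tokenized. The checkpoint
# is a placeholder for whichever model you are post-training.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("ibm-granite/granite-3.3-2b-instruct")
sample = [
    {"role": "user", "content": "What is post-training?"},
    {"role": "assistant", "content": "Training applied after pretraining."},
]
ids = tok.apply_chat_template(sample, tokenize=True)
print(ids[:16])  # the token sequence a trainer would actually consume
```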
Learn how to establish communication with vLLM using Apple's Foundation framework and Alamofire for low-level HTTP interactions in macOS and iOS applications.
Build a simple agentic AI server with the Model Context Protocol (MCP). This guide shows you how to create a Python-based weather tool.
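As a taste of what such a guide builds, here is a minimal sketch of an MCP weather tool, assuming the official mcp Python SDK's FastMCP helper; the forecast endpoint is a hypothetical placeholder.

```python
# A minimal MCP weather tool sketch using the mcp Python SDK's FastMCP.
# api.example.com is a placeholder for a real weather API.
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")  # server name shown to MCP clients

@mcp.tool()
async def get_forecast(latitude: float, longitude: float) -> str:
    """Return a short forecast for the given coordinates."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            "https://api.example.com/forecast",
            params={"lat": latitude, "lon": longitude},
        )
        resp.raise_for_status()
        return resp.json()["summary"]

if __name__ == "__main__":
    mcp.run(transport="stdio")  # speak MCP over stdin/stdout
```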
This article demonstrates how to implement GPU-based autoscaling on the Red Hat OpenShift platform using custom metrics autoscaler (KEDA) and Prometheus.
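The core of that setup is a KEDA ScaledObject that scales a deployment on a Prometheus GPU metric. Below is a sketch that creates one from Python with the kubernetes client, assuming KEDA's Prometheus scaler and a DCGM-exported utilization metric; all names, namespaces, and thresholds are placeholders.

```python
# A sketch of creating a KEDA ScaledObject via the kubernetes client,
# scaling an inference Deployment on mean GPU utilization from Prometheus.
# Names, the namespace, and the threshold are placeholders.
from kubernetes import client, config

config.load_kube_config()

scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "gpu-inference-scaler"},
    "spec": {
        "scaleTargetRef": {"name": "inference-deployment"},
        "minReplicaCount": 1,
        "maxReplicaCount": 4,
        "triggers": [{
            "type": "prometheus",
            "metadata": {
                "serverAddress": "http://prometheus.monitoring:9090",
                "query": "avg(DCGM_FI_DEV_GPU_UTIL)",  # mean GPU util (%)
                "threshold": "80",                     # scale out above 80%
            },
        }],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="keda.sh", version="v1alpha1", namespace="demo",
    plural="scaledobjects", body=scaled_object,
)
```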
Learn how vLLM outperforms Ollama in high-performance production deployments, delivering significantly higher throughput and lower latency.
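For context, a production-style vLLM deployment exposes an OpenAI-compatible endpoint, so client code stays unchanged while the server handles continuous batching. A minimal sketch, assuming a locally started server and a placeholder model name:

```python
# Query a vLLM server through its OpenAI-compatible API. The model name and
# localhost URL are assumptions for illustration. Start the server first,
# e.g.: vllm serve ibm-granite/granite-3.3-8b-instruct
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="ibm-granite/granite-3.3-8b-instruct",
    messages=[{"role": "user", "content": "Summarize vLLM in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```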
Learn how to perform large-scale, distributed batch inference on Red Hat OpenShift AI using the CodeFlare SDK with Ray Data and vLLM.
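The underlying pattern is Ray Data's map_batches over a class that holds one vLLM engine per actor; on OpenShift AI you would submit this through the CodeFlare SDK. A sketch with a placeholder model and synthetic prompts:

```python
# A sketch of distributed batch inference with Ray Data and vLLM, following
# the map_batches actor pattern; the model and prompts are placeholders.
import ray
from vllm import LLM, SamplingParams

class VLLMPredictor:
    def __init__(self):
        # One vLLM engine per Ray actor, each pinned to one GPU.
        self.llm = LLM(model="ibm-granite/granite-3.3-2b-instruct")
        self.params = SamplingParams(max_tokens=64)

    def __call__(self, batch):
        outputs = self.llm.generate(list(batch["prompt"]), self.params)
        batch["response"] = [o.outputs[0].text for o in outputs]
        return batch

ds = ray.data.from_items(
    [{"prompt": f"Question {i}: why is the sky blue?"} for i in range(1000)]
)
ds = ds.map_batches(VLLMPredictor, concurrency=2, num_gpus=1, batch_size=32)
ds.show(limit=2)
```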
Enterprise-grade artificial intelligence and machine learning (AI/ML) for developers.
This tutorial shows you how to use the Llama Stack API to implement retrieval-augmented generation for an AI application built with Python.
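In compressed form, the flow is: register a vector database, ingest documents through the RAG tool, retrieve at query time, and ground the completion on the hits. The sketch below assumes the llama-stack-client package; the method names follow its documented API but can differ across Llama Stack versions, and the model IDs are placeholders.

```python
# A compressed RAG sketch against a local Llama Stack server (default port
# 8321). Import paths and signatures may vary across llama-stack-client
# versions; the embedding and chat model IDs are placeholders.
from llama_stack_client import LlamaStackClient
from llama_stack_client.types import Document  # RAGDocument in some versions

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a vector database and ingest a document into it.
client.vector_dbs.register(
    vector_db_id="docs",
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)
client.tool_runtime.rag_tool.insert(
    documents=[Document(document_id="d1", content="Llama Stack is ...",
                        mime_type="text/plain", metadata={})],
    vector_db_id="docs",
    chunk_size_in_tokens=256,
)

# Retrieve relevant chunks, then ground the chat completion on them.
hits = client.tool_runtime.rag_tool.query(
    content="What is Llama Stack?", vector_db_ids=["docs"])
answer = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "system", "content": f"Context: {hits.content}"},
              {"role": "user", "content": "What is Llama Stack?"}],
)
print(answer.completion_message.content)
```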
Headed to JavaZone 2025? Visit the Red Hat Developer booth on-site to speak with our expert technologists.
Tackle the AI/ML lifecycle with OpenShift AI. This guide helps you build adaptable, production-ready MLOps workflows, from data preparation to live inference.
This video demonstrates the scaling of an object by the custom metrics autoscaler. It depicts the example described in this article: https://developers.redhat.com/articles/2025/06/13/boost-ai-efficiency-gpu-autoscaling-using-keda-openshift.
Learn how to use the CodeFlare SDK to submit RayJobs to a remote Ray cluster in OpenShift AI.
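In short, the SDK's RayJobClient wraps Ray's job submission API behind your cluster's authenticated dashboard route. A minimal sketch; the address and token are placeholders for your OpenShift AI environment:

```python
# A sketch of remote job submission with the CodeFlare SDK's RayJobClient,
# which wraps Ray's job submission API. The dashboard address and bearer
# token are placeholders.
from codeflare_sdk import RayJobClient

client = RayJobClient(
    address="https://ray-dashboard-mycluster.apps.example.com",
    headers={"Authorization": "Bearer <token>"},
    verify=False,  # or a CA bundle path in production
)

job_id = client.submit_job(
    entrypoint="python train.py",
    runtime_env={"working_dir": "./", "pip": ["torch"]},
)
print(client.get_job_status(job_id))
print(client.get_job_logs(job_id))
```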
Learn how PagedAttention solves the memory waste problem of traditional LLM systems by breaking the cache into small, on-demand blocks.
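The idea is easy to see in miniature: rather than reserving one contiguous KV-cache region per sequence up front, blocks are handed out from a shared pool only as tokens arrive. A toy sketch of that bookkeeping (block and pool sizes are arbitrary):

```python
# A toy illustration of the PagedAttention idea: fixed-size KV-cache blocks
# are allocated on demand from a shared pool, tracked per sequence through a
# block table, so no sequence reserves memory it never uses.
BLOCK_SIZE = 16  # tokens per KV-cache block

class BlockPool:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def allocate(self):
        if not self.free:
            raise MemoryError("KV cache exhausted")
        return self.free.pop(0)

class Sequence:
    def __init__(self, pool):
        self.pool = pool
        self.block_table = []  # logical -> physical block mapping
        self.num_tokens = 0

    def append_token(self):
        # Grab a new physical block only when the last one fills up.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.pool.allocate())
        self.num_tokens += 1

pool = BlockPool(num_blocks=64)
seq = Sequence(pool)
for _ in range(40):        # 40 tokens -> ceil(40/16) = 3 blocks, no waste
    seq.append_token()
print(seq.block_table)     # [0, 1, 2]
```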
Dive into the Q2’25 edition of the Camel integration quarterly digest, covering the latest updates across the Apache Camel ecosystem.
Learn how to deploy the Open Platform for Enterprise AI (OPEA) ChatQnA application on OpenShift with AMD Instinct hardware.
Transform your domain expertise into intelligent applications that deliver real value.
Learn about the advantages of prompt chaining and the ReAct framework compared to simpler agent architectures for complex tasks.
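To see why the ReAct loop helps, compare it with a single prompt: each tool observation is fed back into the transcript before the next decision. The toy sketch below scripts the model's replies so it runs standalone; a real agent would call a chat model where fake_llm stands.

```python
# A toy ReAct-style loop: the "model" interleaves Thought/Action/Observation
# steps until it emits a final answer. fake_llm and the calculator tool are
# stand-ins for a real chat model and real tools.
import re

def tool_calculator(expr: str) -> str:
    return str(eval(expr, {"__builtins__": {}}))  # demo only; never eval untrusted input

def fake_llm(transcript: str) -> str:
    # Scripted replies that mimic ReAct output for one question.
    if "Observation:" not in transcript:
        return "Thought: I need arithmetic.\nAction: calculator[12*7]"
    return "Thought: I have the result.\nFinal Answer: 84"

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = fake_llm(transcript)
        transcript += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[1].strip()
        action = re.search(r"Action: calculator\[(.+)\]", reply)
        if action:  # run the tool, append the observation, loop again
            transcript += f"Observation: {tool_calculator(action.group(1))}\n"
    return "no answer"

print(react("What is 12 * 7?"))  # -> 84
```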
Discover how Bunsen tracks and analyzes large and busy upstream projects.