Building trustworthy AI: A developer's guide to production-ready systems
Go beyond performance and accuracy. This guide for technical practitioners details how to implement trust, transparency, and safety into your AI workflows.
The Agent2Agent (A2A) protocol is an open standard enabling seamless communication between AI agents. Here are the key things to know before getting started.
llm-d optimizes LLM inference at scale with disaggregated prefill/decode, smart caching, and Kubernetes-native architecture for production environments.
Learn how to optimize LLMs like Granite 3.3 for better performance and efficiency on a single server by using open source tools like LLM Compressor and vLLM.
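As a taste of that workflow, here is a minimal sketch of one-shot FP8 quantization with LLM Compressor. The model ID is an assumption, and import paths vary by release; see the article and the LLM Compressor docs for the authoritative recipe.

```python
# Hypothetical sketch of one-shot FP8 quantization with LLM Compressor.
# The model ID is an assumption; import paths vary by llmcompressor release.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "ibm-granite/granite-3.3-2b-instruct"  # assumed model
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize all Linear layers to dynamic FP8, keeping the output head in full precision.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

model.save_pretrained("granite-3.3-2b-fp8")
tokenizer.save_pretrained("granite-3.3-2b-fp8")
```

The saved directory can then be served directly with vLLM, which reads the compressed checkpoint format.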
Explore the evolving landscape of LLM post-training datasets, their various formats, and the process of transforming structured datasets into token sequences.
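To see the last step of that transformation in miniature, here is a sketch that renders one chat-format record into token IDs with a Hugging Face chat template. The model ID is an assumption.

```python
# Minimal sketch: render one chat-format training record into token IDs
# using a Hugging Face chat template. The model ID is an assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.3-2b-instruct")

record = [
    {"role": "user", "content": "What is post-training?"},
    {"role": "assistant", "content": "Fine-tuning a base model on curated data."},
]

token_ids = tokenizer.apply_chat_template(record, tokenize=True)
print(token_ids[:12])  # the first few tokens of the rendered sequence
```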
Learn how to establish communication with vLLM using Apple's Foundation framework and Alamofire for low-level HTTP interactions in macOS and iOS applications.
Build a simple agentic AI server with the Model Context Protocol (MCP). This guide shows you how to create a Python-based weather tool.
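For a flavor of what that looks like, here is a minimal server sketch using the official MCP Python SDK's FastMCP helper; the forecast data is a placeholder rather than a real weather lookup.

```python
# Minimal MCP server sketch using the official Python SDK's FastMCP helper.
# The forecast below is placeholder data, not a real weather lookup.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")

@mcp.tool()
def get_forecast(city: str) -> str:
    """Return a (canned) forecast for the given city."""
    return f"Forecast for {city}: sunny, 22°C"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```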
Learn how vLLM outperforms Ollama in high-performance production deployments, delivering significantly higher throughput and lower latency.
Learn how to perform large-scale, distributed batch inference on Red Hat OpenShift AI using the CodeFlare SDK with Ray Data and vLLM.
Enterprise-grade artificial intelligence and machine learning (AI/ML).
This tutorial shows you how to use the Llama Stack API to implement retrieval-augmented generation for an AI application built with Python.
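As a rough sketch of the shape of such a call: the base URL, model ID, and inlined context below are assumptions, and a real RAG application would retrieve the context from a vector store first.

```python
# Rough sketch of a RAG-style call against a local Llama Stack server.
# The base URL, model ID, and inlined context are assumptions; a real
# application would retrieve the context from a vector database first.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

context = "vLLM achieves high throughput via PagedAttention."  # stand-in for retrieved chunks
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": "What makes vLLM fast?"},
    ],
)
print(response.completion_message.content)
```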
Headed to JavaZone 2025? Visit the Red Hat Developer booth on-site to speak to our expert technologists.
Tackle the AI/ML lifecycle with OpenShift AI. This guide helps you build adaptable, production-ready MLOps workflows, from data preparation to live inference.
Learn how to use the CodeFlare SDK to submit RayJobs to a remote Ray cluster in OpenShift AI.
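The CodeFlare SDK builds on Ray's job submission API; as a rough sketch of that underlying API, where the dashboard URL and entrypoint script are assumptions:

```python
# Rough sketch of Ray's job submission API, which the CodeFlare SDK builds
# on. The dashboard URL and entrypoint script are assumptions.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://ray-dashboard.example.com:8265")

job_id = client.submit_job(
    entrypoint="python batch_inference.py",
    runtime_env={"pip": ["vllm", "ray[data]"]},
)
print(f"Submitted job: {job_id}")
print(client.get_job_status(job_id))
```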
Learn how PagedAttention solves the memory waste problem of traditional LLM systems by breaking the cache into small, on-demand blocks.
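The idea is easy to see in miniature. This toy sketch (not vLLM's actual code) allocates fixed-size cache blocks only as a sequence grows, instead of reserving one contiguous slab sized for the maximum possible length:

```python
# Toy illustration of paged KV-cache allocation (not vLLM's actual code):
# fixed-size blocks are handed out on demand instead of reserving one
# contiguous slab sized for the maximum possible sequence length.
BLOCK_SIZE = 16  # tokens per cache block

free_blocks = list(range(1024))   # pool of physical block IDs
block_table = []                  # one sequence's logical -> physical mapping

def append_token(position: int) -> None:
    # A new physical block is needed only when the last one fills up.
    if position % BLOCK_SIZE == 0:
        block_table.append(free_blocks.pop())

for pos in range(40):  # simulate generating 40 tokens
    append_token(pos)

print(f"40 tokens used {len(block_table)} blocks of {BLOCK_SIZE} tokens")  # -> 3 blocks
```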
Learn how to deploy the Open Platform for Enterprise AI (OPEA) ChatQnA application on OpenShift with AMD Instinct hardware.
Learn about the advantages of prompt chaining and the ReAct framework compared to simpler agent architectures for complex tasks.
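As a toy sketch of the ReAct loop's shape, where the hard-coded "thought" and tool choice stand in for real model output:

```python
# Toy sketch of a single ReAct step: the agent interleaves a Thought,
# an Action (tool call), and an Observation (tool result). The hard-coded
# thought and tool choice here stand in for real model output.
TOOLS = {"get_weather": lambda city: f"12°C and cloudy in {city}"}

def react_step(question: str) -> str:
    thought = "I need current weather data to answer this."
    action, arg = "get_weather", "Oslo"      # the model would choose these
    observation = TOOLS[action](arg)         # execute the chosen tool
    return (f"Question: {question}\n"
            f"Thought: {thought}\n"
            f"Action: {action}({arg!r})\n"
            f"Observation: {observation}")

print(react_step("What's the weather in Oslo?"))
```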
Learn how to overcome compatibility challenges when deploying OpenShift AI and OpenShift Service Mesh 3 on one cluster.
Harness Llama Stack with Python for LLM development. Explore tool calling, agents, and Model Context Protocol (MCP) for versatile integrations.
This beginner's guide to Podman AI Lab walks through setting up Podman Desktop, installing the AI Lab extension, and launching your first RAG chatbot.
Ollama makes it easy for developers to get started with local model experimentation, while vLLM provides a path to reliable, efficient, and scalable deployment.
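One reason that path is smooth: vLLM exposes an OpenAI-compatible API, so client code barely changes between environments. A minimal sketch, assuming `vllm serve <model>` is running locally on port 8000 (the model ID is an assumption):

```python
# Minimal sketch of querying a local vLLM server through its
# OpenAI-compatible endpoint; assumes `vllm serve <model>` is running
# on port 8000. The model ID below is an assumption.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="ibm-granite/granite-3.3-2b-instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```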
Learn how to build a Model-as-a-Service platform with this simple demo. (Part 3 of 4)
Catch up on the most popular articles published on Red Hat Developer this year. Get insights on Linux, AI, Argo CD, virtualization, GCC 15, and more.
Learn how RamaLama's integration with libkrun and microVMs enhances AI model isolation, security, and resource efficiency for deployments.
Boost inference performance by up to 2.5X with vLLM's Eagle 3 speculative decoding integration. Discover how in this blog post.
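As a rough sketch of how that can look in vLLM's offline API: the model and draft-model IDs and the speculative token count below are assumptions, and the exact configuration keys vary by vLLM release, so check the docs for your version.

```python
# Rough sketch of enabling EAGLE-3 speculative decoding in vLLM's offline
# API. Model IDs and num_speculative_tokens are assumptions; the exact
# speculative_config keys can vary by vLLM release.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    speculative_config={
        "method": "eagle3",
        "model": "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B",
        "num_speculative_tokens": 2,
    },
)

outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```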