
How to build a simple agentic AI server with MCP
Build a simple agentic AI server with the Model Context Protocol (MCP). This guide shows you how to create a Python-based weather tool.
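To give a sense of what the article builds, here is a minimal sketch of an MCP weather tool, assuming the official MCP Python SDK's FastMCP helper (pip install mcp httpx). The tool name, its parameters, and the Open-Meteo endpoint are illustrative choices for this sketch, not necessarily the ones the article uses.

```python
# weather_server.py -- minimal MCP weather tool (illustrative sketch)
import httpx
from mcp.server.fastmcp import FastMCP

# Name the MCP server; an agent sees this when it connects.
mcp = FastMCP("weather")

@mcp.tool()
async def get_current_weather(latitude: float, longitude: float) -> str:
    """Return the current temperature and wind speed for a location."""
    # Open-Meteo is a free, keyless forecast API used here for illustration.
    url = "https://api.open-meteo.com/v1/forecast"
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current_weather": "true",
    }
    async with httpx.AsyncClient() as client:
        resp = await client.get(url, params=params, timeout=10.0)
        resp.raise_for_status()
        current = resp.json()["current_weather"]
    return f"{current['temperature']}°C, wind {current['windspeed']} km/h"

if __name__ == "__main__":
    # stdio transport lets an MCP-capable client launch this script
    # as a subprocess and call get_current_weather as a tool.
    mcp.run(transport="stdio")
```

Any MCP-capable client or agent framework can then spawn this script over stdio, discover the tool from its signature and docstring, and invoke it during a conversation.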
Learn how vLLM outperforms Ollama in high-performance production deployments, delivering significantly higher throughput and lower latency.
Learn how to perform large-scale, distributed batch inference on Red Hat OpenShift AI using the CodeFlare SDK with Ray Data and vLLM.
Enterprise-grade artificial intelligence and machine learning (AI/ML) for developers.
This tutorial shows you how to use the Llama Stack API to implement retrieval-augmented generation for an AI application built with Python.
Headed to JavaZone 2025? Visit the Red Hat Developer booth on-site to speak with our expert technologists.
Tackle the AI/ML lifecycle with OpenShift AI. This guide helps you build adaptable, production-ready MLOps workflows, from data preparation to live inference.
Learn how to use the CodeFlare SDK to submit RayJobs to a remote Ray cluster in OpenShift AI.
Learn how PagedAttention solves the memory waste problem of traditional LLM serving systems by breaking the KV cache into small, on-demand blocks.
Learn how to deploy the Open Platform for Enterprise AI (OPEA) ChatQnA application on OpenShift with AMD Instinct hardware.
Transform your domain expertise into intelligent applications that deliver real value.
Learn about the advantages of prompt chaining and the ReAct framework compared to simpler agent architectures for complex tasks.
Learn how to overcome compatibility challenges when deploying OpenShift AI and OpenShift Service Mesh 3 on one cluster.
Harness Llama Stack with Python for LLM development. Explore tool calling, agents, and Model Context Protocol (MCP) for versatile integrations.
This beginner's guide to Podman AI Lab walks through setting up Podman Desktop, installing the AI Lab extension, and launching your first RAG chatbot.
Ollama makes it easy for developers to get started with local model experimentation, while vLLM provides a path to reliable, efficient, and scalable deployment.
Learn how to build a Models-as-a-Service (MaaS) platform with this simple demo. (Part 3 of 4)
Catch up on the most popular articles published on Red Hat Developer this year. Get insights on Linux, AI, Argo CD, virtualization, GCC 15, and more.
Learn how RamaLama's integration with libkrun and microVMs enhances AI model isolation, security, and resource efficiency for deployments.
Boost inference performance by up to 2.5X with vLLM's Eagle 3 speculative decoding integration. Discover how in this blog post.
Explore the architecture of a Models-as-a-Service (MaaS) platform and how enterprises can create a secure and scalable environment for AI models. (Part 2 of 4)
Discover how to communicate with vLLM using the OpenAI spec as implemented by the SwiftOpenAI and MacPaw/OpenAI open source projects.
Discover how model compression slashes LLM deployment costs for technical practitioners, covering quantization, pruning, distillation, and speculative decoding.
This article introduces Models-as-a-Service (MaaS) for enterprises, outlining the challenges, benefits, key technologies, and workflows. (Part 1 of 4)