Red Hat at JavaZone 2025
Headed to JavaZone 2025? Visit the Red Hat Developer booth on-site to speak to our expert technologists.
Tackle the AI/ML lifecycle with OpenShift AI. This guide helps you build adaptable, production-ready MLOps workflows, from data preparation to live inference.
Learn how to use the CodeFlare SDK to submit RayJobs to a remote Ray cluster in OpenShift AI.
Learn how PagedAttention solves the memory waste problem of traditional LLM systems by breaking the cache into small, on-demand blocks.
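The core idea can be sketched in a few lines. This is an illustrative toy, not vLLM's actual implementation: each sequence gets a block table mapping logical token positions to fixed-size physical blocks, which are allocated only when a sequence actually grows into them.

```python
BLOCK_SIZE = 16  # tokens per physical block (illustrative value)

class PagedKVCache:
    """Toy paged KV cache: allocate small blocks on demand instead of
    reserving one contiguous max-context-length region per sequence."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))
        # seq_id -> list of physical block ids (the sequence's block table)
        self.block_tables: dict[int, list[int]] = {}

    def append_token(self, seq_id: int, pos: int) -> int:
        """Return the physical block holding token `pos`, allocating on demand."""
        table = self.block_tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE >= len(table):       # crossed into a new logical block
            table.append(self.free_blocks.pop())  # grab any free physical block
        return table[pos // BLOCK_SIZE]

    def free(self, seq_id: int):
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))

cache = PagedKVCache(num_blocks=8)
for pos in range(20):        # a 20-token sequence occupies only 2 blocks...
    cache.append_token(0, pos)
# ...rather than reserving cache for the model's full context length up front.
```

Because blocks need not be contiguous, memory that a traditional pre-allocated cache would waste on unused context stays in the free pool for other sequences.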
Learn how to deploy the Open Platform for Enterprise AI (OPEA) ChatQnA application on OpenShift with AMD Instinct hardware.
Learn about the advantages of prompt chaining and the ReAct framework compared to simpler agent architectures for complex tasks.
Learn how to overcome compatibility challenges when deploying OpenShift AI and OpenShift Service Mesh 3 on one cluster.
Harness Llama Stack with Python for LLM development. Explore tool calling, agents, and Model Context Protocol (MCP) for versatile integrations.
This beginner's guide to Podman AI Lab walks through setting up Podman Desktop, installing the AI Lab extension, and launching your first RAG chatbot.
Ollama makes it easy for developers to get started with local model experimentation, while vLLM provides a path to reliable, efficient, and scalable deployment.
Learn how to build a Models-as-a-Service (MaaS) platform with this simple demo. (Part 3 of 4)
Catch up on the most popular articles published on Red Hat Developer this year. Get insights on Linux, AI, Argo CD, virtualization, GCC 15, and more.
Learn how RamaLama's integration with libkrun and microVMs enhances AI model isolation, security, and resource efficiency for deployments.
Boost inference performance by up to 2.5X with vLLM's Eagle 3 speculative decoding integration. Discover how in this blog post.
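The speedup comes from the draft-then-verify loop at the heart of speculative decoding. The toy below sketches that loop only; the two "models" are stand-in arithmetic functions, not real LLMs, and real systems verify all draft tokens in a single batched forward pass rather than sequentially.

```python
def draft_model(prefix):
    """Cheap stand-in 'draft model': proposes the next token."""
    return (sum(prefix) + 1) % 10

def target_model(prefix):
    """Expensive stand-in 'target model': defines the correct next token.
    It mostly agrees with the draft, but diverges when sum(prefix) % 5 == 0."""
    s = sum(prefix)
    return (s + 2) % 10 if s % 5 == 0 else (s + 1) % 10

def speculative_step(prefix, k=4):
    """Draft k tokens cheaply, then verify them against the target model.
    Every accepted token saves one full target-model decoding step."""
    draft, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_model(ctx)
        draft.append(t)
        ctx.append(t)

    accepted, ctx = [], list(prefix)
    for t in draft:                    # batched in real implementations
        expected = target_model(ctx)
        if t != expected:
            accepted.append(expected)  # correct the first mismatch, then stop
            break
        accepted.append(t)
        ctx.append(t)
    return accepted

speculative_step([1, 2], k=4)  # → [4, 8, 7]: two drafts accepted, one corrected
```

When the draft model agrees with the target often (as Eagle-style draft heads are trained to), several tokens are emitted per expensive verification pass, which is where the wall-clock gains come from.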
Explore the architecture of a Models-as-a-Service (MaaS) platform and how enterprises can create a secure and scalable environment for AI models. (Part 2 of 4)
Discover how to communicate with vLLM using the OpenAI spec as implemented by the SwiftOpenAI and MacPaw/OpenAI open source projects.
Discover how model compression slashes LLM deployment costs for technical practitioners, covering quantization, pruning, distillation, and speculative decoding.
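As a taste of the simplest of these techniques, here is a hedged sketch of symmetric int8 weight quantization; the weights are made-up illustrative values, and production tools like LLM Compressor add calibration and per-channel scales on top of this basic idea.

```python
def quantize_int8(weights):
    """Map floats to int8 range [-127, 127] with one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid div-by-zero
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 1.27]
q, s = quantize_int8(w)
# q stores each weight in 1 byte instead of 4 (float32), at the cost of a
# small, bounded rounding error recovered by dequantize(q, s).
```

Pruning, distillation, and speculative decoding trade accuracy, training effort, and latency differently; quantization is usually the cheapest first step because it needs no retraining.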
This article introduces Models-as-a-Service (MaaS) for enterprises, outlining the challenges, benefits, key technologies, and workflows. (Part 1 of 4)
Learn how to evaluate the performance of your LLM deployments with the open source GuideLLM toolkit to optimize cost, reliability, and user experience.
RamaLama's new multimodal feature integrates vision-language models with containers. Discover how it helps developers download and serve multimodal AI models.
Integrate Red Hat AI Inference Server with LangChain to build agentic document processing workflows. This article presents a use case and Python code.
Explore Red Hat Summit 2025 with Dan Russo and Repo, the Red Hat Developer mascot!
Discover how to deploy compressed, fine-tuned models for efficient inference with the new Axolotl and LLM Compressor integration.
Learn how to run vLLM on CPUs with OpenShift using Kubernetes APIs and dive into performance experiments for LLM benchmarking in this beginner-friendly guide.