
GuideLLM: Evaluate LLM deployments for real-world inference
Learn how to evaluate the performance of your LLM deployments with the open source GuideLLM toolkit to optimize cost, reliability, and user experience.
RamaLama's new multimodal feature integrates vision-language models with containers. Discover how it helps developers download and serve multimodal AI models.
Integrate Red Hat AI Inference Server with LangChain to build agentic document processing workflows. This article presents a use case and Python code.
Explore Red Hat Summit 2025 with Dan Russo and Repo, the Red Hat Developer mascot!
Discover how to deploy compressed, fine-tuned models for efficient inference with the new Axolotl and LLM Compressor integration.
Learn how to run vLLM on CPUs with OpenShift using Kubernetes APIs and dive into performance experiments for LLM benchmarking in this beginner-friendly guide.
Discover why Kafka is the foundation behind modular, scalable, and controllable AI automation.
Learn how to secure, observe, and control AI models at scale without code changes to simplify zero-trust deployments by using service mesh.
Enhance your Node.js AI applications with distributed tracing. Discover how to use Jaeger and OpenTelemetry for insights into Llama Stack interactions.
Learn how to deploy a Whisper model on Red Hat AI Inference Server within a RHEL 9 environment using Podman containers and NVIDIA GPUs for speech recognition.
Learn to build a chatbot leveraging vLLM for generative AI inference. This guide provides source code and steps to connect to a Llama Stack Swift SDK server.
Deploy AI at the edge with Red Hat OpenShift AI. Learn to set up OpenShift AI, configure storage, train models, and serve using KServe's RawDeployment.
Dive into the world of containers and Kubernetes with Podman Desktop, an open source tool that empowers your container development workflow and lets you seamlessly deploy applications to local and remote Kubernetes environments. For developers, operations teams, and anyone looking to simplify building and deploying containers, Podman Desktop provides an intuitive interface compatible with container engines such as Podman, Docker, Lima, and more.
Learn about Podman AI Lab and how you can start using it today to test and build AI-enabled applications. An extension for Podman Desktop, the container and cloud-native tool for application developers and administrators, the AI Lab is your one-stop shop for popular generative AI use cases like summarizers, chatbots, and RAG applications. From the model catalog, you can also easily download and start AI models as local services on your machine. We'll cover all this and more, so be sure to try out Podman AI Lab today!
Learn to harness the power of natural language processing by creating LLM tools with Apache Camel's low-code UI. Engage with this interactive tutorial in the Developer Sandbox for a hands-on experience.
In this video, Maarten demonstrates a neural network and how it works in AI/ML models. Neural networks are a class of ML models inspired by the human brain, made up of interconnected units of neurons, or nodes. Neural networks are the foundation of many AI applications, including image recognition, speech processing, and natural language understanding.
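The idea of interconnected neurons described above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the video: a single neuron computes a weighted sum of its inputs plus a bias, applies an activation function, and layers of such neurons feed into one another. All weights here are hand-picked placeholders; a real network would learn them from data.

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus bias, passed through the activation
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# A tiny two-layer network: 2 inputs -> 2 hidden neurons -> 1 output.
# Weights are illustrative; training would adjust them.
def forward(x):
    h1 = neuron(x, [2.0, -1.0], 0.5)
    h2 = neuron(x, [-1.5, 1.0], 0.0)
    return neuron([h1, h2], [1.0, 1.0], -1.0)

out = forward([1.0, 0.0])  # a value between 0 and 1
```

Image recognition, speech processing, and language models all build on this same forward-pass structure, just with many more neurons and layers.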
In this recording, we demonstrate how to compose model compression experiments, highlighting the benefits of advanced algorithms requiring custom data sets and how evaluation results and model artifacts can be shared with stakeholders.
Podman enables developers to run Linux containers on macOS within virtual machines, with GPU acceleration for improved AI inference performance.
Learn how to control the output of vLLM's AI responses with structured outputs. Discover how to define choice lists, JSON schemas, regex, and more.
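As a rough sketch of what such a request can look like: vLLM's OpenAI-compatible server accepts extra fields that constrain generation to a choice list, a JSON schema, or a regex. The helper below only builds the request body (no server is contacted); the endpoint URL and model name are placeholders, not values from the article.

```python
import json

BASE_URL = "http://localhost:8000/v1"  # assumed local vLLM endpoint (placeholder)
MODEL = "my-model"                     # placeholder model name

def build_request(prompt, *, choices=None, schema=None, regex=None):
    # Structured-output constraints ride along as extra request fields.
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    if choices is not None:
        body["guided_choice"] = choices  # restrict output to one of these strings
    if schema is not None:
        body["guided_json"] = schema     # constrain output to a JSON schema
    if regex is not None:
        body["guided_regex"] = regex     # constrain output to match a regex
    return body

req = build_request(
    "Is this review positive or negative?",
    choices=["positive", "negative"],
)
print(json.dumps(req, indent=2))
```

In practice you would send this body through an OpenAI-compatible client (e.g., via its `extra_body` mechanism) pointed at the running vLLM server; the article covers the full setup.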
Headed to WeAreDevelopers World Congress 2025? Visit the Red Hat Developer booth on-site to speak to our expert technologists.
Explore how to utilize guardrails for safety mechanisms in large language models (LLMs) with Node.js and Llama Stack, focusing on LlamaGuard and PromptGuard.
Learn how to optimize GPU resource use with NVIDIA Multi-Instance GPU (MIG) and discover how MIG-Adapter enhances GPU resource utilization in Kubernetes.
Members from the Red Hat Node.js team were recently at PowerUp 2025.
Discover how IBM used OpenShift AI to maximize GPU efficiency on its internal AI supercomputer, using open source tools like Kueue for efficient AI workloads.
Gain detailed insights into vLLM deployments on OpenShift AI. Learn to build dashboards with Dynatrace and OpenTelemetry to enable reliable LLM performance.