
How to run AI models in cloud development environments
Explore using RamaLama to run private AI inference in cloud development environments and improve productivity. Follow this tutorial to get started.
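As a taste of the workflow, here is a minimal sketch of calling a RamaLama-served model from Python. It assumes `ramalama serve` is already running in the development environment and exposing its OpenAI-compatible API; the port and model name are placeholder assumptions.

```python
from openai import OpenAI

# Assumes `ramalama serve` is running locally and exposing an
# OpenAI-compatible endpoint; port and model name are placeholders.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

response = client.chat.completions.create(
    model="granite",  # hypothetical model name
    messages=[{"role": "user", "content": "Summarize what RamaLama does."}],
)
print(response.choices[0].message.content)
```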
Enhance your Node.js AI applications with distributed tracing. Discover how to use Jaeger and OpenTelemetry for insights into Llama Stack interactions.
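The article itself uses the Node.js OpenTelemetry SDK; the sketch below shows the same setup pattern in Python for consistency with the other examples here. The service name, collector endpoint, and span attribute are illustrative assumptions.

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Export spans over OTLP to a Jaeger-compatible collector (endpoint assumed).
provider = TracerProvider(resource=Resource.create({"service.name": "llama-stack-app"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("chat-demo")
with tracer.start_as_current_span("llama-stack.chat") as span:
    span.set_attribute("llm.model", "llama3")  # hypothetical attribute
    # ... call the Llama Stack API here; the span captures timing and metadata
```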
Learn how to deploy a Whisper model on Red Hat AI Inference Server within a RHEL 9 environment using Podman containers and NVIDIA GPUs for speech recognition.
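Once the server is up, transcription requests follow the OpenAI-compatible audio API. Below is a minimal client-side sketch; the port, model identifier, and audio file are assumptions, and it presumes the server exposes the /v1/audio/transcriptions route for Whisper models.

```python
from openai import OpenAI

# Assumes the inference server is serving a Whisper model on port 8000
# and exposes the OpenAI-compatible /v1/audio/transcriptions route.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("sample.wav", "rb") as audio:  # placeholder audio file
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",  # model name is an assumption
        file=audio,
    )
print(transcript.text)
```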
Learn to build a chatbot that leverages vLLM for generative AI inference. This guide provides source code and the steps to connect a Llama Stack Swift SDK client to a Llama Stack server.
Deploy AI at the edge with Red Hat OpenShift AI. Learn to set up OpenShift AI, configure storage, train models, and serve them using KServe's RawDeployment mode.
Dive into the world of containers and Kubernetes with Podman Desktop, an open source tool that empowers your container development workflow and lets you seamlessly deploy applications to local and remote Kubernetes environments. For developers, operations teams, and anyone looking to simplify building and deploying containers, Podman Desktop provides an intuitive interface compatible with container engines such as Podman, Docker, Lima, and more.
Learn about Podman AI Lab and how you can start using it today for testing and building AI-enabled applications. As an extension for Podman Desktop, the container and cloud-native tool for application developers and administrators, the AI Lab is your one-stop shop for popular generative AI use cases like summarizers, chatbots, and RAG applications. In addition, you can easily download AI models from the model catalog and start them as local services on your machine. We'll cover all this and more, so be sure to try out Podman AI Lab today!
Learn to harness the power of natural language processing by creating LLM tools with Apache Camel's low-code UI. Engage with this interactive tutorial in the Developer Sandbox for a hands-on experience.
In this video, Maarten demonstrates what a neural network is and how it works in AI/ML models. Neural networks are a class of ML models inspired by the human brain, made up of interconnected units called neurons, or nodes. They are the foundation of many AI applications, including image recognition, speech processing, and natural language understanding.
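To make the idea concrete, here is a minimal sketch of a forward pass through a tiny network: each neuron computes a weighted sum of its inputs plus a bias, then applies an activation function. The layer sizes, weights, and input are arbitrary illustrative values.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # activation: pass positives, zero out negatives

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # 3 inputs -> 4 hidden neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # 4 hidden -> 1 output

x = np.array([0.5, -1.0, 2.0])   # input features
hidden = relu(W1 @ x + b1)       # each neuron: weighted sum + activation
output = W2 @ hidden + b2        # the network's prediction
print(output)
```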
In this recording, we demonstrate how to compose model compression experiments, highlighting the benefits of advanced algorithms that require custom datasets and showing how evaluation results and model artifacts can be shared with stakeholders.
Learn how to control the output of vLLM's AI responses with structured outputs. Discover how to define choice lists, JSON schemas, regex, and more.
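For example, vLLM's OpenAI-compatible server accepts guided-decoding parameters through the request's extra body. The sketch below constrains a response to a fixed choice list; the endpoint and model name are assumptions. The same mechanism accepts guided_json with a JSON schema or guided_regex with a pattern.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # model name is an assumption
    messages=[{"role": "user", "content": "Classify the sentiment: 'Great service!'"}],
    # vLLM-specific parameter: output is constrained to one of these strings
    extra_body={"guided_choice": ["positive", "negative", "neutral"]},
)
print(response.choices[0].message.content)
```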
Learn how Podman AI Lab and RamaLama work together to simplify local AI model execution, using containers and GPU support for faster, easier AI development.
Automate technical documentation with an AI agent that scans repositories, generates full TechDocs, and seamlessly integrates with Red Hat Developer Hub.
Explore how to use guardrails as safety mechanisms for large language models (LLMs) with Node.js and Llama Stack, focusing on LlamaGuard and PromptGuard.
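In Llama Stack, a registered shield can be invoked explicitly before a message reaches the model. The article works in Node.js; the sketch below uses the Python client for consistency with the other examples here, and the endpoint and shield ID are assumptions.

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # endpoint assumed

result = client.safety.run_shield(
    shield_id="llama-guard",  # hypothetical shield registration name
    messages=[{"role": "user", "content": "User input to screen goes here"}],
    params={},
)
if result.violation:
    print("Blocked:", result.violation.user_message)
else:
    print("Input passed the shield")
```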
Learn how NVIDIA Multi-Instance GPU (MIG) optimizes GPU resource use, and discover how MIG-Adapter enhances GPU utilization in Kubernetes.
Members from the Red Hat Node.js team recently attended PowerUp 2025.
Discover how IBM used OpenShift AI to maximize GPU efficiency on its internal AI supercomputer, using open source tools like Kueue for efficient AI workloads.
Learn how integration powers AI with Apache Camel at Devoxx UK 2025.
Gain detailed insights into vLLM deployments on OpenShift AI. Learn to build dashboards with Dynatrace and OpenTelemetry to enable reliable LLM performance.
Learn how to use Red Hat OpenShift AI to quickly develop, train, and deploy machine learning models.
Explore the complete machine learning operations (MLOps) pipeline utilizing Red Hat OpenShift AI.
llm-d delivers Kubernetes-native distributed inference with advanced optimizations, reducing latency and maximizing throughput.
LLM Semantic Router uses semantic understanding and caching to boost performance, cut costs, and enable efficient inference with llm-d.
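As a toy illustration of the caching idea (not the project's actual code): embed each prompt, and when a new prompt lands close enough to a cached one, return the cached answer instead of running inference. The embedding function and threshold here are stand-ins.

```python
import numpy as np

cache: list[tuple[np.ndarray, str]] = []  # (embedding, cached answer)

def embed(text: str) -> np.ndarray:
    # Stand-in embedding; a real router would use a sentence-embedding model.
    vec = np.zeros(128)
    for i, ch in enumerate(text.lower()):
        vec[i % 128] += ord(ch)
    return vec / (np.linalg.norm(vec) + 1e-9)

def lookup(prompt: str, threshold: float = 0.95) -> str | None:
    v = embed(prompt)
    for cached_v, answer in cache:
        if float(cached_v @ v) >= threshold:  # cosine similarity (unit vectors)
            return answer  # similar prompt seen before: skip inference
    return None

cache.append((embed("What is Kubernetes?"), "A container orchestration system."))
print(lookup("what is kubernetes?"))  # hits the cache without calling the LLM
```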
Optimize model inference and reduce costs with model compression techniques like quantization and pruning with LLM Compressor on Red Hat OpenShift AI.
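A minimal one-shot quantization sketch with LLM Compressor is shown below; the model, calibration dataset, and hyperparameters are illustrative choices, not the article's exact recipe.

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# One-shot W4A16 (4-bit weight) quantization with GPTQ; values are examples.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="open_platypus",                # calibration data
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    output_dir="TinyLlama-1.1B-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```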
Learn how to use synthetic data generation (SDG) and fine-tuning in Red Hat AI to customize reasoning models for your enterprise workflows.