How RamaLama makes working with AI models boring

November 22, 2024
Daniel Walsh
Related topics:
Artificial intelligence, Containers, Open source, Python
Related products:
Podman Desktop


    Over the last few months, our team has been working on a new AI project called RamaLama (Figure 1). Yes, another name that contains lama.

Figure 1: The RamaLama project logo, featuring a llama wearing aviator sunglasses and a leather jacket.

    What does RamaLama do?

    RamaLama facilitates local management and serving of AI models.

RamaLama's goal is to make it easy for developers and administrators to run and serve AI models. RamaLama merges the world of AI inferencing with the container world of Podman and Docker, and eventually Kubernetes.

When you first launch RamaLama, it inspects your system for GPU support, falling back to CPU support if no GPUs are present. It then uses a container engine like Podman or Docker to download a container image from quay.io/ramalama containing all of the software necessary to run an AI model for your system's setup.

Once the container image is in place, RamaLama pulls the specified AI model from any of these types of model registries: Ollama, Hugging Face, or an OCI registry.
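For example, the same pull command works across registries by prefixing the model name with a transport. The model names and repository paths below are illustrative placeholders, not guaranteed to exist:

    ramalama pull tinyllama                              # Ollama registry (the default transport)
    ramalama pull huggingface://someorg/somemodel.gguf   # Hugging Face (hypothetical path)
    ramalama pull oci://quay.io/myuser/mymodel           # OCI container registry (hypothetical image)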

Once RamaLama has pulled the AI model, it’s showtime, baby! Time to run the inferencing runtime. RamaLama offers switchable inferencing runtimes, namely llama.cpp and vLLM, for running containerized models.

RamaLama launches a container with the AI model volume-mounted inside, starting a chatbot or a REST API service from a single command. Models are treated much like Podman and Docker treat container images. RamaLama works with Podman Desktop and Docker Desktop on macOS and Windows.
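As a sketch of that single command, assuming granite resolves to a model in your configured registry (it is the shortname RamaLama's own examples use):

    ramalama run granite     # start an interactive chatbot in the terminal
    ramalama serve granite   # serve the model behind a REST API endpoint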

Running AI workloads in containers eliminates the user's need to configure the host system for AI.

    8 reasons to use RamaLama

    RamaLama thinks differently about LLMs, connecting your use cases with the rest of the Linux and container world. You should use RamaLama if:

    1. You want a simple and easy way to test out AI models.
    2. You don’t want to mess with installing specialized software to support your specific GPU.
3. You want to find and pull models from any catalog, including Hugging Face, Ollama, and even container registries.
4. You want to use whichever runtime works best for your model and hardware combination: llama.cpp, vLLM, whisper.cpp, etc. (see the runtime example after this list).
    5. You value running AI models in containers for the simplicity, collaborative properties, and existing infrastructure you have (container registries, CI/CD workflows, etc.).
    6. You want an easy path to run AI models on Podman, Docker, and Kubernetes.
    7. You love the power of running models at system boot using containers with Quadlets.
8. You believe in the power of collaborative open source to enable the fastest, most creative solutions when tackling new problems in a fast-moving space.
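To make reason 4 concrete: switching runtimes is a single option in current releases. Verify against ramalama --help, since option names may evolve:

    ramalama --runtime llama.cpp run granite   # the default runtime
    ramalama --runtime vllm serve granite      # swap in vLLM for serving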

    Why not just use Ollama?

    Realizing that lots of people currently use Ollama, we looked into working with it. While we loved its ease of use, we did not think it fit our needs. We decided to build an alternative tool that allows developers to run and serve AI models from a simple interface, while making it easy to take those models, put them in containers, and enable all of the local, collaborative, and production benefits that they offer.

    Differences between Ollama and RamaLama

    Table 1 compares Ollama and RamaLama capabilities.

    Table 1: Ollama versus RamaLama.

Feature: Running models on the host OS
Ollama: Defaults to running AI models directly on the host system.
RamaLama: Defaults to running AI models in containers, but can also run them directly on the host using the --nocontainer option.

Feature: Running models in containers
Ollama: Not supported.
RamaLama: The default. RamaLama wraps Podman or Docker: it first downloads a container image with all of the AI tooling ready to execute, downloads the AI model to the host, then launches the container with the AI model mounted into it and runs the serving app.

Feature: Support for alternative AI runtimes
Ollama: Supports llama.cpp.
RamaLama: Currently supports llama.cpp and vLLM.

Feature: Optimization and installation of AI software
Ollama: Statically linked with llama.cpp; configuring the host system to run the AI model is left to the user.
RamaLama: Downloads container images with all of the software, optimized for your specific GPU configuration.
Benefit: Users get started faster, with software optimized for the specific GPU they have, similar to how Flatpak pulls a shared runtime once and reuses it everywhere. The same optimized containers are used for every model you pull.

Feature: AI model registry support
Ollama: Defaults to pulling models from Ollama; some support for Hugging Face, and no support for OCI content.
RamaLama: Supports pulling from OCI, Ollama, and Hugging Face registries.
Benefit: Sometimes the latest model is only available in one or two places. RamaLama lets you pull it from almost anywhere: if you can find the model you want, you can pull it.

Feature: Podman Quadlet generation
Ollama: None.
RamaLama: Can generate a Podman Quadlet file suitable for launching the AI model and container under systemd as a service on an edge device. The Quadlet is based on the locally running AI model, making it easy for a developer to go from experimenting to production.

Feature: Kubernetes YAML generation
Ollama: None.
RamaLama: Can generate a Kubernetes YAML file so users can easily move from a locally running AI model to running the same model in a Kubernetes cluster.
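To illustrate those last two rows, the commands below use the --generate option as documented upstream; treat mymodel and granite as placeholders and confirm the flags against your installed version:

    ramalama serve --name mymodel --generate=quadlet granite
    ramalama serve --name mymodel --generate=kube granite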

    Bottom line

We want to iterate quickly on RamaLama and experiment with how we can help developers run and package AI workloads with patterns like retrieval-augmented generation (RAG), Whisper support, summarization, and others.

    Install RamaLama

You can install RamaLama from PyPI, with an install script, or from your distribution's package manager.

PyPI

RamaLama is available on PyPI:

    pipx install ramalama

    Install by script

    Install RamaLama by running one of the following one-liners.

    Linux:

    curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.sh | sudo bash

    macOS (run without sudo):

    curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.sh | bash

    Distro install

    Fedora:

    $ sudo dnf -y install ramalama
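Whichever method you choose, a quick sanity check that the CLI is on your PATH (run ramalama --help for the full command list on your version):

    $ ramalama --help
    $ ramalama list    # shows any models you have already pulled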

    We need your help!

We want you to install the tool, try it out, and then give us feedback on what you think.

Looking for a project to contribute to? RamaLama welcomes you. It is written in simple Python and wraps other tools, so the barrier to contributing is low. We would love help with documentation and, potentially, web design. This is definitely a community project where we can use varied talents.

We are looking for help packaging RamaLama for other Linux distributions, macOS (Homebrew?), and Windows. We have it packaged for Fedora and plan on getting it into CentOS Stream and, hopefully, RHEL. But we really want to see it available everywhere you can run Podman and/or Docker.
