
How RamaLama makes working with AI models boring

November 22, 2024
Daniel Walsh
Related topics:
Artificial intelligence, Containers, Open source, Python
Related products:
Podman Desktop

    Over the last few months, our team has been working on a new AI project called RamaLama (Figure 1). Yes, another name that contains lama.

    Figure 1: The RamaLama project logo, featuring a llama wearing aviator sunglasses and a leather jacket.

    What does RamaLama do?

    RamaLama facilitates local management and serving of AI models.

    RamaLama's goal is to make it easy for developers and administrators to run and serve AI models. RamaLama merges the world of AI inferencing with the world of containers as defined by Podman and Docker, and eventually Kubernetes.

    When you first launch RamaLama, it inspects your system for GPU support, falling back to CPU support if no GPUs are present. It then uses a container engine like Podman or Docker to download a container image from quay.io/ramalama. The container image contains all the software necessary to run an AI model for your system's setup. Currently, RamaLama supports llama.cpp and vLLM for running containerized models.
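The hardware probe can be pictured as a simple fallback chain. The sketch below is illustrative only (the function name and the specific checks are assumptions, not RamaLama's actual code): prefer a GPU-enabled image when a GPU driver is visible, otherwise fall back to CPU.

```python
import os
import shutil

def detect_accelerator():
    """Illustrative sketch of a GPU probe with CPU fallback.

    This is NOT RamaLama's actual implementation; it only shows the
    idea: pick a runtime image matching whatever hardware is visible.
    """
    if shutil.which("nvidia-smi"):   # NVIDIA driver tools on PATH
        return "cuda"
    if os.path.exists("/dev/kfd"):   # AMD ROCm kernel interface
        return "rocm"
    return "cpu"                     # safe default when no GPU is found

print(detect_accelerator())
```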

    Once the container image is in place, RamaLama pulls the specified AI model from any of three types of model registries: Ollama, Hugging Face, or an OCI registry.
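Conceptually, resolving a model reference means splitting off a transport-style prefix that names the source registry. The helper below is hypothetical, not RamaLama's real parser; the exact prefixes and the default transport are assumptions for illustration.

```python
def parse_model_ref(ref, default_transport="ollama"):
    """Split a reference like 'huggingface://org/model' into (transport, name).

    Purely illustrative; RamaLama's real resolution logic and its
    default registry may differ.
    """
    if "://" in ref:
        transport, name = ref.split("://", 1)
        return transport, name
    # No prefix given: fall back to an assumed default registry.
    return default_transport, ref

print(parse_model_ref("huggingface://ibm-granite/granite-3b"))
print(parse_model_ref("tinyllama"))
```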

    At this point, once RamaLama has pulled the AI model, it’s showtime, baby! Time to run our inferencing runtime. RamaLama offers switchable inferencing runtimes, namely llama.cpp and vLLM, for running containerized models.

    RamaLama launches a container with the AI model volume mounted into it, starting a chatbot or a REST API service with a single, simple command. Models are treated similarly to how Podman and Docker treat container images. RamaLama works with Podman Desktop and Docker Desktop on macOS and Windows.
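Under the hood, this amounts to an ordinary container invocation with the model volume-mounted in. The sketch below assembles such a command for illustration; the image name, mount path, and serving arguments are assumptions, not RamaLama's exact invocation.

```python
def build_serve_command(model_path, image="quay.io/ramalama/ramalama", port=8080):
    """Build a podman command that mounts a model read-only and serves it.

    Illustrative only: the flags and paths are assumed for this sketch,
    not copied from RamaLama's own code.
    """
    return [
        "podman", "run", "--rm",
        "-v", f"{model_path}:/mnt/models/model.file:ro",  # mount the model in
        "-p", f"{port}:{port}",                           # expose the REST API
        image,
        "llama-server", "--model", "/mnt/models/model.file",
        "--port", str(port),
    ]

print(" ".join(build_serve_command("/var/lib/models/granite.gguf")))
```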

    Running AI workloads in containers eliminates the user's need to configure the host system for AI.

    8 reasons to use RamaLama

    RamaLama thinks differently about LLMs, connecting your use cases with the rest of the Linux and container world. You should use RamaLama if:

    1. You want a simple and easy way to test out AI models.
    2. You don’t want to mess with installing specialized software to support your specific GPU.
    3. You want to find and pull models from any catalog including Hugging Face, Ollama, and even container registries.
    4. You want to use whichever runtime works best for your model and hardware combination: llama.cpp, vLLM, whisper.cpp, etc.
    5. You value running AI models in containers for the simplicity, collaborative properties, and existing infrastructure you have (container registries, CI/CD workflows, etc.).
    6. You want an easy path to run AI models on Podman, Docker, and Kubernetes.
    7. You love the power of running models at system boot using containers with Quadlets.
    8. You believe in the power of collaborative open source to enable the fastest and most creative solutions when tackling new problems in a fast-moving space.

    Why not just use Ollama?

    Realizing that lots of people currently use Ollama, we looked into working with it. While we loved its ease of use, we did not think it fit our needs. We decided to build an alternative tool that allows developers to run and serve AI models from a simple interface, while making it easy to take those models, put them in containers, and gain all of the local, collaborative, and production benefits that containers offer.

    Differences between Ollama and RamaLama

    Table 1 compares Ollama and RamaLama capabilities.

    Table 1: Ollama versus RamaLama.

    • Running models on the host OS
      Ollama: Defaults to running AI models directly on the host system.
      RamaLama: Defaults to running AI models in containers, but can also run them directly on the host using the --nocontainer option.

    • Running models in containers
      Ollama: Not supported.
      RamaLama: The default. RamaLama wraps Podman or Docker, first downloading a container image with all of the AI tools ready to execute. It also downloads the AI model to the host, then launches the container with the AI model mounted into it and runs the serving app.

    • Support for alternative AI runtimes
      Ollama: Supports llama.cpp.
      RamaLama: Currently supports llama.cpp and vLLM.

    • Optimization and installation of AI software
      Ollama: Statically linked with llama.cpp; configuring the host system to run the AI model is left to the user.
      RamaLama: Downloads container images with all of the software, optimized for your specific GPU configuration.
      Benefit: Users get started faster with software optimized for the specific GPU they have, similar to how Flatpak pulls the whole display stack at once and reuses it everywhere. The same optimized containers are used for every model you pull.

    • AI model registry support
      Ollama: Defaults to pulling models from Ollama; some support for Hugging Face, and no support for OCI content.
      RamaLama: Supports pulling from OCI registries, Ollama, and Hugging Face.
      Benefit: Sometimes the latest model is only available in one or two places. RamaLama lets you pull it from almost anywhere. If you can find what you want, you can pull it.

    • Podman Quadlet generation
      Ollama: None.
      RamaLama: Can generate a Podman Quadlet file suitable for launching the AI model and container under systemd as a service on an edge device. The Quadlet is based on the locally running AI model, making it easy for the developer to go from experimenting to production.

    • Kubernetes YAML generation
      Ollama: None.
      RamaLama: Can generate a Kubernetes YAML file so users can easily move from a locally running AI model to running the same model in a Kubernetes cluster.
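To make the Quadlet output concrete, here is a hedged sketch of what such a systemd container unit might look like. Every field value below is illustrative, not RamaLama's actual generated output; the model path, image, and port are assumptions.

```ini
# Hypothetical Quadlet unit, e.g. /etc/containers/systemd/mymodel.container.
# Values are illustrative only, not RamaLama's exact output.
[Unit]
Description=Serve an AI model as a systemd service

[Container]
Image=quay.io/ramalama/ramalama
Volume=/var/lib/models/granite.gguf:/mnt/models/model.file:ro,Z
PublishPort=8080:8080
Exec=llama-server --model /mnt/models/model.file --port 8080

[Install]
WantedBy=multi-user.target default.target
```

Podman's Quadlet generator translates the [Container] section into a podman run invocation managed by systemd, so the model serves at boot without hand-written unit files.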

    Bottom line

    We want to iterate quickly on RamaLama and experiment with how we can help developers run and package AI workloads, with patterns like retrieval-augmented generation (RAG), Whisper support, summarization, and more.

    Install RamaLama

    You can install RamaLama via PyPI, an install script, or your distribution's package manager.

    PyPI

    RamaLama is available on PyPI:

    pipx install ramalama

    Install by script

    Install RamaLama by running one of the following one-liners.

    Linux:

    curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.sh | sudo bash

    macOS (run without sudo):

    curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.sh | bash

    Distro install

    Fedora:

    $ sudo dnf -y install ramalama

    We need your help!

    We want you to install the tool, try it out, and then give us feedback on what you think.

    Looking for a project to contribute to? RamaLama welcomes you. It is written in simple Python and wraps other tools, so the barrier to contributing is low. We would love help with documentation and, potentially, web design. This is definitely a community project where we can use varied talents.

    We are looking for help packaging RamaLama for other Linux distributions, Mac (Brew?), and Windows. We have it packaged for Fedora and plan on getting it into CentOS Stream and hopefully RHEL. But we really want to see it available everywhere you can run Podman and/or Docker.
