How RamaLama makes working with AI models boring

November 22, 2024
Daniel Walsh
Related topics: Artificial intelligence, Containers, Open source, Python
Related products: Podman Desktop

    Over the last few months, our team has been working on a new AI project called RamaLama (Figure 1). Yes, another name that contains lama.

    Figure 1: The RamaLama project logo, featuring a llama wearing aviator sunglasses and a leather jacket.

    What does RamaLama do?

    RamaLama facilitates local management and serving of AI models.

    RamaLama's goal is to make it easy for developers and administrators to run and serve AI models. RamaLama merges the world of AI inferencing with the world of containers as designed by Podman and Docker, and eventually, Kubernetes.

    When you first launch RamaLama, it inspects your system for GPU support, falling back to CPU support if no GPUs are present. It then uses a container engine like Podman or Docker to download a container image from quay.io/ramalama containing all the software necessary to run an AI model for your system's setup. Currently, RamaLama supports llama.cpp and vLLM for running containerized models.
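
    Because RamaLama drives a regular container engine under the covers, you can watch this happen with ordinary Podman commands. A minimal check, assuming Podman is your engine and a model is currently running:

    # List the runtime images RamaLama pulled from quay.io/ramalama
    podman images

    # Show the container RamaLama launched for the running model
    podman ps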

    Once the container image is in place, RamaLama pulls the specified AI model from any of several types of model registries: Ollama, Hugging Face, or an OCI registry.
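
    As a sketch, pulling from each registry type looks roughly like this. The transport prefixes follow RamaLama's URL scheme; the model names and the OCI repository shown are purely illustrative:

    # Pull from the Ollama registry
    ramalama pull ollama://tinyllama

    # Pull a GGUF model from Hugging Face
    ramalama pull huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf

    # Pull from an OCI registry (hypothetical repository)
    ramalama pull oci://quay.io/myuser/mymodel:latest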

    At this point, once RamaLama has pulled the AI model, it’s showtime, baby! Time to run our inferencing runtime. RamaLama offers switchable inferencing runtimes, namely llama.cpp and vLLM, for running containerized models.

    RamaLama launches a container with the AI model volume-mounted into it, starting a chatbot or a REST API service from a single, simple command. Models are treated similarly to how Podman and Docker treat container images. RamaLama works with Podman Desktop and Docker Desktop on macOS and Windows.
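
    In day-to-day use, that boils down to two commands. A minimal sketch, with tinyllama standing in for whatever model you want:

    # Chat with a model interactively (pulled automatically if not present)
    ramalama run ollama://tinyllama

    # Serve the same model as a REST API endpoint instead
    ramalama serve ollama://tinyllama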

    Running AI workloads in containers eliminates the user's need to configure the host system for AI.

    8 reasons to use RamaLama

    RamaLama thinks differently about LLMs, connecting your use cases with the rest of the Linux and container world. You should use RamaLama if:

    1. You want a simple and easy way to test out AI models.
    2. You don’t want to mess with installing specialized software to support your specific GPU.
    3. You want to find and pull models from any catalog including Hugging Face, Ollama, and even container registries.
    4. You want to use whichever runtime works best for your model and hardware combination: llama.cpp, vLLM, whisper.cpp, etc. (see the sketch after this list).
    5. You value running AI models in containers for the simplicity, collaborative properties, and existing infrastructure you have (container registries, CI/CD workflows, etc.).
    6. You want an easy path to run AI models on Podman, Docker, and Kubernetes.
    7. You love the power of running models at system boot using containers with Quadlets.
    8. You believe in the power of collaborative open source to enable the fastest progress and the most creativity when tackling new problems in a fast-moving space.
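
    As a sketch of point 4, switching runtimes is a single option. This assumes the --runtime flag as documented, with an illustrative model name:

    # Serve with the default llama.cpp runtime
    ramalama serve ollama://tinyllama

    # Serve the same model with vLLM instead
    ramalama --runtime vllm serve ollama://tinyllama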

    Why not just use Ollama?

    Realizing that lots of people currently use Ollama, we looked into working with it. While we loved its ease of use, we did not think it fit our needs. We decided to build an alternative tool that allows developers to run and serve AI models from a simple interface, while making it easy to take those models, put them in containers, and enable all of the local, collaborative, and production benefits that they offer.

    Differences between Ollama and RamaLama

    Table 1 compares Ollama and RamaLama capabilities.

    Table 1: Ollama versus RamaLama.

    Running models on the host OS

    • Ollama: Defaults to running AI models locally on the host system.

    • RamaLama: Defaults to running AI models in containers on the host system, but can also run them directly using the --nocontainer option.

    Running models in containers

    • Ollama: Not supported.

    • RamaLama: The default. RamaLama wraps Podman or Docker, first downloading a container image with all of the AI tools ready to execute. It also downloads the AI model to the host, then launches the container with the AI model mounted into it and runs the serving app.

    Support for alternative AI runtimes

    • Ollama: Supports llama.cpp.

    • RamaLama: Currently supports llama.cpp and vLLM.

    Optimization and installation of AI software

    • Ollama: Statically linked with llama.cpp; it is left to the user to configure their host system to run the AI model.

    • RamaLama: Downloads different container images with all of the software, optimized for your specific GPU configuration.

    • Benefit: Users get started faster, with software optimized for the specific GPU they have, similar to how Flatpak pulls the whole display stack once and reuses it everywhere. The same optimized containers are used for every model you pull.

    AI model registry support

    • Ollama: Defaults to pulling models from Ollama; some support for Hugging Face, and no support for OCI content.

    • RamaLama: Supports pulling from OCI, Ollama, and Hugging Face registries.

    • Benefit: Sometimes the latest model is only available in one or two places. RamaLama lets you pull it from almost anywhere: if you can find what you want, you can pull it.

    Podman Quadlet generation

    • Ollama: None.

    • RamaLama: Can generate a Podman Quadlet file suitable for launching the AI model and container under systemd as a service on an edge device. The Quadlet is based on the locally running AI model, making it easy for the developer to go from experimenting to production.

    Kubernetes YAML generation

    • Ollama: None.

    • RamaLama: Can generate a Kubernetes YAML file that lets users move easily from a locally running AI model to running the same model in a Kubernetes cluster.
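
    To make the last two rows concrete, here is roughly what generation looks like, assuming the --generate option on ramalama serve (the model name is illustrative):

    # Generate a Quadlet file to run the model under systemd
    ramalama serve --generate quadlet ollama://tinyllama

    # Generate Kubernetes YAML for the same model
    ramalama serve --generate kube ollama://tinyllama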

    Bottom line

    We want to iterate quickly on RamaLama and experiment with how we can help developers run and package AI workloads with different patterns, such as retrieval-augmented generation (RAG), Whisper support, and summarizers.

    Install RamaLama

    You can install RamaLama via PyPI, an install script, or your distribution's package manager.

    PyPI

    RamaLama is available via PyPI:

    pipx install ramalama

    Install by script

    Install RamaLama by running one of the following one-liners.

    Linux:

    curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.sh | sudo bash

    macOS (run without sudo):

    curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.sh | bash

    Distro install

    Fedora:

    sudo dnf -y install ramalama
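
    Whichever install route you take, a quick sanity check might look like this (tinyllama is just a conveniently small example model):

    # Confirm RamaLama is installed
    ramalama version

    # Pull a small model and start chatting
    ramalama run ollama://tinyllama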

    We need your help!

    We want you to install the tool and try it out, and then give us feedback on what you think. 

    Looking for a project to contribute to? RamaLama welcomes you. It is written in simple Python and wraps other tools, so the barrier to contribute is low. We love help on documentation and potentially web design. This is definitely a community project where we can use varied talents.

    We are looking for help packaging RamaLama for other Linux distributions, macOS (Homebrew?), and Windows. We have it packaged for Fedora and plan on getting it into CentOS Stream and, hopefully, RHEL. But we really want to see it available everywhere you can run Podman and/or Docker.
