How to run AI models in cloud development environments

This learning path explores running AI models, specifically large language models (LLMs), in cloud development environments to enhance developer efficiency and data security. It introduces RamaLama, an open-source tool launched in mid-2024 that simplifies AI workflows by integrating them with container technologies.

Prerequisites:

  • Access to the Developer Sandbox.

In this lesson, you will:

  • Learn about RamaLama and how it compares with other tools for running AI models.

What is RamaLama?

RamaLama was officially launched as part of the Containers organization, with its initial development beginning in mid-2024. The project was spearheaded by Eric Curtin and Dan Walsh, who aimed to simplify AI workflows by integrating them with container technologies. The tool is designed to make working with AI models effortless by leveraging Open Container Initiative (OCI) containers and container engines like Podman.
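
To get a feel for what that looks like in practice, here is a minimal sketch, assuming you install the CLI from PyPI (RamaLama is also available through some distribution package managers) and use a small publicly available model; the model name is only an example.

```bash
# Install the RamaLama CLI (one of several supported installation methods)
pip install ramalama

# Run a small model; by default, RamaLama pulls it and starts it inside a rootless Podman container
ramalama run tinyllama
```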

If you want to know more about RamaLama, read the project's documentation and the related articles on Red Hat Developer.

RamaLama vs. other tools

Several tools allow developers to run large language models (LLMs) locally. RamaLama stands out because it brings AI inference to the world of containers, making it easier to manage and serve AI models. By default, RamaLama runs AI models in isolated container environments using Podman, which prevents the model from directly accessing the host system. You can also run the LLM in an air-gapped environment by configuring RamaLama accordingly.

RamaLama is designed to run models in a containerized environment, making it an ideal choice for running and testing LLMs locally and in cloud environments.
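
The following is a minimal sketch of that workflow, assuming a model that is available from the Ollama registry; the model name and port are illustrative.

```bash
# Download the model weights into RamaLama's local store (do this while you still have network access)
ramalama pull ollama://llama3.2

# Chat with the model interactively; it runs inside an isolated, rootless Podman container
ramalama run ollama://llama3.2

# Or serve it over a local REST endpoint (typically OpenAI-compatible) for applications to call
ramalama serve --port 8080 ollama://llama3.2
```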

RamaLama can package an LLM into an OCI image and push it to an OCI registry. In addition to OCI registries, it is compatible with the Hugging Face and Ollama registries.
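
The commands below sketch that packaging flow. The Hugging Face model path and the quay.io/example registry reference are placeholders, and the exact subcommand syntax can vary between RamaLama releases.

```bash
# Pull a model directly from the Hugging Face or Ollama registries
ramalama pull huggingface://<org>/<repo>/<model-file>.gguf
ramalama pull ollama://tinyllama

# Package a pulled model as an OCI image
ramalama convert ollama://tinyllama oci://quay.io/example/tinyllama:latest

# Push the resulting image to an OCI registry so other environments can pull it
ramalama push oci://quay.io/example/tinyllama:latest
```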

The following comparison table highlights the key differences between RamaLama and Ollama, one of the most popular tools for running LLMs locally:

| Feature | Ollama | RamaLama |
| --- | --- | --- |
| Default deployment model | Unencapsulated runtime on the host system | OCI container-based runtimes (Podman/Docker) |
| Model registry compatibility | Uses the Ollama registry (proprietary) | Supports Hugging Face, Ollama, and OCI container registries |
| Containerization | Not explicitly container-focused | Explicitly focused on container runtimes |
| Security | Local-only by default, no built-in network exposure, closed-source client and model registry | Containers isolate execution, allow hardened environments, and support trusted model sources |
| Community | Limited external contributions and engagement with upstream projects | Open source-driven, aligned with OCI standards |
| Privacy | Lacks default encapsulation, which may affect security and resource management | Fully offline capable for air-gapped usage |
| RAG support | No native RAG support | Built-in rag command to process documents and create containerized vector stores (see the example after this table) |
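
As a concrete illustration of the last row, the sketch below uses RamaLama's rag subcommand. The document names, image reference, and model are placeholders, and the available options may differ between RamaLama releases.

```bash
# Build a containerized vector store from local documents and save it as an OCI image
ramalama rag ./docs/manual.pdf ./docs/notes.md quay.io/example/my-rag:latest

# Run a model with that RAG image attached so responses can draw on the indexed documents
ramalama run --rag quay.io/example/my-rag:latest ollama://llama3.2
```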
