Introducing Models-as-a-Service in OpenShift AI

November 25, 2025
Dmytro Zaharnytskyi
Related topics: Artificial intelligence
Related products: Red Hat AI, Red Hat OpenShift AI

    Note

    March 2026 update: We are working on a new blog post reflecting all the recent changes to MaaS. In the meantime, some URLs and commands in this article are no longer valid. For the latest installation and configuration instructions, refer to the official quick start guide.

    This article explains how to deploy and manage Models-as-a-Service (MaaS) on Red Hat OpenShift, now available in developer preview. We'll begin by discussing the benefits of MaaS, highlighting how it enables organizations to share AI models at scale. Then, we'll guide you through the process of setting it up on OpenShift, deploying a sample model, and demonstrating how rate limiting protects your resources.

    What is Models-as-a-Service (MaaS)?

With Models-as-a-Service (MaaS), you can deliver AI models as shared resources that users within an organization can access on demand. MaaS provides a ready-to-go AI foundation with standardized API endpoints, so organizations can share and consume private AI models at scale.

Red Hat OpenShift AI already lets you run AI models and share them by exposing them via APIs. When sharing models with a large user base, though, it can be hard to maintain quality of service without limiting excessive usage. OpenShift AI 3 introduces the Models-as-a-Service pattern, built on Red Hat Connectivity Link capabilities, which gives OpenShift AI admins better control over model access and rate limiting.

    Quick setup

    Let's prepare your environment for the Models-as-a-Service deployment.

    Prerequisites

    Ensure you have the following components available:

    • An OpenShift cluster (4.19.9 or later)
    • Red Hat OpenShift AI Operator 3
    • Red Hat Connectivity Link 1.2
    • CLI tools: oc, kubectl, jq, kustomize

    Deploy the MaaS infrastructure

    You can deploy the entire platform with a single script. Run the following commands while logged into your OpenShift cluster as a cluster administrator:

    curl -sSLo deploy-rhoai-stable.sh \
    https://raw.githubusercontent.com/opendatahub-io/maas-billing/refs/tags/0.0.1/deployment/scripts/deploy-rhoai-stable.sh
    chmod +x deploy-rhoai-stable.sh
    MAAS_REF="0.0.1" ./deploy-rhoai-stable.sh

    The deployment script creates a new Gateway object named maas-default-gateway, which serves as the ingress point for the MaaS system.

    oc describe Gateway maas-default-gateway -n openshift-ingress # View Gateway Info
    oc get Gateway maas-default-gateway -n openshift-ingress -o jsonpath='{.spec.listeners[0].hostname}' # Gateway's Hostname

    You can find more information on the MaaS architecture here.

    Deploy a sample model and test rate limiting

    Now let's deploy a lightweight GPU model and demonstrate how MaaS enforces rate limits.

    Deploy the IBM Granite model

    Enter the following to start the deployment:

    # Deploy and immediately watch the pod status (one line)
kustomize build \
    "https://github.com/opendatahub-io/maas-billing//docs/samples/models/ibm-granite-2b-gpu" \
      | kubectl apply -f - && kubectl get pods -n llm -w

    This model is MaaS-enabled through its Gateway reference to maas-default-gateway. You can find more information about this configuration in the model setup documentation.

    Retrieve access token

    Create an access token for authentication:

    CLUSTER_DOMAIN=$(kubectl get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
    TOKEN=$(curl -sSk -X POST "https://maas.${CLUSTER_DOMAIN}/maas-api/v1/tokens" \
      -H "Authorization: Bearer $(oc whoami -t)" \
      -H "Content-Type: application/json" \
      -d '{"expiration": "10m"}' | jq -r '.token')
    echo "Token: ${TOKEN:0:50}..."

    Call the model

    Make a simple inference request to validate authentication:

    # List available models
    curl -sSk "https://maas.${CLUSTER_DOMAIN}/maas-api/v1/models" \
      -H "Authorization: Bearer $TOKEN" | jq
    # Send an inference request
    curl -sSk -X POST "https://maas.${CLUSTER_DOMAIN}/llm/ibm-granite-2b-gpu/v1/chat/completions" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "ibm-granite/granite-3.1-2b-instruct",
        "messages": [{"role": "user", "content": "Hello! What is your name?"}],
        "max_tokens": 50
      }' | jq
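If you only need the generated text, you can extract it with a jq filter. The response shape below follows the standard OpenAI-compatible chat-completions schema that vLLM-based servers return; the sample payload here is illustrative, not actual model output:

```shell
# Sample OpenAI-compatible chat-completions response (illustrative payload,
# not real model output). In practice, pipe the curl output into this filter.
RESPONSE='{"choices":[{"message":{"role":"assistant","content":"Hello! I am Granite."}}]}'
echo "$RESPONSE" | jq -r '.choices[0].message.content'
```

Against the live endpoint, append `| jq -r '.choices[0].message.content'` to the inference request above instead of plain `| jq`.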

    View tier information

    Tier information for the Developer Preview is stored in a ConfigMap within the maas-api namespace. You can view this configuration by using the following command. Additional details are available in the tier overview documentation.

    oc describe cm tier-to-group-mapping -n maas-api

    User groups are mapped to tiers. By default, the free tier includes the system:authenticated group, which is automatically granted to all authenticated users. This means your current user is assigned to the free tier.
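As a quick sanity check, you can list the groups your current user belongs to. The JSON below mimics the shape of `oc get groups -o json` output, trimmed to the relevant fields; against a live cluster you would pipe the real command into the same jq filter with `--arg u "$(oc whoami)"`:

```shell
# Sample (trimmed) shape of `oc get groups -o json`; on a live cluster, pipe
# the real command into the same filter to find your user's group memberships.
GROUPS_JSON='{"items":[{"metadata":{"name":"premium-group"},"users":["alice"]},{"metadata":{"name":"other-group"},"users":["bob"]}]}'
echo "$GROUPS_JSON" | jq -r --arg u "alice" \
  '.items[] | select((.users // []) | index($u)) | .metadata.name'
```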

    Experience rate limiting

    You can view the free tier rate limits by using the following commands:

    oc get TokenRateLimitPolicy gateway-token-rate-limits -n openshift-ingress -o jsonpath='{.spec.limits.free-user-tokens.rates}'
    oc get RateLimitPolicy gateway-rate-limits -n openshift-ingress -o jsonpath='{.spec.limits.free.rates}'
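To turn the raw rates array into a readable summary, you can post-process it with jq. The `limit` and `window` field names below are an assumption based on the Kuadrant policy API; check them against the actual output on your cluster:

```shell
# Sample rates array like the jsonpath queries above return; the `limit` and
# `window` field names are an assumption -- verify against your cluster output.
RATES='[{"limit": 5, "window": "2m"}, {"limit": 100, "window": "1m"}]'
echo "$RATES" | jq -r '.[] | "\(.limit) per \(.window)"'
```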

The free-tier rate limits are:

    • 5 requests per 2 minutes (request-based limit)
    • 100 tokens per minute (token-based limit)

    To demonstrate rate limiting in action, execute the following commands to exceed the request limit:

    # Send 10 rapid requests (free tier allows only 5 per 2 minutes)
    for i in {1..10}; do
      HTTP_CODE=$(curl -sSk -o /dev/null -w "%{http_code}" -X POST \
        "https://maas.${CLUSTER_DOMAIN}/llm/ibm-granite-2b-gpu/v1/chat/completions" \
        -H "Authorization: Bearer $TOKEN" \
        -H "Content-Type: application/json" \
        -d '{"model": "ibm-granite/granite-3.1-2b-instruct", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}')
      
  echo "Request $i: HTTP $HTTP_CODE (200 until a limit is hit)"
    done

    Expected output:

    Request 1: HTTP 200
    Request 2: HTTP 200
    Request 3: HTTP 200
    Request 4: HTTP 200
    Request 5: HTTP 429 ← Rate limited!
    Request 6: HTTP 429
    Request 7: HTTP 429
    Request 8: HTTP 429
    Request 9: HTTP 429
    Request 10: HTTP 429

    The HTTP 429 response indicates that the rate limit has been reached. Note that rate limiting is based on the total number of tokens reported by the LLM, so the number of successful requests might vary depending on response token counts. The quota resets after 2 minutes, demonstrating the fair usage controls in action.
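Clients can treat HTTP 429 as a signal to back off and retry. The following sketch is a hypothetical helper, not part of MaaS: it reruns a command that prints an HTTP status code until the command stops returning 429, and the `fake_call` stub stands in for the curl invocation above.

```shell
# Hypothetical retry helper (not part of MaaS): runs a command that prints an
# HTTP status code, retrying with a pause while it keeps returning 429.
retry_on_429() {
  local max_attempts=$1 delay=$2 code attempt
  shift 2
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    code=$("$@")
    if [ "$code" != "429" ]; then
      echo "$code"
      return 0
    fi
    sleep "$delay"
  done
  echo "429"
  return 1
}

# Stub standing in for the curl call above: rate-limited twice, then succeeds.
COUNTER_FILE=$(mktemp)
fake_call() {
  local n
  n=$(( $(cat "$COUNTER_FILE" 2>/dev/null || echo 0) + 1 ))
  echo "$n" > "$COUNTER_FILE"
  if [ "$n" -lt 3 ]; then echo 429; else echo 200; fi
}

retry_on_429 5 0 fake_call   # prints 200 after two 429 retries
```

To use it against the real endpoint, replace `fake_call` with the curl command from the loop above (the one that prints `%{http_code}`) and pass a non-zero delay.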

Test a different tier

    To create and test an additional tier, follow this example:

    # Edit the tier configuration to match your organization's needs: 
    kubectl edit configmap tier-to-group-mapping -n maas-api
    # Create premium group 
    oc adm groups new premium-group 2>/dev/null 
    # Add current user to premium group
    CURRENT_USER=$(oc whoami)
    oc adm groups add-users premium-group $CURRENT_USER
    # Verify membership
    oc get group premium-group
    CLUSTER_DOMAIN=$(kubectl get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
    TOKEN=$(curl -sSk -X POST "https://maas.${CLUSTER_DOMAIN}/maas-api/v1/tokens" \
      -H "Authorization: Bearer $(oc whoami -t)" \
      -H "Content-Type: application/json" \
      -d '{"expiration": "10m"}' | jq -r '.token')
    # Test PREMIUM tier (20 requests allowed)
    echo "Testing PREMIUM tier (20 requests per 2 minutes):"
    for i in {1..25}; do
      HTTP_CODE=$(curl -sSk -o /dev/null -w "%{http_code}" -X POST \
        "https://maas.${CLUSTER_DOMAIN}/llm/ibm-granite-2b-gpu/v1/chat/completions" \
        -H "Authorization: Bearer $TOKEN" \
        -H "Content-Type: application/json" \
        -d '{"model": "ibm-granite/granite-3.1-2b-instruct", "messages": [{"role": "user", "content": "Hi"}], "max_tokens": 5}')
      
      if [ "$HTTP_CODE" = "429" ]; then
        echo "Request $i: HTTP $HTTP_CODE ❌ (Rate limit hit)"
        break
      else
        echo "Request $i: HTTP $HTTP_CODE ✅"
      fi
    done

    Available UI

You can also deploy a model through the MaaS user interface (UI). For more information on how to enable the UI, refer to the official repo.

First, create a project. Inside it, you will find a Deployments tab (Figure 1), where you can deploy your model by specifying the model location, model type, and other deployment details.

    The Deploy model page in the MaaS UI. The UI lets you deploy any model without needing to interact with a terminal.
    Figure 1: Deploying a model from within the MaaS UI.

Note that MaaS models can be deployed only on the distributed runtime. Select it from the Serving runtime drop-down (Figure 2).

    A screenshot of the "Model deployment" step on the "Deploy a model" tab. The arrow points to the "Serving runtime" field, which is set to "Distributed Inference Server with llm-d." Below this is the "Number of replicas to deploy" set to 1. The "Hardware profile" is set to "default-profile" with details showing 2 Cores and 4 GiB of memory requested, and limits of 4 Cores and 8 GiB.
    Figure 2: Selecting a serving runtime in the Model deployment interface.

    What's next?

    You now have a working MaaS deployment with a sample model under governance. Here are resources to explore next:

    • Customize Tiers and Limits
    • Enable Model-Specific Access Control

    We've also set up observability on the cluster for you. You can build your dashboards in Grafana and connect it to the metrics in Prometheus, or use our default dashboard shown in Figure 3.

    The MaaS observability dashboard connected to the cluster with deployed MaaS. It shows information about users who use the model and the amount of tokens spent.
    Figure 3: MaaS observability dashboard.

    We encourage you to try the developer preview version of Models-as-a-Service and give us your feedback (refer to the Contributing section in our repo).

    For detailed documentation, visit the MaaS community documentation.

    Last updated: April 7, 2026
