This is the third installment of a four-part series detailing the step-by-step process of building a Models-as-a-Service (MaaS) platform on the architecture described in parts 1 and 2. If you would rather see MaaS in action right away, just ask a Red Hat representative to walk you through an actual demo.
Process overview
In this demo, we'll utilize a pre-provisioned Red Hat OpenShift Container Platform cluster with Red Hat OpenShift AI. The initial steps of this process involve establishing a connection to Red Hat OpenShift AI, which includes logging into OpenShift Container Platform via the command line or web console, accessing the OpenShift AI dashboard, and selecting the designated large language model (LLM) host project.
Subsequent steps entail understanding the core components of OpenShift AI, such as workbenches, models, cluster storage, connections, and permissions, which are critical for MaaS operation. After this, we will review and test a pre-deployed Granite model, involving inspecting its configuration, assessing its connections, reviewing model files in the ODH-TEC workbench, and conducting endpoint testing via curl commands.
Then, we will detail the deployment of a new model, TinyLlama. This includes configuring deployment settings, such as the serving runtime, model server size, and source model location. We will test the newly deployed model using curl commands to verify endpoint responsiveness.
Configuration of 3Scale API Gateway for MaaS is essential for managing, securing, and analyzing API access. This involves accessing the 3Scale Admin Portal, enabling the Developer Portal, subscribing a user to a service, creating an application for a model, and testing API access.
Furthermore, we'll outline creating a new product in 3Scale using the operator to automate model exposure. This includes creating backends and products, promoting the product to production, generating API documentation, subscribing users, and conducting product testing.
We will demonstrate connecting an application (specifically AnythingLLM) to the MaaS model by creating a connection in OpenShift AI and attaching it to a newly created workbench. Finally, this guide covers exploring usage analytics in 3Scale to monitor model consumption from a developer and administrator perspective. This process provides a detailed methodology for implementing a Models-as-a-Service platform using Red Hat technologies.
Models-as-a-Service deployment and configuration
The ability to leverage artificial intelligence (AI) effectively is becoming increasingly critical for modern organizations aiming to innovate and gain a competitive edge. MaaS offers a robust solution, enabling organizations to deploy, manage, and consume AI models in a scalable and secure manner.
Building upon the understanding of the strategic need for MaaS and the essential components and capabilities of MaaS, this guide delves into the practical steps to set up and configure a MaaS platform. The primary goal is to deploy an AI model, expose it securely through an API gateway, and then consume it within an application, providing hands-on experience in building a secure, private, and efficient MaaS environment tailored to organizational needs.
Prerequisites
Before initiating the configuration, it is essential to have a pre-provisioned OpenShift Container Platform cluster with OpenShift AI already deployed. This setup provides the infrastructure necessary for deploying and managing AI models efficiently.
Understanding OpenShift AI components
Before we dive deeper, it's important to understand the key components of OpenShift AI. Within the OpenShift AI dashboard, several components are critical for implementing MaaS:
- Workbenches: These are integrated development environments (IDEs), such as JupyterLab or VSCode, designed for data scientists. Workbenches provide the tools and environments to develop, train, and test AI models.
- Models: This section allows for the management and deployment of machine learning models. It also provides capabilities to monitor model performance and usage, ensuring optimal operation.
- Cluster storage: This feature manages storage resources required for models and workbenches. Efficient storage management is crucial for storing and accessing large model files and data.
- Connections: This component facilitates the management of secure connections to external services, such as S3 buckets or APIs. It automatically injects credentials as environment variables, simplifying integration and enhancing security.
- Permissions: This feature manages user and group access to project resources, ensuring appropriate access controls and security.
Step 1: Establish a connection to OpenShift AI
We will begin by establishing a connection to OpenShift AI. Follow the steps below to accomplish this:
- Log in to the OpenShift Container Platform.
- Open a terminal window (e.g., within a VSCode workbench).
- Use the `oc login` command with the provided credentials (e.g., "userX" and "openshift"). This command establishes a secure connection to the OpenShift cluster; a sample login is shown after this list.
- Approve the certificate if prompted to ensure a secure and trusted connection.
- The cluster URL will resemble https://api.cluster-guid.guid.sandbox.opentlc.com:6443. This URL serves as the endpoint for interacting with the OpenShift API.
- Alternatively, the OpenShift Container Platform web console can be accessed via its URL, which is typically provided in the workshop links.
- Access the OpenShift AI dashboard.
- Open the OpenShift AI dashboard URL in a web browser tab (https://rhods-dashboard-redhat-ods-applications.apps.cluster-guid.guid.sandbox.opentlc.com). This dashboard provides a centralized interface for managing AI projects and resources.
- Click Login with OpenShift and select rhsso to initiate the authentication process.
- Enter the provided credentials ("userX", "openshift") when prompted to access the dashboard.
- Select the LLM host project.
- Upon successful login, navigate to the Data Science Projects section and click Go to Data Science Projects.
- Select the LLM Host project from the list. This project is designated for hosting and managing large language models.
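For reference, here is a minimal sketch of the command-line login, assuming the placeholder cluster API URL shown above and the workshop credentials; substitute the values for your environment:

```bash
# Log in to the cluster API endpoint with the workshop credentials
# (placeholder URL and user; replace with your environment's values).
oc login https://api.cluster-guid.guid.sandbox.opentlc.com:6443 \
  --username=userX \
  --password=openshift

# Confirm the login and check that the LLM host project is visible.
oc whoami
oc get projects | grep -i llm
```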
Step 2: Review and test a pre-deployed model
Let's review and test the Granite model.
- Review the model configuration:
- Navigate to the Models tab within the LLM Host project.
- Observe the pre-deployed Granite-3.2 model. Expand the details to review its configuration settings and allocated resources.
- Examine the internal and external endpoint details. Note that the external endpoint may be a placeholder if the model is only accessible internally at this stage.
- Review the connection:
- Switch to the Connections tab.
- Observe the models connection, which is used by both the Granite model server and the ODH-TEC workbench.
- Edit this connection to review the S3 configuration, including Access Key, Secret Key, Bucket, and Endpoint for Minio object storage. This connection automatically injects these values as environment variables, saving time and improving security by avoiding hardcoding credentials.
- Review the model files:
- Access the ODH-TEC workbench from the Workbenches tab, then click Open. This tool allows you to manage and view S3 storage.
- Notice that ODH-TEC is pre-configured with the models connection. Explore the buckets and files to locate the stored Granite and TinyLlama models.
- Test the model (Granite):
- Create a new workbench for testing purposes. Navigate to the Workbench tab and click Create workbench.
- Set the name to "Model-Test," the Image selection to code-server (version 2024.2), the deployment size to Standard, and the persistent storage size to 5 GiB.
- Once the workbench starts, click Open to access the VSCode interface.
- Open a new terminal within VSCode (Terminal menu -> New Terminal or Ctrl + Shift + `).
- Execute a `curl` command to test the internal endpoint of the Granite model; a sample request is shown after this list.
- A JSON response should confirm that the model is functioning correctly.
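The exact internal endpoint is listed in the model's details in the dashboard. The following is a minimal sketch, assuming a placeholder endpoint and the OpenAI-compatible API exposed by the vLLM serving runtime; the model name matches the one used later in this guide:

```bash
# Placeholder: copy the real internal endpoint from the model details panel.
GRANITE_URL="https://<granite-internal-endpoint-from-the-dashboard>"

# Quick health check: list the models served by the runtime.
curl -k "${GRANITE_URL}/v1/models"

# Request a short completion from the Granite model.
curl -k "${GRANITE_URL}/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "ibm-granite/granite-3.2-8b-instruct", "prompt": "OpenShift AI is", "max_tokens": 30}'
```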
Step 3: Deploy a new model
Now we're ready to deploy a new model by following these steps:
- Deploy the model:
- In the OpenShift AI dashboard, navigate to the Models tab.
- Click Deploy model.
- Fill in the required details as follows:
- Model deployment name: TinyLlama
- Serving runtime: Custom - vLLM ServingRuntime-CPU
- Number of model server replicas to deploy: 1
- Model server size: Medium
- Accelerator: None
- Model route: Checked
- Token authentication: Unchecked
- Source model location: Existing connection
- Connection: models
- Path: TinyLlama/TinyLlama-1.1B-Chat-v1.0
- Additional serving runtime arguments: --served-model-name=tinyllama/tinyllama-1.1b-chat-v1.0
- Click Deploy. Monitor the deployment status by checking the tinyllama-predictor-… Pod in the llm-hosting project within the OpenShift console.
- Test the deployed model (TinyLlama):
- Once the deployment is complete, return to the Model-Test workbench terminal.
- Test the internal endpoint using `curl`. The `-k` option can be used to ignore the self-signed certificate in the Istio mesh if necessary; a sample is shown after this list.
- Test the external endpoint using `curl`.
- A JSON response should appear, though it might take longer because the model is running on a CPU.
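A minimal sketch of these checks follows, assuming placeholder endpoints (copy the real internal endpoint and external route from the model details) and the served model name configured above:

```bash
# Optional: confirm the predictor pod is running in the llm-hosting project.
oc get pods -n llm-hosting | grep tinyllama-predictor

# Internal endpoint (placeholder); -k skips verification of the
# self-signed certificate inside the mesh.
curl -k "https://<tinyllama-internal-endpoint>/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "tinyllama/tinyllama-1.1b-chat-v1.0", "prompt": "Hello,", "max_tokens": 20}'

# External route (placeholder host from the model details).
curl -k "https://<tinyllama-external-route>/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "tinyllama/tinyllama-1.1b-chat-v1.0", "prompt": "Hello,", "max_tokens": 20}'
```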
Step 4: Configure 3Scale API Gateway
Let's configure 3Scale API Gateway as follows:
- Access the 3Scale admin portal:
- Retrieve the 3Scale admin credentials ("admin_user" and "admin_password") from the system-seed secret in the 3scale namespace using `oc get secret` commands; a sample is shown after this list.
- Log in to the 3Scale admin portal URL (https://maas-admin.apps.cluster-guid.guid.sandbox.opentlc.com/) using these credentials.
- Explore the 3Scale admin portal:
- The dashboard provides sections for Audience, APIs (Products, Backends), and Account Settings.
- Note the pre-created "granite-3dot2-8b-instruct" Product and Backend.
- Enable the Developer Portal:
- Go to Audience -> Developer Portal -> Settings -> Domains & Access.
- Remove the Developer Portal access code to allow open access to the portal (authentication still required).
- Subscribe a user to a service:
- A user (e.g., userX) is pre-created.
- Go to Audience -> Accounts -> Listing and click userX.
- In the Service Subscriptions tab, click Subscribe next to the granite-3dot2-8b-instruct service.
- Select the Default plan and create the subscription.
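As a rough sketch, the admin credentials can be pulled from the secret like this; the key names below are the ones usually found in the 3scale system-seed secret, so inspect the secret (for example with -o yaml) if your environment differs:

```bash
# Extract and decode the admin user name and password from the
# system-seed secret in the 3scale namespace (key names assumed).
oc get secret system-seed -n 3scale -o jsonpath='{.data.ADMIN_USER}' | base64 -d; echo
oc get secret system-seed -n 3scale -o jsonpath='{.data.ADMIN_PASSWORD}' | base64 -d; echo
```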
Step 5: Access the Developer Portal
Next, we'll access the Developer Portal:
- Open the Developer Portal URL:
- Launch a web browser and navigate to the specific URL provided for the Developer Portal (https://maas.apps.cluster-guid.guid.sandbox.opentlc.com). This serves as the gateway to interacting with the available APIs. Ensure your network connectivity is stable to access the portal without interruptions.
- Log in using your workshop credentials:
- Upon reaching the Developer Portal, you will be prompted to enter your login details. Use the designated workshop credentials, such as "userX" for the username and "openshift" for the password. These credentials authenticate your access to the portal and allow you to interact with the APIs.
- Once logged in, you should observe a listing of available APIs ready for exploration and integration.
- Create an application for Granite model:
- Within the Developer Portal, locate and click the option labeled See your Applications and their credentials.
- To utilize the Granite Model, click the Create new application button, which will initiate the process of generating a new application instance with unique credentials.
- From the list of available services, choose the granite-3dot2-8b-instruct service.
- Assign a descriptive name to your application, such as "Granite application," so you can easily identify it later. After providing a name, click the Create Application button to finalize the application creation.
- After successful application creation, carefully record the generated Endpoint URL, Model Name, and API Key. These credentials are essential for authenticating and interacting with the Granite Model API. The Endpoint URL specifies where to send API requests, the model name identifies the specific model being used, and the API Key authorizes your application's access to the API.
- Test API access (Granite):
- Open a terminal, such as the one in the Model-Test workbench, and use the `curl` command-line tool to send requests to the Granite model API. Replace any placeholders in the provided example command with the actual endpoint URL, model name, and API key you obtained in the previous step.
- Refer to the provided documentation or sources for an example `curl` command. That example serves as a template, demonstrating the correct structure and parameters for making an API request; adapt it with your specific credentials. A hedged sample is also shown after this list.
- Upon executing the `curl` command, expect to receive a text completion response from the Granite model API. This response confirms successful API access and model functionality. If the request is properly formed and authenticated, the API will generate and return a text completion based on the input provided in your request.
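The following is a minimal sketch of such a request, assuming the application's endpoint URL and API key are exported as variables and that the product accepts the key as a Bearer token (a common setup for OpenAI-compatible gateways); adjust the authentication to match how your 3Scale product is configured:

```bash
# Values recorded from the application page in the Developer Portal (placeholders).
ENDPOINT_URL="https://<granite-endpoint-from-the-developer-portal>"
API_KEY="<your-api-key>"

# Send a completion request through the 3Scale gateway.
curl -sk "${ENDPOINT_URL}/v1/completions" \
  -H "Authorization: Bearer ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"model": "ibm-granite/granite-3.2-8b-instruct", "prompt": "What is Models-as-a-Service?", "max_tokens": 50}'
```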
Step 6: Create a new product in 3Scale
To efficiently deploy and expose new models like TinyLlama, we recommend leveraging the 3Scale Operator within the OpenShift web console. This approach automates the product creation and management process, ensuring consistency and reducing manual configuration.
- Navigate to the 3Scale Operator:
- Access the OpenShift web console by logging into your OpenShift cluster's management interface. Navigate to the Operators section and then select Installed Operators, where you'll see all the operators currently deployed in your cluster.
- From the Project dropdown menu, choose the 3scale project. This filters the list of operators to show only those installed within the specific 3scale project.
- Click the Red Hat Integration - 3scale operator.
- Create a backend for TinyLlama
- Within the 3Scale Operator interface, navigate to the 3scale Backend tab. This tab provides access to managing backend services that will power your APIs.
- Create a new backend by clicking the Create Backend button. You will be prompted to define the configuration for the backend service.
- Switch to the YAML view to directly edit the backend configuration. Replace the existing content with the provided YAML snippet, adjusting any placeholders, such as "cluster-guid", to match your environment's specific values. A rough sketch of a Backend resource is shown after this step's list.
- After updating the YAML content, click the Create button to deploy the new backend service. Verify that the backend service appears in the list within the OpenShift Console and also within the 3Scale Admin Portal to ensure successful backend creation.
- Create a product for TinyLlama:
- Return to the 3Scale Operator page within the OpenShift console and navigate to the 3scale Product tab. This tab allows you to manage the API products exposed to consumers.
- Start the product creation process by clicking the Create Product button.
- Similar to creating the backend, switch to the YAML view and replace the content with the provided, more complex YAML configuration for TinyLlama. This YAML configuration will include detailed definitions for mapping rules, policies, methods, and other settings that govern how the API will behave. These definitions will encompass functionalities such as chat/completions, simple completions, embeddings, and more.
- After updating the YAML, click Create to deploy the new API product. Confirm that the product is successfully created by checking the list of products within both the OpenShift console and the 3Scale admin portal.
- Promote the product to production:
- Newly created products initially reside in a staging environment. This separation allows for testing and validation before making the product publicly available. To move the product to the production environment, navigate to the ProxyConfig Promote tab within the OpenShift console.
- Click the Create ProxyConfigPromote button.
- Replace the existing YAML content with the provided YAML specifically designed for promoting the product to production. Adjust the `productCRName` if necessary to match your environment's product identifier.
- After updating the YAML, click Create.
- In the 3Scale Admin Portal, verify that the TinyLlama product now shows the "Production APIcast" environment under Integration -> Configuration, which indicates a successful promotion.
- Create API documentation (ActiveDoc):
- Navigate to the ActiveDoc tab within the OpenShift console. This tab manages the creation and deployment of API documentation.
- Start the creation of new API documentation by clicking Create ActiveDoc.
- In the YAML view, replace the YAML content with the provided YAML for ActiveDoc, which references a pre-created JSON file that contains the actual documentation details.
- After updating the YAML, click Create to deploy the API documentation.
- Subscribe your user to the new product (TinyLlama):
- Access the 3Scale Admin Portal and navigate to Audience -> Listing. Select the desired user, such as "user1" or "userX", to manage their subscriptions.
- Go to the Service Subscriptions tab and click Subscribe on the tinyLlama item.
- Select the Default plan and click Create subscription.
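For orientation, a Backend resource created through the operator might look like the minimal sketch below. The field names reflect the 3scale operator's Backend custom resource as commonly documented, and the system name and private base URL are placeholders for this workshop environment; prefer the YAML supplied with the lab.

```bash
# Apply a minimal 3scale Backend resource for TinyLlama in the 3scale
# namespace. The privateBaseURL should point at the model's serving
# endpoint; replace the placeholder host and cluster-guid values.
cat <<'EOF' | oc apply -n 3scale -f -
apiVersion: capabilities.3scale.net/v1beta1
kind: Backend
metadata:
  name: tinyllama
spec:
  name: TinyLlama
  systemName: tinyllama
  privateBaseURL: "https://<tinyllama-route>.apps.cluster-guid.guid.sandbox.opentlc.com"
EOF
```

The Product resource follows the same pattern but is considerably longer, adding mapping rules, policies, and methods; use the YAML provided with the lab for that step.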
Step 7: Deploy and validate the TinyLlama model
To ensure successful deployment and operation of the TinyLlama model, several validation procedures are necessary:
- Navigate to the Developer Portal and authenticate with your workshop credentials.
- Select See your Applications and their credentials to view a list of existing applications and their authentication details.
- Initiate the creation of a new application by selecting Create new application. This is essential for establishing a unique connection to the TinyLlama service.
- Choose the TinyLlama service from the available options. This designates the specific model to which the new application will connect.
- Assign a descriptive name to the application, such as "TinyLlama application," and complete the creation process by selecting Create Application. This name helps in identifying the application for future management.
- Record the Endpoint URL, Model Name, and API Key provided for the newly created application. These details are crucial for communicating with the TinyLlama model via API calls.
- Use the `curl` command in a terminal to validate the API functionality, replacing placeholders with the specific details of the TinyLlama application. This simulates an API request to test the model's response. A sample `curl` command is available in the provided resources, and a hedged sketch is shown after this list. Executing the command should return a completion response from the TinyLlama model, confirming successful integration.
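Below is a minimal sketch of such a request, again assuming the key is passed as a Bearer token and that the product exposes the OpenAI-style chat/completions path; substitute the endpoint URL and API key recorded from the TinyLlama application:

```bash
# Placeholders recorded from the TinyLlama application page in the Developer Portal.
ENDPOINT_URL="https://<tinyllama-endpoint-from-the-developer-portal>"
API_KEY="<your-api-key>"

# Send a chat completion request through the 3Scale gateway.
curl -sk "${ENDPOINT_URL}/v1/chat/completions" \
  -H "Authorization: Bearer ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"model": "tinyllama/tinyllama-1.1b-chat-v1.0", "messages": [{"role": "user", "content": "Say hello in one sentence."}], "max_tokens": 30}'
```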
Step 8: Connect to the MaaS model
The final step of this process involves connecting a sample application, AnythingLLM, to the MaaS model via secure connections established in OpenShift AI.
- Create a connection in OpenShift AI:
- Go to the Connections section in the OpenShift AI dashboard and select Add connection. This allows managing external service integrations with OpenShift AI.
- Choose Anything LLM from the connection types to configure the connection for compatibility with the AnythingLLM application.
- Retrieve the Endpoint URL, API Key, and Chat Model Name from the TinyLlama (or Granite) application in the 3Scale Developer Portal. These credentials are needed for authentication and authorization.
- Populate the required fields with the retrieved information:
- Connection name: Assign a unique name like "Granite" or "TinyLlama" for easy identification.
- LLM Provider Type: Set to `generic-openai` for compatibility with standard API formats.
- Base URL: Input the 3Scale Endpoint URL and append "/v1" to adhere to the OpenAI API standard.
- API Key: Enter the API key from 3Scale.
- Chat Model Name: Specify the model name, such as "ibm-granite/granite-3.2-8b-instruct" or "tinyllama/tinyllama-1.1b-chat-v1.0."
- Complete the connection creation by selecting Create. This injects the provided values as environment variables, allowing AnythingLLM to access the MaaS model.
- Connect with AnythingLLM
- In the OpenShift AI dashboard, go to the Workbenches section and create a new workbench by selecting Create Workbench. Workbenches provide the runtime environment for applications.
- Name the workbench and select the Custom Image: AnythingLLM 1.7.5 image, which pre-configures the workbench with AnythingLLM.
- In the Connection section, attach the previously created connection by selecting Attach existing connection and choosing the appropriate connection. This links the workbench to the MaaS model credentials.
- Select Create workbench. This provisions and configures the workbench environment.
- After the workbench starts, access AnythingLLM by selecting Open, which launches the application interface.
- In AnythingLLM, create a new workspace by selecting New Workspace, naming it, and selecting Save. This sets up a working environment for interaction with the connected model. Then interaction with the model can begin.
Step 9: Examine usage analytics in 3Scale
Monitoring model usage is critical for the MaaS offering. 3Scale provides comprehensive analytics for tracking and understanding model utilization. There are two perspectives from which you can do this:
- Developer perspective
- In the 3Scale developer portal, go to the Statistics tab. This tab displays various application usage metrics and data.
- Use the dropdown menu to select specific applications, such as the "Granite application" or "TinyLlama application," to view individual usage statistics, including API calls and used methods. This allows developers to monitor their application's interaction with the model.
- Administrative perspective
- In the 3Scale admin portal, navigate to Products and select a specific product, such as "Granite." This section provides product-level usage insights.
- Access analytics by selecting Analytics -> Traffic. This area provides detailed data on traffic and usage patterns of the selected product. From this dashboard, you can view statistics for all applications utilizing the product, the top applications, the most frequently used methods, and usage patterns.
Next steps
Deploying and configuring Models-as-a-Service (MaaS) involves several crucial steps using Red Hat OpenShift AI and 3Scale API Gateway. This guide provided a practical walkthrough for creating a secure and efficient MaaS platform. The process includes deploying an AI model, securely exposing it through an API gateway, and integrating it into an application, providing you with hands-on experience.
The final article will explore optimizing your MaaS solution for inference, cost, scaling, and security.
You can also explore Models-as-a-Service in action. For a truly engaging experience, reach out to a Red Hat representative for a live demonstration; it's far more dynamic and insightful than simply reading about it.
Note: This lab was part of a workshop for customers at Red Hat Summit 2025 in Boston.