Model deployment and troubleshooting with RHEL AI

With our enhanced model ready, we'll deploy it using RHEL AI for initial testing and validation.
Prerequisites:
To install RHEL AI on AWS, you must have:
- An active AWS account with the proper permissions.
- A Red Hat subscription to access RHEL AI downloads.
- The AWS CLI installed and configured with your access key ID and secret access key.
- Sufficient AWS resources: a VPC, subnet, security group, and SSH key pair.
- Storage: a minimum of 1TB for the /home directory and 120GB for the / path.
In this lesson, you will:
- Set up RHEL AI on AWS by creating S3 buckets, configuring IAM roles, and converting raw images to AMIs.
- Deploy a GPU-enabled EC2 instance with RHEL AI for optimal model serving performance.
- Transfer your trained model from a local development environment to the cloud-based RHEL AI instance.
- Configure InstructLab for GGUF models by switching from vLLM to the llama-cpp backend for compatibility.
- Serve your model in production using RHEL AI's built-in capabilities and test interactive functionality.
- Troubleshoot common deployment issues, including taxonomy validation errors, environment setup problems, and AWS-specific challenges.
- Review performance considerations for AWS RHEL AI, including instance types, storage, security, and cost management.
- Validate model functionality in a cloud environment before scaling to production deployment.
Phase 2: Model deployment with RHEL AI
RHEL AI provides a bootable image with everything needed to run and serve our model.
Set up RHEL AI on AWS
For scalable deployment and testing, we'll use RHEL AI on AWS.
Note: Before starting the RHEL AI installation on AWS, ensure you have satisfied the prerequisites.
Follow these steps to install RHEL AI:
Install and configure the AWS CLI (if not already installed):
# Download and install AWS CLI v2
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

# Configure AWS CLI
aws configure
Create and set up the necessary environment variables:
export BUCKET=<custom_bucket_name>
export RAW_AMI=rhel-ai-nvidia-aws-1.5-1747399384-x86_64.raw
export AMI_NAME="rhel-ai"
export DEFAULT_VOLUME_SIZE=1000  # Size in GB
Create an S3 bucket and the IAM resources needed for image conversion:
aws s3 mb s3://$BUCKET
Create a trust policy file for VM import:
printf '{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "vmie.amazonaws.com" }, "Action": "sts:AssumeRole", "Condition": { "StringEquals":{ "sts:Externalid": "vmimport" } } } ] }' > trust-policy.json # Create the IAM role aws iam create-role --role-name vmimport --assume-role-policy-document file://trust-policy.json
Create a role policy for S3 bucket access:
printf '{ "Version":"2012-10-17", "Statement":[ { "Effect":"Allow", "Action":[ "s3:GetBucketLocation", "s3:GetObject", "s3:ListBucket" ], "Resource":[ "arn:aws:s3:::%s", "arn:aws:s3:::%s/*" ] }, { "Effect":"Allow", "Action":[ "ec2:ModifySnapshotAttribute", "ec2:CopySnapshot", "ec2:RegisterImage", "ec2:Describe*" ], "Resource":"*" } ] }' $BUCKET $BUCKET > role-policy.json aws iam put-role-policy --role-name vmimport --policy-name vmimport-$BUCKET --policy-document file://role-policy.json
Download and convert the RHEL AI image:
- Go to the Red Hat Enterprise Linux AI download page.
- Download the RAW image file (rhel-ai-nvidia-aws-1.5-1747399384-x86_64.raw).
- Upload it to your S3 bucket:
aws s3 cp rhel-ai-nvidia-aws-1.5-1747399384-x86_64.raw s3://$BUCKET/
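The RAW image is large, so it is worth confirming the upload before starting the import. A quick check, using the $BUCKET and $RAW_AMI variables set earlier:
# List the uploaded object to confirm the copy completed and check its size
aws s3 ls s3://$BUCKET/$RAW_AMI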
Create the import configuration and convert the image:
# Create import configuration
printf '{
  "Description": "RHEL AI Image",
  "Format": "raw",
  "UserBucket": {
    "S3Bucket": "%s",
    "S3Key": "%s"
  }
}' $BUCKET $RAW_AMI > containers.json

# Start the import process
task_id=$(aws ec2 import-snapshot --disk-container file://containers.json | jq -r .ImportTaskId)

# Monitor import progress
aws ec2 describe-import-snapshot-tasks --filters Name=task-state,Values=active
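If you prefer not to re-run the monitoring command by hand, the sketch below polls the task until it finishes. It assumes the $task_id captured above; the 60-second interval is arbitrary:
# Poll the import task until it reports "completed"
while true; do
  state=$(aws ec2 describe-import-snapshot-tasks --import-task-ids $task_id \
    | jq -r '.ImportSnapshotTasks[0].SnapshotTaskDetail.Status')
  echo "Import status: $state"
  [ "$state" = "completed" ] && break
  sleep 60
done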
Wait for the import to complete, then register the AMI:
# Get snapshot ID from completed import
snapshot_id=$(aws ec2 describe-import-snapshot-tasks --import-task-ids $task_id | jq -r '.ImportSnapshotTasks[0].SnapshotTaskDetail.SnapshotId')

# Tag the snapshot
aws ec2 create-tags --resources $snapshot_id --tags Key=Name,Value="$AMI_NAME"

# Register AMI from snapshot
ami_id=$(aws ec2 register-image \
  --name "$AMI_NAME" \
  --description "$AMI_NAME" \
  --architecture x86_64 \
  --root-device-name /dev/sda1 \
  --block-device-mappings "DeviceName=/dev/sda1,Ebs={VolumeSize=${DEFAULT_VOLUME_SIZE},SnapshotId=${snapshot_id}}" \
  --virtualization-type hvm \
  --ena-support \
  | jq -r .ImageId)

# Tag the AMI
aws ec2 create-tags --resources $ami_id --tags Key=Name,Value="$AMI_NAME"
Set up instance configuration variables:
instance_name=rhel-ai-instance
ami=$ami_id                  # From previous step
instance_type=g4dn.xlarge    # GPU-enabled instance for AI workloads
key_name=<your-key-pair-name>
security_group=<your-sg-id>
subnet=<your-subnet-id>
disk_size=1000               # GB
Launch the RHEL AI instance:
aws ec2 run-instances \
  --image-id $ami \
  --instance-type $instance_type \
  --key-name $key_name \
  --security-group-ids $security_group \
  --subnet-id $subnet \
  --block-device-mappings DeviceName=/dev/sda1,Ebs='{VolumeSize='$disk_size'}' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value='$instance_name'}]'
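To find the public IP used in the next step, you can query EC2 by the Name tag. This assumes the instance receives a public IP, that is, it sits in a public subnet with auto-assign enabled:
# Look up the running instance's public IP by its Name tag
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=$instance_name" "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].PublicIpAddress' \
  --output text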
Connect to your instance:
ssh -i your-key.pem cloud-user@<instance-public-ip>
Verify RHEL AI installation:
# Verify InstructLab tools
ilab --help

# Initialize InstructLab (first time)
ilab config init
Transfer your enhanced model
After training a model with InstructLab on your local development machine, you need to transfer it to your RHEL AI AWS instance as follows:
Export your model:
# On your local development machine where you trained the model
# Identify the model path (look for the converted GGUF model)
ls ./instructlab-granite-7b-lab-trained/

# Archive the model for transfer
tar -czvf telecom-model.tar.gz ./instructlab-granite-7b-lab-trained/
Transfer to the RHEL AI AWS instance:
# Using scp to transfer to AWS instance
scp -i your-key.pem telecom-model.tar.gz cloud-user@<instance-public-ip>:/home/cloud-user/
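Because the archive can be several gigabytes, a resumable transfer can save time if the connection drops. An alternative sketch using rsync over SSH, with the same key and destination as above:
# rsync resumes partial transfers and shows progress
rsync -avP -e "ssh -i your-key.pem" telecom-model.tar.gz cloud-user@<instance-public-ip>:/home/cloud-user/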
Extract on RHEL AI:
# SSH into your RHEL AI AWS instance
ssh -i your-key.pem cloud-user@<instance-public-ip>

# Extract the model
mkdir -p ~/models
tar -xzvf telecom-model.tar.gz -C ~/models
Deploy the model with RHEL AI built-in capabilities
RHEL AI comes with InstructLab pre-installed and includes support for serving models. To utilize these capabilities:
Configure InstructLab for GGUF models:
# Check current InstructLab configuration
ilab config show

# Edit the configuration file to use llama-cpp backend
vi ~/.config/instructlab/config.yaml
Update the serve section to use the llama-cpp backend:
serve:
  backend: llama-cpp  # Change from vllm to llama-cpp
In the first terminal, start the model server:
# Start the model server
ilab model serve --model-path ~/models/instructlab-granite-7b-lab-trained/instructlab-granite-7b-lab-Q4_K_M.gguf
The server will start and display messages indicating it's ready to accept connections. Keep this terminal open and running.
In a second terminal, connect to the served model:
# Open a new terminal and SSH into your RHEL AI instance again
ssh -i your-key.pem cloud-user@<instance-public-ip>

# Connect to the served model for interactive chat
ilab model chat
You can now interact with your fine-tuned telecom support assistant to test it. Try asking questions like:
- "What is fiber optic internet?"
- "How does 5G compare to 4G?"
- "What are the benefits of VoIP?"
Performance considerations for AWS RHEL AI
When running AI models on AWS RHEL AI, consider the following:
- Instance types: Choose GPU-enabled instances (g4dn, p3, p4d) for optimal AI workload performance. The g4dn.xlarge instance type provides a good balance of cost and performance for testing.
- Storage: RHEL AI requires a minimum of 1TB for the /home directory (InstructLab data) and 120GB for the / path (system updates). The AWS setup above configures appropriate storage automatically.
- Security: Configure security groups to allow the necessary ports (an example rule follows this list):
- Port 22 for SSH access.
- Port 8000 for the model server (if exposing it externally).
- Cost management: Monitor AWS costs as GPU instances can be expensive. Consider using spot instances for development and testing to reduce costs.
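For example, if you do expose the model server, you might open port 8000 only to a known address rather than to the whole internet. A sketch using the security group variable from the launch step; the CIDR here is a placeholder, so substitute your own admin IP:
# Allow inbound TCP 8000 from a single trusted IP only
aws ec2 authorize-security-group-ingress \
  --group-id $security_group \
  --protocol tcp \
  --port 8000 \
  --cidr 203.0.113.10/32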
This setup provides a production-ready environment for testing your model's responses and making iterative improvements.
Common issues and troubleshooting
When working with InstructLab and RHEL AI, you might encounter some common issues.
Taxonomy validation errors
InstructLab has strict requirements for taxonomy files:
- No trailing spaces: Make sure there are no spaces at the end of the lines in your YAML files.
- Taxonomy version: Use version 3 for knowledge taxonomies.
- Required fields: Ensure all required fields (domain, document, questions_and_answers) are present.
- Minimum examples: Knowledge taxonomies require at least five seed examples.
- Repository references: The document section must reference a valid GitHub repository.
To check for these issues, run:
ilab taxonomy diff
If you encounter validation errors, carefully review the error messages and fix each issue.
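As a reference point, here is a rough skeleton of a knowledge qna.yaml that satisfies the rules above. The field values are illustrative placeholders rather than files from this project, and the exact schema can vary between InstructLab releases, so always confirm with ilab taxonomy diff:
version: 3
domain: telecommunications
created_by: <your-github-username>
seed_examples:
  - context: |
      Fiber optic internet transmits data as pulses of light through glass fibers.
    questions_and_answers:
      - question: What medium does fiber optic internet use?
        answer: Pulses of light carried over glass fiber.
  # ...at least five seed examples in total, with no trailing spaces
document:
  repo: https://github.com/<your-org>/<your-docs-repo>
  commit: <commit-sha>
  patterns:
    - "*.md"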
Environment setup issues
If you're missing tools like yq, run:
# For macOS
brew install yq
# For Ubuntu/Debian
sudo apt-get install yq
# For Fedora/RHEL
sudo dnf install yq
Model training performance
For better performance during model training:
- Use GPU acceleration when available.
- Start with smaller datasets for initial testing.
- Consider using OpenShift AI for distributed training at scale (in future deployments).
macOS model compatibility
If you encounter errors about vLLM not supporting your platform during local development, remember to convert your model to GGUF format:
ilab model convert --model-dir ~/.local/share/instructlab/checkpoints/YOUR_MODEL
AWS RHEL AI deployment issues
Common issues when deploying on AWS, and how you might solve them, include:
- Import task fails: Check IAM permissions and S3 bucket access.
- AMI registration fails: Verify snapshot completed successfully.
- Instance launch fails: Check VPC, subnet, and security group configurations.
- Connection issues: Verify security group allows SSH (port 22) from your IP.
- Model serving issues: Ensure GPU drivers are properly configured and model paths are correct.
RHEL AI model serving issues
You may encounter issues with model serving on RHEL AI:
- vLLM compatibility errors: GGUF files are not compatible with vLLM. Always configure InstructLab to use the llama-cpp backend for GGUF models by editing ~/.config/instructlab/config.yaml.
- Model server fails to start: Check the server logs for specific error messages.
Common issues include the following (a few quick checks appear after this list):
- Insufficient GPU memory.
- Port already in use.
- Incorrect model file path.
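These can be checked quickly from the instance, assuming the default port and the model path used earlier:
# Check that the GPU is visible and has free memory
nvidia-smi

# See whether another process is already bound to the serving port
ss -ltnp | grep 8000

# Confirm the GGUF file exists at the path passed to --model-path
ls -lh ~/models/instructlab-granite-7b-lab-trained/instructlab-granite-7b-lab-Q4_K_M.gguf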
Having deployed our model on RHEL AI, examined possible issues we may face, and learned how to troubleshoot them, we are ready to productionize and scale the model with OpenShift AI.