
RHEL AI in action: Streamline AI workflows with RAG, LAB, and RAGLAB

November 20, 2024
Faisal Shah
Related topics: Artificial intelligence
Related products: Red Hat Enterprise Linux AI


    This article details the use of Red Hat Enterprise Linux AI (RHEL AI) for fine-tuning and deploying Granite large language models (LLMs) on Managed Cloud Services (MCS) data. We outline the techniques and steps involved in the process, including retrieval-augmented generation (RAG), model fine-tuning (LAB), and RAGLAB, which leverages iLAB. We also demonstrate how these methods come together in a chatbot built with a Streamlit app.

    Prerequisites

    • An Amazon EC2 p4de.24xlarge instance with RHEL AI installed.

    Initialize InstructLab

    RHEL AI includes InstructLab, a tool for fine-tuning and serving models. After ensuring the prerequisites are in place, we initialize InstructLab with the command ilab config init. Next, we select the appropriate training profile for our system, in this case, A100_H100_x8.yaml.

    To download models from registry.stage.redhat.io, log in with Podman using your own account. If you need registry access, refer to the documentation for the necessary steps.

    For further details on initialization, consult the provided documentation.

    Data pre-processing

    For RHEL AI, the knowledge data is hosted in a public Git repository and formatted as markdown (.md) files, which are essential for fine-tuning the model. We took a set of over 30 PDF documents and converted them into this required format using the docling tool. Additionally, skill and knowledge datasets are created using YAML files, known as qna.yaml, which contain structured question-and-answer pairs that guide the fine-tuning process.
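
    As an illustration of the conversion step, the following sketch uses docling's Python API to turn a directory of PDFs into markdown files. The input and output directory names are hypothetical, and the docling API can vary between versions, so treat this as a starting point rather than the exact script we used:

    from pathlib import Path

    from docling.document_converter import DocumentConverter  # pip install docling

    pdf_dir = Path("pdf_docs")   # hypothetical directory holding the source PDFs
    md_dir = Path("md_docs")     # hypothetical output directory for the .md files
    md_dir.mkdir(exist_ok=True)

    converter = DocumentConverter()
    for pdf_path in pdf_dir.glob("*.pdf"):
        # Convert one PDF and export the parsed document as markdown.
        result = converter.convert(pdf_path)
        (md_dir / f"{pdf_path.stem}.md").write_text(result.document.export_to_markdown())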

    It is crucial to ensure that your data adheres to the specified format and follows RHEL AI guidelines. The taxonomy files, such as the qna.yaml, should be organized within the correct directory structure: /var/home/cloud-user/.local/share/instructlab/taxonomy/<your taxonomy path to qna>

    To confirm that the qna.yaml file is properly formatted, you can use the following command to verify its structure:

    ilab taxonomy diff 
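
    In addition to ilab taxonomy diff, a quick structural check can be scripted. The sketch below assumes a hypothetical taxonomy path and the PyYAML package; it simply loads the file and reports how many seed examples it contains, while ilab taxonomy diff remains the authoritative validation:

    import yaml  # pip install pyyaml

    # Hypothetical path; point this at your own qna.yaml under the taxonomy tree.
    qna_path = "/var/home/cloud-user/.local/share/instructlab/taxonomy/knowledge/mcs/qna.yaml"

    with open(qna_path) as f:
        qna = yaml.safe_load(f)

    # Minimal structural check: a qna.yaml should carry question-and-answer seed examples.
    seed_examples = qna.get("seed_examples", [])
    print(f"version: {qna.get('version')}, seed examples: {len(seed_examples)}")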

    Vector databases used

    • Milvus Lite (in-memory).
    • Milvus Watsonx (hosted).

    Anaconda setup

    For each experiment, it's important to create a dedicated environment using Anaconda. This setup helps with package management and ensures isolation, minimizing the risk of dependency conflicts across different projects.

    Follow the steps below:

    1. Go to Anaconda Downloads and copy the link to the 64-Bit (x86) Installer (1007.9M):

      curl -O https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Linux-x86_64.sh
    2. Install Anaconda:

          bash Anaconda3-2024.06-1-Linux-x86_64.sh # accept the TnC
    3. Create and activate a new Anaconda environment:

          conda create -n mcs-rhelai-milvus-dev python=3.11 anaconda
          conda activate mcs-rhelai-milvus-dev
    4. Run a Jupyter notebook with port forwarding:

         pip install jupyter
         #run jupyter lab
         jupyter lab  --no-browser --ip 0.0.0.0 --port=8080
    5. To open the Jupyter notebook in your browser, open a terminal on your local machine and use the following command to create an SSH tunnel:

         ssh -L 8080:localhost:8080 cloud-user@10.31.124.12

    First approach: RAG (granite-7b-redhat-lab + Milvus Lite)

    We started by implementing a retrieval-augmented generation (RAG) approach, utilizing the granite-7b-redhat-lab model. This pre-trained large language model (LLM) can be easily downloaded using the InstructLab command:

    ilab model download --repository docker://<repository_and_model> --release <release>

    (Refer to the official documentation for more details on this command.)

    Figure 1 shows the path to the downloaded models.

    Figure 1: Path to the downloaded models.

    Next, we set up Milvus Lite, a lightweight vector database, to store document embeddings. We used the LangChain Milvus wrapper to simplify the integration process. To install it, use a simple pip command:

    pip install -qU langchain_milvus

    The following steps outline how we ingested the data into Milvus and utilized it for retrieval during the RAG process.

    Step 1: Import required packages and download embeddings

    We began by importing the necessary packages and downloading our chosen embedding model:

    from langchain_milvus import Milvus
    from langchain_community.embeddings import HuggingFaceEmbeddings

    # Local file that Milvus Lite uses to persist the collection.
    URI = "./milvus_example.db"

    # Embedding model used to vectorize the documents.
    embeddings = HuggingFaceEmbeddings(model_name="mixedbread-ai/mxbai-embed-large-v1")

    Step 2: Load documents

    We used DirectoryLoader from LangChain to load documents into the system. There are various methods to achieve this, but in this case, we chose to load documents from a local directory:

    from langchain_community.document_loaders import DirectoryLoader

    # Load every file in the local data directory into LangChain Document objects.
    loader = DirectoryLoader('../data/')
    documents = loader.load()
    # Optionally, split the documents into smaller chunks before ingestion.

    Step 3: Add documents to Milvus

    Once the documents were loaded, we converted them into embeddings and stored them in Milvus:

    # Embed the loaded documents and store the vectors in a Milvus collection.
    vector_store_saved = Milvus.from_documents(
        documents, embeddings,
        collection_name="milvus_mcs",
        connection_args={"uri": URI}
    )

    The vector embeddings are stored locally, as illustrated in Figure 2. In our case, they were saved in the file milvus_example.db, which allows for efficient retrieval during the RAG process.

    Figure 2: Vector embeddings stored in local file.
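
    Before wiring up the full pipeline, it is worth confirming that retrieval works against the freshly built store. A minimal check, using LangChain's standard similarity_search method with a hypothetical MCS-style question, might look like this:

    # Retrieve the top-3 most similar chunks for a sample query (query text is hypothetical).
    results = vector_store_saved.similarity_search(
        "How do I rotate credentials for a managed cluster?",
        k=3,
    )
    for doc in results:
        print(doc.metadata.get("source"), doc.page_content[:120])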

    Set up the RAG pipeline

    After ingesting the data, the next step is to configure the RAG pipeline by connecting the granite-7b-redhat-lab model to the retrieval system. This involves querying Milvus for relevant document embeddings and integrating the retrieved information with the LLM’s response generation.

    To do this, the granite-7b-redhat-lab model needs to be hosted and running in RHEL AI. This can be easily achieved using the following InstructLab command:

    ilab model serve --model-path ~/.cache/instructlab/models/granite-7b-redhat-lab/

    By default, the model is served on 127.0.0.1:8000. Once the model is up and running, you will notice GPU resources being consumed, as shown in Figure 3.

    Figure 3: GPU resources being consumed.

    For more details on model serving, refer to the documentation.
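
    To make this wiring concrete, here is a minimal sketch that combines the Milvus store built earlier with the locally served model. It assumes the ilab-served endpoint is OpenAI-compatible (typical for vLLM-backed serving) and that the model is registered under the name shown; the rag_answer() helper is ours for illustration, so adjust the base URL, model name, and prompting to your setup:

    from openai import OpenAI  # pip install openai

    # Assumption: the ilab-served model exposes an OpenAI-compatible API on 127.0.0.1:8000.
    client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="no-key-required")

    def rag_answer(question: str, k: int = 3) -> str:
        # Retrieve the most relevant chunks from the Milvus store built earlier.
        docs = vector_store_saved.similarity_search(question, k=k)
        context = "\n\n".join(d.page_content for d in docs)

        # Ask the served Granite model to answer using only the retrieved context.
        response = client.chat.completions.create(
            model="granite-7b-redhat-lab",  # model name as seen by the server; may differ
            messages=[
                {"role": "system", "content": "Answer using only the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return response.choices[0].message.content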

    We have implemented a basic Streamlit app to demonstrate this setup. Figures 4 and 5 show the RAG approach with the Streamlit app.

    Figure 4: RAG approach with Streamlit app showing the user querying the model.
    Figure 5: RAG approach with Streamlit app showing the model's answer to the query.
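
    As a rough illustration only (not the exact app shown in the figures), a minimal Streamlit front end over the hypothetical rag_answer() helper sketched above could look like this:

    import streamlit as st

    st.title("MCS assistant")  # placeholder title

    question = st.text_input("Ask a question about Managed Cloud Services")
    if question:
        # rag_answer() is the retrieval-plus-generation helper sketched earlier.
        st.write(rag_answer(question))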

    Second approach: Fine-tuning the granite-starter model

    Fine-tuning an LLM adapts the model to specific tasks or datasets, improving its accuracy and relevance. In our case, fine-tuning the granite-starter model enhances the performance of a question-answer chatbot built on domain-specific knowledge. We use the qna.yaml file and knowledge documents, as outlined in the data pre-processing section, to fine-tune the model. Once the data is validated, the following steps are carried out.

    Step 1: Create a synthetic dataset using sample examples

    To generate additional training data, we create a synthetic dataset using MCS-based examples. This is achieved by running the following InstructLab command:

    ilab data generate

    This command runs the synthetic data generation (SDG) process using the mixtral-8x7B-instruct model as the teacher to generate synthetic data.

    Since this process can be time-consuming and depends on the volume of data, it is recommended to run the InstructLab commands within a tmux session to maintain continuity. To create a new tmux session, run:

    tmux new -s session_name

    For our sample set, the SDG process took approximately 10.5 hours and produced around 70,000 new samples.

    To count the number of generated samples, you can use the following command:

    wc -l ~/.local/share/instructlab/datasets/checkpoints/<your_data_file>/*.jsonl
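
    If you prefer to inspect the generated data from Python rather than wc -l, a small script along these lines counts the samples per file (the checkpoint directory placeholder is the same as above):

    import glob
    import json
    import os

    # Same location as the wc -l check above; replace <your_data_file> with your run's directory.
    pattern = os.path.expanduser(
        "~/.local/share/instructlab/datasets/checkpoints/<your_data_file>/*.jsonl"
    )

    total = 0
    for path in glob.glob(pattern):
        with open(path) as f:
            samples = [json.loads(line) for line in f if line.strip()]
        print(f"{path}: {len(samples)} samples")
        total += len(samples)
    print(f"total generated samples: {total}")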

    After the SDG process completes, verify that the new files are created as expected. The new dataset created by the SDG process is shown in Figure 6.

    Figure 6: New dataset created using the SDG process.

    Step 2: Training

    RHEL AI utilizes your taxonomy tree and synthetic data to create a newly trained model that incorporates your domain-specific knowledge and skills through a multi-phase training and evaluation process.

    For training, we focus on two essential files:

    • <knowledge-train-messages-file>
    • <skills-train-messages-file>

    The training process is initiated using the following InstructLab command:

    ilab model train --strategy lab-multiphase \
       --phased-phase1-data ~/.local/share/instructlab/datasets/knowledge_train_msgs_2024-08-30T05_19_50.jsonl \
       --phased-phase2-data ~/.local/share/instructlab/datasets/skills_train_msgs_2024-08-30T05_19_50.jsonl

    It is advisable to run the above command in the tmux session created earlier.

    Note

    This training process can be quite time-consuming, depending on your hardware specifications. In our case, it took approximately three days to complete both training phases. After the process, verify that the new checkpoints have been created successfully.

    Figure 7 shows the new checkpoints created.

    Figure 7: New checkpoints created.

    Step 3: Serve and chat with the model

    To interact with your newly trained model, you need to serve it on the machine. The ilab model serve command initiates a vLLM server, allowing you to chat with the model.

    For our use case, the best-performing model selected was samples_4387520. RHEL AI evaluates all checkpoints from phase 2 of model training using the Multi-turn Benchmark (MT-Bench) and identifies the best-performing checkpoint as the fully trained output model. You can serve this model with the following command:

    ilab model serve --model-path ~/.local/share/instructlab/phased/phase2/checkpoints/hf_format/samples_4387520

    Once the model is being served, open another terminal to start chatting with the fine-tuned model using the command:

    ilab model chat --model ~/.local/share/instructlab/phased/phase2/checkpoints/hf_format/samples_4387520

    With these steps, your fine-tuned granite-7b model is now ready for interaction. Figure 8 shows the fine-tuned model interaction using the ilab model chat command.

    Figure 8: Fine-tuned model interaction using the ilab model chat command.

    Let’s test its capabilities using our Streamlit application. Figure 9 shows the fine-tuned model interaction in the Streamlit application, and Figure 10 shows the model's answer.

    Figure 9: Fine-tuned model interaction on Streamlit app showing the user querying the model.
    Figure 10: Fine-tuned model interaction on Streamlit app showing the model's answer to the query.

    Third approach: RAGLAB (RAG leveraging iLAB)

    This hybrid approach enhances the model’s ability to generate more accurate and contextually relevant responses, particularly for domain-specific tasks. In this phase, we combine our fine-tuned granite-7b model with the RAG approach.

    This approach closely resembles Approach 1; however, instead of utilizing a pre-trained model, we leverage a domain-specific model. Based on the results from our experiments (currently conducted manually), we are confident that RAGLAB provides a robust solution to meet our requirements. 

    Figure 11 shows the RAGLAB approach in action. Figure 12 shows the model's answer to the user query.

    Figure 11: RAGLAB approach in action showing the user querying the model.
    Figure 12: RAGLAB approach in action showing the model's answer to the query.

    To enhance scalability and performance for larger datasets, users can consider transitioning from Milvus Lite to Enterprise Milvus (e.g., WxD Milvus) to better accommodate their specific use cases.

    Connecting to Watsonx Milvus follows the same procedure as with Milvus Lite, with the key difference being the use of a database URL, username, and password instead of the local Milvus Lite URI. We can employ tools like Attu to verify the ingested data; Figure 13 shows the stored vector embeddings viewed in Attu.

    Figure 13: Stored vector embeddings that can be viewed using Attu.
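
    For reference, a hosted Milvus connection with the LangChain wrapper typically only changes the connection_args. The values below are placeholders, and the exact parameters (secure flag, token versus user/password) depend on your Milvus deployment, so check the langchain-milvus and pymilvus documentation for your instance:

    from langchain_milvus import Milvus

    # Placeholder credentials for a hosted Milvus instance (e.g., Milvus on watsonx.data).
    remote_connection_args = {
        "uri": "https://<milvus-host>:<port>",  # remote endpoint instead of a local .db file
        "user": "<username>",
        "password": "<password>",
        "secure": True,
    }

    vector_store_remote = Milvus.from_documents(
        documents,
        embeddings,
        collection_name="milvus_mcs",
        connection_args=remote_connection_args,
    )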

    The rest of the pipeline is unchanged from the previous approaches. We will now use our Streamlit app to compare all three approaches, as shown in Figure 14.

    Figure 14: Comparison of RAG, LAB, and RAGLAB approaches using the Streamlit app.

    Conclusion

    This article demonstrated how you can apply Red Hat Enterprise Linux AI (RHEL AI) to fine-tune Granite models using three key approaches: retrieval-augmented generation (RAG), direct model fine-tuning (LAB), and the integrated RAGLAB method. We highlighted the essential steps, including data preprocessing, environment setup, and the use of Milvus as a vector database to efficiently store and query embeddings. Together, these pieces form an end-to-end pipeline for building and deploying scalable AI solutions such as a chatbot application.
