Prepare and label custom datasets with Label Studio

May 2, 2024
Diego Alvarez Ponce, Kaitlyn Abdo
Related topics:
Artificial intelligence, Edge computing
Related products:
Red Hat OpenShift AI


    This is the fifth chapter in our “computer vision at the edge” series, in which we dive into the preparation of a custom dataset. Here are all the chapters in this series:

    • How to install single node OpenShift on AWS
    • How to install single node OpenShift on bare metal
    • Red Hat OpenShift AI installation and set up
    • Model training in Red Hat OpenShift AI
    • Prepare and label custom datasets with Label Studio
    • Deploy computer vision applications at the edge with MicroShift

    Data labeling with Label Studio

    Data labeling is the practice of annotating raw data so that models and algorithms can identify it correctly during training. It is crucial because it provides the annotations and context that algorithms need to learn effectively. Machine learning models learn patterns from labeled data and use them to make accurate predictions or classifications. Careful labeling also ensures that models generalize well to unseen examples, improving their performance and reliability in real-world applications. Ultimately, robust data labeling enhances the overall quality and effectiveness of machine learning systems, making them more trustworthy and valuable across industries and domains.

    Label Studio offers a comprehensive suite of data labeling capabilities, making it a versatile tool for many machine learning tasks. Its intuitive, user-friendly interface allows for a seamless and efficient labeling process, and its open source nature allows for customization and integration with existing workflows, making it an accessible choice for both beginners and experienced data annotators. In short, Label Studio simplifies the data labeling process, enabling users to annotate datasets quickly and accurately, which makes it an excellent choice for what we are trying to achieve.

    In this chapter, we will use Label Studio to locate objects in images and create the corresponding bounding boxes around them. In the next few sections, we will learn how to deploy the application on our single node cluster and use it to prepare the custom dataset that will later be consumed during the YOLO model training. 

    Label Studio deployment 

    Label Studio, as previously introduced, is a comprehensive suite for data labeling. It can be installed on a wide variety of infrastructures, both on-premises and in the cloud, through different installation methods. The method that best fits our environment is deploying the application and its dependent components directly in our Red Hat OpenShift cluster as a Deployment. 

    First of all, we need to create a new namespace where we will deploy all the resources associated with the labeling tool. 

    oc new-project labelstudio
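
    All the oc apply commands in the following sections assume this project is the active one. If you want to double-check at any point, a quick optional sanity check is:

      oc project
      # Should report something like: Using project "labelstudio" on server ...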

    Once this new project is available, we can dive into the deployment of the different components that make up the tool.

    PostgreSQL

    First of all, Label Studio requires persistent storage to save some metadata, which is why we need to deploy a PostgreSQL database. In this step, we will show you how to get it up and running in your environment.

    1. Start by applying the database Configmap:

      vi db_configmap.yaml

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: postgres-config
        labels:
          app: postgres
      data:
        POSTGRES_DB: db
        POSTGRES_USER: postgres
        POSTGRES_PASSWORD: postgres

      oc apply -f db_configmap.yaml
    2. This database will require a persistent volume to store the data. We can either create it from the web console or directly apply the following file:

      vi db_pvc.yaml

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: postgres-pvc
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi

      oc apply -f db_pvc.yaml
    3. Now that the volumes are created, we can proceed with the PostgreSQL deployment:

      vi db_deployment.yaml

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: postgres
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: postgres
        template:
          metadata:
            labels:
              app: postgres
          spec:
            containers:
              - name: postgres
                image: postgres:alpine
                imagePullPolicy: "IfNotPresent"
                ports:
                  - containerPort: 5432
                envFrom:
                  - configMapRef:
                      name: postgres-config
                env:
                  - name: PGDATA
                    value: /var/lib/postgresql/data/pgdata
                volumeMounts:
                  - mountPath: /var/lib/postgresql/data
                    name: postgres-vol
            volumes:
              - name: postgres-vol
                persistentVolumeClaim:
                  claimName: postgres-pvc

      oc apply -f db_deployment.yaml
    4. Finally, we just need to expose the database service:

      vi db_service.yaml

      apiVersion: v1
      kind: Service
      metadata:
        name: postgres
        labels:
          app: postgres
      spec:
        type: NodePort
        ports:
          - port: 5432
        selector:
          app: postgres

      oc apply -f db_service.yaml
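
    Before moving on to Label Studio itself, it is worth confirming that the database came up correctly. A quick check using the app: postgres label we set above (the exact log lines will vary, but PostgreSQL should report that it is ready to accept connections):

      oc get pods -l app=postgres
      oc logs deployment/postgres | tail -n 5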

    Label Studio

    With the storage configured and the database deployed, we can continue by deploying the Label Studio application itself. 

    1. The images and annotations created with Label Studio will be stored on our node, meaning that we need to create another PVC for storage purposes:

      vi ls_pvc.yaml

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: labelstudio-data-pvc
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi

      oc apply -f ls_pvc.yaml
    2. Apply the deployment. This will pull the latest label-studio image and connect the PVCs and the PostgreSQL database to the application:

      vi ls_deployment.yaml

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: labelstudio
      spec:
        replicas: 1
        selector:
          matchLabels:
            component: labelstudio
        strategy:
          type: Recreate
        template:
          metadata:
            labels:
              component: labelstudio
          spec:
            containers:
              - name: labelstudio
                image: heartexlabs/label-studio:latest
                imagePullPolicy: Always
                stdin: true
                tty: true
                env:
                  - name: DJANGO_DB
                    value: default
                  - name: POSTGRE_NAME
                    value: postgres
                  - name: POSTGRE_USER
                    value: postgres
                  - name: POSTGRE_PASSWORD
                    value: postgres
                  - name: POSTGRE_PORT
                    value: "5432"
                  - name: POSTGRE_HOST
                    value: postgres
                volumeMounts:
                  - name: labelstudio-data-vol
                    mountPath: /label-studio/data
            volumes:
              - name: labelstudio-data-vol
                persistentVolumeClaim:
                  claimName: labelstudio-data-pvc

      oc apply -f ls_deployment.yaml
    3. Create the service that will expose port 8080:

      vi ls_service.yaml

      apiVersion: v1
      kind: Service
      metadata:
        name: labelstudio
      spec:
        ports:
          - port: 8080
        selector:
          component: labelstudio
        clusterIP: None

      oc apply -f ls_service.yaml
    4. As a last step, we are going to create a route that will make the application accessible from our browser:

      vi ls_route.yaml

      apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        name: label-studio-route
      spec:
        path: /
        to:
          kind: Service
          name: labelstudio
        port:
          targetPort: 8080

      oc apply -f ls_route.yaml
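
    Before opening the UI, you can optionally confirm that the rollout finished, using the component: labelstudio label from the deployment above:

      oc rollout status deployment/labelstudio
      oc get pods -l component=labelstudio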

    We have just finished the Label Studio deployment. It’s time to start playing with it and prepare our custom dataset. Run this command to get the route we just created:

    oc get route
    NAME                 HOST/PORT                                                    PATH   SERVICES      PORT
    label-studio-route   label-studio-route-labelstudio.sno.pemlab.rdu2.redhat.com   /      labelstudio   8080
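
    If you prefer the command line, a quick reachability check might look like this (assuming the route serves plain HTTP, since we did not configure TLS termination on it; substitute your own route host):

      curl -sI http://label-studio-route-labelstudio.sno.pemlab.rdu2.redhat.com/
      # An HTTP 200 or a redirect to the login page means the application is up.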

    Access the route from a web browser. If everything was configured correctly, the Label Studio login page should appear (Figure 1). You can log in by creating a new account or using an existing one. 

    Figure 1: Label Studio’s home page to log in or sign up to the application. 

    Labeling a custom dataset

    Once you've accessed the page and logged in, you can begin labeling your custom dataset.

    1. First, we are going to create our project by selecting the Create Project button. 
    2. In the Project Name tab, you can use whatever name best suits your dataset; in our case, we are just going to name it Custom dataset. 
    3. Next, we will import the images we want to label in the Data Import tab. 
    4. On this page, you can paste a URL to your dataset images or upload the images you want to label directly from your computer. 
    5. Navigate to the Labeling Setup tab, where we are going to select the template we will use to label our images. 
    6. Select the Object Detection with Bounding Boxes template. This will open a new wizard to configure our different classes. 
    7. From there, we will create the new labels for our custom dataset. Delete the existing labels, type the new ones, and click Add. In my case, I want to detect different aircraft, so my labels are A380 and B747, as shown in Figure 2. 

      Figure 2: Creation of two new custom labels for the A380 and B747 classes. 
    8. After that, we'll select Save in the upper-right corner to begin labeling our data.

    From the project dashboard, select Label all tasks, which will take you to the first image to label. To select a label, you can either click the corresponding label or press the number on your keyboard that corresponds to the label. For example, the A380 is labeled as 1. After pressing 1, click and drag on the image to create a bounding box where the aircraft is located (Figure 3). 

    Figure 3: Generating a bounding box for the A380 aircraft.

    Remember to create a separate box for each aircraft present in the image. If both aircraft types appear in the same image, make sure you select the corresponding label for each one. When done, select the next image on the left side of the screen. Figure 4 shows another example for a B747 (class 2).

    Figure 4: Generating a bounding box for the B747 aircraft.

    Once you are done labeling, select Submit and return to the project dashboard. In the upper-right corner, select Export. Since we will use this data to train YOLO object detection models, export the data in the YOLO format (Figure 5).

    Figure 5: Selection of the YOLO format for the dataset export.

    This will trigger the download of the prepared and labeled dataset. When it finishes, unzip the file. Now, let’s take a look at the folders it includes: 

    • /images: contains the original images. 
    • /labels: contains a single text file per image. Each line in the file holds the class number and the coordinates of one bounding box (see the example after this list). 
    • classes.txt: list with the labels in order. In our case: A380, B747. 
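
    For orientation, here is what one of those label files looks like. Each line is class_id x_center y_center width height, with the coordinates normalized to the 0-1 range relative to the image size. The file name and values below are made up for illustration:

      # labels/a380_takeoff.txt (hypothetical; one line per bounding box)
      # class 0 = A380, the first entry in classes.txt
      0 0.512 0.430 0.620 0.280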

    Now that we have our dataset ready, we can push all these files into a Git repository to be imported into Red Hat OpenShift AI, as sketched below. 
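
    A minimal sequence might look like the following; the local path, repository URL, and branch name are placeholders, so adjust them to your setup:

      cd <path-to-unzipped-export>   # hypothetical location of the unzipped export
      git init
      git add images labels classes.txt
      git commit -m "Add labeled A380/B747 dataset in YOLO format"
      git remote add origin https://github.com/<your-org>/custom-dataset.git
      git push -u origin main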

    Video demo

    Watch the following video demo to see how to prepare a custom dataset for AI/ML model training.

    Next steps

    In this article, you learned about the importance of accurately labeled data. To help us put this into practice, we have deployed Label Studio in our OpenShift cluster to label our custom data. 

    In the next article, we will train a YOLO object detection model with our newly labeled data using Red Hat OpenShift AI. 

    Last updated: May 23, 2024
