
Monitoring RHGS

November 20, 2017
Benny Turner


    OK so you watched:

    https://www.redhat.com/en/about/videos/architecting-and-performance-tuning-efficient-gluster-storage-pools

    You put in the time and architected an efficient and performant GlusterFS deployment. Your users are reading and writing, applications are humming along, and Gluster is keeping your data safe.

    Now what?

    Well, congratulations, you just completed the sprint! Now it's time for the marathon.

    The often-forgotten component of performance tuning is monitoring. You put in all that work up front to get your cluster performing and your users happy; now how do you ensure that this continues, and possibly improves? That is done through continued monitoring and profiling of your storage cluster and clients, and a deeper look at your workload. In this blog, we will look at the different metrics you can monitor in your storage cluster, identify which of them are important, how often to check them, and different ways to accomplish this.

    I will break down the metrics we will be looking at into a few different categories:

    1. System resources
    2. Cluster resources
    3. Client resources

    System resources are the typical things you would monitor on any Linux system and include:

    • CPU
    • RAM
    • Network
    • Disk
    • Processes

    System resources should be monitored on all nodes in the cluster and should be looked at on an individual basis.
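
    As a quick, hedged sketch of what "on all nodes" can look like in practice, the loop below pulls a couple of the cheapest health indicators from every node over SSH. The hostnames are placeholders for your own peers, and it assumes passwordless SSH is already in place:

        #!/bin/bash
        # node_sweep.sh - minimal per-node snapshot; hostnames below are placeholders.
        NODES="gluster1 gluster2 gluster3"

        for node in $NODES; do
            echo "===== $node ====="
            # Load average plus memory usage; cheap enough to run frequently.
            ssh "$node" 'uptime; free -m'
        done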

    Next, we have cluster resources. Cluster resources include:

    • Gluster Peers
    • Gluster Volumes
    • Geo-Replication

    Cluster resources should only be monitored from one node. The current GlusterD implementation has some limitations around running commands from multiple nodes at the same time, and monitoring cluster-wide resources from multiple nodes is redundant anyway. Running a command from multiple nodes simultaneously can lead to anything from incidental problems, like temporary warnings that another node holds a lock and the command cannot complete, to more serious lost locks where all Gluster commands are blocked until the GlusterD service is restarted on the node holding the lock. I want to reiterate that gluster volume (and similar) commands MUST be run from only one node.
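
    One minimal way to enforce the single-node rule, sketched here rather than prescribed: designate one peer as the "monitor" node and have every cluster-level check bail out anywhere else. The MONITOR_NODE value is a placeholder for one of your own hosts:

        #!/bin/bash
        # cluster_guard.sh - only issue gluster CLI commands from the designated node.
        MONITOR_NODE="gluster1"   # placeholder: pick exactly one peer

        if [ "$(hostname -s)" != "$MONITOR_NODE" ]; then
            echo "Not the designated monitor node; skipping cluster-wide checks." >&2
            exit 0
        fi

        gluster peer status
        gluster volume status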

    The last piece is keeping an eye on your clients. This may or may not be something you want to monitor along with your storage cluster; there are pluses and minuses to doing so. It can be useful to see what the Gluster processes are doing client-side, as Gluster can consume CPU, memory, and network resources there too. On clients I look at:

    • CPU
    • RAM
    • Network
    • Gluster processes
    • Available space
    • IOPs / throughput on Gluster mounts
    • Number of clients in use

    In this blog, I will give suggestions on what commands to run to see resource usage, how often to run those commands, and which systems they should be run on. I will attempt to give a few examples of how to accomplish this, but as everyone's environment is different and various tools can be used, I will try not to be too tool-specific.

    Now that we've gone through the overview, let's get into the weeds a bit. Let's start with monitoring the resources of your Gluster storage cluster nodes. To expand a bit further on the server resources I listed above, here are the data points we will look at for each resource group, along with a possible way to check the usage of each resource (a small collection-script sketch follows the list):

    • CPU
      • CPU usage (percent utilization for each CPU)
        • top -> 1
      • Context switches
        • sar -w
      • System load
        • sar -q
      • Where CPU resources are being used (system/user/iowait/steal)
        • sar -u
    • RAM
      • RAM usage
        • free -m  -> available column, be sure to account for free-able memory
      • SWAP usage
        • free -m
      • RAM used by Gluster processes
        • ps aux | grep gluster -> be sure to look at RSS (resident/actual usage) not VSZ (virtual/shared memory usage)
    • Network
      • Send statistics
        • ifstat
      • Receive statistics
        • ifstat
      • Dropped packets
        • ifconfig <device>
      • Retransmits
        • netstat -s | grep -i retrans
    • Disk
      • LVM thinpool usage
        • lvdisplay -> Allocated pool data
      • LVM metadata usage
        • lvdisplay -> Allocated metadata
      • AWAIT
        • iostat -c -m -x
      • %Utilized (IOPs)
        • iostat -c -m -x
      • Used space of bricks
        • df -h
    • Processes
      • Look for hot threads
        • top -H -> look for threads pegging a CPU at 100% for an extended period (seconds?)
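
    Pulling the above together, here is a hedged sketch of a one-shot collection script. It leans on sysstat (sar, iostat) since those tools are listed above; BRICK_PATH is a placeholder for your own brick mount, and sar -n DEV stands in for ifstat on systems where the latter isn't installed:

        #!/bin/bash
        # node_metrics.sh - one snapshot of the system-level metrics discussed above.
        BRICK_PATH="/rhgs/bricks"   # placeholder: point at your brick mount(s)

        echo "== CPU, load, context switches =="
        sar -u 1 1      # %user/%system/%iowait/%steal
        sar -q 1 1      # run queue and load averages
        sar -w 1 1      # context switches per second

        echo "== Memory =="
        free -m         # watch the 'available' column
        ps aux | grep '[g]luster' | awk '{printf "%.0f MB RSS  %s\n", $6/1024, $11}'

        echo "== Network =="
        sar -n DEV 1 1  # per-interface receive/transmit rates

        echo "== Disk =="
        iostat -xm 1 2  # await and %util; the second sample reflects current load
        lvs -o lv_name,data_percent,metadata_percent   # thin pool data/metadata usage
        df -h "$BRICK_PATH"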

    The system-level commands I have used as examples are just normal, everyday Linux commands that most admins should know. It's up to you how often you want to monitor these data points; I would check them every few minutes if this is something you need to keep a tight eye on. If you are not resource-constrained, I would move out to tens of minutes or even more. Even though these commands are lightweight and well tested, they still take some system resources, and one of the keys to successful monitoring is ensuring that your monitoring doesn't affect your cluster's performance.
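
    If you want to schedule something like the snapshot script above, cron is the simplest vehicle. The path, log file, and 10-minute interval below are only illustrative:

        # /etc/cron.d/gluster-node-metrics (illustrative path and interval)
        # Collect the system-level snapshot every 10 minutes and append it to a log.
        */10 * * * * root /usr/local/bin/node_metrics.sh >> /var/log/gluster-node-metrics.log 2>&1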

    Next, we will look at cluster-level commands. These are mostly gluster commands, and I would like to reiterate again that they should only be executed from one node! Cluster-wide commands tend to be a bit more invasive on the system, so we should run them considerably less frequently than the system-level commands. The cluster-level commands I like to keep an eye on are:

    • Gluster Peers
      • gluster peer status
    • Gluster Volumes
      • gluster volume status
      • gluster volume rebalance <VOLNAME> status
    • Geo Replication
        • gluster volume geo-replication status -> adding "detail" gives some nice extra info
    • Bitrot
      • gluster volume bitrot <VOLNAME> scrub status
    • Snapshots
      • gluster snapshot status
    • Self Heal
      • gluster volume heal <VOLNAME> info
      • gluster volume heal <VOLNAME> info split-brain
    • Quota
      • gluster volume quota <VOLNAME> list

    Again, these commands should be run less often and only from one node at a time. Commands that are less invasive, such as peer status, volume status, and quota list, can be run more often, maybe every 30-120 minutes. The more invasive commands I would run less often, maybe 4-6 times per day. You can choose to run these more often; just remember that you don't want your monitoring commands adding additional load to your cluster.
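
    As a sketch of keeping these runs both infrequent and non-overlapping, the script below wraps a handful of the commands above with flock and is meant to be scheduled (for example, 4-6 times a day) on the single designated monitor node only. VOL is a placeholder for your volume name, and the quota and heal checks only make sense if those features are enabled:

        #!/bin/bash
        # cluster_checks.sh - cluster-level checks; run from ONE node only, a few times a day.
        VOL="myvol"   # placeholder volume name

        # Skip this run if the previous one is still going.
        exec 9>/var/run/gluster-monitor.lock
        flock -n 9 || { echo "previous check still running, skipping" >&2; exit 0; }

        gluster peer status
        gluster volume status
        gluster volume heal "$VOL" info
        gluster volume heal "$VOL" info split-brain
        gluster volume quota "$VOL" list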

    The last group is client-side commands. This can be a bit of a grey area: if you are using Gluster as a backing store for an application, you may want to monitor your application and your storage cluster separately. I am just listing out the things I would look at; you can implement these however you choose. Client-side commands are a mix of cluster-level and system-level, and how often they should be run depends on which group they fall under:

    • CPU
      • CPU usage (percent utilization for each CPU)
        • top -> 1
      • Context switches
        • sar -w
      • System load
        • sar -q
      • Where CPU resources are being used (system/user/iowait/steal)
        • sar -u
    • RAM
      • RAM usage
        • free -m  -> available column, be sure to account for free-able memory
      • SWAP usage
        • free -m
      • RAM used by Gluster processes
        • ps aux | grep gluster -> be sure to look at RSS (resident/actual usage), not VSZ (virtual/shared memory usage)
    • Network
      • Send statistics
        • ifstat
      • Receive statistics
        • ifstat
      • Dropped packets
        • ifconfig <device>
      • Retransmits
        • netstat -s | grep -i retrans
    • Gluster processes
      • ps aux -> look at RSS / memory usage
      • top -H -> again, look for hot threads
    • Available space
      • df -h
    • IOPs / throughput on Gluster mounts
      • This can be a tricky one; you could set up a script that runs a small read/write test to measure what kind of throughput you are getting on the volume (a sketch follows this list)
    • Number of clients in use
      • gluster v status <VOLNAME> clients
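
    For the read/write probe mentioned above, a minimal sketch is a pair of dd runs against the mount. MOUNT is a placeholder for your client-side mount point; the direct I/O flags keep the page cache out of the numbers but may need to be dropped depending on your mount options, and since this generates real load it should run sparingly:

        #!/bin/bash
        # mount_probe.sh - rough read/write throughput check on a Gluster client mount.
        MOUNT="/mnt/glustervol"            # placeholder mount point
        TESTFILE="$MOUNT/.monitor_probe_$$"

        # Write 64 MB, then read it back; dd prints the throughput summary on stderr.
        dd if=/dev/zero of="$TESTFILE" bs=1M count=64 oflag=direct 2>&1 | tail -1
        dd if="$TESTFILE" of=/dev/null bs=1M iflag=direct 2>&1 | tail -1

        rm -f "$TESTFILE"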

    Monitoring clients is less important in general NAS use cases, but if you are hyperconverged and/or use Gluster to back a mission-critical application, then client-side monitoring becomes more important. For the system-level resources, I would adhere to the same guidelines I detailed above. For the cluster-level commands, I would look at these a few times a day; if you want to run actual read/write tests, I would think once a day would suffice.

    I hope this blog provides enough detail about what to monitor and how often to monitor your cluster, system, and client resources. I'll leave you with a few key ideas I try to adhere to when monitoring my clusters:

    1. Don't let your monitoring in any way interfere with your cluster's performance.
    2. Run cluster-wide monitoring commands less often than system commands as they are much more expensive.
    3. Run your monitoring commands as few times as possible while still effectively keeping an eye on resources.
    4. Only run cluster commands (gluster commands here) from one node at a time!

    Thanks for reading!

    -b



    Last updated: November 15, 2017
