How to maximize data storage for microservices and Kubernetes, Part 1: An introduction

August 11, 2021
Don Schenck
Related topics: Automation and management, Containers, Kubernetes, Operators
Related products: Red Hat OpenShift Container Platform

    Microservices is a hot topic. Web pages, architectural dissertations, conference talks... The amount of information and number of opinions is staggering, and it can be overwhelming. If ever there was a hot topic in IT, microservices are "it" right now. Create super-small services. Use Functions as a service (FaaS). Embrace serverless. Spread the workload, loosely coupled and written in any mix of software development languages. Go forth and be micro!

    But have you noticed that very few are talking about the data part of all this? "Distribute the data, one database per microservice" is the standard, one-size-fits-all (which, at least in clothing, means "one size fits no one") answer.

    Yeah. Sure. That's easy for you to say.

    In this series of articles, we'll take a look at data storage options as they relate to microservices, Kubernetes, and—specifically—Red Hat OpenShift. We'll start with a broad overview (the buzzword is "30,000-foot view," sigh), then get more detailed in upcoming articles.

    Let's begin by defining a few terms. This will give us an understanding from which to start. Note that when Kubernetes is mentioned, you can assume the information is also applicable to OpenShift.

    Ephemeral versus persistent storage

    If you create a pod in Kubernetes, each container in it can use its own root file system for storage. You can write to and read from files. If you're running a relational database (RDBMS) or a NoSQL system in a pod (for example, MariaDB or Couchbase), you can use this file system to store your data. If you are running multiple containers within the same pod, each container gets its own file system, but the containers can share data through a volume that belongs to the pod. This is perfectly acceptable and works fine.

    With some caveats. For starters, you can't share data between pods because the file system is local to just that pod. If you wish to share data with one or more other pods, you'll need to use a volume to store your data, a Kubernetes concept we'll learn more about later. For now, just know that a volume is a directory somewhere of some type. Where? What type? We'll get to that.

    Back to the subject of a pod's root file system. There's this small thing: When the pod is destroyed, so is the data. It goes away. In the context of Kubernetes, it's ephemeral.
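
    As a minimal sketch of ephemeral storage, the following pod runs two containers that share a scratch directory through an emptyDir volume. The pod name, container names, image, and paths are hypothetical and chosen only for illustration. Deleting the pod deletes the emptyDir and everything in it.

```yaml
# Sketch only: pod name, container names, image, and paths are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo
spec:
  containers:
  - name: writer
    image: busybox
    # Appends a timestamp to the shared file every 5 seconds.
    command: ["sh", "-c", "while true; do date >> /scratch/log.txt; sleep 5; done"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  - name: reader
    image: busybox
    # Follows the file that the other container in this pod writes.
    command: ["sh", "-c", "touch /scratch/log.txt; tail -f /scratch/log.txt"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    # An emptyDir lives exactly as long as the pod does.
    emptyDir: {}
```

    Both containers see the same files under /scratch, but a second pod created from the same manifest would start with an empty directory of its own.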

    That may be desirable, especially when you're first developing a solution and want to wipe things out and start all over every time. As a developer, there are lots of times when this is a great feature. I can mess up as much as I want while knowing that when I restart, I get another try. If I'm using a script to populate a database, I can also fine-tune the script as I'm developing my application. The storage is ephemeral, and I have complete freedom. Eventually, however, you reach a point where you want the data to stick around for a while. You want it to be persistent.

    Persistent volumes and persistent volume claims

    When it comes to persisting data in Kubernetes, the most common solution is to have your application use a persistent volume claim (PVC). A PVC, in turn, is bound to a persistent volume (PV). Because the claim exists independently of any single pod, the cluster can remove and add pods at will while the application continues to use the same PVC. As a pod comes up, it "sees" the PVC as its storage.

    Think of a PV as a piece of storage that the cluster makes available, while a PVC is your application's request for storage of a given size. Kubernetes binds each claim to one PV, which is then reserved for that claim. You might conceptualize a PVC as a hard drive on a server if that helps make it easier to understand. In fact, it might be cloud-based storage, but from the developer's point of view, it's local storage that happens to be persistent. Your pod can be wiped out and replaced, and the data stays intact.

    Here's how it works: A volume is configured for the cluster. It might reside in AWS as an Elastic Block Store (EBS) volume, or on Azure as an Azure Disk, or it might use CephFS or any one of many choices. This is the underlying storage system; the PV API hides the details of each from whoever consumes the storage.

    Using a volume, the cluster administrator creates a persistent volume (or several). Then, the developer or architect or operator—depending on how your organization manages this—specifies a PVC, including the size needed. Kubernetes will take care of making sure an eligible PV is found for the PVC. For example, if you specify a 100GB PVC, you must have an available PV of at least 100GB; Kubernetes will not map a 100GB PVC to, say, a 50GB PV.
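
    The administrator's side of this can be sketched as follows, assuming a single-node test cluster where a hostPath directory is acceptable. The object names, the /mnt/data path, and the "manual" storage class name are hypothetical. The matching storageClassName on both objects is what keeps the claim from being handed to a default dynamic provisioner instead.

```yaml
# Sketch only: names, path, and the "manual" class are hypothetical.
# hostPath is suitable for single-node testing, not for production.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  storageClassName: manual
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data
---
# The claim asks for 100Gi; a PV smaller than that is not eligible.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```

    On many clusters a StorageClass provisions PVs on demand, in which case the administrator does not create PVs by hand and only the claim is needed.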

    Consider the following two objects: a PVC and a Deployment. Notice how the PVC carves out the storage space and how the Deployment binds to it. Notice, also, that the PVC is seeking to use the storage as file storage, as opposed to block storage. Ignore the accessModes setting for now; that's another article.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: mysqlvolume
    spec:
      resources:
        requests:
          storage: 5Gi
      volumeMode: Filesystem
      accessModes:
        - ReadWriteOnce
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: mysql
    spec:
      selector:
        matchLabels:
          app: mysql
          tier: database
      template:
        metadata:
          labels:
            app: mysql
            tier: database
        spec:
          containers:
          - name: mariadb
            env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysqlpassword
                  key: password
            image: mariadb
            resources:
              limits:
                memory: "128Mi"
                cpu: "500m"
            ports:
            - containerPort: 3306
            volumeMounts:
            - name: mysqlvolume
              mountPath: /var/lib/mysql
          volumes:
          - name: mysqlvolume
            persistentVolumeClaim:
              claimName: mysqlvolume

    In this example, a 5 GiB PVC is created to store data for the MariaDB container in our mysql Deployment. The database, running in Kubernetes, can be updated as often as we wish while the data remains intact.

    Reading through the YAML for the PVC and the deployment, you can see where volume names are specified, file paths are set, etc. Again, the underlying mechanism (e.g., EBS or CephFS) is completely transparent at this point.
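
    The Deployment reads MYSQL_ROOT_PASSWORD from a Secret named mysqlpassword with a key named password, and that Secret must exist before the pod can start. The article does not show it, so the following is only a sketch with a placeholder value that you would replace.

```yaml
# Sketch only: the password value is a placeholder, not a recommendation.
apiVersion: v1
kind: Secret
metadata:
  name: mysqlpassword
type: Opaque
stringData:
  # stringData accepts plain text; Kubernetes stores it base64-encoded.
  password: change-me
```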

    This setup makes backup and restore easy as well. (We'll get to that in a later article.)

    Persistent storage use cases

    What can you do with this persistent, shared storage? Anything you would with a directory on the server. You can use the file system to store files. Your RDBMS and NoSQL systems will use it to store your databases. You can store objects such as videos and images.

    Simply put, this is where a developer typically lives. You write to and read from files and databases.

    External and API-based storage

    You can, if you so choose, still use API-based storage from within your application (storage that is external to the cluster). You might decide to use the OpenStack Swift API to store objects, or your C# application might have a connection string to an Azure SQL database instance.

    This option might be the best when migrating existing services into Red Hat OpenShift with minimal change.
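
    One way to wire up an external database, sketched here under assumptions, is to keep the connection string in a Secret and inject it as an environment variable. Every name below (the Secret, the Deployment, the image, the variable DB_CONNECTION_STRING, and the server address) is hypothetical.

```yaml
# Sketch only: all names, the image, and the connection string are hypothetical.
apiVersion: v1
kind: Secret
metadata:
  name: external-db
type: Opaque
stringData:
  connection-string: "Server=tcp:example.database.windows.net,1433;Database=exampledb;User ID=exampleuser;Password=change-me;"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders
spec:
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      labels:
        app: orders
    spec:
      containers:
      - name: orders
        image: example.com/orders:latest
        env:
        # The application reads this variable; no volume or PVC is involved.
        - name: DB_CONNECTION_STRING
          valueFrom:
            secretKeyRef:
              name: external-db
              key: connection-string
```

    Because the data lives outside the cluster, nothing in this Deployment needs a PVC; the pods stay stateless from the cluster's point of view.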

    Red Hat OpenShift Data Foundation

    Finally (for this article), Red Hat OpenShift Data Foundation (previously Red Hat OpenShift Container Storage) is simple to install using the Red Hat OpenShift Container Storage operator. OpenShift Data Foundation introduces a way to use Ceph for file system storage, adding data resilience, snapshots, backups, and much more. This technology will be covered in a separate article. Figure 1 offers just a small taste of what the operator brings.

    Figure 1: The OpenShift Container Storage operator's core capabilities (screenshot of the operator installation screen in the OpenShift dashboard).

    Coming up next

    In the next article in this series, we'll introduce the concepts of ReadWriteOnce (RWO), ReadWriteMany (RWX), and Object Bucket Claims (OBC), and explore the advantages of OpenShift Data Foundation. In the meantime, check out the OpenShift Data Foundation website to learn more. You can also experiment with Kubernetes and OpenShift in the free Developer Sandbox for Red Hat OpenShift.

    Last updated: October 31, 2023
