Skip to main content
Redhat Developers  Logo
  • AI

    Get started with AI

    • Red Hat AI
      Accelerate the development and deployment of enterprise AI solutions.
    • AI learning hub
      Explore learning materials and tools, organized by task.
    • AI interactive demos
      Click through scenarios with Red Hat AI, including training LLMs and more.
    • AI/ML learning paths
      Expand your OpenShift AI knowledge using these learning resources.
    • AI quickstarts
      Focused AI use cases designed for fast deployment on Red Hat AI platforms.
    • No-cost AI training
      Foundational Red Hat AI training.

    Featured resources

    • OpenShift AI learning
    • Open source AI for developers
    • AI product application development
    • Open source-powered AI/ML for hybrid cloud
    • AI and Node.js cheat sheet

    Red Hat AI Factory with NVIDIA

    • Red Hat AI Factory with NVIDIA is a co-engineered, enterprise-grade AI solution for building, deploying, and managing AI at scale across hybrid cloud environments.
    • Explore the solution
  • Learn

    Self-guided

    • Documentation
      Find answers, get step-by-step guidance, and learn how to use Red Hat products.
    • Learning paths
      Explore curated walkthroughs for common development tasks.
    • Guided learning
      Receive custom learning paths powered by our AI assistant.
    • See all learning

    Hands-on

    • Developer Sandbox
      Spin up Red Hat's products and technologies without setup or configuration.
    • Interactive labs
      Learn by doing in these hands-on, browser-based experiences.
    • Interactive demos
      Click through product features in these guided tours.

    Browse by topic

    • AI/ML
    • Automation
    • Java
    • Kubernetes
    • Linux
    • See all topics

    Training & certifications

    • Courses and exams
    • Certifications
    • Skills assessments
    • Red Hat Academy
    • Learning subscription
    • Explore training
  • Build

    Get started

    • Red Hat build of Podman Desktop
      A downloadable, local development hub to experiment with our products and builds.
    • Developer Sandbox
      Spin up Red Hat's products and technologies without setup or configuration.

    Download products

    • Access product downloads to start building and testing right away.
    • Red Hat Enterprise Linux
    • Red Hat AI
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Featured

    • Red Hat build of OpenJDK
    • Red Hat JBoss Enterprise Application Platform
    • Red Hat OpenShift Dev Spaces
    • Red Hat Developer Toolset

    References

    • E-books
    • Documentation
    • Cheat sheets
    • Architecture center
  • Community

    Get involved

    • Events
    • Live AI events
    • Red Hat Summit
    • Red Hat Accelerators
    • Community discussions

    Follow along

    • Articles & blogs
    • Developer newsletter
    • Videos
    • Github

    Get help

    • Customer service
    • Customer support
    • Regional contacts
    • Find a partner

    Join the Red Hat Developer program

    • Download Red Hat products and project builds, access support documentation, learning content, and more.
    • Explore the benefits

Designing efficient file operations at cloud scale

October 5, 2021
Avishay Traeger
Related topics:
LinuxMicroservices
Related products:
Red Hat Enterprise Linux

    Accessing and operating on data is one of the most time-consuming aspects of computing. Developers can improve efficiency by looking for ways to avoid the overhead required by standard file operations. To illustrate the possibilities, I will report on a couple of interesting cases where I designed cloud-scale services that dynamically construct files for users to consume.

    The first application was an incremental backup and restore application, and the second was part of a new OpenShift installation service that creates personalized ISO files of Red Hat Enterprise Linux CoreOS (RHEL CoreOS). Both applications went through similar iterations, starting with naive implementations and gradually improving their efficiency. I will focus on the ISO design first and briefly discuss the backup and restore application at the end.

    First optimization: Amazon S3 server-side copy

    In the naive ISO implementation, we started with a copy of the RHEL CoreOS ISO in an Amazon Simple Storage Service (S3) bucket. When a user requested a customized ISO via a "generate" REST API, the back-end service fetched the base ISO from S3, executed logic to insert our customizations into the ISO (Ignition data), and uploaded the resulting ISO back to S3. The user could then download the ISO directly from S3.

    This naive implementation required the back-end service to download roughly 900MB to the file system, read it, perform a few modifications, write the new ISO to the file system, and upload the new 900MB ISO. The whole process took roughly 30 seconds and incurred significant Amazon Web Service (AWS) costs—both from data transfers and from storing the 900MB per ISO. However, an important benefit was that users downloaded their files directly from S3, so our service did not need to incur the overhead of the download traffic.

    The first milestone in optimizing this design involved S3’s UploadPartCopy API. Amazon S3 recommends dividing the uploads of large files into parts for parallelized uploads and resilience to network issues. The UploadPartCopy API tells S3 that it should retrieve a part’s data from an offset in an existing object instead of expecting a data upload for the specific part. For this optimization, we took advantage of the intelligent way Ignition data is stored in the RHEL CoreOS ISO, as depicted in Figure 1.

    An ISO is generally created by taking a directory tree with the necessary files and packing it up with a given tool. The RHEL CoreOS generation process adds an empty file where the Ignition data will be stored into that directory tree, packs the ISO, finds the empty file’s offset in the ISO, and writes the offset and size into a well-known offset in the ISO’s header. Our optimization instructed S3 to create the new ISO object by performing server-side copies of the data preceding and following the Ignition area, and filling the Ignition area with our uploaded Ignition configuration, as shown in Figure 2.

    Embedding Ignition data in a RHEL CoreOS ISO using Amazon S3 server-side copies.
    Figure 2: Embedding Ignition data in a RHEL CoreOS ISO using Amazon S3 server-side copies.

    Amazon S3 requires each file part to be between 5MB and 5GB in size, which slightly complicated our implementation, but it was straightforward otherwise. The generation time went down from 30 seconds to around 12, and we also saved money on transfer costs. However, we were still paying for storing all of the custom ISOs.

    Second optimization: Download stream injection

    The second optimization involved a tradeoff: We would serve files from our installation service instead of from S3. This allowed an optimized experience and eliminated almost all S3 costs, but incurred the download traffic costs.

    We again started with the base ISO in S3 but cached it locally. When a user asked to download their ISO, we would begin serving data from the cached base ISO. But as soon as the download stream reached the Ignition area, we began serving our Ignition data. After we injected the Ignition data into the download stream, we continued serving the data from the base ISO.

    The benefits were that our service no longer incurred S3 transfer costs or storage costs. This optimization also improved our user experience (UX); rather than forcing the user to generate an ISO, wait 12 or 30 seconds, and then download the file, the user could initiate the download immediately. The drawback was that our service needed to handle the user download traffic. To address this, we split the ISO service into a separate microservice that we could independently scale.

    Backup and restore application

    The backup and restore application that I designed was similar. We implemented incremental disk backups, meaning that the oldest backup contained a full disk image at a certain point in time, and each additional backup contained only the chunks of data that changed from the previous backup. We implemented each backup as one S3 object containing the data and another object containing a bitmap describing the chunks the data object contained.

    The three primary operations in this application were backup, restore, and compaction. We continuously tracked the changes to the disk locally using a bitmap. We took a consistent snapshot of both the data and the bitmap and uploaded them to S3 in the previously described format to create a new backup.

    To restore a disk backup, we downloaded all relevant bitmaps to understand which chunks should be read from which backups. We then created a download stream that read only the relevant data from the relevant incremental backup objects.

    Compaction is an important operation in incremental backups. If left unchecked, the incremental backup chains would grow indefinitely. The growth would incur high storage costs, long recovery times, and, worst of all, an increased risk that a restore operation would fail due to a corrupt link in the chain. Therefore, compaction merges the oldest (full) backup with the next-oldest (incremental) backup. We performed this merge using the S3 UploadCopyPart API that I described earlier, creating a new full backup from the two original backups.

    Conclusion

    These examples illustrate the impact of file formats on subsequent operations involving the files. If you are dealing with existing file formats, get to know them and consider how to best work with them. If you have the challenge and luxury of defining a new file format, efficient operations should be among your top considerations.

    In either case, it is important to become familiar with your storage platform's APIs and behaviors. What operations can you offload to it? Does it perform better with many small files or a few large ones? Is there a significant difference between random and sequential access? File operations tend to be slow compared to other application tasks, but there are also many opportunities for optimization and innovation.

    Last updated: October 30, 2023

    Related Posts

    • How to run containerized workloads securely and at scale with Fedora CoreOS

    • How to customize Fedora CoreOS for dedicated workloads with OSTree

    • Working with Red Hat Enterprise Linux Universal Base Images (UBI)

    Recent Posts

    • Every layer counts: Defense in depth for AI agents with Red Hat AI

    • Fun in the RUN instruction: Why container builds with distroless images can surprise you

    • Trusted software factory: Building trust in the agentic AI era

    • Build a zero trust AI pipeline with OpenShift and RHEL CVMs

    • Red Hat Hardened Images: Top 5 benefits for software developers

    Red Hat Developers logo LinkedIn YouTube Twitter Facebook

    Platforms

    • Red Hat AI
    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Build

    • Developer Sandbox
    • Developer tools
    • Interactive tutorials
    • API catalog

    Quicklinks

    • Learning resources
    • E-books
    • Cheat sheets
    • Blog
    • Events
    • Newsletter

    Communicate

    • About us
    • Contact sales
    • Find a partner
    • Report a website issue
    • Site status dashboard
    • Report a security problem

    RED HAT DEVELOPER

    Build here. Go anywhere.

    We serve the builders. The problem solvers who create careers with code.

    Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

    Sign me up

    Red Hat legal and privacy links

    • About Red Hat
    • Jobs
    • Events
    • Locations
    • Contact Red Hat
    • Red Hat Blog
    • Inclusion at Red Hat
    • Cool Stuff Store
    • Red Hat Summit
    © 2026 Red Hat

    Red Hat legal and privacy links

    • Privacy statement
    • Terms of use
    • All policies and guidelines
    • Digital accessibility

    Chat Support

    Please log in with your Red Hat account to access chat support.