How to store large amounts of data in a program

July 5, 2019
Nick Clifton

    Most programs need data in order to work. Sometimes this data is provided to the program when it runs, and sometimes the data is built into the program. In this article, I'll explain how to store large amounts of data inside a program so that it is there when the program runs.

    The most obvious method of storing data is to include it in the program's source code. For example, in C:

    int a = 1;

    This approach works for small amounts of data, but it quickly becomes cumbersome as the amount of data to be stored increases. Additionally, storing data this way often requires a tool that converts the data into a form that the programming language accepts.
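
    As a rough illustration of such a conversion tool, the sketch below writes a block of bytes out as a C array definition, similar in spirit to what xxd -i produces. The function name emit_c_array and the output layout are assumptions made for this example, not part of any existing tool.

```c
#include <stdio.h>
#include <stddef.h>

/* Write LEN bytes from DATA to OUT as a C array definition called NAME.
   emit_c_array is a hypothetical helper invented for this sketch.
   Returns 0 on success, -1 on a write error or if LEN is zero
   (a zero-length array is not valid ISO C).  */
int
emit_c_array (FILE *out, const char *name,
              const unsigned char *data, size_t len)
{
  size_t i;

  if (len == 0)
    return -1;

  if (fprintf (out, "const unsigned char %s[%zu] = {", name, len) < 0)
    return -1;

  for (i = 0; i < len; i++)
    {
      /* Start a new line every 12 bytes to keep the output readable.  */
      const char *sep = (i % 12 == 0) ? "\n  " : " ";

      if (fprintf (out, "%s0x%02x,", sep, data[i]) < 0)
        return -1;
    }

  if (fprintf (out, "\n};\n") < 0)
    return -1;

  return 0;
}
```

    The generated text can then be compiled like any other source file, which is exactly the step that becomes unwieldy once the data runs to megabytes.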

    The next choice would be to load the data at run-time. This works, but it has problems, too. For example, it presumes the existence of a filesystem that can be used to store the data file(s). It also means that the program is no longer a single entity but now has to be shipped with these data files. Finally, extra code needs to be written to handle situations where the files are missing or corrupt.
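
    To give a feel for the extra code involved, here is a minimal sketch of a run-time loader. The name load_file is hypothetical, and the error handling is deliberately simple: any failure, including an empty file or a short read, is reported as NULL.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read the whole of PATH into a freshly allocated buffer.
   On success, return the buffer (the caller frees it) and store its
   size in *LEN.  Return NULL if the file is missing, unreadable, empty,
   or cannot be read in full.  load_file is a made-up name.  */
unsigned char *
load_file (const char *path, size_t *len)
{
  FILE *f = fopen (path, "rb");
  unsigned char *buf = NULL;
  long size;

  if (f == NULL)
    return NULL;                 /* Missing file or no permission.  */

  if (fseek (f, 0, SEEK_END) == 0
      && (size = ftell (f)) > 0
      && fseek (f, 0, SEEK_SET) == 0
      && (buf = malloc ((size_t) size)) != NULL)
    {
      if (fread (buf, 1, (size_t) size, f) == (size_t) size)
        *len = (size_t) size;
      else
        {
          free (buf);            /* Short read: treat as corrupt.  */
          buf = NULL;
        }
    }

  fclose (f);
  return buf;
}
```

    Every caller has to check for NULL and decide what the program should do without its data, which is the burden that embedding the data avoids.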

    So, this article presents a method for including large data files in the body of an executable program. The article is written with ELF-based GNU/Linux systems in mind. Other operating systems may have other methods for solving this problem. In particular, it is worth noting that Windows supports the concept of "resources" [1] for programs, which provide read-only access to various types of embedded data.

    The INCBIN directive

    The method is to make use of an assembler source file, or even inline assembler, and the special assembler pseudo-op called .incbin [2]. This directive allows an arbitrary file to be included in the program at the specified location. For example:

    .incbin "foo.jpg"

    In practice, it is best to make sure that the data is located in the correct section and that it is aligned correctly. Additionally, symbols will probably be needed to provide access to the data from the high-level source code:

    .data
    .align 4
    .global start_of_foo
    start_of_foo:
    .incbin "foo.jpg"
    .global end_of_foo
    end_of_foo:

    This could then be accessed in a C source file like this:

    extern char start_of_foo;
    extern char end_of_foo;
    char * p;
    
    for (p = & start_of_foo; p < & end_of_foo; p++)
      ...

    Note that the use of the address operator (&) and the absence of pointer types (char *) in the declarations above are correct. This is because of the difference between assembler-created symbols and compiler-generated symbols [3]. When an assembler creates a symbol, all it really does is provide a label that corresponds to a given address. When a compiler creates a symbol, by contrast, it creates a space in the program's data, installs a value into that space, and then uses the symbol as an indirect reference to that value.

    The C language does allow symbols to be treated as labels; however, they must be declared as unsized arrays instead:

    extern char start_of_foo[];
    extern char end_of_foo[];
    char * p;
    
    for (p = start_of_foo; p < end_of_foo; p++)
      ...

    The assembler code puts the contents of foo.jpg into the program's data section, which means that it can be written to as well as read. If the data needs to be read-only, then it should be placed into the .rodata section instead, like this:

    .section .rodata
    [...]
    .incbin "foo.jpg"
    [...]

    In fact, it may be desirable to place the data into a section all of its own so that it can be easily located in the resulting executable. The .section directive allows new sections to be created so the following could be used:

    .section foo-image, "a", @progbits

    The "a" flag indicates that space should be allocated for the section in the run-time memory image of the program. By default this data is read-only, so if it needs to be writable, add the w flag (i.e., "aw"). The @progbits type indicates that the section contains data stored in the file, as opposed to @nobits, which marks a section that only occupies space at run time.

    Another thing to consider is that the .section directive changes the current section, which could cause problems if the assembler is inlined into a higher-level source file. In this case the .pushsection and .popsection pseudo-ops can be used to change the section safely, like this:

    __asm__("\n\
        .pushsection .foo-image, \"a\", @progbits\n\
        .align 4\n\
        .global start_of_foo\n\
    start_of_foo:\n\
        .incbin \"foo.jpg\"\n\
        .global end_of_foo\n\
    end_of_foo:\n\
        .popsection\n");
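
    The same pattern can be tried without an image file to hand. In the sketch below, .ascii stands in for .incbin so that the example does not depend on an external file; the section name demo_data, the two symbols, and the helper functions are all made up for illustration. It assumes GCC or Clang on an x86-64 ELF system (on some targets, such as ARM, @progbits has to be written %progbits).

```c
#include <stddef.h>

/* Top-level inline assembler.  .pushsection/.popsection leave the
   compiler's idea of the current section undisturbed.  .ascii is used
   here in place of .incbin, purely to avoid needing a data file.  */
__asm__ ("\n\
    .pushsection demo_data, \"a\", @progbits\n\
    .global start_of_demo\n\
start_of_demo:\n\
    .ascii \"hello\"\n\
    .global end_of_demo\n\
end_of_demo:\n\
    .popsection\n");

/* Declared as unsized arrays, so each name is treated as a label.  */
extern const char start_of_demo[];
extern const char end_of_demo[];

/* Number of embedded bytes: the distance between the two labels.  */
size_t
demo_size (void)
{
  return (size_t) (end_of_demo - start_of_demo);
}

/* Address of the first embedded byte.  */
const char *
demo_bytes (void)
{
  return start_of_demo;
}
```

    Because the section is marked "a" without "w", the bytes are read-only, which is why the declarations use const.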
    

    Putting the data into a section of its own also has an additional benefit. As long as the section name is a valid C identifier (meaning foo_image is OK, but foo-image is not), the linker will automatically create beginning and end symbols for it, named __start_ and __stop_ followed by the section name. So, it's not necessary to declare them in the assembler code. Hence the following program will print out the size and contents of a file called foo.jpg, with foo.jpg being embedded into the executable:

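    #include <stdio.h>

    int
    main (void)
    {
      /* Created by the linker because foo_image is a valid C identifier.  */
      extern const char __start_foo_image[];
      extern const char __stop_foo_image[];
      const char * p;
    
      /* Embed foo.jpg in a read-only section of its own.  */
      __asm__("\n\
    .pushsection foo_image, \"a\", @progbits\n\
    .incbin \"foo.jpg\"\n\
    .popsection\n");
    
      /* The pointer difference is cast to match the %lx conversion.  */
      printf ("image size: %#lx\n",
              (unsigned long) (__stop_foo_image - __start_foo_image));
    
      for (p = __start_foo_image; p < __stop_foo_image; p++)
        printf ("%d ", *p);
    
      printf ("\n");
      return 0;
    }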

    Modifying the in-program data

    One problem with storing data inside an executable is that it is then difficult to modify the data. Recompilation is always an option, but there is another option. The objcopy program allows the contents of sections in a program to be changed. Note, however, that it does not allow editing of individual bytes within a section, only the wholesale replacement of the contents of a section. Thus, this method only works if the data has been placed into a section of its own.

    The command [4] looks like this:

    objcopy --update-section sectionname=filename <file>

    So, given the examples above this command:

    objcopy --update-section foo_image="bar.jpg" a.out

    will replace the foo.jpg image inside a.out with the bar.jpg image.

    This method does have a major flaw, however: the replacement does not change the symbols generated by the assembler or the linker, and the compiled code will still use the old values. So, if the new file differs in size from the old one, the stop/end symbol will be incorrect. The start symbol will still be OK, because its value is relative to the start of the foo_image section, and that offset is always zero. Thus, the moral of this story is that, unless the data is self-describing, do not replace it with anything other than an equal-sized block.
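
    One possible way to make the data self-describing is to store its length at the front of the section, so that the program reads the size from the data itself rather than from the stop symbol. The sketch below assumes a hypothetical layout (a 4-byte little-endian length followed by the payload) and assumes that replacements are never larger than the section was when the program was linked, so that the original size can serve as an upper bound. The name blob_payload is invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 4-byte little-endian payload length, followed
   by the payload itself.  SECTION points at the start of the section
   and CAPACITY is an upper bound the caller trusts, for example the
   section size computed from the original start and stop symbols.
   On success, store the payload address and length and return 0.
   Return -1 if the header is missing or claims more than CAPACITY.  */
int
blob_payload (const unsigned char *section, size_t capacity,
              const unsigned char **payload, size_t *len)
{
  uint32_t n;

  if (capacity < 4)
    return -1;                   /* No room for the length header.  */

  n = (uint32_t) section[0]
      | ((uint32_t) section[1] << 8)
      | ((uint32_t) section[2] << 16)
      | ((uint32_t) section[3] << 24);

  if (n > capacity - 4)
    return -1;                   /* Header is inconsistent.  */

  *payload = section + 4;
  *len = n;
  return 0;
}
```

    With a layout like this, the file handed to objcopy has to carry the same header, so the tool that prepares replacement data must write the length first.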

    Conclusion

    It is possible to store large data sets inside a program, using a little bit of assembler hackery. Putting the data into its own section makes it easier to examine and, if necessary, alter. This approach does make the program bigger, of course, but depending upon the circumstances it may still be better than storing the data outside of the program.

    References

    [1] https://en.wikipedia.org/wiki/Resource_(Windows)
    [2] https://sourceware.org/binutils/docs-2.32/as/Incbin.html#Incbin
    [3] https://sourceware.org/binutils/docs-2.32/ld/Source-Code-Reference.html#Source-Code-Reference
    [4] https://sourceware.org/binutils/docs-2.32/binutils/objcopy.html#objcopy

    Last updated: July 3, 2019
