How to store large amounts of data in a program

July 5, 2019
Nick Clifton

    Most programs need data in order to work. Sometimes this data is provided to the program when it runs, and sometimes the data is built into the program. In this article, I'll explain how to store large amounts of data inside a program so that it is there when the program runs.

    The most obvious method of storing data is to include it in the program's source code. For example, in C:

    int a = 1;

    This approach works for small amounts of data, but it quickly becomes cumbersome as the amount of data grows. Additionally, storing data this way often requires a tool that converts the data into a form the programming language accepts.
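    As a rough illustration of such a conversion tool, the sketch below writes a block of bytes out as a C array definition. The function name emit_c_array and the output layout are hypothetical, chosen only for this example, and the sketch assumes the data is at least one byte long.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write DATA as a C array definition called NAME
   to OUT.  Returns 0 on success, -1 if any write failed.
   Assumes LEN is greater than zero.  */
int
emit_c_array (FILE *out, const char *name,
              const unsigned char *data, size_t len)
{
  size_t i;

  if (fprintf (out, "const unsigned char %s[%lu] = {",
               name, (unsigned long) len) < 0)
    return -1;

  for (i = 0; i < len; i++)
    {
      /* Start a new line every 12 bytes to keep the output readable.  */
      const char *sep = (i % 12 == 0) ? "\n  " : " ";

      if (fprintf (out, "%s0x%02x,", sep, data[i]) < 0)
        return -1;
    }

  if (fprintf (out, "\n};\n") < 0)
    return -1;

  return 0;
}
```

    Every byte of input becomes roughly six characters of source text, which is one reason this route scales poorly for large files.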

    The next choice would be to load the data at run-time. This works, but it has problems, too. For example, it presumes the existence of a filesystem that can be used to store the data file(s). It also means that the program is no longer a single entity but now has to be shipped with these data files. In addition, extra code must be written to handle situations where the files are missing or corrupt.
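    To give a feel for the extra code involved, here is a minimal sketch of a run-time loader. The name load_file is hypothetical. It only detects missing, unreadable, or truncated files; checking that the contents are not corrupt would need further, format-specific code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the whole of PATH into a malloc'd buffer.
   On success returns the buffer and stores its length in *SIZE_OUT;
   on any failure returns NULL.  The caller frees the buffer.  */
unsigned char *
load_file (const char *path, size_t *size_out)
{
  FILE *fp = fopen (path, "rb");
  unsigned char *buf = NULL;
  long size;

  if (fp == NULL)
    return NULL;                /* Missing or unreadable file.  */

  /* Find the file length by seeking to the end and back.  */
  if (fseek (fp, 0, SEEK_END) != 0
      || (size = ftell (fp)) < 0
      || fseek (fp, 0, SEEK_SET) != 0)
    goto fail;

  /* Allocate at least one byte so an empty file is not mistaken
     for an allocation failure.  */
  buf = malloc (size > 0 ? (size_t) size : 1);
  if (buf == NULL)
    goto fail;

  if (fread (buf, 1, (size_t) size, fp) != (size_t) size)
    goto fail;                  /* Short read: truncated or I/O error.  */

  fclose (fp);
  *size_out = (size_t) size;
  return buf;

 fail:
  free (buf);
  fclose (fp);
  return NULL;
}
```

    None of this is needed when the data is part of the executable itself, which is the approach the rest of the article takes.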

    This article therefore presents a method for including large data files in the body of an executable program. The article is written with ELF-based GNU/Linux systems in mind. Other operating systems may have other methods for solving this problem. In particular, it is worth noting that Windows supports the concept of "resources" [1] for programs, which provide read-only access to various types of embedded data.

    The INCBIN directive

    The method is to make use of an assembler source file, or even inline assembler, and the special assembler pseudo-op called .incbin [2]. This directive allows an arbitrary file to be included in the program at the specified location. For example:

    .incbin "foo.jpg"

    In practice, it is best to make sure that the data is located in the correct section and that it is aligned correctly. Additionally, symbols will probably be needed to provide access to the data from the high-level source code:

    .data
    .align 4
    .global start_of_foo
    start_of_foo:
    .incbin "foo.jpg"
    .global end_of_foo
    end_of_foo:

    This could then be accessed in a C source file like this:

    extern char start_of_foo;
    extern char end_of_foo;
    char * p;
    
    for (p = & start_of_foo; p < & end_of_foo; p++)
      ...

    Note that the use of the address operator (&) and the absence of pointer types (char *) in the declarations above are both correct. This is because of the difference between assembler-created symbols and compiler-generated symbols [3]. When an assembler creates a symbol, all it really does is provide a label that corresponds to a given address. When a compiler creates a symbol, by contrast, it creates a space in the program's data, installs a value into that space, and then uses the symbol as an indirect reference to that value.

    The C language does allow symbols to be treated as labels; however, they must be declared as unsized arrays instead:

    extern char start_of_foo[];
    extern char end_of_foo[];
    char * p;
    
    for (p = start_of_foo; p < end_of_foo; p++)
      ...
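    The following sketch shows the unsized-array form in a shape that can be built without an external file. It is an illustration only: a five-byte .ascii string stands in for the .incbin directive, the symbol names start_of_demo and end_of_demo and the function demo_size are hypothetical, and it assumes GCC or Clang targeting an ELF-based GNU/Linux system.

```c
#include <stddef.h>

/* Stand-in for ".incbin": five known bytes placed between two labels.
   .pushsection/.popsection keep the compiler's current section intact.  */
__asm__ ("\n\
    .pushsection .rodata\n\
    .global start_of_demo\n\
start_of_demo:\n\
    .ascii \"hello\"\n\
    .global end_of_demo\n\
end_of_demo:\n\
    .popsection\n");

/* Declared as unsized arrays, so each name evaluates to the label's
   address and no address operator is needed.  */
extern const char start_of_demo[];
extern const char end_of_demo[];

/* Number of bytes between the two labels.  */
size_t
demo_size (void)
{
  return (size_t) (end_of_demo - start_of_demo);
}
```

    With these definitions, demo_size () yields the length of the stand-in data, and start_of_demo[i] reads individual bytes in the same way the loop above walks the embedded file.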

    The assembler code puts the contents of foo.jpg into the program's data section, which means that it can be written to as well as read. If the data needs to be read-only, then it should be placed into the .rodata section instead, like this:

    .section .rodata
    [...]
    .incbin "foo.jpg"
    [...]

    In fact, it may be desirable to place the data into a section all of its own so that it can be easily located in the resulting executable. The .section directive allows new sections to be created so the following could be used:

    .section foo-image, "a", @progbits

    The "a" flag indicates that space should be allocated for the section in the run-time memory image of the program. By default this data is read-only, so if it needs to be writeable, add the w flag (i.e., "aw"). The @progbits type indicates that the section holds data stored in the file, as opposed to @nobits, which marks a section that occupies no space in the file.
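    As a small sketch of the effect of the w flag, the code below places three bytes in a section created with "aw" and then overwrites them at run-time. The section name demo_rw, the labels, and the function fill_demo_rw are hypothetical, an .ascii string again stands in for .incbin, and GCC or Clang on an ELF-based GNU/Linux system is assumed.

```c
#include <stddef.h>

/* Three bytes in a custom section that is allocated ("a") and
   writeable ("w").  */
__asm__ ("\n\
    .pushsection demo_rw, \"aw\", @progbits\n\
    .global start_of_rw_demo\n\
start_of_rw_demo:\n\
    .ascii \"abc\"\n\
    .global end_of_rw_demo\n\
end_of_rw_demo:\n\
    .popsection\n");

extern char start_of_rw_demo[];
extern char end_of_rw_demo[];

/* Overwrite every byte between the labels with FILL and return the
   number of bytes written.  */
size_t
fill_demo_rw (char fill)
{
  char *p;

  for (p = start_of_rw_demo; p < end_of_rw_demo; p++)
    *p = fill;

  return (size_t) (end_of_rw_demo - start_of_rw_demo);
}
```

    If the flags were just "a", the section would normally be mapped read-only and the stores in fill_demo_rw would be expected to fault at run-time.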

    Another thing to consider is that the .section directive changes the current section, which could cause problems if the assembler is inlined into a higher-level source file. In this case the .pushsection and .popsection pseudo-ops can be used to safely change the section, like this:

    __asm__("\n\
        .pushsection .foo-image, \"a\", @progbits\n\
        .align 4\n\
        .global start_of_foo\n\
    start_of_foo:\n\
        .incbin \"foo.jpg\"\n\
        .global end_of_foo\n\
    end_of_foo:\n\
        .popsection\n");
    

    Putting the data into a section of its own also has an additional benefit. As long as the section name is a valid C identifier (meaning foo_image is OK, but foo-image is not), the linker will automatically create beginning and end symbols for it, named __start_ and __stop_ followed by the section name. So, it's not necessary to declare them in the assembler code. Hence the following program will print out the size and contents of a file called foo.jpg, which is embedded into the executable:

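    #include <stdio.h>

    int
    main (void)
    {
      /* Created by the linker because foo_image is a valid C identifier.  */
      extern const char __start_foo_image[];
      extern const char __stop_foo_image[];
      const char * p;
    
      /* Embed foo.jpg in its own read-only section.  */
      __asm__("\n\
    .pushsection foo_image, \"a\", @progbits\n\
    .incbin \"foo.jpg\"\n\
    .popsection\n");
    
      /* Pointer subtraction yields a ptrdiff_t; cast it to match %#lx.  */
      printf ("image size: %#lx\n",
              (unsigned long) (__stop_foo_image - __start_foo_image));
    
      for (p = __start_foo_image; p < __stop_foo_image; p++)
        printf ("%d ", *p);
    
      printf ("\n");
      return 0;
    }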

    Modifying the in-program data

    One problem with storing data inside an executable is that it is then difficult to modify the data. Recompilation is always an option, but there is another option. The objcopy program allows the contents of sections in a program to be changed. Note, however, that it does not allow editing of individual bytes within a section, only the wholesale replacement of the contents of a section. Thus, this method only works if the data has been placed into a section of its own.

    The command [4] looks like this:

    objcopy --update-section sectionname=filename <file>

    So, given the examples above, this command:

    objcopy --update-section foo_image="bar.jpg" a.out

    will replace the foo.jpg image inside a.out with the bar.jpg image.

    This method does have a major flaw, however: the replacement does not change the symbols generated by the assembler or the linker, and the compiled code will still use the old values. So, if the new file is a different size from the old file, then the stop/end symbol will be incorrect. The start symbol will still be OK, because it marks the start of the foo_image section, an offset that is always zero. Thus, the moral to this story is that, unless the data is self-describing, do not replace it with anything other than an equal-sized block.
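    One way to make the data self-describing is to have it carry its own length. The sketch below assumes a hypothetical layout, not taken from any existing format, in which the first four bytes hold the payload length in little-endian order and the payload follows immediately. The names blob_payload_length and blob_payload are likewise invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: bytes 0-3 hold the payload length
   (little-endian), and the payload follows immediately.  */
#define BLOB_HEADER_SIZE 4

/* Decode the payload length from the header, one byte at a time so
   the result does not depend on host byte order or alignment.  */
uint32_t
blob_payload_length (const unsigned char *blob)
{
  return (uint32_t) blob[0]
         | ((uint32_t) blob[1] << 8)
         | ((uint32_t) blob[2] << 16)
         | ((uint32_t) blob[3] << 24);
}

/* Return a pointer to the first payload byte.  */
const unsigned char *
blob_payload (const unsigned char *blob)
{
  return blob + BLOB_HEADER_SIZE;
}
```

    A program using this scheme would pass the section's start symbol to these functions and take the end of the data from the header rather than from the stop symbol. Any replacement file would need to include the same header, and this sketch only addresses how the program finds the end of the data.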

    Conclusion

    It is possible to store large data sets inside a program, using a little bit of assembler hackery. Putting the data into its own section makes it easier to examine and, if necessary, alter. This approach does make the program bigger, of course, but depending upon the circumstances it may still be better than storing the data outside of the program.

    References

    [1] https://en.wikipedia.org/wiki/Resource_(Windows)
    [2] https://sourceware.org/binutils/docs-2.32/as/Incbin.html#Incbin
    [3] https://sourceware.org/binutils/docs-2.32/ld/Source-Code-Reference.html#Source-Code-Reference
    [4] https://sourceware.org/binutils/docs-2.32/binutils/objcopy.html#objcopy

    Last updated: July 3, 2019
