Skip to main content
Redhat Developers  Logo
  • Products

    Platforms

    • Red Hat Enterprise Linux
      Red Hat Enterprise Linux Icon
    • Red Hat AI
      Red Hat AI
    • Red Hat OpenShift
      Openshift icon
    • Red Hat Ansible Automation Platform
      Ansible icon
    • View All Red Hat Products

    Featured

    • Red Hat build of OpenJDK
    • Red Hat Developer Hub
    • Red Hat JBoss Enterprise Application Platform
    • Red Hat OpenShift Dev Spaces
    • Red Hat OpenShift Local
    • Red Hat Developer Sandbox

      Try Red Hat products and technologies without setup or configuration fees for 30 days with this shared Openshift and Kubernetes cluster.
    • Try at no cost
  • Technologies

    Featured

    • AI/ML
      AI/ML Icon
    • Linux
      Linux Icon
    • Kubernetes
      Cloud icon
    • Automation
      Automation Icon showing arrows moving in a circle around a gear
    • View All Technologies
    • Programming Languages & Frameworks

      • Java
      • Python
      • JavaScript
    • System Design & Architecture

      • Red Hat architecture and design patterns
      • Microservices
      • Event-Driven Architecture
      • Databases
    • Developer Productivity

      • Developer productivity
      • Developer Tools
      • GitOps
    • Automated Data Processing

      • AI/ML
      • Data Science
      • Apache Kafka on Kubernetes
    • Platform Engineering

      • DevOps
      • DevSecOps
      • Ansible automation for applications and services
    • Secure Development & Architectures

      • Security
      • Secure coding
  • Learn

    Featured

    • Kubernetes & Cloud Native
      Openshift icon
    • Linux
      Rhel icon
    • Automation
      Ansible cloud icon
    • AI/ML
      AI/ML Icon
    • View All Learning Resources

    E-Books

    • GitOps Cookbook
    • Podman in Action
    • Kubernetes Operators
    • The Path to GitOps
    • View All E-books

    Cheat Sheets

    • Linux Commands
    • Bash Commands
    • Git
    • systemd Commands
    • View All Cheat Sheets

    Documentation

    • Product Documentation
    • API Catalog
    • Legacy Documentation
  • Developer Sandbox

    Developer Sandbox

    • Access Red Hat’s products and technologies without setup or configuration, and start developing quicker than ever before with our new, no-cost sandbox environments.
    • Explore Developer Sandbox

    Featured Developer Sandbox activities

    • Get started with your Developer Sandbox
    • OpenShift virtualization and application modernization using the Developer Sandbox
    • Explore all Developer Sandbox activities

    Ready to start developing apps?

    • Try at no cost
  • Blog
  • Events
  • Videos

Efficient string copying and concatenation in C

August 12, 2019
Martin Sebor
Related topics:
C, C#, C++
Related products:
Red Hat OpenShift

Share:

    Among the most heavily used string handling functions declared in the standard C <string.h> header are those that copy and concatenate strings. Both sets of functions copy characters from one object to another, and both return their first argument: a pointer to the beginning of the destination object. The choice of the return value is a source of inefficiency that is the subject of this article.

    The code examples shown in this article are for illustration only. They should not be viewed as recommended practice and may contain subtle bugs.

    Standard solution

    The design of returning the functions' first argument is sometimes questioned by users wondering about its purpose–see for example strcpy() return value, or C: Why does strcpy return its argument? The simple answer is that it's due to a historical accident. The first subset of the functions was introduced in the Seventh Edition of UNIX in 1979 and consisted of strcat, strncat, strcpy, and strncpy. Even though all four functions were used in the implementation of UNIX, some extensively, none of their calls made use of their return value. The functions could have just as easily, and as it turns out, far more usefully, been defined to return a pointer to the last copied character, or just past it.

    The optimal complexity of concatenating two or more strings is linear in the number of characters. But, as mentioned above, having the functions return the destination pointer leads to the operation being significantly less than optimally efficient. The functions traverse the source and destination sequences and obtain the pointers to the end of both. The pointers point either at or just past the terminating NUL ('\0') character that the functions (with the exception of strncpy) append to the destination. However, by returning a pointer to the first character rather than the last (or one just past it), the position of the NUL character is lost and must be computed again when it's needed. This inefficiency can be illustrated on an example concatenating two strings, s1 and s2, into the destination buffer d. The idiomatic (though far from ideal) way to append two strings is by calling the strcpy and strcat functions as follows.

    	strcat (strcpy (d, s1), s2);

    To perform the concatenation, one pass over s1 and one pass over s2 is all that is necessary in addition to the corresponding pass over d that happens at the same time, but the call above makes two passes over s1. Let's break up the calls into two statements.

            char *d1 = strcpy (d, s1); // pass 1 over s1
            strcat (d1, s2); // pass 2 over the copy of s1 in d

    Because strcpy returns the value of its first argument, d, the value of d1 is the same as d. For simplicity, the examples that follow use d instead of storing the return value in d1 and using it. In the strcat call, determining the position of the last character involves traversing the characters just copied to d1. The cost of doing this is linear in the length of the first string, s1. The cost is multiplied with each appended string, and so tends toward quadratic in the number of concatenations times the lengths of all the concatenated strings. This inefficiency is so infamous to have earned itself a name: Schlemiel the Painter's algorithm. (See also 1.)

    It's important to point out that in addition to being inefficient, strcat and strcpy are notorious for their propensity for buffer overflow because neither provides a bound on the number of copied characters.

    Attempts to overcome limitations

    When the lengths of the strings are unknown and the destination size is fixed, following some popular secure coding guidelines to constrain the result of the concatenation to the destination size would actually lead to two redundant passes. For example, following the CERT advisory on the safe uses of strncpy() and strncat() and with the size of the destination being dsize bytes, we might end up with the following code.

    	strncpy (d, s1, dsize - 1);      // pass 1 over s1 plus over d up to dsize - 1
    	d[dsize - 1] = '\0';             // remember to nul-terminate
    	size_t n = strlen (d);           // pass 2 over copy of s1 in d
    	strncat (d, s2, dsize - n - 1);  // pass 3 over copy of s1 in d

    Note that unlike the call to strncat, the call to strncpy above does not append the terminating NUL character to d when s1 is longer than d's size. It's a common mistake to assume it does. In addition, when s1 is shorter than dsize - 1, the strncpy funcion sets all the remaining characters to NUL which is also considered wasteful because the subsequent call to strncat will end up overwriting them.

    In a futile effort to avoid some of the redundancy, programmers sometimes opt to first compute the string lengths and then use memcpy as shown below. This approach, while still less than optimally efficient, is even more error-prone and difficult to read and maintain.

    	size_t s1len = strlen (s1);      // pass 1 over s1
    	if (dsize <= s1len)
              s1len = dsize - 1;            // no need to nul-terminate
    	memcpy (d, s1, s1len);           // pass 2 over s1
    	size_t s2len = strlen (s2);      // pass 1 over s2
    	if (dsize - s1len <= s2len)
              s2len = dsize - s1len - 1;
    	memcpy (d + s1len, s2, s2len);   // pass 2, over s2
    	d[s1len + s1len] = '\0';         // nul-terminate result

    Using sprintf and snprintf for concatenation

    Programmers concerned about the complexity and readability of their code sometimes use the snprintf function instead.

    	snprintf (d, dsize, "%s%s", s1, s2);

    This results in code that is eminently readable but, owing to snprintf's considerable overhead, can be orders of magnitude slower than using the string functions even with their inefficiencies. The overhead is due not only to parsing the format string but also to complexities typically inherent in implementations of formatted I/O functions.

    Some compilers such as GCC and Clang attempt to avoid the overhead of some calls to I/O functions by transforming very simple sprintf and snprintf calls to those to strcpy or memcpy for efficiency. (See a live example online.) However, the corresponding transformation is rarely performed for snprintf because there is no equivalent string function in the C library (the transformation is only done when the snprintf call can be proven not to result in the truncation of output). memcpy alone is not suitable because it copies exactly as many bytes as specified, and neither is strncpy because it overwrites the destination even past the end of the final NUL character.

    The overhead of transforming snprintf calls to a sequence of strlen and memcpy calls is not viewed as sufficiently profitable due to the redundant pass over the string. The section titled Better builtin string functions lists some of the limitations of the GCC optimizer in this area as well as some of the tradeoffs involved in improving it.

    POSIX stpcpy and stpncpy

    A number of library solutions that are outside the C standard have emerged over the years to help deal with this problem. The POSIX standard includes the stpcpy and stpncpy functions that return a pointer to the NUL character if it is found. The functions can be used to mitigate the inconvenience and inefficiency discussed above.

    	const char* stpcpy (char* restrict, const char* restrict);
    	const char* stpncpy (char* restrict, const char* restrict, size_t);

    In particular, where buffer overflow is not a concern, stpcpy can be called like so to concatenate strings:

    	stpcpy (stpcpy (d, s1), s2);

    However, using stpncpy equivalently when the copy must be bounded by the size of the destination does not eliminate the overhead of zeroing out the rest of the destination after the first NUL character and up to the maximum of characters specified by the bound.

    	char *ret = stpncpy (d, dsize, s1);   // zeroes out d beyond the end of s1
    	dsize -= (ret - d);
    	stpncpy (d, dsize, s2);               // again zeroes out d beyond the end

    As a result, the function is still inefficient because each call to it zeroes out the space remaining in the destination and past the end of the copied string. Thus, the complexity of this operation is still quadratic. The severity of the inefficiency increases in proportion to the size of the destination and in inverse relation to the lengths of the concatenated strings.

    OpenBSD strlcpy and strlcat

    In response to buffer overflow attacks exploiting the weaknesses of strcpy and strcat functions, and some of the shortcomings of strncpy and strncat discussed above, the OpenBSD project in the late 1990's introduced a pair of alternate APIs designed to make string copying and concatentation safer [2].

    	size_t strlcpy (char* restrict, const char* restrict, size_t);
    	size_t strlcat (char* restrict, const char* restrict, size_t);

    The main difference between strncpy and strlcpy is in the return value: while the former returns a pointer to the destination, the latter returns the number of characters copied. Another difference is that strlcpy always stores exactly one NUL in the destination. To concatenate s1 and s2 the strlcpy function might be used as follows.

    	size_t n = strlcpy (d, s1, dsize);
    	dsize -= n;
    	d += n;
    	strlcpy (d, s2, dsize);

    This makes strlcpy comparable to snprintf both in its usage and in complexity (of course, the snprintf overhead, while constant, is much greater).

    The strlcpy and strlcat functions are available on other systems besides OpenBSD, including Solaris and Linux (in the BSD compatibility library) but because they are not specified by POSIX, they are not nearly ubiquitous.

    POSIX memccpy

    POSIX also defines another function that has all the desirable properties discussed above and that can be used to solve the problem.

    	void* memccpy (void* restrict dst, const void* restrict src, int c, size_t n);

    The function combines the properties of memcpy, memchr, and the best aspects of the APIs discussed above.

    • Like memchr, it scans the source sequence for the first occurrence of a character specified by one of its arguments. The character can have any value, including zero.
    • Like strlcpy, it copies (at most) the specified number of characters from the source sequence to the destination, without writing beyond it. This resolves the inefficiency complaint about strncpy and stpncpy.
    • Similarly to (though not exactly as) stpcpy and stpncpy, it returns a pointer just past the copy of the specified character if it exists. (Recall that stpcpy and stpncpy return a pointer to the copied nul.) This avoids the inefficiency inherent in strcpy and strncpy.

    Thus, the first example above (strcat (strcpy (d, s1), s2)) can be rewritten using memccpy to avoid any redundant passes over the strings as follows. Note that by using SIZE_MAX as the bound this rewrite doesn't avoid the risk of overflowing the destination present in the original example and should be avoided.

    	memccpy (memccpy (d, s1, '\0', SIZE_MAX) - 1, s2, '\0', SIZE_MAX);

    To avoid the risk of buffer overflow, the appropriate bound needs to be determined for each call and provided as an argument. So a concatenation constrained to the size of the destination as in the snprintf (d, dsize, "%s%s", s1, s2) call might compute the destination size as follows.

    	char *p = memccpy (d, s1, '\0', dsize);
    	dsize -= (p - d - 1);
    	memccpy (p - 1, s2, '\0', dsize);

    Choosing a solution

    The efficiency problems discussed above could be solved if, instead of returning the value of their first argument, the string functions returned a pointer either to or just past the last stored character. However, changing the existing functions after they have been in use for nearly half a century is not feasible.

    Although it is not feasible to solve the problem for the existing C standard string functions, it is possible to mitigate it in new code by adding one or more functions that do not suffer from the same limitations. Because the charter of the C standard is codifying existing practice, it is incumbent on the standardization committee to investigate whether such a function already exists in popular implementations and, if so, consider adopting it. As has been shown above, several such solutions exist.

    Of the solutions described above, the memccpy function is the most general, optimally efficient, backed by an ISO standard, the most widely available even beyond POSIX implementations, and the least controversial.

    In contrast, the stpcpy and stpncpy functions are less general and stpncpy suffers from unnecessary overhead, and so do not meet the outlined goals. The functions might still be worth considering for adoption in C2X to improve portabilty. See N2352 - Add stpcpy and stpncpy to C2X for a proposal.

    The OpenBSD strlcpy and strlcat functions, while optimal, are less general, far less widely supported, and not specified by an ISO standard.

    The memccpy function exists not just in a subset of UNIX implementations, it is specified by another ISO standard, namely ISO/IEC 9945, also known as IEEE Std 1003.1, 2017 Edition, or for short, POSIX: memccpy, where it is provided as an XSI extension to C. The function was derived from System V Interface Definition, Issue 1 (SVID 1), originally published in 1985.

    memccpy is available even beyond implementations of UNIX and POSIX, including for example:

    • memccpy in Android,
    • memccpy in Apple Mac OS X,
    • memccpy in the BlackBerry Native SDK,
    • memccpy in the Compaq Run-Time Library for VAX,
    • memccpy in the Microsoft Visual Studio C Runtime Library,
    • memccpy in IBM z/OS.

    A trivial (but inefficient) reference implementation of memccpy is provided below.

          void* memccpy (void* restrict dst, const void* restrict src, int c, size_t n)
          {
            void *pc = memchr (src, c, n);
            void *ret;
    
            if (pc)
            {
              n = (char*)pc - (char*)src + 1;
              ret = (char*)dst + n;
            }
            else
              ret = 0;
    
            memcpy (dst, src, n);
            return ret;
          }

    A more optimal implementation of the function might be as follows.

          void* memccpy (void* restrict dst, const void* restrict src, int c, size_t n)
          {
            const char *s = src;
            for (char *ret = dst; n; ++ret, ++s, --n)
            {
              *ret = *s;
              if ((unsigned char)*ret == (unsigned char)c)
                return ret + 1;
            }
            return 0;
          }

    By relying on memccpy optimizing compilers will be able to transform simple snprintf (d, dsize, "%s", s) calls into the optimally efficient calls to memccpy (d, s, '\0', dsize). Trading code size for speed, aggressive optimizers might even transform snprintf calls with format strings consisting of multiple %s directives interspersed with ordinary characters such as "%s/%s" into series of such memccpy calls as shown below:

          char *p = memccpy (d, s1, '\0', dsize);
          if (p)
          {
            --p;
            p = memccpy (p, "/", '\0', dsize - (p - d));
            if (p)
            {
              --p;
              p = memccpy (p, s2, '\0', dsize - (p - d));
            }
          }
          if (!p)
            d[dsize - 1] = '\0';

    Update after WG14 April 2019 meeting

    Proposals to include memccpy and the other standard functions discussed in this article (all but strlcpy and strlcat), as well as two others, in the next revision of the C programming language were submitted in April 2019 to the C standardization committee (see 3, 4, 5, and 6). The committee chose to adopt memccpy but rejected the remaining proposals.

    References

    • [1] The Sad State of C Strings by Howard Chu, 2016.
    • [2] strlcpy and strlcat – consistent, safe, string copy and concatenation by Todd C. Miller, OpenBSD project.
    • [3] N2349 – Toward more efficient string copying and concatenation.
    • [4] N2351 – Add strnlen to C2X.
    • [5] N2352 – Add stpcpy and stpncpy to C2X.
    • [6] N2353 – Add strdup and strdnup to C2X.
    Last updated: March 28, 2023

    Recent Posts

    • What qualifies for Red Hat Developer Subscription for Teams?

    • How to run OpenAI's gpt-oss models locally with RamaLama

    • Using DNS over TLS in OpenShift to secure communications

    • Scaling DeepSeek and Sparse MoE models in vLLM with llm-d

    • Multicluster authentication with Ansible Automation Platform

    Red Hat Developers logo LinkedIn YouTube Twitter Facebook

    Products

    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform

    Build

    • Developer Sandbox
    • Developer Tools
    • Interactive Tutorials
    • API Catalog

    Quicklinks

    • Learning Resources
    • E-books
    • Cheat Sheets
    • Blog
    • Events
    • Newsletter

    Communicate

    • About us
    • Contact sales
    • Find a partner
    • Report a website issue
    • Site Status Dashboard
    • Report a security problem

    RED HAT DEVELOPER

    Build here. Go anywhere.

    We serve the builders. The problem solvers who create careers with code.

    Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

    Sign me up

    Red Hat legal and privacy links

    • About Red Hat
    • Jobs
    • Events
    • Locations
    • Contact Red Hat
    • Red Hat Blog
    • Inclusion at Red Hat
    • Cool Stuff Store
    • Red Hat Summit
    © 2025 Red Hat

    Red Hat legal and privacy links

    • Privacy statement
    • Terms of use
    • All policies and guidelines
    • Digital accessibility

    Report a website issue