Why you should use io_uring for network I/O

April 12, 2023
Donald Hunter
Related topics:
Linux
Related products:
Red Hat Enterprise Linux

io_uring is an async interface to the Linux kernel that can potentially benefit networking. It has been a big win for file I/O (input/output), but might offer only modest gains for network I/O, which already has non-blocking APIs. The gains are likely to come from the following:

  • A reduced number of syscalls on servers that do a lot of context switching
  • A unified asynchronous API for both file and network I/O

Many io_uring features will soon be available in Red Hat Enterprise Linux 9.3, which is distributed with kernel version 5.14. The latest io_uring features are currently available in Fedora 37.

What is io_uring?

io_uring is an asynchronous I/O interface for the Linux kernel. An io_uring is a pair of ring buffers in shared memory that are used as queues between user space and the kernel:

  • Submission queue (SQ): A user space process uses the submission queue to send asynchronous I/O requests to the kernel.
  • Completion queue (CQ): The kernel uses the completion queue to send the results of asynchronous I/O operations back to user space.

The diagram in Figure 1 shows how io_uring provides an asynchronous interface between user space and the Linux kernel.

[Figure 1: A visual representation of the io_uring submission and completion queues. The application adds an item to the tail of the submission queue and the kernel consumes an item from the head; the completion queue shows the reverse for responses from the kernel to the application. Created by Donald Hunter.]

This interface enables applications to move away from the traditional readiness-based model of I/O to a new completion-based model where async file and network I/O share a unified API.
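
To make the queue mechanics described above concrete, here is a minimal single-threaded model of one such ring in C. It is only a sketch: the names `struct ring`, `ring_push`, and `ring_pop` are hypothetical, the layout is not the kernel's, and the real rings also need memory barriers because producer and consumer run concurrently.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 8u                /* power of two, so a mask replaces modulo */
#define RING_MASK (RING_ENTRIES - 1u)

/* Hypothetical single-producer, single-consumer ring; not the kernel's layout. */
struct ring {
    uint32_t head;                     /* advanced by the consumer */
    uint32_t tail;                     /* advanced by the producer */
    uint64_t entries[RING_ENTRIES];
};

/* Producer side: append at the tail. Returns false when the ring is full. */
static bool ring_push(struct ring *r, uint64_t value)
{
    if (r->tail - r->head == RING_ENTRIES)
        return false;
    r->entries[r->tail & RING_MASK] = value;
    r->tail++;
    return true;
}

/* Consumer side: remove from the head. Returns false when the ring is empty. */
static bool ring_pop(struct ring *r, uint64_t *value)
{
    if (r->head == r->tail)
        return false;
    *value = r->entries[r->head & RING_MASK];
    r->head++;
    return true;
}
```

For the submission queue the application plays the producer and the kernel the consumer; for the completion queue the roles are swapped.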

The syscall API

The Linux kernel API for io_uring consists of three syscalls:

  • io_uring_setup: Set up a context for performing asynchronous I/O
  • io_uring_register: Register files or user buffers for asynchronous I/O
  • io_uring_enter: Initiate and/or complete asynchronous I/O

The first two syscalls are used to set up an io_uring instance and optionally to pre-register buffers that would be referenced by io_uring operations. Only io_uring_enter needs to be called for queue submission and consumption. The cost of an io_uring_enter call can be amortized over several I/O operations. For very busy servers, you can avoid io_uring_enter calls entirely by enabling busy-polling of the submission queue in the kernel. This comes at the cost of a kernel thread consuming CPU.
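
As a rough illustration of the first syscall, the sketch below calls io_uring_setup directly through syscall(2), without liburing. The helper name `ring_setup_raw` is an assumption made for this example, and it assumes kernel headers that provide `<linux/io_uring.h>` and `__NR_io_uring_setup`.

```c
#include <errno.h>
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Create an io_uring instance with the raw io_uring_setup syscall.
 * Returns the ring file descriptor, or a negative errno value when the
 * kernel or a sandbox policy does not allow io_uring.
 */
static int ring_setup_raw(unsigned int entries, struct io_uring_params *params)
{
    memset(params, 0, sizeof(*params));    /* no setup flags requested */
    long fd = syscall(__NR_io_uring_setup, entries, params);
    return fd < 0 ? -errno : (int)fd;
}
```

On success the kernel fills in `params` with the actual queue sizes (by default the completion queue has twice as many entries as the submission queue) and the offsets needed to mmap the rings. The caller is responsible for closing the returned file descriptor.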

The liburing API

The liburing library provides a convenient way to use io_uring, hiding some of the complexity and providing functions to prepare all types of I/O operations for submission.

A user process creates an io_uring:

struct io_uring ring;
io_uring_queue_init(QUEUE_DEPTH, &ring, 0);

then submits operations to the io_uring submission queue:

struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
io_uring_prep_readv(sqe, client_socket, iov, 1, 0);
io_uring_sqe_set_data(sqe, user_data);
io_uring_submit(&ring);

The process waits for completion:

struct io_uring_cqe *cqe;
int ret = io_uring_wait_cqe(&ring, &cqe);

and uses the response:

user_data = io_uring_cqe_get_data(cqe);
if (cqe->res < 0) {
    // handle error
} else {
    // handle response
}
io_uring_cqe_seen(&ring, cqe);

The liburing API is the preferred way to use io_uring from applications. liburing has feature parity with the latest kernel io_uring development work and is backward-compatible with older kernels that lack the latest io_uring features.

Using io_uring for network I/O

We will try out io_uring for network I/O by writing a simple echo server using the liburing API. Then we will see how to minimize the number of syscalls required for a high-rate concurrent workload.

A simple echo server

The classic echo server that appeared in Berkeley Software Distribution (BSD) Unix looks something like this:

client_fd = accept(listen_fd, &client_addr, &client_addrlen);
for (;;) {
    numRead = read(client_fd, buf, BUF_SIZE);
    if (numRead <= 0)   // exit loop on EOF or error
        break;
    if (write(client_fd, buf, numRead) != numRead) {
        // handle write error
    }
}
close(client_fd);

The server could be multithreaded or use non-blocking I/O to support concurrent requests. Whatever form it takes, the server requires at least 5 syscalls per client session: accept, read, write, a second read to detect EOF, and close.

A naive translation of this to io_uring results in an asynchronous server that submits one operation at a time and waits for completion before submitting the next. The pseudocode for a simple io_uring-based server, omitting the boilerplate and error handling, looks like this:

add_accept_request(listen_socket, &client_addr, &client_addr_len);
io_uring_submit(&ring);

while (1) {
    int ret = io_uring_wait_cqe(&ring, &cqe);

    struct request *req = (struct request *) cqe->user_data;
    switch (req->type) {
    case ACCEPT:
        add_accept_request(listen_socket,
                          &client_addr, &client_addr_len);
        add_read_request(cqe->res);
        io_uring_submit(&ring);
        break;
    case READ:
        if (cqe->res <= 0) {
            add_close_request(req);
        } else {
            add_write_request(req);
        }
        io_uring_submit(&ring);
        break;
    case WRITE:
        add_read_request(req->socket);
        io_uring_submit(&ring);
        break;
    case CLOSE:
        free_request(req);
        break;
    default:
        fprintf(stderr, "Unexpected req type %d\n", req->type);
        break;
    }

    io_uring_cqe_seen(&ring, cqe);
}

In this io_uring example, the server still requires at least 4 syscalls to process each new client. The only saving achieved here is by submitting a read and a new accept request together. This can be seen in the following strace output for the echo server receiving 1,000 client requests.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.99    0.445109         111      4001           io_uring_enter
  0.01    0.000063          63         1           brk
------ ----------- ----------- --------- --------- ----------------
100.00    0.445172         111      4002           total

Combining submissions

In an echo server, there are limited opportunities for chaining I/O operations, since we need to complete a read before we know how many bytes we can write. We could chain accept and read by using the new fixed file feature of io_uring, but we are already able to submit a read request and a new accept request together, so there is probably little to gain there.

Independent operations can be submitted at the same time, so we can combine the submission of a write with the read that follows it. This reduces the syscall count to 3 per client request:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.93    0.438697         146      3001           io_uring_enter
  0.07    0.000325         325         1           brk
------ ----------- ----------- --------- --------- ----------------
100.00    0.439022         146      3002           total

Draining the completion queue

It is possible to combine a lot more work into the same submission if we handle all queued completions before calling io_uring_submit. We can do this by using a combination of io_uring_wait_cqe to wait for work, followed by calls to io_uring_peek_cqe to check whether the completion queue has more entries that can be processed. This avoids spinning in a busy loop when the completion queue is empty while also draining the completion queue as fast as possible.

The pseudocode for the main loop now looks like this:

while (1) {
    int submissions = 0;
    int ret = io_uring_wait_cqe(&ring, &cqe);
    while (1) {
        struct request *req = (struct request *) cqe->user_data;
        switch (req->type) {
        case ACCEPT:
            add_accept_request(listen_socket,
                              &client_addr, &client_addr_len);
            add_read_request(cqe->res);
            submissions += 2;
            break;
        case READ:
            if (cqe->res <= 0) {
                add_close_request(req);
                submissions += 1;
            } else {
                add_write_request(req);
                add_read_request(req->socket);
                submissions += 2;
            }
            break;
        case WRITE:
            break;     // the next read was already queued with this write
        case CLOSE:
            free_request(req);
            break;
        default:
            fprintf(stderr, "Unexpected req type %d\n", req->type);
            break;
        }

        io_uring_cqe_seen(&ring, cqe);

        if (io_uring_sq_space_left(&ring) < MAX_SQE_PER_LOOP) {
            break;     // too little room left in the submission queue
        }

        ret = io_uring_peek_cqe(&ring, &cqe);
        if (ret == -EAGAIN) {
            break;     // no remaining work in completion queue
        }
    }
    if (submissions > 0) {
        io_uring_submit(&ring);
    }
}

The result of batching submissions for all available work gives a significant improvement over the previous result, as shown in the following strace output, again for 1,000 client requests:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.91    0.324226        4104        79           io_uring_enter
  0.09    0.000286         286         1           brk
------ ----------- ----------- --------- --------- ----------------
100.00    0.324512        4056        80           total

The improvement is substantial: each syscall now handles more than 12 client requests, or more than 60 I/O operations on average. This ratio improves further as the server gets busier, which can be demonstrated by enabling logging in the server (the write calls in the following output come from the logging):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 68.86    0.225228          42      5308       286 write
 31.13    0.101831        4427        23           io_uring_enter
  0.00    0.000009           9         1           brk
------ ----------- ----------- --------- --------- ----------------
100.00    0.327068          61      5332       286 total

This shows that when the server has more work to do, more io_uring operations have time to complete, so more new work can be submitted in a single syscall. The echo server responds to 1,000 client echo requests, completing 5,000 socket I/O operations, with just 23 io_uring_enter calls.
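
The ratios quoted for these runs can be checked with a few lines of arithmetic. The sketch below assumes five socket operations per client (accept, read, write, a read that detects EOF, and close), as described earlier; the helper names `per_syscall` and `report` are made up for this example.

```c
#include <stdio.h>

/* Average number of work items handled per syscall. */
static double per_syscall(unsigned long items, unsigned long syscalls)
{
    return (double)items / (double)syscalls;
}

/* Print the ratios for one strace run of 1,000 client requests. */
static void report(const char *label, unsigned long enter_calls)
{
    const unsigned long clients = 1000;
    const unsigned long ops = 5 * clients;   /* accept, read, write, read (EOF), close */

    printf("%-24s %6.2f clients/syscall %7.2f ops/syscall\n", label,
           per_syscall(clients, enter_calls), per_syscall(ops, enter_calls));
}
```

Calling `report("one op at a time", 4001)`, `report("write+read combined", 3001)`, `report("drained CQ", 79)`, and `report("drained CQ + logging", 23)` reproduces the progression in the strace tables: roughly 12.7 clients and 63.3 operations per syscall for the 79-call run, and about 217 operations per syscall for the 23-call run.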

It is worth noting that as the amount of work submitted increases, the time spent in the io_uring_enter syscall increases, too. There will come a point where it might be necessary to limit the size of submission batches or to enable submission queue polling in the kernel.

Benefits of io_uring for network I/O

The main benefit of io_uring for network I/O is a modern asynchronous API that is straightforward to use and provides unified semantics for file and network I/O.

A potential performance benefit of io_uring for network I/O is reducing the number of syscalls. This could provide the biggest benefit for high volumes of small operations where the syscall overhead and number of context switches can be significantly reduced.

It is also possible to avoid cumulatively expensive operations on busy servers by pre-registering resources with the kernel before sending io_uring requests. File slots and buffers can be registered to avoid the lookup and refcount costs for each I/O operation.

Registered file slots, called fixed files, also make it possible to chain an accept with a read or write, without any round-trip to user space. A submission queue entry (SQE) would specify a fixed file slot to store the return value of accept, which a linked SQE would then reference in an I/O operation.

Limitations

In theory, operations can be chained together using the IOSQE_IO_LINK flag. However, for reads and writes, there is no mechanism to coerce the return value from a read operation into the parameter set for the following write operation. This limits the scope of linked operations to semantic sequencing, such as "write then read" or "write then close", and to accept followed by read or write.

Another consideration is that io_uring is a relatively new Linux kernel feature that is still under active development. There is room for performance improvement, and some io_uring features might still benefit from optimization work. 

io_uring is currently a Linux-specific API, so integrating it into cross-platform libraries like libuv could present some challenges.

Latest features

The most recent features to arrive in io_uring are multi-shot accept, which is available from kernel 5.19, and multi-shot receive, which arrived in kernel 6.0. Multi-shot accept allows an application to issue a single accept SQE, which will repeatedly post a CQE whenever the kernel receives a new connection request. Multi-shot receive will likewise post a CQE whenever newly received data is available. These features are available in Fedora 37 but are not yet available in RHEL 9.

Conclusion

The io_uring API is a fully functional asynchronous I/O interface that provides unified semantics for both file and network I/O. It has the potential to provide modest performance benefits to network I/O on its own and greater benefit for mixed file and network I/O application workloads.

Popular asynchronous I/O libraries such as libuv are multi-platform, which makes it more challenging to adopt Linux-specific APIs. When adding io_uring to a library, both file I/O and network I/O should be added to gain the most from io_uring's async completion model.

Network I/O-related feature development and optimization work in io_uring will be driven primarily by further adoption in networked applications. Now is the time to integrate io_uring into your applications and I/O libraries.

More information

Explore the following resources to learn more: 

  • Faster IO through io_uring
  • Detailed description (PDF)
  • Fixed files
  • What’s new (PDF)
  • io_uring and networking in 2023

Find other tutorials on Red Hat Developer's Linux topic page.

Last updated: August 14, 2023
