Multipath TCP on Red Hat Enterprise Linux 8.3: From 0 to 1 subflows

August 19, 2020
Davide Caratti
Related topics: DevOps, Open source, Linux

    Multipath TCP (MPTCP) extends traditional TCP to allow reliable end-to-end delivery over multiple simultaneous TCP paths, and it arrives as a tech preview in Red Hat Enterprise Linux 8.3. This is the first of two articles for users who want to experiment with the new MPTCP functionality on a live system. In this first part, we show you how to enable the protocol in the kernel and let client and server applications use MPTCP sockets. Then, we run diagnostics on the kernel in a sample test network where the endpoints use a single subflow.

    Multipath TCP in Red Hat Enterprise Linux 8

    Multipath TCP is a relatively new extension for the Transmission Control Protocol (TCP), and its official Linux implementation is even more recent. Early users might want to know what to expect in RHEL 8.3. In this article, you will learn how to:

    • Enable the Multipath TCP protocol in the kernel.
    • Let an application open an IPPROTO_MPTCP socket.
    • Use tcpdump to inspect MPTCP options with live traffic.
    • Inspect the subflow status with ss.

    Enabling Multipath TCP in the kernel

    Multipath TCP registers as an upper-layer protocol (ULP) for TCP. Users can ensure that mptcp is available in the kernel by checking the available ULPs:

    # sysctl net.ipv4.tcp_available_ulp
    net.ipv4.tcp_available_ulp = espintcp mptcp
    

    Unlike upstream Linux, MPTCP is disabled by default in the Red Hat Enterprise Linux (RHEL) 8.3 runtime. To allow applications to create MPTCP sockets, system administrators need to enable it with a sysctl command:

    # sysctl -w net.mptcp.enabled=1
    # sysctl net.mptcp.enabled
    net.mptcp.enabled = 1
    

    Preparing the system for its first MPTCP socket

    With MPTCP enabled in the RHEL 8.3 kernel, user-space programs have a new protocol available for the socket system call. There are two potential use cases for the new protocol.

    Native MPTCP applications

    Applications supporting MPTCP natively can open a SOCK_STREAM socket specifying IPPROTO_MPTCP as the protocol and AF_INET or AF_INET6 as the address family:

    fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
    

    After the application creates a socket, the kernel will operate one or more TCP subflows that will use the standard MPTCP option (IANA number = 30). Client and server semantics are the same as those used by a regular TCP socket (meaning that they will use bind(), listen(), connect(), and accept()).
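    For readers who want to experiment without writing C, the same call can be sketched in Python. This is a minimal sketch under two assumptions: socket.IPPROTO_MPTCP was only added to Python in 3.10, so we fall back to the raw IANA protocol number 262 on older interpreters, and we fall back to plain TCP on kernels where MPTCP is unavailable or disabled:

```python
import socket

# socket.IPPROTO_MPTCP exists only on newer Python versions;
# the protocol number assigned to MPTCP is 262.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)

try:
    # Native MPTCP: SOCK_STREAM with IPPROTO_MPTCP as the protocol.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    using_mptcp = True
except OSError:
    # Kernel without MPTCP support, or net.mptcp.enabled=0: plain TCP.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    using_mptcp = False
```

    If the socket() call fails with EPROTONOSUPPORT or EINVAL, the kernel either lacks MPTCP support or has net.mptcp.enabled set to 0; from there, bind(), listen(), connect(), and accept() work exactly as they would on a regular TCP socket.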

    Legacy TCP applications converted to MPTCP

    Most user-space applications have no knowledge of IPPROTO_MPTCP, nor would it be realistic to patch and rebuild all of them to add native support for MPTCP. Because of this, the community opted for using an eBPF program that wraps the socket() system call and overrides the value of protocol.

    In RHEL 8.3, this program will run on control groups (cgroups) so that system administrators can specify which applications should use MPTCP while others continue with plain TCP. The eBPF helper will be discussed upstream in the coming weeks, but we want to support early RHEL 8.3 users who want to try their own applications with MPTCP.

    As a workaround, you can use a SystemTap script to intercept calls to __sys_socket() in the kernel and let a probe replace IPPROTO_TCP with IPPROTO_MPTCP. You will need to install a few packages to insert a probe into the kernel with stap. You'll also use the good old ncat tool from the nmap-ncat package to run the client and the server:

    # dnf -y install \
    > kernel-headers \
    > kernel-devel \
    > kernel-debuginfo \
    > kernel-debuginfo-common-x86_64 \
    > systemtap-client \
    > systemtap-client-devel \
    > nmap-ncat
    

    Use the following command to start the systemtap script:

    # stap -vg mptcp.stap

    Protocol smoke test: A single subflow using ncat

    The test network topology shown in Figure 1 consists of a client and a server that run in separate namespaces, connected through a virtual ethernet device (veth).

    Figure 1: A network topology for basic MPTCP testing.

    Adding additional IP addresses will simulate multiple L4 paths between endpoints. First, the server opens a passive socket, listening on a TCP port:

    # ncat -l 192.0.2.1 4321

    Then, the client connects to the server:

    # ncat 192.0.2.1 4321

    From a functional point of view, the interaction is the same as using ncat with regular TCP: when the user types a line on the client's standard input, the server displays that line on its standard output. Similarly, typing a line on the server's standard input transmits it back to the client's standard output. In this example, we use ncat to send a "hello world (1)\n" message to the server. The server waits for a second, sends back "hello world (2)\n," and then closes the connection.

    Note: Current Linux MPTCP does not support mixed IPv4/IPv6 addresses. Therefore, all addresses involved in client/server connectivity must belong to the same family.
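    The same exchange can also be reproduced programmatically. The following Python sketch runs the server and the client over loopback and performs the two "hello world" writes described above; the port choice and message contents are illustrative, and MPTCP sockets fall back to plain TCP where the kernel does not support them:

```python
import socket
import threading

# socket.IPPROTO_MPTCP exists only on newer Python versions; MPTCP is 262.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)

def stream_socket():
    """Open an MPTCP socket, falling back to plain TCP where unsupported."""
    try:
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM)

def serve(listener, received):
    # Accept one connection, read the client's line, and answer it.
    conn, _ = listener.accept()
    with conn:
        received.append(conn.recv(64))
        conn.sendall(b"hello world (2)\n")

# Server: passive socket on loopback (port 0 = any free port).
listener = stream_socket()
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

received = []
t = threading.Thread(target=serve, args=(listener, received))
t.start()

# Client: connect, send a line, and read the server's reply.
client = stream_socket()
client.connect(("127.0.0.1", port))
client.sendall(b"hello world (1)\n")
reply = client.recv(64)
client.close()
t.join()
listener.close()
```

    Because both addresses here are IPv4 loopback, the sketch also respects the single-family constraint noted above.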

    Capturing traffic and examining it with tcpdump

    The Red Hat Enterprise Linux 8 version of tcpdump doesn't yet support dissecting MPTCP v1 suboptions in TCP headers. We can overcome this problem by building a binary from the upstream repository. Alternatively, we can replace it with a more recent binary. With either of those changes, it's possible to inspect the MPTCP suboption.

    Three-way handshake: The MP_CAPABLE suboption

    During the three-way handshake, the client and server exchange 64-bit keys using the MP_CAPABLE suboption, visible in the output of tcpdump in the braces ({}) after mptcp capable. These keys are later used to compute the DSN/DACK values and the tokens. The MP_CAPABLE suboption originated by the client also remains present after a successful connection setup, until the server explicitly acknowledges it using a data sequence signal (DSS) suboption:

    # tcpdump -#tnnr capture.pcap
    1  IP 192.0.2.2.44176 > 192.0.2.1.4321: Flags [S], seq 1721499445, win 29200, options [mss 1460,sackOK,TS val 33385784 ecr 0,nop,wscale 7,mptcp capable v1], length 0
    2  IP 192.0.2.1.4321 > 192.0.2.2.44176: Flags [S.], seq 3341831007, ack 1721499446, win 28960, options [mss 1460,sackOK,TS val 4061152149 ecr 33385784,nop,wscale 7,mptcp capable v1 {0xbb206e3023b47a2d}], length 0
    3  IP 192.0.2.2.44176 > 192.0.2.1.4321: Flags [.], ack 1, win 229, options [nop,nop,TS val 33385785 ecr 4061152149,mptcp capable v1 {0x41923206b75835f5,0xbb206e3023b47a2d}], length 0
    4  IP 192.0.2.2.44176 > 192.0.2.1.4321: Flags [P.], seq 1:17, ack 1, win 229, options [nop,nop,TS val 33385785 ecr 4061152149,mptcp capable v1 {0x41923206b75835f5,0xbb206e3023b47a2d},nop,nop], length 16
    

    MPTCP-level sequence numbers: The DSS suboption

    After that, TCP segments will carry the DSS suboption that contains MPTCP sequence numbers. More specifically, we can observe the data sequence number (DSN) and data acknowledgment (DACK) values, as shown here:

    5  IP 192.0.2.1.4321 > 192.0.2.2.44176: Flags [.], ack 17, win 227, options [nop,nop,TS val 4061152149 ecr 33385785,mptcp dss ack 1711754507747579648], length 0
    6  IP 192.0.2.2.44176 > 192.0.2.1.4321: Flags [P.], seq 17:33, ack 1, win 229, options [nop,nop,TS val 33386778 ecr 4061152149,mptcp dss ack 1331650533424046587 seq 1711754507747579648 subseq 17 len 16,nop,nop], length 16
    7  IP 192.0.2.1.4321 > 192.0.2.2.44176: Flags [.], ack 33, win 227, options [nop,nop,TS val 4061153142 ecr 33386778,mptcp dss ack 1711754507747579664], length 0
    

    Using a single subflow, DSN and DACK increase by the same amount as the TCP sequence and acknowledgment numbers. When the connection ends, the subflows are closed with a FIN packet, just like regular TCP flows would be. Because it also closes the MPTCP socket, the data fin bit is set in the DSS suboption, as shown here:

    8  IP 192.0.2.2.44176 > 192.0.2.1.4321: Flags [F.], seq 33, ack 1, win 229, options [nop,nop,TS val 33387798 ecr 4061153142,mptcp dss fin ack 1331650533424046587 seq 1711754507747579664 subseq 0 len 1,nop,nop], length 0
    9  IP 192.0.2.1.4321 > 192.0.2.2.44176: Flags [.], ack 34, win 227, options [nop,nop,TS val 4061154203 ecr 33387798,mptcp dss ack 1711754507747579664], length 0
    10  IP 192.0.2.1.4321 > 192.0.2.2.44176: Flags [F.], seq 1, ack 34, win 227, options [nop,nop,TS val 4061162156 ecr 33387798,mptcp dss fin ack 1711754507747579664 seq 1331650533424046587 subseq 0 len 1,nop,nop], length 0
    11  IP 192.0.2.2.44176 > 192.0.2.1.4321: Flags [.], ack 2, win 229, options [nop,nop,TS val 33395793 ecr 4061162156,mptcp dss ack 1331650533424046587], length 0

    Inspecting subflow data with ss

    Because MPTCP uses TCP as its transport protocol, network administrators can query the kernel to retrieve information on the TCP connections being used by the main MPTCP socket. In this example, we run ss on the client, filtering on the server's listening port; the information relevant to MPTCP appears after tcp-ulp-mptcp:

    # ss -nti '( dport :4321 )' dst 192.0.2.1
    State Recv-Q Send-Q Local Address:Port  Peer Address:PortProcess
    ESTAB 0      0          192.0.2.2:44176    192.0.2.1:4321
    cubic wscale:7,7 [...] bytes_sent:32 bytes_acked:33 [...] tcp-ulp-mptcp flags:Mmec token:0000(id:0)/768f615c(id:0) seq:127af91ad1b321fb sfseq:1 ssnoff:c7304b5f maplen:0
    

    SS command output explained

    The line below tcp-ulp-mptcp was collected in the client namespace immediately after the transmission of packet 6 in the previous section. Its fields mean the following:

    • Each value of token is a truncated (most-significant 32 bits) SHA-256 hash of the corresponding peer's key, which the client receives during the three-way handshake. Further MP_JOIN SYN packets will use that value to identify the connection and prove that they have not been spoofed. The id is the subflow identifier as specified in the RFC. For non-MP_JOIN sockets, only the local token and ID are available.
    • flags is a bitmask containing information on the subflow state. For instance, M/m records the presence of the MP_CAPABLE suboption in the three-way handshake. The c means that the client received the server's key (that is, it acknowledged the SYN/ACK), while e means that the exchange of both MPTCP keys is complete.
    • seq denotes the next MPTCP sequence number that the endpoint expects on reception, or, equivalently, the DACK value for the next transmitted packet.
    • sfseq is the subflow sequence number, meaning that it is the current TCP ACK value for this subflow.
    • ssnoff is the current difference between the TCP sequence number and the MPTCP sequence number for this subflow. If you are using a single subflow, this value will not change during the connection. If you are using more than one subflow to simultaneously carry data segments, then this value can increase or decrease depending on the path capacity.
    • maplen indicates how many bytes are left to fill the current DSS map.

    Note that we can compute the value of seq starting from the server key exchanged in the SYN/ACK (packet 2 of the capture): the server's Initial Data Sequence Number (IDSN) is the SHA-256 hash of the key, truncated to its least-significant 64 bits, as specified by RFC 8684.

    Also note that, because the client is not receiving any data from the server, seq remains equal to the IDSN throughout the connection's lifetime. For the same reason, the value of sfseq is constantly equal to 1 in this example. We can see the IDSN in the DSN number of packet 10 and in the DACK number of packets 6 and 8 (in decimal format: 1331650533424046587), as well as in the output of ss (in hex format: 127af91ad1b321fb). Similarly, in this example the SSN offset (c7304b5f in the ss output) is constantly equal to the initial TCP sequence number (3341831007 in the SYN/ACK, packet 2 of the capture output).

    Conclusion and what's next

    In realistic scenarios, MPTCP will generally use more than one subflow. In this way, sockets can preserve connectivity even after an event causes a failure in one of the L4 paths. In the next article, we will show you how to use iproute2 to configure multiple TCP paths on RHEL 8.3, and how to watch ncat doing multipath for real.

    Last updated: August 18, 2020
