As some may have noticed already, a new request for comments (RFC) regarding the Stream Control Transmission Protocol (SCTP), RFC8260, has been published recently. It defines two major extensions to SCTP, which was originally specified in RFC4960:
1) Stream schedulers, which control which stream gets served next when sending a data chunk over the wire.
2) I-Data chunk, which extends DATA to overcome some of its limitations.
This blog post goes over both changes and points out the benefits of stream schedulers, especially when they are combined with the new I-Data chunks.
But first, a quick recap of SCTP. It is a transport protocol that was initially designed for telephony signaling. It inherits some features and behaviors from the Transmission Control Protocol (TCP) and some from the User Datagram Protocol (UDP). Like TCP, it is connection-oriented; like UDP, it is message-oriented. Like TCP, it can provide full reliability, but it also supports partial reliability, which is closer to UDP. It offers other features as well.
For the context of this post, it is important to know that a single SCTP "association" (a "connection", in TCP terms) may contain multiple sub-flows, called "streams". The streams are independent of each other and are used to multiplex the association. A "User Message" is any message sent by the application. Each User Message is carried on a stream in one or more "data chunks". It uses a single chunk if it fits, or several chunks if it is larger than the Path Maximum Transmission Unit (PMTU) and has to be fragmented. A data chunk never contains more than (part of) a single User Message.
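To make the terminology concrete, here is a minimal Python sketch of how a sender could split a User Message into data chunks. It is a model, not a wire format. `DataChunk`, `fragment()` and `max_payload` are hypothetical names, and `max_payload` stands in for the PMTU minus the IP, SCTP and chunk header overhead. The B and E flags mark the first and last fragment, as they do in the real DATA chunk.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataChunk:
    """Simplified model of the DATA chunk fields involved in fragmentation."""
    tsn: int       # Transmission Sequence Number
    sid: int       # Stream Identifier
    ssn: int       # Stream Sequence Number
    begin: bool    # B flag: first fragment of the User Message
    end: bool      # E flag: last fragment of the User Message
    payload: bytes


def fragment(message: bytes, sid: int, ssn: int, first_tsn: int,
             max_payload: int) -> list[DataChunk]:
    """Split one User Message into data chunks of at most max_payload bytes.

    max_payload is a hypothetical stand-in for "PMTU minus header overhead".
    """
    if not message:
        raise ValueError("a DATA chunk must carry at least one byte of user data")
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    pieces = [message[i:i + max_payload]
              for i in range(0, len(message), max_payload)]
    return [
        DataChunk(tsn=first_tsn + n, sid=sid, ssn=ssn,
                  begin=(n == 0), end=(n == len(pieces) - 1), payload=piece)
        for n, piece in enumerate(pieces)
    ]


chunks = fragment(b"x" * 2500, sid=0, ssn=0, first_tsn=0, max_payload=1000)
print([(c.tsn, len(c.payload), c.begin, c.end) for c in chunks])
# [(0, 1000, True, False), (1, 1000, False, False), (2, 500, False, True)]
```

Every fragment carries the same SID and SSN, and the fragments get consecutive TSNs. This detail matters again when discussing I-Data.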
Acronyms that will be used:
- FSN: Fragment Sequence Number
- MID: Message Identifier
- SID: Stream Identifier
- SSN: Stream Sequence Number
- TSN: Transmission Sequence Number
Stream Schedulers
A single SCTP association may carry multiple independent streams. For example, an application could have 5 inbound and 3 outbound streams, all independent of each other. Say the application has sent a data chunk on stream #1 with SSN 1 and TSN 1, and another chunk on stream #2 with SSN 1 and TSN 2. If the chunk with TSN 1 gets lost and the one with TSN 2 goes through, the latter can still be delivered to the application, because the sequence for stream #2 is intact.
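A rough sketch of that receive-side behavior is shown below. It assumes unfragmented, ordered messages and ignores SSN wrap-around. `deliver_in_order()` is a hypothetical helper. It holds each message until every earlier SSN of the same stream has arrived, but it never waits on other streams. RFC4960 starts SSNs at 0, while the example above starts at 1, so the sketch takes a `first_ssn` parameter.

```python
from collections import defaultdict


def deliver_in_order(received, first_ssn=0):
    """Model ordered delivery of unfragmented messages on independent streams.

    received: (tsn, sid, ssn) tuples in arrival order.
    Returns the (sid, ssn) pairs handed to the application, in delivery order.
    A message only waits for earlier SSNs of its own stream.
    """
    expected = defaultdict(lambda: first_ssn)  # next SSN to deliver, per stream
    held = defaultdict(dict)                   # sid -> {ssn: tsn} not yet deliverable
    delivered = []
    for tsn, sid, ssn in received:
        held[sid][ssn] = tsn
        # Drain every message of this stream that is now in sequence.
        while expected[sid] in held[sid]:
            del held[sid][expected[sid]]
            delivered.append((sid, expected[sid]))
            expected[sid] += 1
    return delivered


# TSN 1 (stream #1) was lost; TSN 2 (stream #2) is delivered anyway.
print(deliver_in_order([(2, 2, 1)], first_ssn=1))  # [(2, 1)]
```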
This looks good, but there is a big limitation: what if the application needs to prioritize the traffic on stream #1 over stream #2? Video streaming is one example. The application could send the key frames on stream #1 and the inter (delta) frames on stream #2. In case of a sudden bandwidth reduction, the protocol could then start dropping chunks from stream #2 first and try to maintain the flow on stream #1, even for chunks that were already queued up in the socket.
Prior to RFC8260, the SCTP stack served the streams in a First-Come, First-Served (FCFS) fashion. The transmission order defined by the application remained unaffected, and the streams were mostly used only for multiplexing. The figure below shows an example of FCFS. Consider that the application first sends the message on stream #0, then the one on stream #2, and only then the messages on stream #1.
    +---+---+---+
    |    0/0    |-+
    +---+---+---+ |
                  |  +---+---+---+---+---+---+---+---+---+
    +---+---+---+ +->|1/2|1/1|1/0|2/0|2/0|2/0|0/0|0/0|0/0|
    |1/2|1/1|1/0|--->|---|---|---|---|---|---|---|---|---|
    +---+---+---+ +->| 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
                  |  +---+---+---+---+---+---+---+---+---+
    +---+---+---+ |
    |    2/0    |-+
    +---+---+---+

                         +-------+
    +-------+            |SID/SSN|
    |SID/SSN|            |-------|
    +-------+            |  TSN  |
                         +-------+
In this example there are three streams, represented by the boxes on the left side, with SIDs 0, 1 and 2. Note how the messages on stream #1 had to wait for the messages on streams #0 and #2 to be transmitted. But what if they were more important for the application?
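The FCFS outcome in the figure above can be reproduced with a few lines of Python. This is a simplified model, not the Linux implementation. Each queued message is described by a hypothetical `(sid, ssn, n_chunks)` tuple. The scheduler emits chunks in the order the application queued them and assigns TSNs as it goes.

```python
from itertools import count


def fcfs(messages):
    """First-Come, First-Served: send chunks in the order messages were queued.

    messages: (sid, ssn, n_chunks) tuples in application queueing order.
    Returns (tsn, sid, ssn) tuples in transmission order.
    """
    tsn = count()  # TSNs are assigned as chunks go out, starting at 0 here
    return [(next(tsn), sid, ssn)
            for sid, ssn, n_chunks in messages
            for _ in range(n_chunks)]


# Stream #0's message first, then stream #2's, then the three on stream #1.
queued = [(0, 0, 3), (2, 0, 3), (1, 0, 1), (1, 1, 1), (1, 2, 1)]
for tsn, sid, ssn in fcfs(queued):
    print(f"TSN {tsn}: {sid}/{ssn}")
```

The order printed matches the output pipe in the figure, read from TSN 0 to TSN 8.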
This is where the stream schedulers come into play: the application now has the ability to ask the SCTP stack to select a different scheduling algorithm than FCFS.
The set of schedulers defined by RFC8260 covers commonly used packet scheduling disciplines:
- First-Come, First-Served Scheduler (SCTP_SS_FCFS)
- Round-Robin Scheduler (SCTP_SS_RR)
- Round-Robin Scheduler per Packet (SCTP_SS_RR_PKT)
- Priority-Based Scheduler (SCTP_SS_PRIO)
- Fair Capacity Scheduler (SCTP_SS_FC)
- Weighted Fair Queueing Scheduler (SCTP_SS_WFQ)
At the time of writing, the Linux stack supports only FCFS, RR and PRIO. RR_PKT, FC and WFQ are not implemented yet, although they should be fairly simple to add.
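To illustrate the PRIO idea with the video example from earlier, here is a sketch of a strict priority scheduler that works on whole messages. Two choices in it are assumptions made for the sketch, not details taken from the RFC or from Linux. First, a lower priority value is served first. Second, streams with equal priority are served in ascending SID order. FIFO order is kept within each stream.

```python
from collections import deque


def priority_schedule(messages, priority):
    """Strict priority scheduling over whole messages (simplified sketch).

    messages: (sid, label) pairs in the order the application queued them.
    priority: sid -> priority value. Assumptions of this sketch: a lower value
    is served first, and equal priorities are served in ascending SID order.
    Returns (sid, label) pairs in transmission order.
    """
    queues = {}
    for sid, label in messages:
        queues.setdefault(sid, deque()).append(label)  # FIFO within a stream
    order = []
    while queues:
        # Always serve the most urgent stream that still has data queued.
        sid = min(queues, key=lambda s: (priority[s], s))
        order.append((sid, queues[sid].popleft()))
        if not queues[sid]:
            del queues[sid]
    return order


# Key frames on stream #1 go out before delta frames on stream #2,
# even though some delta frames were queued first.
queued = [(2, "delta-1"), (2, "delta-2"), (1, "key-1"), (2, "delta-3"), (1, "key-2")]
print(priority_schedule(queued, priority={1: 0, 2: 1}))
```

A real scheduler makes this decision each time there is room to send, so newly queued high-priority messages can overtake data that is already waiting. The sketch only shows the ordering rule.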
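Please refer to the RFC to learn how to use these schedulers. It describes all the socket options needed: SCTP_STREAM_SCHEDULER selects the scheduler, and SCTP_STREAM_SCHEDULER_VALUE sets per-stream parameters such as the priority. Linux follows that API.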
The figure below shows an example of the Round-Robin Scheduler without User Message Interleaving. User Message Interleaving is also defined by RFC8260 and is described in the next section.
    +---+---+---+
    |    0/0    |-+
    +---+---+---+ |
                  |  +---+---+---+---+---+---+---+---+---+
    +---+---+---+ +->|1/2|1/1|2/0|2/0|2/0|1/0|0/0|0/0|0/0|
    |1/2|1/1|1/0|--->|---|---|---|---|---|---|---|---|---|
    +---+---+---+ +->| 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
                  |  +---+---+---+---+---+---+---+---+---+
    +---+---+---+ |
    |    2/0    |-+
    +---+---+---+

                         +-------+
    +-------+            |SID/SSN|
    |SID/SSN|            |-------|
    +-------+            |  TSN  |
                         +-------+
In this example there are three streams, represented by the boxes on the left side, with SIDs 0, 1 and 2. Streams #0 and #2 each have one large message queued up (with SSN 0), and three shorter messages are queued on stream #1 (with SSNs 0, 1 and 2). The output pipe and the scheduling outcome are represented on the right. Without support for User Message Interleaving, the stack has to transmit the entire message from stream #0 before sending the first small message from stream #1. Likewise, it has to send the entire message from stream #2 before the second one. This is where the new data chunk format is beneficial.
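The round-robin outcome in the figure can be modeled in the same simplified way. This is a sketch, not the Linux implementation. `round_robin_messages()` is a hypothetical name, and it visits streams in ascending SID order. Each turn sends one whole message from the next stream that has data. DATA fragments need consecutive TSNs, so a message cannot be split across turns. That is why stream #1 has to wait behind the large messages.

```python
from collections import deque


def round_robin_messages(queues):
    """Round-robin over streams without interleaving (simplified sketch).

    queues: sid -> list of (ssn, n_chunks) in queueing order.
    Each turn sends one whole message from the next stream that has data,
    visiting streams in ascending SID order. Returns (tsn, sid, ssn) tuples.
    """
    pending = {sid: deque(msgs) for sid, msgs in sorted(queues.items())}
    out, tsn = [], 0
    while any(pending.values()):
        for sid, queue in pending.items():
            if not queue:
                continue
            ssn, n_chunks = queue.popleft()
            # With DATA chunks, all fragments go out back to back
            # because they need consecutive TSNs.
            for _ in range(n_chunks):
                out.append((tsn, sid, ssn))
                tsn += 1
    return out


queues = {0: [(0, 3)], 1: [(0, 1), (1, 1), (2, 1)], 2: [(0, 3)]}
for tsn, sid, ssn in round_robin_messages(queues):
    print(f"TSN {tsn}: {sid}/{ssn}")
```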
Stream scheduling only affects the sender and is backward compatible. As a result, an endpoint can take advantage of it even if its peer is not aware of it.
User Message Interleaving (I-Data Chunk)
In the example above, suppose stream #1 had a higher priority than stream #0, but its messages were queued right after stream #0's (by another thread, for instance). Head-of-line blocking would occur, because the chunks from stream #1 must wait for all chunks of the message on stream #0 to be sent (but not necessarily acknowledged). Only then can the chunks from stream #1 be transmitted.
This happens because, in the DATA chunk, the TSN field actually serves three different purposes. Quoting RFC8260, the TSN field acts:
- As an identifier for DATA chunks, to provide a reliable transfer.
- As an identifier for the sequence of fragments, to allow reassembly.
- As a sequence number, allowing up to 2**16 - 1 Stream Sequence Numbers (SSNs) outstanding.
Because of the second purpose, the protocol requires all fragments of a User Message to have consecutive TSNs. Nothing else can be sent until the last fragment of the message is out.
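To see what the receiver gains from that requirement, consider how it might reassemble DATA chunks. In this sketch, `reassemble_data()` is a hypothetical helper that relies only on TSN adjacency. It starts at a chunk with the B flag and follows TSN+1 until it finds the E flag. A chunk from another message inside that run would violate the consecutive-TSN rule, so the sketch treats it as an error.

```python
def reassemble_data(chunks):
    """Reassemble User Messages from DATA chunks using TSN adjacency only.

    chunks: (tsn, sid, ssn, begin, end, payload) tuples, in any order.
    Returns {(sid, ssn): bytes} for every message whose fragments all arrived.
    Raises ValueError if a foreign chunk sits inside a fragment run, which a
    compliant DATA sender never produces.
    """
    by_tsn = {c[0]: c for c in chunks}
    messages = {}
    for first in sorted(by_tsn):
        _, sid, ssn, begin, _, _ = by_tsn[first]
        if not begin:
            continue  # only start walking at a B (beginning) fragment
        parts, tsn = [], first
        while tsn in by_tsn:  # stops early if a fragment is still missing
            _, c_sid, c_ssn, c_begin, c_end, payload = by_tsn[tsn]
            if (c_sid, c_ssn) != (sid, ssn) or (tsn != first and c_begin):
                raise ValueError(f"TSN {tsn} breaks the fragment run of {sid}/{ssn}")
            parts.append(payload)
            if c_end:
                messages[(sid, ssn)] = b"".join(parts)
                break
            tsn += 1  # the next fragment must have the next TSN
    return messages


in_order = [(0, 0, 0, True, False, b"he"), (1, 0, 0, False, False, b"ll"),
            (2, 0, 0, False, True, b"o")]
print(reassemble_data(in_order))  # {(0, 0): b'hello'}
```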
The new I-Data chunk fixes this head-of-line blocking issue by NOT overloading the TSN field. Instead, two new fields are added (MID and FSN) and the SSN field is removed. The MID identifies all chunks of a given User Message and is also used to ensure ordered delivery within the stream. The FSN is only needed when the User Message is fragmented, and it is a sequence number relative to that User Message only. All fragments of a User Message therefore carry the same MID and are told apart by their FSN. The TSN field is now only used to ensure reliability.
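The difference is easiest to see in the chunk headers. The following sketch packs both layouts with Python's `struct` module. The DATA header from RFC4960 is 16 bytes. The I-DATA header from RFC8260 is 20 bytes: the SSN slot becomes a reserved field, a 32-bit MID is added, and the last word holds the PPID on the first fragment and the FSN on the others. The type codes (0 for DATA, 64 for I-DATA) and the B/E/U flag bits follow the RFCs. The helper names are hypothetical, and the I flag and input validation are left out.

```python
import struct

DATA_TYPE, I_DATA_TYPE = 0, 64              # chunk type codes (RFC 4960, RFC 8260)
FLAG_E, FLAG_B, FLAG_U = 0x01, 0x02, 0x04   # Ending, Beginning, Unordered bits


def _pad(chunk: bytes) -> bytes:
    # Chunks are padded to a 4-byte boundary; padding is not counted in Length.
    return chunk + b"\x00" * (-len(chunk) % 4)


def data_chunk(tsn, sid, ssn, ppid, flags, payload):
    """DATA (RFC 4960): Type, Flags, Length, TSN, SID, SSN, PPID + user data."""
    length = 16 + len(payload)
    header = struct.pack("!BBHIHHI", DATA_TYPE, flags, length, tsn, sid, ssn, ppid)
    return _pad(header + payload)


def i_data_chunk(tsn, sid, mid, ppid_or_fsn, flags, payload):
    """I-DATA (RFC 8260): SSN slot becomes Reserved, a 32-bit MID is added,
    and the last word is the PPID when B is set, otherwise the FSN."""
    length = 20 + len(payload)
    header = struct.pack("!BBHIHHII", I_DATA_TYPE, flags, length,
                         tsn, sid, 0, mid, ppid_or_fsn)
    return _pad(header + payload)


# Middle fragment (neither B nor E) of message MID 0 on stream 0, with FSN 1.
frag = i_data_chunk(tsn=3, sid=0, mid=0, ppid_or_fsn=1, flags=0, payload=b"...")
print(len(frag), frag[:20].hex())
```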
With these changes, it is now possible to preempt the transmission of stream #0 in order to send other intervening messages. If we now introduce the new I-Data chunks on top of the previous example, messages would be scheduled as represented here:
    +---+---+---+
    |    0/0    |-+
    +---+---+---+ |
                  |  +-----+-----+-----+-----+-----+-----+-----+-----+-----+
    +---+---+---+ +->|2/0/2|1/2/0|0/0/2|2/0/1|1/1/0|0/0/1|2/0/0|1/0/0|0/0/0|
    |1/2|1/1|1/0|--->|-----|-----|-----|-----|-----|-----|-----|-----|-----|
    +---+---+---+ +->|  8  |  7  |  6  |  5  |  4  |  3  |  2  |  1  |  0  |
                  |  +-----+-----+-----+-----+-----+-----+-----+-----+-----+
    +---+---+---+ |
    |    2/0    |-+
    +---+---+---+

                         +-----------+
    +-------+            |SID/MID/FSN|
    |SID/MID|            |-----------|
    +-------+            |    TSN    |
                         +-----------+
Note how the transmission of message #0 on stream #0 (TSNs 0, 3 and 6) was interrupted to send chunks from the other streams.
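The interleaved schedule in the figure can be modeled by serving one chunk per stream per turn, rather than one whole message. This is a simplified sketch. `round_robin_chunks()` is a hypothetical name, and it visits streams in ascending SID order. It reproduces the figure, including TSNs 0, 3 and 6 for message #0 on stream #0.

```python
from collections import deque


def round_robin_chunks(queues):
    """Round-robin with User Message Interleaving (simplified sketch).

    queues: sid -> list of (mid, n_chunks) in queueing order.
    Each turn sends a single chunk from each stream that has data, in
    ascending SID order, so fragments of one message no longer need
    consecutive TSNs. Returns (tsn, sid, mid, fsn) tuples.
    """
    pending = {
        sid: deque((mid, fsn) for mid, n_chunks in msgs for fsn in range(n_chunks))
        for sid, msgs in sorted(queues.items())
    }
    out, tsn = [], 0
    while any(pending.values()):
        for sid, queue in pending.items():
            if queue:
                mid, fsn = queue.popleft()
                out.append((tsn, sid, mid, fsn))  # TSN now only tracks reliability
                tsn += 1
    return out


queues = {0: [(0, 3)], 1: [(0, 1), (1, 1), (2, 1)], 2: [(0, 3)]}
for tsn, sid, mid, fsn in round_robin_chunks(queues):
    print(f"TSN {tsn}: {sid}/{mid}/{fsn}")
```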
But unlike stream scheduling, the use of this new chunk must be negotiated during the association handshake. Once negotiated, it cannot be changed for that association. This means that an association uses either the old DATA chunk or the new I-Data chunk, but never both. The negotiation follows the usual scheme for SCTP extensions. An endpoint advertises its support for I-Data by listing it in the Supported Extensions Parameter, and thus signals its intent to use it. If the peer also indicates support, both peers must use only I-Data chunks. Otherwise, both peers must fall back to the old DATA chunk.
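A minimal model of that decision is sketched below. Each endpoint is represented by the set of chunk types it lists in the Supported Extensions Parameter of its INIT or INIT-ACK. `establish()` and `Association` are hypothetical names. The result is stored in a frozen dataclass to mirror the rule that the choice cannot change for the lifetime of the association.

```python
from dataclasses import dataclass

DATA = 0     # DATA chunk type (RFC 4960)
I_DATA = 64  # I-DATA chunk type (RFC 8260)


@dataclass(frozen=True)  # frozen: the choice is fixed for the association's lifetime
class Association:
    data_chunk_type: int


def establish(local_extensions, peer_extensions):
    """Decide which data chunk an association uses (simplified model).

    Each argument is the set of chunk types an endpoint lists in the
    Supported Extensions Parameter of its INIT or INIT-ACK.
    """
    both = I_DATA in local_extensions and I_DATA in peer_extensions
    return Association(I_DATA if both else DATA)


print(establish({I_DATA}, {I_DATA}))  # Association(data_chunk_type=64)
print(establish({I_DATA}, set()))     # Association(data_chunk_type=0)
```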
References
- RFC8260: Stream Schedulers and User Message Interleaving for the Stream Control Transmission Protocol, https://tools.ietf.org/html/rfc8260
- RFC4960: Stream Control Transmission Protocol, https://tools.ietf.org/html/rfc4960