Since our first blog post on how to retrieve packet drop reasons in the Linux kernel, upstream development of the feature has continued and new additions have been made. Drop reasons can be retrieved manually, but they are also used by an increasing number of utilities such as the Network Observability operator for Red Hat OpenShift Container Platform, which can report packets being dropped with their reasons.

Let's look at what has happened recently in the drop reason space of the Linux kernel and how to avoid pitfalls, especially across kernel versions. Tools built on top of drop reasons, like the operator mentioned above, already do the right thing and need no special care. But as we saw in the previous article, drop reasons can also be retrieved manually when debugging networking issues, and that is error prone without a solid understanding of how they work or without the right tools.

Non-core drop reasons

In addition to core drop reasons, discussed in the previous blog post and defined in enum skb_drop_reason, support for registering non-core drop reasons was added. This allows other parts of the Linux networking stack to register their own drop reasons to improve visibility into why packets are being dropped there.

At the time of writing, two non-core parts of the Linux networking stack register their own drop reasons: the IEEE 802.11 stack (mac80211) and Open vSwitch.

This works by registering an additional set of drop reasons at runtime, which virtually extends the core definition. Since all drop reasons, core and non-core, have a unique value and can be passed to the same core functions, existing tools and facilities need no modification to report the raw values of the new drop reasons. However, converting those values to text is not supported everywhere, as we'll see below.
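As a rough illustration of how a single value can identify both the subsystem and the reason, the sketch below splits a raw drop reason into a subsystem identifier and a subsystem-local value. It assumes the layout used by the kernel headers at the time of writing, where the subsystem identifier sits in the upper 16 bits. Like the enums themselves, this layout is not a stable ABI. The function name split_drop_reason is made up for this example.

```shell
#!/bin/sh
# Split a raw drop reason into its subsystem id and subsystem-local value.
# Assumption: the subsystem id occupies the upper 16 bits of the raw value
# and core drop reasons use subsystem id 0. Check the headers of the kernel
# you are running before relying on this.
split_drop_reason() {
    raw=$(( $1 ))    # accepts decimal or 0x-prefixed hexadecimal input
    printf 'subsys=%d value=%d\n' $(( (raw >> 16) & 0xffff )) $(( raw & 0xffff ))
}

split_drop_reason 0x10002    # prints: subsys=1 value=2
```

A subsystem id of 0 would then indicate a core drop reason, and anything else a non-core one.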

Drop reasons pitfalls

As we just saw, converting drop reasons to text, especially non-core ones, is not always built in. But that is not the biggest pitfall. Drop reasons are defined in kernel enums and are not part of a stable ABI. This means their raw values can change between kernel releases, for example when a new reason is added between existing ones or when reasons are rearranged, and this has already happened a few times. Because of this, different versions of the Linux kernel, including Red Hat Enterprise Linux (RHEL), might report different raw values for the same drop reason.

This is not an issue for tools that convert the raw value to a text representation, but not all tools perform this translation. When only a raw drop reason value is available, it should be checked against the definitions of the running kernel. Of course, there are better ways.
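For reference, the manual check could look something like the sketch below. It assumes a kernel built with BTF (exposing /sys/kernel/btf/vmlinux) and an installed bpftool, and it only covers core drop reasons from enum skb_drop_reason; non-core reasons are defined in other enums. The helper drop_reason_name is hypothetical, written for this example.

```shell
#!/bin/sh
# drop_reason_name RAW [ENUM]
# Reads C enum definitions ("NAME = value,") on stdin and prints the
# enumerator of ENUM (default: skb_drop_reason) whose value equals RAW.
# Hypothetical helper, not part of any existing tool.
drop_reason_name() {
    awk -v want="$(( $1 ))" -v name="${2:-skb_drop_reason}" '
        # Start of the enum we are interested in.
        $1 == "enum" && $2 == name && $3 == "{" { inside = 1; next }
        # A closing brace ends the enum body.
        inside && substr($1, 1, 1) == "}" { inside = 0 }
        inside && $2 == "=" {
            val = $3
            sub(/,$/, "", val)    # strip the trailing comma
            if (val + 0 == want + 0) print $1
        }'
}

# Only query the running kernel when bpftool and its BTF data are available.
if command -v bpftool >/dev/null 2>&1 && [ -r /sys/kernel/btf/vmlinux ]; then
    # Look up the name the running kernel gives to raw value 2.
    bpftool btf dump file /sys/kernel/btf/vmlinux format c | drop_reason_name 2
fi
```

The name printed for a given raw value depends on the kernel it runs on, which is exactly why the lookup has to be done against the running kernel.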

Recommendations

There are two ways of converting a raw drop reason value to text that remain correct for the running kernel version: using an in-kernel conversion, or inspecting the running kernel's internal definitions and using those.

Below are three tools you can use to inspect drop reasons that (mostly) fit this requirement.

Perf

By adding a probe on the skb:kfree_skb tracepoint, we can use its in-kernel translation of drop reasons. However, at the time of writing, this implementation did not support converting non-core drop reasons to a text representation.

While this is not perfect, using perf on the above tracepoint is a good way of reporting drop reasons for drops happening in the core networking stack. It is also a very simple way of getting this information, as perf is widely available.

$ perf record -e skb:kfree_skb sleep 10
$ perf script
            curl 103998 [010] 40186.014474: skb:kfree_skb: [...] reason: NO_SOCKET
            curl 103998 [010] 40186.014555: skb:kfree_skb: [...] reason: NO_SOCKET
 irq/178-iwlwifi   1289 [000] 44222.379744: skb:kfree_skb: [...] reason: 0x10002

In the above example, we can see two packets dropped because no matching socket was found, and one packet dropped with a raw drop reason, 0x10002. This is a non-core drop reason; on the machine used, it corresponds to the mac80211 drop reason RX_DROP_U_REPLAY.
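When a capture contains many events, it can help to tally them per reason. The sketch below assumes perf script output shaped like the lines above, with a "reason:" field followed by its value. The function name count_drop_reasons is made up for this example, and the sample input is copied from the output shown earlier.

```shell
#!/bin/sh
# Count events per drop reason in "perf script" output read from stdin.
# Assumption: each event line contains "reason: <value>".
count_drop_reasons() {
    awk '
        match($0, /reason: [^ ]+/) {
            # Skip the 8 characters of "reason: " to keep only the value.
            count[substr($0, RSTART + 8, RLENGTH - 8)]++
        }
        END { for (r in count) print count[r], r }
    ' | sort -rn
}

count_drop_reasons <<'EOF'
            curl 103998 [010] 40186.014474: skb:kfree_skb: [...] reason: NO_SOCKET
            curl 103998 [010] 40186.014555: skb:kfree_skb: [...] reason: NO_SOCKET
 irq/178-iwlwifi   1289 [000] 44222.379744: skb:kfree_skb: [...] reason: 0x10002
EOF
# prints:
# 2 NO_SOCKET
# 1 0x10002
```

On a real capture, the same function can be fed directly with: perf script | count_drop_reasons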

Dropwatch

dropwatch uses the kernel dropmon infrastructure which is, at the time of writing, the only in-kernel implementation that reports non-core drop reasons as text. Because of this, using dropwatch is one of the preferred ways of inspecting drops in the kernel with their associated reasons.

For an example of how to use dropwatch, see the previous blog post on drop reasons.

Retis

Last but not least, a new kernel packet inspection tool was developed recently that supports collecting packets in various places of the Linux networking stack: Retis. When asked to report drop reasons, Retis converts them to a text representation at runtime by inspecting the running kernel's internal definitions using a technology called BPF Type Format (BTF). This means its translation of raw drop reasons to text is always correct, regardless of the kernel version running on the system.

Retis is highly configurable but provides sane built-in defaults such as its drop monitoring profile, dropmon:

$ retis -p dropmon collect
16:52:39 [INFO] Applying profile dropmon: Default
16:52:39 [INFO] 4 probe(s) loaded

40648351222101 [curl] 104769 [tp] skb:kfree_skb drop (NO_SOCKET)
    bpf_prog_0b1566e4b83190c5_sd_devices+0xce8d
    bpf_prog_0b1566e4b83190c5_sd_devices+0xce8d
    bpf_trace_run3+0x52
    kfree_skb_reason+0x8f
    tcp_v6_rcv+0x77
    ip6_protocol_deliver_rcu+0x6b
    ip6_input_finish+0x43
    __netif_receive_skb_one_core+0x62
    process_backlog+0x85
    __napi_poll+0x28
    net_rx_action+0x2a4
    __do_softirq+0xd1
    do_softirq.part.0+0x3d
    __local_bh_enable_ip+0x68
    __dev_queue_xmit+0x28e
    ip6_finish_output2+0x2ae
    ip6_finish_output+0x1e0
    ip6_xmit+0x2c0
    inet6_csk_xmit+0xe9
    __tcp_transmit_skb+0x56a
    tcp_connect+0xb37
    tcp_v6_connect+0x512
    __inet_stream_connect+0x10f
    inet_stream_connect+0x3a
    __sys_connect+0xa8
    __x64_sys_connect+0x18
    do_syscall_64+0x5d
    entry_SYSCALL_64_after_hwframe+0x6e
  if 1 (lo) rxif 1 ::1.52414 > ::1.80 ttl 64 label 0x98864 len 40 proto TCP (6) flags [S] seq 2567277025 win 33280

...

In the above example, we can see that an IPv6 packet to [::1]:80 was dropped because no socket was listening for that flow. Retis also reported detailed information about the packet itself, as well as a stack trace.

Thanks to its automatic translation of drop reasons and because it offers flexibility and additional features (probing in many places of the stack in parallel, packet tracking, conntrack and Open vSwitch support, post-processing capabilities, etc.), Retis is a good choice for tracking dropped packets as well as inspecting the Linux networking stack in general. A packet can be seen not only at the point where it is dropped, but also tracked throughout the networking stack.

Conclusion

Kernel support for drop reasons is increasing over time, and now includes drop reasons from non-core parts of the Linux networking stack. This is very good news, as it improves visibility and gives more insight into why packets are being dropped. While retrieving and making sense of drop reasons can be tricky due to their implementation, it's easy to avoid pitfalls by understanding how drop reasons work and by using the right tools. Non-core drop reasons are available in recent RHEL 9.2 releases and in RHEL 9.3.

Last updated: January 29, 2024