Fast UDP I/O for Firefox in Rust

Author: Max Inden
Date: September 14, 2025

---

Motivation

About 20% of Firefox's HTTP traffic uses HTTP/3, which runs over QUIC and therefore over UDP, so Firefox performs a significant amount of UDP I/O. Firefox has traditionally done this through NSPR, whose dated UDP functions (PR_SendTo, PR_RecvFrom) are thin wrappers around POSIX sendto and recvfrom. Modern operating systems offer more capable UDP APIs, such as sendmmsg/recvmmsg, and offloading techniques such as GSO (Generic Segmentation Offload) and GRO (Generic Receive Offload) that can improve performance.

Question: Can Firefox improve performance by replacing its old UDP I/O with modern system calls?

Overview

The project started in mid-2024 to rewrite Firefox's QUIC UDP I/O stack using modern system calls, in Rust for memory safety. To speed up development, it builds on quinn-udp, the UDP library of the Quinn QUIC project. The new stack has to work on every platform Firefox supports: Windows, Android, macOS, and Linux, including older versions of each.

By mid-2025 the new stack had rolled out to most Firefox users, with promising benchmark results: up to 4 Gbit/s of throughput on CPU-bound workloads, compared with under 1 Gbit/s before. After the optimizations, most CPU time is spent in I/O system calls and cryptography.

The Basics of UDP I/O

Single Datagram

Traditional UDP I/O sends or receives one datagram per system call (sendto/recvfrom). Every datagram crosses the user-kernel boundary separately, and that per-call overhead adds up at high data rates.

Batch of Datagrams

Modern operating systems provide system calls that send or receive multiple datagrams at once, such as sendmmsg and recvmmsg on Linux. The benefit is that the cost of one system call is amortized over many packets.

Single Large Segmented Datagram

With GSO, an application passes the kernel one large UDP payload, larger than the MTU, together with a segment size; the kernel or the NIC then splits it into MTU-sized packets. GRO works in the opposite direction: it coalesces multiple received packets into one large buffer and reports the segment size so the application can split it again. Offloading segmentation and coalescing this way improves efficiency further. Note: Wireshark does not yet support GSO, which complicates debugging at the network level.

Replacing NSPR in Firefox

The initial step replaced NSPR with quinn-udp while still sending one datagram at a time.
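The one-datagram-per-call pattern that this first step kept can be sketched with nothing but the Rust standard library. This is a minimal illustration, not quinn-udp's API: it uses std::net::UdpSocket, and the function name echo_one_at_a_time and the 1500-byte receive buffer are hypothetical choices. On Unix-like systems each send_to and recv_from call corresponds to one sendto or recvfrom system call.

```rust
use std::io;
use std::net::UdpSocket;
use std::time::Duration;

/// Hypothetical helper: sends each datagram with its own `send_to` call and
/// receives each one with its own `recv_from` call over loopback, returning
/// the number of I/O calls made and the datagrams received.
fn echo_one_at_a_time(datagrams: &[&[u8]]) -> io::Result<(usize, Vec<Vec<u8>>)> {
    // Two sockets on ephemeral loopback ports.
    let sender = UdpSocket::bind("127.0.0.1:0")?;
    let receiver = UdpSocket::bind("127.0.0.1:0")?;
    // Fail instead of blocking forever if a datagram never arrives.
    receiver.set_read_timeout(Some(Duration::from_secs(1)))?;
    let dest = receiver.local_addr()?;

    let mut calls = 0usize;
    for d in datagrams {
        // One sendto(2) per datagram on Unix-like systems.
        sender.send_to(d, dest)?;
        calls += 1;
    }

    let mut received = Vec::with_capacity(datagrams.len());
    // Illustrative MTU-sized buffer; larger datagrams would be truncated.
    let mut buf = [0u8; 1500];
    for _ in datagrams {
        // One recvfrom(2) per datagram; loopback is assumed to preserve order.
        let (len, _from) = receiver.recv_from(&mut buf)?;
        calls += 1;
        received.push(buf[..len].to_vec());
    }
    Ok((calls, received))
}
```

Moving three datagrams this way costs six calls, two per datagram; batching and segmentation offloading both aim to shrink that count.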
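Segmentation offloading, which the next upgrade enables where available, hinges on one number: the segment size. The sketch below models in plain Rust the rule that Linux GSO applies, namely that every segment has the segment size except the last, which may be shorter. The helpers coalesce and split_segments are hypothetical names for illustration, not quinn-udp functions. The same rule lets a receiver split a GRO- or URO-coalesced buffer back into QUIC packets, but only when it knows the segment size.

```rust
/// Hypothetical helper: coalesces datagrams into one buffer for a GSO-style
/// send. Returns the buffer and the segment size, or `None` if the batch
/// cannot be expressed as one segmented send: every datagram except the
/// last must have the same length, and the last must be non-empty and no
/// longer than the others.
fn coalesce(datagrams: &[&[u8]]) -> Option<(Vec<u8>, usize)> {
    // `None` for an empty batch.
    let (last, rest) = datagrams.split_last()?;
    // The segment size is the length of the first datagram.
    let segment_size = rest.first().map_or(last.len(), |d| d.len());
    let uniform = rest.iter().all(|d| d.len() == segment_size);
    if !uniform || last.is_empty() || last.len() > segment_size {
        return None;
    }
    Some((datagrams.concat(), segment_size))
}

/// Hypothetical helper: splits a coalesced buffer back into datagrams,
/// modelling what the kernel or NIC does on the GSO send path and what the
/// application must do after GRO/URO on the receive path. Each piece is
/// `segment_size` bytes, except possibly the last.
fn split_segments(buf: &[u8], segment_size: usize) -> Vec<&[u8]> {
    assert!(segment_size > 0, "segment size must be non-zero");
    buf.chunks(segment_size).collect()
}
```

For example, coalescing two 4-byte datagrams and one 2-byte datagram yields a 10-byte buffer with segment size 4, and split_segments recovers the three originals. Splitting the same buffer with a wrong segment size of 5 instead yields two 5-byte pieces that match none of the original datagrams, which is why a receive offload that omits the segment size is unusable.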
The next upgrade added batch processing of UDP datagrams, enabling multi-message system calls and segmentation offloading where the platform supports them. Encryption and decryption were also changed to operate in place. The focus here is on UDP I/O; details on the QUIC pipeline improvements are available elsewhere.

Platform Details

Windows

Windows provides WSASendMsg/WSARecvMsg, along with USO (UDP Segmentation Offload, the send-side counterpart of GSO) and URO (UDP Receive Offload, the receive-side counterpart of GRO). Replacing the single-datagram calls worked well.

Enabling URO on Windows ARM64 caused incompatibilities that made sites such as fosstodon.org fail to load. The issue was traced to URO not returning the segment size when WSL is enabled; without it, Firefox could not tell where one coalesced QUIC packet ended and the next began. As a result, URO stays disabled in Firefox on Windows.

Enabling USO led to increased packet loss and network driver crashes; the investigation is ongoing.

macOS

On macOS, Firefox switched from sendto/recvfrom to sendmsg/recvmsg without major issues. macOS does not support UDP segmentation offloading, but it does offer the undocumented batched system calls sendmsg_x and recvmsg_x. Support for them was added behind a feature flag but not shipped, because, being undocumented, they could change in future macOS releases.

Linux

Linux provides mature UDP optimizations: sendmmsg/recvmmsg as well as segmentation offloading (GSO/GRO). quinn-udp prefers GSO over batched sends because it performs better. For privacy, Firefox uses one UDP socket per connection. Since a multi-message system call operates on a single socket, this limits what batching across connections can gain, but all datagrams on such a socket go to the same peer, which is exactly the case segmentation offloading handles well. Firefox's sandboxing needed minor changes; beyond that, the Linux deployment succeeded and delivers all of the benefits.

Android

Android is Linux-like but differs in places; for example, on x86 it routes socket operations through the socketcall system call. Supporting Android 5 complicates these calls and required a small fix in quinn-udp. On older Android versions (API level 25 and below), setting the ECN bits may fail with an error, so a workaround retries the send without ECN.

Upstream Quinn improvements benefit Firefox immediately. Example: fix for Android