fix: handle Darwin accept() bug with socklen=0 in uSockets (#21791)

## Summary

Fixes a macOS kernel (XNU) bug where `accept()` can return a valid
socket descriptor but with `addrlen=0`, indicating an already-dead
socket.

This occurs when an IPv4 connection to an IPv6 dual-stack listener is
immediately aborted (RST packet). The fix detects this condition on
Darwin and handles it intelligently - preserving buffered data when
present, discarding truly dead sockets when not.

## Background

This implements the equivalent of the bugfix from capnproto:
https://github.com/capnproto/capnproto/pull/2365

The issue manifests as:
1. IPv4 connection made to IPv6 dual-stack listener
2. Connection immediately aborted (sends RST packet)  
3. `accept()` returns valid socket descriptor but `addrlen=0`
4. Socket may have buffered data from `connectx()` or be truly dead

## Enhanced Data-Preserving Solution

Unlike simple "close immediately" approaches, this fix **prevents data
loss** from the `connectx()` edge case:

**Race Condition Scenario:**
1. Client uses `connectx()` to send data immediately during connection
2. Network abort (RST) occurs after data is buffered but before full
connection establishment
3. Darwin kernel returns `socklen=0` but socket has buffered data
4. **Our fix preserves this data instead of losing it**

**Logic:**
```c
if (addr->len == 0) {
    /* Check if there's any pending data before discarding the socket */
    char peek_buf[1];
    ssize_t has_data = recv(accepted_fd, peek_buf, 1, MSG_PEEK | MSG_DONTWAIT);
    
    if (has_data <= 0) {
        /* No data available, socket is truly dead - discard it */
        bsd_close_socket(accepted_fd);
        continue; /* Try to accept the next connection */
    }
    /* If has_data > 0, let the socket through - there's buffered data to read */
}
```

## XNU Kernel Source Analysis

After investigating the Darwin XNU kernel source code, I found this bug
affects **multiple system calls**, not just `accept()`. The bug is
rooted in the kernel's socket layer when protocol-specific functions
return NULL socket addresses.

### Affected System Calls

#### 1. accept() and accept_nocancel()  FIXED
**Location:**
[`/bsd/kern/uipc_syscalls.c:596-605`](https://github.com/apple/darwin-xnu/blob/main/bsd/kern/uipc_syscalls.c#L596-L605)

```c
(void) soacceptlock(so, &sa, 0);
socket_unlock(head, 1);
if (sa == NULL) {
    namelen = 0;  // ← BUG: Returns socklen=0
    if (uap->name) {
        goto gotnoname;
    }
    error = 0;
    goto releasefd;
}
```

#### 2. getsockname() ⚠️ ALSO AFFECTED
**Location:**
[`/bsd/kern/uipc_syscalls.c:2601-2603`](https://github.com/apple/darwin-xnu/blob/main/bsd/kern/uipc_syscalls.c#L2601-L2603)

```c
if (sa == 0) {
    len = 0;  // ← SAME BUG: Returns socklen=0
    goto gotnothing;
}
```

#### 3. getpeername() ⚠️ ALSO AFFECTED  
**Location:**
[`/bsd/kern/uipc_syscalls.c:2689-2691`](https://github.com/apple/darwin-xnu/blob/main/bsd/kern/uipc_syscalls.c#L2689-L2691)

```c
if (sa == 0) {
    len = 0;  // ← SAME BUG: Returns socklen=0
    goto gotnothing;
}
```

### System Calls NOT Affected

#### connect() and connectx()  SAFE
**Locations:** 
-
[`/bsd/kern/uipc_syscalls.c:686-744`](https://github.com/apple/darwin-xnu/blob/main/bsd/kern/uipc_syscalls.c#L686-L744)
(connect)
-
[`/bsd/kern/uipc_syscalls.c:747+`](https://github.com/apple/darwin-xnu/blob/main/bsd/kern/uipc_syscalls.c#L747)
(connectx)

**Why they're safe:** These functions read socket addresses from
userspace via `getsockaddr()` and pass them to the protocol layer. They
don't receive socket addresses from the network stack, so they can't
encounter the `socklen=0` condition.

### Root Cause

The bug occurs when protocol layer functions (`pru_accept`,
`pru_sockaddr`, `pru_peeraddr`) return NULL socket addresses during
IPv4→IPv6 dual-stack connection race conditions. The kernel returns
`socklen=0` instead of treating it as an error case.

**Key XNU source reference:**
[`/bsd/kern/uipc_socket.c:1544`](https://github.com/apple/darwin-xnu/blob/main/bsd/kern/uipc_socket.c#L1544)
```c
error = (*so->so_proto->pr_usrreqs->pru_accept)(so, nam);
```

**Socket state vs buffered data:** From
[`/bsd/kern/uipc_socket2.c:2227`](https://github.com/apple/darwin-xnu/blob/main/bsd/kern/uipc_socket2.c#L2227):
```c
// Even with SS_CANTRCVMORE set, data can be buffered in so->so_rcv.sb_cc
return so->so_rcv.sb_cc >= so->so_rcv.sb_lowat ||
       ((so->so_state & SS_CANTRCVMORE) && cfil_sock_data_pending(&so->so_rcv) == 0)
```

## Changes

- Added Darwin-specific check in `bsd_accept_socket()` in
`packages/bun-usockets/src/bsd.c:708-720`
- When `addr->len == 0` after successful `accept()`:
  1. Check for buffered data with `recv(MSG_PEEK | MSG_DONTWAIT)`
  2. If data exists, let socket through normally (prevents data loss)
  3. If no data, close socket and continue accepting
- Only applies to `__APPLE__` builds to avoid affecting other platforms

## Test plan

- [x] Debug build compiles successfully
- [x] Basic HTTP server operations work correctly (exercises accept
path)
- [x] Regression test covers IPv4→IPv6 dual-stack connection abort
scenarios
- [x] Test verifies server doesn't crash/hang when encountering
socklen=0 condition
- [x] Enhanced fix preserves buffered data from connectx() edge cases

The regression test
(`test/regression/issue/darwin-accept-socklen-zero.test.ts`) creates the
exact conditions that trigger this kernel bug:
1. IPv6 dual-stack server (`hostname: "::"`)  
2. IPv4 connections (`127.0.0.1`) with immediate abort (RST packets)
3. Concurrent connection attempts to maximize race condition probability
4. Verification that server remains stable and responsive

## Impact Assessment

### For Bun's uSockets Implementation
- **accept() path:**  FIXED with data loss prevention - This PR handles
the primary case affecting network servers
- **connect() path:**  NOT VULNERABLE - connect() doesn't receive
kernel sockaddrs
- **connectx() path:**  NOT VULNERABLE - connectx() doesn't receive
kernel sockaddrs
- **connectx() data:**  PRESERVED - Enhanced fix prevents losing
buffered data from immediate sends

### Additional Considerations
While `getsockname()` and `getpeername()` have the same kernel bug,
they're less critical for server stability since servers primarily use
`accept()` for incoming connections.

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude Bot <claude-bot@bun.sh>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>
This commit is contained in:
robobun
2025-10-06 06:29:13 -07:00
committed by GitHub
parent ec050e6d6e
commit 639d998055

View File

@@ -717,6 +717,25 @@ LIBUS_SOCKET_DESCRIPTOR bsd_accept_socket(LIBUS_SOCKET_DESCRIPTOR fd, struct bsd
return LIBUS_SOCKET_ERROR;
}
#ifdef __APPLE__
/* A bug in XNU (the macOS kernel) can cause accept() to return a socket but addrlen=0.
* This happens when an IPv4 connection is made to an IPv6 dual-stack listener
* and the connection is immediately aborted (sends RST packet).
* However, there might be buffered data from connectx() before the abort. */
if (addr->len == 0) {
/* Check if there's any pending data before discarding the socket */
char peek_buf[1];
ssize_t has_data = recv(accepted_fd, peek_buf, 1, MSG_PEEK | MSG_DONTWAIT);
if (has_data <= 0) {
/* No data available, socket is truly dead - discard it */
bsd_close_socket(accepted_fd);
continue; /* Try to accept the next connection */
}
/* If has_data > 0, let the socket through - there's buffered data to read */
}
#endif
break;
}