Compare commits

...

10 Commits

Author SHA1 Message Date
Claude
147e058310 fix: eliminate double-deinit race in PathWatcher and PathWatcherManager
PathWatcher.deinit() previously called setClosed() (which locks/unlocks
the mutex) followed by hasPendingDirectories() (lockless atomic load).
Between these two operations, a worker thread in unrefPendingDirectory()
could acquire the mutex, see closed=true and pending_directories==0,
and set should_deinit=true — causing both threads to call destroy().

Fix by combining the closed flag store and pending_directories check
under a single mutex hold in deinit(). Since unrefPendingDirectory()
already checks isClosed() inside its own mutex hold, the two threads
can no longer both decide to proceed with cleanup.

Apply the same pattern to PathWatcherManager.deinit() where
hasPendingTasks() was checked outside the mutex after the watcher_count
gate, allowing the last task to complete unobserved.
2026-03-09 20:44:00 +00:00
Claude
b4d95e3485 fix: dereference SSLConfig SharedPtr in getTlsHostname
Merge with main pulled in #27838 (SSLConfig SharedPtr refactor) which
changed tls_props from a raw pointer to ?SSLConfig.SharedPtr. The
getTlsHostname function (from #27891) needs .get() to access the inner
struct.

https://claude.ai/code/session_013XzvW9VsevSPURzRLixE7G
2026-03-09 19:47:44 +00:00
Alistair Smith
4c12c774a0 Merge branch 'main' into path-watcher-deadlock-fix 2026-03-09 11:08:31 -07:00
robobun
11bc2d986c deflake: fix stdio passthrough test timeout on slow CI (#27868)
## Summary
- Increased timeout from 10s to 30s for the "it accepts stdio
passthrough" test to accommodate slow CI environments (Windows aarch64)
- Pipe stderr from the `bun install` step instead of inheriting, so
install failures produce actionable error messages
- Added explicit error check after install with stderr in the error
message
- Cleaned up variable naming to avoid confusing reassignment of
`stdout`/`stderr`/`exited`

## Test plan
- [x] Verified test passes locally with `bun bd test`
- [ ] Verify test passes consistently on Windows aarch64 CI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Alistair Smith <alistair@anthropic.com>
2026-03-09 09:58:49 -07:00
Alistair Smith
d5d00aa562 Merge branch 'main' into path-watcher-deadlock-fix 2026-03-06 11:38:21 -08:00
Alistair Smith
19a2f4c6ce Revert "regression test"
This reverts commit 5c578173e8.
2026-03-06 11:37:48 -08:00
Alistair Smith
72ba748643 Add some bounds 2026-03-03 17:20:36 -08:00
Alistair Smith
5c578173e8 regression test 2026-03-03 17:06:39 -08:00
Alistair Smith
59962f75e7 add one more deinit case (unrefPendingDirectory) 2026-03-02 10:59:58 -08:00
Chris Lloyd
d3379ab5a5 Fix PathWatcherManager deadlock and UAF in deferred deinit
Three issues in the deferred-deinit pattern used by
PathWatcherManager, all triggered by rapid fs.watch() churn.

## unregisterWatcher self-deadlock (primary)

The deinit-on-last-watcher defer was registered AFTER the
mutex.lock()/defer mutex.unlock() pair:

    this.mutex.lock();
    defer this.mutex.unlock();           // fires last
    var watchers = this.watchers.slice();
    defer {                               // fires BEFORE unlock
        if (this.deinit_on_last_watcher and this.watcher_count == 0) {
            this.deinit();                // called holding this.mutex
        }
    }

Zig defers fire LIFO, so deinit() ran while still holding
this.mutex. deinit() then re-acquires this.mutex when
hasPendingTasks() is true (a DirectoryRegisterTask running on
the workpool). os_unfair_lock is non-recursive -> the thread
blocks in __ulock_wait2 forever.

Trigger: onError() called (sets deinit_on_last_watcher), then
last watcher closed while a workpool task is mid-flight.

Also UAF: if deinit() completes it calls destroy(this), then
defer this.mutex.unlock() fires on freed memory.

Fix: hoist to a should_deinit flag checked by a defer
registered before the lock. LIFO ordering now yields
unlock -> deinit.

## unrefPendingTask UAF

Same shape: this.deinit() called while holding this.mutex. In
this path has_pending_tasks is stored false just before, so
deinit() doesn't re-lock -- but if it completes, destroy(this)
means the defer this.mutex.unlock() is a UAF. Same fix pattern.

## processWatcher AB/BA lock inversion

Worker thread holds watcher.mutex (processWatcher) -> calls
manager._decrementPathRef() on the OOM error path, which
acquires manager.mutex. Main thread unregisterWatcher holds
manager.mutex -> wants watcher.mutex. Classic AB/BA. Only on
allocation failure during file_paths.append, but real under
memory pressure.

Fix: capture append result, unlock watcher.mutex BEFORE
calling _decrementPathRef.

## Test plan

- zig fmt --check + zig ast-check (Bun's zig 0.15.2)
- bun run zig:check (full semantic analysis, 5/5)
- Full compile to bun-zig.o (asan + noasan), symbols verified
- No JS regression test: the trigger (onError() firing) requires
  platform-level watch loop failure which isn't reproducible
  via rmSync of watched paths (inotify/kqueue return deletion
  events, not errors). Stress tests pass on unpatched Linux.

🏠 Remote-Dev: homespace
2026-03-02 10:59:57 -08:00
3 changed files with 80 additions and 33 deletions

View File

@@ -39,12 +39,20 @@ pub const PathWatcherManager = struct {
     }

     fn unrefPendingTask(this: *PathWatcherManager) void {
+        // deinit() may destroy(this). Defer it until after unlock so we don't
+        // unlock() a freed mutex.
+        var should_deinit = false;
+        defer if (should_deinit) this.deinit();
         this.mutex.lock();
         defer this.mutex.unlock();
         this.pending_tasks -= 1;
-        if (this.deinit_on_last_task and this.pending_tasks == 0) {
+        if (this.pending_tasks == 0) {
+            // Clear unconditionally: if tasks drain to zero before deinit() runs,
+            // gating this on deinit_on_last_task leaves the flag stale-true and
+            // deinit() keeps deferring on a count that is already zero.
             this.has_pending_tasks.store(false, .release);
-            this.deinit();
+            if (this.deinit_on_last_task) should_deinit = true;
         }
     }
@@ -449,8 +457,13 @@ pub const PathWatcherManager = struct {
         {
             watcher.mutex.lock();
-            defer watcher.mutex.unlock();
-            watcher.file_paths.append(bun.default_allocator, child_path.path) catch |err| {
+            const append_result = watcher.file_paths.append(bun.default_allocator, child_path.path);
+            watcher.mutex.unlock();
+            // On error, drop the ref we took in _fdFromAbsolutePathZ. Must do
+            // this AFTER releasing watcher.mutex: _decrementPathRef acquires
+            // manager.mutex, and unregisterWatcher acquires manager.mutex before
+            // watcher.mutex — inverting here would AB/BA deadlock.
+            append_result catch |err| {
                 manager._decrementPathRef(entry_path_z);
                 return switch (err) {
                     error.OutOfMemory => .{ .err = .{
@@ -604,17 +617,22 @@ pub const PathWatcherManager = struct {
         this._decrementPathRefNoLock(file_path);
     }

-    // unregister is always called form main thread
+    // unregister is always called from main thread
     fn unregisterWatcher(this: *PathWatcherManager, watcher: *PathWatcher) void {
+        // Must defer deinit() to AFTER releasing this.mutex, for two reasons:
+        // 1. deinit() re-acquires this.mutex at line ~670 when hasPendingTasks() is
+        //    true. os_unfair_lock is non-recursive, so calling deinit() while holding
+        //    the lock self-deadlocks in __ulock_wait2.
+        // 2. deinit() may destroy(this). Unlocking a freed mutex is UAF.
+        // Zig defers fire LIFO, so registering this defer before the lock/unlock
+        // pair makes it fire last.
+        var should_deinit = false;
+        defer if (should_deinit) this.deinit();
         this.mutex.lock();
         defer this.mutex.unlock();
         var watchers = this.watchers.slice();
-        defer {
-            if (this.deinit_on_last_watcher and this.watcher_count == 0) {
-                this.deinit();
-            }
-        }
         for (watchers, 0..) |w, i| {
             if (w) |item| {
@@ -644,6 +662,8 @@ pub const PathWatcherManager = struct {
                 }
             }
         }
+        should_deinit = this.deinit_on_last_watcher and this.watcher_count == 0;
     }

     fn deinit(this: *PathWatcherManager) void {
@@ -661,12 +681,18 @@ pub const PathWatcherManager = struct {
             return;
         }

-        if (this.hasPendingTasks()) {
+        // Combine checking pending_tasks and setting deinit_on_last_task
+        // under a single mutex hold to prevent a race where the last task
+        // completes between the lockless hasPendingTasks() check and the
+        // mutex acquisition, causing neither thread to proceed with cleanup.
+        {
             this.mutex.lock();
             defer this.mutex.unlock();
-            // deinit when all tasks are done
-            this.deinit_on_last_task = true;
-            return;
+            if (this.pending_tasks > 0) {
+                this.deinit_on_last_task = true;
+                return;
+            }
+            this.has_pending_tasks.store(false, .release);
         }

         this.main_watcher.deinit(false);
@@ -824,12 +850,23 @@ pub const PathWatcher = struct {
     }

     pub fn unrefPendingDirectory(this: *PathWatcher) void {
+        // deinit() calls setClosed() which re-locks this.mutex, and may then
+        // proceed to destroy(this). Defer it until after unlock so we don't
+        // self-deadlock or unlock() a freed mutex.
+        var should_deinit = false;
+        defer if (should_deinit) this.deinit();
         this.mutex.lock();
         defer this.mutex.unlock();
         this.pending_directories -= 1;
-        if (this.isClosed() and this.pending_directories == 0) {
+        if (this.pending_directories == 0) {
+            // Clear unconditionally: if the scan drains to zero before close()
+            // runs (the common case — scan is fast, close happens later),
+            // gating this on isClosed() leaves the flag stale-true. deinit()
+            // then early-returns on hasPendingDirectories() forever,
+            // unregisterWatcher never runs, and every fd the scan opened leaks.
             this.has_pending_directories.store(false, .release);
-            this.deinit();
+            if (this.isClosed()) should_deinit = true;
         }
     }
@@ -874,10 +911,21 @@ pub const PathWatcher = struct {
     }

     pub fn deinit(this: *PathWatcher) void {
-        this.setClosed();
-        if (this.hasPendingDirectories()) {
-            // will be freed on last directory
-            return;
-        }
+        // Combine setting closed and checking pending_directories under a
+        // single mutex hold to prevent a double-deinit race: without this,
+        // a worker thread in unrefPendingDirectory() can observe closed=true
+        // and pending_directories==0 between our setClosed() and
+        // hasPendingDirectories() calls, causing both threads to proceed
+        // with destroy().
+        {
+            this.mutex.lock();
+            defer this.mutex.unlock();
+            this.closed.store(true, .release);
+            if (this.pending_directories > 0) {
+                // Will be freed by the last unrefPendingDirectory call.
+                return;
+            }
+            this.has_pending_directories.store(false, .release);
+        }

         if (this.manager) |manager| {

View File

@@ -46,7 +46,7 @@ fn getTlsHostname(client: *const HTTPClient, allowProxyUrl: bool) []const u8 {
     }

     // Prefer the explicit TLS server_name (e.g. from Node.js servername option)
     if (client.tls_props) |props| {
-        if (props.server_name) |sn| {
+        if (props.get().server_name) |sn| {
             const sn_slice = bun.sliceTo(sn, 0);
             if (sn_slice.len > 0) return sn_slice;
         }

View File

@@ -414,25 +414,26 @@ it("it accepts stdio passthrough", async () => {
    }),
  );

-  let { stdout, stderr, exited } = Bun.spawn({
+  await using installProc = Bun.spawn({
    cmd: [bunExe(), "install"],
    cwd: package_dir,
-    stdio: ["inherit", "inherit", "inherit"],
+    stdio: ["inherit", "pipe", "pipe"],
    env: bunEnv,
  });
-  expect(await exited).toBe(0);
+  const [installStderr, installExitCode] = await Promise.all([installProc.stderr.text(), installProc.exited]);
+  if (installExitCode !== 0) {
+    throw new Error(`bun install failed with exit code ${installExitCode}:\n${installStderr}`);
+  }

-  ({ stdout, stderr, exited } = Bun.spawn({
+  await using runProc = Bun.spawn({
    cmd: [bunExe(), "--bun", "run", "all"],
    cwd: package_dir,
    stdio: ["ignore", "pipe", "pipe"],
    env: bunEnv,
-  }));
-  console.log(package_dir);
-  const [err, out, exitCode] = await Promise.all([stderr.text(), stdout.text(), exited]);
+  });
+  const [err, out, exitCode] = await Promise.all([runProc.stderr.text(), runProc.stdout.text(), runProc.exited]);
  try {
    // This command outputs in either `["hello", "world"]` or `["world", "hello"]` order.
-    console.log({ err, out });
    expect([err.split("\n")[0], ...err.split("\n").slice(1, -1).sort(), err.split("\n").at(-1)]).toEqual([
      "$ run-p echo-hello echo-world",
      "$ echo hello",
@@ -442,12 +443,10 @@ it("it accepts stdio passthrough", async () => {
    expect(out.split("\n").slice(0, -1).sort()).toStrictEqual(["hello", "world"].sort());
    expect(exitCode).toBe(0);
  } catch (e) {
-    console.error({ exitCode });
-    console.log(err);
-    console.log(out);
+    console.error({ exitCode, err, out });
    throw e;
  }
-}, 10000);
+}, 30_000);

it.if(!isWindows)("spawnSync correctly reports signal codes", () => {
  const trapCode = `