JS Modules

TLDR: If anything here changes, re-run bun run build.

  • ./node contains all node:* modules
  • ./bun contains all bun:* modules
  • ./thirdparty contains npm modules we replace like ws
  • ./internal contains modules that aren't assigned to the module resolver

Each .ts/.js file above is assigned a numeric id at compile time and inlined into an array of lazily initialized modules. References between internal modules are heavily optimized: they skip the module resolver entirely.
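As a rough illustration of the idea, here is a minimal sketch of a registry that loads modules by numeric id and runs each initializer at most once. The names `LazyModuleRegistry`, `ModuleFactory`, and `requireId` as a method are hypothetical and chosen for this example; the real registry lives in native code and will differ.

```typescript
// Hypothetical sketch: these names are illustrative, not Bun's implementation.
type ModuleFactory = () => unknown;

class LazyModuleRegistry {
  private readonly factories: readonly ModuleFactory[];
  private readonly cache = new Map<number, unknown>();

  constructor(factories: readonly ModuleFactory[]) {
    this.factories = factories;
  }

  // Load a module by its numeric id. No name resolution happens here,
  // and the factory runs only on the first request.
  requireId(id: number): unknown {
    if (this.cache.has(id)) return this.cache.get(id);
    const factory = this.factories[id];
    if (factory === undefined) {
      throw new RangeError(`unknown internal module id: ${id}`);
    }
    const loaded = factory();
    this.cache.set(id, loaded);
    return loaded;
  }
}

let pathInitCount = 0;
const registry: LazyModuleRegistry = new LazyModuleRegistry([
  // id 0: counts how many times it is initialized
  () => {
    pathInitCount += 1;
    return { sep: "/" };
  },
  // id 1: references id 0 directly by number
  () => ({ path: registry.requireId(0) }),
]);

const firstLoad = registry.requireId(1);
const secondLoad = registry.requireId(1);
```

Nothing is initialized until the first `requireId` call, and the second call returns the cached object.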

Builtins Syntax

Within these files, the $ prefix on variables can be used to access private property names as well as JSC intrinsics.

// Many globals have private versions which are impossible for the user to
// tamper with. Though, these global variables are auto-prefixed by the bundler.
const hello = $Array.from(...);

// Similar situation with prototype values. These aren't autoprefixed since it depends on type.
something.$then(...);
map.$set(...);

// Internal variables we define
$requireMap.$has("elysia");

// JSC engine intrinsics. These usually translate directly to bytecode instructions.
const arr = $newArrayWithSize(5);
// A side effect of this is that using an intrinsic incorrectly like
// this will fail to parse and cause a segfault.
console.log($getInternalField)

V8 has a similar feature to this syntax (it uses % instead of $).

On top of this, we have some special functions that are handled by the builtin preprocessor:

  • require works, but it must be passed a string literal that resolves to a module within src/js. This call gets replaced with $getInternalField($internalModuleRegistry, <number>), which directly loads the module by its generated numerical ID, skipping the resolver for inter-internal modules.

  • $debug() is exactly like console.log, but is stripped in release builds. It is disabled by default, requiring you to pass one of: BUN_DEBUG_MODULE_NAME=1, BUN_DEBUG_JS=1, or BUN_DEBUG_ALL=1. You can also write if($debug) {} to check whether the debug env var is set.

  • $assert() in debug builds will assert the condition, but it is stripped in release builds. If an assertion fails, the program continues to run, but an error is logged in the console containing the original source condition and any extra messages specified.

  • IS_BUN_DEVELOPMENT is inlined to be true in all development builds.

  • process.platform and process.arch are properly inlined and DCE'd. Do use these to run different code on different platforms.
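To make the described $assert() behavior concrete, here is a small sketch of the semantics in plain TypeScript. `IS_DEBUG_BUILD`, `debugAssert`, and `assertionLog` are hypothetical stand-ins invented for this example (the array stands in for the console); the real expansion is produced by the builtin preprocessor and may look quite different.

```typescript
// Hypothetical stand-ins for illustration only; not Bun APIs.
const IS_DEBUG_BUILD: boolean = true;
const assertionLog: string[] = [];

function debugAssert(
  condition: unknown,
  sourceText: string, // the original source of the condition, as written
  ...messages: string[]
): void {
  // In a release build the call would be stripped; here it just returns.
  if (!IS_DEBUG_BUILD || condition) return;
  const extra = messages.length > 0 ? ` (${messages.join(" ")})` : "";
  // Record the failure instead of throwing, so execution continues.
  assertionLog.push(`ASSERTION FAILED: ${sourceText}${extra}`);
}

// A call written as `$assert(queue.length > 0, "queue drained early")`
// would carry its own source text along:
const queue: number[] = [];
debugAssert(queue.length > 0, "queue.length > 0", "queue drained early");
const reachedAfterFailure = true; // still runs after the failed assertion
```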

Builtin Modules

Files in node, bun, thirdparty, and internal are all bundled as "modules". These go through the preprocessor to construct a JS function, where export default/export function/etc are converted into a return statement. Due to this, non-type import statements are not supported.

export default controls the result of using require to import the module. When ESM imports this module (userland), all properties on this object are available as named exports. Named exports are preprocessed into properties on this default object.

const fs = require("fs"); // load another builtin module

export default {
  hello: 2,
  world: 3,
};

Keep in mind that these are not ES modules. export default is only syntax sugar to assign to the variable $exports, which is actually how the module exports its contents.
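As a mental model, a module body can be pictured as an ordinary function that fills in `$exports` and returns it. The sketch below is an assumption about the general shape, written by hand; `moduleBody` and `greet` are hypothetical names, and the actual preprocessor output may differ.

```typescript
// Hand-written approximation of a preprocessed module; not real codegen output.
function moduleBody(): Record<string, unknown> {
  let $exports: Record<string, unknown> = {};

  // Source: export default { hello: 2, world: 3 };
  $exports = { hello: 2, world: 3 };

  // Source: export function greet() { return "hi"; }
  // A named export becomes a property on the default object.
  $exports["greet"] = (): string => "hi";

  return $exports;
}

// What require() would hand back for this module:
const viaRequire = moduleBody();
// The names a userland ESM import would see as named exports:
const namedExports = Object.keys(viaRequire);
```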

Wiring one of these modules up to the resolver is done separately, in module_resolver.zig. Maybe in the future we can do codegen for it.

Builtin Functions

./functions contains isolated functions. Each function within is bundled separately, meaning you may not use global variables or non-type imports, or even directly reference the other functions in these files. require is still resolved the same way it is in the modules.

In function files, these are accessible in C++ by using <file><function>CodeGenerator(vm), for example:

object->putDirectBuiltinFunction(
  vm,
  globalObject,
  identifier,
  // ReadableStream.ts, `function readableStreamToJSON()`
  // This returns a FunctionExecutable* (extends JSCell*, but not JSFunction*).
  readableStreamReadableStreamToJSONCodeGenerator(vm),
  JSC::PropertyAttribute::DontDelete | 0
);
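The naming pattern can be spelled out with a small helper. This is inferred only from the single example above (ReadableStream.ts plus readableStreamToJSON); `codeGeneratorName` is a hypothetical function written for illustration, and the real codegen may apply rules this sketch does not cover.

```typescript
// Hypothetical helper: derives the C++ accessor name from the one example in the text.
function codeGeneratorName(fileName: string, functionName: string): string {
  const base = fileName.replace(/\.[^.]+$/, ""); // drop the file extension
  const lowerFirst = base.charAt(0).toLowerCase() + base.slice(1);
  const upperFirst =
    functionName.charAt(0).toUpperCase() + functionName.slice(1);
  return `${lowerFirst}${upperFirst}CodeGenerator`;
}

const generatorName = codeGeneratorName(
  "ReadableStream.ts",
  "readableStreamToJSON",
);
```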

Building

Run bun run build to bundle all the builtins. The output is placed in build/debug/js, where these files are loaded dynamically by bun-debug (an exact filepath is inlined into the binary pointing at where you cloned bun, so moving the binary to another machine may not work). In a release build, these get minified and inlined into the binary (Please commit those generated headers).

If you change the list of files or functions, you will have to run bun run build.

Notes on how the build process works

This isn't really required knowledge to use it, but here is a rough overview of how ./_codegen/* works.

The build process is built on top of Bun's bundler. The first step is scanning all modules and assigning each a numerical ID. The order is determined by an A-Z sort.
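The id assignment step can be sketched as follows. `assignModuleIds` is a hypothetical name, and using JavaScript's default string sort is an assumption; the text only says the order is an A-Z sort, so the real comparison may differ.

```typescript
// Hypothetical sketch of "sort, then number": not the actual codegen.
function assignModuleIds(paths: readonly string[]): Map<string, number> {
  // Default sort compares UTF-16 code units (an assumption about "A-Z").
  const sorted = [...paths].sort();
  const idByPath = new Map<string, number>();
  sorted.forEach((path, index) => idByPath.set(path, index));
  return idByPath;
}

const moduleIdTable = assignModuleIds([
  "node/path.ts",
  "bun/sqlite.ts",
  "internal/shared.ts",
]);
```

One consequence of this scheme is that adding or renaming a file can shift the ids of every module sorted after it, which fits the advice to re-run the build whenever the file list changes.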

The $ for private names is actually a lie: JSC uses @, but that is a syntax error in regular JS/TS, so we opted for $ to get better IDE support. So first we have to pre-process the files to spot all instances of $ at the start of an identifier and convert them to __intrinsic__. We also scan for require(string) and, after resolving the string to its integer id, replace it with $requireId(n); $requireId is defined in ./functions/Module.ts. export default is transformed into return ...;, but this transform is a little more complicated than a string replace because it supports export default not being the final statement, access to the underlying variable $exports, etc.

The preprocessor is smart enough to not replace $ in strings, comments, regex, etc. However, it is not a real JS parser and instead a recursive regex-based nightmare, so it may hit some edge cases. Yell at Chloe if it breaks.
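To give a feel for the approach, here is a deliberately simplified sketch of such a pass. It is not Bun's preprocessor: `preprocess` and `moduleIds` are hypothetical names, it only skips single- and double-quoted strings and comments (template literals and regex literals are not handled), and it emits `__intrinsic__requireId(n)` in one step rather than going through `$requireId(n)` first.

```typescript
// Simplified illustration only. Alternatives, in order:
//   1: strings and comments (kept verbatim)
//   2: require("literal")   (replaced with an id lookup)
//   3+4: an identifier with an optional leading `$`
const TOKEN =
  /("(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*'|\/\/[^\n]*|\/\*[\s\S]*?\*\/)|require\(\s*"([^"\\\n]+)"\s*\)|(\$?)([A-Za-z_][A-Za-z0-9_$]*)/g;

function preprocess(
  source: string,
  moduleIds: ReadonlyMap<string, number>, // hypothetical specifier -> id table
): string {
  return source.replace(
    TOKEN,
    (
      match: string,
      skipped?: string,
      requirePath?: string,
      dollar?: string,
      ident?: string,
    ): string => {
      if (skipped !== undefined) return match; // leave strings/comments alone
      if (requirePath !== undefined) {
        const id = moduleIds.get(requirePath);
        if (id === undefined) {
          throw new Error(`cannot resolve internal module: ${requirePath}`);
        }
        return `__intrinsic__requireId(${id})`;
      }
      // Only a `$` at the start of an identifier is rewritten.
      return dollar ? `__intrinsic__${ident ?? ""}` : match;
    },
  );
}

const preprocessInput = [
  'const fs = require("fs");',
  "const arr = $newArrayWithSize(5);",
  'map.$set("$key", 1); // costs $5',
].join("\n");

const preprocessOutput = preprocess(preprocessInput, new Map([["fs", 7]]));
```

Matching whole identifiers, rather than searching for `$` directly, is what keeps a name like `a$b` untouched in this sketch.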

The module is then printed like:

// @ts-nocheck
$$capture_start$$(function () {
  const path = __intrinsic__requireId(23);
  // user code is pasted here
  return {
    cool: path,
  };
}).$$capture_end$$;

This capture wrapper is used to extract the function declaration afterwards. It is more useful in the functions case, where functions can have arguments or be async functions.

After bundling, the inner part is extracted, and then __intrinsic__ is replaced with @.
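The extraction step could look roughly like this. `extractCaptured` is a hypothetical name, and the plain global replacement is an assumption made for brevity: it would also rewrite `__intrinsic__` inside string literals, which the real tooling may guard against.

```typescript
// Hypothetical sketch of pulling the function out of the capture markers.
function extractCaptured(bundled: string): string {
  const startMarker = "$$capture_start$$(";
  const endMarker = ").$$capture_end$$";
  const start = bundled.indexOf(startMarker);
  const end = bundled.lastIndexOf(endMarker);
  if (start === -1 || end === -1 || end < start) {
    throw new Error("capture markers not found");
  }
  // Keep only the function expression between the markers.
  const inner = bundled.slice(start + startMarker.length, end);
  // Swap the IDE-friendly placeholder for JSC's `@` prefix.
  return inner.split("__intrinsic__").join("@");
}

const bundledModule = [
  "// @ts-nocheck",
  "$$capture_start$$(function () {",
  "  const path = __intrinsic__requireId(23);",
  "  return { cool: path };",
  "}).$$capture_end$$;",
].join("\n");

const extractedSource = extractCaptured(bundledModule);
```

The result is no longer valid JavaScript for an ordinary parser, since `@requireId` only parses in JSC's builtin mode.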

These can then be inlined into C++ headers and loaded with createBuiltin. This is done in InternalModuleRegistry.cpp.