This PR significantly improves `Bun.stringWidth` to handle a wider
variety of Unicode characters and escape sequences correctly.
## Zero-width character handling
Added support for many previously unhandled zero-width characters:
- Soft hyphen (U+00AD)
- Word joiner and invisible operators (U+2060-U+2064)
- Lone surrogates (U+D800-U+DFFF)
- Arabic formatting characters (U+0600-U+0605, U+06DD, U+070F, U+08E2)
- Indic script combining marks (Devanagari through Malayalam)
- Thai and Lao combining marks
- Combining Diacritical Marks Extended and Supplement
- Tag characters (U+E0000-U+E007F)
## ANSI escape sequence handling
### CSI sequences
- Now properly handles ALL CSI final bytes (0x40-0x7E), not just `m`
- This means cursor movement (A/B/C/D), erase (J/K), scroll (S/T), and
other CSI commands are now correctly excluded from width calculation
### OSC sequences
- Added support for OSC sequences (ESC ] ... BEL/ST)
- OSC 8 hyperlinks are now properly handled
- Supports both BEL (0x07) and ST (ESC \) terminators
### ESC ESC fix
- Fixed state machine bug where `ESC ESC` would incorrectly reset state
- Now correctly handles consecutive ESC characters
## Emoji handling
Added proper grapheme-aware emoji width calculation:
- Flag emoji (regional indicator pairs) → width 2
- Skin tone modifiers → width 2
- ZWJ sequences (family, professions, etc.) → width 2
- Keycap sequences → width 2
- Variation selectors (VS15 for text, VS16 for emoji presentation)
- Uses ICU's `UCHAR_EMOJI` property for accurate emoji detection
## Test coverage
Added comprehensive test suite with **94 tests** covering:
- All zero-width character categories
- All CSI final bytes
- OSC sequences with various terminators
- Emoji edge cases (flags, skin tones, ZWJ, keycaps, variation
selectors)
- East Asian width (CJK, fullwidth, halfwidth katakana)
- Indic and Thai script combining marks
- Fuzzer-like stress tests for robustness
## Breaking changes
This is a behavior change - `stringWidth` will return different values
for some inputs. However, the new values are more accurate
representations of terminal display width:
| Input | Old | New | Why |
|-------|-----|-----|-----|
| Flag emoji 🇺🇸 | 1 | 2 | Flags display as 2 cells |
| Skin tone 👋🏽 | 4 | 2 | Emoji + modifier = 1 grapheme |
| ZWJ family 👨👩👧 | 8 | 2 | ZWJ sequence = 1 grapheme |
| Word joiner U+2060 | 1 | 0 | Invisible character |
| OSC 8 hyperlinks | counted URL | just visible text | URLs are
invisible |
| Cursor movement ESC[5A | counted | 0 | Control sequence |
🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude Bot <claude-bot@bun.sh>
## Summary
- Fix assertion failure in `Bun.mmap` when `offset` or `size` options
are non-numeric values
- Add validation to reject negative `offset`/`size` with clear error
messages
Minimal reproduction: `Bun.mmap("", { offset: null });`
## Root Cause
`Bun.mmap` was calling `toInt64()` directly on the `offset` and `size`
options without validating they are numbers first. `toInt64()` has an
assertion that the value must be a number or BigInt, which fails when
non-numeric values like `null` or functions are passed.
## Test plan
- [x] Added tests for negative offset/size rejection
- [x] Added tests for non-number inputs (null, undefined)
- [x] `bun bd test test/js/bun/util/mmap.test.js` passes
Closes ENG-22413
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Bot <claude-bot@bun.sh>
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
- Fix assertion failure when `Bun.indexOfLine` is called with a
non-number offset argument
- Changed from `.to(u32)` to `.coerce(i32, globalThis)` for proper
JavaScript type coercion
## Test plan
- [x] Added regression test in `test/js/bun/util/index-of-line.test.ts`
- [x] `bun bd test test/js/bun/util/index-of-line.test.ts` passes
Closes ENG-21997
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Bot <claude-bot@bun.sh>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>
### What does this PR do?
This was creating `Zig::FFIFunction` when we could instead use a plain
`JSC::JSFunction`
### How did you verify your code works?
Added a test
### What does this PR do?
Adds support for `publicHoistPattern` in `bunfig.toml` and
`public-hoist-pattern` from `.npmrc`. This setting allows you to select
transitive packages to hoist to the root node_modules making them
available for all workspace packages.
```toml
[install]
# can be a string
publicHoistPattern = "@types*"
# or an array
publicHoistPattern = [ "@types*", "*eslint*" ]
```
`publicHoistPattern` only affects the isolated linker.
---
Adds `hoistPattern`. `hoistPattern` is the same as `publicHoistPattern`,
but applies to the `node_modules/.bun/node_modules` directory instead of
the root node_modules. Also the default value of `hoistPattern` is `*`
(everything is hoisted to `node_modules/.bun/node_modules` by default).
---
Fixes a determinism issue constructing the
`node_modules/.bun/node_modules` directory.
---
closes#23481closes#6160closes#23548
### How did you verify your code works?
Added tests for
- [x] only include patterns
- [x] only exclude patterns
- [x] mix of include and exclude
- [x] errors for unexpected expression types
- [x] excluding direct dependency (should still include)
- [x] match all with `*`
- [x] string and array expression types
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
### What does this PR do?
`short` is signed in C++ by default and not unsigned. Switched to
`uint16_t` so it's unambiguous.
### How did you verify your code works?
There is a test
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
## Summary
Fixes a panic that occurred when `console.log()` tried to format a Set
or Map instance with a non-numeric `size` property.
## Issue
When a Set or Map subclass overrides the `size` property with a
non-numeric value (like a constructor function, string, or other
object), calling `console.log()` on the instance would trigger a panic:
```javascript
class C1 extends Set {
constructor() {
super();
Object.defineProperty(this, "size", {
writable: true,
enumerable: true,
value: Set
});
console.log(this); // panic!
}
}
new C1();
```
## Root Cause
In `src/bun.js/ConsoleObject.zig`, the Map and Set formatting code
called `toInt32()` directly on the `size` property value. This function
asserts that the value is not a Cell (objects/functions), causing a
panic when `size` was overridden with non-numeric values.
## Solution
Changed both Map and Set formatting to use `coerce(i32, globalThis)`
instead of `toInt32()`. This properly handles non-numeric values using
JavaScript's standard type coercion rules and propagates any coercion
errors appropriately.
## Test Plan
Added regression tests to `test/js/bun/util/inspect.test.js` that verify
Set and Map instances with overridden non-numeric `size` properties can
be inspected without panicking.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Bot <claude-bot@bun.sh>
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
- Adds birthtime (file creation time) support on Linux using the `statx`
syscall
- Stores birthtime in architecture-specific unused fields of the kernel
Stat struct (x86_64 and aarch64)
- Falls back to traditional `stat` on kernels < 4.11 that don't support
`statx`
- Includes comprehensive tests validating birthtime behavior
Fixes#6585
## Implementation Details
**src/sys.zig:**
- Added `StatxField` enum for field selection
- Implemented `statxImpl()`, `fstatx()`, `statx()`, and `lstatx()`
functions
- Stores birthtime in unused padding fields (architecture-specific for
x86_64 and aarch64)
- Graceful fallback to traditional stat if statx is not supported
**src/bun.js/node/node_fs.zig:**
- Updated `stat()`, `fstat()`, and `lstat()` to use statx functions on
Linux
**src/bun.js/node/Stat.zig:**
- Added `getBirthtime()` helper to extract birthtime from
architecture-specific storage
**test/js/node/fs/fs-birthtime-linux.test.ts:**
- Tests non-zero birthtime values
- Verifies birthtime immutability across file modifications
- Validates consistency across stat/lstat/fstat
- Tests BigInt stats with nanosecond precision
- Verifies birthtime ordering relative to other timestamps
## Test Plan
- [x] Run `bun bd test test/js/node/fs/fs-birthtime-linux.test.ts` - all
5 tests pass
- [x] Compare behavior with Node.js - identical behavior
- [x] Compare with system Bun - system Bun returns epoch, new
implementation returns real birthtime
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Bot <claude-bot@bun.sh>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
## Summary
- Implements proper WebSocket subprotocol negotiation per RFC 6455 and
WHATWG standards
- Adds HeaderValueIterator utility for parsing comma-separated header
values
- Fixes WebSocket client to correctly validate server subprotocol
responses
- Sets WebSocket.protocol property to negotiated subprotocol per WHATWG
spec
- Includes comprehensive test coverage for all subprotocol scenarios
## Changes
**Core Implementation:**
- Add `HeaderValueIterator` utility for parsing comma-separated HTTP
header values
- Replace hash-based protocol matching with proper string set comparison
- Implement WHATWG compliant protocol property setting on successful
negotiation
**WebSocket Client (`WebSocketUpgradeClient.zig`):**
- Parse client subprotocols into StringSet using HeaderValueIterator
- Validate server response against requested protocols
- Set protocol property when server selects a matching subprotocol
- Allow connections when server omits Sec-WebSocket-Protocol header (per
spec)
- Reject connections when server sends unknown or empty subprotocol
values
**C++ Bindings:**
- Add `setProtocol` method to WebSocket class for updating protocol
property
- Export C binding for Zig integration
## Test Plan
Comprehensive test coverage for all subprotocol scenarios:
- ✅ Server omits Sec-WebSocket-Protocol header (connection allowed,
protocol="")
- ✅ Server sends empty Sec-WebSocket-Protocol header (connection
rejected)
- ✅ Server selects valid subprotocol from multiple client options
(protocol set correctly)
- ✅ Server responds with unknown subprotocol (connection rejected with
code 1002)
- ✅ Validates CloseEvent objects don't trigger [Circular] console bugs
All tests use proper WebSocket handshake implementation and validate
both client and server behavior per RFC 6455 requirements.
## Issues Fixed
Fixes#10459 - WebSocket client does not retrieve the protocol sent by
the server
Fixes#10672 - `obs-websocket-js` is not compatible with Bun
Fixes#17707 - Incompatibility with NodeJS when using obs-websocket-js
library
Fixes#19785 - Mismatch client protocol when connecting with multiple
Sec-WebSocket-Protocol
This enables obs-websocket-js and other libraries that rely on proper
RFC 6455 subprotocol negotiation to work correctly with Bun.
🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
### What does this PR do?
### How did you verify your code works?
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Claude Bot <claude-bot@bun.sh>
Co-authored-by: Claude <noreply@anthropic.com>
### What does this PR do?
Hopefully fix https://github.com/oven-sh/bun/issues/21879
### How did you verify your code works?
Added a test with a seed larger than u32.
The test vector is from this tiny test I wrote to rule out upstream zig
as the culprit:
```zig
const std = @import("std");
const testing = std.testing;
test "xxhash64 of short string with custom seed" {
const input = "";
const seed: u64 = 16269921104521594740;
const hash = std.hash.XxHash64.hash(seed, input);
const expected_hash: u64 = 3224619365169652240;
try testing.expect(hash == expected_hash);
}
```
### What does this PR do?
Introduce `Bun.stripANSI`, a SIMD-accelerated drop-in replacement for
the popular `"strip-ansi"` package.
`Bun.stripANSI` performs >10x faster and fixes several bugs in
`strip-ansi`, like [this long-standing
one](https://github.com/chalk/strip-ansi/issues/43).
### How did you verify your code works?
There are tests that check the output of `strip-ansi` matches
`Bun.stripANSI`. For cases where `strip-ansi`'s behavior is incorrect,
the expected value is manually provided.
---------
Co-authored-by: Jarred-Sumner <709451+Jarred-Sumner@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Claude Bot <claude-bot@bun.sh>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: taylor.fish <contact@taylor.fish>
I haven't checked all uses of tryTakeException but this bug is probably
not the only one.
Caught by running fuzzy-wuzzy with debug logging enabled. It tried to
print the exception. Updates fuzzy-wuzzy to have improved logging that
can tell you what was last executed before a crash.
---------
Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>