Jarred Sumner 98cee5a57e Improve Bun.stringWidth accuracy and robustness (#25447)
This PR significantly improves `Bun.stringWidth` to handle a wider
variety of Unicode characters and escape sequences correctly.

## Zero-width character handling

Added support for many previously unhandled zero-width characters:
- Soft hyphen (U+00AD)
- Word joiner and invisible operators (U+2060-U+2064)
- Lone surrogates (U+D800-U+DFFF)
- Arabic formatting characters (U+0600-U+0605, U+06DD, U+070F, U+08E2)
- Indic script combining marks (Devanagari through Malayalam)
- Thai and Lao combining marks
- Combining Diacritical Marks Extended and Supplement
- Tag characters (U+E0000-U+E007F)
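
As a rough illustration of the list above, the following sketch builds a range table from only the code points spelled out explicitly in it. The names `ZERO_WIDTH_RANGES`, `isListedZeroWidth`, and `naiveWidth` are hypothetical and are not taken from Bun's source. The Indic, Thai, Lao, and combining-mark blocks are omitted because the list does not give their exact ranges.

```typescript
// Hypothetical sketch, NOT Bun's implementation: covers only the ranges
// that the list above gives as explicit code points.
const ZERO_WIDTH_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x00ad, 0x00ad], // soft hyphen
  [0x0600, 0x0605], // Arabic formatting characters (as grouped in the list)
  [0x06dd, 0x06dd],
  [0x070f, 0x070f],
  [0x08e2, 0x08e2],
  [0x2060, 0x2064], // word joiner and invisible operators
  [0xd800, 0xdfff], // lone surrogates
  [0xe0000, 0xe007f], // tag characters
];

function isListedZeroWidth(codePoint: number): boolean {
  return ZERO_WIDTH_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}

// Naive width: each code point counts as 1 cell unless it is in the table.
// East Asian width and emoji are deliberately ignored here.
function naiveWidth(text: string): number {
  let width = 0;
  for (let i = 0; i < text.length; ) {
    const cp = text.codePointAt(i) as number;
    if (!isListedZeroWidth(cp)) width += 1;
    i += cp > 0xffff ? 2 : 1; // skip both halves of a surrogate pair
  }
  return width;
}
```

A sorted range table keeps the check easy to audit against the list; a real implementation would more likely use generated Unicode property tables.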

## ANSI escape sequence handling

### CSI sequences
- Now properly handles ALL CSI final bytes (0x40-0x7E), not just `m`
- This means cursor movement (A/B/C/D), erase (J/K), scroll (S/T), and
other CSI commands are now correctly excluded from width calculation

### OSC sequences
- Added support for OSC sequences (ESC ] ... BEL/ST)
- OSC 8 hyperlinks are now properly handled
- Supports both BEL (0x07) and ST (ESC \) terminators

### ESC ESC fix
- Fixed state machine bug where `ESC ESC` would incorrectly reset state
- Now correctly handles consecutive ESC characters
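
The CSI, OSC, and `ESC ESC` rules described above can be sketched as a small scanner that removes escape sequences before any width is measured. This is an assumption-laden illustration, not the code from the PR: `stripEscapes` is a hypothetical name, and dropping a lone ESC is a choice made for this sketch.

```typescript
// Hypothetical sketch, NOT Bun's implementation.
const ESC = 0x1b;
const BEL = 0x07;

function stripEscapes(input: string): string {
  let out = "";
  let i = 0;
  while (i < input.length) {
    const c = input.charCodeAt(i);
    if (c !== ESC) {
      out += input.charAt(i);
      i += 1;
      continue;
    }
    const next = input.charCodeAt(i + 1); // NaN when ESC is the last char
    if (next === 0x5b) {
      // CSI: ESC [ ... ended by ANY final byte in 0x40-0x7E, not just `m`.
      i += 2;
      while (i < input.length) {
        const b = input.charCodeAt(i);
        i += 1;
        if (b >= 0x40 && b <= 0x7e) break;
      }
    } else if (next === 0x5d) {
      // OSC: ESC ] ... ended by BEL (0x07) or ST (ESC \).
      i += 2;
      while (i < input.length) {
        const b = input.charCodeAt(i);
        if (b === BEL) { i += 1; break; }
        if (b === ESC && input.charCodeAt(i + 1) === 0x5c) { i += 2; break; }
        i += 1;
      }
    } else {
      // Lone ESC, including the first of `ESC ESC`: drop only this byte so
      // the following ESC can still begin a sequence of its own.
      i += 1;
    }
  }
  return out;
}
```

For example, `stripEscapes("\x1b[5Aup")` yields `"up"`, and an OSC 8 hyperlink reduces to its visible link text whichever terminator is used.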

## Emoji handling

Added proper grapheme-aware emoji width calculation:
- Flag emoji (regional indicator pairs) → width 2
- Skin tone modifiers → width 2
- ZWJ sequences (family, professions, etc.) → width 2
- Keycap sequences → width 2
- Variation selectors (VS15 for text, VS16 for emoji presentation)
- Uses ICU's `UCHAR_EMOJI` property for accurate emoji detection
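
To make the grapheme-aware rules above concrete, here is a simplified sketch that folds modifiers, ZWJ-joined elements, keycaps, and variation selectors into the preceding base character. Everything in it is hypothetical: `sketchEmojiWidth` is not Bun's function, and `looksWide` is a crude code point range check standing in for the ICU `UCHAR_EMOJI` lookup mentioned in the list.

```typescript
// Hypothetical sketch, NOT Bun's implementation.
const ZWJ = 0x200d;
const VS15 = 0xfe0e; // text presentation
const VS16 = 0xfe0f; // emoji presentation
const KEYCAP = 0x20e3;

const isRegionalIndicator = (cp: number): boolean => cp >= 0x1f1e6 && cp <= 0x1f1ff;
const isSkinTone = (cp: number): boolean => cp >= 0x1f3fb && cp <= 0x1f3ff;
// Assumption: a rough range check in place of a real emoji property lookup.
const looksWide = (cp: number): boolean => cp >= 0x1f000 && cp <= 0x1faff;

function toCodePoints(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; ) {
    const cp = text.codePointAt(i) as number;
    out.push(cp);
    i += cp > 0xffff ? 2 : 1;
  }
  return out;
}

function sketchEmojiWidth(text: string): number {
  const cps = toCodePoints(text);
  const at = (k: number): number => (k < cps.length ? (cps[k] as number) : -1);
  let total = 0;
  let i = 0;
  while (i < cps.length) {
    const base = at(i);
    i += 1;
    let width = looksWide(base) ? 2 : 1;
    // A pair of regional indicators forms one flag.
    if (isRegionalIndicator(base) && isRegionalIndicator(at(i))) i += 1;
    // Absorb everything that extends the current cluster.
    while (true) {
      const cp = at(i);
      if (cp === VS16) { width = 2; i += 1; }
      else if (cp === VS15) { width = 1; i += 1; }
      else if (isSkinTone(cp) || cp === KEYCAP) { i += 1; }
      else if (cp === ZWJ && at(i + 1) !== -1) { i += 2; } // ZWJ plus joined element
      else break;
    }
    total += width;
  }
  return total;
}
```

Under these assumptions a flag, a waving hand with a skin tone modifier, a three-person ZWJ family, and a keycap sequence each come out as width 2.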

## Test coverage

Added comprehensive test suite with **94 tests** covering:
- All zero-width character categories
- All CSI final bytes
- OSC sequences with various terminators
- Emoji edge cases (flags, skin tones, ZWJ, keycaps, variation
selectors)
- East Asian width (CJK, fullwidth, halfwidth katakana)
- Indic and Thai script combining marks
- Fuzzer-like stress tests for robustness

## Breaking changes

This is a behavior change - `stringWidth` will return different values
for some inputs. However, the new values are more accurate
representations of terminal display width:

| Input | Old | New | Why |
|-------|-----|-----|-----|
| Flag emoji 🇺🇸 | 1 | 2 | Flags display as 2 cells |
| Skin tone 👋🏽 | 4 | 2 | Emoji + modifier = 1 grapheme |
| ZWJ family 👨‍👩‍👧 | 8 | 2 | ZWJ sequence = 1 grapheme |
| Word joiner U+2060 | 1 | 0 | Invisible character |
| OSC 8 hyperlinks | counted URL | just visible text | URLs are invisible |
| Cursor movement ESC[5A | counted | 0 | Control sequence |

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude Bot <claude-bot@bun.sh>
2025-12-10 16:17:57 -08:00

Bun

Documentation   •   Discord   •   Issues   •   Roadmap

Read the docs →

What is Bun?

Bun is an all-in-one toolkit for JavaScript and TypeScript apps. It ships as a single executable called bun.

At its core is the Bun runtime, a fast JavaScript runtime designed as a drop-in replacement for Node.js. It's written in Zig and powered by JavaScriptCore under the hood, dramatically reducing startup times and memory usage.

bun run index.tsx             # TS and JSX supported out-of-the-box

The bun command-line tool also implements a test runner, script runner, and Node.js-compatible package manager. Instead of 1,000 node_modules for development, you only need bun. Bun's built-in tools are significantly faster than existing options and usable in existing Node.js projects with little to no changes.

bun test                      # run tests
bun run start                 # run the `start` script in `package.json`
bun install <pkg>             # install a package
bunx cowsay 'Hello, world!'   # execute a package

Install

Bun supports Linux (x64 & arm64), macOS (x64 & Apple Silicon) and Windows (x64).

Linux users — Kernel version 5.6 or higher is strongly recommended, but the minimum is 5.1.

x64 users — if you see "illegal instruction" or similar errors, check our CPU requirements

# with install script (recommended)
curl -fsSL https://bun.com/install | bash

# on windows
powershell -c "irm bun.sh/install.ps1 | iex"

# with npm
npm install -g bun

# with Homebrew
brew tap oven-sh/bun
brew install bun

# with Docker
docker pull oven/bun
docker run --rm --init --ulimit memlock=-1:-1 oven/bun

Upgrade

To upgrade to the latest version of Bun, run:

bun upgrade

Bun automatically releases a canary build on every commit to main. To upgrade to the latest canary build, run:

bun upgrade --canary

View canary build

Contributing

Refer to the Project > Contributing guide to start contributing to Bun.

License

Refer to the Project > License page for information about Bun's licensing.

Description
Bun is a fast, incrementally adoptable all-in-one JavaScript, TypeScript & JSX toolkit. Use individual tools like bun test or bun install in Node.js projects, or adopt the complete stack with a fast JavaScript runtime, bundler, test runner, and package manager built in. Bun aims for 100% Node.js compatibility.