Mirror of https://github.com/oven-sh/bun (synced 2026-02-02 15:08:46 +00:00)
This PR significantly improves `Bun.stringWidth` to handle a wider variety of Unicode characters and escape sequences correctly.

## Zero-width character handling

Added support for many previously unhandled zero-width characters:

- Soft hyphen (U+00AD)
- Word joiner and invisible operators (U+2060-U+2064)
- Lone surrogates (U+D800-U+DFFF)
- Arabic formatting characters (U+0600-U+0605, U+06DD, U+070F, U+08E2)
- Indic script combining marks (Devanagari through Malayalam)
- Thai and Lao combining marks
- Combining Diacritical Marks Extended and Supplement
- Tag characters (U+E0000-U+E007F)

## ANSI escape sequence handling

### CSI sequences

- Now properly handles ALL CSI final bytes (0x40-0x7E), not just `m`
- This means cursor movement (A/B/C/D), erase (J/K), scroll (S/T), and other CSI commands are now correctly excluded from width calculation

### OSC sequences

- Added support for OSC sequences (ESC ] ... BEL/ST)
- OSC 8 hyperlinks are now properly handled
- Supports both BEL (0x07) and ST (ESC \) terminators

### ESC ESC fix

- Fixed state machine bug where `ESC ESC` would incorrectly reset state
- Now correctly handles consecutive ESC characters

## Emoji handling

Added proper grapheme-aware emoji width calculation:

- Flag emoji (regional indicator pairs) → width 2
- Skin tone modifiers → width 2
- ZWJ sequences (family, professions, etc.) → width 2
- Keycap sequences → width 2
- Variation selectors (VS15 for text, VS16 for emoji presentation)
- Uses ICU's `UCHAR_EMOJI` property for accurate emoji detection

## Test coverage

Added comprehensive test suite with **94 tests** covering:

- All zero-width character categories
- All CSI final bytes
- OSC sequences with various terminators
- Emoji edge cases (flags, skin tones, ZWJ, keycaps, variation selectors)
- East Asian width (CJK, fullwidth, halfwidth katakana)
- Indic and Thai script combining marks
- Fuzzer-like stress tests for robustness

## Breaking changes

This is a behavior change: `stringWidth` will return different values for some inputs. However, the new values are more accurate representations of terminal display width:

| Input | Old | New | Why |
|-------|-----|-----|-----|
| Flag emoji 🇺🇸 | 1 | 2 | Flags display as 2 cells |
| Skin tone 👋🏽 | 4 | 2 | Emoji + modifier = 1 grapheme |
| ZWJ family 👨👩👧 | 8 | 2 | ZWJ sequence = 1 grapheme |
| Word joiner U+2060 | 1 | 0 | Invisible character |
| OSC 8 hyperlinks | counted URL | just visible text | URLs are invisible |
| Cursor movement ESC[5A | counted | 0 | Control sequence |

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude Bot <claude-bot@bun.sh>
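To illustrate the escape-sequence rules described above, here is a minimal TypeScript sketch of a scanner that drops CSI and OSC sequences before any width is measured. It is not Bun's implementation: the function name `stripEscapes` is hypothetical, and the treatment of a lone ESC (dropping only the ESC byte) is a simplifying assumption. It only shows the three behaviors named in the description: any CSI final byte in 0x40-0x7E ends the sequence, OSC ends at BEL or ST, and `ESC ESC` does not derail the scan.

```typescript
// Hypothetical helper for illustration only; not taken from Bun's source.
// Returns the input with CSI and OSC escape sequences removed.
function stripEscapes(input: string): string {
  const ESC = "\x1b";
  const BEL = "\x07";
  let out = "";
  let i = 0;

  while (i < input.length) {
    const ch = input[i];
    if (ch !== ESC) {
      out += ch;
      i++;
      continue;
    }

    const next = input[i + 1];
    if (next === "[") {
      // CSI: skip parameter and intermediate bytes until a final byte
      // in the range 0x40-0x7E (not only `m`).
      i += 2;
      while (i < input.length) {
        const code = input.charCodeAt(i);
        i++;
        if (code >= 0x40 && code <= 0x7e) break;
      }
    } else if (next === "]") {
      // OSC: skip the payload until BEL (0x07) or ST (ESC \).
      i += 2;
      while (i < input.length) {
        if (input[i] === BEL) {
          i++;
          break;
        }
        if (input[i] === ESC && input[i + 1] === "\\") {
          i += 2;
          break;
        }
        i++;
      }
    } else {
      // Assumption: a lone ESC is dropped on its own. For `ESC ESC [`,
      // the second ESC is examined on the next iteration and still
      // starts a CSI sequence.
      i++;
    }
  }
  return out;
}

// OSC 8 hyperlink terminated with BEL: only the link text is kept.
const link = "\x1b]8;;https://example.com\x07docs\x1b]8;;\x07";
console.log(stripEscapes(link)); // prints "docs"
```

A real width calculation would then measure the remaining text per grapheme cluster; this sketch stops at stripping, since the width tables themselves are not reproduced in the description.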
npm install
bun run ffi
bun run log
bun run gzip
bun run async
bun run sqlite
# to use custom version of bun/deno/node binary
BUN=path/to/bun bun run ffi
# or edit .env file