This PR significantly improves `Bun.stringWidth` to handle a wider
variety of Unicode characters and escape sequences correctly.
## Zero-width character handling
Added support for many previously unhandled zero-width characters:
- Soft hyphen (U+00AD)
- Word joiner and invisible operators (U+2060-U+2064)
- Lone surrogates (U+D800-U+DFFF)
- Arabic formatting characters (U+0600-U+0605, U+06DD, U+070F, U+08E2)
- Indic script combining marks (Devanagari through Malayalam)
- Thai and Lao combining marks
- Combining Diacritical Marks Extended and Supplement
- Tag characters (U+E0000-U+E007F)
## ANSI escape sequence handling
### CSI sequences
- Now properly handles ALL CSI final bytes (0x40-0x7E), not just `m`
- This means cursor movement (A/B/C/D), erase (J/K), scroll (S/T), and
other CSI commands are now correctly excluded from width calculation
### OSC sequences
- Added support for OSC sequences (ESC ] ... BEL/ST)
- OSC 8 hyperlinks are now properly handled
- Supports both BEL (0x07) and ST (ESC \) terminators
### ESC ESC fix
- Fixed state machine bug where `ESC ESC` would incorrectly reset state
- Now correctly handles consecutive ESC characters
## Emoji handling
Added proper grapheme-aware emoji width calculation:
- Flag emoji (regional indicator pairs) → width 2
- Skin tone modifiers → width 2
- ZWJ sequences (family, professions, etc.) → width 2
- Keycap sequences → width 2
- Variation selectors (VS15 for text, VS16 for emoji presentation)
- Uses ICU's `UCHAR_EMOJI` property for accurate emoji detection
## Test coverage
Added comprehensive test suite with **94 tests** covering:
- All zero-width character categories
- All CSI final bytes
- OSC sequences with various terminators
- Emoji edge cases (flags, skin tones, ZWJ, keycaps, variation
selectors)
- East Asian width (CJK, fullwidth, halfwidth katakana)
- Indic and Thai script combining marks
- Fuzzer-like stress tests for robustness
## Breaking changes
This is a behavior change - `stringWidth` will return different values
for some inputs. However, the new values are more accurate
representations of terminal display width:
| Input | Old | New | Why |
|-------|-----|-----|-----|
| Flag emoji 🇺🇸 | 1 | 2 | Flags display as 2 cells |
| Skin tone 👋🏽 | 4 | 2 | Emoji + modifier = 1 grapheme |
| ZWJ family 👨👩👧 | 8 | 2 | ZWJ sequence = 1 grapheme |
| Word joiner U+2060 | 1 | 0 | Invisible character |
| OSC 8 hyperlinks | counted URL | just visible text | URLs are
invisible |
| Cursor movement ESC[5A | counted | 0 | Control sequence |
🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude Bot <claude-bot@bun.sh>
### What does this PR do?
Introduce `Bun.stripANSI`, a SIMD-accelerated drop-in replacement for
the popular `"strip-ansi"` package.
`Bun.stripANSI` performs >10x faster and fixes several bugs in
`strip-ansi`, like [this long-standing
one](https://github.com/chalk/strip-ansi/issues/43).
### How did you verify your code works?
There are tests that check the output of `strip-ansi` matches
`Bun.stripANSI`. For cases where `strip-ansi`'s behavior is incorrect,
the expected value is manually provided.
---------
Co-authored-by: Jarred-Sumner <709451+Jarred-Sumner@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Claude Bot <claude-bot@bun.sh>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: taylor.fish <contact@taylor.fish>