8 Commits

Author SHA1 Message Date
Jarred Sumner
98cee5a57e Improve Bun.stringWidth accuracy and robustness (#25447)
This PR significantly improves `Bun.stringWidth` to handle a wider
variety of Unicode characters and escape sequences correctly.

## Zero-width character handling

Added support for many previously unhandled zero-width characters:
- Soft hyphen (U+00AD)
- Word joiner and invisible operators (U+2060-U+2064)
- Lone surrogates (U+D800-U+DFFF)
- Arabic formatting characters (U+0600-U+0605, U+06DD, U+070F, U+08E2)
- Indic script combining marks (Devanagari through Malayalam)
- Thai and Lao combining marks
- Combining Diacritical Marks Extended and Supplement
- Tag characters (U+E0000-U+E007F)
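
As an illustration only, a few of the ranges above could be checked with a small range table. The names `ZERO_WIDTH_RANGES`, `isZeroWidthSketch`, and `naiveWidthSketch` are hypothetical, this is not Bun's implementation, and the sketch counts every other code point as width 1 (so it ignores wide and combining characters):

```typescript
// Hypothetical range table covering some of the zero-width code points
// listed above. Illustrative only, not Bun's actual lookup.
const ZERO_WIDTH_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x00ad, 0x00ad], // soft hyphen
  [0x0600, 0x0605], // Arabic formatting characters
  [0x06dd, 0x06dd],
  [0x070f, 0x070f],
  [0x08e2, 0x08e2],
  [0x2060, 0x2064], // word joiner and invisible operators
  [0xd800, 0xdfff], // lone surrogates
  [0xe0000, 0xe007f], // tag characters
];

function isZeroWidthSketch(codePoint: number): boolean {
  return ZERO_WIDTH_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}

// Assumption: every code point outside the table counts as one cell.
function naiveWidthSketch(text: string): number {
  let width = 0;
  for (let i = 0; i < text.length; ) {
    const cp = text.codePointAt(i)!;
    if (!isZeroWidthSketch(cp)) width += 1;
    // Advance by two UTF-16 code units for astral code points.
    i += cp > 0xffff ? 2 : 1;
  }
  return width;
}
```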

## ANSI escape sequence handling

### CSI sequences
- Now properly handles ALL CSI final bytes (0x40-0x7E), not just `m`
- This means cursor movement (A/B/C/D), erase (J/K), scroll (S/T), and
other CSI commands are now correctly excluded from width calculation
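
A minimal sketch of the idea, assuming the usual CSI layout of `ESC [`, then parameter and intermediate bytes (0x20-0x3F), then one final byte (0x40-0x7E). The function name `stripCsiSketch` is hypothetical, and leaving malformed or unterminated sequences in place is an assumption of this sketch, not a statement about Bun's behavior:

```typescript
// Removes complete CSI sequences regardless of which final byte ends them.
function stripCsiSketch(input: string): string {
  let out = "";
  let i = 0;
  while (i < input.length) {
    if (input[i] === "\x1b" && input[i + 1] === "[") {
      let j = i + 2;
      // Skip parameter and intermediate bytes (0x20-0x3F).
      while (j < input.length) {
        const c = input.charCodeAt(j);
        if (c >= 0x20 && c <= 0x3f) j++;
        else break;
      }
      // Accept any final byte in 0x40-0x7E, not just "m".
      if (j < input.length) {
        const final = input.charCodeAt(j);
        if (final >= 0x40 && final <= 0x7e) {
          i = j + 1;
          continue;
        }
      }
      // Malformed or unterminated: fall through and keep the bytes as-is.
    }
    out += input[i];
    i++;
  }
  return out;
}
```

For example, `stripCsiSketch("ab\x1b[5Acd")` drops the cursor-up sequence and leaves `"abcd"`.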

### OSC sequences
- Added support for OSC sequences (ESC ] ... BEL/ST)
- OSC 8 hyperlinks are now properly handled
- Supports both BEL (0x07) and ST (ESC \) terminators
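
The two terminators can be sketched as follows. `stripOscSketch` is a hypothetical name, `https://example.com` is a placeholder URL, and keeping unterminated sequences untouched is an assumption of this sketch:

```typescript
// Removes OSC sequences (ESC ] ... ) ended by BEL (0x07) or ST (ESC \).
function stripOscSketch(input: string): string {
  let out = "";
  let i = 0;
  while (i < input.length) {
    if (input[i] === "\x1b" && input[i + 1] === "]") {
      let end = -1;
      for (let j = i + 2; j < input.length; j++) {
        if (input[j] === "\x07") {
          end = j + 1; // BEL terminator is one byte
          break;
        }
        if (input[j] === "\x1b" && input[j + 1] === "\\") {
          end = j + 2; // ST terminator is two bytes
          break;
        }
      }
      if (end !== -1) {
        i = end;
        continue;
      }
      // Unterminated: fall through and keep the bytes as-is.
    }
    out += input[i];
    i++;
  }
  return out;
}

// An OSC 8 hyperlink wraps visible text in an opening and a closing sequence.
const linkWithBel = "\x1b]8;;https://example.com\x07link\x1b]8;;\x07";
const linkWithSt = "\x1b]8;;https://example.com\x1b\\link\x1b]8;;\x1b\\";
```

In both variants only the text `link` remains after stripping, which is the part a terminal would actually display.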

### ESC ESC fix
- Fixed state machine bug where `ESC ESC` would incorrectly reset state
- Now correctly handles consecutive ESC characters
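
One way to picture the fix is a small state machine in which a second ESC keeps the parser in the "escape seen" state instead of returning to plain text. This is a hypothetical sketch (`countVisibleSketch` and its states are invented for illustration), it only understands CSI sequences, and it assumes ASCII input where each visible character is one cell:

```typescript
type EscState = "text" | "esc" | "csi";

function countVisibleSketch(input: string): number {
  let state = "text" as EscState;
  let visible = 0;
  for (let i = 0; i < input.length; i++) {
    const c = input.charCodeAt(i);
    switch (state) {
      case "text":
        if (c === 0x1b) state = "esc";
        else visible++;
        break;
      case "esc":
        if (c === 0x1b) {
          // Stay in "esc": the second ESC may itself start a sequence.
          state = "esc";
        } else if (c === 0x5b) {
          state = "csi"; // "[" opens a CSI sequence
        } else {
          state = "text"; // assumption: treat as a two-byte escape
        }
        break;
      case "csi":
        // Any final byte in 0x40-0x7E ends the sequence.
        if (c >= 0x40 && c <= 0x7e) state = "text";
        break;
    }
  }
  return visible;
}
```

With this shape, `"\x1b\x1b[31mx"` yields 1, because the sequence after the second ESC is still recognized and skipped.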

## Emoji handling

Added proper grapheme-aware emoji width calculation:
- Flag emoji (regional indicator pairs) → width 2
- Skin tone modifiers → width 2
- ZWJ sequences (family, professions, etc.) → width 2
- Keycap sequences → width 2
- Variation selectors (VS15 for text, VS16 for emoji presentation)
- Uses ICU's `UCHAR_EMOJI` property for accurate emoji detection
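
To make "grapheme-aware" concrete, here is a rough sketch that segments a string with `Intl.Segmenter` and charges 2 cells per emoji-like grapheme. It swaps in the `Extended_Pictographic` and `Regional_Indicator` regex properties in place of ICU's `UCHAR_EMOJI`, ignores VS15/VS16 presentation rules and East Asian width, and uses the hypothetical names `EMOJI_LIKE` and `emojiAwareWidthSketch`:

```typescript
// Built via the RegExp constructor so the property escapes are only
// interpreted at runtime. U+20E3 is the combining enclosing keycap.
const EMOJI_LIKE = new RegExp(
  "[\\p{Extended_Pictographic}\\p{Regional_Indicator}\\u20E3]",
  "u",
);

function emojiAwareWidthSketch(text: string): number {
  // Cast because older TypeScript lib targets lack Intl.Segmenter typings.
  const segmenter = new (Intl as any).Segmenter(undefined, {
    granularity: "grapheme",
  });
  const segments: Array<{ segment: string }> = Array.from(
    segmenter.segment(text),
  );
  let width = 0;
  for (let i = 0; i < segments.length; i++) {
    // One grapheme is either an emoji cluster (2 cells) or, by assumption
    // in this sketch, a single narrow cell.
    width += EMOJI_LIKE.test(segments[i].segment) ? 2 : 1;
  }
  return width;
}
```

Because the whole cluster is measured at once, a flag, a skin-tone sequence, and a ZWJ family each count as 2 rather than as the sum of their code points.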

## Test coverage

Added comprehensive test suite with **94 tests** covering:
- All zero-width character categories
- All CSI final bytes
- OSC sequences with various terminators
- Emoji edge cases (flags, skin tones, ZWJ, keycaps, variation
selectors)
- East Asian width (CJK, fullwidth, halfwidth katakana)
- Indic and Thai script combining marks
- Fuzzer-like stress tests for robustness

## Breaking changes

This is a behavior change: `stringWidth` will return different values
for some inputs. The new values represent terminal display width more
accurately:

| Input | Old | New | Why |
|-------|-----|-----|-----|
| Flag emoji 🇺🇸 | 1 | 2 | Flags display as 2 cells |
| Skin tone 👋🏽 | 4 | 2 | Emoji + modifier = 1 grapheme |
| ZWJ family 👨‍👩‍👧 | 8 | 2 | ZWJ sequence = 1 grapheme |
| Word joiner U+2060 | 1 | 0 | Invisible character |
| OSC 8 hyperlinks | counted URL | just visible text | URLs are invisible |
| Cursor movement ESC[5A | counted | 0 | Control sequence |

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude Bot <claude-bot@bun.sh>
2025-12-10 16:17:57 -08:00
Jarred Sumner
cd6785771e run prettier and add back format action (#13722) 2024-09-03 21:32:52 -07:00
Meghan Denny
702cae51f6 test: Bun.stringWidth is enabled by default (#9321) 2024-03-08 17:54:51 -08:00
Meghan Denny
ed339b367d improve Bun.stringWidth's algorithm (#9022)
* improve Bun.stringWidth's algorithm

* add a bunch more tests from string-width package

* make typescript happy

* undo typescript changes

* use better #define check for debug mode

* properly handle latin1 width tests

* support grapheme clusters

* fix trailing newline

* visibleUTF16WidthFn- add fast path for leading ascii

* add firstNonASCII16IgnoreMin

* fix firstNonASCII16CheckMin

* vectorize visibleUTF16WidthFn

* support emoji variation selector

* expose stringWidth in release mode too

* vectorize visibleLatin1Width

* support ambiguousIsNarrow option

* add typescript definition for stringWidth
2024-02-22 19:16:17 -08:00
Georgijs Vilums
3bc0f90a7c skip invalid stringWidth test 2024-01-22 12:25:49 -08:00
Jarred Sumner
1560a866fe Skip stringWidth tests for now 2024-01-21 19:25:57 -08:00
Jarred Sumner
a8ff7be642 Disable Bun.stringWidth until failing test case passes 2024-01-21 06:10:07 -08:00
Jarred Sumner
b82656d9fc Introduce Bun.stringWidth (#8327)
* Introduce `Bun.stringWidth`

* [autofix.ci] apply automated fixes

* Update utils.md

---------

Co-authored-by: Jarred Sumner <709451+Jarred-Sumner@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2024-01-21 04:47:36 -08:00