Bun itself is MIT-licensed.

## JavaScriptCore

Bun statically links JavaScriptCore (and WebKit), which is LGPL-2 licensed. WebCore files from WebKit are also licensed under LGPL-2. Per LGPL-2:

> (1) If you statically link against an LGPL’d library, you must also provide your application in an object (not necessarily source) format, so that a user has the opportunity to modify the library and relink the application.

You can find the patched version of WebKit used by Bun here: <https://github.com/oven-sh/webkit>. If you would like to relink Bun with changes:

- `git submodule update --init --recursive`
- `make jsc`
- `zig build`

This compiles JavaScriptCore, compiles Bun’s `.cpp` bindings for JavaScriptCore (which are the object files using JavaScriptCore), and outputs a new `bun` binary with your changes.

## Linked libraries

Bun statically links these libraries:

| Library | License |
|---------|---------|
| [`boringssl`](https://boringssl.googlesource.com/boringssl/) | [several licenses](https://boringssl.googlesource.com/boringssl/+/refs/heads/master/LICENSE) |
| [`brotli`](https://github.com/google/brotli) | MIT |
| [`libarchive`](https://github.com/libarchive/libarchive) | [several licenses](https://github.com/libarchive/libarchive/blob/master/COPYING) |
| [`lol-html`](https://github.com/cloudflare/lol-html/tree/master/c-api) | BSD 3-Clause |
| [`mimalloc`](https://github.com/microsoft/mimalloc) | MIT |
| [`picohttp`](https://github.com/h2o/picohttpparser) | dual-licensed under the Perl License or the MIT License |
| [`zstd`](https://github.com/facebook/zstd) | dual-licensed under the BSD License or the GPLv2 License |
| [`simdutf`](https://github.com/simdutf/simdutf) | Apache 2.0 |
| [`tinycc`](https://github.com/tinycc/tinycc) | LGPL v2.1 |
| [`uSockets`](https://github.com/uNetworking/uSockets) | Apache 2.0 |
| [`zlib-cloudflare`](https://github.com/cloudflare/zlib) | zlib |
| [`c-ares`](https://github.com/c-ares/c-ares) | MIT |
| [`libicu`](https://github.com/unicode-org/icu) 72 | [license here](https://github.com/unicode-org/icu/blob/main/icu4c/LICENSE) |
| [`libbase64`](https://github.com/aklomp/base64/blob/master/LICENSE) | BSD 2-Clause |
| [`libuv`](https://github.com/libuv/libuv) (on Windows) | MIT |
| [`libdeflate`](https://github.com/ebiggers/libdeflate) | MIT |
| [`uucode`](https://github.com/jacobsandlund/uucode) | MIT |
| A fork of [`uWebsockets`](https://github.com/jarred-sumner/uwebsockets) | Apache 2.0 |
| Parts of [Tigerbeetle's IO code](https://github.com/tigerbeetle/tigerbeetle/blob/532c8b70b9142c17e07737ab6d3da68d7500cbca/src/io/windows.zig#L1) | Apache 2.0 |

## Polyfills

For compatibility reasons, the following packages are embedded into Bun's binary and injected if imported.

| Package | License |
|---------|---------|
| [`assert`](https://npmjs.com/package/assert) | MIT |
| [`browserify-zlib`](https://npmjs.com/package/browserify-zlib) | MIT |
| [`buffer`](https://npmjs.com/package/buffer) | MIT |
| [`constants-browserify`](https://npmjs.com/package/constants-browserify) | MIT |
| [`crypto-browserify`](https://npmjs.com/package/crypto-browserify) | MIT |
| [`domain-browser`](https://npmjs.com/package/domain-browser) | MIT |
| [`events`](https://npmjs.com/package/events) | MIT |
| [`https-browserify`](https://npmjs.com/package/https-browserify) | MIT |
| [`os-browserify`](https://npmjs.com/package/os-browserify) | MIT |
| [`path-browserify`](https://npmjs.com/package/path-browserify) | MIT |
| [`process`](https://npmjs.com/package/process) | MIT |
| [`punycode`](https://npmjs.com/package/punycode) | MIT |
| [`querystring-es3`](https://npmjs.com/package/querystring-es3) | MIT |
| [`stream-browserify`](https://npmjs.com/package/stream-browserify) | MIT |
| [`stream-http`](https://npmjs.com/package/stream-http) | MIT |
| [`string_decoder`](https://npmjs.com/package/string_decoder) | MIT |
| [`timers-browserify`](https://npmjs.com/package/timers-browserify) | MIT |
| [`tty-browserify`](https://npmjs.com/package/tty-browserify) | MIT |
| [`url`](https://npmjs.com/package/url) | MIT |
| [`util`](https://npmjs.com/package/util) | MIT |
| [`vm-browserify`](https://npmjs.com/package/vm-browserify) | MIT |

## Additional credits

- The source code for Bun's JS transpiler, CSS lexer, and Node.js module resolver is a Zig port of [@evanw](https://github.com/evanw)’s [esbuild](https://github.com/evanw/esbuild) project.
- Credit to [@kipply](https://github.com/kipply) for the name "Bun"!