30 Commits

Author SHA1 Message Date
robobun
9fd5b20aa3 feat: Add WebKit text codec support for 24 additional encodings (#21835)
## Summary
This PR integrates WebKit's text codec implementations into Bun's
TextDecoder, adding support for 24 additional character encodings beyond
the native UTF-8, UTF-16, and Latin1.

Fixes https://github.com/oven-sh/bun/issues/11564

## What's New
### Supported Encodings (24 total)
- **11 single-byte encodings**: IBM866, ISO-8859-3/6/7/8/8-I, KOI8-U,
windows-874/1253/1255/1257
- **7 CJK encodings**: Big5, EUC-JP, ISO-2022-JP, Shift_JIS, EUC-KR,
GBK, GB18030
- **2 special encodings**: x-user-defined, replacement

### Implementation Details
- Integrated WebKit's text codec C++ implementations
- Generated static encoding tables from WHATWG spec (no ICU dependency)
- Created C++ wrapper for Zig/C++ interop
- All encoding aliases are supported (e.g., `sjis` → `shift_jis`)
- Proper whitespace trimming for encoding labels
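
A minimal sketch of the alias and whitespace handling listed above, assuming a runtime whose `TextDecoder` supports Shift_JIS (Bun with this change, or Node.js built with full ICU). The variable names are illustrative and do not come from the PR.

```javascript
// Labels are matched case-insensitively after leading and trailing ASCII
// whitespace is stripped, as described in the WHATWG Encoding Standard.
const aliasDecoder = new TextDecoder("  SJIS\n");
console.log(aliasDecoder.encoding); // "shift_jis"

// Same bytes as the Shift_JIS sample under "Example Usage".
const sjisBytes = new Uint8Array([0x82, 0xb1, 0x82, 0xf1]);
const aliasText = aliasDecoder.decode(sjisBytes);
const canonicalText = new TextDecoder("shift_jis").decode(sjisBytes);
console.log(aliasText === canonicalText); // true
```

The `encoding` property reports the canonical lowercase name regardless of which alias was passed in.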

## Testing
- Added comprehensive tests for all supported encodings
- Passes Web Platform Tests for single-byte decoders
- Passes Web Platform Tests for encoding labels
- All 2,227 tests pass

## Test Output
```
bun test v1.2.19 (9feaab47)
 2207 pass
 0 fail
 5012 expect() calls
Ran 2207 tests across 1 file. [899.00ms]
```

## Not Included
The following encodings were not added due to ICU data loading
constraints:
- ISO-8859-2, 4, 5, 10, 13, 14, 15, 16
- Windows-1250, 1251, 1254, 1256, 1258
- KOI8-R, macintosh, x-mac-cyrillic

## Example Usage
```javascript
// CJK encodings
const decoder = new TextDecoder("shift_jis");
const bytes = new Uint8Array([0x82, 0xb1, 0x82, 0xf1]);
console.log(decoder.decode(bytes)); // "こん"

// Single-byte encodings
const greekDecoder = new TextDecoder("iso-8859-7");
const greekBytes = new Uint8Array([0xC3, 0xe5, 0xe9, 0xdc]);
console.log(greekDecoder.decode(greekBytes)); // "Γειά"
```

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <claude@anthropic.ai>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2025-08-14 22:58:25 -07:00
190n
e23491391b bun run prettier (#19807)
Co-authored-by: 190n <7763597+190n@users.noreply.github.com>
2025-05-20 20:01:38 -07:00
Braden Everson
67b64c3334 Update TextDecoder's constructor to Handle Undefined (#19708)
Co-authored-by: Dylan Conway <35280289+dylan-conway@users.noreply.github.com>
2025-05-19 16:44:57 -07:00
pfg
a7b46ebbfe fastGet can throw (#19506)
Co-authored-by: Meghan Denny <meghan@bun.sh>
Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>
2025-05-14 22:14:20 -07:00
Jarred Sumner
14b439a115 Fix formatters not running in CI + delete unnecessary files (#19433) 2025-05-08 23:22:16 -07:00
pfg
f18a6d7be7 test-whatwg-encoding-custom-textdecoder-api-invalid-label.js (#19430) 2025-05-02 04:04:44 -07:00
ALBIN BABU VARGHESE
fbbc16fec6 Fixed TextDecoder fatal option showing invalid arg when giving 0 or 1 (#19378)
Co-authored-by: Albin <albinbabuvarghese@gmail.com>
2025-04-30 14:47:58 -07:00
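
A short sketch of the behavior fbbc16fec6 appears to target, assuming the `fatal` option is coerced to a boolean as Web IDL specifies instead of being rejected. The variable names are illustrative.

```javascript
// Non-boolean values for `fatal` are converted with ordinary truthiness.
const fatalFromOne = new TextDecoder("utf-8", { fatal: 1 });
const fatalFromZero = new TextDecoder("utf-8", { fatal: 0 });

console.log(fatalFromOne.fatal); // true
console.log(fatalFromZero.fatal); // false
```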
Dylan Conway
87279392cf fix 9395 (#14815) 2024-10-25 19:58:45 -07:00
Jarred Sumner
cd6785771e run prettier and add back format action (#13722) 2024-09-03 21:32:52 -07:00
Dylan Conway
5bd344281f fix(TextEncoder): domjit crash in encode (#13320)
Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>
2024-08-15 03:35:58 -07:00
Jarred Sumner
3a245dd248 upgrade webkit (#13192)
Co-authored-by: Dylan Conway <dylan.conway567@gmail.com>
Co-authored-by: Zack Radisic <zack@theradisic.com>
2024-08-12 23:17:17 -07:00
Dylan Conway
9302b42919 revert 84c91bf7e1 (#13214) 2024-08-09 19:28:08 -07:00
Ashcon Partovi
84c91bf7e1 Revert TextDecoderStream until next release (#13151) 2024-08-07 12:34:04 -07:00
Dylan Conway
9f7c6e34cb Add TextDecoderStream, TextEncoderStream, and TextDecoder.decode("", { stream: true}) (#13115) 2024-08-07 02:36:29 -07:00
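
For context on the `stream` option named in 9f7c6e34cb, a minimal sketch using only the standard `TextDecoder` API. The chunk boundaries are chosen for illustration: a three-byte UTF-8 character is split across two `decode` calls.

```javascript
const streamDecoder = new TextDecoder("utf-8");
const euroBytes = new Uint8Array([0xe2, 0x82, 0xac]); // UTF-8 for "€"

let streamed = "";
// The first chunk ends mid-character, so the decoder buffers it and emits nothing.
streamed += streamDecoder.decode(euroBytes.subarray(0, 1), { stream: true });
// The remaining bytes complete the character.
streamed += streamDecoder.decode(euroBytes.subarray(1), { stream: true });
// A final call without `stream` flushes any pending state.
streamed += streamDecoder.decode();

console.log(streamed); // "€"
```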
Dylan Conway
6303af3ce0 fix(TextDecoder): decoding sequences starting with 192 or 193 (#13043) 2024-08-02 23:01:34 -07:00
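
Bytes 192 and 193 (0xC0 and 0xC1) are never valid UTF-8 lead bytes. The sketch below shows the result the WHATWG Encoding Standard prescribes for a non-fatal decoder; the exact failure fixed by 6303af3ce0 is not stated here, so treat this as an illustration of the expected output only.

```javascript
const lenientDecoder = new TextDecoder("utf-8");

// Each invalid lead byte becomes U+FFFD; the ASCII bytes after it are kept.
const replaced = lenientDecoder.decode(new Uint8Array([0xc0, 0x41, 0xc1, 0x42]));

console.log(replaced === "\uFFFDA\uFFFDB"); // true
```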
Ciro Spaciari
1ba57351b0 fix(Bun.serve) fix mimetype with utf16 (#11695)
Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>
2024-06-08 22:34:06 -07:00
Jarred Sumner
4512a04820 Add missing code to TextDecoder "Invalid byte sequence" error (#9700)
* Fix missing `ERR_ENCODING_INVALID_ENCODED_DATA` code in TextDecoder

* Update text-decoder.test.js

---------

Co-authored-by: Jarred Sumner <709451+Jarred-Sumner@users.noreply.github.com>
2024-03-28 22:06:40 -07:00
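
A hedged sketch of how the error touched by 4512a04820 surfaces to callers. The `TypeError` is standard behavior for `fatal: true`; the `code` property is a Node.js-style extension that the commit message says Bun now sets.

```javascript
const strictDecoder = new TextDecoder("utf-8", { fatal: true });

let caught = null;
try {
  strictDecoder.decode(new Uint8Array([0xff])); // 0xFF never appears in valid UTF-8
} catch (error) {
  caught = error;
}

console.log(caught instanceof TypeError); // true
// Per the commit message, expected to be "ERR_ENCODING_INVALID_ENCODED_DATA".
console.log(caught.code);
```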
Jarred Sumner
47e7e004b1 Remove @known-failing-on-windows for tests which are no longer failing on windows 2024-01-24 21:03:32 -08:00
Jarred Sumner
e848c3f226 Get Bun.write tests to pass on Windows and bun:sqlite tests to pass (#8393)
* Move ReadFile and WriteFile to separate file

* Use libuv for Bun.write()

* Update windows_event_loop.zig

* build

* Get bun-write tests to pass. Implement Bun.write with two files.

* UPdate

* Update

* Update failing test list

* update

* More

* More

* More

* More

* Mark the rest

* ok

* oops

* Update bun-write.test.js

* Update blob.zig

---------

Co-authored-by: Jarred Sumner <709451+Jarred-Sumner@users.noreply.github.com>
Co-authored-by: Dave Caruso <me@paperdave.net>
Co-authored-by: Georgijs Vilums <georgijs.vilums@gmail.com>
2024-01-23 20:03:56 -08:00
dave caruso
072f2f15ea ci: run windows tests and also run them concurrently (#7758) 2024-01-12 17:02:20 -08:00
WingLim
476fa4deda feat(encoding): support BOM detection with test passed (#6074) 2023-10-03 10:28:59 -07:00
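
A minimal sketch of the standard BOM handling this feature concerns, using illustrative variable names: by default a leading UTF-8 byte order mark is dropped from the output, and `ignoreBOM: true` keeps it.

```javascript
// UTF-8 BOM (EF BB BF) followed by "hi".
const bomBytes = new Uint8Array([0xef, 0xbb, 0xbf, 0x68, 0x69]);

const stripped = new TextDecoder("utf-8").decode(bomBytes);
const kept = new TextDecoder("utf-8", { ignoreBOM: true }).decode(bomBytes);

console.log(stripped === "hi"); // true
console.log(kept === "\uFEFFhi"); // true
```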
Jarred Sumner
abfc10afeb Revert "feat(encoding): support BOM detection (#5550)"
This reverts commit 5f66b4e729.

This caused test failures in text-encoder. cc @WingLim
2023-09-21 07:10:07 -07:00
Jarred Sumner
01d2cb5d98 Prettier 2023-09-21 00:51:48 -07:00
WingLim
5f66b4e729 feat(encoding): support BOM detection (#5550)
* fix(encoding): export `getIgnoreBOM`

* feat(encoding): support ignoreBOM

* fix(encoding): not replace BOM to 0xFFFD

* chore: use strict equal
2023-09-20 18:44:05 -07:00
WingLim
a098c6e5f6 feat(encoding): TextDecoder support undefined (#5387)
* feat(encoding): TextDecoder support undefined

* chore: format test file
2023-09-16 22:41:52 -07:00
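
A short sketch of the case a098c6e5f6 covers, assuming the standard default: an `undefined` label falls back to UTF-8 instead of throwing.

```javascript
// Passing undefined explicitly behaves like omitting the argument.
const defaultDecoder = new TextDecoder(undefined);
const bothUndefined = new TextDecoder(undefined, undefined);

console.log(defaultDecoder.encoding); // "utf-8"
console.log(bothUndefined.encoding); // "utf-8"
```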
Dylan Conway
70a5cfe908 fix text decode trim (#4495)
* remove trim

* separate function

* a test

* trim when `stream` is true

---------

Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>
2023-09-05 17:53:31 -07:00
Jarred Sumner
ef89f03de6 Update text-decoder.test.js 2023-07-20 15:26:06 -07:00
Julian
c383c6cd81 Pass constructor arguments to TextDecoder (#3692)
* Make TextDecoder constructor use options parameter

The constructor now actually sets TextDecoder properties using the
options parameter.

* Defer decoder allocation to end of constructor

* Verify types of TextDecoder options

* TextDecoder throw TypeError on failure

* Tidying
2023-07-20 14:50:54 -07:00
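
A minimal sketch of what c383c6cd81 describes: the constructor's options object is reflected on the decoder's read-only properties. Only standard `TextDecoder` properties are used; the variable names are illustrative.

```javascript
const configuredDecoder = new TextDecoder("utf-8", { fatal: true, ignoreBOM: true });
const plainDecoder = new TextDecoder();

console.log(configuredDecoder.fatal, configuredDecoder.ignoreBOM); // true true
console.log(plainDecoder.fatal, plainDecoder.ignoreBOM); // false false
```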
Dylan Conway
a9c41c67e6 Fix several bugs (#2418)
* utf16 codepoint with replacement character

* Fix test failure with `TextEncoder("ascii')`

* Add missing type

* Fix Response.prototype.bodyUsed and Request.prototype.bodyUsed

* Fix bug with scrypt error not clearing

* Update server.zig

* oopsie
2023-03-18 00:55:05 -07:00
Ashcon Partovi
f7e4eb8369 Reorganize tests (#2332) 2023-03-07 12:22:34 -08:00