bun.sh/src/bun.js/bindings/EncodeURIComponent.cpp
SUZUKI Sosuke d7eebef6f8 Upgrade WebKit (#23122)
### What does this PR do?

- **Use `Latin1Character` instead of `LChar`**
- **Fix for 0875bc8f62**

### How did you verify your code works?

---

# WebKit Update Summary (September 2025)

## Overview
This document summarizes the major changes in WebKit/JavaScriptCore from
the September 2025 update. The update includes approximately 254
JSC-related commits with significant improvements to performance,
stability, and developer experience.

## Critical Bug Fixes

### Memory Safety
- **operationMaterializeObjectInOSR fix** (5c7aadfa0a96): Fixed
uninitialized Butterfly storage during OSR exits with sunk Array
allocations. This prevents potential crashes when arrays with holes are
materialized during OSR exit.
- **FTL materialization fixes** (a72d19840714, ed1e6fe03899): Added
missing internal object type handling in FTL materialization, improving
stability during optimization bailouts.

### Promise and Async Improvements
- **JSPromiseReaction object** (a1cb5e087a46, later reverted in
b0566a4db201): Initially introduced to improve promise reaction handling
but was reverted due to compatibility issues with Bun's modifications.
- **Async stack traces enhancements**:
  - Added support for `Promise.any` in async stack traces (d9a997b3edaa)
  - Added empty JSValue checking for async stack trace safety
    (9d26223d4bcb)
  - `Promise.all` support was added and later reverted due to performance
    concerns

## Performance Optimizations

### JIT Compiler Improvements
- **B3 Immutable Loads** (570a3530f949, 62300f8db3d9): Added
immutability annotations for loads, plus a CSE optimization in which
immutable loads can look for matching targets in dominators
- **BBQ JIT enhancements**:
  - Fixed callee-save register handling (c7ae05719045)
  - Simplified F32 copysign operations (e0651af57025)
- **DFG optimizations**:
  - Fixed RegExp constant folding with materialized NewRegExp nodes
    (7b53a04a5afa)
  - Improved RegExp object node handling in strength reduction
    (eeb65e05095b)

### WebAssembly Improvements
- **WASM SIMD Support**:
  - Added v128 support for IPInt call and tail-call instructions
    (73f0c9d430cb)
  - Implemented v128 support in local.get, local.set, global.get,
    global.set (67d7bf15139a)
  - Added x86_64 SIMD integer arithmetic and float instructions
- **WASM Memory Management**:
  - Introduced WasmInstanceAnchor for better instance lifecycle management
    (f9f1ed183bf7)
  - Attached AbstractHeap to wasm memory access for better optimization
    (f183c6f7def4)
  - Added signal handling for null checks in wasm (bf18b5b709f3)
- **WASM Debugging**: Added LLDB debugging infrastructure for
WebAssembly (e03c10225cc8)

## API and Language Features

### Iterator Helpers
- Merged `Iterator.prototype.sliding` into `Iterator.prototype.windows`
(1d49e823702d)
- Optimized iterator next method calls using CachedCall (5ee92514060c)

### Math Extensions
- Improved performance of `Math.sumPrecise` implementation
(602294057337)

### Error Handling
- Enhanced error messages for for-of loops without Symbol.iterator
(0051bbf2491f)

## Infrastructure Changes

### Character Type Refactoring
- **LChar to Latin1Character rename** (63b97b511366, 1424f0687876):
Major refactoring replacing the `LChar` type with `Latin1Character`
throughout the codebase for better clarity
- Additional fixes for Latin1Character usage (711eab3243f0,
50bf8e6fd4ca, 88e29ab76aec)
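
For call sites, the rename is mostly mechanical: code that handled 8-bit string data as `LChar` now spells the type `Latin1Character`. The sketch below illustrates the idea in standalone C++. The `Latin1Character` alias here is a stand-in assumed to be `unsigned char` so the example compiles on its own (the real definition lives in WTF and may differ), and `widenLatin1` is a hypothetical helper, not a WebKit or Bun function. It relies only on the fact that every Latin-1 value maps to the Unicode code point with the same number.

```cpp
#include <string>
#include <vector>

// Assumption: stand-in alias for illustration only. The real type is
// defined by WTF and its underlying type may differ.
using Latin1Character = unsigned char;

// Hypothetical helper: widens Latin-1 data to UTF-16. Each Latin-1 value
// equals its Unicode code point, so a plain cast is enough.
std::u16string widenLatin1(const std::vector<Latin1Character>& characters)
{
    std::u16string result;
    result.reserve(characters.size());
    for (Latin1Character character : characters)
        result.push_back(static_cast<char16_t>(character));
    return result;
}
```

Passing the bytes `'c', 'a', 'f', 0xE9` yields the UTF-16 string `café`, since 0xE9 is both the Latin-1 encoding and the code point of `é`.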

### Build System
- Fixed builds with GCC 15.x (e33b18bc59d6)
- Added gitattributes for JSC test files (82c4cc796da6)
- Improved test runner with comprehensive verbose logging (7ef95c177a42)
- Added memory-limited annotations for tests using excessive memory
(b991cd17d612)

### Testing Infrastructure
- Improved handling of missing test executables (db1e3bbb3be2)
- Added support for non-customized ICU 74.2 in intl tests (c922a28b6642)
- Fixed various test configuration issues and timeouts

## Bun-Specific Modifications

### Preserved Customizations
- Maintained `BUN_JSC_ADDITIONS` for Bun-specific features
- Kept async context support for AsyncLocalStorage
- Preserved V8 heap snapshot compatibility layer
- Maintained custom inspector extensions

### Conflicts Resolved
- Successfully merged upstream changes while preserving Bun's event loop
integration
- Resolved conflicts in promise handling while maintaining Bun's async
behavior
- Fixed re-declaration issues with `isAsyncFrame` for async stack traces

## Breaking Changes and Reverts

### Reverted Features
1. **JSPromiseReaction object**: Reverted due to conflicts with Bun's
promise handling
2. **Promise.all async stack trace support**: Reverted due to ~4%
performance regression in JetStream3/doxbee-async benchmark
3. **Array.prototype.flat C++ implementation**: Reverted (reason not
specified in commit)

## Security Improvements
- Type safety improvements with uncheckedDowncast for Wasm::Callee
(48425afd643d)
- Added bounds checking and validation for Wasm array operations
(b5148db1c4c1)
- Improved memory safety with proper initialization of materialized
objects

## Platform Support
- macOS: Continued support for x64/arm64
- Linux: Maintained glibc/musl compatibility
- Windows: Preserved x64 support
- Fixed platform-specific alignment issues for x86_64 (94a60eb123c5)

## Notable Debugging Enhancements
- LLDB infrastructure for WebAssembly debugging
- Improved verbose command logging in test runners
- Enhanced stack trace capabilities for async functions
- Better error reporting for missing Symbol.iterator

## Performance Metrics
- Several memory optimizations for test execution
- JIT memory reservation size adjustments for debug builds
- Optimized iterator operations with cached calls
- Improved Math.sumPrecise performance

## Future Considerations
- The JSPromiseReaction implementation may need revisiting with adjusted
architecture
- Async stack trace support for Promise.all requires performance
optimization
- Continued work on WASM SIMD support for additional operations

## Migration Notes for Bun Team
1. **LChar usage**: All references to `LChar` have been replaced with
`Latin1Character`
2. **Promise handling**: The reverted JSPromiseReaction changes indicate
potential architectural conflicts that may need addressing
3. **Test configuration**: New memory-limited annotations should be used
for memory-intensive tests
4. **Build flags**: Ensure USE_BUN_JSC_ADDITIONS and USE_BUN_EVENT_LOOP
remain enabled
2025-10-01 17:16:25 -07:00

102 lines
4.1 KiB
C++

#include "EncodeURIComponent.h"

// from JSGlobalObjectFunctions.cpp
namespace JSC {

template<typename CharacterType>
static WebCore::ExceptionOr<void> encode(VM& vm, const WTF::BitSet<256>& doNotEscape, std::span<const CharacterType> characters, StringBuilder& builder)
{
    auto scope = DECLARE_THROW_SCOPE(vm);

    // 18.2.6.1.1 Runtime Semantics: Encode ( string, unescapedSet )
    // https://tc39.github.io/ecma262/#sec-encode

    auto throwException = [] {
        return WebCore::ExceptionOr<void>(WebCore::Exception { WebCore::EncodingError, "String contained an illegal UTF-16 sequence."_s });
    };

    builder.reserveCapacity(characters.size());

    // 4. Repeat
    auto* end = characters.data() + characters.size();
    for (auto* cursor = characters.data(); cursor != end; ++cursor) {
        auto character = *cursor;

        // 4-c. If C is in unescapedSet, then
        if (character < doNotEscape.size() && doNotEscape.get(character)) {
            // 4-c-i. Let S be a String containing only the code unit C.
            // 4-c-ii. Let R be a new String value computed by concatenating the previous value of R and S.
            builder.append(static_cast<Latin1Character>(character));
            continue;
        }

        // 4-d-i. If the code unit value of C is not less than 0xDC00 and not greater than 0xDFFF, throw a URIError exception.
        if (U16_IS_TRAIL(character))
            return throwException();

        // 4-d-ii. If the code unit value of C is less than 0xD800 or greater than 0xDBFF, then
        // 4-d-ii-1. Let V be the code unit value of C.
        char32_t codePoint;
        if (!U16_IS_LEAD(character))
            codePoint = character;
        else {
            // 4-d-iii. Else,
            // 4-d-iii-1. Increase k by 1.
            ++cursor;

            // 4-d-iii-2. If k equals strLen, throw a URIError exception.
            if (cursor == end)
                return throwException();

            // 4-d-iii-3. Let kChar be the code unit value of the code unit at index k within string.
            auto trail = *cursor;

            // 4-d-iii-4. If kChar is less than 0xDC00 or greater than 0xDFFF, throw a URIError exception.
            if (!U16_IS_TRAIL(trail))
                return throwException();

            // 4-d-iii-5. Let V be UTF16Decode(C, kChar).
            codePoint = U16_GET_SUPPLEMENTARY(character, trail);
        }

        // 4-d-iv. Let Octets be the array of octets resulting by applying the UTF-8 transformation to V, and let L be the array size.
        Latin1Character utf8OctetsBuffer[U8_MAX_LENGTH];
        unsigned utf8Length = 0;
        // We can use U8_APPEND_UNSAFE here since codePoint is either
        // 1. non surrogate one, correct code point.
        // 2. correct code point generated from validated lead and trail surrogates.
        U8_APPEND_UNSAFE(utf8OctetsBuffer, utf8Length, codePoint);

        // 4-d-v. Let j be 0.
        // 4-d-vi. Repeat, while j < L
        for (unsigned index = 0; index < utf8Length; ++index) {
            // 4-d-vi-1. Let jOctet be the value at index j within Octets.
            // 4-d-vi-2. Let S be a String containing three code units "%XY" where XY are two uppercase hexadecimal digits encoding the value of jOctet.
            // 4-d-vi-3. Let R be a new String value computed by concatenating the previous value of R and S.
            builder.append('%');
            builder.append(hex(utf8OctetsBuffer[index], 2));
        }
    }
    return {};
}

static WebCore::ExceptionOr<void> encode(VM& vm, WTF::StringView view, const WTF::BitSet<256>& doNotEscape, StringBuilder& builder)
{
    if (view.is8Bit())
        return encode(vm, doNotEscape, view.span8(), builder);
    return encode(vm, doNotEscape, view.span16(), builder);
}

WebCore::ExceptionOr<void> encodeURIComponent(VM& vm, WTF::StringView source, StringBuilder& builder)
{
    static constexpr auto doNotEscapeWhenEncodingURIComponent = makeLatin1CharacterBitSet(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789"
        "!'()*-._~");
    return encode(vm, source, doNotEscapeWhenEncodingURIComponent, builder);
}

}
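
The listing above depends on WTF, WebCore, and ICU, so it cannot be compiled in isolation. As a rough illustration of the same steps (copy unescaped characters through, reject unpaired surrogates, combine surrogate pairs, then percent-encode the UTF-8 octets), here is a minimal sketch using only the C++ standard library. All names in it (`isUnescaped`, `appendEscapedOctet`, `appendEscapedUTF8`, `encodeURIComponentSketch`) are hypothetical, and it reports failure with `std::nullopt` where the real code returns a `WebCore::Exception`. It is not Bun's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: true for code units copied through unchanged
// (the same unescaped set as in the listing above).
static bool isUnescaped(char16_t unit)
{
    if (unit >= 0x80)
        return false;
    constexpr std::string_view marks = "!'()*-._~";
    char c = static_cast<char>(unit);
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
        || marks.find(c) != std::string_view::npos;
}

// Appends "%XY" with two uppercase hexadecimal digits.
static void appendEscapedOctet(std::string& out, std::uint8_t octet)
{
    static constexpr char digits[] = "0123456789ABCDEF";
    out += '%';
    out += digits[octet >> 4];
    out += digits[octet & 0x0F];
}

// Encodes one code point as UTF-8 and percent-escapes every octet.
static void appendEscapedUTF8(std::string& out, char32_t codePoint)
{
    assert(codePoint <= 0x10FFFF);
    if (codePoint < 0x80) {
        appendEscapedOctet(out, static_cast<std::uint8_t>(codePoint));
    } else if (codePoint < 0x800) {
        appendEscapedOctet(out, static_cast<std::uint8_t>(0xC0 | (codePoint >> 6)));
        appendEscapedOctet(out, static_cast<std::uint8_t>(0x80 | (codePoint & 0x3F)));
    } else if (codePoint < 0x10000) {
        appendEscapedOctet(out, static_cast<std::uint8_t>(0xE0 | (codePoint >> 12)));
        appendEscapedOctet(out, static_cast<std::uint8_t>(0x80 | ((codePoint >> 6) & 0x3F)));
        appendEscapedOctet(out, static_cast<std::uint8_t>(0x80 | (codePoint & 0x3F)));
    } else {
        appendEscapedOctet(out, static_cast<std::uint8_t>(0xF0 | (codePoint >> 18)));
        appendEscapedOctet(out, static_cast<std::uint8_t>(0x80 | ((codePoint >> 12) & 0x3F)));
        appendEscapedOctet(out, static_cast<std::uint8_t>(0x80 | ((codePoint >> 6) & 0x3F)));
        appendEscapedOctet(out, static_cast<std::uint8_t>(0x80 | (codePoint & 0x3F)));
    }
}

// Returns std::nullopt when the input contains an unpaired surrogate.
std::optional<std::string> encodeURIComponentSketch(std::u16string_view input)
{
    std::string out;
    out.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        char16_t unit = input[i];
        if (isUnescaped(unit)) {
            out += static_cast<char>(unit);
            continue;
        }
        // A trail surrogate with no preceding lead is invalid.
        if (unit >= 0xDC00 && unit <= 0xDFFF)
            return std::nullopt;
        char32_t codePoint = unit;
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // A lead surrogate must be followed by a trail surrogate.
            if (++i == input.size())
                return std::nullopt;
            char16_t trail = input[i];
            if (trail < 0xDC00 || trail > 0xDFFF)
                return std::nullopt;
            codePoint = 0x10000 + ((static_cast<char32_t>(unit) - 0xD800) << 10) + (trail - 0xDC00);
        }
        appendEscapedUTF8(out, codePoint);
    }
    return out;
}
```

For example, `encodeURIComponentSketch(u"a b")` produces `a%20b`, and a string holding only the lone lead surrogate 0xD800 produces `std::nullopt`.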