mirror of
https://github.com/oven-sh/bun
synced 2026-02-02 15:08:46 +00:00
feat: add Bun.JSONL.parse() for streaming newline-delimited JSON parsing (#26356)
Adds a built-in JSONL parser implemented in C++ using JavaScriptCore's
optimized JSON parser.
## API
### `Bun.JSONL.parse(input)`
Parse a complete JSONL string or typed array and return an array of all
parsed values. Throws a `SyntaxError` when the input is invalid and no
values could be parsed; values parsed before an error are returned
without throwing.
```ts
const results = Bun.JSONL.parse('{"a":1}\n{"b":2}\n');
// [{ a: 1 }, { b: 2 }]
```
### `Bun.JSONL.parseChunk(input, start?, end?)`
Parse as many complete values as possible, returning `{ values, read,
done, error }`. Designed for streaming use cases where input arrives
incrementally.
```ts
const result = Bun.JSONL.parseChunk('{"id":1}\n{"id":2}\n{"id":3');
result.values; // [{ id: 1 }, { id: 2 }]
result.read; // 17
result.done; // false
result.error; // null
```
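The `read` offset is what enables resumable parsing: append each incoming chunk to a buffer, parse, emit the complete values, and keep only the unconsumed tail. Below is a minimal pure-TypeScript model of that loop. It is illustrative only: `drainLines` is a hypothetical stand-in for the native parser, splitting on newlines and using `JSON.parse` per line.

```ts
// Hypothetical userland model of the accumulate/parse/carry-forward loop.
// The real API is Bun.JSONL.parseChunk; JSON.parse stands in here.
function drainLines(buffer: string): { values: unknown[]; rest: string } {
  const values: unknown[] = [];
  let start = 0;
  let nl: number;
  while ((nl = buffer.indexOf("\n", start)) !== -1) {
    const line = buffer.slice(start, nl);
    if (line.trim()) values.push(JSON.parse(line));
    start = nl + 1; // resume after the newline, like parseChunk's `read`
  }
  return { values, rest: buffer.slice(start) };
}

// A value split across two chunks is held back until it completes:
const chunks = ['{"id":1}\n{"id', '":2}\n{"id":3}\n'];
let buffer = "";
const records: unknown[] = [];
for (const chunk of chunks) {
  buffer += chunk;
  const { values, rest } = drainLines(buffer);
  records.push(...values);
  buffer = rest; // carry the incomplete tail forward
}
// records now holds all three objects
```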
## Implementation Details
- C++ implementation in `BunObject.cpp` using JSC's `streamingJSONParse`
- ASCII fast path: zero-copy `StringView` for pure ASCII input
- Non-ASCII: uses `fromUTF8ReplacingInvalidSequences` with
`utf16_length_from_utf8` size check to prevent overflow
- UTF-8 BOM automatically skipped for `Uint8Array` input
- Pre-built `Structure` with fixed property offsets for fast result
object creation
- `Symbol.toStringTag = "JSONL"` on the namespace object
- `parseChunk` returns errors in `error` property instead of throwing,
preserving partial results
- Comprehensive boundary checks on start/end parameters
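The error-preserving contract of `parseChunk` can be modeled in userland. The sketch below is illustrative only: the real parser is native C++ and its exact `read` offsets may differ in detail (e.g. around trailing newlines).

```ts
// Illustrative model of the { values, read, done, error } contract.
// Not the actual implementation; offsets are approximate.
interface ChunkResult {
  values: unknown[];
  read: number;
  done: boolean;
  error: SyntaxError | null;
}

function parseChunkModel(input: string): ChunkResult {
  const values: unknown[] = [];
  let read = 0;
  let nl: number;
  while ((nl = input.indexOf("\n", read)) !== -1) {
    const line = input.slice(read, nl).trim();
    if (line) {
      try {
        values.push(JSON.parse(line));
      } catch (e) {
        // On error: keep the values parsed so far, stop at the last good offset
        return { values, read, done: false, error: e as SyntaxError };
      }
    }
    read = nl + 1;
  }
  // Anything after the last newline is an incomplete trailing value
  return { values, read, done: read === input.length, error: null };
}
```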
## Tests
234 tests covering:
- Complete and partial/streaming input scenarios
- Error handling and recovery
- UTF-8 multi-byte characters and BOM handling
- start/end boundary security (exhaustive combinations, clamping, OOB
prevention)
- 4 GB input rejection (both ASCII and non-ASCII paths)
- Edge cases (empty input, single values, whitespace, special numbers)
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
```diff
@@ -2,7 +2,7 @@ option(WEBKIT_VERSION "The version of WebKit to use")
 option(WEBKIT_LOCAL "If a local version of WebKit should be used instead of downloading")
 
 if(NOT WEBKIT_VERSION)
-  set(WEBKIT_VERSION 87c6cde57dd1d2a82bbc9caf500f70f8a7c1f249)
+  set(WEBKIT_VERSION daf95b4b4574799ff22c8c4effd0dc6e864968a5)
 endif()
 
 # Use preview build URL for Windows ARM64 until the fix is merged to main
```
```diff
@@ -150,6 +150,7 @@
   "/runtime/secrets",
   "/runtime/console",
   "/runtime/yaml",
+  "/runtime/jsonl",
   "/runtime/html-rewriter",
   "/runtime/hashing",
   "/runtime/glob",
```
188 docs/runtime/jsonl.mdx (new file)

````mdx
---
title: JSONL
description: Parse newline-delimited JSON (JSONL) with Bun's built-in streaming parser
---

Bun has built-in support for parsing [JSONL](https://jsonlines.org/) (newline-delimited JSON), where each line is a separate JSON value. The parser is implemented in C++ using JavaScriptCore's optimized JSON parser and supports streaming use cases.

```ts
const results = Bun.JSONL.parse('{"name":"Alice"}\n{"name":"Bob"}\n');
// [{ name: "Alice" }, { name: "Bob" }]
```

---

## `Bun.JSONL.parse()`

Parse a complete JSONL input and return an array of all parsed values.

```ts
import { JSONL } from "bun";

const input = '{"id":1,"name":"Alice"}\n{"id":2,"name":"Bob"}\n{"id":3,"name":"Charlie"}\n';
const records = JSONL.parse(input);
console.log(records);
// [
//   { id: 1, name: "Alice" },
//   { id: 2, name: "Bob" },
//   { id: 3, name: "Charlie" }
// ]
```

Input can be a string or a `Uint8Array`:

```ts
const buffer = new TextEncoder().encode('{"a":1}\n{"b":2}\n');
const results = Bun.JSONL.parse(buffer);
// [{ a: 1 }, { b: 2 }]
```

When passed a `Uint8Array`, a UTF-8 BOM at the start of the buffer is automatically skipped.

### Error handling

If the input contains invalid JSON, `Bun.JSONL.parse()` throws a `SyntaxError`:

```ts
try {
  Bun.JSONL.parse('{"valid":true}\n{invalid}\n');
} catch (error) {
  console.error(error); // SyntaxError: Failed to parse JSONL
}
```

---

## `Bun.JSONL.parseChunk()`

For streaming scenarios, `parseChunk` parses as many complete values as possible from the input and reports how far it got. This is useful when receiving data incrementally (e.g., from a network stream) and you need to know where to resume parsing.

```ts
const chunk = '{"id":1}\n{"id":2}\n{"id":3';

const result = Bun.JSONL.parseChunk(chunk);
console.log(result.values); // [{ id: 1 }, { id: 2 }]
console.log(result.read); // 17 — characters consumed
console.log(result.done); // false — incomplete value remains
console.log(result.error); // null — no parse error
```

### Return value

`parseChunk` returns an object with four properties:

| Property | Type                  | Description                                                             |
| -------- | --------------------- | ----------------------------------------------------------------------- |
| `values` | `any[]`               | Array of successfully parsed JSON values                                |
| `read`   | `number`              | Number of bytes (for `Uint8Array`) or characters (for strings) consumed |
| `done`   | `boolean`             | `true` if the entire input was consumed with no remaining data          |
| `error`  | `SyntaxError \| null` | Parse error, or `null` if no error occurred                             |

### Streaming example

Use `read` to slice off consumed input and carry forward the remainder:

```ts
let buffer = "";

async function processStream(stream: ReadableStream<string>) {
  for await (const chunk of stream) {
    buffer += chunk;
    const result = Bun.JSONL.parseChunk(buffer);

    for (const value of result.values) {
      handleRecord(value);
    }

    // Keep only the unconsumed portion
    buffer = buffer.slice(result.read);
  }

  // Handle any remaining data
  if (buffer.length > 0) {
    const final = Bun.JSONL.parseChunk(buffer);
    for (const value of final.values) {
      handleRecord(value);
    }
    if (final.error) {
      console.error("Parse error in final chunk:", final.error.message);
    }
  }
}
```

### Byte offsets with `Uint8Array`

When the input is a `Uint8Array`, you can pass optional `start` and `end` byte offsets:

```ts
const buf = new TextEncoder().encode('{"a":1}\n{"b":2}\n{"c":3}\n');

// Parse starting from byte 8
const result = Bun.JSONL.parseChunk(buf, 8);
console.log(result.values); // [{ b: 2 }, { c: 3 }]
console.log(result.read); // 24

// Parse a specific range
const partial = Bun.JSONL.parseChunk(buf, 0, 8);
console.log(partial.values); // [{ a: 1 }]
```

The `read` value is always a byte offset into the original buffer, making it easy to use with `TypedArray.subarray()` for zero-copy streaming:

```ts
let buf = new Uint8Array(0);

async function processBinaryStream(stream: ReadableStream<Uint8Array>) {
  for await (const chunk of stream) {
    // Append chunk to buffer
    const newBuf = new Uint8Array(buf.length + chunk.length);
    newBuf.set(buf);
    newBuf.set(chunk, buf.length);
    buf = newBuf;

    const result = Bun.JSONL.parseChunk(buf);

    for (const value of result.values) {
      handleRecord(value);
    }

    // Keep unconsumed bytes (subarray avoids a copy)
    buf = buf.subarray(result.read);
  }
}
```

### Error recovery

Unlike `parse()`, `parseChunk()` does not throw on invalid JSON. Instead, it returns the error in the `error` property, along with any values that were successfully parsed before the error:

```ts
const input = '{"a":1}\n{invalid}\n{"b":2}\n';
const result = Bun.JSONL.parseChunk(input);

console.log(result.values); // [{ a: 1 }] — values parsed before the error
console.log(result.error); // SyntaxError
console.log(result.read); // 7 — position up to last successful parse
```

---

## Supported value types

Each line can be any valid JSON value, not just objects:

```ts
const input = '42\n"hello"\ntrue\nnull\n[1,2,3]\n{"key":"value"}\n';
const values = Bun.JSONL.parse(input);
// [42, "hello", true, null, [1, 2, 3], { key: "value" }]
```

---

## Performance notes

- **ASCII fast path**: Pure ASCII input is parsed directly without copying, using a zero-allocation `StringView`.
- **UTF-8 support**: Non-ASCII `Uint8Array` input is decoded to UTF-16 using SIMD-accelerated conversion.
- **BOM handling**: UTF-8 BOM (`0xEF 0xBB 0xBF`) at the start of a `Uint8Array` is automatically skipped.
- **Pre-built object shape**: The result object from `parseChunk` uses a cached structure for fast property access.
````
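One subtlety worth spelling out: for string input `read` counts UTF-16 characters, while for typed-array input it counts UTF-8 bytes, and the two diverge as soon as a value contains non-ASCII text. A standalone illustration in plain TypeScript (no Bun APIs involved):

```ts
// "é" occupies 1 UTF-16 code unit but 2 UTF-8 bytes, so character
// offsets and byte offsets disagree for non-ASCII JSONL lines.
const line = '{"name":"é"}\n';
const charCount = line.length; // UTF-16 code units
const byteCount = new TextEncoder().encode(line).length; // UTF-8 bytes

// byteCount exceeds charCount by one: the single two-byte character.
```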
95 packages/bun-types/bun.d.ts (vendored)

```diff
@@ -743,6 +743,101 @@ declare module "bun" {
     export function parse(input: string): unknown;
   }
 
+  /**
+   * JSONL (JSON Lines) related APIs.
+   *
+   * Each line in the input is expected to be a valid JSON value separated by newlines.
+   */
+  namespace JSONL {
+    /**
+     * The result of `Bun.JSONL.parseChunk`.
+     */
+    interface ParseChunkResult {
+      /** The successfully parsed JSON values. */
+      values: unknown[];
+      /** How far into the input was consumed. When the input is a string, this is a character offset. When the input is a `TypedArray`, this is a byte offset. Use `input.slice(read)` or `input.subarray(read)` to get the unconsumed remainder. */
+      read: number;
+      /** `true` if all input was consumed successfully. `false` if the input ends with an incomplete value or a parse error occurred. */
+      done: boolean;
+      /** A `SyntaxError` if a parse error occurred, otherwise `null`. Values parsed before the error are still available in `values`. */
+      error: SyntaxError | null;
+    }
+
+    /**
+     * Parse a JSONL (JSON Lines) string into an array of JavaScript values.
+     *
+     * If a parse error occurs and no values were successfully parsed, throws
+     * a `SyntaxError`. If values were parsed before the error, returns the
+     * successfully parsed values without throwing.
+     *
+     * Incomplete trailing values (e.g. from a partial chunk) are silently
+     * ignored and not included in the result.
+     *
+     * When a `TypedArray` is passed, the bytes are parsed directly without
+     * copying if the content is ASCII.
+     *
+     * @param input The JSONL string or typed array to parse
+     * @returns An array of parsed values
+     * @throws {SyntaxError} If the input starts with invalid JSON and no values could be parsed
+     *
+     * @example
+     * ```js
+     * const items = Bun.JSONL.parse('{"a":1}\n{"b":2}\n');
+     * // [{ a: 1 }, { b: 2 }]
+     *
+     * // From a Uint8Array (zero-copy for ASCII):
+     * const buf = new TextEncoder().encode('{"a":1}\n{"b":2}\n');
+     * const fromBuf = Bun.JSONL.parse(buf);
+     * // [{ a: 1 }, { b: 2 }]
+     *
+     * // Partial results on error after valid values:
+     * const partial = Bun.JSONL.parse('{"a":1}\n{bad}\n');
+     * // [{ a: 1 }]
+     *
+     * // Throws when no valid values precede the error:
+     * Bun.JSONL.parse('{bad}\n'); // throws SyntaxError
+     * ```
+     */
+    export function parse(input: string | NodeJS.TypedArray | DataView<ArrayBuffer> | ArrayBufferLike): unknown[];
+
+    /**
+     * Parse a JSONL chunk, designed for streaming use.
+     *
+     * Never throws on parse errors. Instead, returns whatever values were
+     * successfully parsed along with an `error` property containing the
+     * `SyntaxError` (or `null` on success). Use `read` to determine how
+     * much input was consumed and `done` to check if all input was parsed.
+     *
+     * When a `TypedArray` is passed, the bytes are parsed directly without
+     * copying if the content is ASCII. Optional `start` and `end` parameters
+     * allow slicing without copying, and `read` will be a byte offset into
+     * the original typed array.
+     *
+     * @param input The JSONL string or typed array to parse
+     * @param start Byte offset to start parsing from (typed array only, default: 0)
+     * @param end Byte offset to stop parsing at (typed array only, default: input.byteLength)
+     * @returns An object with `values`, `read`, `done`, and `error` properties
+     *
+     * @example
+     * ```js
+     * let buffer = new Uint8Array(0);
+     * for await (const chunk of stream) {
+     *   buffer = Buffer.concat([buffer, chunk]);
+     *   const { values, read, error } = Bun.JSONL.parseChunk(buffer);
+     *   if (error) throw error;
+     *   for (const value of values) handle(value);
+     *   buffer = buffer.subarray(read);
+     * }
+     * ```
+     */
+    export function parseChunk(input: string): ParseChunkResult;
+    export function parseChunk(
+      input: NodeJS.TypedArray | DataView<ArrayBuffer> | ArrayBufferLike,
+      start?: number,
+      end?: number,
+    ): ParseChunkResult;
+  }
+
   /**
    * YAML related APIs
    */
```
BunObject.cpp

```diff
@@ -19,6 +19,8 @@
 #include <JavaScriptCore/LazyClassStructureInlines.h>
 #include <JavaScriptCore/FunctionPrototype.h>
 #include <JavaScriptCore/DateInstance.h>
+#include <JavaScriptCore/JSONObject.h>
+#include "wtf/SIMDUTF.h"
 #include <JavaScriptCore/ObjectConstructor.h>
 #include "headers.h"
 #include "BunObject.h"
@@ -434,6 +436,195 @@ static JSValue constructDNSObject(VM& vm, JSObject* bunObject)
     return dnsObject;
 }
 
+JSC_DECLARE_HOST_FUNCTION(jsFunctionJSONLParse);
+JSC_DECLARE_HOST_FUNCTION(jsFunctionJSONLParseChunk);
+
+JSC_DEFINE_HOST_FUNCTION(jsFunctionJSONLParse, (JSGlobalObject * globalObject, CallFrame* callFrame))
+{
+    VM& vm = globalObject->vm();
+    auto scope = DECLARE_THROW_SCOPE(vm);
+
+    JSValue arg = callFrame->argument(0);
+    if (arg.isUndefinedOrNull()) {
+        throwTypeError(globalObject, scope, "JSONL.parse requires a string argument"_s);
+        return {};
+    }
+
+    MarkedArgumentBuffer values;
+    JSC::StreamingJSONParseResult result;
+
+    if (arg.isCell() && isTypedArrayType(arg.asCell()->type())) {
+        auto* view = jsCast<JSC::JSArrayBufferView*>(arg.asCell());
+        if (view->isDetached()) {
+            throwTypeError(globalObject, scope, "ArrayBuffer is detached"_s);
+            return {};
+        }
+        auto* data = static_cast<const uint8_t*>(view->vector());
+        size_t length = view->byteLength();
+
+        // Skip UTF-8 BOM if present
+        if (length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
+            data += 3;
+            length -= 3;
+        }
+
+        if (length <= String::MaxLength && simdutf::validate_ascii(reinterpret_cast<const char*>(data), length)) {
+            auto chars = std::span { reinterpret_cast<const char8_t*>(data), length };
+            result = JSC::streamingJSONParse(globalObject, StringView(chars), values);
+        } else {
+            size_t u16Length = simdutf::utf16_length_from_utf8(reinterpret_cast<const char*>(data), length);
+            if (u16Length > String::MaxLength) {
+                throwOutOfMemoryError(globalObject, scope);
+                return {};
+            }
+            auto str = WTF::String::fromUTF8ReplacingInvalidSequences(std::span { reinterpret_cast<const char8_t*>(data), length });
+            if (str.isNull()) {
+                throwOutOfMemoryError(globalObject, scope);
+                return {};
+            }
+            result = JSC::streamingJSONParse(globalObject, str, values);
+        }
+    } else {
+        auto* inputString = arg.toString(globalObject);
+        RETURN_IF_EXCEPTION(scope, {});
+        auto view = inputString->view(globalObject);
+        RETURN_IF_EXCEPTION(scope, {});
+        result = JSC::streamingJSONParse(globalObject, view, values);
+    }
+
+    RETURN_IF_EXCEPTION(scope, {});
+
+    if (result.status == JSC::StreamingJSONParseResult::Status::Error && values.isEmpty()) {
+        throwSyntaxError(globalObject, scope, "Failed to parse JSONL"_s);
+        return {};
+    }
+
+    RELEASE_AND_RETURN(scope, JSValue::encode(constructArray(globalObject, static_cast<ArrayAllocationProfile*>(nullptr), values)));
+}
+
+JSC_DEFINE_HOST_FUNCTION(jsFunctionJSONLParseChunk, (JSGlobalObject * globalObject, CallFrame* callFrame))
+{
+    VM& vm = globalObject->vm();
+    auto scope = DECLARE_THROW_SCOPE(vm);
+
+    JSValue arg = callFrame->argument(0);
+    if (arg.isUndefinedOrNull()) {
+        throwTypeError(globalObject, scope, "JSONL.parseChunk requires a string argument"_s);
+        return {};
+    }
+
+    MarkedArgumentBuffer values;
+    JSC::StreamingJSONParseResult result;
+    size_t readBytes = 0;
+    bool isTypedArray = arg.isCell() && isTypedArrayType(arg.asCell()->type());
+
+    if (isTypedArray) {
+        auto* view = jsCast<JSC::JSArrayBufferView*>(arg.asCell());
+        if (view->isDetached()) {
+            throwTypeError(globalObject, scope, "ArrayBuffer is detached"_s);
+            return {};
+        }
+        auto* data = static_cast<const uint8_t*>(view->vector());
+        size_t length = view->byteLength();
+
+        // Apply optional start/end offsets (byte offsets for typed arrays)
+        size_t start = 0;
+        size_t end = length;
+
+        JSValue startArg = callFrame->argument(1);
+        if (startArg.isNumber()) {
+            double s = startArg.asNumber();
+            if (s > 0)
+                start = static_cast<size_t>(std::min(s, static_cast<double>(length)));
+        }
+
+        JSValue endArg = callFrame->argument(2);
+        if (endArg.isNumber()) {
+            double e = endArg.asNumber();
+            if (e >= 0)
+                end = static_cast<size_t>(std::min(e, static_cast<double>(length)));
+        }
+
+        if (start > end)
+            start = end;
+
+        const uint8_t* sliceData = data + start;
+        size_t sliceLen = end - start;
+
+        // Skip UTF-8 BOM if present at the start of the slice
+        size_t bomOffset = 0;
+        if (start == 0 && sliceLen >= 3 && sliceData[0] == 0xEF && sliceData[1] == 0xBB && sliceData[2] == 0xBF) {
+            sliceData += 3;
+            sliceLen -= 3;
+            bomOffset = 3;
+        }
+
+        if (sliceLen <= String::MaxLength && simdutf::validate_ascii(reinterpret_cast<const char*>(sliceData), sliceLen)) {
+            auto chars = std::span { reinterpret_cast<const char8_t*>(sliceData), sliceLen };
+            result = JSC::streamingJSONParse(globalObject, StringView(chars), values);
+            // For ASCII, byte offset = character offset
+            readBytes = start + bomOffset + result.charactersConsumed;
+        } else {
+            size_t u16Length = simdutf::utf16_length_from_utf8(reinterpret_cast<const char*>(sliceData), sliceLen);
+            if (u16Length > String::MaxLength) {
+                throwOutOfMemoryError(globalObject, scope);
+                return {};
+            }
+            auto str = WTF::String::fromUTF8ReplacingInvalidSequences(std::span { reinterpret_cast<const char8_t*>(sliceData), sliceLen });
+            if (str.isNull()) {
+                throwOutOfMemoryError(globalObject, scope);
+                return {};
+            }
+            result = JSC::streamingJSONParse(globalObject, str, values);
+            // Convert character offset back to UTF-8 byte offset
+            if (str.is8Bit()) {
+                readBytes = start + bomOffset + simdutf::utf8_length_from_latin1(reinterpret_cast<const char*>(str.span8().data()), result.charactersConsumed);
+            } else {
+                readBytes = start + bomOffset + simdutf::utf8_length_from_utf16le(reinterpret_cast<const char16_t*>(str.span16().data()), result.charactersConsumed);
+            }
+        }
+    } else {
+        auto* inputString = arg.toString(globalObject);
+        RETURN_IF_EXCEPTION(scope, {});
+        auto view = inputString->view(globalObject);
+        RETURN_IF_EXCEPTION(scope, {});
+        result = JSC::streamingJSONParse(globalObject, view, values);
+        readBytes = result.charactersConsumed;
+    }
+
+    RETURN_IF_EXCEPTION(scope, {});
+
+    JSArray* array = constructArray(globalObject, static_cast<ArrayAllocationProfile*>(nullptr), values);
+    RETURN_IF_EXCEPTION(scope, {});
+
+    JSValue errorValue = jsNull();
+    if (result.status == JSC::StreamingJSONParseResult::Status::Error) {
+        errorValue = createSyntaxError(globalObject, "Failed to parse JSONL"_s);
+    }
+
+    auto* zigGlobalObject = jsCast<Zig::GlobalObject*>(globalObject);
+    JSObject* resultObj = constructEmptyObject(vm, zigGlobalObject->jsonlParseResultStructure());
+    resultObj->putDirectOffset(vm, 0, array);
+    resultObj->putDirectOffset(vm, 1, jsNumber(readBytes));
+    resultObj->putDirectOffset(vm, 2, jsBoolean(result.status == JSC::StreamingJSONParseResult::Status::Complete));
+    resultObj->putDirectOffset(vm, 3, errorValue);
+
+    return JSValue::encode(resultObj);
+}
+
+static JSValue constructJSONLObject(VM& vm, JSObject* bunObject)
+{
+    JSGlobalObject* globalObject = bunObject->globalObject();
+    JSC::JSObject* jsonlObject = JSC::constructEmptyObject(globalObject);
+    jsonlObject->putDirectNativeFunction(vm, globalObject, vm.propertyNames->parse, 1, jsFunctionJSONLParse, ImplementationVisibility::Public, NoIntrinsic,
+        JSC::PropertyAttribute::DontDelete | 0);
+    jsonlObject->putDirectNativeFunction(vm, globalObject, JSC::Identifier::fromString(vm, "parseChunk"_s), 1, jsFunctionJSONLParseChunk, ImplementationVisibility::Public, NoIntrinsic,
+        JSC::PropertyAttribute::DontDelete | 0);
+    jsonlObject->putDirect(vm, vm.propertyNames->toStringTagSymbol, jsNontrivialString(vm, "JSONL"_s),
+        JSC::PropertyAttribute::DontEnum | JSC::PropertyAttribute::ReadOnly);
+    return jsonlObject;
+}
+
 static JSValue constructBunPeekObject(VM& vm, JSObject* bunObject)
 {
     JSGlobalObject* globalObject = bunObject->globalObject();
@@ -728,6 +919,7 @@ JSC_DEFINE_HOST_FUNCTION(functionFileURLToPath, (JSC::JSGlobalObject * globalObj
 SHA512 BunObject_lazyPropCb_wrap_SHA512 DontDelete|PropertyCallback
 SHA512_256 BunObject_lazyPropCb_wrap_SHA512_256 DontDelete|PropertyCallback
 JSONC BunObject_lazyPropCb_wrap_JSONC DontDelete|PropertyCallback
+JSONL constructJSONLObject ReadOnly|DontDelete|PropertyCallback
 TOML BunObject_lazyPropCb_wrap_TOML DontDelete|PropertyCallback
 YAML BunObject_lazyPropCb_wrap_YAML DontDelete|PropertyCallback
 Transpiler BunObject_lazyPropCb_wrap_Transpiler DontDelete|PropertyCallback
```
```diff
@@ -2045,6 +2045,22 @@ void GlobalObject::finishCreation(VM& vm)
         init.set(obj);
     });
 
+    this->m_jsonlParseResultStructure.initLater(
+        [](const Initializer<Structure>& init) {
+            // { values, read, done, error } — 4 properties at fixed offsets for fast allocation
+            Structure* structure = init.owner->structureCache().emptyObjectStructureForPrototype(init.owner, init.owner->objectPrototype(), 4);
+            PropertyOffset offset;
+            structure = Structure::addPropertyTransition(init.vm, structure, Identifier::fromString(init.vm, "values"_s), 0, offset);
+            RELEASE_ASSERT(offset == 0);
+            structure = Structure::addPropertyTransition(init.vm, structure, Identifier::fromString(init.vm, "read"_s), 0, offset);
+            RELEASE_ASSERT(offset == 1);
+            structure = Structure::addPropertyTransition(init.vm, structure, Identifier::fromString(init.vm, "done"_s), 0, offset);
+            RELEASE_ASSERT(offset == 2);
+            structure = Structure::addPropertyTransition(init.vm, structure, Identifier::fromString(init.vm, "error"_s), 0, offset);
+            RELEASE_ASSERT(offset == 3);
+            init.set(structure);
+        });
+
     this->m_pendingVirtualModuleResultStructure.initLater(
         [](const Initializer<Structure>& init) {
             init.set(Bun::PendingVirtualModuleResult::createStructure(init.vm, init.owner, init.owner->objectPrototype()));
```
```diff
@@ -563,6 +563,7 @@ public:
     V(public, LazyClassStructure, m_JSConnectionsListClassStructure) \
     V(public, LazyClassStructure, m_JSHTTPParserClassStructure) \
     \
+    V(private, LazyPropertyOfGlobalObject<Structure>, m_jsonlParseResultStructure) \
     V(private, LazyPropertyOfGlobalObject<Structure>, m_pendingVirtualModuleResultStructure) \
     V(private, LazyPropertyOfGlobalObject<JSFunction>, m_performMicrotaskFunction) \
     V(private, LazyPropertyOfGlobalObject<JSFunction>, m_nativeMicrotaskTrampoline) \
@@ -696,6 +697,7 @@ public:
 
     void reload();
 
+    JSC::Structure* jsonlParseResultStructure() { return m_jsonlParseResultStructure.get(this); }
     JSC::Structure* pendingVirtualModuleResultStructure() { return m_pendingVirtualModuleResultStructure.get(this); }
 
     // We need to know if the napi module registered itself or we registered it.
```
```diff
@@ -25,15 +25,22 @@ describe("doesnt_crash", async () => {
     { target: "browser", minify: false },
     { target: "browser", minify: true },
   ];
+  let code = "";
+  async function getCode() {
+    if (code) return code;
+    code = await Bun.file(absolute).text();
+    return code;
+  }
 
   for (const { target, minify } of configs) {
-    test(`${file} - ${minify ? "minify" : "not minify"}`, async () => {
+    test(`${file} - ${minify ? "minify" : "not minify"} - ${target}`, async () => {
       const timeLog = `Transpiled ${file} - ${minify ? "minify" : "not minify"}`;
       console.time(timeLog);
       const { logs, outputs } = await Bun.build({
         entrypoints: [absolute],
         minify: minify,
         target,
+        files: { [absolute]: await getCode() },
       });
       console.timeEnd(timeLog);
 
@@ -43,6 +50,7 @@ describe("doesnt_crash", async () => {
 
       expect(outputs.length).toBe(1);
       const outfile1 = path.join(temp_dir, "file-1" + file).replaceAll("\\", "/");
+      const content1 = await outputs[0].text();
 
       await Bun.write(outfile1, outputs[0]);
 
@@ -53,6 +61,7 @@ describe("doesnt_crash", async () => {
       const { logs, outputs } = await Bun.build({
         entrypoints: [outfile1],
         target,
+        files: { [outfile1]: content1 },
         minify: minify,
       });
```
2112 test/js/bun/jsonl/jsonl-parse.test.ts (new file; diff suppressed because it is too large)
```diff
@@ -157,4 +157,7 @@ vendor/elysia/test/ws/message.test.ts
 test/js/node/test/parallel/test-worker-abort-on-uncaught-exception.js
 
 # TODO: WebCore fixes
 test/js/web/urlpattern/urlpattern.test.ts
+
+# TODO: jsc
+test/js/bun/jsonl/jsonl-parse.test.ts
```