mirror of https://github.com/oven-sh/bun synced 2026-02-10 02:48:50 +00:00


Sosuke Suzuki 8d8345a114 Implement ESM bytecode cache with module metadata serialization

Add BMES v3 format that stores module metadata (import/export entries,
variable declarations, code features) alongside bytecode. This enables
skipping the parsing phase for ES modules by reconstructing JSModuleRecord
directly from cached metadata.

Key changes:
- Add createModuleRecordFromCache() to reconstruct JSModuleRecord from cache
- Serialize/deserialize module metadata in BMES v3 format
- Add hasCachedModuleMetadata() virtual method to SourceProvider

Performance improvement for large bundles:
- ~200KB: break-even point
- 2MB bundle: +13% faster startup
- 10MB bundle: +11% faster startup
- 20MB bundle: +15% faster startup (65ms saved)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-12-05 16:43:02 +09:00


ESM Bytecode Cache Architecture

Overview

ESM Bytecode Cache is an experimental feature that enables bytecode caching for ES Modules in Bun's bundler. This allows compiled executables to skip the JavaScript parsing phase during module loading, improving startup performance for large applications.

Architecture

Binary Format: BMES v3

The ESM bytecode cache uses a custom binary format called BMES (Bun Module ES) version 3. This format stores both the bytecode and module metadata required to reconstruct a JSModuleRecord without parsing.

┌─────────────────────────────────────────┐
│              BMES Header                │
├─────────────────────────────────────────┤
│  Magic Number: "SEMB" (4 bytes)         │
│  Version: 3 (4 bytes)                   │
│  Bytecode Offset (4 bytes)              │
│  Bytecode Size (4 bytes)                │
├─────────────────────────────────────────┤
│          Module Metadata                │
├─────────────────────────────────────────┤
│  Requested Modules (dependencies)       │
│  Import Entries                         │
│  Export Entries                         │
│  Star Export Entries                    │
│  Declared Variables (var)               │
│  Lexical Variables (let/const)          │
│  Code Features flags                    │
├─────────────────────────────────────────┤
│            JSC Bytecode                 │
│  (WebKit JavaScriptCore format)         │
└─────────────────────────────────────────┘
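As a rough illustration of how a reader could validate the 16-byte header shown above, here is a minimal TypeScript sketch. The field order follows the diagram. The little-endian byte order, the bounds check, and the helper names `writeBmesHeader` and `readBmesHeader` are assumptions made for this example and are not taken from Bun's implementation.

```typescript
// Sketch only: field order follows the diagram; little-endian byte order is an assumption.
interface BmesHeader {
  magic: string;          // 4 ASCII bytes, "SEMB" as shown in the diagram
  version: number;        // 3 for BMES v3
  bytecodeOffset: number; // where the JSC bytecode starts in the buffer
  bytecodeSize: number;   // length of the JSC bytecode in bytes
}

const BMES_HEADER_SIZE = 16;
const BMES_MAGIC = "SEMB";
const BMES_VERSION = 3;

// Hypothetical helper: writes a header at the start of `bytes`.
function writeBmesHeader(bytes: Uint8Array, bytecodeOffset: number, bytecodeSize: number): void {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  for (let i = 0; i < 4; i++) view.setUint8(i, BMES_MAGIC.charCodeAt(i));
  view.setUint32(4, BMES_VERSION, true);
  view.setUint32(8, bytecodeOffset, true);
  view.setUint32(12, bytecodeSize, true);
}

// Hypothetical helper: reads the header back and rejects anything that looks wrong.
function readBmesHeader(bytes: Uint8Array): BmesHeader {
  if (bytes.byteLength < BMES_HEADER_SIZE) {
    throw new Error("buffer too small for a BMES header");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let magic = "";
  for (let i = 0; i < 4; i++) magic += String.fromCharCode(view.getUint8(i));
  if (magic !== BMES_MAGIC) throw new Error("unexpected magic: " + magic);
  const version = view.getUint32(4, true);
  if (version !== BMES_VERSION) throw new Error("unsupported BMES version: " + version);
  const bytecodeOffset = view.getUint32(8, true);
  const bytecodeSize = view.getUint32(12, true);
  // The bytecode must sit after the header and inside the buffer.
  if (bytecodeOffset < BMES_HEADER_SIZE || bytecodeOffset + bytecodeSize > bytes.byteLength) {
    throw new Error("bytecode range is out of bounds");
  }
  return { magic, version, bytecodeOffset, bytecodeSize };
}
```

Rejecting a bad magic number or version early gives the loader a cheap way to fall back to normal parsing when the cache was produced by a different format version.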

Module Loading Flow

Without Bytecode Cache (Traditional)

Source Code → Parse → ModuleAnalyzer → JSModuleRecord → Link → Evaluate
                ↓
           AST Generation
           Import/Export Analysis
           Variable Declaration Extraction

With Bytecode Cache

BMES File → Deserialize Metadata → JSModuleRecord::create() → Link → Evaluate
                    ↓
              Skip Parsing!
              Skip AST Generation!
              Skip Module Analysis!

Implementation Components

1. Metadata Serialization (`src/bun.js/bindings/ZigSourceProvider.cpp`)

At build time, module metadata is serialized into the BMES format:

struct CachedModuleMetadata {
    Vector<ModuleRequest> requestedModules;  // import specifiers
    Vector<ImportEntry> importEntries;       // import bindings
    Vector<ExportEntry> exportEntries;       // export bindings
    Vector<WTF::String> starExportEntries;   // export * from
    Vector<VariableEntry> declaredVariables; // var declarations
    Vector<VariableEntry> lexicalVariables;  // let/const declarations
    uint32_t codeFeatures;                   // feature flags
};
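The byte-level encoding of these fields is not described in this document. As one plausible approach, a string list such as `starExportEntries` could be stored as a count followed by length-prefixed strings. The sketch below assumes little-endian `uint32` lengths and UTF-16 code units; the layout and the helper names `encodeStringList` and `decodeStringList` are hypothetical and do not describe the actual BMES v3 wire format.

```typescript
// Sketch only: the layout (uint32 count, then per string a uint32 length plus
// UTF-16 code units) is an assumption for illustration.
function encodeStringList(items: string[]): Uint8Array {
  let size = 4; // leading item count
  for (const s of items) size += 4 + s.length * 2;
  const out = new Uint8Array(size);
  const view = new DataView(out.buffer);
  let pos = 0;
  view.setUint32(pos, items.length, true);
  pos += 4;
  for (const s of items) {
    view.setUint32(pos, s.length, true);
    pos += 4;
    for (let i = 0; i < s.length; i++) {
      view.setUint16(pos, s.charCodeAt(i), true);
      pos += 2;
    }
  }
  return out;
}

function decodeStringList(bytes: Uint8Array): string[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let pos = 0;
  const count = view.getUint32(pos, true);
  pos += 4;
  const items: string[] = [];
  for (let n = 0; n < count; n++) {
    const length = view.getUint32(pos, true);
    pos += 4;
    let s = "";
    // DataView throws a RangeError if a length points past the end of the buffer.
    for (let i = 0; i < length; i++) {
      s += String.fromCharCode(view.getUint16(pos, true));
      pos += 2;
    }
    items.push(s);
  }
  return items;
}
```

Length prefixes let the reader walk the metadata in a single pass without scanning for terminators, which matters when the goal is to be cheaper than parsing.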

2. JSC Integration (`vendor/WebKit/Source/JavaScriptCore/`)

New virtual methods added to SourceProvider:

// SourceProvider.h
virtual bool hasCachedModuleMetadata() const { return false; }
virtual JSModuleRecord* createModuleRecordFromCache(
    JSGlobalObject*, const Identifier&) { return nullptr; }

Cache check in module loader (JSModuleLoader.cpp):

// Check if we can skip parsing by using cached module metadata
if (sourceCode.provider()->hasCachedModuleMetadata()) {
    JSModuleRecord* moduleRecord =
        sourceCode.provider()->createModuleRecordFromCache(globalObject, moduleKey);
    if (moduleRecord) {
        // Skip parsing entirely!
        promise->fulfillWithNonPromise(globalObject, moduleRecord);
        return;
    }
}
// Fall through to normal parsing...

3. Module Record Reconstruction (`ZigSourceProvider.cpp`)

The createModuleRecordFromCache method reconstructs a complete JSModuleRecord:

1. Deserialize variable environments (declared + lexical)
2. Create JSModuleRecord with JSModuleRecord::create()
3. Add requested modules (dependencies)
4. Add import entries
5. Add export entries (local, indirect, star exports)
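The reconstruction order listed above can be modeled in a few lines of TypeScript. This is a simplified sketch, not the C++ implementation: the `*Sketch` types are stand-ins for the JSC types, their field shapes are assumptions, and the consistency check that returns `null` (mirroring the `nullptr` fallback in the loader) is an assumed policy.

```typescript
// Simplified stand-ins for the JSC types named above; all field shapes are assumptions.
interface ImportEntrySketch { moduleRequest: string; importName: string; localName: string }
interface ExportEntrySketch { kind: "local" | "indirect"; exportName: string; moduleRequest?: string }
interface CachedMetadataSketch {
  requestedModules: string[];
  importEntries: ImportEntrySketch[];
  exportEntries: ExportEntrySketch[];
  starExportEntries: string[];
  declaredVariables: string[];
  lexicalVariables: string[];
}
interface ModuleRecordSketch extends CachedMetadataSketch { key: string }

function createModuleRecordFromCacheSketch(
  key: string,
  meta: CachedMetadataSketch,
): ModuleRecordSketch | null {
  // Assumed policy: metadata that refers to a module it never requested is
  // treated as corrupt, so the caller can fall back to normal parsing.
  for (const entry of meta.importEntries) {
    if (meta.requestedModules.indexOf(entry.moduleRequest) === -1) return null;
  }
  for (const request of meta.starExportEntries) {
    if (meta.requestedModules.indexOf(request) === -1) return null;
  }

  // Variable environments come first because the record is created with them.
  const record: ModuleRecordSketch = {
    key: key,
    declaredVariables: meta.declaredVariables.slice(),
    lexicalVariables: meta.lexicalVariables.slice(),
    requestedModules: [],
    importEntries: [],
    exportEntries: [],
    starExportEntries: [],
  };

  // Then dependencies, import entries, and export entries, in that order.
  for (const request of meta.requestedModules) record.requestedModules.push(request);
  for (const entry of meta.importEntries) record.importEntries.push(entry);
  for (const entry of meta.exportEntries) record.exportEntries.push(entry);
  for (const request of meta.starExportEntries) record.starExportEntries.push(request);
  return record;
}
```

Returning `null` instead of throwing keeps the cached path strictly optional: a bad cache entry costs one failed attempt and the module is then parsed as usual.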

Usage

Bun.build API

await Bun.build({
  entrypoints: ["./src/index.ts"],
  outdir: "./dist",
  target: "bun",
  compile: true,
  experimentalEsmBytecode: true,  // Enable ESM bytecode cache
});

Output Files

When experimentalEsmBytecode is enabled with compile: true, the bytecode is embedded directly into the single-file executable.

Performance

Benchmark Results

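Benchmarks used synthetic modules containing realistic JavaScript patterns (classes, async functions, generators, destructuring, spread operators, etc.). Positive values in the Improvement and Time Saved columns mean the cached build started faster; negative values mean it started slower.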

ESM Bytecode Cache Performance

| Size | Source Size | Binary Increase | No Cache | With Cache | Improvement | Time Saved |
|---|---|---|---|---|---|---|
| tiny | 12 KB | +16 KB | 9.26 ms | 10.13 ms | -9.4% | -0.87 ms |
| small | 28 KB | +54 KB | 9.78 ms | 10.43 ms | -6.7% | -0.66 ms |
| medium | 212 KB | +483 KB | 13.92 ms | 13.56 ms | +2.6% | +0.36 ms |
| large | 2 MB | +4.9 MB | 48.27 ms | 41.83 ms | +13.4% | +6.44 ms |
| xlarge | 10 MB | +24 MB | 214.80 ms | 192.13 ms | +10.6% | +22.67 ms |
| huge | 20 MB | +49 MB | 430.52 ms | 364.97 ms | +15.2% | +65.56 ms |
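The Improvement and Time Saved columns appear to be derived from the two timing columns: time saved is the no-cache time minus the cached time, and improvement is that difference relative to the no-cache time. The sketch below recomputes both for a few rows; the formula is inferred from the numbers rather than stated by the benchmark, and because the published timings are already rounded, recomputed values can differ from the table in the last digit.

```typescript
// Timings copied from the ESM table above (milliseconds).
interface StartupSample {
  label: string;
  noCacheMs: number;
  withCacheMs: number;
}

// Positive means the cached build started faster.
function timeSavedMs(sample: StartupSample): number {
  return sample.noCacheMs - sample.withCacheMs;
}

// Improvement relative to the uncached startup time, in percent (inferred formula).
function improvementPercent(sample: StartupSample): number {
  return (timeSavedMs(sample) / sample.noCacheMs) * 100;
}

const startupSamples: StartupSample[] = [
  { label: "tiny", noCacheMs: 9.26, withCacheMs: 10.13 },
  { label: "large", noCacheMs: 48.27, withCacheMs: 41.83 },
  { label: "huge", noCacheMs: 430.52, withCacheMs: 364.97 },
];

for (const sample of startupSamples) {
  console.log(
    sample.label + ": " +
    timeSavedMs(sample).toFixed(2) + " ms saved, " +
    improvementPercent(sample).toFixed(1) + "%",
  );
}
```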

Comparison with CJS Bytecode Cache

| Size | ESM Improvement | CJS Improvement | ESM Time Saved | CJS Time Saved |
|---|---|---|---|---|
| tiny | -9.4% | -4.6% | -0.87 ms | -0.43 ms |
| small | -6.7% | -3.2% | -0.66 ms | -0.31 ms |
| medium | +2.6% | +3.6% | +0.36 ms | +0.53 ms |
| large | +13.4% | +16.6% | +6.44 ms | +10.40 ms |
| xlarge | +10.6% | +14.4% | +22.67 ms | +39.99 ms |
| huge | +15.2% | +19.4% | +65.56 ms | +109.84 ms |

Key Findings

1. Break-even point: ~200KB of source code
   - Below this threshold, bytecode cache overhead exceeds the parsing time saved
   - Just above it the gain is small (+2.6% at 212 KB); it grows to 10-15% at 2 MB and beyond
2. Large applications benefit most:
   - 10-15% faster startup for ESM
   - 14-19% faster startup for CJS
   - Up to 65ms saved for ~20MB bundles (ESM)
   - Up to 110ms saved for ~20MB bundles (CJS)
3. Binary size trade-off:
   - ESM bytecode adds ~2.5x the source size
   - CJS bytecode adds ~4.5x the source size
4. CJS vs ESM performance difference:
   - CJS shows higher improvement percentages because:
     - CJS has additional require() resolution overhead
     - CJS bytecode skips both parsing AND bytecode generation
   - The ESM cache currently skips parsing only; JSC still regenerates bytecode from the cached data (see Future Improvements)

Limitations

Small files: Not recommended for applications under ~200KB as the bytecode loading overhead exceeds parsing time savings.
Binary size: Bytecode significantly increases the compiled binary size. Consider this trade-off for deployment scenarios with size constraints.
Experimental status: This feature is experimental and the binary format may change in future versions.
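The first two limitations can be turned into a simple build-time heuristic. The sketch below uses the ~200KB break-even point and the ~2.5x ESM size factor reported above; both are rough figures from synthetic benchmarks, and the helper names `shouldEnableEsmBytecode` and `estimateEsmBinaryIncreaseBytes` are hypothetical, not part of Bun's API.

```typescript
// Rough figures taken from the benchmark results above; treat them as guides, not guarantees.
const ESM_BYTECODE_BREAK_EVEN_BYTES = 200 * 1024;
const ESM_BYTECODE_SIZE_FACTOR = 2.5;

// Hypothetical helper: is the bundle large enough for the cache to pay off?
function shouldEnableEsmBytecode(
  totalSourceBytes: number,
  breakEvenBytes: number = ESM_BYTECODE_BREAK_EVEN_BYTES,
): boolean {
  return totalSourceBytes > breakEvenBytes;
}

// Hypothetical helper: approximate extra bytes the ESM bytecode adds to the executable.
function estimateEsmBinaryIncreaseBytes(totalSourceBytes: number): number {
  return Math.round(totalSourceBytes * ESM_BYTECODE_SIZE_FACTOR);
}
```

A build script could pass the result of `shouldEnableEsmBytecode` as the `experimentalEsmBytecode` option shown in the Usage section, and use the size estimate to decide whether the larger binary is acceptable for the deployment target.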

Future Improvements

Bytecode execution from cache: Currently, the bytecode is stored but JSC still regenerates it from the cached data. Direct bytecode execution would further improve performance.
Lazy bytecode loading: Load bytecode on-demand for modules that may not be executed.
Incremental updates: Support for updating individual module bytecode without rebuilding the entire cache.

Technical Details

What Gets Skipped

With ESM bytecode cache enabled, the following operations are skipped during module loading:

Lexical analysis: Tokenizing the source code
Parsing: Building the Abstract Syntax Tree (AST)
Module analysis: Extracting import/export declarations
Scope analysis: Determining variable bindings

What Still Happens

Bytecode validation: JSC validates the cached bytecode
Module linking: Resolving import/export bindings between modules
Module evaluation: Executing the module code

Memory Considerations

The module metadata is kept in memory after loading to support:

Module namespace object creation
Dynamic import resolution
Hot module replacement (future)
