mirror of
https://github.com/oven-sh/bun
synced 2026-02-09 10:28:47 +00:00
Add comprehensive documentation for the completed Phase 2 implementation and detailed planning for Phase 3 (ModuleLoader integration).

New documentation:
- INTEGRATION_PLAN.md: Detailed Phase 3 implementation strategy
  - ModuleLoader integration approach
  - Cache storage design
  - CLI flag implementation
  - Testing and benchmarking plans
  - Security considerations
- ESM_CACHE_README.md: Complete project overview
  - Current status (Phase 2: 100% complete, 65% overall)
  - Architecture and binary format documentation
  - API reference (C++, Zig, JavaScript)
  - Performance expectations (30-50% improvement)
  - Test results and examples
  - Next steps roadmap

Current implementation status:
✅ Phase 1 (Serialization): 100% complete
✅ Phase 2 (Deserialization): 100% complete
⏳ Phase 3 (Integration): 0% - planning complete

Phase 2 achievements:
- Binary format (BMES v1) fully implemented
- Serialization and deserialization working correctly
- Cache validation passing all tests
- Round-trip test: 2320 bytes cache generated successfully
- Testing infrastructure via bun:internal-for-testing

Next implementation phase:
1. ModuleLoader integration (fetchESMSourceCode modification)
2. Filesystem cache storage (~/.bun-cache/esm/)
3. CLI flag (--experimental-esm-bytecode)
4. Integration testing and benchmarking

Expected performance improvement: 30-50% faster ESM module loading

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
4.9 KiB
ESM Bytecode Cache - Integration Plan
Current Status (Phase 2 Complete)
✅ Serialization: Complete
- generateCachedModuleByteCodeWithMetadata() - Extracts and serializes module metadata + bytecode
- Binary format: BMES v1
- Includes: requested modules, imports, exports, star exports, bytecode
✅ Deserialization: Complete
- deserializeCachedModuleMetadata() - Restores metadata from cache
- validateCachedModuleMetadata() - Validates cache integrity
- Returns DeserializedModuleMetadata structure
✅ Testing: Complete
- Round-trip test passes (2320 bytes cache generated)
- Format validation works correctly
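As a rough illustration of what a format check can involve, the sketch below validates a magic number and a version field. The header layout (4 magic bytes "BMES" followed by a little-endian 32-bit version) and the names readU32LE and hasValidBmesHeader are assumptions made for this example; the plan only names the format "BMES v1" and does not specify its layout here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical header layout (not taken from the actual BMES spec):
//   bytes 0..3  magic "BMES"
//   bytes 4..7  format version, little-endian uint32
constexpr char kMagic[4] = {'B', 'M', 'E', 'S'};
constexpr std::uint32_t kSupportedVersion = 1;
constexpr std::size_t kHeaderSize = 8;

// Reads a little-endian uint32 starting at `offset`.
// Assembled byte by byte so it does not depend on host endianness or alignment.
std::uint32_t readU32LE(const std::uint8_t* data, std::size_t size, std::size_t offset) {
    assert(data != nullptr && offset + 4 <= size);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(data[offset + i]) << (8 * i);
    return value;
}

// Returns true only if the buffer is long enough, starts with the magic
// bytes, and declares the supported version.
bool hasValidBmesHeader(const std::uint8_t* data, std::size_t size) {
    if (data == nullptr || size < kHeaderSize)
        return false;
    if (std::memcmp(data, kMagic, sizeof(kMagic)) != 0)
        return false;
    return readU32LE(data, size, 4) == kSupportedVersion;
}
```

A check like this only rejects obviously foreign or outdated files; it says nothing about whether the cached bytecode still matches the source.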
Phase 3: ModuleLoader Integration
Challenge: JSModuleRecord Reconstruction
JSModuleRecord has a private constructor and is normally created by ModuleAnalyzer::analyze().
Options considered:
- ❌ Direct JSModuleRecord construction - Constructor is private
- ❌ Using AbstractModuleRecord methods - Too low-level, requires internal JSC knowledge
- ✅ Recommended: ModuleLoader-level integration
Recommended Approach
Instead of reconstructing JSModuleRecord, integrate at the ModuleLoader level where we can:
- Detect cached module availability
- Load bytecode directly
- Skip parse + analysis phases
- Let JSC handle the rest naturally
Implementation Strategy
Step 1: Add Cache Storage Layer
File: New file src/bun.js/bindings/ModuleBytecodeCache.cpp/.h
class ModuleBytecodeCache {
public:
    // Check if cache exists for a module
    static bool hasCache(const WTF::String& sourceURL);

    // Save cache for a module
    static void saveCache(const WTF::String& sourceURL,
                          const uint8_t* data, size_t size);

    // Load cache for a module
    static RefPtr<CachedBytecode> loadCache(const WTF::String& sourceURL);

private:
    // Cache directory: ~/.bun-cache/esm/
    // Cache key: SHA256(sourceURL + file content hash)
};
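The cache key formula in the comment above can be sketched as follows. This is a minimal, dependency-free illustration: it substitutes 64-bit FNV-1a for SHA-256 (FNV-1a is not cryptographic and would not be suitable for the real cache), and the helper names fnv1a64, toHex, cacheKey and cachePath, as well as the ".bmes" file extension, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in for SHA-256: 64-bit FNV-1a. Not cryptographic; used here only so
// the sketch needs nothing beyond the standard library.
std::uint64_t fnv1a64(const std::string& bytes,
                      std::uint64_t seed = 14695981039346656037ull) {
    std::uint64_t hash = seed;
    for (unsigned char c : bytes) {
        hash ^= c;
        hash *= 1099511628211ull;
    }
    return hash;
}

// Fixed-width lowercase hex so every key has the same length.
std::string toHex(std::uint64_t value) {
    static const char digits[] = "0123456789abcdef";
    std::string out(16, '0');
    for (std::size_t i = 16; i-- > 0;) {
        out[i] = digits[value & 0xf];
        value >>= 4;
    }
    return out;
}

// Key = H(sourceURL + content hash), following the plan's formula.
std::string cacheKey(const std::string& sourceURL, const std::string& fileContent) {
    assert(!sourceURL.empty());
    const std::string contentHash = toHex(fnv1a64(fileContent));
    // Hash the URL, then continue the same hash over the content hash,
    // which is equivalent to hashing their concatenation.
    return toHex(fnv1a64(contentHash, fnv1a64(sourceURL)));
}

// Hypothetical on-disk layout: <cacheDir>/<key>.bmes
std::string cachePath(const std::string& cacheDir, const std::string& sourceURL,
                      const std::string& fileContent) {
    std::string path = cacheDir;
    if (!path.empty() && path.back() != '/')
        path += '/';
    return path + cacheKey(sourceURL, fileContent) + ".bmes";
}
```

Because the file content feeds into the key, editing a module yields a different key, so a stale entry is simply never looked up again rather than needing explicit invalidation.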
Step 2: Integrate into ModuleLoader
File: src/bun.js/bindings/ModuleLoader.cpp
Modify fetchESMSourceCode():
// Before parsing
if (shouldUseBytecodeCache()) {
    auto cached = ModuleBytecodeCache::loadCache(sourceURL);
    if (cached && validateCachedModuleMetadata(cached->data(), cached->size())) {
        // Use cached bytecode directly
        // Skip parse + analysis
        return createModuleFromCache(cached);
    }
}

// Existing parse + analysis code
// ...

// After successful analysis
if (shouldUseBytecodeCache()) {
    // Generate and save cache
    generateAndSaveCache(sourceURL, sourceCode);
}
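The control flow above can be exercised in isolation with a small model. In this sketch an in-memory map stands in for the on-disk cache, and compileModule and looksValid are placeholders for the real parse/analysis step and for validateCachedModuleMetadata(); all type and function names here (FakeCacheStore, LoadResult, fetchModule) are hypothetical and not part of Bun or JSC.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// In-memory stand-in for the on-disk cache, keyed by source URL.
struct FakeCacheStore {
    std::unordered_map<std::string, Bytes> entries;

    // Returns the cached bytes, or nullptr on a miss.
    const Bytes* find(const std::string& url) const {
        auto it = entries.find(url);
        return it == entries.end() ? nullptr : &it->second;
    }
    void save(const std::string& url, Bytes data) { entries[url] = std::move(data); }
};

// Placeholder validity check; the real code would validate the full metadata.
bool looksValid(const Bytes& data) {
    return data.size() >= 4 && data[0] == 'B' && data[1] == 'M' &&
           data[2] == 'E' && data[3] == 'S';
}

// Placeholder for parse + analysis + bytecode generation.
Bytes compileModule(const std::string& source) {
    Bytes out = {'B', 'M', 'E', 'S'};
    out.insert(out.end(), source.begin(), source.end());
    return out;
}

struct LoadResult {
    Bytes bytecode;
    bool fromCache;
};

// Cache check before "parsing", cache write after a successful compile.
LoadResult fetchModule(FakeCacheStore& cache, bool cacheEnabled,
                       const std::string& url, const std::string& source) {
    assert(!url.empty());
    if (cacheEnabled) {
        const Bytes* cached = cache.find(url);
        if (cached != nullptr && looksValid(*cached))
            return {*cached, true};
    }
    Bytes fresh = compileModule(source);
    if (cacheEnabled)
        cache.save(url, fresh);
    return {std::move(fresh), false};
}
```

Note that a corrupted entry falls through to the normal compile path and is then overwritten. The model keys entries by URL only, so it does not demonstrate invalidation on file changes.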
Step 3: Add CLI Flag
File: src/cli.zig
// `pub` so other files (e.g. ModuleLoader.zig) can read the flag
pub var enable_esm_bytecode_cache: bool = false;

// Add flag parsing
if (std.mem.eql(u8, arg, "--experimental-esm-bytecode")) {
    enable_esm_bytecode_cache = true;
}
Step 4: Zig Integration
File: src/bun.js/ModuleLoader.zig
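// Path is relative to src/bun.js/
const cli = @import("../cli.zig");

pub fn shouldUseBytecodeCache() bool {
    // Read the flag at call time: it is a runtime `var`, so it cannot be
    // copied into a `pub const`, which is evaluated at comptime.
    return cli.enable_esm_bytecode_cache;
}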
Testing Plan
Unit Tests
- Cache storage/retrieval
- Cache invalidation (file changes)
- Cache corruption handling
Integration Tests
- First load (no cache) - generates cache
- Second load (cache hit) - uses cache
- File modification - invalidates cache
- Performance comparison (with/without cache)
Performance Benchmarks
# Before
bun run index.js # 115ms
# After (cache hit)
bun --experimental-esm-bytecode run index.js # 60-70ms (30-50% faster)
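
To relate the timings above to the quoted percentage range, the arithmetic can be written as a small helper. The function name percentFaster is made up for this example, and the timings are the illustrative figures from the plan, not measurements made here.

```cpp
#include <cassert>

// Percentage reduction in wall-clock time going from beforeMs to afterMs.
double percentFaster(double beforeMs, double afterMs) {
    assert(beforeMs > 0.0);
    return (beforeMs - afterMs) / beforeMs * 100.0;
}
```

With the figures above, 115 ms down to 70 ms is about a 39% reduction and 115 ms down to 60 ms is about 48%, so the 60-70 ms target sits in the upper part of the 30-50% range.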
Alternative: Bytecode-Only Approach (Simpler)
If full metadata caching proves complex, we can:
- Only cache bytecode (skip metadata caching)
- Still parse source (fast) but skip bytecode generation
- ~20-30% improvement instead of 30-50%
This requires minimal changes to existing code.
Timeline
- ✅ Phase 1 (Serialization): Complete
- ✅ Phase 2 (Deserialization): Complete
- ⏳ Phase 3 (Integration): 1-2 weeks
  - Week 1: Cache storage + ModuleLoader changes
  - Week 2: Testing + benchmarking
Documentation Needs
- User documentation
  - How to enable (--experimental-esm-bytecode)
  - Performance expectations
  - Cache location and management
- Developer documentation
  - Binary format specification
  - Cache invalidation strategy
  - Debugging cached modules
Security Considerations
- Cache Integrity: Magic number + version check
- Content Verification: Include source hash in cache key
- Cache Poisoning: Only cache files owned by current user
- Denial of Service: Limit cache size (e.g., 100MB max)
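One way to enforce a size limit is to evict the least recently used entries until the total fits. The sketch below assumes an LRU policy and a CacheEntry record with a last-access timestamp; both are assumptions for illustration, since the plan only states that the cache size should be capped (e.g., 100MB).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bookkeeping record for one cache file.
struct CacheEntry {
    std::string key;
    std::uint64_t sizeBytes;
    std::uint64_t lastAccess; // e.g. seconds since epoch; larger = more recent
};

// Returns the keys to delete, least recently used first, so that the
// remaining entries fit within limitBytes.
std::vector<std::string> selectEvictions(std::vector<CacheEntry> entries,
                                         std::uint64_t limitBytes) {
    std::uint64_t total = 0;
    for (const CacheEntry& e : entries)
        total += e.sizeBytes;

    std::sort(entries.begin(), entries.end(),
              [](const CacheEntry& a, const CacheEntry& b) {
                  return a.lastAccess < b.lastAccess;
              });

    std::vector<std::string> evicted;
    for (const CacheEntry& e : entries) {
        if (total <= limitBytes)
            break;
        assert(total >= e.sizeBytes);
        total -= e.sizeBytes;
        evicted.push_back(e.key);
    }
    return evicted;
}
```

The function only decides what to remove; deleting files and handling concurrent Bun processes touching the same directory are left out of the sketch.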
Future Enhancements
- Cross-session cache: Persist cache between Bun runs
- Shared cache: Share cache between projects (content-addressed)
- Precompilation: bun cache compile to pregenerate caches
- Cache analytics: Report cache hit/miss rates
Last Updated: 2025-12-04
Author: Claude Code
Status: Phase 2 complete, Phase 3 planning