feat: add Bun.Archive API for creating and extracting tarballs (#25665)

## Summary

- Adds new `Bun.Archive` API for working with tar archives
- `Bun.Archive.from(data)` - Create archive from object, Blob,
TypedArray, or ArrayBuffer
- `Bun.Archive.write(path, data, compress?)` - Write archive to disk
(async)
- `archive.extract(path)` - Extract to directory, returns
`Promise<number>` (file count)
- `archive.blob(compress?)` - Get archive as Blob (async)
- `archive.bytes(compress?)` - Get archive as Uint8Array (async)
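
The calls listed above can be sketched as a small round trip. This is a minimal sketch, not the PR's own code: the `ArchiveLike` and `ArchiveStatic` interfaces and the `roundTrip` helper are hypothetical names that mirror only the signatures in this summary. The optional `compress` argument is left out because its accepted values are not stated here. The API object is passed in as a parameter so the sketch does not depend on Bun's bundled type definitions.

```typescript
// Shapes inferred from the summary; illustrative assumptions, not Bun's
// published type definitions.
type ArchiveInput = Record<string, string> | Uint8Array | ArrayBuffer;

interface ArchiveLike {
  extract(path: string): Promise<number>; // resolves to the extracted file count
  bytes(): Promise<Uint8Array>; // the tarball as raw bytes
}

interface ArchiveStatic {
  from(data: ArchiveInput): ArchiveLike;
  write(path: string, data: ArchiveInput): Promise<unknown>;
}

// Hypothetical helper: build an archive from an object, read it back as
// bytes, extract those bytes to a directory, and write a tarball to disk.
async function roundTrip(Archive: ArchiveStatic, outDir: string, tarPath: string) {
  const files = { "a.txt": "hello", "b.txt": "world" };

  const archive = Archive.from(files);
  const tarBytes = await archive.bytes();

  // Re-open the serialized bytes and extract them.
  const extracted = await Archive.from(tarBytes).extract(outDir);

  // Write the same file map straight to disk.
  await Archive.write(tarPath, files);

  return { byteLength: tarBytes.byteLength, extracted };
}
```

Under Bun, this would presumably be called as `await roundTrip(Bun.Archive, dir, join(dir, "out.tar"))`.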

Key implementation details:
- Uses existing libarchive bindings for tarball creation/extraction via
`extractToDisk`
- Uses libdeflate for gzip compression
- Immediate byte copying for GC safety (no JSValue protection, no
`hasPendingActivity`)
- Async operations run on worker pool threads with proper VM reference
handling
- Growing memory buffer via `archive_write_open2` callbacks for
efficient tarball creation
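
Two of the ideas above, immediate byte copying and a growing output buffer fed by write callbacks, can be illustrated in TypeScript. This is only an analogy under stated assumptions: the real implementation lives in Bun's native code on top of libarchive, and `GrowingBuffer`, `copyInput`, and the capacity-doubling policy are hypothetical choices made for this sketch.

```typescript
// Illustrative only: a growable byte buffer that accepts chunks the way a
// write callback would. Names and growth policy are assumptions.
class GrowingBuffer {
  private buf: Uint8Array;
  private len = 0;

  constructor(initialCapacity = 512) {
    this.buf = new Uint8Array(Math.max(1, initialCapacity));
  }

  // Append one chunk; returns the number of bytes consumed.
  write(chunk: Uint8Array): number {
    const needed = this.len + chunk.byteLength;
    if (needed > this.buf.byteLength) {
      // Double the capacity until the chunk fits, so appends stay amortized O(1).
      let capacity = this.buf.byteLength;
      while (capacity < needed) capacity *= 2;
      const grown = new Uint8Array(capacity);
      grown.set(this.buf.subarray(0, this.len));
      this.buf = grown;
    }
    this.buf.set(chunk, this.len);
    this.len = needed;
    return chunk.byteLength;
  }

  // Return a copy of exactly the bytes written so far.
  finish(): Uint8Array {
    return this.buf.slice(0, this.len);
  }
}

// Copy caller-owned bytes up front so later mutation (or collection) of the
// source cannot affect the stored data.
function copyInput(input: Uint8Array): Uint8Array {
  return input.slice();
}

const out = new GrowingBuffer(4);
out.write(new Uint8Array([1, 2, 3]));
out.write(new Uint8Array([4, 5, 6, 7, 8, 9])); // forces the buffer to grow
const result = out.finish();
console.log(result.length);
```

Copying at the boundary trades one extra allocation for not having to keep the caller's value alive, which is the trade-off the "no JSValue protection" note describes.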

## Test plan

- [x] 65 comprehensive tests covering:
  - Normal operations (create, extract, blob, bytes, write)
  - GC safety (unreferenced archives, mutation isolation)  
  - Error handling (invalid args, corrupted data, I/O errors)
  - Edge cases (large files, many files, special characters, path
    normalization)
  - Concurrent operations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Bot <claude-bot@bun.sh>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Dylan Conway <dylan.conway567@gmail.com>
Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>
Author: robobun
Date: 2026-01-09 00:33:35 -08:00
Committed by: GitHub
Parent: eb5b498c62
Commit: 70fa6af355
24 changed files with 2708 additions and 15 deletions


@@ -0,0 +1,33 @@
// Minimal reproduction of memory leak in Bun.Archive.extract()
// Run with: bun run test/js/bun/archive-extract-leak-repro.ts
import { mkdtempSync, rmSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

const dir = mkdtempSync(join(tmpdir(), "archive-leak-"));
const files = {
  "a.txt": "hello",
  "b.txt": "world",
};
const archive = Bun.Archive.from(files);

function formatMB(bytes: number) {
  return (bytes / 1024 / 1024).toFixed(0) + " MB";
}

console.log("Extracting archive 10,000 times per round...\n");
for (let round = 0; round < 20; round++) {
  for (let i = 0; i < 10_000; i++) {
    await archive.extract(dir);
  }
  Bun.gc(true);
  const rss = process.memoryUsage.rss();
  console.log(`Round ${round + 1}: RSS = ${formatMB(rss)}`);
}
rmSync(dir, { recursive: true });

test/js/bun/archive.test.ts (normal file, 1211 changed lines)

File diff suppressed because it is too large.

6 binary files not shown.