Mirror of https://github.com/oven-sh/bun (synced 2026-02-14 04:49:06 +00:00)
This commit replaces the broken unshare() approach with clone3() for creating container namespaces. This fixes the immediate crash, but the implementation is NOT production ready.

What works:
- Basic namespace creation via clone3()
- PID namespace isolation (process sees itself as PID 1)
- PR_SET_PDEATHSIG properly set for cleanup
- No more errno conversion crashes

Critical issues remaining (see CONTAINER_FIXES_ASSESSMENT.md):
- User namespace UID/GID mapping broken (needs parent to write)
- No parent-child synchronization for setup stages
- Cgroup setup won't work (needs parent process to configure)
- Network namespace configuration incomplete
- Mount operations timing issues
- Silent fallback when clone3 fails

This is a step forward but needs significant additional work for production use. The architecture needs parent-child coordination via pipes/eventfd to properly sequence namespace configuration.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Container Implementation - Clone3 Migration Assessment
What Was Done
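Migrated namespace creation from calling unshare() after vfork() to a single clone3() call. The child is now created directly inside its new namespaces, which avoids the TOCTOU window between process creation and namespace entry.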
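As a rough illustration of the approach, the sketch below issues a raw clone3() system call with a namespace bitmask. It is not the code in bun-spawn.cpp. The names CloneArgsV0 and clone3_with_namespaces are hypothetical, the struct mirrors only the first 64 bytes (version 0) of the kernel's struct clone_args, and the fallback syscall number 435 is an assumption for toolchains whose headers predate clone3.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstdint>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_clone3
#define SYS_clone3 435  // assumption: value used on x86-64 and arm64
#endif

// Hypothetical mirror of the version-0 layout of the kernel's struct clone_args.
struct CloneArgsV0 {
    uint64_t flags;        // CLONE_* bits, including CLONE_NEW* namespace flags
    uint64_t pidfd;        // unused here
    uint64_t child_tid;    // unused here
    uint64_t parent_tid;   // unused here
    uint64_t exit_signal;  // signal delivered to the parent when the child exits
    uint64_t stack;        // 0: child runs on a copy of the parent's stack
    uint64_t stack_size;
    uint64_t tls;
};
static_assert(sizeof(CloneArgsV0) == 64, "version-0 clone_args is 64 bytes");

// Returns the child's PID in the parent, 0 in the child, or -1 with errno set.
// With namespace_flags == 0 this behaves like fork().
inline pid_t clone3_with_namespaces(uint64_t namespace_flags) {
    CloneArgsV0 args{};
    args.flags = namespace_flags;
    args.exit_signal = SIGCHLD;
    return static_cast<pid_t>(syscall(SYS_clone3, &args, sizeof(args)));
}
```

Because the namespaces are requested in the same call that creates the process, the child never runs outside them. A caller would pass a mask such as CLONE_NEWUSER | CLONE_NEWPID and treat a return of -1 as a hard error.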
Changes Made:
- bun-spawn.cpp: Added clone3() support for namespace creation
- spawn.zig: Added namespace_flags to spawn request
- process.zig: Calculate namespace flags from container options
- linux_container.zig: Removed unshare() calls
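The flag calculation described for process.zig can be pictured as a simple mapping from container options to a CLONE_NEW* bitmask. The sketch below is written in C++ for consistency with bun-spawn.cpp; the ContainerOptions struct and its field names are hypothetical and do not reflect the real option names.

```cpp
#include <cassert>
#include <cstdint>
#include <sched.h>  // CLONE_NEW* constants (exposed under _GNU_SOURCE, which g++ defines)

// Hypothetical options struct; the real container options may look different.
struct ContainerOptions {
    bool user_namespace = false;
    bool pid_namespace = false;
    bool network_namespace = false;
    bool mount_namespace = false;
};

// Translate the requested isolation features into the bitmask handed to clone3().
inline uint64_t namespace_flags(const ContainerOptions& opts) {
    uint64_t flags = 0;
    if (opts.user_namespace)    flags |= CLONE_NEWUSER;
    if (opts.pid_namespace)     flags |= CLONE_NEWPID;
    if (opts.network_namespace) flags |= CLONE_NEWNET;
    if (opts.mount_namespace)   flags |= CLONE_NEWNS;
    return flags;
}
```

A result of 0 means no isolation was requested, which is a useful signal for the spawn path: only a non-zero mask needs clone3() at all.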
What Works
✅ Basic PID namespace creation (with user namespace)
✅ PR_SET_PDEATHSIG is properly set
✅ Process sees itself as PID 1 in PID namespace
✅ Clean compile with no errors
Critical Issues - NOT Production Ready
1. ❌ User Namespace UID/GID Mapping Broken
- Problem: Mappings are written from child process (won't work)
- Required: Parent must write /proc/<pid>/uid_map after clone3()
- Impact: User namespaces don't work properly
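A minimal sketch of the parent-side write, assuming a single-ID mapping of root inside the namespace to the caller's own IDs outside it. The helper names format_id_map, write_file_once and write_id_maps are hypothetical. The sketch also writes "deny" to /proc/<pid>/setgroups first, which the kernel requires before an unprivileged process may write gid_map.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// One map line has the form "<id inside ns> <id outside ns> <count>\n".
inline std::string format_id_map(unsigned inside_id, unsigned outside_id, unsigned count) {
    assert(count > 0);
    return std::to_string(inside_id) + " " + std::to_string(outside_id) + " " +
           std::to_string(count) + "\n";
}

// Write the whole content with one write() call; map files accept a single write.
inline bool write_file_once(const std::string& path, const std::string& content) {
    int fd = open(path.c_str(), O_WRONLY | O_CLOEXEC);
    if (fd < 0) return false;
    ssize_t n = write(fd, content.data(), content.size());
    int saved = errno;
    close(fd);
    errno = saved;  // keep the write() error visible to the caller
    return n == static_cast<ssize_t>(content.size());
}

// Called by the parent once clone3() has returned the child's PID.
inline bool write_id_maps(pid_t child, uid_t outer_uid, gid_t outer_gid) {
    const std::string base = "/proc/" + std::to_string(child);
    if (!write_file_once(base + "/setgroups", "deny")) return false;
    if (!write_file_once(base + "/uid_map", format_id_map(0, outer_uid, 1))) return false;
    return write_file_once(base + "/gid_map", format_id_map(0, outer_gid, 1));
}
```

The child must not proceed to exec until these writes have finished, which is why this step depends on the synchronization described in the next issue.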
2. ❌ No Parent-Child Synchronization
- Problem: No coordination between parent setup and child execution
- Required: Pipe or eventfd for synchronization
- Impact: Race conditions, child may exec before parent setup completes
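One way to picture the pipe-based option: the child blocks on a one-byte read until the parent has finished its setup. The sketch below uses fork() as a stand-in for clone3() so that it runs without privileges; the function names are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Child side: block until the parent sends one byte.
// Returns false if the parent closed the pipe without signalling.
inline bool wait_for_parent(int read_fd) {
    char byte = 0;
    ssize_t n;
    do { n = read(read_fd, &byte, 1); } while (n < 0 && errno == EINTR);
    return n == 1;
}

// Parent side: release the child.
inline bool signal_child(int write_fd) {
    const char byte = 1;
    ssize_t n;
    do { n = write(write_fd, &byte, 1); } while (n < 0 && errno == EINTR);
    return n == 1;
}

// Demo: returns the child's exit code (0 if it was released, 1 if the parent
// gave up without signalling), or -1 if the demo itself failed.
inline int run_synchronized_child(bool parent_signals) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    pid_t pid = fork();  // stand-in for clone3() with namespace flags
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {
        close(fds[1]);  // the child must drop the write end to observe EOF
        bool released = wait_for_parent(fds[0]);
        // Mounts, network configuration and execve() would follow here.
        _exit(released ? 0 : 1);
    }
    close(fds[0]);
    // Parent-side setup (ID maps, cgroups, veth pairs) would run here.
    if (parent_signals) signal_child(fds[1]);
    close(fds[1]);  // without a prior signal, this tells the child to abort
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Closing the write end without writing doubles as an abort signal: the child sees end-of-file and can exit instead of running with a half-configured environment.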
3. ❌ Cgroup Setup Won't Work
- Problem: Trying to set up cgroups from child process
- Required: Parent must create cgroup and add child PID
- Impact: Resource limits don't work
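A minimal sketch of the parent-side step, assuming a cgroup v2 hierarchy where a process is moved by writing its PID to the group's cgroup.procs file. The function name add_child_to_cgroup is hypothetical, and the cgroup root is a parameter (typically /sys/fs/cgroup) so the logic can be exercised against an ordinary directory.

```cpp
#include <cassert>
#include <cerrno>
#include <fstream>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Parent side: create <cgroup_root>/<name> and move child_pid into it.
// Returns false if the directory or the cgroup.procs write fails.
inline bool add_child_to_cgroup(const std::string& cgroup_root,
                                const std::string& name,
                                pid_t child_pid) {
    assert(!name.empty());
    const std::string dir = cgroup_root + "/" + name;
    if (mkdir(dir.c_str(), 0755) != 0 && errno != EEXIST) return false;

    std::ofstream procs(dir + "/cgroup.procs");
    if (!procs) return false;
    procs << child_pid << '\n';
    procs.flush();
    return procs.good();
}
```

Resource limits would then be written to the same directory by the parent before it releases the child, so the limits are already in force when the child execs.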
4. ❌ Network Namespace Config Broken
- Problem: No proper veth pair creation or network setup
- Required: Parent creates veth, child configures interface
- Impact: Network isolation doesn't work beyond basic namespace
5. ❌ Mount Operations Timing Wrong
- Problem: Mount operations happen at wrong time
- Required: Child must mount after namespace entry but before exec
- Impact: Filesystem isolation doesn't work
6. ❌ Silent Fallback on Error
- Problem: Falls back to vfork without error when clone3 fails
- Required: Should propagate error to user
- Impact: User thinks container is working when it's not
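The intended behaviour can be expressed as a small decision function. This is a sketch under one assumption not stated in the source: that falling back to vfork remains acceptable when no namespaces were requested. The names SpawnPath, SpawnDecision and after_clone3_attempt are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

enum class SpawnPath { Clone3, Vfork };

struct SpawnDecision {
    bool ok;         // false: report `error` to the user, do not spawn
    SpawnPath path;  // meaningful only when ok is true
    int error;       // errno to surface when ok is false, otherwise 0
};

// clone3_errno is 0 if clone3() succeeded, otherwise the errno it set.
inline SpawnDecision after_clone3_attempt(uint64_t namespace_flags, int clone3_errno) {
    assert(clone3_errno >= 0);
    if (clone3_errno == 0) return {true, SpawnPath::Clone3, 0};
    // Assumption: without requested namespaces, vfork gives equivalent behaviour.
    if (namespace_flags == 0) return {true, SpawnPath::Vfork, 0};
    // Isolation was requested and cannot be provided: fail loudly.
    return {false, SpawnPath::Clone3, clone3_errno};
}
```

With this rule, a kernel without clone3 produces an ENOSYS error for container spawns instead of an unisolated process.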
Proper Architecture Needed
Parent Process                              Child Process
--------------                              -------------
clone3() ──────────────────────────────────> (created in namespaces)
    │                                       │
    ├─ Write UID/GID mappings               │
    ├─ Create cgroups                       │
    ├─ Add child to cgroup                  │
    ├─ Create veth pairs                    │
    │                                       ├─ Wait for parent signal
    ├─ Signal child ───────────────────────>│
    │                                       ├─ Setup mounts
    │                                       ├─ Configure network
    │                                       ├─ Apply limits
    │                                       └─ execve()
    └─ Return PID
Required for Production
- Implement parent-child synchronization (pipe or eventfd)
- Split setup into parent/child operations
- Fix UID/GID mapping (parent writes after clone3)
- Fix cgroup setup (parent creates and assigns)
- Implement proper network setup (veth pairs)
- Add error propagation from child to parent
- Add comprehensive tests for error cases
- Add fallback detection and proper error reporting
- Test on various kernel versions (clone3 availability)
- Add cleanup on failure paths
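For the error-propagation item, one common technique is a close-on-exec pipe: the child writes its errno to the pipe if setup or exec fails, and a successful exec closes the pipe so the parent reads end-of-file. The sketch below uses fork() and execv() as stand-ins for the clone3-based spawn path; spawn_and_report is a hypothetical name.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns 0 and stores the child's PID in *out_pid if the child reached a
// successful exec. Otherwise reaps the child and returns the errno it reported.
inline int spawn_and_report(const char* path, char* const argv[], pid_t* out_pid) {
    assert(out_pid != nullptr);
    int fds[2];
    if (pipe2(fds, O_CLOEXEC) != 0) return errno;

    pid_t pid = fork();  // stand-in for clone3() with namespace flags
    if (pid < 0) { int e = errno; close(fds[0]); close(fds[1]); return e; }

    if (pid == 0) {
        close(fds[0]);
        execv(path, argv);  // on success, O_CLOEXEC closes fds[1] for us
        int e = errno;      // only reached if exec failed
        ssize_t ignored = write(fds[1], &e, sizeof e);
        (void)ignored;
        _exit(127);
    }

    close(fds[1]);
    int child_errno = 0;
    ssize_t n;
    do { n = read(fds[0], &child_errno, sizeof child_errno); } while (n < 0 && errno == EINTR);
    close(fds[0]);

    if (n == 0) { *out_pid = pid; return 0; }  // EOF: exec succeeded

    int status = 0;
    while (waitpid(pid, &status, 0) < 0 && errno == EINTR) {}
    return n == static_cast<ssize_t>(sizeof child_errno) ? child_errno : EIO;
}
```

The same pipe can carry errors from earlier child-side steps such as mounts, so the parent can report the failing step rather than only a non-zero exit status.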
Recommendation
DO NOT MERGE in current state. This needs significant rework to be production-ready. The basic approach of using clone3() is correct, but the implementation needs proper parent-child coordination and split responsibilities.
Time Estimate for Proper Implementation
- 2-3 days for proper architecture implementation
- 1-2 days for comprehensive testing
- 1 day for documentation and review prep
Total: ~1 week of focused development