This update significantly improves the macOS runner infrastructure based on detailed analysis of the bootstrap.sh script and adds robust testing and validation:
## 🔧 **Key Improvements**
### Software Version Synchronization
- **Node.js**: 24.3.0 (exact version matching bootstrap.sh)
- **Bun**: 1.2.17 (exact version matching bootstrap.sh)
- **LLVM**: 19.1.7 (exact version matching bootstrap.sh)
- **CMake**: 3.30.5 (exact version matching bootstrap.sh)
- **Buildkite Agent**: 3.87.0
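
A pinned-version check like the one described can be sketched as a comparison between each tool's reported version string and the pin. This is a minimal illustration, assuming the tools print versions in common forms such as `v24.3.0` or `cmake version 3.30.5`; the `PINNED` table, `extract_version`, and `matches_pin` are hypothetical names and not taken from the actual scripts.

```python
import re
from typing import Optional

# Pinned versions copied from the list above.
PINNED = {
    "node": "24.3.0",
    "bun": "1.2.17",
    "llvm": "19.1.7",
    "cmake": "3.30.5",
    "buildkite-agent": "3.87.0",
}

# Matches the first x.y.z triple in a version banner.
_VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def extract_version(output: str) -> Optional[str]:
    """Return the first x.y.z version found in a tool's version output."""
    match = _VERSION_RE.search(output)
    return match.group(0) if match else None


def matches_pin(tool: str, output: str) -> bool:
    """True only when the reported version equals the pinned one exactly."""
    return extract_version(output) == PINNED[tool]
```

For example, `matches_pin("node", "v24.3.0")` is true while `matches_pin("node", "v24.2.0")` is false, so a minor-version drift fails the check instead of passing silently.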
### Enhanced bootstrap-macos.sh
- Complete rewrite based on bootstrap.sh analysis
- Added Tailscale configuration for VPN connectivity
- Added the age encryption tool for the macOS equivalent of core dumps
- macFUSE and python-fuse for filesystem testing
- Chromium installation for browser testing
- Exact version installations with verification
- Node.js headers and node-gyp cache setup
### Comprehensive Testing & Validation
- **Image Validation**: Tests all software installations after build
- **Flakiness Testing**: 3 iterations with a minimum 80% success rate
- **Software Verification**: Node.js, Bun, CMake, Clang, Docker, Tailscale
- **Health Endpoint Testing**: Validates service availability
- **Automated Cleanup**: Test VMs are automatically cleaned up
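
The flakiness gate is worth working through: with 3 iterations the only possible success rates are 0%, 33%, 67%, and 100%, so an 80% minimum in effect requires all three runs to pass. A minimal sketch of such a gate, assuming an inclusive threshold (`>=`); the function name and the handling of an empty result list are illustrative choices, not the workflow's actual logic.

```python
from typing import Sequence


def passes_flakiness_gate(results: Sequence[bool], min_rate: float = 0.80) -> bool:
    """Return True when the share of passing iterations meets min_rate."""
    if not results:
        # Assumption: no iterations means nothing was validated, so fail.
        return False
    rate = sum(results) / len(results)
    return rate >= min_rate
```

With these defaults, two passes out of three (about 67%) fail the gate. Raising the iteration count to 5 would be the smallest change that lets a single failure still pass (4/5 = 80%).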
### Discord Notifications
- Replaced Slack with Discord webhooks for all notifications
- Enhanced notification format with markdown support
- Color-coded status indicators (green=success, red=failure, gray=skipped)
- Detailed deployment information and links
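
The color-coded notifications map naturally onto Discord webhook embeds, whose `color` field is an integer. The sketch below only builds the JSON body; the specific hex values, the `STATUS_COLORS` table, and `build_discord_payload` are assumptions for illustration, since the source states only green, red, and gray.

```python
import json

# Hypothetical color values; the source only names green/red/gray.
STATUS_COLORS = {
    "success": 0x2ECC71,  # green
    "failure": 0xE74C3C,  # red
    "skipped": 0x95A5A6,  # gray
}


def build_discord_payload(status: str, title: str, details: str, url: str) -> dict:
    """Build a Discord webhook body containing one color-coded embed."""
    return {
        "embeds": [
            {
                "title": title,
                "description": details,  # embed descriptions accept markdown
                "url": url,
                "color": STATUS_COLORS[status],
            }
        ]
    }


def to_json(payload: dict) -> str:
    """Serialize the payload for an HTTP POST with Content-Type: application/json."""
    return json.dumps(payload)
```

The resulting string would be sent as the body of a POST to the webhook URL, which should be kept in a secret and not hard-coded.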
### User Isolation Improvements
- Enhanced user creation with proper environment setup
- Improved cleanup with comprehensive process termination
- Better error handling and logging
- Timeout management for job execution
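
Timeout management for a job can be sketched as a wrapper that runs a command with a deadline. This is an illustrative sketch, not the actual user-management script: `run_with_timeout` is a hypothetical helper, and returning 124 on timeout is an assumed convention borrowed from GNU `timeout`.

```python
import subprocess
import sys

TIMEOUT_EXIT_CODE = 124  # assumption: mirrors the GNU `timeout` convention


def run_with_timeout(cmd: list, timeout_s: float) -> int:
    """Run cmd and return its exit code, or 124 if it exceeds timeout_s.

    subprocess.run kills and reaps the direct child when the timeout
    expires; grandchildren are not covered and would still need the
    per-user process cleanup described above.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return TIMEOUT_EXIT_CODE


if __name__ == "__main__":
    # A command that finishes well within its deadline.
    print(run_with_timeout([sys.executable, "-c", "pass"], timeout_s=10))
```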
### Documentation & Developer Experience
- **CLAUDE.md**: Comprehensive guide for future Claude development
- Updated README.md with exact version requirements
- Updated DEPLOYMENT.md with Discord configuration
- Detailed troubleshooting and debugging sections
## 🚀 **Architecture Benefits**
- **Reliability**: Flakiness testing ensures consistent VM performance
- **Consistency**: Exact version matching with bootstrap.sh prevents environment drift
- **Isolation**: Complete job isolation with disposable user accounts
- **Monitoring**: Enhanced health checks and status reporting
- **Maintainability**: Clear documentation and development guidelines
## 🛠️ **Technical Details**
- Enhanced Packer configuration with comprehensive software installation
- Improved Terraform infrastructure with better resource management
- Robust GitHub Actions workflows with multi-stage validation
- Comprehensive user management scripts with proper cleanup
- Health monitoring and automated recovery mechanisms
The infrastructure now provides production-ready macOS CI runners with enterprise-grade reliability, security, and monitoring capabilities.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
This implements a comprehensive macOS CI runner infrastructure based on the MacStadium Orka platform, providing:
## Key Features
- **Complete Job Isolation**: Each Buildkite job runs in its own user account
- **Automated VM Image Building**: Daily Packer-based image rebuilds with latest software
- **Fleet Management**: Terraform-managed VM fleet with auto-scaling
- **Multi-Version Support**: macOS 13, 14, and 15 simultaneously
- **Comprehensive Cleanup**: Automated cleanup of processes, files, and resources
## Components
- **Packer Configuration**: Automated VM image building with all required software
- **Terraform Infrastructure**: VM fleet management with auto-scaling and monitoring
- **User Management Scripts**: Per-job user creation and cleanup for complete isolation
- **GitHub Actions Workflows**: Daily image rebuilds and fleet deployment automation
- **Bootstrap Scripts**: macOS-specific software installation and configuration
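
The source does not spell out the auto-scaling rule, so the following is only one plausible shape for it: one VM per job in flight, clamped to a configured range. The function name, the parameters, and the one-job-per-VM assumption are all hypothetical.

```python
def desired_vm_count(queued_jobs: int, running_jobs: int, min_vms: int, max_vms: int) -> int:
    """Return a target fleet size: one VM per job in flight, clamped to [min_vms, max_vms]."""
    demand = queued_jobs + running_jobs
    return max(min_vms, min(max_vms, demand))
```

Keeping a nonzero `min_vms` trades idle cost for lower queue latency, while `max_vms` caps spend during bursts.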
## Architecture
- Uses MacStadium Orka platform for macOS VM hosting
- Implements disposable user accounts per job (bk-<job-id>)
- Includes health monitoring and auto-scaling based on queue demand
- Provides comprehensive logging and Slack notifications
- Supports cost optimization through efficient resource utilization
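
The `bk-<job-id>` naming for disposable accounts can be sketched as a small helper. The sanitization rule here (lowercase, then keep only letters, digits, and hyphens) is an assumption for illustration; the actual scripts may derive the name differently.

```python
import re


def job_username(job_id: str) -> str:
    """Derive a per-job account name of the form bk-<job-id>."""
    # Assumption: restrict to characters that are safe in a short username.
    cleaned = re.sub(r"[^a-z0-9-]", "", job_id.lower())
    if not cleaned:
        raise ValueError("job id has no usable characters")
    return f"bk-{cleaned}"
```

Because the name is a pure function of the job ID, the cleanup step can recompute it later without storing any state on the VM.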
## Software Included
- Xcode Command Line Tools, LLVM/Clang 19, Node.js 24.3.0, Bun 1.2.17
- Python 3.11/3.12, Go, Rust, Docker Desktop
- Build tools: CMake, Ninja, make, pkg-config, ccache
- Development utilities and system libraries
Based on the existing bootstrap.sh but optimized for macOS CI environments with complete job isolation and automated management.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
We cannot assume that the next event loop cycle will yield the expected outcome from the operating system. We can assume certain things happen in a certain order, but the fact that timers run next does not mean kqueue/epoll/IOCP will have the data available by then.
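
The same point can be illustrated outside this codebase. The sketch below uses Python's asyncio as an analogy, not the project's own event loop: a check that yields for a single tick and then asserts is fragile, whereas waiting on the condition with a deadline does not depend on how many cycles the OS needs. `wait_until` and `demo` are hypothetical names.

```python
import asyncio


async def wait_until(predicate, timeout_s: float = 5.0, interval_s: float = 0.01) -> None:
    """Poll predicate until it is truthy, yielding to the event loop between checks."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while not predicate():
        if loop.time() >= deadline:
            raise TimeoutError("condition not met before timeout")
        await asyncio.sleep(interval_s)


async def demo() -> bool:
    received = []
    loop = asyncio.get_running_loop()
    # Simulate data that arrives after an unknown number of loop cycles.
    loop.call_later(0.05, received.append, b"data")
    # Fragile alternative: `await asyncio.sleep(0)` followed by an assert
    # would assume a single cycle is enough for the data to show up.
    await wait_until(lambda: received)
    return received == [b"data"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```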
There are many situations where using `catch unreachable` is a reasonable or sometimes necessary decision. This rule causes many, many merge conflicts.