Mirror of https://github.com/oven-sh/bun, synced 2026-02-06 17:08:51 +00:00

Compare commits: `dylan/pyth` ... `claude/mac` (2 commits: b2c8dc1eee, 7f8b985c69)

`.buildkite/macos-runners/CLAUDE.md` (new file, 255 lines)

# macOS Runner Infrastructure - Claude Development Guide

This document gives Claude the context and guidance needed to work on the macOS runner infrastructure.

## Overview

This infrastructure provides automated, scalable macOS CI runners for Bun on MacStadium's Orka platform. It isolates every job in its own user account, rebuilds VM images daily, and validates each image before it is deployed.

## Architecture

### Core Components
- **Packer**: Builds VM images with all required software
- **Terraform**: Manages the VM fleet, including auto-scaling
- **GitHub Actions**: Automates daily image rebuilds and fleet deployments
- **User Management**: Creates an isolated user per job (`bk-<job-id>`)

### Key Features
- **Complete Job Isolation**: Each Buildkite job runs in its own user account
- **Daily Image Rebuilds**: Automated nightly rebuilds keep environments fresh
- **Flakiness Testing**: Each image is tested over multiple iterations and must reach an 80% minimum success rate
- **Software Validation**: Every tool is checked for correct installation and basic functionality
- **Version Synchronization**: Tool versions exactly match those required by `bootstrap.sh`

## File Structure

```
.buildkite/macos-runners/
├── packer/
│   └── macos-base.pkr.hcl       # VM image building configuration
├── terraform/
│   ├── main.tf                  # Infrastructure definition
│   ├── variables.tf             # Configuration variables
│   ├── outputs.tf               # Resource outputs
│   └── user-data.sh             # VM initialization script
├── scripts/
│   ├── bootstrap-macos.sh       # macOS software installation
│   ├── create-build-user.sh     # User creation for job isolation
│   ├── cleanup-build-user.sh    # User cleanup after jobs
│   └── job-runner.sh            # Main job lifecycle management
├── github-actions/
│   ├── image-rebuild.yml        # Daily image rebuild workflow
│   └── deploy-fleet.yml         # Fleet deployment workflow
├── README.md                    # User documentation
├── DEPLOYMENT.md                # Deployment guide
└── CLAUDE.md                    # This file
```

## Software Versions (Must Match bootstrap.sh)

These versions are synchronized with `scripts/bootstrap.sh` at the repository root:

- **Node.js**: 24.3.0 (exact)
- **Bun**: 1.2.17 (exact)
- **LLVM**: 19.1.7 (exact)
- **CMake**: 3.30.5 (exact)
- **Buildkite Agent**: 3.87.0

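Exact-version pins are easiest to verify when each tool's `--version` output is reduced to a bare `X.Y.Z` string. The sketch below assumes each tool prints its version as the first `X.Y.Z` match in its output (for example `v24.3.0` from `node --version`, or `cmake version 3.30.5` from `cmake --version`); `extract_version` and `check_version` are hypothetical helpers, not functions from the existing scripts.

```bash
#!/usr/bin/env bash
# Sketch: compare installed tool versions against the pinned versions above.
# extract_version and check_version are hypothetical helper names.

# Print the first X.Y.Z version number found in the given text.
extract_version() {
  printf '%s\n' "$1" | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# check_version NAME EXPECTED RAW_VERSION_OUTPUT
# Returns 0 only when the extracted version equals EXPECTED exactly.
check_version() {
  local name="$1" expected="$2" actual
  actual="$(extract_version "$3")"
  if [ "$actual" = "$expected" ]; then
    echo "ok: ${name} ${actual}"
  else
    echo "MISMATCH: ${name} expected ${expected}, found ${actual:-none}" >&2
    return 1
  fi
}

# Example usage on a runner (commented out so the sketch runs anywhere):
# check_version node  24.3.0 "$(node --version)"
# check_version bun   1.2.17 "$(bun --version)"
# check_version cmake 3.30.5 "$(cmake --version)"
```

Comparing whole strings rather than prefixes mirrors the "(exact)" pins: a runner with Node.js 24.3.1 would be reported as a mismatch.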
## Key Scripts

### bootstrap-macos.sh
- Installs all required software with exact versions
- Configures the development environment
- Sets up Tailscale, Docker, and other dependencies
- **Critical**: Must stay synchronized with the main `bootstrap.sh`

### create-build-user.sh
- Creates a unique user per job: `bk-<job-id>`
- Sets up an isolated environment with proper permissions
- Configures the shell environment and paths
- Creates workspace directories

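Because the job ID ends up in a username, a home-directory path, and arguments to privileged commands, it is worth validating before use. A minimal sketch, assuming job IDs only contain letters, digits, and hyphens (Buildkite job IDs are UUIDs); `build_username` is a hypothetical helper, not a function from `create-build-user.sh`.

```bash
#!/usr/bin/env bash
# Sketch: derive the per-job username (bk-<job-id>) and reject unsafe input.
# build_username is a hypothetical helper name.

build_username() {
  local job_id="$1"
  # Refuse empty IDs and anything outside [A-Za-z0-9-] so the result is safe
  # to pass to user-management commands and to use as a path component.
  if [[ -z "$job_id" || ! "$job_id" =~ ^[A-Za-z0-9-]+$ ]]; then
    echo "invalid job id: '${job_id}'" >&2
    return 1
  fi
  printf 'bk-%s\n' "$job_id"
}

# On an agent, the job ID would typically come from BUILDKITE_JOB_ID:
# user="$(build_username "${BUILDKITE_JOB_ID:-}")" || exit 1
```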
### cleanup-build-user.sh
- Kills all processes owned by the build user
- Removes the user account and home directory
- Cleans up temporary files and caches
- Ensures complete isolation between jobs

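The cleanup order matters: the user's processes should be gone before the account and home directory are removed. The following is a hedged sketch of that order, not the contents of `cleanup-build-user.sh`. It assumes the macOS `pkill` and `sysadminctl -deleteUser` commands, assumes the home directory lives at `/Users/<user>`, uses a hypothetical `DRY_RUN` switch that prints commands instead of running them, and refuses to touch any account without the `bk-` prefix.

```bash
#!/usr/bin/env bash
# Sketch of a per-job cleanup order: stop processes, then delete the account.
# DRY_RUN=1 prints each command instead of executing it (hypothetical switch).

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

cleanup_build_user() {
  local user="$1"
  # Safety guard: only ever touch per-job accounts.
  case "$user" in
    bk-?*) ;;
    *) echo "refusing to clean up '${user}'" >&2; return 1 ;;
  esac

  # Ask processes to exit, give them a moment, then force-kill stragglers.
  # (pkill returns non-zero when nothing matched; that is not an error here.)
  run pkill -TERM -u "$user"
  run sleep 2
  run pkill -KILL -u "$user"

  # Remove the account; the /Users/<user> home path is an assumption.
  run sysadminctl -deleteUser "$user"
  run rm -rf "/Users/${user}"
}
```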
### job-runner.sh
- Main orchestration script
- Manages the job lifecycle: create user → run job → cleanup
- Handles timeouts and health checks
- Runs as root via a LaunchDaemon

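The key property of the create user → run job → cleanup lifecycle is that cleanup still happens when the job fails or exits early. One common way to get that in a shell script is an `EXIT` trap; the sketch below assumes that approach and uses hypothetical stand-in functions rather than the real scripts.

```bash
#!/usr/bin/env bash
# Sketch: guarantee cleanup after every job, including failed ones.
# create_user/cleanup_user are hypothetical stand-ins for
# create-build-user.sh and cleanup-build-user.sh.

create_user()  { echo "create $1"; }
cleanup_user() { echo "cleanup $1"; }

# run_job_isolated JOB_ID COMMAND [ARGS...]
run_job_isolated() {
  local job_id="$1"; shift
  local user="bk-${job_id}"
  (
    # The EXIT trap fires when this subshell ends, whether the job
    # succeeded, failed, or called exit early; the job's status is kept.
    trap 'cleanup_user "$user"' EXIT
    create_user "$user" || exit 1
    # A real runner would execute the job as $user (e.g. via sudo -u) and
    # enforce a timeout; here the command simply runs in place.
    "$@"
  )
}
```

Running each job inside its own subshell keeps the trap scoped to that job, so a long-lived runner loop does not accumulate traps.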
## GitHub Actions Workflows

### image-rebuild.yml
- Runs daily at 2 AM UTC
- Detects changes to trigger rebuilds
- Builds images for macOS 13, 14, and 15
- **Validation Steps**:
  - Software installation verification
  - Flakiness testing (3 iterations, 80% minimum success rate)
  - Health endpoint testing
- Discord notifications for status

### deploy-fleet.yml
- Manual deployment trigger
- Validates inputs and plans changes
- Deploys the VM fleet with health checks
- Supports multiple environments (prod/staging/dev)

## Required Secrets

### MacStadium
- `MACSTADIUM_API_KEY`: API access key
- `ORKA_ENDPOINT`: Orka API endpoint
- `ORKA_AUTH_TOKEN`: Authentication token

### AWS
- `AWS_ACCESS_KEY_ID`: For Terraform state storage
- `AWS_SECRET_ACCESS_KEY`: For Terraform state storage

### Buildkite
- `BUILDKITE_AGENT_TOKEN`: Agent registration token
- `BUILDKITE_API_TOKEN`: For monitoring/status checks
- `BUILDKITE_ORG`: Organization slug

### GitHub
- `GITHUB_TOKEN`: For private repository access. GitHub provides this token to workflows automatically and does not allow user-defined secrets whose names start with `GITHUB_`, so a personal access token must be stored under a different name.

### Notifications
- `DISCORD_WEBHOOK_URL`: For status notifications

## Development Guidelines

### Adding New Software
1. Update `bootstrap-macos.sh` with the installation commands
2. Add a version check for the new tool in the same script
3. Include the tool in the validation tests in `image-rebuild.yml`
4. Update the documentation in README.md

### Modifying User Isolation
1. Update `create-build-user.sh` for user creation
2. Update `cleanup-build-user.sh` for cleanup
3. Test isolation in `job-runner.sh`
4. Ensure proper permissions and security

### Updating VM Configuration
1. Modify `terraform/variables.tf` for fleet sizing
2. Update `terraform/main.tf` for infrastructure changes
3. Test deployment with `deploy-fleet.yml`
4. Update documentation

### Version Updates
1. **Critical**: Check `scripts/bootstrap.sh` (at the repository root) for version changes
2. Update the exact versions in `bootstrap-macos.sh`
3. Update the version checks in the workflows
4. Update documentation

## Testing Strategy

### Image Validation
- Software installation verification
- Version checks for exact matches
- Health endpoint testing
- Basic functionality tests

### Flakiness Testing
- 3 test iterations per image
- 80% minimum success rate; with 3 iterations this means all 3 must pass, since 2 of 3 is only about 67%
- Tests basic commands, Node.js, Bun, and build tools
- Automated cleanup of test VMs

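The threshold check is easiest to get right with integer arithmetic, comparing `passes * 100` against `min_percent * total` instead of dividing. A minimal sketch; `meets_success_rate` is a hypothetical helper, not taken from `image-rebuild.yml`.

```bash
#!/usr/bin/env bash
# Sketch: decide whether a flakiness run meets the minimum success rate.
# meets_success_rate is a hypothetical helper name.

# meets_success_rate PASSES TOTAL MIN_PERCENT
meets_success_rate() {
  local passes="$1" total="$2" min_percent="$3"
  if [ "$total" -le 0 ]; then
    echo "no iterations were run" >&2
    return 1
  fi
  # Integer-only form of: passes / total >= min_percent / 100
  (( passes * 100 >= min_percent * total ))
}

for passes in 3 2; do
  if meets_success_rate "$passes" 3 80; then
    echo "${passes}/3 passed: image accepted"
  else
    echo "${passes}/3 passed: image rejected"
  fi
done
```

Running it prints that 3/3 is accepted and 2/3 is rejected, which is the practical meaning of an 80% threshold over three iterations.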
### Integration Testing
- End-to-end job execution
- User isolation verification
- Resource cleanup validation
- Performance monitoring

## Troubleshooting

### Common Issues
1. **Version Mismatches**: Check bootstrap.sh for updates
2. **User Cleanup Failures**: Check process termination and file permissions
3. **Image Build Failures**: Check Packer logs and VM resources
4. **Flakiness**: Investigate VM performance and network issues

### Debugging Commands
```bash
# Check VM status
orka vm list

# Check image status
orka image list

# Test user creation
sudo /usr/local/bin/bun-ci/create-build-user.sh

# Check health endpoint
curl http://localhost:8080/health

# View logs
tail -f /usr/local/var/log/buildkite-agent/buildkite-agent.log
```

## Performance Considerations

### Resource Management
- VMs configured with 12 CPU cores and 32 GB RAM
- Auto-scaling based on queue demand
- Aggressive cleanup to prevent resource leaks

### Cost Optimization
- Automated cleanup of old images and snapshots
- Efficient VM sizing based on workload requirements
- Scheduled maintenance windows

## Security

### Isolation
- Complete process isolation per job
- Separate user accounts with unique UIDs
- Cleanup of all user data after jobs

### Network Security
- VPC isolation with security groups
- Limited SSH access for debugging
- Encrypted communications

### Credential Management
- Secure secret storage in GitHub
- No hardcoded credentials in code
- Regular rotation of access tokens

## Monitoring

### Health Checks
- HTTP endpoints on port 8080
- Buildkite agent connectivity monitoring
- Resource usage tracking

### Alerts
- Discord notifications for failures
- Build status reporting
- Fleet deployment notifications

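Discord incoming webhooks accept a JSON `POST` whose `content` field holds the message text. The sketch below shows one way to send an alert from a shell step using the `DISCORD_WEBHOOK_URL` secret. The helper names are hypothetical, and the escaping only handles backslashes and double quotes (newlines are flattened to spaces, other control characters are not handled); a JSON tool such as `jq` is more robust for arbitrary text.

```bash
#!/usr/bin/env bash
# Sketch: build and send a Discord webhook notification.
# discord_payload/notify_discord are hypothetical helper names.

# Flatten newlines, escape backslashes and double quotes, and wrap in JSON.
discord_payload() {
  local escaped
  escaped="$(printf '%s' "$1" | tr '\n' ' ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
  printf '{"content":"%s"}\n' "$escaped"
}

notify_discord() {
  # Fails with a message if DISCORD_WEBHOOK_URL is not set.
  : "${DISCORD_WEBHOOK_URL:?DISCORD_WEBHOOK_URL is not set}"
  curl -fsS -X POST \
    -H 'Content-Type: application/json' \
    -d "$(discord_payload "$1")" \
    "$DISCORD_WEBHOOK_URL"
}

# Example: notify_discord "macOS 15 image rebuild failed"
```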
## Next Steps for Development

1. **Monitor bootstrap.sh**: Watch for version updates that need synchronization
2. **Performance Optimization**: Monitor resource usage and optimize VM sizes
3. **Enhanced Testing**: Add more comprehensive validation tests
4. **Cost Monitoring**: Track usage and optimize for cost efficiency
5. **Security Hardening**: Regular security reviews and updates

## References

- [MacStadium Orka Documentation](https://orkadocs.macstadium.com/)
- [Packer Documentation](https://www.packer.io/docs)
- [Terraform Documentation](https://www.terraform.io/docs)
- [Buildkite Agent Documentation](https://buildkite.com/docs/agent/v3)
- [Main bootstrap.sh](../../scripts/bootstrap.sh) - **Keep synchronized!**

---

**Important**: This infrastructure is critical for Bun's CI/CD pipeline. Always test changes thoroughly and maintain backward compatibility. The `bootstrap-macos.sh` script must stay synchronized with the main `bootstrap.sh` script to ensure consistent environments.

`.buildkite/macos-runners/DEPLOYMENT.md` (new file, 428 lines)

# macOS Runner Deployment Guide

This guide provides step-by-step instructions for deploying the macOS runner infrastructure for Bun CI.

## Prerequisites

### 1. MacStadium Account Setup

1. **Create MacStadium Account**
   - Sign up at [MacStadium](https://www.macstadium.com/)
   - Purchase an Orka plan with appropriate VM allocation

2. **Configure API Access**
   - Generate an API key from the MacStadium dashboard
   - Note down your Orka endpoint URL
   - Test API connectivity

3. **Base Image Preparation**
   - Ensure base macOS images are available in your account
   - Verify the image naming convention: `base-images/macos-{version}-{name}`

### 2. AWS Account Setup

1. **Create AWS Account**
   - Set up an AWS account for Terraform state storage
   - Create an S3 bucket for the Terraform backend: `bun-terraform-state`

2. **Configure IAM**
   - Create an IAM user with appropriate permissions
   - Generate an access key and secret key
   - Attach policies for S3, CloudWatch, and EC2 (if using AWS resources)

### 3. GitHub Repository Setup

1. **Fork or Clone Repository**
   - Ensure you have admin access to the repository
   - Create necessary branches for deployment

2. **Configure Repository Secrets**
   - Add all required secrets (see README.md)
   - Test secret accessibility

### 4. Buildkite Setup

1. **Organization Configuration**
   - Create or access a Buildkite organization
   - Generate an agent token with appropriate permissions
   - Note the organization slug

2. **Queue Configuration**
   - Create queues: `macos`, `macos-arm64`, `macos-x86_64`
   - Configure queue-specific settings

## Step-by-Step Deployment

### Step 1: Environment Preparation

1. **Install Required Tools**
   ```bash
   # The archives below are Linux x86_64 builds; download the matching build for other platforms.

   # Install Terraform
   wget https://releases.hashicorp.com/terraform/1.6.0/terraform_1.6.0_linux_amd64.zip
   unzip terraform_1.6.0_linux_amd64.zip
   sudo mv terraform /usr/local/bin/

   # Install Packer
   wget https://releases.hashicorp.com/packer/1.9.4/packer_1.9.4_linux_amd64.zip
   unzip packer_1.9.4_linux_amd64.zip
   sudo mv packer /usr/local/bin/

   # Install AWS CLI
   curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
   unzip awscliv2.zip
   sudo ./aws/install

   # Install MacStadium CLI
   curl -L "https://github.com/macstadium/orka-cli/releases/latest/download/orka-cli-linux-amd64.tar.gz" | tar -xz
   sudo mv orka-cli /usr/local/bin/orka
   ```

2. **Configure AWS Credentials**
   ```bash
   aws configure
   # Enter your AWS access key, secret key, and region
   ```

3. **Configure MacStadium CLI**
   ```bash
   orka config set endpoint <your-orka-endpoint>
   orka auth token <your-orka-token>
   ```

### Step 2: SSH Key Setup

1. **Generate SSH Key Pair**
   ```bash
   ssh-keygen -t rsa -b 4096 -f ~/.ssh/bun-runner -N ""
   ```

2. **Copy Public Key to Terraform Directory**
   ```bash
   mkdir -p .buildkite/macos-runners/terraform/ssh-keys
   cp ~/.ssh/bun-runner.pub .buildkite/macos-runners/terraform/ssh-keys/bun-runner.pub
   ```

### Step 3: Terraform Backend Setup

1. **Create S3 Bucket for Terraform State**
   ```bash
   aws s3 mb s3://bun-terraform-state --region us-west-2
   aws s3api put-bucket-versioning --bucket bun-terraform-state --versioning-configuration Status=Enabled
   aws s3api put-bucket-encryption --bucket bun-terraform-state --server-side-encryption-configuration '{
     "Rules": [
       {
         "ApplyServerSideEncryptionByDefault": {
           "SSEAlgorithm": "AES256"
         }
       }
     ]
   }'
   ```

2. **Create Terraform Variables File**

   This file contains secrets, so keep it out of version control.
   ```bash
   cd .buildkite/macos-runners/terraform
   cat > production.tfvars << 'EOF'
   environment = "production"
   macstadium_api_key = "your-macstadium-api-key"
   buildkite_agent_token = "your-buildkite-agent-token"
   github_token = "your-github-token"
   fleet_size = {
     macos_13 = 4
     macos_14 = 6
     macos_15 = 8
   }
   vm_configuration = {
     cpu_count = 12
     memory_gb = 32
     disk_size = 500
   }
   EOF
   ```

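As an alternative to writing secrets into `production.tfvars`, Terraform reads a variable `foo` from an environment variable named `TF_VAR_foo`. The sketch below assumes the variable names used in the tfvars file above and fails fast if a secret is missing before Terraform runs; `require_env` is a hypothetical helper. Values in a `-var-file` take precedence over `TF_VAR_` variables, so the corresponding lines would have to be removed from `production.tfvars` for the environment values to be used.

```bash
#!/usr/bin/env bash
# Sketch: supply Terraform secrets via TF_VAR_* environment variables and
# verify they are set before running terraform. require_env is hypothetical.

# require_env NAME... : fail if any named variable is unset or empty.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing required environment variable: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (values come from your secret store, not from this file):
# export TF_VAR_macstadium_api_key=...
# export TF_VAR_buildkite_agent_token=...
# export TF_VAR_github_token=...
# require_env TF_VAR_macstadium_api_key TF_VAR_buildkite_agent_token TF_VAR_github_token \
#   && terraform plan -var-file="production.tfvars"
```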
### Step 4: Build VM Images

1. **Validate Packer Configuration**
   ```bash
   cd .buildkite/macos-runners/packer
   packer validate -var "macos_version=15" macos-base.pkr.hcl
   ```

2. **Build macOS 15 Image**
   ```bash
   packer build \
     -var "macos_version=15" \
     -var "orka_endpoint=<your-orka-endpoint>" \
     -var "orka_auth_token=<your-orka-token>" \
     macos-base.pkr.hcl
   ```

3. **Build macOS 14 Image**
   ```bash
   packer build \
     -var "macos_version=14" \
     -var "orka_endpoint=<your-orka-endpoint>" \
     -var "orka_auth_token=<your-orka-token>" \
     macos-base.pkr.hcl
   ```

4. **Build macOS 13 Image**
   ```bash
   packer build \
     -var "macos_version=13" \
     -var "orka_endpoint=<your-orka-endpoint>" \
     -var "orka_auth_token=<your-orka-token>" \
     macos-base.pkr.hcl
   ```

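The three builds differ only in `macos_version`, so they can be scripted as a loop. A sketch, assuming it runs from the `packer` directory. It passes the Orka credentials through Packer's standard `PKR_VAR_<name>` environment variables instead of `-var` flags, which keeps the token out of the process list and shell history. `PACKER_BIN` is a hypothetical override used only so the loop can be dry-run.

```bash
#!/usr/bin/env bash
# Sketch: build all three macOS images in sequence, stopping at the first failure.
# Export the Orka credentials first, e.g.:
#   export PKR_VAR_orka_endpoint="<your-orka-endpoint>"
#   export PKR_VAR_orka_auth_token="<your-orka-token>"

build_all_images() {
  # PACKER_BIN is a hypothetical override (e.g. PACKER_BIN=echo for a dry run).
  local packer_bin="${PACKER_BIN:-packer}"
  local version
  for version in 15 14 13; do
    echo "==> Building macOS ${version} image"
    "$packer_bin" build -var "macos_version=${version}" macos-base.pkr.hcl || return 1
  done
}
```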
### Step 5: Deploy VM Fleet

1. **Initialize Terraform**
   ```bash
   cd .buildkite/macos-runners/terraform
   terraform init
   ```

2. **Create Production Workspace**
   ```bash
   terraform workspace new production
   # On later runs the workspace already exists: terraform workspace select production
   ```

3. **Plan Deployment**
   ```bash
   terraform plan -var-file="production.tfvars"
   ```

4. **Apply Deployment**
   ```bash
   terraform apply -var-file="production.tfvars"
   ```

### Step 6: Verify Deployment

1. **Check VM Status**
   ```bash
   orka vm list
   ```

2. **Check Terraform Outputs**
   ```bash
   terraform output
   ```

3. **Test VM Connectivity**
   ```bash
   # Get a VM IP from the Terraform output
   # (`terraform output -json <name>` prints the value itself, with no "value" wrapper)
   VM_IP=$(terraform output -json vm_instances | jq -r 'to_entries[0].value.ip_address')

   # Test SSH connectivity
   ssh -i ~/.ssh/bun-runner "admin@${VM_IP}"

   # Test health endpoint
   curl "http://${VM_IP}:8080/health"
   ```

4. **Verify Buildkite Agent Connectivity**
   ```bash
   curl -H "Authorization: Bearer <your-buildkite-api-token>" \
     "https://api.buildkite.com/v2/organizations/<your-org>/agents"
   ```

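Freshly created VMs can take a while before SSH and the health endpoint respond, so a one-shot `curl` may fail spuriously. A small retry wrapper is one way to handle this; `retry` is a hypothetical helper, and the attempt count and delay in the example are placeholder values.

```bash
#!/usr/bin/env bash
# Sketch: retry a command a fixed number of times with a delay between attempts.

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2" n
  shift 2
  for (( n = 1; n <= attempts; n++ )); do
    if "$@"; then
      return 0
    fi
    echo "attempt ${n}/${attempts} failed: $*" >&2
    if (( n < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example: wait up to about five minutes for a new VM's health endpoint.
# retry 20 15 curl -fsS "http://${VM_IP}:8080/health"
```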
### Step 7: Configure GitHub Actions

1. **Enable GitHub Actions Workflows**
   - Navigate to the repository's Actions tab
   - Enable workflows if not already enabled
   - GitHub only runs workflow files located in `.github/workflows/`, so the files in `.buildkite/macos-runners/github-actions/` must also be present there

2. **Test Image Rebuild Workflow**
   ```bash
   # Trigger a manual rebuild (requires a workflow_dispatch trigger in the workflow)
   gh workflow run image-rebuild.yml
   ```

3. **Test Fleet Deployment Workflow**
   ```bash
   # Trigger a manual deployment
   gh workflow run deploy-fleet.yml
   ```

## Post-Deployment Configuration

### 1. Monitoring Setup

1. **CloudWatch Dashboards**
   - Create custom dashboards for VM metrics
   - Set up alarms for critical thresholds

2. **Discord Notifications**
   - Configure the Discord webhook for alerts
   - Test notification delivery

### 2. Backup Configuration

1. **Enable Automated Snapshots**
   ```hcl
   # Update the Terraform configuration
   backup_config = {
     enable_snapshots   = true
     snapshot_schedule  = "0 4 * * *" # cron syntax: daily at 04:00
     snapshot_retention = 7
   }
   ```

2. **Test Backup Restoration**
   - Create a test snapshot
   - Verify the restoration process

### 3. Security Hardening

1. **Review Security Groups**
   - Minimize open ports
   - Restrict source IP ranges

2. **Enable Audit Logging**
   - Configure CloudTrail for AWS resources
   - Enable MacStadium audit logs

### 4. Performance Optimization

1. **Monitor Resource Usage**
   - Review CPU, memory, and disk usage
   - Adjust VM sizes if needed

2. **Optimize Auto-Scaling**
   - Monitor scaling events
   - Adjust thresholds as needed

## Maintenance Procedures

### Daily Maintenance

1. **Automated Tasks**
   - Image rebuilds (automatic)
   - Health checks (automatic)
   - Cleanup processes (automatic)

2. **Manual Monitoring**
   - Check Discord notifications
   - Review CloudWatch metrics
   - Monitor the Buildkite queue

### Weekly Maintenance

1. **Review Metrics**
   - Analyze performance trends
   - Check cost optimization opportunities

2. **Update Documentation**
   - Document configuration changes
   - Review troubleshooting guides

### Monthly Maintenance

1. **Capacity Planning**
   - Review usage patterns
   - Plan capacity adjustments

2. **Security Updates**
   - Review security patches
   - Update base images if needed

## Troubleshooting Common Issues

### Issue: VM Creation Fails

```bash
# Check MacStadium account limits
orka account info

# Check available resources
orka resource list

# Review Packer logs (this file exists only if the build ran with
# PACKER_LOG=1 and PACKER_LOG_PATH=packer-build.log)
tail -f packer-build.log
```

### Issue: Terraform Apply Fails

```bash
# Check Terraform state
terraform state list

# Refresh state (`terraform refresh` is deprecated; this form shows the changes before applying them)
terraform apply -refresh-only -var-file="production.tfvars"

# Check Terraform and provider versions
terraform version
```

### Issue: Buildkite Agents Not Connecting

```bash
# Check agent configuration
cat /usr/local/var/buildkite-agent/buildkite-agent.cfg

# Check agent logs
tail -f /usr/local/var/log/buildkite-agent/buildkite-agent.log

# Restart agent service
sudo launchctl unload /Library/LaunchDaemons/com.buildkite.buildkite-agent.plist
sudo launchctl load /Library/LaunchDaemons/com.buildkite.buildkite-agent.plist
```

## Rollback Procedures

### Rollback VM Fleet

1. **Identify Previous Good State**
   ```bash
   # Run from .buildkite/macos-runners/terraform
   terraform state list
   git log --oneline -- .
   ```

2. **Rollback to Previous Configuration**
   ```bash
   git checkout <previous-commit>
   terraform plan -var-file="production.tfvars"
   terraform apply -var-file="production.tfvars"
   ```

### Rollback VM Images

1. **List Available Images**
   ```bash
   orka image list
   ```

2. **Update Terraform to Use Previous Images**
   ```bash
   # Edit the Terraform configuration to use the previous image IDs
   terraform plan -var-file="production.tfvars"
   terraform apply -var-file="production.tfvars"
   ```

## Cost Optimization Tips

1. **Right-Size VMs**
   - Monitor actual resource usage
   - Adjust VM specifications accordingly

2. **Implement Scheduling**
   - Schedule VM shutdowns during low-usage periods
   - Use auto-scaling effectively

3. **Resource Cleanup**
   - Regularly clean up old images
   - Remove unused snapshots

4. **Monitor Costs**
   - Set up cost alerts
   - Review monthly usage reports

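Cleaning up old images is straightforward to script once image names carry a sortable date. A sketch, assuming hypothetical names of the form `<family>-<YYYYMMDD>` such as `bun-macos-15-20250102` (the real naming scheme is not documented here): it keeps the newest `KEEP` images per family and prints the rest, which could then be passed to the Orka CLI's image deletion command. `images_to_prune` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: list images to prune, keeping the newest KEEP per image family.
# Assumes hypothetical names like bun-macos-15-20250102 (<family>-<YYYYMMDD>).

# images_to_prune KEEP  (reads one image name per line on stdin)
images_to_prune() {
  local keep="$1"
  # Byte-wise reverse sort puts the newest date first within each family.
  LC_ALL=C sort -r | awk -v keep="$keep" '
    {
      family = $0
      sub(/-[^-]*$/, "", family)   # drop the trailing -<date> field
      count[family]++
      if (count[family] > keep) print
    }'
}

# Example: printf '%s\n' "${image_names[@]}" | images_to_prune 3
```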
## Support

For additional support:
- Check the main README.md for troubleshooting
- Review GitHub Actions logs
- Contact MacStadium support for platform issues
- Open issues in the repository for infrastructure problems

`.buildkite/macos-runners/README.md` (new file, 374 lines)

# macOS Runner Infrastructure

This directory contains the infrastructure-as-code for deploying and managing macOS CI runners for the Bun project. It is located in the `.buildkite` folder alongside other CI configuration. The infrastructure provides automated, scalable, and reliable macOS build environments using MacStadium's Orka platform.

## Architecture Overview

The infrastructure consists of several key components:

1. **VM Images**: Golden images built with Packer containing all necessary software
2. **VM Fleet**: Terraform-managed fleet of macOS VMs across different versions
3. **User Isolation**: Per-job user creation and cleanup for complete isolation
4. **Automation**: GitHub Actions workflows for daily image rebuilds and fleet management

## Key Features

- **Complete Isolation**: Each Buildkite job runs in its own user account
- **Automatic Cleanup**: Processes and temporary files are cleaned up after each job
- **Daily Image Rebuilds**: Automated nightly rebuilds ensure fresh, up-to-date environments
- **Multi-Version Support**: Supports macOS 13, 14, and 15 simultaneously
- **Auto-Scaling**: Automatic scaling based on job queue demand
- **Health Monitoring**: Continuous health checks and monitoring
- **Cost Optimization**: Efficient resource utilization and cleanup

## Directory Structure

```
.buildkite/macos-runners/
├── packer/                       # Packer configuration for VM images
│   ├── macos-base.pkr.hcl        # Main Packer configuration
│   └── ssh-keys/                 # SSH keys for VM access
├── terraform/                    # Terraform configuration for VM fleet
│   ├── main.tf                   # Main Terraform configuration
│   ├── variables.tf              # Variable definitions
│   ├── outputs.tf                # Output definitions
│   └── user-data.sh              # VM initialization script
├── scripts/                      # Management and utility scripts
│   ├── bootstrap-macos.sh        # macOS-specific bootstrap script
│   ├── create-build-user.sh      # User creation script
│   ├── cleanup-build-user.sh     # User cleanup script
│   └── job-runner.sh             # Main job runner script
├── github-actions/               # GitHub Actions workflows
│   ├── image-rebuild.yml         # Daily image rebuild workflow
│   └── deploy-fleet.yml          # Fleet deployment workflow
├── CLAUDE.md                     # Development guide for Claude
├── DEPLOYMENT.md                 # Deployment guide
└── README.md                     # This file
```

## Prerequisites

Before deploying the infrastructure, ensure you have:

1. **MacStadium Account**: Active MacStadium Orka account with API access
2. **AWS Account**: For Terraform state storage and CloudWatch monitoring
3. **GitHub Repository**: With required secrets configured
4. **Buildkite Account**: With organization and agent tokens
5. **Required Tools**: Packer, Terraform, AWS CLI, and MacStadium CLI

## Required Secrets

Configure the following secrets in your GitHub repository:

### MacStadium
- `MACSTADIUM_API_KEY`: MacStadium API key
- `ORKA_ENDPOINT`: MacStadium Orka API endpoint
- `ORKA_AUTH_TOKEN`: MacStadium authentication token

### AWS
- `AWS_ACCESS_KEY_ID`: AWS access key ID
- `AWS_SECRET_ACCESS_KEY`: AWS secret access key

### Buildkite
- `BUILDKITE_AGENT_TOKEN`: Buildkite agent token
- `BUILDKITE_API_TOKEN`: Buildkite API token (for monitoring)
- `BUILDKITE_ORG`: Buildkite organization slug

### GitHub
- `GITHUB_TOKEN`: GitHub personal access token (for private repositories). GitHub rejects repository secrets whose names start with `GITHUB_` and provides its own `GITHUB_TOKEN` to every workflow run, so a personal access token has to be stored under a different secret name.

### Notifications
- `DISCORD_WEBHOOK_URL`: Discord webhook URL for notifications

## Quick Start

### 1. Build VM Images

The fleet is created from these images, so build them before deploying.

```bash
# Navigate to the packer directory (from the repository root)
cd .buildkite/macos-runners/packer

# Orka credentials can be passed with -var flags (see DEPLOYMENT.md) or as
# PKR_VAR_orka_endpoint / PKR_VAR_orka_auth_token environment variables.

# Build macOS 15 image
packer build -var "macos_version=15" macos-base.pkr.hcl

# Build macOS 14 image
packer build -var "macos_version=14" macos-base.pkr.hcl

# Build macOS 13 image
packer build -var "macos_version=13" macos-base.pkr.hcl
```

### 2. Deploy the Infrastructure

```bash
# Navigate to the terraform directory (from the repository root)
cd .buildkite/macos-runners/terraform

# Initialize Terraform
terraform init

# Create or select workspace (the -or-create flag requires Terraform 1.4+)
terraform workspace select -or-create production

# Plan the deployment
terraform plan -var-file="production.tfvars"

# Apply the deployment
terraform apply -var-file="production.tfvars"
```

### 3. Enable Automation

The GitHub Actions workflows will automatically:
- Rebuild images daily at 2 AM UTC
- Deploy fleet changes when configuration is updated
- Clean up old images and snapshots
- Monitor VM health and connectivity

## Configuration

### Fleet Size Configuration

Modify fleet sizes in `terraform/variables.tf`:

```hcl
variable "fleet_size" {
  default = {
    macos_13 = 4 # Number of macOS 13 VMs
    macos_14 = 6 # Number of macOS 14 VMs
    macos_15 = 8 # Number of macOS 15 VMs
  }
}
```

### VM Configuration

Adjust VM specifications in `terraform/variables.tf`:

```hcl
variable "vm_configuration" {
  default = {
    cpu_count = 12  # Number of CPU cores
    memory_gb = 32  # Memory in GB
    disk_size = 500 # Disk size in GB
  }
}
```

### Auto-Scaling Configuration

Configure auto-scaling parameters:

```hcl
variable "autoscaling_config" {
  default = {
    min_size              = 2
    max_size              = 30
    desired_capacity      = 10
    scale_up_threshold    = 80
    scale_down_threshold  = 20
    scale_up_adjustment   = 2
    scale_down_adjustment = 1
    cooldown_period       = 300
  }
}
```

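The variable names suggest a threshold-based policy, but their units are not specified here. The sketch below assumes `scale_up_threshold` and `scale_down_threshold` are utilization percentages and shows how a single scaling decision could use the defaults above; it is illustrative, not the fleet's actual scaling logic, and it leaves out `cooldown_period`, which would rate-limit successive adjustments. `next_capacity` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch of one threshold-based scaling decision using the defaults above.
# Assumes thresholds are utilization percentages (not documented).

MIN_SIZE=2
MAX_SIZE=30
SCALE_UP_THRESHOLD=80
SCALE_DOWN_THRESHOLD=20
SCALE_UP_ADJUSTMENT=2
SCALE_DOWN_ADJUSTMENT=1

# next_capacity CURRENT UTILIZATION_PERCENT -> prints the new VM count
next_capacity() {
  local current="$1" util="$2" next="$1"
  if (( util >= SCALE_UP_THRESHOLD )); then
    next=$(( current + SCALE_UP_ADJUSTMENT ))
  elif (( util <= SCALE_DOWN_THRESHOLD )); then
    next=$(( current - SCALE_DOWN_ADJUSTMENT ))
  fi
  # Clamp to the configured bounds.
  if (( next > MAX_SIZE )); then next=$MAX_SIZE; fi
  if (( next < MIN_SIZE )); then next=$MIN_SIZE; fi
  echo "$next"
}
```

With these values, a fleet of 10 VMs at 85% utilization grows to 12, while a fleet already at 29 grows only to the `max_size` of 30.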
## Software Included

Each VM image includes:

### Development Tools
- Xcode Command Line Tools
- LLVM/Clang 19.1.7 (exact version)
- CMake 3.30.5 (exact version)
- Ninja build system
- pkg-config
- ccache

### Programming Languages
- Node.js 24.3.0 (exact version, matches bootstrap.sh)
- Bun 1.2.17 (exact version, matches bootstrap.sh)
- Python 3.11 and 3.12
- Go (latest)
- Rust (latest stable)

### Package Managers
- Homebrew
- npm
- yarn
- pip
- cargo

### Build Tools
- make
- autotools
- meson
- libtool

### Version Control
- Git
- GitHub CLI

### Utilities
- curl
- wget
- jq
- tree
- htop
- tmux
- screen

### Development Dependencies
- Docker Desktop
- Tailscale (for VPN connectivity)
- Age (for encryption)
- macFUSE (for filesystem testing)
- Chromium (for browser testing)
- Various system libraries and headers

### Quality Assurance
- **Flakiness Testing**: Each image undergoes multiple test iterations to ensure reliability
- **Software Validation**: All tools are tested for proper installation and functionality
- **Version Verification**: Exact version matching ensures consistency with bootstrap.sh

## User Isolation

Each Buildkite job runs in complete isolation:

1. **Unique User**: Each job gets a unique user account (`bk-<job-id>`)
2. **Isolated Environment**: Separate home directory and environment variables
3. **Process Isolation**: All processes are killed after job completion
4. **File System Cleanup**: Temporary files and caches are cleaned up
5. **Network Isolation**: No shared network resources between jobs

## Monitoring and Alerting

The infrastructure includes comprehensive monitoring:

- **Health Checks**: HTTP health endpoints on each VM
- **CloudWatch Metrics**: CPU, memory, and disk usage monitoring
- **Buildkite Integration**: Agent connectivity monitoring
- **Discord Notifications**: Success/failure notifications
- **Log Aggregation**: Centralized logging for troubleshooting

## Security Considerations

- **Encrypted Disks**: All VM disks are encrypted
- **Network Security**: Security groups restrict network access
- **SSH Key Management**: Secure SSH key distribution
- **Regular Updates**: Automatic security updates
- **Process Isolation**: Complete isolation between jobs
- **Secure Credential Handling**: Secrets are managed securely

## Troubleshooting

### Common Issues

1. **VM Not Responding to Health Checks**
   ```bash
   # Check VM status
   orka vm list

   # Check VM logs
   orka vm logs <vm-name>

   # Restart VM
   orka vm restart <vm-name>
   ```

2. **Buildkite Agent Not Connecting**
   ```bash
   # Check agent status
   sudo launchctl list | grep buildkite

   # Check agent logs
   tail -f /usr/local/var/log/buildkite-agent/buildkite-agent.log

   # Restart agent
   sudo launchctl unload /Library/LaunchDaemons/com.buildkite.buildkite-agent.plist
   sudo launchctl load /Library/LaunchDaemons/com.buildkite.buildkite-agent.plist
   ```

3. **User Creation Failures**
   ```bash
   # Check user creation logs
   tail -f /var/log/system.log | grep "create-build-user"

   # Manual cleanup
   sudo /usr/local/bin/bun-ci/cleanup-build-user.sh <username>
   ```

4. **Disk Space Issues**
   ```bash
   # Check disk usage
   df -h

   # Clean up old files
   sudo /usr/local/bin/bun-ci/cleanup-build-user.sh --cleanup-all
   ```

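For the disk-space case, a small check can decide whether cleanup is needed before it runs. This is a sketch built on POSIX `df -P` output. The `disk_used_pct` and `check_disk` names and the 90% default threshold are assumptions for illustration, not values taken from the runner scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the used-space percentage of the filesystem that
# holds a path, parsed from POSIX `df -P` output (one line per filesystem,
# column 5 is the capacity, e.g. "42%").
disk_used_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Succeed while usage is below the limit (default 90%), fail once it is at or
# above it, so the result can gate a cleanup command.
check_disk() {
  local path=${1:-/} limit=${2:-90} used
  used=$(disk_used_pct "$path")
  echo "$path: ${used}% used (limit ${limit}%)"
  (( used < limit ))
}
```

For example, `check_disk / 90 || sudo /usr/local/bin/bun-ci/cleanup-build-user.sh --cleanup-all` would run the cleanup only once the root volume reaches 90%.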
### Debugging Commands

```bash
# Check system status
sudo /usr/local/bin/bun-ci/job-runner.sh health

# View active processes
ps aux | grep buildkite

# Check network connectivity
curl -v http://localhost:8080/health

# View system logs
tail -f /var/log/system.log

# Check Docker status
docker info
```

## Maintenance

### Regular Tasks

1. **Image Updates**: The rebuild workflow runs daily at 02:00 UTC and rebuilds images when image-related files changed in the latest commit (or when a rebuild is forced)
2. **Fleet Updates**: After a successful rebuild, Terraform changes are applied automatically to the production workspace
3. **Cleanup**: Images and snapshots older than 7 days are deleted automatically after each rebuild
4. **Monitoring**: Health checks run continuously

### Manual Maintenance

```bash
# Force image rebuild
gh workflow run image-rebuild.yml -f force_rebuild=true

# Scale fleet manually
gh workflow run deploy-fleet.yml -f fleet_size_macos_15=10

# Sync Terraform state with the real infrastructure (refresh-only updates
# state to match reality; it does not create or destroy resources)
cd terraform
terraform apply -refresh-only
```

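Before you dispatch `deploy-fleet.yml` with custom sizes, you can check the inputs locally against the same rules its `validate-inputs` job enforces: non-negative integers, at least one VM, and at most 50 in total. The `validate_fleet` function below is a hypothetical sketch of those rules. It forces base-10 parsing so that an input such as `08` is accepted rather than treated as an invalid octal number.

```shell
#!/usr/bin/env bash
# Hypothetical local pre-check mirroring the validate-inputs job in
# deploy-fleet.yml. Prints the total VM count on success.
validate_fleet() {
  local n total
  if (( $# != 3 )); then
    echo "usage: validate_fleet <macos_13> <macos_14> <macos_15>" >&2
    return 1
  fi
  for n in "$@"; do
    if ! [[ "$n" =~ ^[0-9]+$ ]]; then
      echo "Error: Fleet sizes must be valid numbers" >&2
      return 1
    fi
  done
  # 10# forces base-10 so that values with leading zeros (e.g. 08) parse.
  total=$(( 10#$1 + 10#$2 + 10#$3 ))
  if (( total == 0 )); then
    echo "Error: At least one VM must be requested" >&2
    return 1
  fi
  if (( total > 50 )); then
    echo "Error: Total VMs cannot exceed 50" >&2
    return 1
  fi
  echo "$total"
}
```

For example, `validate_fleet 4 6 10 && gh workflow run deploy-fleet.yml -f fleet_size_macos_15=10` uses the default macOS 13 and 14 sizes and dispatches the workflow only if the combination would pass validation.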
## Cost Optimization

- **Right-Sizing**: VMs are sized appropriately for Bun workloads
- **Auto-Scaling**: Automatic scaling prevents over-provisioning
- **Resource Cleanup**: Aggressive cleanup prevents resource waste
- **Scheduled Shutdowns**: VMs can be scheduled for shutdown during low-usage periods

## Support and Contributing

For issues or questions:

1. Check the troubleshooting section above
2. Review the GitHub Actions workflow logs
3. Check the MacStadium Orka console
4. Open an issue in the repository

When contributing:

1. Test changes in a staging environment first
2. Update documentation as needed
3. Follow the existing code style
4. Add appropriate tests and validation

## License

This infrastructure code is part of the Bun project and follows the same license terms.
376
.buildkite/macos-runners/github-actions/deploy-fleet.yml
Normal file
@@ -0,0 +1,376 @@
name: Deploy macOS Runner Fleet

on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Deployment environment'
        required: true
        default: 'production'
        type: choice
        options:
          - production
          - staging
          - development
      fleet_size_macos_13:
        description: 'Number of macOS 13 VMs'
        required: false
        default: '4'
      fleet_size_macos_14:
        description: 'Number of macOS 14 VMs'
        required: false
        default: '6'
      fleet_size_macos_15:
        description: 'Number of macOS 15 VMs'
        required: false
        default: '8'
      force_deploy:
        description: 'Force deployment even if no changes'
        required: false
        default: false
        type: boolean

env:
  TERRAFORM_VERSION: "1.6.0"
  AWS_REGION: "us-west-2"

jobs:
  validate-inputs:
    runs-on: ubuntu-latest
    outputs:
      validated: ${{ steps.validate.outputs.validated }}
      total_vms: ${{ steps.validate.outputs.total_vms }}
    steps:
      - name: Validate inputs
        id: validate
        run: |
          # Validate fleet sizes
          macos_13="${{ github.event.inputs.fleet_size_macos_13 }}"
          macos_14="${{ github.event.inputs.fleet_size_macos_14 }}"
          macos_15="${{ github.event.inputs.fleet_size_macos_15 }}"

          # Check if inputs are valid numbers
          if ! [[ "$macos_13" =~ ^[0-9]+$ ]] || ! [[ "$macos_14" =~ ^[0-9]+$ ]] || ! [[ "$macos_15" =~ ^[0-9]+$ ]]; then
            echo "Error: Fleet sizes must be valid numbers"
            exit 1
          fi

          # Check if at least one VM is requested
          # (10# forces base-10 so inputs with leading zeros such as "08" are not parsed as octal)
          total_vms=$((10#$macos_13 + 10#$macos_14 + 10#$macos_15))
          if [[ $total_vms -eq 0 ]]; then
            echo "Error: At least one VM must be requested"
            exit 1
          fi

          # Check reasonable limits
          if [[ $total_vms -gt 50 ]]; then
            echo "Error: Total VMs cannot exceed 50"
            exit 1
          fi

          echo "validated=true" >> "$GITHUB_OUTPUT"
          echo "total_vms=$total_vms" >> "$GITHUB_OUTPUT"

          echo "Validation passed:"
          echo "- macOS 13: $macos_13 VMs"
          echo "- macOS 14: $macos_14 VMs"
          echo "- macOS 15: $macos_15 VMs"
          echo "- Total: $total_vms VMs"

  plan-deployment:
    runs-on: ubuntu-latest
    needs: validate-inputs
    if: needs.validate-inputs.outputs.validated == 'true'
    outputs:
      plan_status: ${{ steps.plan.outputs.plan_status }}
      has_changes: ${{ steps.plan.outputs.has_changes }}

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TERRAFORM_VERSION }}

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Initialize Terraform
        working-directory: .buildkite/macos-runners/terraform
        run: |
          terraform init
          terraform workspace select ${{ github.event.inputs.environment }} || terraform workspace new ${{ github.event.inputs.environment }}

      - name: Create terraform variables file
        working-directory: .buildkite/macos-runners/terraform
        run: |
          cat > terraform.tfvars << EOF
          environment = "${{ github.event.inputs.environment }}"
          fleet_size = {
            macos_13 = ${{ github.event.inputs.fleet_size_macos_13 }}
            macos_14 = ${{ github.event.inputs.fleet_size_macos_14 }}
            macos_15 = ${{ github.event.inputs.fleet_size_macos_15 }}
          }
          EOF

      - name: Plan Terraform deployment
        id: plan
        working-directory: .buildkite/macos-runners/terraform
        run: |
          set -e

          # Run terraform plan. With -detailed-exitcode, terraform exits 0 for
          # "no changes", 2 for "changes present", and 1 on error. Because the
          # step runs under `bash -e`, capture the exit code explicitly so that
          # exit code 2 does not abort the step before it is inspected.
          plan_exit_code=0
          terraform plan \
            -var "macstadium_api_key=${{ secrets.MACSTADIUM_API_KEY }}" \
            -var "buildkite_agent_token=${{ secrets.BUILDKITE_AGENT_TOKEN }}" \
            -var "github_token=${{ secrets.GITHUB_TOKEN }}" \
            -out=tfplan \
            -detailed-exitcode > plan_output.txt 2>&1 || plan_exit_code=$?

          # Check plan results
          if [[ $plan_exit_code -eq 0 ]]; then
            echo "plan_status=no_changes" >> "$GITHUB_OUTPUT"
            echo "has_changes=false" >> "$GITHUB_OUTPUT"
          elif [[ $plan_exit_code -eq 2 ]]; then
            echo "plan_status=has_changes" >> "$GITHUB_OUTPUT"
            echo "has_changes=true" >> "$GITHUB_OUTPUT"
          else
            echo "plan_status=failed" >> "$GITHUB_OUTPUT"
            echo "has_changes=false" >> "$GITHUB_OUTPUT"
            cat plan_output.txt
            exit 1
          fi

          # Show plan output
          echo "Plan output:"
          cat plan_output.txt

      - name: Upload plan
        uses: actions/upload-artifact@v4
        with:
          name: terraform-plan
          path: |
            .buildkite/macos-runners/terraform/tfplan
            .buildkite/macos-runners/terraform/plan_output.txt
          retention-days: 30

  deploy:
    runs-on: ubuntu-latest
    needs: [validate-inputs, plan-deployment]
    if: needs.plan-deployment.outputs.has_changes == 'true' || github.event.inputs.force_deploy == 'true'
    environment: ${{ github.event.inputs.environment }}

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TERRAFORM_VERSION }}
          # Disable the wrapper so `terraform output -json` can be piped into jq
          terraform_wrapper: false

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Download plan
        uses: actions/download-artifact@v4
        with:
          name: terraform-plan
          path: .buildkite/macos-runners/terraform/

      - name: Initialize Terraform
        working-directory: .buildkite/macos-runners/terraform
        run: |
          terraform init
          terraform workspace select ${{ github.event.inputs.environment }}

      - name: Apply Terraform deployment
        working-directory: .buildkite/macos-runners/terraform
        run: |
          echo "Applying Terraform deployment..."
          terraform apply -auto-approve tfplan

      - name: Get deployment outputs
        working-directory: .buildkite/macos-runners/terraform
        run: |
          terraform output -json > terraform-outputs.json
          echo "Deployment outputs:"
          cat terraform-outputs.json | jq .

      - name: Upload deployment outputs
        uses: actions/upload-artifact@v4
        with:
          name: deployment-outputs-${{ github.event.inputs.environment }}
          path: .buildkite/macos-runners/terraform/terraform-outputs.json
          retention-days: 90

      - name: Verify deployment
        working-directory: .buildkite/macos-runners/terraform
        run: |
          echo "Verifying deployment..."

          # Check VM count
          vm_count=$(terraform output -json vm_instances | jq 'length')
          expected_count=${{ needs.validate-inputs.outputs.total_vms }}

          if [[ $vm_count -eq $expected_count ]]; then
            echo "✅ VM count matches expected: $vm_count"
          else
            echo "❌ VM count mismatch: expected $expected_count, got $vm_count"
            exit 1
          fi

          # Check VM states
          terraform output -json vm_instances | jq -r 'to_entries[] | "\(.key): \(.value.name) - \(.value.status)"' | while read -r vm_info; do
            echo "VM: $vm_info"
          done

  health-check:
    runs-on: ubuntu-latest
    needs: [validate-inputs, plan-deployment, deploy]
    if: always() && needs.deploy.result == 'success'

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y jq curl

      - name: Download deployment outputs
        uses: actions/download-artifact@v4
        with:
          name: deployment-outputs-${{ github.event.inputs.environment }}
          path: ./

      - name: Wait for VMs to be ready
        run: |
          echo "Waiting for VMs to be ready..."
          sleep 300 # Wait 5 minutes for VMs to initialize

      - name: Check VM health
        run: |
          echo "Checking VM health..."

          # Read VM details from outputs
          jq -r '.vm_instances.value | to_entries[] | "\(.value.name) \(.value.ip_address)"' terraform-outputs.json | while read -r vm_name vm_ip; do
            echo "Checking VM: $vm_name ($vm_ip)"

            # Check health endpoint
            max_attempts=12
            attempt=1

            while [[ $attempt -le $max_attempts ]]; do
              if curl -f -s --max-time 30 "http://$vm_ip:8080/health" > /dev/null; then
                echo "✅ $vm_name is healthy"
                break
              else
                echo "⏳ $vm_name not ready yet (attempt $attempt/$max_attempts)"
                sleep 30
                ((attempt++))
              fi
            done

            if [[ $attempt -gt $max_attempts ]]; then
              echo "❌ $vm_name failed health check"
            fi
          done

      - name: Check Buildkite connectivity
        run: |
          echo "Checking Buildkite agent connectivity..."

          # Wait a bit more for agents to connect
          sleep 60

          # Check connected agents
          curl -s -H "Authorization: Bearer ${{ secrets.BUILDKITE_API_TOKEN }}" \
            "https://api.buildkite.com/v2/organizations/${{ secrets.BUILDKITE_ORG }}/agents" | \
            jq -r '.[] | select(.name | test("^bun-runner-")) | "\(.name) \(.connection_state) \(.hostname)"' | \
            while read -r agent_name state hostname; do
              echo "Agent: $agent_name - State: $state - Host: $hostname"
            done

  notify-success:
    runs-on: ubuntu-latest
    needs: [validate-inputs, plan-deployment, deploy, health-check]
    if: always() && needs.deploy.result == 'success'

    steps:
      - name: Notify success
        uses: sarisia/actions-status-discord@v1
        with:
          webhook: ${{ secrets.DISCORD_WEBHOOK_URL }}
          status: success
          title: "macOS runner fleet deployed successfully"
          description: |
            🚀 **macOS runner fleet deployed successfully**

            **Environment:** ${{ github.event.inputs.environment }}
            **Total VMs:** ${{ needs.validate-inputs.outputs.total_vms }}

            **Fleet composition:**
            - macOS 13: ${{ github.event.inputs.fleet_size_macos_13 }} VMs
            - macOS 14: ${{ github.event.inputs.fleet_size_macos_14 }} VMs
            - macOS 15: ${{ github.event.inputs.fleet_size_macos_15 }} VMs

            **Repository:** ${{ github.repository }}
            [View Deployment](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})
          color: 0x00ff00
          username: "GitHub Actions"

  notify-failure:
    runs-on: ubuntu-latest
    needs: [validate-inputs, plan-deployment, deploy, health-check]
    if: always() && (needs.validate-inputs.result == 'failure' || needs.plan-deployment.result == 'failure' || needs.deploy.result == 'failure')

    steps:
      - name: Notify failure
        uses: sarisia/actions-status-discord@v1
        with:
          webhook: ${{ secrets.DISCORD_WEBHOOK_URL }}
          status: failure
          title: "macOS runner fleet deployment failed"
          description: |
            🔴 **macOS runner fleet deployment failed**

            **Environment:** ${{ github.event.inputs.environment }}
            **Failed stage:** ${{ needs.validate-inputs.result == 'failure' && 'Validation' || needs.plan-deployment.result == 'failure' && 'Planning' || 'Deployment' }}

            **Repository:** ${{ github.repository }}
            [View Deployment](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})

            Please check the logs for more details.
          color: 0xff0000
          username: "GitHub Actions"

  notify-no-changes:
    runs-on: ubuntu-latest
    needs: [validate-inputs, plan-deployment]
    if: needs.plan-deployment.outputs.has_changes == 'false' && github.event.inputs.force_deploy != 'true'

    steps:
      - name: Notify no changes
        uses: sarisia/actions-status-discord@v1
        with:
          webhook: ${{ secrets.DISCORD_WEBHOOK_URL }}
          status: cancelled
          title: "macOS runner fleet deployment skipped"
          description: |
            ℹ️ **macOS runner fleet deployment skipped** - no changes detected in Terraform plan
          color: 0x808080
          username: "GitHub Actions"
515
.buildkite/macos-runners/github-actions/image-rebuild.yml
Normal file
@@ -0,0 +1,515 @@
name: Rebuild macOS Runner Images

on:
  schedule:
    # Run daily at 2 AM UTC
    - cron: '0 2 * * *'
  workflow_dispatch:
    inputs:
      macos_versions:
        description: 'macOS versions to rebuild (comma-separated: 13,14,15)'
        required: false
        default: '13,14,15'
      force_rebuild:
        description: 'Force rebuild even if no changes detected'
        required: false
        default: false
        type: boolean

env:
  PACKER_VERSION: "1.9.4"
  TERRAFORM_VERSION: "1.6.0"

jobs:
  check-changes:
    runs-on: ubuntu-latest
    outputs:
      should_rebuild: ${{ steps.check.outputs.should_rebuild }}
      changed_files: ${{ steps.check.outputs.changed_files }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 2

      - name: Check for changes
        id: check
        run: |
          # Check whether the most recent commit (HEAD~1..HEAD) touched any image-related files
          changed_files=$(git diff --name-only HEAD~1 HEAD | grep -E "(bootstrap|packer|\.buildkite/macos-runners)" | head -20)

          if [[ -n "$changed_files" ]] || [[ "${{ github.event.inputs.force_rebuild }}" == "true" ]]; then
            echo "should_rebuild=true" >> "$GITHUB_OUTPUT"
            echo "changed_files<<EOF" >> "$GITHUB_OUTPUT"
            echo "$changed_files" >> "$GITHUB_OUTPUT"
            echo "EOF" >> "$GITHUB_OUTPUT"
          else
            echo "should_rebuild=false" >> "$GITHUB_OUTPUT"
            echo "changed_files=" >> "$GITHUB_OUTPUT"
          fi

  build-images:
    runs-on: ubuntu-latest
    needs: check-changes
    if: needs.check-changes.outputs.should_rebuild == 'true'
    strategy:
      matrix:
        # Manual runs build the versions listed in the macos_versions input; scheduled runs build 13, 14 and 15
        macos_version: ${{ fromJSON(format('[{0}]', github.event.inputs.macos_versions || '13,14,15')) }}
      fail-fast: false

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Packer
        uses: hashicorp/setup-packer@main
        with:
          version: ${{ env.PACKER_VERSION }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TERRAFORM_VERSION }}

      - name: Install dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y jq curl

      # The validation and flakiness steps below call the orka CLI, so install
      # and configure it here the same way the cleanup-old-images job does.
      - name: Install MacStadium CLI
        run: |
          curl -L "https://github.com/macstadium/orka-cli/releases/latest/download/orka-cli-linux-amd64.tar.gz" | tar -xz
          sudo mv orka-cli /usr/local/bin/orka
          chmod +x /usr/local/bin/orka

      - name: Configure MacStadium CLI
        run: |
          orka config set endpoint ${{ secrets.ORKA_ENDPOINT }}
          orka auth token ${{ secrets.ORKA_AUTH_TOKEN }}

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-2

      - name: Validate Packer configuration
        working-directory: .buildkite/macos-runners/packer
        run: |
          packer validate \
            -var "macos_version=${{ matrix.macos_version }}" \
            -var "orka_endpoint=${{ secrets.ORKA_ENDPOINT }}" \
            -var "orka_auth_token=${{ secrets.ORKA_AUTH_TOKEN }}" \
            macos-base.pkr.hcl

      - name: Build macOS ${{ matrix.macos_version }} image
        working-directory: .buildkite/macos-runners/packer
        run: |
          echo "Building macOS ${{ matrix.macos_version }} image..."

          # Map the macOS major version to the codename used in the base image name.
          # (A chained `[ ] && echo || [ ] && echo || echo` prints two names for 13,
          # so use a case statement instead.)
          case "${{ matrix.macos_version }}" in
            13) codename="ventura" ;;
            14) codename="sonoma" ;;
            15) codename="sequoia" ;;
            *) echo "Unsupported macOS version: ${{ matrix.macos_version }}"; exit 1 ;;
          esac

          # Set build variables
          export PACKER_LOG=1
          export PACKER_LOG_PATH="./packer-build-macos-${{ matrix.macos_version }}.log"

          # Build the image
          packer build \
            -var "macos_version=${{ matrix.macos_version }}" \
            -var "orka_endpoint=${{ secrets.ORKA_ENDPOINT }}" \
            -var "orka_auth_token=${{ secrets.ORKA_AUTH_TOKEN }}" \
            -var "base_image=base-images/macos-${{ matrix.macos_version }}-${codename}" \
            macos-base.pkr.hcl

      - name: Validate built image
        working-directory: .buildkite/macos-runners/packer
        run: |
          echo "Validating built image..."

          # Pick the first image whose name matches this macOS version
          IMAGE_ID=$(orka image list --output json | jq -r '.[] | select(.name | test("^bun-macos-${{ matrix.macos_version }}-")) | .id' | head -1)

          if [ -z "$IMAGE_ID" ]; then
            echo "❌ No image found for macOS ${{ matrix.macos_version }}"
            exit 1
          fi

          echo "✅ Found image: $IMAGE_ID"

          # Create a test VM to validate the image
          VM_NAME="test-validation-${{ matrix.macos_version }}-$(date +%s)"

          # Delete the test VM when the step exits, even if a check below fails
          trap 'orka vm delete "$VM_NAME" --force || true' EXIT

          echo "Creating test VM: $VM_NAME"
          orka vm create \
            --name "$VM_NAME" \
            --image "$IMAGE_ID" \
            --cpu 4 \
            --memory 8 \
            --wait

          # Wait for VM to be ready
          sleep 60

          # Get VM IP
          VM_IP=$(orka vm show "$VM_NAME" --output json | jq -r '.ip_address')

          echo "Testing VM at IP: $VM_IP"

          # Test software installations
          echo "Testing software installations..."

          # Test Node.js
          ssh -o StrictHostKeyChecking=no admin@$VM_IP 'node --version' || exit 1

          # Test Bun
          ssh -o StrictHostKeyChecking=no admin@$VM_IP 'bun --version' || exit 1

          # Test build tools
          ssh -o StrictHostKeyChecking=no admin@$VM_IP 'cmake --version' || exit 1
          ssh -o StrictHostKeyChecking=no admin@$VM_IP 'clang --version' || exit 1

          # Test Docker
          ssh -o StrictHostKeyChecking=no admin@$VM_IP 'docker --version' || exit 1

          # Test Tailscale
          ssh -o StrictHostKeyChecking=no admin@$VM_IP 'tailscale --version' || exit 1

          # Test health endpoint
          ssh -o StrictHostKeyChecking=no admin@$VM_IP 'curl -f http://localhost:8080/health' || exit 1

          echo "✅ All software validations passed"

          echo "✅ Image validation completed successfully"

      - name: Run flakiness checks
        working-directory: .buildkite/macos-runners/packer
        run: |
          echo "Running flakiness checks..."

          # Pick the first image whose name matches this macOS version
          IMAGE_ID=$(orka image list --output json | jq -r '.[] | select(.name | test("^bun-macos-${{ matrix.macos_version }}-")) | .id' | head -1)

          # Run multiple test iterations to check for flakiness
          ITERATIONS=3
          PASSED=0
          FAILED=0

          # If the step aborts mid-iteration (e.g. a failed orka call), delete the current test VM
          VM_NAME=""
          trap '[ -n "$VM_NAME" ] && orka vm delete "$VM_NAME" --force || true' EXIT

          for i in $(seq 1 $ITERATIONS); do
            echo "Running flakiness test iteration $i/$ITERATIONS..."

            VM_NAME="flakiness-test-${{ matrix.macos_version }}-$i-$(date +%s)"

            # Create test VM
            orka vm create \
              --name "$VM_NAME" \
              --image "$IMAGE_ID" \
              --cpu 4 \
              --memory 8 \
              --wait

            sleep 30

            # Get VM IP
            VM_IP=$(orka vm show "$VM_NAME" --output json | jq -r '.ip_address')

            # Run a series of quick tests
            TEST_PASSED=true

            # Test 1: Basic command execution
            if ! ssh -o StrictHostKeyChecking=no -o ConnectTimeout=30 admin@$VM_IP 'echo "test" > /tmp/test.txt && cat /tmp/test.txt'; then
              echo "❌ Basic command test failed"
              TEST_PASSED=false
            fi

            # Test 2: Node.js execution
            if ! ssh -o StrictHostKeyChecking=no -o ConnectTimeout=30 admin@$VM_IP 'node -e "console.log(\"Node.js test\")"'; then
              echo "❌ Node.js test failed"
              TEST_PASSED=false
            fi

            # Test 3: Bun execution
            if ! ssh -o StrictHostKeyChecking=no -o ConnectTimeout=30 admin@$VM_IP 'bun -e "console.log(\"Bun test\")"'; then
              echo "❌ Bun test failed"
              TEST_PASSED=false
            fi

            # Test 4: Build tools
            if ! ssh -o StrictHostKeyChecking=no -o ConnectTimeout=30 admin@$VM_IP 'clang --version > /tmp/clang_version.txt'; then
              echo "❌ Clang test failed"
              TEST_PASSED=false
            fi

            # Test 5: File system operations
            if ! ssh -o StrictHostKeyChecking=no -o ConnectTimeout=30 admin@$VM_IP 'mkdir -p /tmp/test_dir && touch /tmp/test_dir/test_file'; then
              echo "❌ File system test failed"
              TEST_PASSED=false
            fi

            # Test 6: Process creation
            if ! ssh -o StrictHostKeyChecking=no -o ConnectTimeout=30 admin@$VM_IP 'ps aux | grep -v grep | wc -l'; then
              echo "❌ Process test failed"
              TEST_PASSED=false
            fi

            # Clean up test VM
            orka vm delete "$VM_NAME" --force
            VM_NAME=""

            if [ "$TEST_PASSED" = true ]; then
              echo "✅ Iteration $i passed"
              PASSED=$((PASSED + 1))
            else
              echo "❌ Iteration $i failed"
              FAILED=$((FAILED + 1))
            fi

            # Short delay between iterations
            sleep 10
          done

          echo "Flakiness check results:"
          echo "- Passed: $PASSED/$ITERATIONS"
          echo "- Failed: $FAILED/$ITERATIONS"

          # Calculate success rate (integer division rounds down)
          SUCCESS_RATE=$((PASSED * 100 / ITERATIONS))
          echo "- Success rate: $SUCCESS_RATE%"

          # Fail if success rate is below 80%. With ITERATIONS=3, 2/3 rounds down
          # to 66%, so in practice every iteration has to pass.
          if [ $SUCCESS_RATE -lt 80 ]; then
            echo "❌ Image is too flaky! Success rate: $SUCCESS_RATE% (minimum: 80%)"
            exit 1
          fi

          echo "✅ Flakiness checks passed with $SUCCESS_RATE% success rate"

      - name: Upload build logs
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: packer-logs-macos-${{ matrix.macos_version }}
          path: .buildkite/macos-runners/packer/packer-build-macos-${{ matrix.macos_version }}.log
          retention-days: 7

      - name: Notify on failure
        if: failure()
        uses: sarisia/actions-status-discord@v1
        with:
          webhook: ${{ secrets.DISCORD_WEBHOOK_URL }}
          status: failure
          title: "macOS ${{ matrix.macos_version }} image build failed"
          description: |
            🔴 **macOS ${{ matrix.macos_version }} image build failed**

            **Repository:** ${{ github.repository }}
            **Branch:** ${{ github.ref }}
            **Commit:** ${{ github.sha }}

            [Check the logs](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})
          color: 0xff0000
          username: "GitHub Actions"

  update-terraform:
    runs-on: ubuntu-latest
    needs: [check-changes, build-images]
    if: needs.check-changes.outputs.should_rebuild == 'true' && needs.build-images.result == 'success'

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TERRAFORM_VERSION }}
          # Disable the wrapper so `terraform output -json` writes clean JSON
          terraform_wrapper: false

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-2

      - name: Initialize Terraform
        working-directory: .buildkite/macos-runners/terraform
        run: |
          terraform init
          terraform workspace select production || terraform workspace new production

      - name: Plan Terraform changes
        working-directory: .buildkite/macos-runners/terraform
        run: |
          terraform plan \
            -var "macstadium_api_key=${{ secrets.MACSTADIUM_API_KEY }}" \
            -var "buildkite_agent_token=${{ secrets.BUILDKITE_AGENT_TOKEN }}" \
            -var "github_token=${{ secrets.GITHUB_TOKEN }}" \
            -out=tfplan

      - name: Apply Terraform changes
        working-directory: .buildkite/macos-runners/terraform
        run: |
          terraform apply -auto-approve tfplan

      - name: Save Terraform outputs
        working-directory: .buildkite/macos-runners/terraform
        run: |
          terraform output -json > terraform-outputs.json

      - name: Upload Terraform outputs
        uses: actions/upload-artifact@v4
        with:
          name: terraform-outputs
          path: .buildkite/macos-runners/terraform/terraform-outputs.json
          retention-days: 30

  cleanup-old-images:
    runs-on: ubuntu-latest
    needs: [check-changes, build-images, update-terraform]
    if: always() && needs.check-changes.outputs.should_rebuild == 'true'

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup AWS CLI
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-2

      - name: Install MacStadium CLI
        run: |
          curl -L "https://github.com/macstadium/orka-cli/releases/latest/download/orka-cli-linux-amd64.tar.gz" | tar -xz
          sudo mv orka-cli /usr/local/bin/orka
          chmod +x /usr/local/bin/orka

      - name: Configure MacStadium CLI
        run: |
          orka config set endpoint ${{ secrets.ORKA_ENDPOINT }}
          orka auth token ${{ secrets.ORKA_AUTH_TOKEN }}

      - name: Clean up old images
        run: |
          echo "Cleaning up old images..."

          # Get list of all images
          orka image list --output json > images.json

          # Find images older than 7 days
          cutoff_date=$(date -d '7 days ago' +%s)

          # Parse and delete old images
          jq -r --argjson cutoff "$cutoff_date" '.[] | select(.name | test("^bun-macos-")) | select(.created_at | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime < $cutoff) | .name' images.json | while read -r image_name; do
            echo "Deleting old image: $image_name"
            orka image delete "$image_name" || echo "Failed to delete $image_name"
          done

      - name: Clean up old snapshots
        run: |
          echo "Cleaning up old snapshots..."

          # Get list of all snapshots
          orka snapshot list --output json > snapshots.json

          # Find snapshots older than 7 days
          cutoff_date=$(date -d '7 days ago' +%s)

          # Parse and delete old snapshots
          jq -r --argjson cutoff "$cutoff_date" '.[] | select(.name | test("^bun-macos-")) | select(.created_at | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime < $cutoff) | .name' snapshots.json | while read -r snapshot_name; do
            echo "Deleting old snapshot: $snapshot_name"
            orka snapshot delete "$snapshot_name" || echo "Failed to delete $snapshot_name"
          done

  health-check:
    runs-on: ubuntu-latest
    needs: [check-changes, build-images, update-terraform]
    if: always() && needs.check-changes.outputs.should_rebuild == 'true'

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup AWS CLI
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-2

      - name: Install MacStadium CLI
        run: |
          curl -L "https://github.com/macstadium/orka-cli/releases/latest/download/orka-cli-linux-amd64.tar.gz" | tar -xz
          sudo mv orka-cli /usr/local/bin/orka
          chmod +x /usr/local/bin/orka

      - name: Configure MacStadium CLI
        run: |
          orka config set endpoint ${{ secrets.ORKA_ENDPOINT }}
          orka auth token ${{ secrets.ORKA_AUTH_TOKEN }}

      - name: Health check VMs
        run: |
          echo "Performing health check on VMs..."

          # Get list of running VMs
          orka vm list --output json > vms.json

          # Check each VM
          jq -r '.[] | select(.name | test("^bun-runner-")) | select(.status == "running") | "\(.name) \(.ip_address)"' vms.json | while read -r vm_name vm_ip; do
            echo "Checking VM: $vm_name ($vm_ip)"

            # Check if VM is responding to health checks
            if curl -f -s --max-time 30 "http://$vm_ip:8080/health" > /dev/null; then
              echo "✅ $vm_name is healthy"
            else
              echo "❌ $vm_name is not responding to health checks"
            fi
          done

      - name: Check Buildkite agent connectivity
        run: |
          echo "Checking Buildkite agent connectivity..."

          # Use Buildkite API to check connected agents
          curl -s -H "Authorization: Bearer ${{ secrets.BUILDKITE_API_TOKEN }}" \
            "https://api.buildkite.com/v2/organizations/${{ secrets.BUILDKITE_ORG }}/agents" | \
            jq -r '.[] | select(.name | test("^bun-runner-")) | "\(.name) \(.connection_state)"' | \
            while read -r agent_name state; do
              echo "Agent: $agent_name - State: $state"
            done

  notify-success:
    runs-on: ubuntu-latest
    needs: [check-changes, build-images, update-terraform, cleanup-old-images, health-check]
    if: always() && needs.check-changes.outputs.should_rebuild == 'true' && needs.build-images.result == 'success'

    steps:
      - name: Notify success
        uses: sarisia/actions-status-discord@v1
        with:
          webhook: ${{ secrets.DISCORD_WEBHOOK_URL }}
          status: success
          title: "macOS runner images rebuilt successfully"
          description: |
            ✅ **macOS runner images rebuilt successfully**

            **Repository:** ${{ github.repository }}
            **Branch:** ${{ github.ref }}
            **Commit:** ${{ github.sha }}

            **Changes detected in:**
            ${{ needs.check-changes.outputs.changed_files }}

            **Images built:** ${{ github.event.inputs.macos_versions || '13,14,15' }}

            [Check the deployment](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})
          color: 0x00ff00
          username: "GitHub Actions"

  notify-skip:
    runs-on: ubuntu-latest
    needs: check-changes
    if: needs.check-changes.outputs.should_rebuild == 'false'

    steps:
      - name: Notify skip
        uses: sarisia/actions-status-discord@v1
        with:
          webhook: ${{ secrets.DISCORD_WEBHOOK_URL }}
          status: cancelled
          title: "macOS runner image rebuild skipped"
          description: |
            ℹ️ **macOS runner image rebuild skipped** - no changes detected in the last 24 hours
          color: 0x808080
          username: "GitHub Actions"
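The snapshot cleanup step uses a retention rule: delete `bun-macos-*` snapshots that are older than 7 days. That rule can be tested without Orka or GNU `date`. The sketch below is a hypothetical helper and is not part of this workflow. It assumes the JSON from `orka snapshot list` has already been reduced to `name epoch_seconds` pairs. It takes the current time as an argument so that the cutoff arithmetic is easy to check.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the 7-day snapshot retention rule (not in the repo).
# Input: one "name epoch_seconds" pair per line on stdin (assumed format).
# Output: names of bun-macos-* snapshots created before the cutoff.
set -euo pipefail

RETENTION_DAYS=7

expired_snapshots() {
    local now="$1"   # current time in epoch seconds, passed in for testability
    local cutoff=$(( now - RETENTION_DAYS * 86400 ))
    local name created
    while read -r name created; do
        # Only consider images produced by the Packer template
        [[ "$name" == bun-macos-* ]] || continue
        if (( created < cutoff )); then
            printf '%s\n' "$name"
        fi
    done
}
```

For example, `printf 'bun-macos-15-2025-01-01 1000\n' | expired_snapshots "$(date +%s)"` prints that name, because its timestamp falls long before the cutoff. The workflow itself performs the same comparison inside `jq` with `strptime`/`mktime`.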
270
.buildkite/macos-runners/packer/macos-base.pkr.hcl
Normal file
@@ -0,0 +1,270 @@
packer {
  required_plugins {
    macstadium-orka = {
      version = ">= 3.0.0"
      source  = "github.com/macstadium/macstadium-orka"
    }
  }
}

variable "orka_endpoint" {
  description = "MacStadium Orka endpoint"
  type        = string
  default     = env("ORKA_ENDPOINT")
}

variable "orka_auth_token" {
  description = "MacStadium Orka auth token"
  type        = string
  default     = env("ORKA_AUTH_TOKEN")
  sensitive   = true
}

variable "base_image" {
  description = "Base macOS image to use"
  type        = string
  default     = "base-images/macos-15-sequoia"
}

variable "macos_version" {
  description = "macOS version (13, 14, 15)"
  type        = string
  default     = "15"
}

variable "cpu_count" {
  description = "Number of CPU cores"
  type        = number
  default     = 12
}

variable "memory_gb" {
  description = "Memory in GB"
  type        = number
  default     = 32
}

source "macstadium-orka" "base" {
  orka_endpoint   = var.orka_endpoint
  orka_auth_token = var.orka_auth_token

  source_image = var.base_image
  image_name   = "bun-macos-${var.macos_version}-${formatdate("YYYY-MM-DD", timestamp())}"

  ssh_username = "admin"
  ssh_password = "admin"
  ssh_timeout  = "20m"

  vm_name   = "packer-build-${formatdate("YYYY-MM-DD-hhmm", timestamp())}"
  cpu_count = var.cpu_count
  memory_gb = var.memory_gb

  # Enable GPU acceleration for better performance
  gpu_passthrough = true

  # Network configuration
  vnc_bind_address = "0.0.0.0"
  vnc_port_min     = 5900
  vnc_port_max     = 5999

  # Cleanup settings
  cleanup_pause_time = "30s"
  create_snapshot    = true

  # Boot wait time
  boot_wait = "2m"
}

build {
  sources = [
    "source.macstadium-orka.base"
  ]

  # Wait for network connectivity (SSH is already up once provisioners run).
  # Inline scripts run under /bin/sh by default, so use POSIX redirection.
  provisioner "shell" {
    inline = [
      "echo 'Waiting for system to be ready...'",
      "until ping -c1 google.com >/dev/null 2>&1; do sleep 1; done",
      "echo 'Network is ready'"
    ]
    timeout = "10m"
  }

  # Install Xcode Command Line Tools
  provisioner "shell" {
    inline = [
      "echo 'Installing Xcode Command Line Tools...'",
      "xcode-select --install || true",
      "until xcode-select -p >/dev/null 2>&1; do sleep 10; done",
      "echo 'Xcode Command Line Tools installed'"
    ]
    timeout = "30m"
  }

  # Copy and run bootstrap script
  provisioner "file" {
    source      = "${path.root}/../scripts/bootstrap-macos.sh"
    destination = "/tmp/bootstrap-macos.sh"
  }

  # bootstrap-macos.sh refuses to run as root and calls sudo itself where needed
  provisioner "shell" {
    inline = [
      "chmod +x /tmp/bootstrap-macos.sh",
      "/tmp/bootstrap-macos.sh --ci"
    ]
    timeout = "60m"
  }

  # Install additional macOS-specific tools. Docker Desktop, Node.js, Bun and
  # CMake are omitted on purpose: bootstrap-macos.sh already installs them
  # (Node.js, Bun and CMake at pinned versions). Reinstalling them with Homebrew
  # would either fail on the existing /Applications/Docker.app or shadow the
  # pinned versions on PATH.
  provisioner "shell" {
    inline = [
      "echo 'Installing additional macOS tools...'",
      "brew install gh",
      "brew install jq",
      "brew install coreutils",
      "brew install gnu-sed",
      "brew install gnu-tar",
      "brew install findutils",
      "brew install grep",
      "brew install make",
      "brew install ninja",
      "brew install pkg-config",
      "brew install python@3.11",
      "brew install python@3.12",
      "brew install go",
      "brew install rust",
      "brew install wget",
      "brew install tree",
      "brew install htop",
      "brew install watch",
      "brew install tmux",
      "brew install screen"
    ]
    timeout = "30m"
  }

  # Install Buildkite agent
  provisioner "shell" {
    inline = [
      "echo 'Installing Buildkite agent...'",
      "brew install buildkite/buildkite/buildkite-agent",
      "sudo mkdir -p /usr/local/var/buildkite-agent",
      "sudo mkdir -p /usr/local/var/log/buildkite-agent",
      "sudo chown -R admin:admin /usr/local/var/buildkite-agent",
      "sudo chown -R admin:admin /usr/local/var/log/buildkite-agent"
    ]
    timeout = "10m"
  }

  # Copy user management scripts
  provisioner "file" {
    source      = "${path.root}/../scripts/"
    destination = "/tmp/scripts/"
  }

  provisioner "shell" {
    inline = [
      "sudo mkdir -p /usr/local/bin/bun-ci",
      "sudo cp /tmp/scripts/create-build-user.sh /usr/local/bin/bun-ci/",
      "sudo cp /tmp/scripts/cleanup-build-user.sh /usr/local/bin/bun-ci/",
      "sudo cp /tmp/scripts/job-runner.sh /usr/local/bin/bun-ci/",
      "sudo chmod +x /usr/local/bin/bun-ci/*.sh"
    ]
  }

  # Configure system settings for CI
  provisioner "shell" {
    inline = [
      "echo 'Configuring system for CI...'",
      "# Disable sleep and screensaver",
      "sudo pmset -a displaysleep 0 sleep 0 disksleep 0",
      "sudo pmset -a womp 1",
      "# Disable automatic updates",
      "sudo defaults write /Library/Preferences/com.apple.SoftwareUpdate AutomaticCheckEnabled -bool false",
      "sudo defaults write /Library/Preferences/com.apple.SoftwareUpdate AutomaticDownload -bool false",
      "sudo defaults write /Library/Preferences/com.apple.SoftwareUpdate AutomaticallyInstallMacOSUpdates -bool false",
      "# Increase file descriptor limits",
      "echo 'kern.maxfiles=1048576' | sudo tee -a /etc/sysctl.conf",
      "echo 'kern.maxfilesperproc=1048576' | sudo tee -a /etc/sysctl.conf",
      "# Enable core dumps",
      "sudo mkdir -p /cores",
      "sudo chmod 777 /cores",
      "echo 'kern.corefile=/cores/core.%P' | sudo tee -a /etc/sysctl.conf"
    ]
  }

  # Configure LaunchDaemon for Buildkite agent. PATH also includes
  # /opt/homebrew so Homebrew tools resolve on Apple silicon, while
  # /usr/local/bin stays first so the pinned Node.js, Bun and CMake win.
  provisioner "shell" {
    inline = [
      "echo 'Configuring Buildkite LaunchDaemon...'",
      "sudo tee /Library/LaunchDaemons/com.buildkite.buildkite-agent.plist > /dev/null <<EOF",
      "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
      "<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">",
      "<plist version=\"1.0\">",
      "<dict>",
      "  <key>Label</key>",
      "  <string>com.buildkite.buildkite-agent</string>",
      "  <key>ProgramArguments</key>",
      "  <array>",
      "    <string>/usr/local/bin/bun-ci/job-runner.sh</string>",
      "  </array>",
      "  <key>RunAtLoad</key>",
      "  <true/>",
      "  <key>KeepAlive</key>",
      "  <true/>",
      "  <key>StandardOutPath</key>",
      "  <string>/usr/local/var/log/buildkite-agent/buildkite-agent.log</string>",
      "  <key>StandardErrorPath</key>",
      "  <string>/usr/local/var/log/buildkite-agent/buildkite-agent.error.log</string>",
      "  <key>EnvironmentVariables</key>",
      "  <dict>",
      "    <key>PATH</key>",
      "    <string>/usr/local/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/bin:/bin:/usr/sbin:/sbin</string>",
      "  </dict>",
      "</dict>",
      "</plist>",
      "EOF"
    ]
  }

  # Clean up
  provisioner "shell" {
    inline = [
      "echo 'Cleaning up...'",
      "rm -rf /tmp/bootstrap-macos.sh /tmp/scripts/",
      "sudo rm -rf /var/log/*.log /var/log/*/*.log",
      "sudo rm -rf /tmp/* /var/tmp/*",
      "# Clean Homebrew cache",
      "brew cleanup --prune=all",
      "# Clean npm cache",
      "npm cache clean --force",
      "# Clean pip cache",
      "pip3 cache purge || true",
      "# Clean cargo cache",
      "cargo cache --remove-if-older-than 1d || true",
      "# Clean system caches; entries protected by SIP or the read-only system",
      "# volume cannot be removed, so do not let them abort the build",
      "sudo rm -rf /System/Library/Caches/* || true",
      "sudo rm -rf /Library/Caches/* || true",
      "rm -rf ~/Library/Caches/*",
      "echo 'Cleanup completed'"
    ]
  }

  # Final system preparation
  provisioner "shell" {
    inline = [
      "echo 'Final system preparation...'",
      "# Ensure proper permissions",
      "sudo chown -R admin:admin /usr/local/bin/bun-ci",
      "sudo chown -R admin:admin /usr/local/var/buildkite-agent",
      "sudo chown -R admin:admin /usr/local/var/log/buildkite-agent",
      "# Load the LaunchDaemon",
      "sudo launchctl load /Library/LaunchDaemons/com.buildkite.buildkite-agent.plist",
      "echo 'Image preparation completed'"
    ]
  }
}
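The Packer template names each image `bun-macos-<version>-<YYYY-MM-DD>`, and the nightly cleanup job selects snapshots by the `bun-macos-` prefix. Below is a minimal sketch of helpers that build and parse that name. It assumes the date is taken in UTC, since Packer's `timestamp()` returns UTC. `image_name_for` and `parse_image_name` are hypothetical and do not exist in the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helpers mirroring the Packer image_name pattern
# "bun-macos-<version>-<YYYY-MM-DD>"; not part of the repository.
set -euo pipefail

# Build the image name for a macOS major version and a UTC date
# (defaults to today in UTC, like Packer's timestamp()).
image_name_for() {
    local version="$1"
    local day="${2:-$(date -u +%Y-%m-%d)}"
    printf 'bun-macos-%s-%s\n' "$version" "$day"
}

# Split an image name into "<version> <date>"; return 1 for any other name.
parse_image_name() {
    local re='^bun-macos-([0-9]+)-([0-9]{4}-[0-9]{2}-[0-9]{2})$'
    if [[ "$1" =~ $re ]]; then
        printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        return 1
    fi
}
```

The regular expression is anchored at both ends, so temporary `packer-build-*` VM names are never mistaken for finished images.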
400
.buildkite/macos-runners/scripts/bootstrap-macos.sh
Executable file
@@ -0,0 +1,400 @@
#!/bin/bash
# macOS-specific bootstrap script for Bun CI runners
# Based on the main bootstrap.sh but optimized for macOS CI environments

set -euo pipefail

print() {
    echo "$@"
}

error() {
    print "error: $@" >&2
    exit 1
}

execute() {
    print "$ $@" >&2
    if ! "$@"; then
        error "Command failed: $@"
    fi
}

# Check if running as root
if [[ $EUID -eq 0 ]]; then
    error "This script should not be run as root"
fi

# Check if running on macOS
if [[ "$(uname -s)" != "Darwin" ]]; then
    error "This script is designed for macOS only"
fi

print "Starting macOS bootstrap for Bun CI..."

# Get macOS version
MACOS_VERSION=$(sw_vers -productVersion)
MACOS_MAJOR=$(echo "$MACOS_VERSION" | cut -d. -f1)
MACOS_MINOR=$(echo "$MACOS_VERSION" | cut -d. -f2)

print "macOS Version: $MACOS_VERSION"

# Install Xcode Command Line Tools if not already installed
if ! xcode-select -p &>/dev/null; then
    print "Installing Xcode Command Line Tools..."
    xcode-select --install
    # Wait for installation to complete
    until xcode-select -p &>/dev/null; do
        sleep 10
    done
fi

# Install Homebrew if not already installed
if ! command -v brew &>/dev/null; then
    print "Installing Homebrew..."
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

    # Add Homebrew to PATH
    if [[ "$(uname -m)" == "arm64" ]]; then
        echo 'export PATH="/opt/homebrew/bin:$PATH"' >> ~/.zprofile
        export PATH="/opt/homebrew/bin:$PATH"
    else
        echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.zprofile
        export PATH="/usr/local/bin:$PATH"
    fi
fi

# Configure Homebrew for CI
export HOMEBREW_NO_INSTALL_CLEANUP=1
export HOMEBREW_NO_AUTO_UPDATE=1
export HOMEBREW_NO_ANALYTICS=1

# Update Homebrew
print "Updating Homebrew..."
brew update

# Install essential packages
print "Installing essential packages..."
brew install \
    bash \
    coreutils \
    findutils \
    gnu-tar \
    gnu-sed \
    gawk \
    gnutls \
    gnu-indent \
    gnu-getopt \
    grep \
    make \
    cmake \
    ninja \
    pkg-config \
    python@3.11 \
    python@3.12 \
    go \
    rust \
    node \
    bun \
    git \
    wget \
    curl \
    jq \
    tree \
    htop \
    watch \
    tmux \
    screen \
    gh

# Install Docker Desktop
print "Installing Docker Desktop..."
if [[ ! -d "/Applications/Docker.app" ]]; then
    # -f makes curl fail on HTTP errors instead of saving an error page as the DMG
    if [[ "$(uname -m)" == "arm64" ]]; then
        curl -fL "https://desktop.docker.com/mac/main/arm64/Docker.dmg" -o /tmp/Docker.dmg
    else
        curl -fL "https://desktop.docker.com/mac/main/amd64/Docker.dmg" -o /tmp/Docker.dmg
    fi

    hdiutil attach /tmp/Docker.dmg
    cp -R /Volumes/Docker/Docker.app /Applications/
    hdiutil detach /Volumes/Docker
    rm /tmp/Docker.dmg
fi

# Install Buildkite agent
print "Installing Buildkite agent..."
brew install buildkite/buildkite/buildkite-agent

# Create directories for Buildkite
sudo mkdir -p /usr/local/var/buildkite-agent
sudo mkdir -p /usr/local/var/log/buildkite-agent
sudo chown -R "$(whoami):admin" /usr/local/var/buildkite-agent
sudo chown -R "$(whoami):admin" /usr/local/var/log/buildkite-agent

# Install Node.js versions (exact version from bootstrap.sh)
print "Installing specific Node.js version..."
NODE_VERSION="24.3.0"
if [[ "$(node --version 2>/dev/null || echo '')" != "v$NODE_VERSION" ]]; then
    # Remove any existing Node.js installations
    brew uninstall --ignore-dependencies node 2>/dev/null || true

    # Install specific Node.js version
    if [[ "$(uname -m)" == "arm64" ]]; then
        NODE_ARCH="arm64"
    else
        NODE_ARCH="x64"
    fi

    NODE_URL="https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION-darwin-$NODE_ARCH.tar.gz"
    NODE_TAR="/tmp/node-v$NODE_VERSION-darwin-$NODE_ARCH.tar.gz"

    curl -fsSL "$NODE_URL" -o "$NODE_TAR"
    sudo tar -xzf "$NODE_TAR" -C /usr/local --strip-components=1
    rm "$NODE_TAR"

    # Verify installation
    if [[ "$(node --version)" != "v$NODE_VERSION" ]]; then
        error "Node.js installation failed: expected v$NODE_VERSION, got $(node --version)"
    fi

    print "Node.js v$NODE_VERSION installed successfully"
fi

# Install Node.js headers (matching bootstrap.sh)
print "Installing Node.js headers..."
NODE_HEADERS_URL="https://nodejs.org/download/release/v$NODE_VERSION/node-v$NODE_VERSION-headers.tar.gz"
NODE_HEADERS_TAR="/tmp/node-v$NODE_VERSION-headers.tar.gz"
curl -fsSL "$NODE_HEADERS_URL" -o "$NODE_HEADERS_TAR"
sudo tar -xzf "$NODE_HEADERS_TAR" -C /usr/local --strip-components=1
rm "$NODE_HEADERS_TAR"

# Set up node-gyp cache
NODE_GYP_CACHE_DIR="$HOME/.cache/node-gyp/$NODE_VERSION"
mkdir -p "$NODE_GYP_CACHE_DIR/include"
cp -R /usr/local/include/node "$NODE_GYP_CACHE_DIR/include/" 2>/dev/null || true
echo "11" > "$NODE_GYP_CACHE_DIR/installVersion" 2>/dev/null || true

# Install Bun specific version (exact version from bootstrap.sh)
print "Installing specific Bun version..."
BUN_VERSION="1.2.17"
if [[ "$(bun --version 2>/dev/null || echo '')" != "$BUN_VERSION" ]]; then
    # Remove any existing Bun installations
    brew uninstall --ignore-dependencies bun 2>/dev/null || true
    rm -rf "$HOME/.bun" 2>/dev/null || true

    # Install specific Bun version
    if [[ "$(uname -m)" == "arm64" ]]; then
        BUN_TRIPLET="bun-darwin-aarch64"
    else
        BUN_TRIPLET="bun-darwin-x64"
    fi

    BUN_URL="https://pub-5e11e972747a44bf9aaf9394f185a982.r2.dev/releases/bun-v$BUN_VERSION/$BUN_TRIPLET.zip"
    BUN_ZIP="/tmp/$BUN_TRIPLET.zip"

    curl -fsSL "$BUN_URL" -o "$BUN_ZIP"
    # -o overwrites leftovers from an earlier run instead of prompting
    unzip -o -q "$BUN_ZIP" -d /tmp/
    sudo mv "/tmp/$BUN_TRIPLET/bun" /usr/local/bin/
    sudo ln -sf /usr/local/bin/bun /usr/local/bin/bunx
    rm -rf "$BUN_ZIP" "/tmp/$BUN_TRIPLET"

    # Verify installation
    if [[ "$(bun --version)" != "$BUN_VERSION" ]]; then
        error "Bun installation failed: expected $BUN_VERSION, got $(bun --version)"
    fi

    print "Bun v$BUN_VERSION installed successfully"
fi

# Configure the Rust toolchain (only when rustup is available; the Homebrew
# rust formula installed above does not ship rustup)
print "Configuring Rust toolchain..."
if command -v rustup &>/dev/null; then
    rustup update
    rustup target add x86_64-apple-darwin
    rustup target add aarch64-apple-darwin
fi

# Install LLVM (same major version as bootstrap.sh)
print "Installing LLVM..."
LLVM_VERSION="19"
brew install "llvm@$LLVM_VERSION"

# Install additional development tools
print "Installing additional development tools..."
brew install \
    clang-format \
    ccache \
    ninja \
    meson \
    autoconf \
    automake \
    libtool \
    gettext \
    openssl \
    readline \
    sqlite \
    xz \
    zlib \
    libyaml \
    libffi \
    pkg-config

# Install CMake (specific version from bootstrap.sh)
print "Installing CMake..."
CMAKE_VERSION="3.30.5"
brew uninstall --ignore-dependencies cmake 2>/dev/null || true
# Kitware publishes a single universal macOS build for both arm64 and x86_64
CMAKE_ARCH="macos-universal"
CMAKE_URL="https://github.com/Kitware/CMake/releases/download/v$CMAKE_VERSION/cmake-$CMAKE_VERSION-$CMAKE_ARCH.tar.gz"
CMAKE_TAR="/tmp/cmake-$CMAKE_VERSION-$CMAKE_ARCH.tar.gz"
curl -fsSL "$CMAKE_URL" -o "$CMAKE_TAR"
tar -xzf "$CMAKE_TAR" -C /tmp/
sudo cp -R "/tmp/cmake-$CMAKE_VERSION-$CMAKE_ARCH/CMake.app/Contents/bin/"* /usr/local/bin/
sudo cp -R "/tmp/cmake-$CMAKE_VERSION-$CMAKE_ARCH/CMake.app/Contents/share/"* /usr/local/share/
rm -rf "$CMAKE_TAR" "/tmp/cmake-$CMAKE_VERSION-$CMAKE_ARCH"

# Install Age for core dump encryption (macOS equivalent)
print "Installing Age for encryption..."
if [[ "$(uname -m)" == "arm64" ]]; then
    AGE_URL="https://github.com/FiloSottile/age/releases/download/v1.2.1/age-v1.2.1-darwin-arm64.tar.gz"
else
    AGE_URL="https://github.com/FiloSottile/age/releases/download/v1.2.1/age-v1.2.1-darwin-amd64.tar.gz"
fi
AGE_TAR="/tmp/age.tar.gz"
curl -fsSL "$AGE_URL" -o "$AGE_TAR"
tar -xzf "$AGE_TAR" -C /tmp/
sudo mv /tmp/age/age /usr/local/bin/
rm -rf "$AGE_TAR" /tmp/age

# Install Tailscale (matching bootstrap.sh implementation).
# `docker` is never set by this script, so default it to avoid tripping `set -u`.
print "Installing Tailscale..."
if [[ "${docker:-}" != "1" ]]; then
    if [[ ! -d "/Applications/Tailscale.app" ]]; then
        # Install via Homebrew for easier management
        brew install --cask tailscale
    fi
fi

# Install Chromium for testing
print "Installing Chromium for testing..."
brew install --cask chromium

# Install macFUSE (the FUSE implementation for macOS)
print "Installing macFUSE..."
if [[ ! -d "/Library/Frameworks/macFUSE.framework" ]]; then
    brew install --cask macfuse
fi

# Install the Python FUSE bindings
pip3 install fusepy

# Configure system settings
print "Configuring system settings..."

# Disable sleep and screensaver
sudo pmset -a displaysleep 0 sleep 0 disksleep 0
sudo pmset -a womp 1

# Disable automatic updates
sudo defaults write /Library/Preferences/com.apple.SoftwareUpdate AutomaticCheckEnabled -bool false
sudo defaults write /Library/Preferences/com.apple.SoftwareUpdate AutomaticDownload -bool false
sudo defaults write /Library/Preferences/com.apple.SoftwareUpdate AutomaticallyInstallMacOSUpdates -bool false

# Increase file descriptor limits
echo 'kern.maxfiles=1048576' | sudo tee -a /etc/sysctl.conf
echo 'kern.maxfilesperproc=1048576' | sudo tee -a /etc/sysctl.conf

# Enable core dumps
sudo mkdir -p /cores
sudo chmod 777 /cores
echo 'kern.corefile=/cores/core.%P' | sudo tee -a /etc/sysctl.conf

# Configure shell environment
print "Configuring shell environment..."

# Resolve the LLVM prefix once. Homebrew installs versioned formulae under
# $(brew --prefix)/opt/llvm@<version>, which differs between arm64 and x86_64.
LLVM_PREFIX="$(brew --prefix "llvm@$LLVM_VERSION")"

# Add Homebrew paths to shell profiles
SHELL_PROFILES=(.zshrc .zprofile .bash_profile .bashrc)
for profile in "${SHELL_PROFILES[@]}"; do
    if [[ -f "$HOME/$profile" ]] || [[ "${1:-}" == "--ci" ]]; then
        if [[ "$(uname -m)" == "arm64" ]]; then
            echo 'export PATH="/opt/homebrew/bin:$PATH"' >> "$HOME/$profile"
        else
            echo 'export PATH="/usr/local/bin:$PATH"' >> "$HOME/$profile"
        fi

        # Add other useful paths
        echo 'export PATH="/usr/local/bin/bun-ci:$PATH"' >> "$HOME/$profile"
        echo 'export PATH="/usr/local/sbin:$PATH"' >> "$HOME/$profile"

        # Environment variables for CI
        echo 'export HOMEBREW_NO_INSTALL_CLEANUP=1' >> "$HOME/$profile"
        echo 'export HOMEBREW_NO_AUTO_UPDATE=1' >> "$HOME/$profile"
        echo 'export HOMEBREW_NO_ANALYTICS=1' >> "$HOME/$profile"
        echo 'export CI=1' >> "$HOME/$profile"
        echo 'export BUILDKITE=true' >> "$HOME/$profile"

        # Development environment variables. DEVELOPER_DIR is set only when full
        # Xcode exists; pointing it at a missing Xcode.app breaks xcrun when only
        # the Command Line Tools are installed.
        echo 'if [ -d /Applications/Xcode.app/Contents/Developer ]; then export DEVELOPER_DIR="/Applications/Xcode.app/Contents/Developer"; fi' >> "$HOME/$profile"
        echo 'export SDKROOT="$(xcrun --sdk macosx --show-sdk-path)"' >> "$HOME/$profile"

        # Node.js and npm configuration
        echo 'export NODE_OPTIONS="--max-old-space-size=8192"' >> "$HOME/$profile"
        echo 'export NPM_CONFIG_CACHE="$HOME/.npm"' >> "$HOME/$profile"

        # Rust configuration
        echo 'export CARGO_HOME="$HOME/.cargo"' >> "$HOME/$profile"
        echo 'export RUSTUP_HOME="$HOME/.rustup"' >> "$HOME/$profile"
        echo 'export PATH="$HOME/.cargo/bin:$PATH"' >> "$HOME/$profile"

        # Go configuration
        echo 'export GOPATH="$HOME/go"' >> "$HOME/$profile"
        echo 'export PATH="$GOPATH/bin:$PATH"' >> "$HOME/$profile"

        # Python configuration
        echo 'export PYTHONPATH="/usr/local/lib/python3.11/site-packages:/usr/local/lib/python3.12/site-packages:$PYTHONPATH"' >> "$HOME/$profile"

        # Bun configuration
        echo 'export BUN_INSTALL="$HOME/.bun"' >> "$HOME/$profile"
        echo 'export PATH="$BUN_INSTALL/bin:$PATH"' >> "$HOME/$profile"

        # LLVM configuration (prefix expanded now, $PATH kept literal)
        echo "export PATH=\"$LLVM_PREFIX/bin:\$PATH\"" >> "$HOME/$profile"
        echo "export LDFLAGS=\"-L$LLVM_PREFIX/lib\"" >> "$HOME/$profile"
        echo "export CPPFLAGS=\"-I$LLVM_PREFIX/include\"" >> "$HOME/$profile"
    fi
done

# Create symbolic links for GNU tools
print "Creating symbolic links for GNU tools..."
GNU_TOOLS=(
    "tar:gtar"
    "sed:gsed"
    "awk:gawk"
    "find:gfind"
    "xargs:gxargs"
    "grep:ggrep"
    "make:gmake"
)

for tool_pair in "${GNU_TOOLS[@]}"; do
    tool_name="${tool_pair%%:*}"
    gnu_name="${tool_pair##*:}"

    if command -v "$gnu_name" &>/dev/null; then
        sudo ln -sf "$(which "$gnu_name")" "/usr/local/bin/$tool_name"
    fi
done

# Clean up
print "Cleaning up..."
brew cleanup --prune=all
sudo rm -rf /tmp/* /var/tmp/* || true

print "macOS bootstrap completed successfully!"
print "System is ready for Bun CI workloads."
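`bootstrap-macos.sh` repeats the same `uname -m` branch for Docker, Node.js, Bun and Age, and each tool spells the architecture differently. The sketch below is a hypothetical helper that centralizes the mapping. The artifact names are taken from the URLs in the script above. The script treats every non-`arm64` machine as Intel, whereas this sketch rejects unknown architectures explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in bootstrap-macos.sh): map `uname -m` output to
# the per-tool architecture names used in the download URLs above.
set -euo pipefail

# Usage: artifact_arch <tool> [machine]; machine defaults to `uname -m`.
artifact_arch() {
    local tool="$1"
    local machine="${2:-$(uname -m)}"
    case "$tool:$machine" in
        node:arm64)     echo "arm64" ;;
        node:x86_64)    echo "x64" ;;
        bun:arm64)      echo "bun-darwin-aarch64" ;;
        bun:x86_64)     echo "bun-darwin-x64" ;;
        age:arm64)      echo "arm64" ;;
        age:x86_64)     echo "amd64" ;;
        docker:arm64)   echo "arm64" ;;
        docker:x86_64)  echo "amd64" ;;
        *)
            echo "unsupported tool/arch: $tool/$machine" >&2
            return 1
            ;;
    esac
}
```

With this helper, `NODE_ARCH="$(artifact_arch node)"` could replace the `if/else` block in the Node.js section, and an unexpected machine type would stop the script instead of silently downloading Intel binaries.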
141
.buildkite/macos-runners/scripts/cleanup-build-user.sh
Executable file
@@ -0,0 +1,141 @@
#!/bin/bash
# Clean up build user and all associated processes/files
# This ensures complete cleanup after each job

set -euo pipefail

print() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}

error() {
    print "ERROR: $*" >&2
    exit 1
}

# Check if running as root
if [[ $EUID -ne 0 ]]; then
    error "This script must be run as root"
fi

USERNAME="${1:-}"
if [[ -z "$USERNAME" ]]; then
    error "Usage: $0 <username>"
fi

print "Cleaning up build user: ${USERNAME}"

# Check if user exists
if ! id "${USERNAME}" &>/dev/null; then
    print "User ${USERNAME} does not exist, nothing to clean up"
    exit 0
fi

USER_HOME="/Users/${USERNAME}"

# Stop any background timeout processes
pkill -f "job-timeout.sh" || true

# Kill all processes owned by the user
print "Killing all processes owned by ${USERNAME}..."
pkill -TERM -u "${USERNAME}" || true
sleep 2
pkill -KILL -u "${USERNAME}" || true

# Wait for processes to be cleaned up
sleep 1

# Remove from groups
dscl . delete /Groups/admin GroupMembership "${USERNAME}" 2>/dev/null || true
dscl . delete /Groups/wheel GroupMembership "${USERNAME}" 2>/dev/null || true
dscl . delete /Groups/_developer GroupMembership "${USERNAME}" 2>/dev/null || true

# Remove sudo access
rm -f "/etc/sudoers.d/${USERNAME}"

# Clean up temporary files and caches
print "Cleaning up temporary files..."
if [[ -d "${USER_HOME}" ]]; then
    # Clean up known cache directories
    rm -rf "${USER_HOME}/.npm/_cacache" || true
    rm -rf "${USER_HOME}/.npm/_logs" || true
    rm -rf "${USER_HOME}/.cargo/registry" || true
    rm -rf "${USER_HOME}/.cargo/git" || true
    rm -rf "${USER_HOME}/.rustup/tmp" || true
    rm -rf "${USER_HOME}/.cache" || true
    rm -rf "${USER_HOME}/Library/Caches" || true
    rm -rf "${USER_HOME}/Library/Logs" || true
    rm -rf "${USER_HOME}/Library/Application Support/Crash Reports" || true
    rm -rf "${USER_HOME}/tmp" || true
    rm -rf "${USER_HOME}/.bun/install/cache" || true

    # Clean up workspace
    rm -rf "${USER_HOME}/workspace" || true

    # Clean up any Docker containers/images created by this user
    if command -v docker &>/dev/null; then
        docker ps -a --filter "label=bk_user=${USERNAME}" -q | xargs -r docker rm -f || true
        docker images --filter "label=bk_user=${USERNAME}" -q | xargs -r docker rmi -f || true
    fi
fi

# Clean up system-wide temporary files related to this user
rm -rf "/tmp/${USERNAME}-"* || true
rm -rf "/var/tmp/${USERNAME}-"* || true

# Clean up any core dumps
rm -f "/cores/core.${USERNAME}."* || true

# Clean up any launchd jobs
launchctl list | grep -E "^[0-9].*${USERNAME}" | awk '{print $3}' | xargs -I {} launchctl remove {} || true

# Everything in this block looks the user up by name (pkill -u, ipcs, find -user,
# lsof -u), so it must run while the account still exists. Once the account is
# deleted, these tools can no longer resolve the name.

# Clean up any remaining processes that might have been missed
print "Final process cleanup..."
pkill -KILL -u "${USERNAME}" || true

# Clean up shared memory segments (ipcrm accepts one ID per flag)
ipcs -m | grep "${USERNAME}" | awk '{print $2}' | xargs -r -n1 ipcrm -m || true

# Clean up semaphores
ipcs -s | grep "${USERNAME}" | awk '{print $2}' | xargs -r -n1 ipcrm -s || true

# Clean up message queues
ipcs -q | grep "${USERNAME}" | awk '{print $2}' | xargs -r -n1 ipcrm -q || true

# Clean up any remaining files owned by the user
print "Cleaning up remaining files..."
find /tmp -user "${USERNAME}" -exec rm -rf {} + 2>/dev/null || true
find /var/tmp -user "${USERNAME}" -exec rm -rf {} + 2>/dev/null || true

# Kill any processes still holding files or network ports open
lsof -t -u "${USERNAME}" 2>/dev/null | xargs -r kill -9 || true

# Clean up any mount points
mount | grep "${USERNAME}" | awk '{print $3}' | xargs -r umount || true

# Remove user account
print "Removing user account..."
dscl . delete "/Users/${USERNAME}"

# Remove home directory
print "Removing home directory..."
if [[ -d "${USER_HOME}" ]]; then
    rm -rf "${USER_HOME}"
fi

# Verify cleanup
if id "${USERNAME}" &>/dev/null; then
    error "Failed to remove user ${USERNAME}"
fi

if [[ -d "${USER_HOME}" ]]; then
    error "Failed to remove home directory ${USER_HOME}"
fi

print "Build user ${USERNAME} cleaned up successfully"

# Free up memory
sync
purge || true

print "Cleanup completed"
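The IPC cleanup above selects entries with `grep "${USERNAME}"`. That matches any line in which the name appears as a substring, so `bk-1` would also match entries owned by `bk-12`. The sketch below filters on the owner column exactly instead. It assumes the default BSD/macOS `ipcs` layout `T ID KEY MODE OWNER GROUP`, with OWNER in the fifth column. `ipc_ids_for_owner` is a hypothetical helper, not part of the script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print IDs of IPC entries whose OWNER column equals
# the given user exactly. Assumes the default BSD/macOS ipcs layout
# "T ID KEY MODE OWNER GROUP" (OWNER is column 5).
set -euo pipefail

# Reads `ipcs` output on stdin; only entry rows start with m, q or s.
ipc_ids_for_owner() {
    awk -v owner="$1" '$1 ~ /^[mqs]$/ && $5 == owner { print $2 }'
}
```

In the cleanup script this would be used as `ipcs -m | ipc_ids_for_owner "${USERNAME}" | xargs -r -n1 ipcrm -m`. Header lines such as `T ID KEY ...` and `Shared Memory:` are skipped because their first field is not a single type letter.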
158
.buildkite/macos-runners/scripts/create-build-user.sh
Executable file
@@ -0,0 +1,158 @@
#!/bin/bash
# Create isolated build user for each Buildkite job
# This ensures complete isolation between jobs

set -euo pipefail

print() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}

error() {
    print "ERROR: $*" >&2
    exit 1
}

# Check if running as root
if [[ $EUID -ne 0 ]]; then
    error "This script must be run as root"
fi

# Generate unique user name
JOB_ID="${BUILDKITE_JOB_ID:-$(uuidgen | tr '[:upper:]' '[:lower:]' | tr -d '-' | cut -c1-8)}"
USERNAME="bk-${JOB_ID}"
USER_HOME="/Users/${USERNAME}"

print "Creating build user: ${USERNAME}"

# Check if user already exists
if id "${USERNAME}" &>/dev/null; then
    print "User ${USERNAME} already exists, cleaning up first..."
    /usr/local/bin/bun-ci/cleanup-build-user.sh "${USERNAME}"
fi

# Find next available UID (starting from 1000)
|
||||
NEXT_UID=1000
|
||||
while id -u "${NEXT_UID}" &>/dev/null; do
|
||||
((NEXT_UID++))
|
||||
done
|
||||
|
||||
print "Using UID: ${NEXT_UID}"
|
||||
|
||||
# Create user account
dscl . create "/Users/${USERNAME}"
dscl . create "/Users/${USERNAME}" UserShell /bin/bash
dscl . create "/Users/${USERNAME}" RealName "Buildkite Job ${JOB_ID}"
dscl . create "/Users/${USERNAME}" UniqueID "${NEXT_UID}"
dscl . create "/Users/${USERNAME}" PrimaryGroupID 20 # staff group
dscl . create "/Users/${USERNAME}" NFSHomeDirectory "${USER_HOME}"

# Set a random password (the user never needs to log in interactively)
RANDOM_PASSWORD=$(openssl rand -base64 32)
dscl . passwd "/Users/${USERNAME}" "${RANDOM_PASSWORD}"

# Create home directory
mkdir -p "${USER_HOME}"
chown "${USERNAME}:staff" "${USER_HOME}"
chmod 755 "${USER_HOME}"

# Copy skeleton files
cp -R /System/Library/User\ Template/English.lproj/. "${USER_HOME}/"
chown -R "${USERNAME}:staff" "${USER_HOME}"

# Set up shell environment
cat > "${USER_HOME}/.zshrc" << 'EOF'
# Buildkite job environment
export PATH="/usr/local/bin:/usr/local/sbin:/opt/homebrew/bin:/opt/homebrew/sbin:$PATH"
export HOMEBREW_NO_INSTALL_CLEANUP=1
export HOMEBREW_NO_AUTO_UPDATE=1
export HOMEBREW_NO_ANALYTICS=1
export CI=1
export BUILDKITE=true

# Development environment
export DEVELOPER_DIR="/Applications/Xcode.app/Contents/Developer"
export SDKROOT="$(xcrun --sdk macosx --show-sdk-path)"

# Node.js and npm
export NODE_OPTIONS="--max-old-space-size=8192"
export NPM_CONFIG_CACHE="$HOME/.npm"

# Rust
export CARGO_HOME="$HOME/.cargo"
export RUSTUP_HOME="$HOME/.rustup"
export PATH="$HOME/.cargo/bin:$PATH"

# Go
export GOPATH="$HOME/go"
export PATH="$GOPATH/bin:$PATH"

# Python (append an existing PYTHONPATH only if set; an empty trailing entry
# would add the current directory to sys.path)
export PYTHONPATH="/usr/local/lib/python3.11/site-packages:/usr/local/lib/python3.12/site-packages${PYTHONPATH:+:$PYTHONPATH}"

# Bun
export BUN_INSTALL="$HOME/.bun"
export PATH="$BUN_INSTALL/bin:$PATH"

# LLVM
export PATH="/usr/local/opt/llvm/bin:$PATH"
export LDFLAGS="-L/usr/local/opt/llvm/lib"
export CPPFLAGS="-I/usr/local/opt/llvm/include"

# Job isolation
export TMPDIR="$HOME/tmp"
export TEMP="$HOME/tmp"
export TMP="$HOME/tmp"
mkdir -p "$TMPDIR"
EOF

# Copy .zshrc to other shell profiles
cp "${USER_HOME}/.zshrc" "${USER_HOME}/.bash_profile"
cp "${USER_HOME}/.zshrc" "${USER_HOME}/.bashrc"

# Create necessary directories
mkdir -p "${USER_HOME}/tmp"
mkdir -p "${USER_HOME}/.npm"
mkdir -p "${USER_HOME}/.cargo"
mkdir -p "${USER_HOME}/.rustup"
mkdir -p "${USER_HOME}/go"
mkdir -p "${USER_HOME}/.bun"

# Set ownership
chown -R "${USERNAME}:staff" "${USER_HOME}"

# Create workspace directory
WORKSPACE_DIR="${USER_HOME}/workspace"
mkdir -p "${WORKSPACE_DIR}"
chown "${USERNAME}:staff" "${WORKSPACE_DIR}"

# Add user to necessary groups
dscl . append /Groups/admin GroupMembership "${USERNAME}"
dscl . append /Groups/wheel GroupMembership "${USERNAME}"
dscl . append /Groups/_developer GroupMembership "${USERNAME}"

# Set up sudo access (for this user only during the job)
cat > "/etc/sudoers.d/${USERNAME}" << EOF
${USERNAME} ALL=(ALL) NOPASSWD: ALL
EOF
# Make the drop-in read-only (0440), the conventional mode for sudoers files
chmod 0440 "/etc/sudoers.d/${USERNAME}"

# Create job timeout script
# The delimiter is unquoted so ${USERNAME} is filled in now; the escaped
# \${BUILDKITE_TIMEOUT:-3600} is written literally and resolved when the script runs.
cat > "${USER_HOME}/job-timeout.sh" << EOF
#!/bin/bash
# Kill all processes after job timeout
sleep \${BUILDKITE_TIMEOUT:-3600}
pkill -u "${USERNAME}" || true
EOF

chmod +x "${USER_HOME}/job-timeout.sh"
chown "${USERNAME}:staff" "${USER_HOME}/job-timeout.sh"

print "Build user ${USERNAME} created successfully"
print "Home directory: ${USER_HOME}"
print "Workspace directory: ${WORKSPACE_DIR}"

# Output user info for the calling script
echo "BK_USER=${USERNAME}"
echo "BK_HOME=${USER_HOME}"
echo "BK_WORKSPACE=${WORKSPACE_DIR}"
echo "BK_UID=${NEXT_UID}"
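Whether `${USERNAME}` ends up in the generated `job-timeout.sh` depends on how the heredoc delimiter is written: with a quoted delimiter (`<< 'EOF'`) the body is copied verbatim, so the variable would only be looked up when the timeout script runs, where it is normally unset. A minimal sketch contrasting the two forms; the `render_*` functions and the `bk-example` name are illustrative only:

```bash
#!/bin/bash
set -euo pipefail

USERNAME="bk-example"

# Quoted delimiter: nothing is expanded, so the generated line still contains
# the literal text ${USERNAME}.
render_quoted() {
  cat << 'EOF'
pkill -u "${USERNAME}"
EOF
}

# Unquoted delimiter: ${USERNAME} is substituted now, while the escaped
# \${BUILDKITE_TIMEOUT:-3600} is kept literally for the script to resolve later.
render_unquoted() {
  cat << EOF
sleep \${BUILDKITE_TIMEOUT:-3600}
pkill -u "${USERNAME}"
EOF
}

render_quoted
render_unquoted
```

The quoted form prints `pkill -u "${USERNAME}"` unchanged; the unquoted form prints `sleep ${BUILDKITE_TIMEOUT:-3600}` followed by `pkill -u "bk-example"`.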
242
.buildkite/macos-runners/scripts/job-runner.sh
Executable file
@@ -0,0 +1,242 @@
#!/bin/bash
# Main job runner script that manages the lifecycle of Buildkite jobs
# This script creates users, runs jobs, and cleans up afterward

set -euo pipefail

print() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}

error() {
  print "ERROR: $*" >&2
  exit 1
}

# Ensure running as root
if [[ $EUID -ne 0 ]]; then
  error "This script must be run as root"
fi

# Configuration
BUILDKITE_AGENT_TOKEN="${BUILDKITE_AGENT_TOKEN:-}"
BUILDKITE_QUEUE="${BUILDKITE_QUEUE:-default}"
BUILDKITE_TAGS="${BUILDKITE_TAGS:-queue=$BUILDKITE_QUEUE,os=macos,arch=$(uname -m)}"
LOG_DIR="/usr/local/var/log/buildkite-agent"
AGENT_CONFIG_DIR="/usr/local/var/buildkite-agent"

# Ensure directories exist
mkdir -p "$LOG_DIR"
mkdir -p "$AGENT_CONFIG_DIR"

# Clean up on exit
cleanup() {
  local exit_code=$?
  # Reset the traps so an INT/TERM followed by the `exit` below does not run this twice
  trap - EXIT INT TERM
  print "Job runner exiting with code $exit_code"

  # Clean up current user if set
  if [[ -n "${CURRENT_USER:-}" ]]; then
    print "Cleaning up user: $CURRENT_USER"
    /usr/local/bin/bun-ci/cleanup-build-user.sh "$CURRENT_USER" || true
  fi

  # Kill any remaining buildkite-agent processes
  pkill -f "buildkite-agent" || true

  exit $exit_code
}

# The "health" subcommand must not kill running agents when it exits
if [[ "${1:-start}" != "health" ]]; then
  trap cleanup EXIT INT TERM
fi

# Function to run a single job
run_job() {
  local job_id="$1"
  local user_info

  print "Starting job: $job_id"

  # Create isolated user for this job
  print "Creating isolated build user..."
  user_info=$(/usr/local/bin/bun-ci/create-build-user.sh)

  # Parse user info
  export BK_USER=$(echo "$user_info" | grep "BK_USER=" | cut -d= -f2)
  export BK_HOME=$(echo "$user_info" | grep "BK_HOME=" | cut -d= -f2)
  export BK_WORKSPACE=$(echo "$user_info" | grep "BK_WORKSPACE=" | cut -d= -f2)
  export BK_UID=$(echo "$user_info" | grep "BK_UID=" | cut -d= -f2)

  CURRENT_USER="$BK_USER"

  print "Job will run as user: $BK_USER"
  print "Workspace: $BK_WORKSPACE"

  # Create job-specific configuration
  local job_config="${AGENT_CONFIG_DIR}/buildkite-agent-${job_id}.cfg"
  cat > "$job_config" << EOF
token="${BUILDKITE_AGENT_TOKEN}"
name="macos-$(hostname)-${job_id}"
tags="${BUILDKITE_TAGS}"
build-path="${BK_WORKSPACE}"
hooks-path="/usr/local/bin/bun-ci/hooks"
plugins-path="${BK_HOME}/.buildkite-agent/plugins"
git-clean-flags="-fdq"
git-clone-flags="-v"
shell="/bin/bash -l"
spawn=1
priority=normal
disconnect-after-job=true
disconnect-after-idle-timeout=300
cancel-grace-period=10
enable-job-log-tmpfile=true
job-log-tmpfile-path="/tmp/buildkite-job-${job_id}.log"
timestamp-lines=true
EOF

  # Set permissions
  chown "$BK_USER:staff" "$job_config"
  chmod 600 "$job_config"

  # Start timeout monitor in background
  (
    sleep "${BUILDKITE_TIMEOUT:-3600}"
    print "Job timeout reached, killing all processes for user $BK_USER"
    pkill -TERM -u "$BK_USER" || true
    sleep 10
    pkill -KILL -u "$BK_USER" || true
  ) &
  local timeout_pid=$!

  # Run buildkite-agent as the isolated user
  print "Starting Buildkite agent for job $job_id..."

  local agent_exit_code=0
  sudo -u "$BK_USER" -H /usr/local/bin/buildkite-agent start \
    --config "$job_config" \
    --log-level info \
    --no-color \
    2>&1 | tee -a "$LOG_DIR/job-${job_id}.log" || agent_exit_code=$?

  # Kill timeout monitor
  kill $timeout_pid 2>/dev/null || true

  print "Job $job_id completed with exit code: $agent_exit_code"

  # Clean up job-specific files
  rm -f "$job_config"
  rm -f "/tmp/buildkite-job-${job_id}.log"

  # Clean up the user
  print "Cleaning up user $BK_USER..."
  /usr/local/bin/bun-ci/cleanup-build-user.sh "$BK_USER" || true
  CURRENT_USER=""

  return $agent_exit_code
}

# Function to wait for jobs
wait_for_jobs() {
  print "Waiting for Buildkite jobs..."

  # Check for required configuration
  if [[ -z "$BUILDKITE_AGENT_TOKEN" ]]; then
    error "BUILDKITE_AGENT_TOKEN is required"
  fi

  # Main loop to handle jobs
  while true; do
    # Generate unique job ID
    local job_id=$(uuidgen | tr '[:upper:]' '[:lower:]' | tr -d '-' | cut -c1-8)

    print "Ready to accept job with ID: $job_id"

    # Try to run a job
    if ! run_job "$job_id"; then
      print "Job $job_id failed, continuing..."
    fi

    # Brief pause before accepting next job
    sleep 5

    # Clean up any remaining processes
    print "Performing system cleanup..."
    pkill -f "buildkite-agent" || true

    # Clean up temporary files (/tmp is a symlink to /private/tmp on macOS,
    # and find does not descend into a symlinked starting point)
    find /private/tmp -name "buildkite-*" -mtime +1 -delete 2>/dev/null || true
    find /var/tmp -name "buildkite-*" -mtime +1 -delete 2>/dev/null || true

    # Clean up any orphaned users (safety net)
    for user in $(dscl . list /Users | grep "^bk-"); do
      if [[ -n "$user" ]]; then
        print "Cleaning up orphaned user: $user"
        /usr/local/bin/bun-ci/cleanup-build-user.sh "$user" || true
      fi
    done

    # Free up memory
    sync
    purge || true

    print "System cleanup completed, ready for next job"
  done
}

# Function to perform health checks
health_check() {
  print "Performing health check..."

  # Check disk space
  local disk_usage=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
  if [[ $disk_usage -gt 90 ]]; then
    error "Disk usage is too high: ${disk_usage}%"
  fi

  # Check memory
  local memory_pressure=$(memory_pressure | grep "System-wide memory free percentage" | awk '{print $5}' | sed 's/%//')
  if [[ $memory_pressure -lt 10 ]]; then
    error "Memory pressure is too high: ${memory_pressure}% free"
  fi

  # Check if Docker is running
  if ! pgrep -x "Docker" > /dev/null; then
    print "Docker is not running, attempting to start..."
    open -a Docker || true
    sleep 30
  fi

  # Check if required commands are available
  local required_commands=("git" "node" "npm" "bun" "python3" "go" "rustc" "cargo" "cmake" "make")
  for cmd in "${required_commands[@]}"; do
    if ! command -v "$cmd" &>/dev/null; then
      error "Required command not found: $cmd"
    fi
  done

  print "Health check passed"
}

# Main execution
case "${1:-start}" in
  start)
    print "Starting Buildkite job runner for macOS"
    health_check
    wait_for_jobs
    ;;
  health)
    health_check
    ;;
  cleanup)
    print "Performing manual cleanup..."
    # Clean up any existing users
    for user in $(dscl . list /Users | grep "^bk-"); do
      if [[ -n "$user" ]]; then
        print "Cleaning up user: $user"
        /usr/local/bin/bun-ci/cleanup-build-user.sh "$user" || true
      fi
    done
    print "Manual cleanup completed"
    ;;
  *)
    error "Usage: $0 {start|health|cleanup}"
    ;;
esac
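`run_job` extracts each value from the `KEY=VALUE` lines printed by `create-build-user.sh` with `grep "BK_USER=" | cut -d= -f2`. That matches the key anywhere in a line and keeps only the text up to a second `=`. A stricter alternative, as a sketch that assumes the same output format; `get_value` is a hypothetical helper and the sample input is made up:

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print the value for KEY from KEY=VALUE lines on stdin.
# Only lines that start with exactly "KEY=" match, and everything after the
# first "=" is kept, even if the value itself contains "=".
get_value() {
  local key="$1" line
  while IFS= read -r line; do
    if [[ "$line" == "$key="* ]]; then
      printf '%s\n' "${line#*=}"
      return 0
    fi
  done
  return 1
}

# Sample input in the format create-build-user.sh prints (log line, then keys).
user_info=$'[2024-01-01 00:00:00] Creating build user: bk-1234abcd\nBK_USER=bk-1234abcd\nBK_HOME=/Users/bk-1234abcd\nBK_WORKSPACE=/Users/bk-1234abcd/workspace\nBK_UID=1000'

BK_USER="$(get_value BK_USER <<< "$user_info")"
BK_UID="$(get_value BK_UID <<< "$user_info")"
printf 'user=%s uid=%s\n' "$BK_USER" "$BK_UID"
```

Returning a non-zero status when the key is missing also lets the caller stop early instead of exporting an empty `BK_USER`.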
433
.buildkite/macos-runners/terraform/main.tf
Normal file
@@ -0,0 +1,433 @@
terraform {
  required_version = ">= 1.0"

  required_providers {
    macstadium = {
      source  = "macstadium/macstadium"
      version = "~> 1.0"
    }
  }

  backend "s3" {
    bucket = "bun-terraform-state"
    key    = "macos-runners/terraform.tfstate"
    region = "us-west-2"
  }
}

provider "macstadium" {
  api_key  = var.macstadium_api_key
  endpoint = var.macstadium_endpoint
}

# Variables
variable "macstadium_api_key" {
  description = "MacStadium API key"
  type        = string
  sensitive   = true
}

variable "macstadium_endpoint" {
  description = "MacStadium API endpoint"
  type        = string
  default     = "https://api.macstadium.com"
}

variable "buildkite_agent_token" {
  description = "Buildkite agent token"
  type        = string
  sensitive   = true
}

variable "github_token" {
  description = "GitHub token for accessing private repositories"
  type        = string
  sensitive   = true
}

variable "image_name_prefix" {
  description = "Prefix for VM image names"
  type        = string
  default     = "bun-macos"
}

variable "fleet_size" {
  description = "Number of VMs per macOS version"
  type = object({
    macos_13 = number
    macos_14 = number
    macos_15 = number
  })
  default = {
    macos_13 = 4
    macos_14 = 6
    macos_15 = 8
  }
}

variable "vm_configuration" {
  description = "VM configuration settings"
  type = object({
    cpu_count = number
    memory_gb = number
    disk_size = number
  })
  default = {
    cpu_count = 12
    memory_gb = 32
    disk_size = 500
  }
}

# Data sources to get latest images
data "macstadium_image" "macos_13" {
  name_regex  = "^${var.image_name_prefix}-13-.*"
  most_recent = true
}

data "macstadium_image" "macos_14" {
  name_regex  = "^${var.image_name_prefix}-14-.*"
  most_recent = true
}

data "macstadium_image" "macos_15" {
  name_regex  = "^${var.image_name_prefix}-15-.*"
  most_recent = true
}

# Local values
locals {
  common_tags = {
    Project     = "bun-ci"
    Environment = "production"
    ManagedBy   = "terraform"
    Purpose     = "buildkite-runners"
  }

  vm_configs = {
    macos_13 = {
      image_id = data.macstadium_image.macos_13.id
      count    = var.fleet_size.macos_13
      version  = "13"
    }
    macos_14 = {
      image_id = data.macstadium_image.macos_14.id
      count    = var.fleet_size.macos_14
      version  = "14"
    }
    macos_15 = {
      image_id = data.macstadium_image.macos_15.id
      count    = var.fleet_size.macos_15
      version  = "15"
    }
  }
}

# VM instances for each macOS version
resource "macstadium_vm" "runners" {
  for_each = {
    for vm_combo in flatten([
      for version, config in local.vm_configs : [
        for i in range(config.count) : {
          key     = "${version}-${i + 1}"
          version = version
          config  = config
          index   = i + 1
        }
      ]
    ]) : vm_combo.key => vm_combo
  }

  name     = "bun-runner-${each.value.version}-${each.value.index}"
  image_id = each.value.config.image_id

  cpu_count = var.vm_configuration.cpu_count
  memory_gb = var.vm_configuration.memory_gb
  disk_size = var.vm_configuration.disk_size

  # Network configuration
  network_interface {
    network_id = macstadium_network.runner_network.id
    ip_address = cidrhost(macstadium_network.runner_network.cidr_block, 10 + index(keys(local.vm_configs), each.value.version) * 100 + each.value.index)
  }

  # Enable GPU passthrough for better performance
  gpu_passthrough = true

  # Enable VNC for debugging
  vnc_enabled = true

  # SSH configuration
  ssh_keys = [macstadium_ssh_key.runner_key.id]

  # Startup script
  user_data = templatefile("${path.module}/user-data.sh", {
    buildkite_agent_token = var.buildkite_agent_token
    github_token          = var.github_token
    # each.value.version is the vm_configs key (e.g. "macos_15"); pass the bare
    # version number, matching the "15" the launch template passes
    macos_version = each.value.config.version
    vm_name       = "bun-runner-${each.value.version}-${each.value.index}"
  })

  # Auto-start VM
  auto_start = true

  # Shutdown behavior
  auto_shutdown = false

  tags = merge(local.common_tags, {
    Name         = "bun-runner-${each.value.version}-${each.value.index}"
    MacOSVersion = each.value.version
    VmIndex      = each.value.index
  })
}

# Network configuration
resource "macstadium_network" "runner_network" {
  name       = "bun-runner-network"
  cidr_block = "10.0.0.0/16"

  tags = merge(local.common_tags, {
    Name = "bun-runner-network"
  })
}

# SSH key for VM access
resource "macstadium_ssh_key" "runner_key" {
  name       = "bun-runner-key"
  public_key = file("${path.module}/ssh-keys/bun-runner.pub")

  tags = merge(local.common_tags, {
    Name = "bun-runner-key"
  })
}

# Security group for runner VMs
resource "macstadium_security_group" "runner_sg" {
  name        = "bun-runner-sg"
  description = "Security group for Bun CI runner VMs"

  # SSH access
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # VNC access (for debugging)
  ingress {
    from_port   = 5900
    to_port     = 5999
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"]
  }

  # HTTP/HTTPS outbound
  egress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Git (SSH)
  egress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # DNS
  egress {
    from_port   = 53
    to_port     = 53
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 53
    to_port     = 53
    protocol    = "udp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(local.common_tags, {
    Name = "bun-runner-sg"
  })
}

# Load balancer for distributing jobs
resource "macstadium_load_balancer" "runner_lb" {
  name               = "bun-runner-lb"
  load_balancer_type = "application"

  # Health check configuration
  health_check {
    enabled             = true
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 30
    path                = "/health"
    port                = 8080
    protocol            = "HTTP"
  }

  # Target group for all runner VMs
  target_group {
    name     = "bun-runners"
    port     = 8080
    protocol = "HTTP"

    targets = [
      for vm in macstadium_vm.runners : {
        id   = vm.id
        port = 8080
      }
    ]
  }

  tags = merge(local.common_tags, {
    Name = "bun-runner-lb"
  })
}

# Auto-scaling configuration
resource "macstadium_autoscaling_group" "runner_asg" {
  name                      = "bun-runner-asg"
  min_size                  = 2
  max_size                  = 20
  desired_capacity          = sum(values(var.fleet_size))
  health_check_type         = "ELB"
  health_check_grace_period = 300

  # Launch template reference
  launch_template {
    id      = macstadium_launch_template.runner_template.id
    version = "$Latest"
  }

  # Register instances with the load balancer's target group
  target_group_arns = [macstadium_load_balancer.runner_lb.target_group[0].arn]

  tags = merge(local.common_tags, {
    Name = "bun-runner-asg"
  })
}

# Launch template for auto-scaling
resource "macstadium_launch_template" "runner_template" {
  name          = "bun-runner-template"
  image_id      = data.macstadium_image.macos_15.id
  instance_type = "mac-mini-m2-pro"

  key_name = macstadium_ssh_key.runner_key.name

  security_group_ids = [macstadium_security_group.runner_sg.id]

  user_data = base64encode(templatefile("${path.module}/user-data.sh", {
    buildkite_agent_token = var.buildkite_agent_token
    github_token          = var.github_token
    macos_version         = "15"
    vm_name               = "bun-runner-asg-${timestamp()}"
  }))

  tags = merge(local.common_tags, {
    Name = "bun-runner-template"
  })
}

# CloudWatch alarms for scaling
resource "macstadium_cloudwatch_metric_alarm" "scale_up" {
  alarm_name          = "bun-runner-scale-up"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = "300"
  statistic           = "Average"
  threshold           = "80"
  alarm_description   = "This metric monitors ec2 cpu utilization"
  alarm_actions       = [macstadium_autoscaling_policy.scale_up.arn]

  dimensions = {
    AutoScalingGroupName = macstadium_autoscaling_group.runner_asg.name
  }
}

resource "macstadium_cloudwatch_metric_alarm" "scale_down" {
  alarm_name          = "bun-runner-scale-down"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = "300"
  statistic           = "Average"
  threshold           = "20"
  alarm_description   = "This metric monitors ec2 cpu utilization"
  alarm_actions       = [macstadium_autoscaling_policy.scale_down.arn]

  dimensions = {
    AutoScalingGroupName = macstadium_autoscaling_group.runner_asg.name
  }
}

# Scaling policies
resource "macstadium_autoscaling_policy" "scale_up" {
  name                   = "bun-runner-scale-up"
  scaling_adjustment     = 2
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = macstadium_autoscaling_group.runner_asg.name
}

resource "macstadium_autoscaling_policy" "scale_down" {
  name                   = "bun-runner-scale-down"
  scaling_adjustment     = -1
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = macstadium_autoscaling_group.runner_asg.name
}

# Outputs
# vm_instances is defined in outputs.tf; output names must be unique within a
# module, so it is not declared again here.

output "load_balancer_dns" {
  description = "DNS name of the load balancer"
  value       = macstadium_load_balancer.runner_lb.dns_name
}

output "network_id" {
  description = "ID of the runner network"
  value       = macstadium_network.runner_network.id
}

output "security_group_id" {
  description = "ID of the runner security group"
  value       = macstadium_security_group.runner_sg.id
}

output "autoscaling_group_name" {
  description = "Name of the autoscaling group"
  value       = macstadium_autoscaling_group.runner_asg.name
}
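Each runner's address comes from `cidrhost("10.0.0.0/16", 10 + position * 100 + index)`, where `position` is the version's place in `keys(local.vm_configs)` (Terraform returns map keys in lexical order: `macos_13`, `macos_14`, `macos_15`) and `index` starts at 1. A small bash sketch that reproduces the arithmetic for the default `fleet_size`, so the layout can be checked before `terraform plan`; `runner_ip` is a hypothetical helper, and for a `/16` the host number simply fills the last two octets:

```bash
#!/bin/bash
set -euo pipefail

# Default fleet_size from main.tf, listed in the lexical order keys() returns.
versions=(macos_13 macos_14 macos_15)
counts=(4 6 8)

# Mirrors cidrhost("10.0.0.0/16", 10 + position * 100 + index), index starting at 1.
runner_ip() {
  local position="$1" index="$2"
  local host=$((10 + position * 100 + index))
  printf '10.0.%d.%d\n' $((host / 256)) $((host % 256))
}

for position in "${!versions[@]}"; do
  for ((index = 1; index <= counts[position]; index++)); do
    printf '%s-%d %s\n' "${versions[position]}" "$index" "$(runner_ip "$position" "$index")"
  done
done
```

With the defaults this yields 10.0.0.11-14, 10.0.0.111-116 and 10.0.0.211-218. The per-version blocks stay disjoint only while each version has at most 100 VMs; beyond that, `macos_13` addresses run into the `macos_14` block.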
245
.buildkite/macos-runners/terraform/outputs.tf
Normal file
@@ -0,0 +1,245 @@
# VM instance outputs
output "vm_instances" {
  description = "Details of all created VM instances"
  value = {
    for key, vm in macstadium_vm.runners : key => {
      id         = vm.id
      name       = vm.name
      ip_address = vm.network_interface[0].ip_address
      image_id   = vm.image_id
      status     = vm.status
      # for_each keys have the form "macos_<version>-<index>" (see main.tf)
      macos_version = regex("^macos_([0-9]+)-", key)[0]
      instance_type = vm.instance_type
      cpu_count     = vm.cpu_count
      memory_gb     = vm.memory_gb
      disk_size     = vm.disk_size
      created_at    = vm.created_at
      updated_at    = vm.updated_at
    }
  }
}

output "vm_instances_by_version" {
  description = "VM instances grouped by macOS version"
  value = {
    for version in ["13", "14", "15"] : "macos_${version}" => {
      for key, vm in macstadium_vm.runners : key => {
        id         = vm.id
        name       = vm.name
        ip_address = vm.network_interface[0].ip_address
        status     = vm.status
      }
      if can(regex("^macos_${version}-", key))
    }
  }
}

# Network outputs
output "network_details" {
  description = "Network configuration details"
  value = {
    network_id = macstadium_network.runner_network.id
    cidr_block = macstadium_network.runner_network.cidr_block
    name       = macstadium_network.runner_network.name
    status     = macstadium_network.runner_network.status
  }
}

output "security_group_details" {
  description = "Security group configuration details"
  value = {
    security_group_id = macstadium_security_group.runner_sg.id
    name              = macstadium_security_group.runner_sg.name
    description       = macstadium_security_group.runner_sg.description
    ingress_rules     = macstadium_security_group.runner_sg.ingress
    egress_rules      = macstadium_security_group.runner_sg.egress
  }
}

# Load balancer outputs
output "load_balancer_details" {
  description = "Load balancer configuration details"
  value = {
    dns_name           = macstadium_load_balancer.runner_lb.dns_name
    zone_id            = macstadium_load_balancer.runner_lb.zone_id
    load_balancer_type = macstadium_load_balancer.runner_lb.load_balancer_type
    target_group_arn   = macstadium_load_balancer.runner_lb.target_group[0].arn
    health_check       = macstadium_load_balancer.runner_lb.health_check[0]
  }
}

# Auto-scaling outputs
output "autoscaling_details" {
  description = "Auto-scaling group configuration details"
  value = {
    asg_name         = macstadium_autoscaling_group.runner_asg.name
    min_size         = macstadium_autoscaling_group.runner_asg.min_size
    max_size         = macstadium_autoscaling_group.runner_asg.max_size
    desired_capacity = macstadium_autoscaling_group.runner_asg.desired_capacity
    launch_template  = macstadium_autoscaling_group.runner_asg.launch_template[0]
  }
}

# SSH key outputs
output "ssh_key_details" {
  description = "SSH key configuration details"
  value = {
    key_name    = macstadium_ssh_key.runner_key.name
    fingerprint = macstadium_ssh_key.runner_key.fingerprint
    key_pair_id = macstadium_ssh_key.runner_key.id
  }
}

# Image outputs
output "image_details" {
  description = "Details of images used for VM creation"
  value = {
    macos_13 = {
      id           = data.macstadium_image.macos_13.id
      name         = data.macstadium_image.macos_13.name
      description  = data.macstadium_image.macos_13.description
      created_date = data.macstadium_image.macos_13.creation_date
      size         = data.macstadium_image.macos_13.size
    }
    macos_14 = {
      id           = data.macstadium_image.macos_14.id
      name         = data.macstadium_image.macos_14.name
      description  = data.macstadium_image.macos_14.description
      created_date = data.macstadium_image.macos_14.creation_date
      size         = data.macstadium_image.macos_14.size
    }
    macos_15 = {
      id           = data.macstadium_image.macos_15.id
      name         = data.macstadium_image.macos_15.name
      description  = data.macstadium_image.macos_15.description
      created_date = data.macstadium_image.macos_15.creation_date
      size         = data.macstadium_image.macos_15.size
    }
  }
}

# Fleet statistics
output "fleet_statistics" {
  description = "Statistics about the VM fleet"
  value = {
    total_vms = sum([
      var.fleet_size.macos_13,
      var.fleet_size.macos_14,
      var.fleet_size.macos_15
    ])
    vms_by_version = {
      macos_13 = var.fleet_size.macos_13
      macos_14 = var.fleet_size.macos_14
      macos_15 = var.fleet_size.macos_15
    }
    total_cpu_cores = sum([
      var.fleet_size.macos_13,
      var.fleet_size.macos_14,
      var.fleet_size.macos_15
    ]) * var.vm_configuration.cpu_count
    total_memory_gb = sum([
      var.fleet_size.macos_13,
      var.fleet_size.macos_14,
      var.fleet_size.macos_15
    ]) * var.vm_configuration.memory_gb
    total_disk_gb = sum([
      var.fleet_size.macos_13,
      var.fleet_size.macos_14,
      var.fleet_size.macos_15
    ]) * var.vm_configuration.disk_size
  }
}

# Connection information
output "connection_info" {
  description = "Information for connecting to the infrastructure"
  value = {
    ssh_command_template = "ssh -i ~/.ssh/bun-runner admin@{vm_ip_address}"
    vnc_port_range       = "5900-5999"
    health_check_url     = "http://{vm_ip_address}:8080/health"
    buildkite_tags       = "queue=macos,os=macos,arch=$(uname -m)"
  }
}

# Resource ARNs and IDs
output "resource_arns" {
  description = "ARNs and IDs of created resources"
  value = {
    vm_ids = [
      for vm in macstadium_vm.runners : vm.id
    ]
    network_id            = macstadium_network.runner_network.id
    security_group_id     = macstadium_security_group.runner_sg.id
    load_balancer_arn     = macstadium_load_balancer.runner_lb.arn
    autoscaling_group_arn = macstadium_autoscaling_group.runner_asg.arn
    launch_template_id    = macstadium_launch_template.runner_template.id
  }
}

# Monitoring and alerting
output "monitoring_endpoints" {
  description = "Monitoring and alerting endpoints"
  value = {
    cloudwatch_namespace = "BunCI/MacOSRunners"
    alarm_arns = [
      macstadium_cloudwatch_metric_alarm.scale_up.arn,
      macstadium_cloudwatch_metric_alarm.scale_down.arn
    ]
    scaling_policy_arns = [
      macstadium_autoscaling_policy.scale_up.arn,
      macstadium_autoscaling_policy.scale_down.arn
    ]
  }
}

# Cost information
output "cost_information" {
  description = "Cost-related information"
  value = {
    estimated_hourly_cost = format("$%.2f", sum([
      var.fleet_size.macos_13,
      var.fleet_size.macos_14,
      var.fleet_size.macos_15
    ]) * 0.50) # Assumes roughly $0.50 per VM per hour
    estimated_monthly_cost = format("$%.2f", sum([
      var.fleet_size.macos_13,
      var.fleet_size.macos_14,
      var.fleet_size.macos_15
    ]) * 0.50 * 24 * 30) # Hourly estimate x 24 hours x 30 days
    cost_optimization_enabled = var.cost_optimization.enable_spot_instances
  }
}

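With the default `fleet_size` of 4 + 6 + 8 = 18 VMs, the placeholder rate of $0.50 per VM-hour gives 18 × 0.50 = $9.00 per hour and 9.00 × 24 × 30 = $6,480.00 per 30-day month. A minimal sketch for re-running that arithmetic with other fleet sizes; `estimate_cost` is a hypothetical helper, and the rate is the file's placeholder, not a quoted MacStadium price.

```bash
#!/usr/bin/env bash
# Reproduce the cost_information arithmetic for a given fleet.
# The default $0.50/VM/hour rate is the placeholder from outputs.tf (assumption).
set -euo pipefail

estimate_cost() {
  local macos_13=$1 macos_14=$2 macos_15=$3 rate=${4:-0.50}
  local total=$((macos_13 + macos_14 + macos_15))
  # awk does the floating-point multiply and the two-decimal formatting.
  awk -v n="$total" -v r="$rate" 'BEGIN {
    printf "vms=%d hourly=$%.2f monthly=$%.2f\n", n, n * r, n * r * 24 * 30
  }'
}

estimate_cost 4 6 8
```

The optional fourth argument lets you try a different hourly rate without editing the Terraform output.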
# Terraform state information
output "terraform_state" {
  description = "Terraform state information"
  value = {
    workspace         = terraform.workspace
    terraform_version = "~> 1.0" # Version constraint, not the running Terraform version
    provider_versions = {
      macstadium = "~> 1.0"
    }
    last_updated = timestamp() # Re-evaluated on every apply, so this output always changes
  }
}

# Summary output for easy reference
output "deployment_summary" {
  description = "Summary of the deployment"
  value = {
    project_name = var.project_name
    environment  = var.environment
    region       = var.region
    total_vms = sum([
      var.fleet_size.macos_13,
      var.fleet_size.macos_14,
      var.fleet_size.macos_15
    ])
    load_balancer_dns   = macstadium_load_balancer.runner_lb.dns_name
    autoscaling_enabled = var.autoscaling_enabled
    backup_enabled      = var.backup_config.enable_snapshots
    monitoring_enabled  = var.monitoring_config.enable_cloudwatch
    deployment_time     = timestamp()
    status              = "deployed"
  }
}
266
.buildkite/macos-runners/terraform/user-data.sh
Normal file
@@ -0,0 +1,266 @@
#!/bin/bash
# User data script for macOS VM initialization
# This script runs when the VM starts up

set -euo pipefail

# Variables passed from Terraform. Terraform fills in these template placeholders
# when it renders this file, so shell variables elsewhere use the plain $VAR form
# (a brace-wrapped form would be treated as a Terraform interpolation).
BUILDKITE_AGENT_TOKEN="${buildkite_agent_token}"
GITHUB_TOKEN="${github_token}"
MACOS_VERSION="${macos_version}"
VM_NAME="${vm_name}"

# Logging
LOG_FILE="/var/log/vm-init.log"
exec 1> >(tee -a "$LOG_FILE")
exec 2> >(tee -a "$LOG_FILE" >&2)

print() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}

print "Starting VM initialization for $VM_NAME (macOS $MACOS_VERSION)"

# Wait until the network is reachable
print "Waiting for system to be ready..."
until ping -c1 google.com &>/dev/null; do
  sleep 10
done

# Set timezone
print "Setting timezone to UTC..."
sudo systemsetup -settimezone UTC

# Configure hostname
print "Setting hostname to $VM_NAME..."
sudo scutil --set HostName "$VM_NAME"
sudo scutil --set LocalHostName "$VM_NAME"
sudo scutil --set ComputerName "$VM_NAME"

# Install pending system updates (failures are ignored)
print "Checking for system updates..."
sudo softwareupdate -i -a --no-scan || true

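The readiness loop above retries forever, so a VM that never gets network access neither finishes booting nor reports a failure. A minimal sketch of a bounded alternative, assuming a fixed attempt budget is acceptable; `wait_until` is a hypothetical helper, not part of the runner scripts.

```bash
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempt budget runs out.
# wait_until is a hypothetical helper (assumption), not part of user-data.sh.
set -euo pipefail

wait_until() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (not run here): allow up to 30 x 10 s = 5 minutes for the network.
# wait_until 30 10 ping -c1 google.com || { echo "network never came up" >&2; exit 1; }
```

Failing loudly after a timeout lets the orchestration layer replace the VM instead of leaving it stuck in initialization.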
# Configure Buildkite agent
print "Configuring Buildkite agent..."
mkdir -p /usr/local/var/buildkite-agent
mkdir -p /usr/local/var/log/buildkite-agent

# Create Buildkite agent configuration
# (unquoted EOF: the variables and $(uname -m) are expanded when the file is written)
cat > /usr/local/var/buildkite-agent/buildkite-agent.cfg << EOF
token="$BUILDKITE_AGENT_TOKEN"
name="$VM_NAME"
tags="queue=macos,os=macos,arch=$(uname -m),version=$MACOS_VERSION,hostname=$VM_NAME"
build-path="/Users/buildkite/workspace"
hooks-path="/usr/local/bin/bun-ci/hooks"
plugins-path="/Users/buildkite/.buildkite-agent/plugins"
git-clean-flags="-fdq"
git-clone-flags="-v"
shell="/bin/bash -l"
spawn=1
priority=normal
disconnect-after-job=false
disconnect-after-idle-timeout=0
cancel-grace-period=10
enable-job-log-tmpfile=true
timestamp-lines=true
EOF

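Because that heredoc uses an unquoted `EOF`, its values are expanded at write time, so it can be worth spot-checking what actually landed in `buildkite-agent.cfg`. A minimal sketch of a reader for its `key="value"` lines; `cfg_get` is a hypothetical helper, and it assumes one setting per line with no spaces around `=`, which matches the file written here.

```bash
#!/usr/bin/env bash
# Read one key from a key="value" style config, stripping optional quotes.
# cfg_get is a hypothetical helper (assumption) for checking the rendered file.
set -euo pipefail

cfg_get() {
  local file=$1 key=$2
  # Match "key=" at the start of a line; print everything after the first "=".
  awk -v k="$key" 'index($0, k "=") == 1 {
    v = substr($0, length(k) + 2)
    gsub(/^"|"$/, "", v)
    print v
    exit
  }' "$file"
}

# Usage: cfg_get /usr/local/var/buildkite-agent/buildkite-agent.cfg tags
```

Matching on `key=` rather than a bare prefix keeps a lookup for `name` from accidentally returning a longer key that starts with the same letters.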
# Set up GitHub token for private repositories
print "Configuring GitHub access..."
if [[ -n "$GITHUB_TOKEN" ]]; then
  # Rewrite both HTTPS and SSH GitHub URLs to token-authenticated HTTPS.
  # The second rule needs --add; a plain "git config" would overwrite the first.
  git config --global url."https://oauth2:$GITHUB_TOKEN@github.com/".insteadOf "https://github.com/"
  git config --global --add url."https://oauth2:$GITHUB_TOKEN@github.com/".insteadOf "git@github.com:"

  # Configure npm to use the token
  npm config set @oven-sh:registry https://npm.pkg.github.com/
  echo "//npm.pkg.github.com/:_authToken=$GITHUB_TOKEN" >> ~/.npmrc
fi

# Set up SSH keys for GitHub (if available)
if [[ -f "/usr/local/etc/ssh/github_rsa" ]]; then
  print "Configuring SSH keys for GitHub..."
  mkdir -p ~/.ssh
  cp /usr/local/etc/ssh/github_rsa ~/.ssh/
  cp /usr/local/etc/ssh/github_rsa.pub ~/.ssh/
  chmod 600 ~/.ssh/github_rsa
  chmod 644 ~/.ssh/github_rsa.pub

  # Configure SSH to use the key
  cat > ~/.ssh/config << EOF
Host github.com
  HostName github.com
  User git
  IdentityFile ~/.ssh/github_rsa
  StrictHostKeyChecking no
EOF
fi

# Create health check endpoint
print "Setting up health check endpoint..."
cat > /usr/local/bin/health-check.sh << 'EOF'
#!/bin/bash
# Health check script for load balancer

set -euo pipefail

# Check if system is ready
if ! ping -c1 google.com &>/dev/null; then
  echo "Network not ready"
  exit 1
fi

# Check disk space
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
if [[ $DISK_USAGE -gt 95 ]]; then
  echo "Disk usage too high: $DISK_USAGE%"
  exit 1
fi

# Check memory (fail if less than 5% is free)
MEMORY_FREE=$(memory_pressure | grep "System-wide memory free percentage" | awk '{print $5}' | sed 's/%//')
if [[ $MEMORY_FREE -lt 5 ]]; then
  echo "Memory pressure too high: $MEMORY_FREE% free"
  exit 1
fi

# Check if required services are running
if ! pgrep -f "job-runner.sh" > /dev/null; then
  echo "Job runner not running"
  exit 1
fi

echo "OK"
exit 0
EOF

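chmod +x /usr/local/bin/health-check.sh

# Start simple HTTP server for health checks
print "Starting health check server..."
cat > /usr/local/bin/health-server.sh << 'EOF'
#!/bin/bash
# Minimal HTTP server for health checks. Each request gets the output of
# health-check.sh, with status 503 when the check fails so the load balancer
# can tell healthy and unhealthy VMs apart. The check runs before nc starts
# listening, so a response reflects the state just after the previous request.

PORT=8080

while true; do
  if BODY=$(/usr/local/bin/health-check.sh 2>&1); then
    STATUS="200 OK"
  else
    STATUS="503 Service Unavailable"
  fi
  # BSD nc on macOS takes the listen port as a positional argument, not via -p.
  printf 'HTTP/1.1 %s\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\n%s\n' "$STATUS" "$BODY" | nc -l "$PORT"
done
EOF

chmod +x /usr/local/bin/health-server.sh
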
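The useful property of the health server is that the HTTP status line follows the check's exit code, since load balancers typically judge health by status rather than by body text. A minimal sketch of that response-building step as a testable function; `http_health_response` is a hypothetical name, and the check command is passed in as arguments so it can be exercised without `health-check.sh`.

```bash
#!/usr/bin/env bash
# Build the raw HTTP response a health server would send, choosing the status
# line from the check command's exit code. http_health_response is hypothetical.
set -euo pipefail

http_health_response() {
  local body status
  # Declared separately so the assignment's exit status is the check's.
  if body=$("$@" 2>&1); then
    status="200 OK"
  else
    status="503 Service Unavailable"
  fi
  printf 'HTTP/1.1 %s\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\n%s\n' "$status" "$body"
}

# Usage (one request per loop iteration):
# http_health_response /usr/local/bin/health-check.sh | nc -l 8080
```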
# Create LaunchDaemon for health check server
cat > /Library/LaunchDaemons/com.bun.health-server.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.bun.health-server</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/health-server.sh</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/var/log/health-server.log</string>
  <key>StandardErrorPath</key>
  <string>/var/log/health-server.error.log</string>
</dict>
</plist>
EOF

# Load and start the health check server
sudo launchctl load /Library/LaunchDaemons/com.bun.health-server.plist
sudo launchctl start com.bun.health-server

# Configure log rotation
print "Configuring log rotation..."
cat > /etc/newsyslog.d/bun-ci.conf << 'EOF'
# Log rotation for Bun CI
# logfilename                              mode count size(KB) when flags
/usr/local/var/log/buildkite-agent/*.log   644  5     1000     *    GZ
/var/log/vm-init.log                       644  5     1000     *    GZ
/var/log/health-server.log                 644  5     1000     *    GZ
/var/log/health-server.error.log           644  5     1000     *    GZ
EOF

# No daemon restart is needed: launchd runs newsyslog periodically and each run
# rereads /etc/newsyslog.d, so these rules apply on the next rotation pass.

# Configure system monitoring
print "Setting up system monitoring..."
cat > /usr/local/bin/system-monitor.sh << 'EOF'
#!/bin/bash
# System monitoring script: appends a resource snapshot every 5 minutes

LOG_FILE="/var/log/system-monitor.log"

while true; do
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] System Stats:" >> "$LOG_FILE"
  echo "  CPU (user %): $(top -l 1 -n 0 | grep "CPU usage" | awk '{print $3}' | sed 's/%//')" >> "$LOG_FILE"
  echo "  Memory free: $(memory_pressure | grep "System-wide memory free percentage" | awk '{print $5}')" >> "$LOG_FILE"
  echo "  Disk: $(df -h / | awk 'NR==2 {print $5}')" >> "$LOG_FILE"
  echo "  Load: $(uptime | awk -F'load averages:' '{print $2}')" >> "$LOG_FILE"
  echo "  Processes: $(ps aux | wc -l)" >> "$LOG_FILE"
  echo "" >> "$LOG_FILE"

  sleep 300 # 5 minutes
done
EOF

chmod +x /usr/local/bin/system-monitor.sh

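The health check, the monitor, and the final status log all pull the free-memory figure out of `memory_pressure` with a `grep | awk | sed` chain. A single `awk` program can do the same job; this sketch assumes the summary line reads `System-wide memory free percentage: NN%`, which is the pattern the scripts above already match. `free_memory_percent` is a hypothetical helper that reads stdin, so it can be tested with captured text on any OS.

```bash
#!/usr/bin/env bash
# Extract the free-memory percentage from memory_pressure-style output on stdin.
# free_memory_percent is a hypothetical helper (assumption).
set -euo pipefail

free_memory_percent() {
  # Field 5 of "System-wide memory free percentage: 63%" is "63%".
  awk '/System-wide memory free percentage/ { sub(/%$/, "", $5); print $5; exit }'
}

# On a runner: memory_pressure | free_memory_percent
```

It prints nothing when the line is absent, so callers should treat an empty result as "unknown" rather than as 0%.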
# Create LaunchDaemon for system monitoring
cat > /Library/LaunchDaemons/com.bun.system-monitor.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.bun.system-monitor</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/system-monitor.sh</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
</dict>
</plist>
EOF

# Load and start the system monitor
sudo launchctl load /Library/LaunchDaemons/com.bun.system-monitor.plist
sudo launchctl start com.bun.system-monitor

# Final configuration
print "Performing final configuration..."

# Start the Buildkite agent (this script does not create its LaunchDaemon plist;
# it is assumed to be present in the base image)
sudo launchctl load /Library/LaunchDaemons/com.buildkite.buildkite-agent.plist
sudo launchctl start com.buildkite.buildkite-agent

# Create marker file to indicate initialization is complete
touch /var/tmp/vm-init-complete
echo "$(date '+%Y-%m-%d %H:%M:%S'): VM initialization completed" >> /var/tmp/vm-init-complete

print "VM initialization completed successfully!"
print "VM Name: $VM_NAME"
print "macOS Version: $MACOS_VERSION"
print "Status: Ready for Buildkite jobs"

# Log final system state
print "Final system state:"
print "  Hostname: $(hostname)"
print "  Uptime: $(uptime)"
print "  Disk usage: $(df -h / | awk 'NR==2 {print $5}')"
print "  Memory: $(memory_pressure | grep "System-wide memory free percentage" | awk '{print $5}')"

print "Health check available at: http://$(hostname):8080/health"
302
.buildkite/macos-runners/terraform/variables.tf
Normal file
@@ -0,0 +1,302 @@
# Core infrastructure variables
variable "project_name" {
  description = "Name of the project"
  type        = string
  default     = "bun-ci"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "production"
}

variable "region" {
  description = "MacStadium region"
  type        = string
  default     = "us-west-1"
}

# MacStadium configuration
variable "macstadium_api_key" {
  description = "MacStadium API key"
  type        = string
  sensitive   = true
}

variable "macstadium_endpoint" {
  description = "MacStadium API endpoint"
  type        = string
  default     = "https://api.macstadium.com"
}

# Buildkite configuration
variable "buildkite_agent_token" {
  description = "Buildkite agent token"
  type        = string
  sensitive   = true
}

variable "buildkite_org" {
  description = "Buildkite organization slug"
  type        = string
  default     = "bun"
}

variable "buildkite_queues" {
  description = "Buildkite queues to register agents with"
  type        = list(string)
  default     = ["macos", "macos-arm64", "macos-x86_64"]
}

# GitHub configuration
variable "github_token" {
  description = "GitHub token for accessing private repositories"
  type        = string
  sensitive   = true
}

variable "github_org" {
  description = "GitHub organization"
  type        = string
  default     = "oven-sh"
}

# VM fleet configuration
variable "fleet_size" {
  description = "Number of VMs per macOS version"
  type = object({
    macos_13 = number
    macos_14 = number
    macos_15 = number
  })
  default = {
    macos_13 = 4
    macos_14 = 6
    macos_15 = 8
  }

  validation {
    condition = alltrue([
      var.fleet_size.macos_13 >= 0,
      var.fleet_size.macos_14 >= 0,
      var.fleet_size.macos_15 >= 0,
      var.fleet_size.macos_13 + var.fleet_size.macos_14 + var.fleet_size.macos_15 > 0
    ])
    error_message = "Fleet sizes must be non-negative and at least one version must have VMs."
  }
}

variable "vm_configuration" {
  description = "VM configuration settings"
  type = object({
    cpu_count = number
    memory_gb = number
    disk_size = number # GB
  })
  default = {
    cpu_count = 12
    memory_gb = 32
    disk_size = 500
  }

  validation {
    condition = alltrue([
      var.vm_configuration.cpu_count >= 4,
      var.vm_configuration.memory_gb >= 16,
      var.vm_configuration.disk_size >= 100
    ])
    error_message = "VM configuration must have at least 4 CPUs, 16GB memory, and 100GB disk."
  }
}

# Auto-scaling configuration
variable "autoscaling_enabled" {
  description = "Enable auto-scaling for the VM fleet"
  type        = bool
  default     = true
}

variable "autoscaling_config" {
  description = "Auto-scaling configuration"
  type = object({
    min_size              = number
    max_size              = number
    desired_capacity      = number
    scale_up_threshold    = number
    scale_down_threshold  = number
    scale_up_adjustment   = number
    scale_down_adjustment = number
    cooldown_period       = number
  })
  default = {
    min_size              = 2
    max_size              = 30
    desired_capacity      = 10
    scale_up_threshold    = 80
    scale_down_threshold  = 20
    scale_up_adjustment   = 2
    scale_down_adjustment = 1
    cooldown_period       = 300
  }
}

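The variables do not say which metric `scale_up_threshold` and `scale_down_threshold` are compared against. Assuming a utilization percentage, a step of `scale_up_adjustment` at or above the upper threshold, a step of `scale_down_adjustment` at or below the lower one, and clamping to `min_size`/`max_size`, one evaluation with the defaults looks like the sketch below. `next_capacity` is hypothetical, and cooldown handling is left out.

```bash
#!/usr/bin/env bash
# Illustrative scaling decision using the autoscaling_config defaults.
# Assumes thresholds are a utilization percentage (the variables do not say);
# cooldown_period is ignored. next_capacity is a hypothetical helper.
set -euo pipefail

next_capacity() {
  local current=$1 utilization=$2
  local min=2 max=30 up_at=80 down_at=20 up_by=2 down_by=1
  local next=$current
  if ((utilization >= up_at)); then
    next=$((current + up_by))
  elif ((utilization <= down_at)); then
    next=$((current - down_by))
  fi
  # Clamp to [min_size, max_size].
  if ((next > max)); then next=$max; fi
  if ((next < min)); then next=$min; fi
  echo "$next"
}
```

Scaling up by 2 but down by 1 makes the fleet grow quickly under load and shrink gradually, which limits thrashing between the two thresholds.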
# Image configuration
variable "image_name_prefix" {
  description = "Prefix for VM image names"
  type        = string
  default     = "bun-macos"
}

variable "image_rebuild_schedule" {
  description = "Cron schedule for rebuilding images"
  type        = string
  default     = "0 2 * * *" # Daily at 2 AM
}

variable "image_retention_days" {
  description = "Number of days to retain old images"
  type        = number
  default     = 7
}

# Network configuration
variable "network_config" {
  description = "Network configuration"
  type = object({
    cidr_block    = string
    enable_nat    = bool
    enable_vpn    = bool
    allowed_cidrs = list(string)
  })
  default = {
    cidr_block    = "10.0.0.0/16"
    enable_nat    = true
    enable_vpn    = false
    allowed_cidrs = ["0.0.0.0/0"]
  }
}

# Security configuration
variable "security_config" {
  description = "Security configuration"
  type = object({
    enable_ssh_access      = bool
    enable_vnc_access      = bool
    ssh_allowed_cidrs      = list(string)
    vnc_allowed_cidrs      = list(string)
    enable_disk_encryption = bool
  })
  default = {
    enable_ssh_access      = true
    enable_vnc_access      = true
    ssh_allowed_cidrs      = ["0.0.0.0/0"]
    vnc_allowed_cidrs      = ["10.0.0.0/16"]
    enable_disk_encryption = true
  }
}

# Monitoring configuration
variable "monitoring_config" {
  description = "Monitoring configuration"
  type = object({
    enable_cloudwatch     = bool
    enable_custom_metrics = bool
    log_retention_days    = number
    alert_email           = string
  })
  default = {
    enable_cloudwatch     = true
    enable_custom_metrics = true
    log_retention_days    = 30
    alert_email           = "devops@oven.sh"
  }
}

# Backup configuration
variable "backup_config" {
  description = "Backup configuration"
  type = object({
    enable_snapshots    = bool
    snapshot_schedule   = string
    snapshot_retention  = number
    enable_cross_region = bool
  })
  default = {
    enable_snapshots    = true
    snapshot_schedule   = "0 4 * * *" # Daily at 4 AM
    snapshot_retention  = 7
    enable_cross_region = false
  }
}

# Cost optimization
variable "cost_optimization" {
  description = "Cost optimization settings"
  type = object({
    enable_spot_instances = bool
    spot_price_max        = number
    enable_hibernation    = bool
    idle_shutdown_timeout = number
  })
  default = {
    enable_spot_instances = false
    spot_price_max        = 0.0
    enable_hibernation    = false
    idle_shutdown_timeout = 3600 # 1 hour
  }
}

# Maintenance configuration
variable "maintenance_config" {
  description = "Maintenance configuration"
  type = object({
    maintenance_window_start = string
    maintenance_window_end   = string
    auto_update_enabled      = bool
    patch_schedule           = string
  })
  default = {
    maintenance_window_start = "02:00"
    maintenance_window_end   = "06:00"
    auto_update_enabled      = true
    patch_schedule           = "0 3 * * 0" # Weekly on Sunday at 3 AM
  }
}

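The maintenance window is given as two `HH:MM` strings, and nothing here shows how they are consumed. A minimal sketch of a window check, assuming the times are UTC (user-data.sh sets the VMs to UTC) and that the end time is exclusive; `in_maintenance_window` is hypothetical. Zero-padded 24-hour strings sort in the same order as the times they represent, so string comparison is enough, and windows that cross midnight are handled as well.

```bash
#!/usr/bin/env bash
# Decide whether an HH:MM time falls inside [start, end), defaulting to the
# maintenance_config window. in_maintenance_window is a hypothetical helper.
set -euo pipefail

in_maintenance_window() {
  local now=$1 start=${2:-02:00} end=${3:-06:00}
  if [[ "$start" < "$end" ]]; then
    [[ ! "$now" < "$start" && "$now" < "$end" ]]
  else
    # Window wraps past midnight, e.g. 22:00-02:00.
    [[ ! "$now" < "$start" || "$now" < "$end" ]]
  fi
}

# Usage: in_maintenance_window "$(date -u +%H:%M)" && echo "inside window"
```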
# Tagging
variable "tags" {
  description = "Additional tags to apply to resources"
  type        = map(string)
  default     = {}
}

# SSH key configuration
variable "ssh_key_name" {
  description = "Name of the SSH key pair"
  type        = string
  default     = "bun-runner-key"
}

variable "ssh_public_key_path" {
  description = "Path to the SSH public key file"
  type        = string
  default     = "~/.ssh/id_rsa.pub"
}

# Feature flags
variable "feature_flags" {
  description = "Feature flags for experimental features"
  type = object({
    enable_gpu_passthrough = bool
    enable_nested_virt     = bool
    enable_secure_boot     = bool
    enable_tpm             = bool
  })
  default = {
    enable_gpu_passthrough = true
    enable_nested_virt     = false
    enable_secure_boot     = false
    enable_tpm             = false
  }
}