Reruns & flakiness
Reduce CI timeouts and stop paying for hung jobs
By Keith Mazanec, Founder, CostOps · Updated January 31, 2026
A test process deadlocks. A Docker build stalls on a network call. A browser test hangs waiting for an element that never appears. GitHub Actions keeps the runner alive for the full default timeout of 360 minutes before killing the job. On a macOS runner, that single hung job costs $22.32. On a standard Linux runner, it quietly eats $2.16. Either way, it produces zero useful output. Setting explicit timeouts is the simplest, highest-ROI change you can make to protect your CI budget.
Symptoms
How to tell if CI timeouts are costing you money
Timeouts are easy to miss because they look like slow builds until you check the conclusion. They are a specific type of CI failure that is disproportionately expensive. Look for these patterns:
- Jobs that run for exactly 360 minutes. GitHub’s default job timeout is 6 hours. If you see jobs hitting exactly that limit, they didn’t finish. They were killed. The runner billed for every one of those minutes.
- Timed-out runs followed by successful reruns. A run times out, a developer re-triggers it, and the second attempt passes in 12 minutes. The first run wasted 360 minutes of budget. This pattern of timing out then passing on retry points to flaky infrastructure, not broken code. You can track this pattern by targeting rerun hotspots across your workflows.
- Timeout rate above 2%. Even a small timeout rate adds up fast. If 2% of your runs hit the default 6-hour timeout, those runs consume a disproportionate share of your minutes budget because each one bills 20–30x more than a normal run.
- Budget alerts triggered by a single workflow. One hung job on a macOS runner can burn through your entire monthly free-tier allocation. GitHub Free includes 2,000 minutes; one macOS timeout at the 10x multiplier consumes 3,600 weighted minutes, which is more than the full monthly quota.
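To make the "disproportionate share" point concrete, here is a minimal shell sketch that estimates what percentage of total runner minutes goes to timed-out runs. The function name `timeout_share` is hypothetical, and the 12-minute normal duration is an assumption borrowed from the rerun example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of all runner minutes consumed by timed-out runs.
# Usage: timeout_share <timeout_rate_percent> <normal_minutes> <timeout_minutes>
timeout_share() {
  awk -v rate="$1" -v normal="$2" -v cap="$3" 'BEGIN {
    hung = (rate / 100) * cap          # expected minutes per run spent in hung jobs
    ok   = (1 - rate / 100) * normal   # expected minutes per run spent in normal jobs
    printf "%.1f\n", 100 * hung / (hung + ok)
  }'
}

# 2% of runs hit the 360-minute default; the rest finish in 12 minutes.
timeout_share 2 12 360   # prints 38.0
```

Under these assumptions, 2% of runs account for roughly 38% of billed minutes.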
Metrics
Quantify the waste
Consider a team running 40 CI jobs per day on Linux with a 3% timeout rate, which works out to 1.2 hung jobs per day. Normal jobs complete in 12 minutes. Without explicit timeouts, hung jobs run for the full 360-minute default:
- Without explicit timeouts: 1.2 jobs × 360 min × 22 days × $0.006/min = $57.02/mo
- With timeout-minutes: 30: 1.2 jobs × 30 min × 22 days × $0.006/min = $4.75/mo
- Savings: $52/mo · $624/year · per workflow
That’s Linux at $0.006/min. On macOS at $0.062/min, the same scenario wastes about $589/mo without timeouts vs. $49/mo with a 30-minute cap, saving about $540/mo from a one-line YAML change. And that does not account for the additional minutes wasted when developers re-trigger the workflow after a timeout.
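The arithmetic above can be reproduced with a short shell sketch so you can plug in your own numbers. The function name `hung_job_cost` and its argument order are hypothetical; the inputs are the Linux figures from this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: monthly cost of hung jobs that run until the timeout fires.
# Usage: hung_job_cost <jobs_per_day> <timeout_rate_percent> <timeout_minutes> <work_days> <rate_per_minute>
hung_job_cost() {
  awk -v jobs="$1" -v rate="$2" -v cap="$3" -v days="$4" -v price="$5" 'BEGIN {
    printf "%.2f\n", jobs * (rate / 100) * cap * days * price
  }'
}

hung_job_cost 40 3 360 22 0.006   # default 360-minute timeout, prints 57.02
hung_job_cost 40 3 30 22 0.006    # timeout-minutes: 30, prints 4.75
```

The difference between the two results is the monthly saving per workflow. Swap in your own runner rate to model other runner types.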
Fix 1
Set explicit timeout-minutes on every job
GitHub Actions defaults to a 360-minute (6-hour) timeout for every job. That default exists to avoid killing legitimate long-running builds, but it means a hung process burns runner minutes silently for hours. The fix is to add timeout-minutes to every job in your workflow.
A good starting point: set timeout-minutes to roughly 3x the job’s average duration. If a job normally runs in 8 minutes, set the timeout to 25. This gives enough headroom for slow runs without letting a hung process run for 6 hours.
Before, with no timeout set:

    jobs:
      test:
        runs-on: ubuntu-latest
        # No timeout-minutes set
        # Default: 360 minutes (6 hours)
        steps:
          - uses: actions/checkout@v4
          - run: npm test

After, with an explicit cap:

    jobs:
      test:
        runs-on: ubuntu-latest
        timeout-minutes: 20
        steps:
          - uses: actions/checkout@v4
          - run: npm test

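If you want to apply the 3x rule of thumb consistently, a small helper can do the rounding for you. This is only a sketch: the function name `suggest_timeout` is hypothetical, and rounding up to the next multiple of 5 is an assumption chosen to match the "8 minutes becomes 25" example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest timeout-minutes as 3x the average duration,
# rounded up to the next multiple of 5 (the rounding step is an assumption).
# Usage: suggest_timeout <average_minutes>   (whole minutes only)
suggest_timeout() {
  local avg_minutes=$1
  echo $(( (avg_minutes * 3 + 4) / 5 * 5 ))
}

suggest_timeout 8    # prints 25
suggest_timeout 12   # prints 40
```

Treat the result as a starting point and tighten it once you know how much a job's duration varies.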
One caveat: there is no workflow-level or organization-level default for timeout-minutes. You must set it on each job individually. This is a known gap in GitHub Actions. Tools like ghatm can scan your workflows and add timeout-minutes to every job automatically, using historical run data to set appropriate values.
Fix 2
Add step-level timeouts for network calls and long steps
Job-level timeouts cap total damage, but step-level timeouts let you fail fast on specific operations. An npm ci that normally takes 2 minutes doesn’t need 20 minutes of runway. If it hangs for 5 minutes, something is wrong with the registry or network, and waiting longer won’t fix it.
Add timeout-minutes to individual steps, especially dependency installs, Docker builds, and any step that calls external services.

    jobs:
      test:
        runs-on: ubuntu-latest
        timeout-minutes: 25 # Job-level cap
        steps:
          - uses: actions/checkout@v4
          - name: Install dependencies
            run: npm ci
            timeout-minutes: 5 # Normally ~2 min
          - name: Run tests
            run: npm test
            timeout-minutes: 15 # Normally ~8 min
          - name: Upload coverage
            run: ./upload-coverage.sh
            timeout-minutes: 3 # External API call

Step-level timeouts operate independently of the job-level timeout. If a step exceeds its own limit, it fails immediately without waiting for the job-level cap. The remaining steps in the job are skipped, and the job is marked as failed. This means the job-level timeout acts as a backstop while step-level timeouts provide precise control. Combining timeouts with efforts to stabilize CI runtime lets you set tighter limits without hitting false positives.
Fix 3
Use shell-level timeouts for external dependencies
Some operations hang at the process level, such as a curl to an unresponsive API, a database migration waiting on a lock, or a docker pull stalled on a slow registry. These won’t trigger a step-level timeout until the step’s full allocation expires. Use the timeout command (available on all GitHub-hosted Linux runners) to kill individual processes before they exhaust the step budget.

    - name: Wait for service readiness
      run: timeout 120 ./wait-for-service.sh
      timeout-minutes: 5
    - name: Download model weights
      run: timeout 300 curl -fsSL -o model.bin https://models.example.com/v2/weights
      timeout-minutes: 8
    - name: Run database migration
      run: timeout 60 bin/rails db:migrate
      timeout-minutes: 3

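The timeout command takes a duration in seconds by default (not minutes); suffixes such as m and h are also accepted. When the limit expires, it sends SIGTERM, which gives the process a chance to clean up. It escalates to SIGKILL only if you pass -k (--kill-after) with a grace period, for example timeout -k 10 120 ./wait-for-service.sh. On macOS runners, use gtimeout from coreutils or a step-level timeout-minutes instead.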
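As an illustration, the sketch below wraps GNU timeout in a small function that reports when a deadline was hit. The wrapper name `run_with_deadline` and the 5-second grace period are hypothetical choices; the exit status 124 on timeout and the -k option are standard GNU timeout behavior. It assumes either `timeout` or `gtimeout` is on the PATH.

```shell
#!/usr/bin/env bash
# Use `timeout` where available, otherwise fall back to `gtimeout`.
TIMEOUT_BIN=$(command -v timeout || command -v gtimeout)

# Hypothetical wrapper: run a command with a deadline given in seconds.
# SIGTERM is sent at the deadline; -k 5 sends SIGKILL 5 seconds later if the
# process is still running (in that case the exit status is 137, not 124).
run_with_deadline() {
  local seconds=$1
  shift
  "$TIMEOUT_BIN" -k 5 "$seconds" "$@"
  local status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${seconds}s: $*" >&2
  fi
  return "$status"
}

# sleep exits on SIGTERM, so this reports a timeout and status 124.
run_with_deadline 1 sleep 10 || echo "exit status: $?"
```

Inside a workflow step, a non-zero status from the wrapper fails the step, so the hang surfaces as an ordinary failure well before the step-level cap.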
Fix 4
Enforce timeouts across all workflows with tooling
Setting timeouts manually is error-prone. New workflows get added without them. Existing workflows accumulate jobs without anyone checking. Use linting and automation to enforce the policy across your organization.
ghatm is a CLI tool that scans all workflow files and adds timeout-minutes: 30 to any job missing it. With the -auto flag and a GitHub token, it queries your repository’s run history via the API and sets timeouts based on actual historical durations.

    # Add timeout-minutes: 30 to all jobs missing it
    ghatm set

    # Auto-set based on historical run durations (requires GITHUB_TOKEN)
    ghatm set -auto

To prevent regressions, add ghalint to your CI pipeline. It enforces the job_timeout_minutes_is_required policy, failing the build if any job is missing an explicit timeout. Run it as a linting step so new workflows can’t be merged without timeouts:

    - name: Lint GitHub Actions workflows
      run: ghalint run
      env:
        GHALINT_LOG_COLOR: always

Reference
Complete workflow with layered timeouts
Here are all three layers of timeout protection applied to a typical CI workflow. Job-level timeouts cap total damage, step-level timeouts catch specific hangs early, and shell-level timeouts protect individual process calls.

    name: CI
    on:
      push:
        branches: [main]
      pull_request:
    concurrency:
      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
      cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
    jobs:
      lint:
        runs-on: ubuntu-latest
        timeout-minutes: 10 # Job cap
        steps:
          - uses: actions/checkout@v4
          - name: Install
            run: npm ci
            timeout-minutes: 5 # Step cap
          - name: Lint
            run: npm run lint
            timeout-minutes: 5
      test:
        runs-on: ubuntu-latest
        timeout-minutes: 25
        services:
          postgres:
            image: postgres:16
            env:
              POSTGRES_PASSWORD: postgres # Placeholder; the image will not start without a password
            ports:
              - 5432:5432 # Expose the service to steps running on the runner
            options: >-
              --health-cmd pg_isready
              --health-interval 10s
              --health-timeout 5s
              --health-retries 5
        steps:
          - uses: actions/checkout@v4
          - name: Install
            run: npm ci
            timeout-minutes: 5
          - name: Wait for Postgres
            run: timeout 30 bash -c 'until pg_isready -h localhost -p 5432; do sleep 1; done'
            timeout-minutes: 2 # Shell + step cap
          - name: Run tests
            run: npm test
            timeout-minutes: 15
      build:
        needs: [lint, test]
        runs-on: ubuntu-latest
        timeout-minutes: 15
        steps:
          - uses: actions/checkout@v4
          - name: Build
            run: npm run build
            timeout-minutes: 10

Reference
Cost of a single 6-hour timeout by runner type
This table shows what one hung job costs if it runs for the full default 360 minutes. Use these numbers to estimate the damage a missing timeout-minutes can cause.
| Runner | Rate | 6-hour cost |
|---|---|---|
| Linux 2-core | $0.006/min | $2.16 |
| Windows 2-core | $0.010/min | $3.60 |
| Linux 8-core | $0.022/min | $7.92 |
| macOS (M1/Intel) | $0.062/min | $22.32 |
| macOS M2 Pro | $0.102/min | $36.72 |
Free-tier included minutes: 2,000/mo (Free), 3,000/mo (Team/Pro), 50,000/mo (Enterprise). A single macOS timeout at the 10x billing multiplier consumes 3,600 weighted free-tier minutes, which exceeds the entire Free plan monthly quota.
Related guides
Target Rerun Hotspots
Find the 1-3 workflows responsible for most of your rerun spend.
Reduce CI Failures
Separate infrastructure failures from test failures and cut wasted minutes.
Stabilize CI Runtime
Fix cache misses and queue spikes that cause unpredictable pipeline durations.
Cancelled Runs Wasting Minutes
Stop paying for runs that are already obsolete with concurrency groups.