Reruns & flakiness

Reduce CI timeouts and stop paying for hung jobs

By Keith Mazanec, Founder, CostOps · Updated January 31, 2026

A test process deadlocks. A Docker build stalls on a network call. A browser test hangs waiting for an element that never appears. GitHub Actions keeps the runner alive for the full default timeout of 360 minutes before killing the job. On a macOS runner, that single hung job costs $22.32. On a standard Linux runner, it quietly eats $2.16. Either way, it produces zero useful output. Setting explicit timeouts is the simplest, highest-ROI change you can make to protect your CI budget.

Symptoms

How to tell if CI timeouts are costing you money

Timeouts are easy to miss because they look like slow builds until you check the conclusion. They are a specific type of CI failure that is disproportionately expensive. Look for these patterns:

Jobs that run for exactly 360 minutes. GitHub’s default job timeout is 6 hours. If you see jobs hitting exactly that limit, they didn’t finish. They were killed. The runner billed for every one of those minutes.
Timed-out runs followed by successful reruns. A run times out, a developer re-triggers it, and the second attempt passes in 12 minutes. The first run wasted 360 minutes of budget. Timing out and then passing on retry points to flakiness, in the infrastructure or in the tests themselves, rather than broken code. You can track this pattern by targeting rerun hotspots across your workflows.
Timeout rate above 2%. Even a small timeout rate adds up fast. Each timed-out run bills 360 minutes, 20–30x more than a typical 12- to 18-minute run. If 2% of your runs hit the default 6-hour timeout and the rest take 12 minutes, the timed-out runs account for roughly 38% of all billed minutes.
Budget alerts triggered by a single workflow. One hung job on a macOS runner can burn through your entire monthly free-tier allocation. GitHub Free includes 2,000 minutes; one macOS timeout at the 10x multiplier consumes 3,600 weighted minutes, which is more than the full monthly quota.
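To check your own history for the first and third patterns, the sketch below flags jobs whose wall-clock duration lands within a couple of minutes of the 360-minute default, then reports the timeout rate and the share of minutes those jobs consumed. It assumes job records with `name`, `started_at`, and `completed_at` fields in the shape GitHub's REST API returns for workflow jobs. The 2-minute slack and the function names are illustrative. Wall-clock time can also differ slightly from billed minutes, which are rounded up per job.

```python
from datetime import datetime

DEFAULT_TIMEOUT_MIN = 360  # GitHub Actions' default job timeout


def duration_minutes(job: dict) -> float:
    """Wall-clock minutes between a job's started_at and completed_at timestamps."""
    # Assumes ISO 8601 strings with a trailing "Z", e.g. "2026-01-05T10:00:00Z".
    start = datetime.fromisoformat(job["started_at"].replace("Z", "+00:00"))
    end = datetime.fromisoformat(job["completed_at"].replace("Z", "+00:00"))
    return (end - start).total_seconds() / 60


def timeout_report(jobs: list, limit_min: float = DEFAULT_TIMEOUT_MIN,
                   slack_min: float = 2.0) -> dict:
    """Flag jobs that ran to (within slack_min of) the timeout and size their cost."""
    durations = [duration_minutes(job) for job in jobs]
    hit_limit = [d >= limit_min - slack_min for d in durations]
    total = sum(durations)
    hung_minutes = sum(d for d, hit in zip(durations, hit_limit) if hit)
    return {
        "timed_out": [job["name"] for job, hit in zip(jobs, hit_limit) if hit],
        "timeout_rate": sum(hit_limit) / len(jobs) if jobs else 0.0,
        "share_of_minutes": hung_minutes / total if total else 0.0,
    }
```

Because it keys on duration rather than on the job's reported conclusion, the check still catches jobs that were killed at the limit no matter how their status was labeled.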

Metrics

Quantify the waste

Consider a team running 40 CI jobs per day on Linux with a 3% timeout rate. Normal jobs complete in 12 minutes, and the month has 22 working days. Without explicit timeouts, hung jobs run for the full 360-minute default:

Metric	Without explicit timeouts	With timeout-minutes: 30
Jobs/day	40	40
Timeout rate	3%	3%
Timed-out jobs/day	1.2	1.2
Wasted minutes/day	432	36
Monthly wasted cost	$57/mo	$4.75/mo

Without explicit timeouts: 1.2 jobs × 360 min × 22 days × $0.006/min ≈ $57/mo
With a 30-minute cap: 1.2 jobs × 30 min × 22 days × $0.006/min ≈ $4.75/mo

Save $52/mo · $624/year · per workflow

That’s Linux at $0.006/min. On macOS at $0.062/min, the same scenario wastes about $589/mo without timeouts vs. $49/mo with a 30-minute cap, saving roughly $540/mo from a one-line YAML change. These figures also exclude the additional minutes spent when developers re-trigger the workflow after a timeout.
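The arithmetic behind these figures fits in a few lines. This sketch plugs in the article's own inputs (40 jobs/day, a 3% timeout rate, 22 working days, and the per-minute rates quoted here) and assumes, as the worked example does, that every timed-out job runs all the way to its cap. `monthly_timeout_waste` is a hypothetical helper name, and rerun minutes are not modeled.

```python
WORKING_DAYS_PER_MONTH = 22  # same assumption as the worked example


def monthly_timeout_waste(jobs_per_day: float, timeout_rate: float,
                          minutes_per_timeout: float, rate_per_min: float,
                          days: int = WORKING_DAYS_PER_MONTH) -> float:
    """Dollars per month billed for jobs that run until they are killed."""
    timed_out_per_day = jobs_per_day * timeout_rate
    return timed_out_per_day * minutes_per_timeout * days * rate_per_min


for runner, rate in [("Linux 2-core", 0.006), ("macOS (M1/Intel)", 0.062)]:
    uncapped = monthly_timeout_waste(40, 0.03, 360, rate)  # 6-hour default
    capped = monthly_timeout_waste(40, 0.03, 30, rate)     # timeout-minutes: 30
    print(f"{runner}: ${uncapped:,.2f}/mo without a cap, "
          f"${capped:,.2f}/mo with one, saving ${uncapped - capped:,.2f}/mo")
```

Running it prints $57.02 vs. $4.75 per month for Linux and $589.25 vs. $49.10 for macOS, which match the figures above to within rounding. Swap in your own job volume, timeout rate, and runner rate to size your exposure.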

Fix 1

Set explicit timeout-minutes on every job

GitHub Actions defaults to a 360-minute (6-hour) timeout for every job. That default exists to avoid killing legitimate long-running builds, but it means a hung process burns runner minutes silently for hours. The fix is to add timeout-minutes to every job in your workflow.

A good starting point: set timeout-minutes to roughly 3x the job’s average duration. If a job normally runs in 8 minutes, set the timeout to 25 (3 × 8 = 24, rounded up). This gives enough headroom for slow runs without letting a hung process run for 6 hours.

Default: 6-hour timeout

jobs:
  test:
    runs-on: ubuntu-latest
    # No timeout-minutes set
    # Default: 360 minutes (6 hours)
    steps:
      - uses: actions/checkout@v4
      - run: npm test

Explicit: 20-minute cap

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4
      - run: npm test
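The 3x rule is easy to apply to your own history. A minimal sketch, assuming you already have the durations (in minutes) of recent successful runs of one job. `recommend_timeout`, the round-up-to-5 step, and the 5-minute floor are illustrative choices, not necessarily how ghatm's -auto mode computes its values. The 360-minute ceiling reflects the 6-hour limit on GitHub-hosted runners.

```python
import math
from statistics import mean

HOSTED_RUNNER_MAX_MIN = 360  # GitHub-hosted jobs are stopped after 6 hours regardless


def recommend_timeout(durations_min, multiplier=3.0, round_to=5, floor_min=5):
    """Suggest timeout-minutes as ~3x the average duration, rounded up to round_to.

    Feed it durations of successful runs only; hung runs would inflate the average.
    """
    if not durations_min:
        raise ValueError("need at least one historical duration")
    raw = multiplier * mean(durations_min)
    rounded = math.ceil(raw / round_to) * round_to
    return min(HOSTED_RUNNER_MAX_MIN, max(floor_min, rounded))
```

For a job averaging 8 minutes, `recommend_timeout([7, 8, 9])` returns 25, the value used in the example above.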

One caveat: there is no workflow-level or organization-level default for timeout-minutes. You must set it on each job individually. This is a known gap in GitHub Actions. Tools like ghatm can scan your workflows and add timeout-minutes to every job automatically, using historical run data to set appropriate values.

Fix 2

Add step-level timeouts for network calls and long steps

Job-level timeouts cap total damage, but step-level timeouts let you fail fast on specific operations. An npm ci that normally takes 2 minutes doesn’t need 20 minutes of runway. If it hangs for 5 minutes, something is wrong with the registry or network, and waiting longer won’t fix it.

Add timeout-minutes to individual steps, especially dependency installs, Docker builds, and any step that calls external services.

.github/workflows/ci.yml

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 25         # Job-level cap
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: npm ci
        timeout-minutes: 5       # Normally ~2 min
      - name: Run tests
        run: npm test
        timeout-minutes: 15      # Normally ~8 min
      - name: Upload coverage
        run: ./upload-coverage.sh
        timeout-minutes: 3       # External API call

Step-level timeouts operate independently of the job-level timeout. If a step exceeds its own limit, it fails immediately without waiting for the job-level cap. Subsequent steps are skipped (except those guarded by if: always(), such as cleanup steps), and the job is marked as failed. The job-level timeout acts as a backstop, while step-level timeouts provide precise control. Combining timeouts with efforts to stabilize CI runtime lets you set tighter limits without hitting false positives.
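To make the interaction concrete, here is a simplified model (not GitHub's implementation) of how a step cap and the job cap combine. Each step may run until whichever limit comes first, and a timed-out step ends the job's remaining work. It deliberately ignores if: always() steps, continue-on-error, and post-job cleanup. `Step` and `simulate_job` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    name: str
    runtime_min: float                   # how long the step would run if never killed; math.inf = hangs
    timeout_min: Optional[float] = None  # step-level timeout-minutes; None = not set


def simulate_job(steps, job_timeout_min):
    """Return (outcome per step, minute at which the job stopped)."""
    elapsed = 0.0
    outcomes = {}
    failed = False
    for step in steps:
        if failed:
            outcomes[step.name] = "skipped"  # this model runs nothing after a failure
            continue
        step_limit = step.timeout_min if step.timeout_min is not None else math.inf
        job_remaining = job_timeout_min - elapsed
        if step.runtime_min <= min(step_limit, job_remaining):
            elapsed += step.runtime_min
            outcomes[step.name] = "success"
        elif step_limit < job_remaining:
            elapsed += step_limit            # the step cap fires first
            outcomes[step.name] = "step timed out"
            failed = True
        else:
            elapsed += job_remaining         # the job-level backstop fires first
            outcomes[step.name] = "job timed out"
            failed = True
    return outcomes, elapsed
```

With made-up inputs shaped like the workflow above (a 25-minute job cap, a 30-second checkout, a 2-minute install capped at 5, and a test step that hangs under a 15-minute cap), the model stops the job at minute 17.5. Remove the step caps and the same hang runs to the 25-minute backstop.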

Fix 3

Use shell-level timeouts for external dependencies

Some operations hang at the process level, such as a curl to an unresponsive API, a database migration waiting on a lock, or a docker pull stalled on a slow registry. These won’t trigger a step-level timeout until the step’s full allocation expires. Use the timeout command (available on all GitHub-hosted Linux runners) to kill individual processes before they exhaust the step budget.

.github/workflows/ci.yml

- name: Wait for service readiness
  run: timeout 120 ./wait-for-service.sh
  timeout-minutes: 5

- name: Download model weights
  run: timeout 300 curl -fsSL -o model.bin https://models.example.com/v2/weights
  timeout-minutes: 8

- name: Run database migration
  run: timeout 60 bin/rails db:migrate
  timeout-minutes: 3

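The timeout command reads its duration in seconds by default (not minutes), though suffixes such as 2m also work. When the limit expires, it sends SIGTERM, giving the process a chance to clean up, and exits with status 124. It does not escalate to SIGKILL unless you pass -k: timeout -k 10 120 ./wait-for-service.sh sends SIGKILL 10 seconds after the SIGTERM if the process is still running. On macOS runners, use gtimeout from Homebrew’s coreutils package, or rely on a step-level timeout-minutes instead.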
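The same two-stage escalation that timeout -k performs can be sketched with Python's standard subprocess module, for example in a helper script that shells out to slow tools. `run_with_timeout` is a hypothetical helper. It assumes a POSIX system, where `Popen.terminate()` sends SIGTERM and `Popen.kill()` sends SIGKILL, and it mimics GNU timeout's 124/137 exit statuses. In a workflow step, timeout -k remains the simpler option.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds, kill_after=None):
    """Two-stage timeout: SIGTERM at `seconds`, then SIGKILL `kill_after` seconds later.

    Returns the command's exit code, 124 if SIGTERM stopped it, or 137 (128 + 9)
    if SIGKILL was needed, mirroring GNU timeout's exit statuses. POSIX only.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.terminate()              # SIGTERM: lets the process clean up
        if kill_after is None:
            proc.wait()               # like plain `timeout`: no SIGKILL follow-up
            return 124
        try:
            proc.wait(timeout=kill_after)
            return 124
        except subprocess.TimeoutExpired:
            proc.kill()               # SIGKILL: cannot be caught or ignored
            proc.wait()
            return 137


if __name__ == "__main__":
    # A child that would sleep for 30 s is stopped by SIGTERM after 1 s.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 1))
```

Without kill_after, a process that ignores SIGTERM would keep the helper waiting indefinitely, which is exactly the gap that -k closes.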

Fix 4

Enforce timeouts across all workflows with tooling

Setting timeouts manually is error-prone. New workflows get added without them. Existing workflows accumulate jobs without anyone checking. Use linting and automation to enforce the policy across your organization.

ghatm is a CLI tool that scans all workflow files and adds timeout-minutes: 30 to any job missing it. With the -auto flag and a GitHub token, it queries your repository’s run history via the API and sets timeouts based on actual historical durations.

Terminal

# Add timeout-minutes: 30 to all jobs missing it
ghatm set

# Auto-set based on historical run durations (requires GITHUB_TOKEN)
ghatm set -auto

To prevent regressions, add ghalint to your CI pipeline. It enforces the job_timeout_minutes_is_required policy, failing the build if any job is missing an explicit timeout. Run it as a linting step so new workflows can’t be merged without timeouts:

.github/workflows/lint.yml

- name: Lint GitHub Actions workflows
  run: ghalint run
  env:
    GHALINT_LOG_COLOR: always

Reference

Complete workflow with layered timeouts

Here are all three layers of timeout protection applied to a typical CI workflow. Job-level timeouts cap total damage, step-level timeouts catch specific hangs early, and shell-level timeouts protect individual process calls.

.github/workflows/ci.yml

name: CI

on:
  push:
    branches: [main]
  pull_request:

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

jobs:
  lint:
    runs-on: ubuntu-latest
    timeout-minutes: 10              # Job cap
    steps:
      - uses: actions/checkout@v4
      - name: Install
        run: npm ci
        timeout-minutes: 5            # Step cap
      - name: Lint
        run: npm run lint
        timeout-minutes: 5

  test:
    runs-on: ubuntu-latest
    timeout-minutes: 25
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: postgres   # The image refuses to start without a password
        ports:
          - 5432:5432                   # Expose to steps running directly on the runner
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - name: Install
        run: npm ci
        timeout-minutes: 5
      - name: Wait for Postgres
        run: timeout 30 bash -c 'until pg_isready -h localhost -p 5432; do sleep 1; done'
        timeout-minutes: 2            # Shell + step cap
      - name: Run tests
        run: npm test
        timeout-minutes: 15

  build:
    needs: [lint, test]
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: npm run build
        timeout-minutes: 10
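The Wait for Postgres step above pairs a polling loop with a hard deadline, so a database that never comes up costs 30 seconds rather than the full step budget. The same shape in Python, for example as a stand-in for a script like ./wait-for-service.sh, might look like this. `wait_for_port` is hypothetical. A successful TCP connect only shows that the port accepts connections, which is a weaker signal than pg_isready's check.

```python
import socket
import time


def wait_for_port(host: str, port: int, deadline_s: float, interval_s: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds; give up after deadline_s seconds."""
    give_up_at = time.monotonic() + deadline_s
    while True:
        remaining = give_up_at - time.monotonic()
        if remaining <= 0:
            return False  # the hard deadline, like `timeout 30` around the shell loop
        try:
            # Cap each attempt so a single stalled connect can't outlive the deadline.
            with socket.create_connection((host, port), timeout=min(interval_s, remaining)):
                return True
        except OSError:
            time.sleep(min(interval_s, max(0.0, give_up_at - time.monotonic())))
```

Returning False instead of raising lets the caller decide whether a missing service should fail the job.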

Reference

Cost of a single 6-hour timeout by runner type

This table shows what one hung job costs if it runs for the full default 360 minutes. Use these numbers to estimate the damage a missing timeout-minutes can cause.

Runner	Rate	6-hour cost
Linux 2-core	$0.006/min	$2.16
Windows 2-core	$0.010/min	$3.60
Linux 8-core	$0.022/min	$7.92
macOS (M1/Intel)	$0.062/min	$22.32
macOS M2 Pro	$0.102/min	$36.72

Included minutes per month: 2,000 (Free), 3,000 (Team/Pro), 50,000 (Enterprise). A single macOS timeout at the 10x billing multiplier consumes 3,600 weighted minutes (360 × 10), which exceeds the entire Free plan monthly quota.

Related guides

Target Rerun Hotspots

Find the 1-3 workflows responsible for most of your rerun spend.

Reduce CI Failures

Separate infrastructure failures from test failures and cut wasted minutes.

Stabilize CI Runtime

Fix cache misses and queue spikes that cause unpredictable pipeline durations.

Cancelled Runs Wasting Minutes

Stop paying for runs that are already obsolete with concurrency groups.
