Guides / Always-on self-hosted runners

Runner mismatch

Always-on self-hosted runners are burning money

By Keith Mazanec, Founder, CostOps · Updated January 31, 2026

Your team provisions 5 EC2 instances as self-hosted GitHub Actions runners. They run 24/7. CI runs during business hours, roughly 8 hours a day, 5 days a week. The other 76% of the time, those machines sit idle, burning compute you never use. Self-hosted runners don't show up on your GitHub bill, which makes the waste invisible until someone audits the AWS invoice. The fix is autoscaling with scale-to-zero, and the tooling is mature enough that there's no reason to keep paying for idle.

Symptoms

How to tell if your self-hosted runners are wasting money

Self-hosted runner waste is harder to spot than GitHub-hosted waste because it doesn't appear in your Actions usage report. Look for these patterns instead:

  • Low runner utilization. Check your EC2/VM monitoring. If average CPU utilization is below 20–30% over a week, the machines are mostly idle. CI workloads are bursty, peaking during code review hours and dropping to near-zero at night and on weekends. If queue wait times are long enough that developers re-push, see how queue time causes more runs.

  • Fixed runner count regardless of demand. Your infrastructure runs the same number of instances at 3 AM on Saturday as it does at 10 AM on Tuesday. If runner count never changes, you're paying peak capacity at all times.

  • Self-hosted cost exceeds equivalent GitHub-hosted cost. A c5.xlarge (4 vCPU) on EC2 costs $124/mo on-demand. GitHub's hosted 4-core Linux runner costs $0.016/min. If your self-hosted runner handles fewer than 7,750 minutes/month of actual CI work, you're paying more than GitHub would charge for the same compute. For guidance on matching GitHub-hosted runner size to workload, see right-sizing overpowered runners.

  • On-demand instances instead of spot. CI jobs are short-lived and stateless, making them ideal for spot instances. If your runners use on-demand pricing, you're paying roughly double the spot rate for reliability guarantees CI doesn't need.
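The break-even comparison above is easy to sanity-check yourself. A minimal sketch, using the rates quoted in this guide (`breakeven_minutes` is an illustrative helper, not part of any tool; substitute your own instance and GitHub rates):

```python
# Rates from this guide: c5.xlarge on-demand vs GitHub's hosted
# 4-core Linux runner. Adjust for your region and runner size.
EC2_MONTHLY = 124.0      # c5.xlarge on-demand, ~$/month
GITHUB_PER_MIN = 0.016   # hosted 4-core Linux runner, $/minute

def breakeven_minutes(ec2_monthly: float, gh_per_min: float) -> int:
    """CI minutes/month below which GitHub-hosted is cheaper
    than keeping an always-on self-hosted instance."""
    return round(ec2_monthly / gh_per_min)

print(breakeven_minutes(EC2_MONTHLY, GITHUB_PER_MIN))  # 7750
```

If your Actions usage report shows fewer billable self-hosted minutes than this threshold, the always-on instance is the more expensive option.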

Metrics

What idle runners actually cost

The math starts with utilization. A team that runs CI during business hours (8h/day, 5 days/week) uses runners for 174 hours/month out of 730 total. That's 24% utilization. The other 76% is idle compute. Here's what that looks like for a 5-runner pool on c5.xlarge (4 vCPU, 8 GB):

Scenario Runners Hours/mo each Rate (c5.xlarge) Monthly cost
Always-on (on-demand) 5 730 $0.17/hr $621
Autoscaled (spot, scale-to-zero) 2.5 avg 174 $0.08/hr $35

Always-on: 5 × 730h × $0.17/hr at 24% utilization. Savings: $586/mo · $7,032/year · a 94% reduction.

The savings come from two factors: scale-to-zero eliminates idle hours (730h → 174h per runner), and spot instances cut the per-hour rate 53% ($0.17 → $0.08 for c5.xlarge). Combined, a 5-runner pool drops from $621/mo to $35/mo. Scale that to 10 or 20 runners and the annual savings are in the tens of thousands.
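The arithmetic behind both scenarios is the same three-factor product, so it is straightforward to rerun with your own runner count and rates (a sketch; the figures are the guide's example numbers, not universal constants):

```python
def monthly_cost(runners: float, hours_each: float, rate: float) -> float:
    """Monthly spend for a runner pool: count x hours x hourly rate."""
    return runners * hours_each * rate

# Always-on pool: 5 runners, every hour of the month, on-demand rate
always_on = monthly_cost(5, 730, 0.17)    # ~$620.50

# Autoscaled: ~2.5 runners active on average, business hours, spot rate
autoscaled = monthly_cost(2.5, 174, 0.08)  # ~$34.80

print(f"save ${always_on - autoscaled:.0f}/mo")
```

Plugging in a 20-runner pool at the same rates scales the savings linearly, which is where the "tens of thousands per year" figure comes from.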


Fix 1

Switch to ephemeral runners

Persistent self-hosted runners stay registered and wait for jobs indefinitely. Ephemeral runners accept exactly one job, execute it, then de-register and shut down. This is the foundation for scale-to-zero because without ephemeral mode, your autoscaler can't safely terminate runners between jobs.

GitHub recommends ephemeral runners for all autoscaling setups. The --ephemeral flag during registration tells the runner to exit after completing one job. This also gives you a clean environment per job, reducing the risk of cross-job contamination from leftover state.

Runner registration (ephemeral mode)
# Persistent runner (default) - stays alive between jobs
./config.sh --url https://github.com/org --token TOKEN

# Ephemeral runner - exits after one job
./config.sh --url https://github.com/org --token TOKEN --ephemeral

# For containerized runners, also disable auto-updates:
./config.sh --url https://github.com/org --token TOKEN --ephemeral --disableupdate

One caveat: ephemeral runners don't persist logs locally after the job completes, since the runner is destroyed. Forward logs to an external store (CloudWatch, Datadog, etc.) if you need post-job debugging beyond what GitHub's UI shows.
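Registration tokens for `config.sh` are short-lived and fetched per runner, which is how autoscalers register ephemeral runners on the fly. A minimal sketch of building that API call (the endpoint is GitHub's real REST API; `your-org` and the PAT are placeholders):

```python
# Sketch: request a short-lived registration token for an ephemeral
# runner, then pass it to ./config.sh --ephemeral.
import urllib.request

API = "https://api.github.com"

def token_request(org: str, pat: str) -> urllib.request.Request:
    """Build the POST request for an org-level runner registration token."""
    return urllib.request.Request(
        f"{API}/orgs/{org}/actions/runners/registration-token",
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {pat}",
        },
    )

# Usage (requires a PAT or GitHub App token with admin:org scope):
# import json, os
# resp = urllib.request.urlopen(token_request("your-org", os.environ["GH_PAT"]))
# token = json.loads(resp.read())["token"]   # expires after ~1 hour
```

The autoscalers in the next two fixes do exactly this under the hood, one token per ephemeral runner.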

Fix 2

Use Actions Runner Controller for Kubernetes

If your team already runs Kubernetes, Actions Runner Controller (ARC) is the official GitHub-supported autoscaler. It runs runner pods on your cluster, scaling up when jobs are queued and back to zero when idle. The Runner Scale Sets mode maintains a long-poll connection to GitHub's Actions Service, so it receives job notifications instantly without requiring any webhook infrastructure.

Scale-to-zero is the default behavior. When both minRunners and maxRunners are omitted, ARC creates pods only when jobs are queued and terminates them after completion. Set minRunners to keep a warm pool if cold-start latency matters for your workflow.

ARC Runner Scale Set (Helm values)
# values.yaml for gha-runner-scale-set Helm chart
githubConfigUrl: "https://github.com/your-org"
githubConfigSecret: github-app-credentials

# Scale-to-zero is the default: omit minRunners and maxRunners
# entirely and ARC creates runner pods only while jobs are queued.

# Bounded scaling - warm pool + cap
minRunners: 0     # scale to zero when idle
maxRunners: 20    # cap concurrent runners

template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        resources:
          requests:
            cpu: "2"
            memory: "4Gi"
          limits:
            cpu: "4"
            memory: "8Gi"

For maximum savings, run ARC on spot/preemptible node pools in your cluster. CI pods are short-lived and stateless, making them ideal spot workloads. If a spot node is reclaimed mid-job, GitHub Actions will re-queue the job automatically.

One caveat: ARC requires Kubernetes expertise to operate. GitHub's own docs note that ARC is recommended only if "the team deploying it has expert Kubernetes knowledge and experience." If you don't run Kubernetes today, deploying a cluster just for CI runners adds operational overhead that may exceed the savings.

Fix 3

Use terraform-aws-github-runner for EC2 autoscaling

For teams on AWS without Kubernetes, terraform-aws-github-runner (originally by Philips Labs) provides serverless autoscaling. It uses a Lambda function that listens for workflow_job webhooks from GitHub. When a job is queued matching your runner labels, the Lambda creates an ephemeral EC2 instance via the CreateFleet API. After the job completes, the instance terminates.

The module defaults to spot instances with automatic on-demand fallback for capacity errors. It also supports instance type diversification by requesting across multiple instance families to reduce spot interruption probability.

main.tf (terraform-aws-github-runner)
module "github_runner" {
  source  = "github-aws-runners/github-runner/aws"

  github_app = {
    id             = var.github_app_id
    key_base64     = var.github_app_key_base64
    webhook_secret = var.github_webhook_secret
  }

  instance_types = ["c5.xlarge", "c5a.xlarge", "c6i.xlarge"]

  runners = {
    linux = {
      os   = "linux"
      arch = "x64"

      # Spot instances with on-demand fallback
      market_options = "spot"

      # Ephemeral by default - one job per instance
      ephemeral = true

      # Optional: warm pool during business hours
      idle = {
        count = 2
        schedule = "cron(0 14 ? * MON-FRI *)"  # 9 AM EST
      }
    }
  }
}

The idle block is optional. It keeps a warm pool of runners during scheduled hours to eliminate cold-start latency (typically 60–90 seconds for EC2 boot + runner registration). Outside those hours, the pool scales to zero. This gives you warm starts during peak demand without paying for overnight or weekend idle time.
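The warm pool isn't free, so it's worth pricing before enabling it. A back-of-envelope sketch (assumes ~4.35 weeks/month; `warm_pool_cost` is an illustrative helper, not part of the module):

```python
def warm_pool_cost(count: int, hours_per_day: float, days_per_week: int,
                   rate: float, weeks_per_month: float = 4.35) -> float:
    """Monthly cost of keeping `count` runners warm on a schedule."""
    return count * hours_per_day * days_per_week * weeks_per_month * rate

# 2 warm c5.xlarge spot runners, business hours only (~174 h/mo each)
print(f"${warm_pool_cost(2, 8, 5, 0.08):.2f}/mo")  # $27.84/mo
```

Roughly $28/mo buys instant starts during peak hours, still far below the $621/mo always-on baseline.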

Fix 4

Switch from on-demand to spot instances

Even if you already autoscale, using on-demand instances for CI runners leaves money on the table. AWS spot instances offer the same compute at 50–60% lower cost for common CI instance types. CI workloads are ideal for spot: jobs are short-lived (minutes, not hours), stateless (no data loss on interruption), and fault-tolerant (GitHub re-queues interrupted jobs).

AWS provides a 2-minute warning before spot reclamation. Historical data shows that 95% of spot instances run to completion without interruption, and interruption rates for compute-optimized instances (c5, c6i) are among the lowest.

Spot vs on-demand cost comparison
Instance On-Demand Spot Savings
m5.large (2 vCPU) $0.096/hr $0.039/hr 59%
c5.xlarge (4 vCPU) $0.170/hr $0.080/hr 53%
c5.2xlarge (8 vCPU) $0.340/hr $0.137/hr 60%

To minimize interruption risk, request across multiple instance types (e.g., c5.xlarge, c5a.xlarge, c6i.xlarge) and availability zones. Both ARC and terraform-aws-github-runner support this via instance type diversification. If a spot instance is reclaimed mid-job, the CI system re-queues the job, so the developer sees a retry, not a failure.
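You can also quantify the retry overhead interruptions add. If each attempt is interrupted independently with probability p and GitHub re-queues until one completes, attempts per job follow a geometric distribution (a back-of-envelope model, not measured data):

```python
def expected_runs_per_job(p_interrupt: float) -> float:
    """Expected attempts per job when each attempt fails with
    probability p and interrupted jobs are re-queued (geometric)."""
    return 1 / (1 - p_interrupt)

# At the ~5% interruption rate cited above:
print(f"{expected_runs_per_job(0.05):.3f} runs per job")  # 1.053 runs per job
```

Even at double that interruption rate, the expected compute overhead stays around 11%, far less than the 50–60% spot discount.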


Reference

Which autoscaling solution to use

The right choice depends on your existing infrastructure. All three options support scale-to-zero and ephemeral runners.

Solution Best for Cold start
ARC Teams already running Kubernetes Seconds–1 min
terraform-aws-github-runner AWS teams without Kubernetes 60–90 sec
Custom webhook + ASG Teams needing full control 60–90 sec

If you're not on AWS or Kubernetes, you can build a custom autoscaler using GitHub's workflow_job webhook. The queued event tells you when to create a runner; the completed event tells you when to destroy it. Both ARC and the Terraform module are open-source implementations of this pattern.
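The core of that custom pattern fits in one dispatch function. A minimal sketch, assuming `provision` and `teardown` wrap your cloud API (e.g. EC2 RunInstances / TerminateInstances); names are illustrative:

```python
# Sketch of a workflow_job webhook handler: scale up on `queued`,
# reclaim capacity on `completed`. Signature verification, label
# routing, and error handling are omitted for brevity.

def handle_workflow_job(event: dict, provision, teardown) -> str:
    """Dispatch one GitHub `workflow_job` webhook payload."""
    action = event.get("action")
    job = event.get("workflow_job", {})
    if action == "queued" and "self-hosted" in job.get("labels", []):
        provision(job["labels"])          # boot an ephemeral runner
        return "scaled up"
    if action == "completed":
        teardown(job.get("runner_name"))  # tear down its instance
        return "scaled down"
    return "ignored"
```

In production you would also verify the `X-Hub-Signature-256` header and debounce duplicate deliveries, which is most of what the off-the-shelf tools above handle for you.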

Team / Enterprise Custom runner groups (controlling which repos can use which runners) require GitHub Team or Enterprise. Self-hosted runners themselves are available on all plans, including Free.

Reference

Cost comparison by strategy

All numbers assume a single c5.xlarge equivalent with business-hours CI usage (174 hours/month of actual compute). Multiply by your runner count for total spend.

Strategy Monthly vs Always-On
Always-on (on-demand) $124 baseline
Always-on (spot) $58 –53%
Autoscaled (on-demand) $30 –76%
Autoscaled (spot + scale-to-zero) $14 –89%

The biggest single improvement is scale-to-zero (76% reduction alone). Adding spot instances brings it to 89%. If you can only do one thing, start with autoscaling, since the instance pricing optimization is secondary.


See your real self-hosted runner utilization

CostOps tracks self-hosted runner usage, idle time, and cost per job. Find out if your runners are earning their keep before you change the infrastructure.

Free for 1 repo. No credit card. No code access.

Built by engineers who've managed CI spend at scale.