Too much work per run

Matrix explosion: when parallelism increases cost

By Keith Mazanec, Founder, CostOps · Updated January 29, 2026

A developer adds Windows to the CI matrix alongside Linux and macOS. Another adds Node 22 to the version list. Now the matrix is 4 versions × 3 operating systems × 2 databases = 24 jobs per push. Each job spins up its own runner, installs dependencies from scratch, and is billed independently, rounded up to the nearest minute. The matrix was 6 jobs a month ago. Nobody noticed it quadrupled.

Symptoms

How to tell if your matrix is costing more than it's worth

Matrix growth is gradual. Each new dimension seems harmless in isolation. Look for these signs in your Actions tab:

  • Job count multiplies faster than coverage value. Adding one entry to a matrix dimension doesn't add one job. Instead, it multiplies across all other dimensions. A 4 × 3 matrix becomes 5 × 3 = 15 jobs, not 13. Every workflow run fans out into dozens of jobs, most of which pass identically.

  • Billable minutes far exceed wall-clock time. A workflow that takes 8 minutes to complete shows 120+ billable minutes. That's because each of the 15+ matrix jobs runs independently, each billed at a minimum of 1 minute, and each repeating checkout, install, and setup steps.

  • Cross-platform jobs that never fail differently. You test on Windows, Linux, and macOS, but failures are always in the application logic, never OS-specific. The Windows and macOS jobs pass for months without catching a single platform-specific bug, yet they are billed at roughly 2x and 10x the per-minute rate of Linux, respectively.

  • Non-LTS versions in the matrix. The matrix includes Node 17, 19, and 21, all odd-numbered versions that are already end-of-life. Or it tests Python 3.8 for an application that requires 3.11+. These jobs consume minutes without providing actionable signal.

Metrics

The combinatorial math behind matrix cost

GitHub Actions bills each matrix job independently, rounded up to the nearest minute. A 24-job matrix where each job runs for 8 minutes costs 24 × 8 = 192 billable minutes per workflow run, even though wall-clock time is just 8 minutes. Compare that to running 4 representative jobs:

Full cross-product matrix

Matrix jobs: 24
Minutes/job: 8
Billable min/run: 192
Runs/day: 20
Monthly cost: $506/mo

At $0.006/min (Linux 2-core) · 22 working days

Reduced matrix on PRs (4 jobs)

Matrix jobs: 4
Minutes/job: 8
Billable min/run: 32
Runs/day: 20
Monthly cost: $84/mo

Save $422/mo · $5,064/year · per workflow

That's all Linux. If the matrix includes macOS jobs at $0.062/min, 8 macOS jobs under the same assumptions (8 minutes per job, 20 runs per day, 22 working days) cost about $1,746/mo on their own. Dropping macOS from PR builds and running it only on main can save thousands per year from one workflow change.
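
The monthly figures in this comparison all come from a single product. As a worked check, write J for jobs per run, m for billed minutes per job, r for runs per day, d for working days per month, and p for the per-minute rate. These symbols are introduced here for illustration and are not taken from GitHub's billing documentation.

```latex
\begin{aligned}
\text{monthly cost} &= J \times m \times r \times d \times p \\
\text{full matrix}  &= 24 \times 8 \times 20 \times 22 \times 0.006 = 506.88 \\
\text{PR subset}    &= 4 \times 8 \times 20 \times 22 \times 0.006 = 84.48
\end{aligned}
```

The comparison drops the cents, giving $506 and $84. Because every factor multiplies, cutting J from 24 to 4 cuts the bill by the same factor of six, regardless of the rate.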


Fix 1

Use a smaller matrix on pull requests

Most PR builds don't need the full compatibility matrix. A developer changing application logic doesn't need to validate against 4 Node versions and 3 operating systems on every push. Run a representative subset on PRs, and save the full matrix for main or release branches where the broader coverage actually matters.

The cleanest approach uses a prep job that outputs different JSON matrices based on the event type. The build job consumes that output via fromJSON.

.github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  matrix-prep:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - id: set-matrix
        run: |
          if [ "${{ github.event_name }}" = "push" ]; then
            # Full matrix on pushes to main: 4 versions × 3 OS = 12 jobs
            echo 'matrix={"node-version":["18","20","22","23"],"os":["ubuntu-latest","windows-latest","macos-latest"]}' >> "$GITHUB_OUTPUT"
          else
            # Reduced matrix on pull requests: 2 versions × 1 OS = 2 jobs
            echo 'matrix={"node-version":["20","22"],"os":["ubuntu-latest"]}' >> "$GITHUB_OUTPUT"
          fi

  test:
    needs: matrix-prep
    strategy:
      matrix: ${{ fromJSON(needs.matrix-prep.outputs.matrix) }}
      fail-fast: true
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - run: npm test

This drops the PR matrix from 12 jobs to 2, which is an 83% reduction in billable minutes per PR run. The full 12-job matrix still runs on every merge to main, so you don't lose coverage. You just stop paying for it on every intermediate push.

One caveat: the prep job itself consumes a billed minute (rounded up from a few seconds of shell execution). For matrices with fewer than 3 jobs, the overhead of the prep job may not be worth it. For matrices above 6 jobs, the savings are substantial.
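
If the extra billed minute for the prep job is a concern, one alternative is to choose the matrix values inline with an expression. The sketch below assumes the common `condition && a || b` expression idiom and that arrays returned by fromJSON are accepted as matrix values. The Node versions and OS lists are carried over from the example above. Confirm the behavior on a test branch before relying on it.

```yaml
# Sketch: pick matrix values inline instead of using a separate prep job.
# Assumes pull requests should get the reduced matrix and every other
# trigger (here, pushes to main) should get the full one.
name: CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    strategy:
      matrix:
        # PRs: 2 versions; otherwise: 4 versions
        node-version: ${{ github.event_name == 'pull_request' && fromJSON('["20","22"]') || fromJSON('["18","20","22","23"]') }}
        # PRs: Linux only; otherwise: all three operating systems
        os: ${{ github.event_name == 'pull_request' && fromJSON('["ubuntu-latest"]') || fromJSON('["ubuntu-latest","windows-latest","macos-latest"]') }}
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - run: npm test
```

The trade-off is readability: the prep job keeps both matrices in plain shell, while the inline form saves a job at the price of longer expressions.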

Fix 2

Replace cross-products with exact combinations

The default matrix behavior computes a Cartesian product: every value in every dimension combined with every value in every other dimension. But you rarely need all those combinations. You probably don't need to test Node 18 on macOS and Node 20 on macOS and Node 22 on macOS. You need Node 18 on Linux, Node 22 on Linux, and Node 22 on macOS.

GitHub supports an include-only matrix that skips the cross-product entirely. You list exactly the combinations you want, and only those jobs run.

Cross-product: 12 jobs
strategy:
  matrix:
    node: [18, 20, 22, 23]
    os:
      - ubuntu-latest
      - windows-latest
      - macos-latest

# 4 × 3 = 12 jobs
# Most combinations add no signal

Include-only: 5 jobs
strategy:
  matrix:
    include:
      - node: 18
        os: ubuntu-latest
      - node: 20
        os: ubuntu-latest
      - node: 22
        os: ubuntu-latest
      - node: 22
        os: windows-latest
      - node: 22
        os: macos-latest

Coverage stays where it matters: the LTS versions 18, 20, and 22 are each validated on Linux, the newest of them (Node 22) is validated on all three operating systems, and the odd-numbered Node 23 entry is dropped. That takes the matrix from 12 jobs to 5, a 58% reduction. The include-only approach also makes the matrix self-documenting: anyone reading the workflow sees exactly what runs, instead of having to mentally compute a cross-product.
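
A middle ground, sketched here on the assumption that you want to keep a small cross-product for Linux, is to combine a base matrix with include. Under GitHub's documented include behavior, an entry that cannot be merged into an existing combination without overwriting a matrix value becomes a new job, so this should yield the same 5 jobs as the include-only list.

```yaml
strategy:
  matrix:
    # Base matrix: 3 versions × 1 OS = 3 jobs
    node: [18, 20, 22]
    os: [ubuntu-latest]
    include:
      # Each entry would overwrite os in the base combinations,
      # so it is added as an extra job instead: 3 + 2 = 5 jobs
      - node: 22
        os: windows-latest
      - node: 22
        os: macos-latest
```

This form is shorter when most jobs share one OS, but the include-only list remains easier to audit at a glance.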

Fix 3

Remove matrix dimensions that don't catch bugs

Every dimension in the matrix should earn its place by catching bugs that other dimensions miss. If the Windows jobs haven't caught a single platform-specific failure in 6 months, they're not providing signal, just generating invoices. Audit each dimension against your failure history.

Dimension | Keep if | Drop if
OS variants | You ship native binaries or use OS-specific APIs | App only deploys to Linux containers
Runtime versions | You publish a library consumed on multiple versions | You control the runtime in production (one version)
Database engines | You support Postgres and MySQL in production | You only deploy against one database
Non-LTS versions | You need to verify upcoming breaking changes | The version is EOL and not used by consumers

A common pattern in Node.js projects: the matrix tests versions 16, 18, 20, 22. But Node 16 reached end-of-life in September 2023, and Node 18 maintenance ended in April 2025. If your package.json specifies "engines": { "node": ">=20" }, testing 16 and 18 is pure waste. Removing two entries from a 4 × 3 matrix cuts it from 12 to 6 jobs, effectively halving your CI bill for that workflow.
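
As an illustration of that trim, assume a project whose package.json declares node >=20 and which still wants all three operating systems. The OS list is an assumption carried over from the earlier examples.

```yaml
strategy:
  matrix:
    # Before: node: [16, 18, 20, 22] -> 4 × 3 = 12 jobs
    # After: only versions allowed by "engines": { "node": ">=20" }
    node: [20, 22]
    os: [ubuntu-latest, windows-latest, macos-latest]
    # 2 × 3 = 6 jobs
```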

Fix 4

Use fail-fast to stop wasting minutes on broken builds

When one matrix job fails, do you need the other 23 to keep running? By default, GitHub Actions sets fail-fast: true, which cancels all remaining matrix jobs when any job fails. This is the correct default for cost optimization, but it can be accidentally disabled.

Verify your workflows haven't set fail-fast: false. Some teams disable it to "see all failures at once," but on a 24-job matrix with an 8-minute runtime, a failure in minute 2 means the other 23 jobs run for 6 unnecessary minutes each: 138 wasted minutes per failed run.

fail-fast disabled
strategy:
  fail-fast: false
  matrix:
    node: [18, 20, 22, 23]
    os: [ubuntu-latest, windows-latest]

# Job 1 fails at minute 2
# Jobs 2-8 keep running to completion
# Billed: 2 + 7 × 8 = 58 minutes

fail-fast enabled (default)
strategy:
  fail-fast: true
  matrix:
    node: [18, 20, 22, 23]
    os: [ubuntu-latest, windows-latest]

# Job 1 fails at minute 2
# Jobs 2-8 cancelled within seconds
# Billed: about 8 × 2 = 16 minutes, plus per-job rounding

If you need all failures visible for debugging, consider a hybrid approach: use fail-fast: true on PRs (where speed and cost matter) and fail-fast: false only on scheduled nightly runs where you specifically want the full failure report. For more on stopping canceled runs from wasting minutes, see our dedicated guide.
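
One way to sketch that hybrid in a single workflow is to set fail-fast from an expression. This assumes a nightly schedule trigger (the cron time is an arbitrary placeholder) and relies on fail-fast accepting an expression, as GitHub's workflow syntax reference describes.

```yaml
name: CI

on:
  pull_request:
  schedule:
    # Placeholder: nightly at 03:00 UTC
    - cron: "0 3 * * *"

jobs:
  test:
    strategy:
      # true on pull requests (cancel siblings on first failure),
      # false on scheduled runs (collect the full failure report)
      fail-fast: ${{ github.event_name != 'schedule' }}
      matrix:
        node: [18, 20, 22, 23]
        os: [ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      - run: npm test
```

Keeping both behaviors in one file avoids maintaining a second, nearly identical workflow just for the nightly report.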


Reference

How matrix size affects monthly cost

Use this table to estimate the cost impact of your matrix configuration. Assumes 8 min/job, 20 runs/day, 22 working days/month.

Matrix size | Min/run | Linux/mo | Mixed OS/mo
2 jobs | 16 | $42 | -
6 jobs | 48 | $127 | $549
12 jobs | 96 | $253 | $1,098
24 jobs | 192 | $506 | $2,196

"Mixed OS" assumes equal splits across Linux ($0.006/min), Windows ($0.010/min), and macOS ($0.062/min), which is the typical result of adding an OS dimension to a matrix. The macOS jobs alone account for roughly 80% of the mixed-OS cost. If most of your jobs only need Linux, you may also be overpaying for runner types you don't need. GitHub imposes a limit of 256 jobs per matrix per workflow run.

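As a worked check of the Mixed OS column, assume a third of the jobs land on each operating system. The blended per-minute rate, the 24-job figure, and the macOS share then come out as follows. The symbol \bar{p} is introduced here for illustration.

```latex
\begin{aligned}
\bar{p} &= \frac{0.006 + 0.010 + 0.062}{3} = 0.026 \\
\text{24 jobs, mixed OS} &= 24 \times 8 \times 20 \times 22 \times 0.026 = 2196.48 \\
\text{macOS share} &= \frac{0.062}{0.006 + 0.010 + 0.062} \approx 0.795
\end{aligned}
```

The blended rate is more than four times the Linux rate, which is why adding an OS dimension raises cost far more than the job count alone suggests.
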