Over-parallelized test suites: when sharding increases CI cost
By Keith Mazanec, Founder, CostOps · Updated January 31, 2026
A team splits their test suite into 10 parallel shards to get faster feedback. Each shard spends 3 minutes on setup (checkout, install, build) and 90 seconds running tests. Total billable minutes jump from 18 to 50. Wall-clock time drops, but the bill nearly triples. This happens because every shard is a separate GitHub Actions job, every job pays the full setup tax, and every job's time is rounded up to the next billable minute. Past a certain point, adding shards costs more than it saves. Reducing that per-job setup and install overhead is a key part of the solution.
Symptoms
How to tell if test sharding is costing you more
Open your workflow's job list and compare setup time to test time in each shard. If setup dominates, you have too many shards.
- Setup dominates shard runtime. Each shard spends more time on checkout, dependency install, and build than on actual test execution. A shard that runs 3 minutes of setup and 90 seconds of tests is 67% overhead.
- Total billable minutes increase with shard count. You added more shards to speed things up, but total minutes went up. Going from 4 shards to 8 didn't halve anything; it just doubled the setup repetitions. GitHub bills each job separately, rounded up to the next minute.
- Per-minute rounding inflates short shards. GitHub rounds every job up to the nearest minute. A shard that takes 4 minutes 10 seconds is billed as 5 minutes. With 10 shards, that rounding tax adds several extra billed minutes per run.
- Uneven shard distribution. One shard finishes in 2 minutes while another takes 8. Static splitting by file count or name doesn't account for test duration. The slowest shard sets wall-clock time, and the fast shards waste their extra capacity.
Metrics
The math behind over-sharding
Every shard repeats the same setup work. The formula is: total minutes = shards × (setup + tests/shards) = shards × setup + total test time. The test term is constant no matter how you split it, so every added shard adds one full setup cost; at high shard counts the bill is dominated by setup, not tests, and per-minute rounding on each job pushes it higher still.
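As a quick sketch of that formula, with GitHub's round-up-to-the-minute billing folded in, using this guide's hypothetical 3-minute setup and 15-minute suite:

```python
import math

def billed_minutes(shards, setup_min, total_test_min):
    # Each shard is a separate job; GitHub rounds every job
    # up to the next whole minute before billing.
    per_shard = setup_min + total_test_min / shards
    return shards * math.ceil(per_shard)

# 3 min setup per job, 15 min of tests total (the example suite)
for n in (1, 3, 5, 10):
    print(n, "shards ->", billed_minutes(n, 3, 15), "billed minutes")
# The test term stays fixed at 15 minutes; the setup term grows
# linearly, so every added shard costs another full setup.
```

At 10 shards, rounding alone turns a 4.5-minute shard into a 5-minute bill.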
10 shards (over-parallelized): 10 shards × 5 min each = 50 min/run · $0.006/min
3 shards (right-sized): 3 shards × 8 min each = 24 min/run · $0.006/min
Save $20.59/mo · $247/year · per workflow
That's on Linux at $0.006/min. On macOS at $0.062/min, the 10-shard configuration costs $409/mo vs $196/mo for 3 shards, producing a $213/mo savings from reducing shard count alone. The wall-clock difference between 3 shards and 10 is often under 2 minutes; the cost difference is not.
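A back-of-envelope check of those monthly figures. The run count is not stated above; roughly 132 runs/month is inferred from the quoted $20.59 savings, so treat it as an assumption:

```python
runs_per_month = 132                     # inferred from the quoted savings, not stated
linux_rate, macos_rate = 0.006, 0.062    # $/min, GitHub-hosted runner rates
ten_shard, three_shard = 50, 24          # billed minutes per run

linux_savings = runs_per_month * (ten_shard - three_shard) * linux_rate
macos_10 = runs_per_month * ten_shard * macos_rate
macos_3 = runs_per_month * three_shard * macos_rate
print(f"Linux savings: ${linux_savings:.2f}/mo")          # ≈ $20.59
print(f"macOS: ${macos_10:.0f}/mo vs ${macos_3:.0f}/mo")  # ≈ $409 vs $196
```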
Fix 1
Right-size your shard count
The optimal shard count is where test time per shard is at least 2–3× setup time. If setup takes 3 minutes and total test time is 15 minutes, 3 shards give you 5 minutes of tests per shard (1.7× setup). Going to 5 shards drops test time to 3 minutes per shard (1× setup), which is the crossover point where you're paying more in setup than you're saving in parallelism.
Use the matrix strategy with a controlled shard count. Resist the temptation to match your shard count to your concurrency limit.
Before (over-parallelized):

```yaml
strategy:
  matrix:
    shard: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  fail-fast: false
```

After (right-sized):

```yaml
strategy:
  matrix:
    shard: [1, 2, 3]
  fail-fast: false
```
To find the right number, measure your setup time (checkout + install + build) and total test time. Divide test time by setup time to get a rough upper bound on useful shards. A test suite that takes 15 minutes with 3 minutes of setup should use at most 5 shards. In practice, 3–4 is better because of per-minute rounding.
| Shards | Wall clock | Total billed | Cost/run |
|---|---|---|---|
| 1 (no sharding) | 18 min | 18 min | $0.108 |
| 3 | 8 min | 24 min | $0.144 |
| 5 | 6 min | 30 min | $0.180 |
| 10 | 5 min | 50 min | $0.300 |
Going from 1 to 3 shards cuts wall time by 56% for a 33% cost increase, which is a good trade. Going from 3 to 10 only cuts wall time by another 3 minutes but doubles the cost. The sweet spot is usually 3–5 shards for a 15-minute test suite.
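The trade-off in that table can be checked with a small script (same hypothetical numbers: 3 minutes of setup, 15 minutes of tests, the Linux rate):

```python
import math

SETUP, TESTS, RATE = 3, 15, 0.006   # minutes, minutes, $/min (Linux)

def wall_and_billed(shards):
    per_shard = math.ceil(SETUP + TESTS / shards)  # each job rounds up
    return per_shard, shards * per_shard           # wall clock = one shard

for a, b in ((1, 3), (3, 10)):
    (wall_a, bill_a), (wall_b, bill_b) = wall_and_billed(a), wall_and_billed(b)
    print(f"{a} -> {b} shards: wall -{wall_a - wall_b} min, "
          f"billed +{bill_b - bill_a} min (+${(bill_b - bill_a) * RATE:.2f}/run)")
```

Going from 1 to 3 shards buys 10 wall-clock minutes for 6 extra billed minutes; going from 3 to 10 buys only 3 more for 26 extra.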
Fix 2
Share setup work across shards with artifacts
The biggest waste in over-sharding is repeating the same setup in every shard. If each of your 10 shards runs npm ci and npm run build, you're doing that work 10 times. Instead, run setup once in a dedicated job and pass the result to shards via artifacts or cache. This is the build once, use everywhere pattern.
This pattern uses actions/upload-artifact and actions/download-artifact to share the installed node_modules and build output. The shards then skip install and build entirely.
```yaml
jobs:
  setup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: |
            node_modules
            dist
          retention-days: 1
  test:
    needs: setup
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1/4, 2/4, 3/4, 4/4]
      fail-fast: false
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: build-output
      # No npm ci, no npm run build - already done in the setup job
      - run: npx jest --shard=${{ matrix.shard }}
```
With this pattern, setup runs once (3 minutes) and each shard only needs checkout + artifact download (under 1 minute). Four shards that previously cost 4 × (3 + 4) = 28 min now cost 3 + 4 × (1 + 4) = 23 min. At 10 shards: 10 × 5 = 50 min becomes 3 + 10 × 2.5 = 28 min, a 44% reduction.
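The 4-shard arithmetic above can be sketched as follows; the ~1 minute of per-shard overhead for checkout plus artifact download is the text's assumption:

```python
def naive_cost(shards, setup_min, tests_per_shard_min):
    # Every shard repeats the full setup.
    return shards * (setup_min + tests_per_shard_min)

def shared_cost(shards, setup_min, tests_per_shard_min, shard_overhead_min=1):
    # One dedicated setup job, then each shard pays only
    # checkout + artifact download (~1 min assumed).
    return setup_min + shards * (shard_overhead_min + tests_per_shard_min)

print(naive_cost(4, 3, 4), "->", shared_cost(4, 3, 4))  # 28 -> 23 minutes
```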
One caveat: artifact upload/download has its own overhead. For small projects where setup is just npm ci with cached dependencies (under 30 seconds), the artifact round-trip may not save anything. This pattern pays off when setup exceeds 2 minutes.
Fix 3
Use duration-based test splitting
Static sharding (splitting by file count or alphabetically) creates uneven shards. One shard gets all the integration tests (8 minutes), another gets unit tests (45 seconds). The slowest shard sets wall-clock time, and the fast shards finish early with nothing to do. You're paying for idle runners.
Duration-based splitting distributes tests by measured execution time so each shard finishes near the same time. Most test frameworks support this natively or via plugins.
```yaml
# Jest 28+ has native sharding by test file
- run: npx jest --shard=${{ matrix.shard }}
# matrix.shard values: 1/3, 2/3, 3/3
# Jest distributes files evenly across shards
```
```yaml
# Generate runtime data from previous runs
- run: |
    bundle exec parallel_test spec/ \
      --group-by runtime \
      --only-group ${{ matrix.shard }} \
      --type rspec
# Reads tmp/parallel_runtime_rspec.log
# Balances groups by actual execution time
```
```yaml
- run: npx playwright test --shard=${{ matrix.shard }}
# In playwright.config.ts:
#   fullyParallel: true
# Distributes individual tests (not just files)
# across shards for even load balancing
```
Even splitting reduces the gap between your fastest and slowest shard. If your slowest shard takes 8 minutes and your fastest takes 2, that's 6 minutes of wasted runner time across shards. Even splitting can reduce that gap to under 1 minute, which means fewer total shards needed for the same wall-clock time.
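As an illustration of the idea (not any framework's actual algorithm), a greedy longest-first split balances shards by recorded duration. The file names and timings below are hypothetical:

```python
def split_by_duration(durations, shards):
    # Greedy longest-processing-time heuristic: assign each test file,
    # slowest first, to whichever shard currently has the least work.
    groups = [[] for _ in range(shards)]
    loads = [0.0] * shards
    for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
        i = loads.index(min(loads))   # lightest shard so far
        groups[i].append(name)
        loads[i] += secs
    return groups, loads

# Hypothetical timings (seconds) recorded from a previous run
timings = {"integration.test": 480, "api.test": 300, "e2e.test": 240,
           "unit_a.test": 60, "unit_b.test": 45, "unit_c.test": 30}
groups, loads = split_by_duration(timings, 3)
print(loads)  # shard loads end up within ~2.5 minutes of each other
```

An alphabetical split of the same files could easily put both 8-minute suites on one shard; duration-aware assignment keeps the slowest shard close to the average.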
Fix 4
Add fail-fast to stop shards early on failure
By default, fail-fast is true in GitHub Actions matrix jobs. But many teams set it to false to get full test results from all shards. This means all 10 shards keep running even when shard 1 fails in the first minute. On a 5-minute run, that's roughly 9 × 4 = 36 minutes of compute across the remaining 9 shards that early cancellation would have avoided.
The tradeoff is real: you lose visibility into failures in the other shards. But for cost optimization, consider keeping fail-fast: true on PR checks and disabling it only for main or nightly runs where you need full failure coverage.
```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1/4, 2/4, 3/4, 4/4]
      fail-fast: ${{ github.event_name == 'pull_request' }}
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx jest --shard=${{ matrix.shard }}
```
With this configuration, PR checks cancel remaining shards on first failure (saving minutes), while pushes to main run all shards to completion (giving you the full picture). On a typical 4-shard setup where failures happen 20% of the time, this saves roughly 3 × 5 min × 0.2 = 3 billed minutes per run on average.
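That expected-savings arithmetic can be sketched as follows; the 20% failure rate and near-immediate cancellation are assumptions:

```python
def expected_savings_min(shards, shard_min, failure_rate):
    # Average billed minutes saved per run by cancelling the other shards
    # as soon as one fails (assumes cancellation happens almost immediately).
    return (shards - 1) * shard_min * failure_rate

print(expected_savings_min(4, 5, 0.2))  # ~3 billed minutes saved per run
```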
Reference
When to shard, when to consolidate
Sharding is worth the cost when the developer-time savings outweigh the additional CI spend. Use this table to calibrate.
| Scenario | Recommendation |
|---|---|
| Tests < 5 min, setup > 2 min | Don't shard. One job is cheaper. |
| Tests 5–15 min, setup 1–3 min | 2–4 shards. Sweet spot. |
| Tests 15–30 min, setup < 2 min | 4–8 shards with shared setup. |
| Tests > 30 min, setup < 1 min | 8–12 shards. Setup tax is low. |
| Tests any length, macOS runners | Fewer shards. 10× rate amplifies waste. |
GitHub limits matrix jobs to 256 per workflow run. Concurrent job limits depend on plan: 20 (Free), 40 (Pro), 60 (Team), 500 (Enterprise). Exceeding concurrency doesn't cause an error. Instead, jobs queue and wait, which adds wall-clock time without reducing cost. For more on this pattern, see the guide on how queue time causes more runs.
Related guides
Build Once, Use Everywhere
Run setup once and share artifacts across jobs to eliminate repeated work.
Matrix Explosion
Tame runaway matrix combinations that multiply jobs and billable minutes.
Reduce CI Setup and Install Overhead
Cut the per-job setup tax that makes over-sharding expensive.
Too Many Small Jobs
When splitting into small jobs increases overhead more than it reduces wall-clock time.