Over-parallelized test suites: when sharding increases CI cost
By Keith Mazanec, Founder, CostOps · Updated January 31, 2026
A team splits their test suite into 10 parallel shards to get faster feedback. Each shard spends 3 minutes on setup (checkout, install, build) and 90 seconds running tests. Total billable minutes jump from 18 to 50. Wall-clock time drops, but the bill nearly triples. This happens because every shard is a separate GitHub Actions job, every job pays the full setup tax, and every job's time is rounded up to the next billable minute. Past a certain point, adding shards costs more than it saves. Reducing that per-job setup and install overhead is a key part of the solution.
Symptoms
How to tell if test sharding is costing you more
Open your workflow's job list and compare setup time to test time in each shard. If setup dominates, you have too many shards.
- Setup dominates shard runtime. Each shard spends more time on checkout, dependency install, and build than on actual test execution. A shard that runs 3 minutes of setup and 90 seconds of tests is 67% overhead.
- Total billable minutes increase with shard count. You added more shards to speed things up, but total minutes went up. Going from 4 shards to 8 didn't halve anything; it just doubled the setup repetitions. GitHub bills each job separately, rounded up to the next minute.
- Per-minute rounding inflates short shards. GitHub rounds every job up to the nearest minute. A shard that takes 4 minutes 10 seconds is billed as 5 minutes. With 10 shards, that rounding tax adds several extra billed minutes per run.
- Uneven shard distribution. One shard finishes in 2 minutes while another takes 8. Static splitting by file count or name doesn't account for test duration. The slowest shard sets wall-clock time, and the fast shards waste their extra capacity.
Metrics
The math behind over-sharding
Every shard repeats the same setup work. The formula is: total minutes = shards × (setup + tests/shards) = shards × setup + total test time. The test term is constant no matter how you split it, so every added shard adds one full setup cost; at high shard counts the bill is dominated by setup, not tests, and per-minute rounding on each job pushes it higher still.
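As a quick sketch of that formula, with GitHub's round-up-to-the-minute billing folded in, using this guide's hypothetical 3-minute setup and 15-minute suite:

```python
import math

def billed_minutes(shards, setup_min, total_test_min):
    # Each shard is a separate job; GitHub rounds every job
    # up to the next whole minute before billing.
    per_shard = setup_min + total_test_min / shards
    return shards * math.ceil(per_shard)

# 3 min setup per job, 15 min of tests total (the example suite)
for n in (1, 3, 5, 10):
    print(n, "shards ->", billed_minutes(n, 3, 15), "billed minutes")
# The test term stays fixed at 15 minutes; the setup term grows
# linearly, so every added shard costs another full setup.
```

At 10 shards, rounding alone turns a 4.5-minute shard into a 5-minute bill.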
10 shards (over-parallelized): 10 shards × 5 min each = 50 min/run · $0.006/min
3 shards (right-sized): 3 shards × 8 min each = 24 min/run · $0.006/min
Save $20.59/mo · $247/year · per workflow
That's on Linux at $0.006/min. On macOS at $0.062/min, the 10-shard configuration costs $409/mo vs $196/mo for 3 shards, producing a $213/mo savings from reducing shard count alone. The wall-clock difference between 3 shards and 10 is often under 2 minutes; the cost difference is not.
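A back-of-envelope check of those monthly figures. The run count is not stated above; roughly 132 runs/month is inferred from the quoted $20.59 savings, so treat it as an assumption:

```python
runs_per_month = 132                     # inferred from the quoted savings, not stated
linux_rate, macos_rate = 0.006, 0.062    # $/min, GitHub-hosted runner rates
ten_shard, three_shard = 50, 24          # billed minutes per run

linux_savings = runs_per_month * (ten_shard - three_shard) * linux_rate
macos_10 = runs_per_month * ten_shard * macos_rate
macos_3 = runs_per_month * three_shard * macos_rate
print(f"Linux savings: ${linux_savings:.2f}/mo")          # ≈ $20.59
print(f"macOS: ${macos_10:.0f}/mo vs ${macos_3:.0f}/mo")  # ≈ $409 vs $196
```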
Fix 1
Right-size your shard count
The optimal shard count is where test time per shard is at least 2–3× setup time. If setup takes 3 minutes and total test time is 15 minutes, 3 shards give you 5 minutes of tests per shard (1.7× setup). Going to 5 shards drops test time to 3 minutes per shard (1× setup), which is the crossover point where you're paying more in setup than you're saving in parallelism.
Use the matrix strategy with a controlled shard count. Resist the temptation to match your shard count to your concurrency limit.
Before (over-parallelized):

```yaml
strategy:
  matrix:
    shard: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  fail-fast: false
```

After (right-sized):

```yaml
strategy:
  matrix:
    shard: [1, 2, 3]
  fail-fast: false
```
To find the right number, measure your setup time (checkout + install + build) and total test time. Divide test time by setup time to get a rough upper bound on useful shards. A test suite that takes 15 minutes with 3 minutes of setup should use at most 5 shards. In practice, 3–4 is better because of per-minute rounding.
| Shards | Wall clock | Total billed | Cost/run |
|---|---|---|---|
| 1 (no sharding) | 18 min | 18 min | $0.108 |
| 3 | 8 min | 24 min | $0.144 |
| 5 | 6 min | 30 min | $0.180 |
| 10 | 5 min | 50 min | $0.300 |
Going from 1 to 3 shards cuts wall time by 56% for a 33% cost increase, which is a good trade. Going from 3 to 10 only cuts wall time by another 3 minutes but doubles the cost. The sweet spot is usually 3–5 shards for a 15-minute test suite.
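The trade-off in that table can be checked with a small script (same hypothetical numbers: 3 minutes of setup, 15 minutes of tests, the Linux rate):

```python
import math

SETUP, TESTS, RATE = 3, 15, 0.006   # minutes, minutes, $/min (Linux)

def wall_and_billed(shards):
    per_shard = math.ceil(SETUP + TESTS / shards)  # each job rounds up
    return per_shard, shards * per_shard           # wall clock = one shard

for a, b in ((1, 3), (3, 10)):
    (wall_a, bill_a), (wall_b, bill_b) = wall_and_billed(a), wall_and_billed(b)
    print(f"{a} -> {b} shards: wall -{wall_a - wall_b} min, "
          f"billed +{bill_b - bill_a} min (+${(bill_b - bill_a) * RATE:.2f}/run)")
```

Going from 1 to 3 shards buys 10 wall-clock minutes for 6 extra billed minutes; going from 3 to 10 buys only 3 more for 26 extra.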
Fix 2
Share setup work across shards with artifacts
The biggest waste in over-sharding is repeating the same setup in every shard. If each of your 10 shards runs npm ci and npm run build, you're doing that work 10 times. Instead, run setup once in a dedicated job and pass the result to shards via artifacts or cache. This is the build once, use everywhere pattern.
This pattern uses actions/upload-artifact and actions/download-artifact to share the installed node_modules and build output. The shards then skip install and build entirely.
```yaml
jobs:
  setup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: |
            node_modules
            dist
          retention-days: 1
  test:
    needs: setup
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1/4, 2/4, 3/4, 4/4]
      fail-fast: false
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: build-output
      # No npm ci, no npm run build - already done in the setup job
      - run: npx jest --shard=${{ matrix.shard }}
```
With this pattern, setup runs once (3 minutes) and each shard only needs checkout + artifact download (under 1 minute). Four shards that previously cost 4 × (3 + 4) = 28 min now cost 3 + 4 × (1 + 4) = 23 min. At 10 shards: 10 × 5 = 50 min becomes 3 + 10 × 2.5 = 28 min, a 44% reduction.
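The 4-shard arithmetic above can be sketched as follows; the ~1 minute of per-shard overhead for checkout plus artifact download is the text's assumption:

```python
def naive_cost(shards, setup_min, tests_per_shard_min):
    # Every shard repeats the full setup.
    return shards * (setup_min + tests_per_shard_min)

def shared_cost(shards, setup_min, tests_per_shard_min, shard_overhead_min=1):
    # One dedicated setup job, then each shard pays only
    # checkout + artifact download (~1 min assumed).
    return setup_min + shards * (shard_overhead_min + tests_per_shard_min)

print(naive_cost(4, 3, 4), "->", shared_cost(4, 3, 4))  # 28 -> 23 minutes
```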
One caveat: artifact upload/download has its own overhead. For small projects where setup is just npm ci with cached dependencies (under 30 seconds), the artifact round-trip may not save anything. This pattern pays off when setup exceeds 2 minutes.
Fix 3
Use duration-based test splitting
Static sharding (splitting by file count or alphabetically) creates uneven shards. One shard gets all the integration tests (8 minutes), another gets unit tests (45 seconds). The slowest shard sets wall-clock time, and the fast shards finish early with nothing to do. You're paying for idle runners.
Duration-based splitting distributes tests by measured execution time so each shard finishes near the same time. Most test frameworks support this natively or via plugins.
```yaml
# Jest 28+ has native sharding by test file
- run: npx jest --shard=${{ matrix.shard }}
# matrix.shard values: 1/3, 2/3, 3/3
# Jest distributes files evenly across shards
```
```yaml
# Generate runtime data from previous runs
- run: |
    bundle exec parallel_test spec/ \
      --group-by runtime \
      --only-group ${{ matrix.shard }} \
      --type rspec
# Reads tmp/parallel_runtime_rspec.log
# Balances groups by actual execution time
```
```yaml
- run: npx playwright test --shard=${{ matrix.shard }}
# In playwright.config.ts:
#   fullyParallel: true
# Distributes individual tests (not just files)
# across shards for even load balancing
```
Even splitting reduces the gap between your fastest and slowest shard. If your slowest shard takes 8 minutes and your fastest takes 2, that's 6 minutes of wasted runner time across shards. Even splitting can reduce that gap to under 1 minute, which means fewer total shards needed for the same wall-clock time.
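As an illustration of the idea (not any framework's actual algorithm), a greedy longest-first split balances shards by recorded duration. The file names and timings below are hypothetical:

```python
def split_by_duration(durations, shards):
    # Greedy longest-processing-time heuristic: assign each test file,
    # slowest first, to whichever shard currently has the least work.
    groups = [[] for _ in range(shards)]
    loads = [0.0] * shards
    for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
        i = loads.index(min(loads))   # lightest shard so far
        groups[i].append(name)
        loads[i] += secs
    return groups, loads

# Hypothetical timings (seconds) recorded from a previous run
timings = {"integration.test": 480, "api.test": 300, "e2e.test": 240,
           "unit_a.test": 60, "unit_b.test": 45, "unit_c.test": 30}
groups, loads = split_by_duration(timings, 3)
print(loads)  # shard loads end up within ~2.5 minutes of each other
```

An alphabetical split of the same files could easily put both 8-minute suites on one shard; duration-aware assignment keeps the slowest shard close to the average.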
Fix 4
Add fail-fast to stop shards early on failure
By default, fail-fast is true in GitHub Actions matrix jobs. But many teams set it to false to get full test results from all shards. This means all 10 shards keep running even when shard 1 fails in the first minute. On a 5-minute run, that's roughly 9 × 4 = 36 minutes of compute across the remaining 9 shards that early cancellation would have avoided.
The tradeoff is real: you lose visibility into failures in the other shards. But for cost optimization, consider keeping fail-fast: true on PR checks and disabling it only for main or nightly runs where you need full failure coverage.
```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1/4, 2/4, 3/4, 4/4]
      fail-fast: ${{ github.event_name == 'pull_request' }}
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx jest --shard=${{ matrix.shard }}
```
With this configuration, PR checks cancel remaining shards on first failure (saving minutes), while pushes to main run all shards to completion (giving you the full picture). On a typical 4-shard setup where failures happen 20% of the time, this saves roughly 3 × 5 min × 0.2 = 3 billed minutes per run on average.
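That expected-savings arithmetic can be sketched as follows; the 20% failure rate and near-immediate cancellation are assumptions:

```python
def expected_savings_min(shards, shard_min, failure_rate):
    # Average billed minutes saved per run by cancelling the other shards
    # as soon as one fails (assumes cancellation happens almost immediately).
    return (shards - 1) * shard_min * failure_rate

print(expected_savings_min(4, 5, 0.2))  # ~3 billed minutes saved per run
```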
Reference
When to shard, when to consolidate
Sharding is worth the cost when the developer-time savings outweigh the additional CI spend. Use this table to calibrate.
| Scenario | Recommendation |
|---|---|
| Tests < 5 min, setup > 2 min | Don't shard. One job is cheaper. |
| Tests 5–15 min, setup 1–3 min | 2–4 shards. Sweet spot. |
| Tests 15–30 min, setup < 2 min | 4–8 shards with shared setup. |
| Tests > 30 min, setup < 1 min | 8–12 shards. Setup tax is low. |
| Tests any length, macOS runners | Fewer shards. 10× rate amplifies waste. |
GitHub limits matrix jobs to 256 per workflow run. Concurrent job limits depend on plan: 20 (Free), 40 (Pro), 60 (Team), 500 (Enterprise). Exceeding concurrency doesn't cause an error. Instead, jobs queue and wait, which adds wall-clock time without reducing cost. For more on this pattern, see the guide on how queue time causes more runs.
Related guides
Build Once, Use Everywhere
Run setup once and share artifacts across jobs to eliminate repeated work.
Matrix Explosion
Tame runaway matrix combinations that multiply jobs and billable minutes.
Reduce CI Setup and Install Overhead
Cut the per-job setup tax that makes over-sharding expensive.
Too Many Small Jobs
When splitting into small jobs increases overhead more than it reduces wall-clock time.