Progressive Rollouts in Practice: From 1% to 100% Without Breaking Production

Why 100% rollouts are a trap

Most teams deploy to 100% of users on day one. The reasoning is pragmatic: staging environments do not reproduce production load, beta testers do not behave like real users, and keeping two code paths alive adds complexity.

The problem is that production is the only environment where you find out what you missed. And by the time you know something is wrong, 100% of your users are already affected.

Progressive rollouts flip this tradeoff. You expose a controlled slice of traffic to new code, observe real behaviour under real load, and only proceed when you have confidence.

The rollout ladder

A typical rollout ladder looks like this:

Stage	Audience	Goal
0.1%	Internal team	Catch obvious crashes before any external exposure
1%	Canary users	Validate under real load, detect performance regressions
10%	Early adopters	Collect behavioural signals, validate metrics
50%	Majority	Final check before full exposure
100%	Everyone	Flag becomes the default, code path can be cleaned up

Each stage has a gate: you only advance if your key metrics hold.
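To make the ladder concrete, here is a minimal sketch of it as data plus a gate-driven step function. The `Stage` type, `LADDER` constant, and `nextStage` helper are hypothetical names made up for this illustration, not part of any Signal API.

```typescript
// Hypothetical types for illustration only.
interface Stage {
  percent: number; // share of traffic exposed at this stage
  audience: string;
}

const LADDER: Stage[] = [
  { percent: 0.1, audience: "Internal team" },
  { percent: 1, audience: "Canary users" },
  { percent: 10, audience: "Early adopters" },
  { percent: 50, audience: "Majority" },
  { percent: 100, audience: "Everyone" },
];

// Returns the index of the stage to run next: one rung up if the gate
// passed, otherwise the same rung. The last stage has nowhere to go.
function nextStage(currentIndex: number, gatePassed: boolean): number {
  if (!gatePassed) return currentIndex;
  return Math.min(currentIndex + 1, LADDER.length - 1);
}
```

The point of the shape is that advancing is never automatic: `nextStage` only moves when something else has decided the gate passed.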

What to measure at each stage

The metrics that matter depend on what the feature touches. A checkout flow has different failure modes than a recommendation algorithm. But there are universal signals to watch regardless:

Error rates — the error rate for the 1% cohort should not diverge meaningfully from baseline. If it does, stop.

Latency — watch p95 and p99, not just average. Averages mask tail latency regressions.

Business metrics — conversion rate, retention, revenue per session, depending on what the feature is expected to influence.

Infra load — CPU, memory, database query counts. New features often introduce N+1 queries that are invisible in staging.
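The point about averages masking tail latency is easier to see with numbers. The sketch below uses a nearest-rank percentile and a made-up sample of 100 requests where 2 are very slow; the `percentile` and `mean` helpers and the data are assumptions for illustration.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function mean(samples: number[]): number {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Made-up data: 98 requests at 100 ms, 2 requests at 2000 ms.
const latenciesMs: number[] = [];
for (let i = 0; i < 98; i++) latenciesMs.push(100);
latenciesMs.push(2000, 2000);

const avgMs = mean(latenciesMs);           // 138
const p95Ms = percentile(latenciesMs, 95); // 100
const p99Ms = percentile(latenciesMs, 99); // 2000
```

Against a baseline where every request takes 100 ms, the average moves from 100 to 138 and p95 does not move at all, while p99 jumps twentyfold. That is the regression 2% of users actually feel.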

Deterministic bucketing matters

A rollout is only meaningful if the same user consistently gets the same experience. If a user sees the new checkout on one page load and the old one on the next, your metrics are noise.

Signal uses deterministic hash-based bucketing:

bucket = hash(userId + flagKey) % 100

For a given flag, the same user ID always maps to the same bucket. No sessions, no sticky routing, no server affinity. This works across multiple instances, serverless functions, and horizontal pod scaling.
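As a rough sketch of what hash-based bucketing can look like, the code below uses a 32-bit FNV-1a hash. The choice of FNV-1a, the `:` separator, and the names `fnv1a`, `bucketFor`, and `inRollout` are all assumptions for illustration; the formula above does not say which hash function Signal uses.

```typescript
// 32-bit FNV-1a over UTF-16 code units. An assumption for this sketch,
// not a statement about Signal's internals.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Equivalent to multiplying by the FNV prime 16777619 modulo 2^32,
    // written as shifts to stay within exact integer arithmetic.
    hash += (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Bucket in [0, 100). The ":" separator is an addition in this sketch so
// that ("ab", "c") and ("a", "bc") do not produce the same hash input.
function bucketFor(userId: string, flagKey: string): number {
  return fnv1a(`${userId}:${flagKey}`) % 100;
}

// A user is in the rollout when their bucket falls below the percentage.
function inRollout(userId: string, flagKey: string, percent: number): boolean {
  return bucketFor(userId, flagKey) < percent;
}
```

Because the comparison is `bucket < percent`, raising the percentage only ever adds users; nobody who already had the feature loses it. One consequence of using 100 buckets in this sketch is that the smallest step is 1%, so a 0.1% stage would need finer buckets (for example modulo 1000) or an explicit allowlist for the internal team.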

Combining rollout with targeting

Progressive rollout and targeting rules compose. You can roll out to 10% of users — but only those on the enterprise plan:

// Signal evaluates: user must match targeting rules AND fall within rollout percentage
const enabled = await signal.isEnabled('checkout-v2', {
  userId: user.id,
  plan: user.plan,
  country: user.country,
});

This is useful for gradual exposure within a specific segment — releasing a feature to enterprise customers before free tier, or to a specific region before global rollout.
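Conceptually, the AND in that comment could be implemented along the following lines. The `FlagRule` and `UserContext` shapes and the `isEnabledFor` function are hypothetical; Signal's real rule format is not shown in this article, and the user's bucket is passed in as a number to keep the sketch self-contained.

```typescript
// Hypothetical shapes for illustration only.
interface UserContext {
  userId: string;
  plan: string;
  country: string;
}

interface FlagRule {
  allowedPlans: string[];  // targeting: which plans are eligible
  rolloutPercent: number;  // rollout: share of eligible users exposed
}

// `bucket` is the user's deterministic bucket in [0, 100) for this flag.
function isEnabledFor(rule: FlagRule, user: UserContext, bucket: number): boolean {
  const matchesTargeting = rule.allowedPlans.some((plan) => plan === user.plan);
  const withinRollout = bucket < rule.rolloutPercent;
  return matchesTargeting && withinRollout;
}

const enterpriseTenPercent: FlagRule = { allowedPlans: ["enterprise"], rolloutPercent: 10 };
const sampleUser: UserContext = { userId: "u-1", plan: "enterprise", country: "DE" };

const exposed = isEnabledFor(enterpriseTenPercent, sampleUser, 5);   // true
const heldBack = isEnabledFor(enterpriseTenPercent, sampleUser, 10); // false
```

Under this reading, the percentage applies within the targeted segment: a free-tier user is excluded regardless of bucket, and an enterprise user is included only if their bucket is below 10.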

When to pause (and when to roll back)

Pausing is not failure. It is the rollout process working as intended.

Roll back immediately if:

Error rate increases by more than 1-2% relative to baseline
p99 latency rises beyond a threshold you agreed on before the rollout started
A critical business metric drops in a statistically significant way

Pause and investigate if:

Metrics are trending in the wrong direction but the change is small
You see anomalous patterns in logs that are not yet reflected in aggregates
Infrastructure costs are higher than expected

Signal's rollback is a flag toggle — it takes effect in under 50ms across all connected instances. No redeploy, no incident bridge, no rollback commit.
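The two lists above can be folded into a small decision function. This is a sketch under assumptions: the threshold values are placeholders rather than recommendations, "relative to baseline" is read as a relative increase, and the statistical test for business metrics is left out. All names here are hypothetical.

```typescript
type Decision = "proceed" | "pause" | "rollback";

interface Snapshot {
  errorRate: number;    // fraction of failed requests, e.g. 0.01
  p99LatencyMs: number;
}

// Relative increases over baseline, e.g. 0.02 means 2% above baseline.
interface Thresholds {
  rollbackErrorIncrease: number;
  rollbackP99Increase: number;
  pauseIncrease: number; // smaller drift that warrants a closer look
}

function relativeIncrease(baseline: number, current: number): number {
  if (baseline === 0) return current === 0 ? 0 : Infinity;
  return (current - baseline) / baseline;
}

function decide(baseline: Snapshot, canary: Snapshot, t: Thresholds): Decision {
  const errUp = relativeIncrease(baseline.errorRate, canary.errorRate);
  const p99Up = relativeIncrease(baseline.p99LatencyMs, canary.p99LatencyMs);
  if (errUp > t.rollbackErrorIncrease || p99Up > t.rollbackP99Increase) {
    return "rollback";
  }
  // Worse than baseline but under the hard limits: hold the stage.
  if (errUp > t.pauseIncrease || p99Up > t.pauseIncrease) {
    return "pause";
  }
  return "proceed";
}
```

Writing the thresholds down as data before the rollout starts is the useful part; it turns "increases significantly" into a decision nobody has to argue about mid-incident.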

Cleaning up after you ship

Once a feature reaches 100% and has been stable for a release cycle, the flag is dead code. Remove it.

Dead flags accumulate fast. A codebase with hundreds of permanent flags becomes unreadable. The discipline of treating flag removal as part of the feature cycle is as important as the rollout strategy itself.

A clean audit log of flag usage would tell you which flags are still being evaluated, which have not been triggered in 30 days, and which are safe to delete. This is on the Signal roadmap.
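Until something like that exists, the same question can be answered from whatever evaluation timestamps you already collect. The sketch below assumes a hypothetical `FlagUsage` record; since the audit log is described as a roadmap item, nothing here reflects an existing Signal API.

```typescript
// Hypothetical record shape for illustration only.
interface FlagUsage {
  key: string;
  rolloutPercent: number;
  lastEvaluatedAt: Date | null; // null means never evaluated
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Flags that have not been evaluated within the last `maxIdleDays` days.
function staleFlags(flags: FlagUsage[], now: Date, maxIdleDays: number = 30): string[] {
  return flags
    .filter(
      (f) =>
        f.lastEvaluatedAt === null ||
        now.getTime() - f.lastEvaluatedAt.getTime() > maxIdleDays * DAY_MS
    )
    .map((f) => f.key);
}

// Flags at 100%: candidates for deleting the conditional from the code.
function fullyRolledOut(flags: FlagUsage[]): string[] {
  return flags.filter((f) => f.rolloutPercent >= 100).map((f) => f.key);
}
```

A flag that appears in either list is worth a ticket. Idle flags are likely dead code paths, and flags at 100% are conditionals that no longer decide anything.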