Continuity problems in engineering are invisible until they're catastrophic. The team feels productive, ships steadily, and then a single engineer leaves and three systems become unmaintainable. The early symptoms are subtle but recognizable. If you see four or more of the ten below, your org has a continuity problem you can fix in 60-90 days.
Why continuity goes unnoticed
Engineering orgs measure deploys, incidents, and shipping velocity. None of those degrade gracefully when continuity weakens — they hold steady until they collapse. The leading indicators are different: context concentration, decision opacity, ramp times that quietly extend.
Without monitoring the leading indicators, you discover the problem only when someone resigns.
Symptom 1: questions that bounce around for hours
If "who knows about X?" goes through 3-4 people in Slack before resolution, the team doesn't have clear ownership. Healthy teams answer that question in one message.
Symptom 2: decisions that get reversed by new engineers
A newer engineer makes a change that quietly reverses an older decision because they didn't know it existed. This means decisions aren't queryable, and the org will keep relitigating the same choices forever.
Symptom 3: ramp time extending without explanation
If new hires are taking 12-16 weeks to ramp where they used to take 6-8, you're losing the artifacts that compress learning. Those artifacts are what let each cohort start the climb a little higher up the mountain; without them, every new hire starts from the bottom.
Symptom 4: "only X knows that" said often
Listen for this phrase. Once a week is normal — there are specialists. Three times a week means the bus factor is 1 on multiple surfaces. Five times means the org is fragile.
Symptom 5: incidents resolved without runbook updates
Every incident is a learning event. If runbooks aren't updated after incidents — within the same week — the org is leaking knowledge faster than it accumulates.
Symptom 6: senior engineers spending half their time re-explaining
If your seniors are constantly re-explaining decisions to junior engineers and peers, the knowledge isn't externalized. It lives in one head and gets transmitted by oral history.
Symptom 7: vacation panic
When an engineer's vacation triggers "I hope nothing breaks while they're out," you've identified a single point of failure. Healthy continuity means vacations are routine, not anxiety triggers.
Symptom 8: PRs that sit waiting for one specific reviewer
When multiple engineers' PRs all sit waiting on the same person, that person is the only qualified reviewer for that surface. The reviewer is exhausted; the team is bottlenecked; the surface carries a continuity risk.
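One way to make this symptom measurable is to count how many open PRs are waiting on each requested reviewer. The sketch below is a rough, hypothetical illustration, assuming a GitHub repository, a token in GITHUB_TOKEN, and the standard REST endpoint for listing open pull requests; the owner and repo names are placeholders.

```python
# Hypothetical sketch: count open PRs per requested reviewer to spot
# single-reviewer bottlenecks. Only the first page of results is
# fetched; a real audit would paginate.
import os
from collections import Counter

import requests

OWNER, REPO = "your-org", "your-repo"  # placeholders
TOKEN = os.environ["GITHUB_TOKEN"]

resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
    params={"state": "open", "per_page": 100},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

waiting_on = Counter()
for pr in resp.json():
    for reviewer in pr.get("requested_reviewers", []):
        waiting_on[reviewer["login"]] += 1

# Anyone carrying most of the open review requests is a likely
# continuity risk for the surfaces they review.
for login, count in waiting_on.most_common():
    print(f"{login}: {count} open PRs waiting")
```

If one name dominates the output week after week, that surface needs a second qualified reviewer, not a faster primary.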
Symptom 9: leaders saying "we'll figure it out"
When leadership defers structural decisions about ownership, decision authority, or documentation with "we'll figure it out as we grow," they're betting on continuity holding through scale. That bet usually loses.
Symptom 10: post-departure discovery shocks
An engineer leaves and over the next three months the team discovers things they didn't know they didn't know. This is the late-stage symptom — the diagnosis is already overdue.
What to do once you see the pattern
Three immediate moves:
- Run a bus-factor audit. List every surface and ask: how many engineers could maintain this if the primary left tomorrow? Anywhere the answer is one or zero, prioritize; a minimal sketch of the tally follows this list.
- Build the decision archive. Even backfilled — a quarter's worth of decisions written in the four-field format — is enough to start.
- Pair the context-concentrators with named backups. 90 days of focused pairing transfers most surface knowledge.
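The audit doesn't need tooling to start. The sketch below is a minimal illustration with a hand-written ownership map; the surface names and maintainer lists are placeholders, and in practice the map would come from CODEOWNERS files, a spreadsheet, or whatever ownership records you already keep.

```python
# Minimal bus-factor audit sketch. The surfaces and maintainers here
# are placeholders; substitute your own ownership records.
surfaces = {
    "billing-service": ["alice"],
    "auth-gateway": ["bob", "carol"],
    "data-pipeline": [],
}

# Bus factor = how many engineers could maintain the surface if the
# primary left tomorrow. Zero or one means prioritize pairing there.
for surface, maintainers in sorted(surfaces.items()):
    bus_factor = len(maintainers)
    flag = "PRIORITIZE" if bus_factor <= 1 else "ok"
    print(f"{flag}: {surface} (bus factor {bus_factor})")
```

The point is the list of flagged surfaces, not the script; a spreadsheet column works just as well.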
Common failure modes
Failure: treating continuity as a documentation problem. It's a structural problem with documentation as one output. The structural fixes — ownership maps, decision archives, pairing — produce the documentation as a side effect.
Failure: addressing it only after an incident. A single critical-system departure usually costs more than years of continuity work would. Front-load.
Failure: assuming AI will fix it. AI surfaces what's documented; it can't extract what's only in heads. Build the artifacts first; AI amplifies them, doesn't substitute for them.
What to do tomorrow
Walk the ten symptoms above. Count which apply to your org. If four or more, plan the next 60-90 days around focused continuity work. If fewer, a periodic audit is enough.
Frequently asked questions
How often should we audit continuity?
Quarterly is enough for most orgs. Specifically: bus-factor audit per surface, ramp-time check on the latest cohort, decision archive review. 90 minutes per quarter prevents the worst surprises.
What's the single highest-leverage fix?
Surface ownership with named backups. The map alone doesn't fix continuity, but it identifies what to fix. From the map, the rest of the program follows.
Can a small team have a continuity problem?
Especially a small team. Smaller teams have higher concentration risk per person. A 10-engineer team can have a worse bus factor than a 50-engineer team.