On-call across timezones either works very well or fails catastrophically. There is no middle ground: either the rotation is designed for asynchrony or the alerts pile up while the responder is asleep. The teams that get it right share five concrete practices.
Design the rotation for full coverage, not heroism
The simplest design that works is follow-the-sun: three shifts of roughly 8 hours each, each placed in a timezone where the responder is awake. Between them, the three on-call engineers cover all 24 hours. Handoff happens when the shift passes to the next zone.
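As a minimal sketch of the follow-the-sun layout, the Python below maps a moment in time to the zone on call and checks a shift table for uncovered hours. The zone labels (APAC, EMEA, AMER) and the UTC boundaries are assumptions chosen for illustration, not a recommendation for any particular team.

```python
from datetime import datetime, timezone

# Hypothetical shift table: (start hour UTC, end hour UTC, zone label).
# The labels and boundaries are illustrative assumptions.
SHIFTS = [
    (0, 8, "APAC"),
    (8, 16, "EMEA"),
    (16, 24, "AMER"),
]


def on_call_zone(now: datetime, shifts=SHIFTS) -> str:
    """Return the zone whose shift covers `now` (must be timezone-aware)."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    hour = now.astimezone(timezone.utc).hour
    for start, end, zone in shifts:
        if start <= hour < end:
            return zone
    raise LookupError(f"no shift covers {hour:02d}:00 UTC")


def coverage_gaps(shifts=SHIFTS) -> list[int]:
    """Return the UTC hours that no shift covers."""
    covered = {h for start, end, _ in shifts for h in range(start, end)}
    return [h for h in range(24) if h not in covered]
```

Calling `coverage_gaps` on a table with only two of the three shifts returns the eight uncovered hours, which makes a missing zone visible before it turns into a 3am page.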
If you have fewer than three zones, you have an alerting problem to solve before you have an on-call problem to solve. Loud, untriaged alerts at 3am will burn out the most resilient engineer in six months.
Write the shift handoff as a record
At the end of each on-call shift, write a structured handoff: what fired, what was resolved, what was suppressed and why, what is still open, what the incoming responder should watch for. Three minutes of writing.
Without this, the incoming responder rediscovers the same false alarm three times in a week.
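One way to keep the handoff structured is to give it a fixed shape. The sketch below models the five items listed above as a small Python dataclass and renders them as plain text. The class name, field names, and output layout are assumptions for illustration; any format that keeps the same five sections would do.

```python
from dataclasses import dataclass, field


@dataclass
class ShiftHandoff:
    """One end-of-shift record; field names are illustrative assumptions."""
    outgoing: str
    incoming: str
    fired: list[str] = field(default_factory=list)
    resolved: list[str] = field(default_factory=list)
    suppressed: dict[str, str] = field(default_factory=dict)  # alert -> reason
    still_open: list[str] = field(default_factory=list)
    watch_for: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the record as plain text for a channel or ticket."""
        def section(title: str, items: list[str]) -> list[str]:
            # An empty section is written as "none" so it is never ambiguous.
            return [f"{title}:"] + [f"  - {item}" for item in (items or ["none"])]

        lines = [f"Handoff: {self.outgoing} -> {self.incoming}"]
        lines += section("Fired", self.fired)
        lines += section("Resolved", self.resolved)
        lines += section(
            "Suppressed",
            [f"{alert} (why: {reason})" for alert, reason in self.suppressed.items()],
        )
        lines += section("Still open", self.still_open)
        lines += section("Watch for", self.watch_for)
        return "\n".join(lines)
```

Recording the reason next to each suppressed alert is the part that stops the incoming responder from re-investigating a known false alarm.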
Keep runbooks current — or stop calling them runbooks
The single biggest failure mode of distributed on-call is stale runbooks. The responder gets paged at 2am their time, opens the runbook, and the first three commands no longer work. That's a 20-minute time tax during an incident and a confidence collapse for the rest of the shift.
The fix: after every incident, the resolver updates the runbook. Not a separate ticket — same-day, before they sign off the shift. If the team can't sustain that, the runbook should be deleted; a wrong runbook is worse than no runbook.
Distinguish wake-me-up from morning-triage alerts
If everything pages at 2am, nothing matters. Two tiers minimum: page-now (active customer impact, security, data loss) and queue-for-morning (degraded but not broken). Tune ruthlessly.
Every page that turns out to be "not actually urgent" gets downgraded the same day. The on-call quality ratchet only works if you tune in real time.
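The two tiers and the same-day downgrade can be sketched as a small routing function. This is an illustration under stated assumptions: the tag spellings, the `route` and `downgrade` names, and the idea of keeping downgrades in an in-memory set are all hypothetical, and a real alerting system would store this in its own rule configuration.

```python
from enum import Enum


class Tier(Enum):
    PAGE_NOW = "page-now"
    QUEUE_FOR_MORNING = "queue-for-morning"


# Conditions the text names as page-worthy; the tag spellings are assumptions.
PAGE_NOW_TAGS = {"customer-impact", "security", "data-loss"}


def route(alert_name: str, tags: set[str], downgraded: set[str]) -> Tier:
    """Pick a tier for an alert, honoring same-day downgrades."""
    if alert_name in downgraded:
        return Tier.QUEUE_FOR_MORNING
    if tags & PAGE_NOW_TAGS:
        return Tier.PAGE_NOW
    return Tier.QUEUE_FOR_MORNING


def downgrade(alert_name: str, downgraded: set[str]) -> None:
    """Record that a page turned out not to be urgent."""
    downgraded.add(alert_name)
```

The default is queue-for-morning, so an alert has to earn its way into the paging tier rather than the other way round.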
Authority during incidents must be explicit
The on-call engineer needs explicit authority to do specific things without escalation: revert deploys, scale services, kill rogue jobs, page secondary responders. Write the authority down. "On-call has authority to revert any deploy from the last 24 hours without manager approval" is the kind of sentence that prevents 40-minute decision delays at 4am.
If something falls outside the on-call's authority, the runbook must name the escalation contact for each timezone.
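Written-down authority can also be expressed as data that a responder, or a chat bot, can check at 4am. The sketch below encodes the four pre-authorized actions and the 24-hour revert window from the example sentence. The action identifiers, the zone labels, and the escalation contact names are hypothetical placeholders.

```python
from typing import Optional

# Actions the on-call may take without escalation, taken from the list above.
PRE_AUTHORIZED = {"revert_deploy", "scale_service", "kill_job", "page_secondary"}
REVERT_WINDOW_HOURS = 24  # from the example sentence in the text

# Hypothetical escalation contacts, one per timezone.
ESCALATION = {"APAC": "apac-lead", "EMEA": "emea-lead", "AMER": "amer-lead"}


def decide(action: str, zone: str, deploy_age_hours: Optional[float] = None) -> str:
    """Return 'proceed' or the escalation contact for the responder's zone."""
    allowed = action in PRE_AUTHORIZED
    if action == "revert_deploy":
        # Reverts are pre-authorized only inside the written-down window.
        allowed = (
            deploy_age_hours is not None
            and deploy_age_hours <= REVERT_WINDOW_HOURS
        )
    if allowed:
        return "proceed"
    return f"escalate to {ESCALATION[zone]}"
```

For example, `decide("revert_deploy", "EMEA", deploy_age_hours=3)` returns `"proceed"`, while the same call with a 30-hour-old deploy returns `"escalate to emea-lead"`.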
Common failure modes
Failure: silent acknowledgments. The responder acks the page and starts working, but never posts in the channel that they're on it. Two timezones later, someone else also responds. Always ack publicly.
Failure: "hero" responders who fix everything alone. The fix gets made; the team learns nothing. Force a 5-minute incident note even for fast resolutions. Otherwise the next shift hits the same fire blind.
Failure: holiday gaps. If a zone's holiday isn't covered, page traffic flows to whoever happens to be online. Plan the coverage swaps a quarter ahead.
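The holiday-gap failure in particular lends itself to a mechanical check. The sketch below lists every zone holiday that has no agreed coverage swap. The data shapes, the zone labels, and the example dates are assumptions made up for illustration.

```python
from datetime import date

# Hypothetical example data: zone -> holiday dates for the coming quarter.
HOLIDAYS = {
    "EMEA": {date(2025, 12, 25), date(2025, 12, 26)},
    "AMER": {date(2025, 11, 27)},
}

# Hypothetical agreed swaps: (zone, day) -> zone that will cover instead.
SWAPS = {("EMEA", date(2025, 12, 25)): "AMER"}


def uncovered_holidays(holidays, swaps):
    """Return sorted (zone, day) pairs with a holiday but no coverage swap."""
    return sorted(
        (zone, day)
        for zone, days in holidays.items()
        for day in days
        if (zone, day) not in swaps
    )
```

Run a quarter ahead, an empty result means every holiday has a named cover; anything else is a list of swaps still to arrange.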
What to do tomorrow
Pull last month's pages. Count: how many were genuine page-worthy emergencies? How many were noise? If more than 30% were noise, the next sprint's most valuable work is alert tuning, not features. The on-call rotation will not survive at current noise levels.
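The count above is simple arithmetic, sketched here in Python. The `genuine` flag on each page is an assumption: it stands for whatever label the team applies when reviewing last month's pages by hand.

```python
NOISE_THRESHOLD = 0.30  # the 30% cut-off from the text


def noise_ratio(pages: list[dict]) -> float:
    """Fraction of pages that were not genuine emergencies."""
    if not pages:
        return 0.0
    noisy = sum(1 for page in pages if not page["genuine"])
    return noisy / len(pages)


def needs_alert_tuning(pages: list[dict]) -> bool:
    """True when noise exceeds the threshold and tuning should lead the sprint."""
    return noise_ratio(pages) > NOISE_THRESHOLD
```

With 10 pages of which 4 were noise, the ratio is 0.4 and `needs_alert_tuning` returns `True`.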
Frequently asked questions
How long should an on-call shift be?
8 hours active per day, one week at a time, in a follow-the-sun rotation. Longer shifts in single timezones produce sleep-deprived responders making bad decisions. Shorter rotations within a week produce handoff overhead.
Should on-call be paid extra?
Yes, by some mechanism — extra pay, time off, or both. On-call without compensation creates resentment and accelerates attrition. The cost is much lower than the cost of a senior engineer leaving.
What if a zone is too small to staff a rotation?
Stretch the adjacent zone's hours, but track it. If a zone is staffing more than 12 active on-call hours per day, it is a hiring problem, not a scheduling problem.
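Tracking stretched hours can be as light as summing the shift table per zone. The sketch below assumes shifts are written as (start hour, end hour, zone) tuples within a single day; that representation and the zone labels are illustrative assumptions.

```python
from collections import defaultdict

MAX_ACTIVE_HOURS = 12  # the limit named in the answer above


def hours_per_zone(shifts) -> dict:
    """Sum active on-call hours per zone for one day."""
    totals = defaultdict(int)
    for start, end, zone in shifts:
        totals[zone] += end - start
    return dict(totals)


def overstretched(shifts) -> list:
    """Return zones staffing more than MAX_ACTIVE_HOURS per day."""
    return sorted(
        zone
        for zone, hours in hours_per_zone(shifts).items()
        if hours > MAX_ACTIVE_HOURS
    )
```

For a two-zone table such as `[(0, 14, "EMEA"), (14, 24, "AMER")]`, `overstretched` returns `["EMEA"]`, flagging the zone where hiring is the real fix.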