On-Call
Policy
The on-call policy lives in Confluence: Engineering On-Call Policy.
It covers rotation structure, response expectations, handoffs, escalation, and shift swaps.
Datadog
We use Datadog On-Call for scheduling and paging.
Install the Datadog mobile app and enable both push notifications and critical alerts before your first shift.
Runbooks
Runbooks are short, alert-specific response procedures. Each Datadog paging monitor should link to a runbook via its runbook field so it shows up in the page payload.
Author runbooks in this module (on_call/pages/runbooks/<service>-<alert>.adoc) using the template.
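As an illustration of the naming convention above, here is a minimal Python sketch that derives a runbook path from a service name and an alert name. The slug rule (lowercase, hyphen-separated) and the helper names slugify and runbook_path are assumptions made for this example, not part of the documented convention; only the on_call/pages/runbooks/<service>-<alert>.adoc pattern comes from this page.

```python
import re
from pathlib import PurePosixPath

# Directory taken from the convention on this page.
RUNBOOK_DIR = PurePosixPath("on_call/pages/runbooks")


def slugify(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs into hyphens.

    This slug rule is an assumption for illustration only.
    """
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def runbook_path(service: str, alert: str) -> str:
    """Build on_call/pages/runbooks/<service>-<alert>.adoc (hypothetical helper)."""
    service_slug = slugify(service)
    alert_slug = slugify(alert)
    if not service_slug or not alert_slug:
        raise ValueError("service and alert must each contain a letter or digit")
    return str(RUNBOOK_DIR / f"{service_slug}-{alert_slug}.adoc")
```

For a hypothetical service "Payments API" with an alert named "High Error Rate", runbook_path returns on_call/pages/runbooks/payments-api-high-error-rate.adoc.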
Playbooks
Playbooks are step-by-step procedures for specific actions taken in response to incidents (e.g. "turn off floats and collections", "fail over the database"). Runbooks should link to playbooks rather than duplicating the steps.
Author playbooks in on_call/pages/playbooks/<action>.adoc using the template.
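To check that a runbook links to playbooks instead of restating their steps, something like the following sketch could list the playbooks a runbook references. It assumes runbooks use AsciiDoc xref macros of the form xref:playbooks/<action>.adoc[...], optionally prefixed with a module name; the function name linked_playbooks and the sample text are hypothetical.

```python
import re

# Assumed link form: xref:playbooks/<action>.adoc[...] with an optional
# "module:" prefix such as xref:on_call:playbooks/<action>.adoc[...].
PLAYBOOK_XREF = re.compile(r"xref:(?:[\w.-]+:)*playbooks/([\w-]+)\.adoc")


def linked_playbooks(runbook_text: str) -> list[str]:
    """Return the playbook action names referenced in a runbook, in first-seen order."""
    seen: list[str] = []
    for action in PLAYBOOK_XREF.findall(runbook_text):
        if action not in seen:
            seen.append(action)
    return seen


# Hypothetical runbook excerpt used only to demonstrate the helper.
sample = (
    "If replication lag persists, follow "
    "xref:playbooks/fail-over-the-database.adoc[Fail over the database].\n"
    "See also xref:on_call:playbooks/turn-off-floats.adoc[].\n"
)
```

A runbook for which linked_playbooks returns an empty list while containing long numbered procedures may be duplicating steps that belong in a playbook.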