Runbook: <Datadog alert name or symptom>

Datadog Resources

Monitor: TODO
Dashboard: TODO
Logs: TODO
APM: TODO

Symptoms

How this manifests: what the alert says, what dashboards show.

Likely Causes

1. Most common
2. Next most common

Diagnosis

Step-by-step checks with exact queries / commands.

Mitigation / Resolution

Steps to stop the bleeding and resolve. Prefer safe, reversible actions first (feature flags, restarts) before code changes.

Escalation

When to page secondary / notify #incidents / tag a team lead.