Sample Deliverable

Incident Triage Report (Public Example)

This is the report format clients receive during active coverage. Scenario below is anonymized but technically realistic.

Generated format for the $39/month Monthly Incident Guard pilot. Payment link is active.

1. Incident Snapshot

Window: 2026-02-18 03:22-04:11 UTC
Primary symptom: 504 spikes on POST /api/bounties
Impact: 31% request failure during deploy window
Boundary signal: proxy timed out before upstream app response

2. Ranked Hypotheses

Upstream app workers saturated after migration lock (highest likelihood).
Connection pool exhausted because stale transactions were not released.
Ingress timeout lower than app timeout, causing premature 504 at proxy.

3. Verification Commands

Run in this exact order and capture UTC timestamps with each output:

# Proxy 5xx concentration by minute
awk '$9 ~ /^50[24]$/ {print substr($4,2,17)}' access.log | sort | uniq -c | tail -n 20

# App worker and event loop pressure
curl -sS http://127.0.0.1:3000/health
curl -sS http://127.0.0.1:3000/metrics | grep -E 'event_loop|active_requests|db_pool'

# Database lock and wait inspection
psql "$DATABASE_URL" -c "select now(), wait_event_type, wait_event, state, query from pg_stat_activity where state <> 'idle' order by query_start asc limit 20;"

4. Safe Fix Sequence

Freeze deploys and autoscaling changes for 30 minutes.
Raise proxy read timeout from 30s to 75s only for affected route.
Restart one app replica at a time; confirm error-rate drop before next restart.
Kill long-running DB sessions older than 120s tied to failed migration.
Re-run synthetic checks and compare p95 latency versus pre-incident baseline.

5. Rollback Checkpoints

If 5xx rate does not drop below 3% in 10 minutes, revert to previous app image.
If DB wait events remain above baseline for 15 minutes, rollback latest schema change.
If error budget burn exceeds 5%/hour, disable non-critical background workers.

6. 7-Day Hardening Plan

Add route-level timeout budget matrix (proxy/app/db) to release checklist.
Ship health endpoint contract test in CI for dependency readiness checks.
Add deploy canary gate: block full rollout when canary 5xx exceeds threshold.
Publish weekly reliability report with risk rank, owner, and due date per fix.

Need this report format for your active outage? Start at the outage fast path.