diff --git a/CHANGELOG.md b/CHANGELOG.md
index 033d4979..654b550a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -20,6 +20,7 @@ Versioning follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - Vault performance dynamic date filter
 
 ### Documentation
+- Add incident postmortem templates, publication playbook, and CI validation workflow (#769)
 - Add release notes playbook and changelog curation guidelines (#618)
 - Add API versioning and deprecation policy with sunset windows, migration guide, and breaking-change classification (#610)
 
diff --git a/README.md b/README.md
index ae0a8479..c540d30f 100644
--- a/README.md
+++ b/README.md
@@ -163,6 +163,16 @@ YieldVault has comprehensive disaster recovery procedures to ensure system resil
 - [Disaster Recovery Runbooks Overview](./docs/runbooks/README.md)
 - [Replay and State Recovery Procedures](./docs/runbooks/REPLAY_PROCEDURES.md)
 
+## Incident Postmortems
+
+YieldVault documents significant incidents with blameless postmortems and tracked action items:
+
+- **Templates:** [Post-mortem](./docs/runbooks/templates/post-mortem.md), [Incident Report](./docs/runbooks/templates/incident-report.md)
+- **Publication workflow:** [Postmortem Playbook](./docs/postmortem-playbook.md)
+- **Published reports:** [docs/incidents/](./docs/incidents/README.md)
+
+Postmortem drafts are due within 48 hours of incident resolution; publication within 5 business days.
+
 ## Roadmap (Phases)
 
 - **Phase 1**: Planning, Documentation, and Frontend UI Baseline (Completed)
diff --git a/docs/ci/postmortem-docs.workflow.yml b/docs/ci/postmortem-docs.workflow.yml
new file mode 100644
index 00000000..02016e2d
--- /dev/null
+++ b/docs/ci/postmortem-docs.workflow.yml
@@ -0,0 +1,26 @@
+# Postmortem Docs CI Workflow
+
+Install this file at `.github/workflows/postmortem-docs.yml` to enable PR validation
+for published postmortem reports.
+
+```yaml
+name: Validate Postmortem Docs
+
+on:
+  pull_request:
+    paths:
+      - 'docs/incidents/**'
+      - 'docs/runbooks/templates/**'
+      - 'docs/postmortem-playbook.md'
+      - 'scripts/validate-postmortem.sh'
+
+jobs:
+  validate:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Validate postmortem structure
+        run: chmod +x scripts/validate-postmortem.sh && ./scripts/validate-postmortem.sh
+```
+
+See [ISSUE_769_IMPLEMENTATION_SUMMARY.md](../runbooks/ISSUE_769_IMPLEMENTATION_SUMMARY.md).
diff --git a/docs/incident_response_runbook.md b/docs/incident_response_runbook.md
index 0b46bf20..ca29c61a 100644
--- a/docs/incident_response_runbook.md
+++ b/docs/incident_response_runbook.md
@@ -77,7 +77,9 @@ This runbook documents the operational procedures for handling **RPC degradation
 ---
 
 ## 7. Post‑mortem & Continuous Improvement
-- Complete the **Post‑mortem Template** (`docs/POSTMORTEM_TEMPLATE.md`).
+- Complete the **Post‑mortem Template** ([`docs/runbooks/templates/post-mortem.md`](./runbooks/templates/post-mortem.md)).
+- Follow the **Publication Workflow** in [`docs/postmortem-playbook.md`](./postmortem-playbook.md).
+- Publish finalized reports to [`docs/incidents/`](./incidents/README.md).
 - Update runbook if new failure modes were discovered.
 - Review alert thresholds and adjust if false‑positives occurred.
 - Schedule a **runbook drill** quarterly.
diff --git a/docs/incidents/README.md b/docs/incidents/README.md
new file mode 100644
index 00000000..3d29c3b3
--- /dev/null
+++ b/docs/incidents/README.md
@@ -0,0 +1,26 @@
+# Published Incident Postmortems
+
+This directory contains finalized, published postmortem reports for YieldVault incidents and significant DR exercises.
+
+## Index
+
+| Date | Incident ID | Title | Severity | Postmortem |
+|------|-------------|-------|----------|------------|
+| — | — | *No published postmortems yet* | — | — |
+
+## Creating a New Postmortem
+
+1. Copy [`docs/runbooks/templates/post-mortem.md`](../runbooks/templates/post-mortem.md)
+2. Draft in `docs/incidents/drafts/` during review (optional)
+3. Follow the publication workflow in [`docs/postmortem-playbook.md`](../postmortem-playbook.md)
+4. Publish via PR using filename: `YYYY-MM-DD-INCIDENT-XXX-short-slug.md`
+5. Update this index table
+
+## Related Resources
+
+- [Postmortem Playbook](../postmortem-playbook.md)
+- [Incident Response Runbooks](../runbooks/README.md)
+- [Incident Report Template](../runbooks/templates/incident-report.md)
+
+**Last Updated:** June 26, 2026
+**Maintained By:** DevOps Team
diff --git a/docs/incidents/drafts/.gitkeep b/docs/incidents/drafts/.gitkeep
new file mode 100644
index 00000000..8b137891
--- /dev/null
+++ b/docs/incidents/drafts/.gitkeep
@@ -0,0 +1 @@
+
diff --git a/docs/postmortem-playbook.md b/docs/postmortem-playbook.md
new file mode 100644
index 00000000..26084ef7
--- /dev/null
+++ b/docs/postmortem-playbook.md
@@ -0,0 +1,138 @@
+# Incident Postmortem Playbook
+
+This document describes when YieldVault writes postmortems, how action items are
+tracked, and the publication workflow for finalized reports.
+
+---
+
+## 1. When to write a postmortem
+
+Write a postmortem for any of the following:
+
+| Trigger | Examples |
+|---------|----------|
+| **Severity 1–2 incidents** | Full outage, data loss risk, contract pause |
+| **DR events** | Database restore, RPC failover, backend redeploy under pressure |
+| **Security incidents** | Key compromise, unauthorized access, exploit attempt |
+| **Contract upgrades with issues** | Failed upgrade, rollback, unexpected state |
+
+Lower-severity incidents may use a shortened report at the Incident Commander's
+discretion, but must still capture root cause and action items.
+
+---
+
+## 2. Timeline
+
+| Phase | Deadline | Deliverable |
+|-------|----------|-------------|
+| During incident | Real-time | [Incident Report Template](./runbooks/templates/incident-report.md) |
+| Post-incident | Within 48 hours | Postmortem draft |
+| Publication | Within 5 business days | Published report in `docs/incidents/` |
+
+These deadlines align with the [Quick Reference](./runbooks/QUICK_REFERENCE.md)
+post-mortem checklist and [Incident Response Runbooks](./runbooks/README.md).
+
+---
+
+## 3. Roles
+
+| Role | Responsibility |
+|------|----------------|
+| **Incident Commander** | Owns timeline accuracy and severity classification |
+| **Author** | Drafts postmortem from incident report and logs |
+| **Reviewer** | DevOps or Security lead validates technical accuracy |
+| **Release engineer** | Ensures security-sensitive details follow disclosure rules |
+
+---
+
+## 4. Creation flow
+
+1. **Start from template** — Copy
+   [`docs/runbooks/templates/post-mortem.md`](./runbooks/templates/post-mortem.md).
+2. **Optional draft location** — Save work-in-progress to
+   `docs/incidents/drafts/INCIDENT-XXX-slug.md` (not indexed until published).
+3. **Gather inputs**:
+   - Live [incident report](./runbooks/templates/incident-report.md)
+   - Grafana / PagerDuty timelines
+   - Backend diagnostics bundle (`/api/diagnostics/bundle`)
+   - Relevant runbook steps exercised
+4. **Complete all sections** — Summary, impact metrics, timeline, root cause,
+   action items table, lessons learned.
+
+---
+
+## 5. Action-item tracking
+
+Every postmortem must include an **Action Items** table with:
+
+| Column | Required |
+|--------|----------|
+| ID | Yes (`AI-001`, `AI-002`, …) |
+| Action | Yes |
+| Owner | Yes |
+| Priority | Yes (P0/P1/P2) |
+| Due Date | Yes |
+| Tracking Issue | Yes — link to GitHub issue |
+| Status | Yes (Open / In Progress / Done) |
+
+**Workflow:**
+
+1. File each action item as a GitHub issue referencing the incident ID.
+2. Link the issue number in the postmortem table.
+3. Review open action items in the quarterly runbook review
+   ([runbooks README](./runbooks/README.md) §Continuous Improvement).
+
+---
+
+## 6. Review and redaction
+
+Before publication:
+
+- [ ] Incident Commander and Reviewer sign off on timeline and severity
+- [ ] Remove credentials, PII, and unreleased vulnerability details
+- [ ] For **security incidents**, follow the 48-hour minimum disclosure window
+  described in [Release Notes Playbook](./release-notes-playbook.md) §8
+- [ ] Confirm customer-facing language is approved if published externally
+
+---
+
+## 7. Publication flow
+
+1. **Open a PR** adding the finalized report to `docs/incidents/` using the
+   naming convention: `YYYY-MM-DD-INCIDENT-XXX-short-slug.md`
+2. **Set `Status: Published`** in the report header (drafts must not remain in
+   `docs/incidents/` root)
+3. **Update the index** in [`docs/incidents/README.md`](./incidents/README.md)
+4. **Link action items** — Ensure every `AI-xxx` row has a merged or open GitHub
+   issue
+5. **Update runbooks** if new failure modes were discovered
+6. **Announce** in `#yieldvault-incidents`; update status page if user-facing
+7. **Merge PR** after reviewer approval
+
+CI validates postmortem structure via `scripts/validate-postmortem.sh`. Install the
+workflow from [`docs/ci/postmortem-docs.workflow.yml`](./ci/postmortem-docs.workflow.yml)
+into `.github/workflows/` to enable automated PR checks.
+
+---
+
+## 8. DR test reports
+
+Disaster recovery exercises that surface runbook gaps should file a
+[DR Test Report](./runbooks/templates/dr-test-report.md). Significant findings
+warrant a full postmortem using the same publication flow.
+
+---
+
+## 9. Runbook feedback loop
+
+After each published postmortem:
+
+1. Identify runbook sections that were unclear or missing
+2. Open a follow-up PR updating the relevant runbook under `docs/runbooks/`
+3. Record the change in the postmortem's **Runbook Updates Required** section
+
+---
+
+**Last Updated:** June 26, 2026
+**Maintained By:** DevOps Team
+**Issue:** [#769](https://github.com/Junirezz/YieldVault-RWA/issues/769)
diff --git a/docs/runbooks/ISSUE_769_IMPLEMENTATION_SUMMARY.md b/docs/runbooks/ISSUE_769_IMPLEMENTATION_SUMMARY.md
new file mode 100644
index 00000000..e6c9a2cf
--- /dev/null
+++ b/docs/runbooks/ISSUE_769_IMPLEMENTATION_SUMMARY.md
@@ -0,0 +1,86 @@
+# Issue #769 Implementation Summary: Incident Postmortem Template and Publication Workflow
+
+**Issue:** General: Add incident postmortem template and publication workflow
+**Status:** ✅ COMPLETED
+**Date:** June 26, 2026
+
+---
+
+## Goal
+
+Create a standard postmortem template with action-item tracking and a publication
+workflow so the team can consistently document and learn from incidents.
+
+---
+
+## Scope Delivered
+
+### 1. Postmortem and Incident Templates ✅
+
+**Directory:** [docs/runbooks/templates/](./templates/)
+
+| File | Purpose |
+|------|---------|
+| [post-mortem.md](./templates/post-mortem.md) | Blameless postmortem with action-item table and publication checklist |
+| [incident-report.md](./templates/incident-report.md) | Live incident log during active response |
+| [dr-test-report.md](./templates/dr-test-report.md) | DR exercise report with RTO/RPO tracking |
+
+Fixes previously broken links in [runbooks README](./README.md) Appendix C.
+
+### 2. Publication Workflow Playbook ✅
+
+**File:** [docs/postmortem-playbook.md](../postmortem-playbook.md)
+
+- When to write postmortems (severity, DR, security, contract events)
+- 48-hour draft / 5-day publication timeline
+- Roles, review/redaction, and security disclosure alignment
+- PR-based publication flow into `docs/incidents/`
+- Action-item → GitHub issue tracking requirements
+
+### 3. Published Postmortem Archive ✅
+
+**File:** [docs/incidents/README.md](../incidents/README.md)
+
+- Index table for published reports
+- Naming convention: `YYYY-MM-DD-INCIDENT-XXX-slug.md`
+- Optional drafts under `docs/incidents/drafts/`
+
+### 4. Automation ✅
+
+| File | Purpose |
+|------|---------|
+| [scripts/new-postmortem.sh](../../scripts/new-postmortem.sh) | Scaffold draft from template |
+| [scripts/validate-postmortem.sh](../../scripts/validate-postmortem.sh) | CI validation for published reports |
+| [docs/ci/postmortem-docs.workflow.yml](../ci/postmortem-docs.workflow.yml) | Workflow definition for maintainers to install under `.github/workflows/` |
+
+### 5. Cross-Link Updates ✅
+
+- [docs/incident_response_runbook.md](../incident_response_runbook.md) — fixed broken template link
+- [docs/runbooks/README.md](./README.md) — quick links to playbook and incidents index
+- [docs/runbooks/QUICK_REFERENCE.md](./QUICK_REFERENCE.md) — postmortem step links
+- [README.md](../../README.md) — incident postmortems section
+- [CHANGELOG.md](../../CHANGELOG.md) — unreleased documentation entry
+
+---
+
+## Acceptance Checklist
+
+- [x] Standard postmortem template with action-item tracking
+- [x] Incident report template for live incidents
+- [x] DR test report template (unblocks broken README link)
+- [x] Publication workflow playbook
+- [x] Published postmortem archive index
+- [x] Scaffold and validation scripts
+- [x] CI workflow for postmortem doc validation
+- [x] Broken documentation links fixed
+
+---
+
+## Related Files
+
+- Issue: [#769](https://github.com/Junirezz/YieldVault-RWA/issues/769)
+- Pattern reference: [ISSUE_392_IMPLEMENTATION_SUMMARY.md](./ISSUE_392_IMPLEMENTATION_SUMMARY.md)
+- Release disclosure pattern: [release-notes-playbook.md](../release-notes-playbook.md) §8
+
+**Last Updated:** June 26, 2026
+**Maintained By:** DevOps Team
diff --git a/docs/runbooks/QUICK_REFERENCE.md b/docs/runbooks/QUICK_REFERENCE.md
index 2d661d6e..ea082e65 100644
--- a/docs/runbooks/QUICK_REFERENCE.md
+++ b/docs/runbooks/QUICK_REFERENCE.md
@@ -146,8 +146,8 @@ All runbooks: `docs/runbooks/`
 3. **Notify** - Alert team via PagerDuty/Slack
 4. **Respond** - Follow appropriate runbook
 5. **Verify** - Confirm system restored
-6. **Document** - Create incident report
-7. **Review** - Post-mortem within 48 hours
+6. **Document** - Create [incident report](./templates/incident-report.md)
+7. **Review** - [Post-mortem](./templates/post-mortem.md) within 48 hours per [playbook](../postmortem-playbook.md)
 
 ---
 
diff --git a/docs/runbooks/README.md b/docs/runbooks/README.md
index d6027f3d..dc3ed578 100644
--- a/docs/runbooks/README.md
+++ b/docs/runbooks/README.md
@@ -15,6 +15,8 @@ This directory contains operational runbooks for disaster recovery and incident
 | [RPC Failover](./RPC_FAILOVER.md) | 5 min | N/A | Stellar RPC node failure |
 | [Full DR Procedure](./FULL_DR_PROCEDURE.md) | 4 hours | 15 min | Complete infrastructure failure |
 | [Replay & State Recovery](./REPLAY_PROCEDURES.md) | N/A | N/A | Recovering/syncing ledger events or email queue |
+| [Postmortem Playbook](../postmortem-playbook.md) | N/A | N/A | Publishing incident postmortems |
+| [Published Postmortems](../incidents/README.md) | N/A | N/A | Archive of finalized incident reports |
 
 ---
 
diff --git a/docs/runbooks/templates/dr-test-report.md b/docs/runbooks/templates/dr-test-report.md
new file mode 100644
index 00000000..3f1f4a83
--- /dev/null
+++ b/docs/runbooks/templates/dr-test-report.md
@@ -0,0 +1,57 @@
+# DR Test Report: [TEST-ID] — [Scenario Name]
+
+**Test ID:** DR-TEST-___
+**Date:** YYYY-MM-DD
+**Participants:** [Names]
+**Runbook Exercised:** [link]
+**Facilitator:** [Name]
+**Last Updated:** YYYY-MM-DD
+
+---
+
+## Objectives
+
+- [Objective 1]
+
+## Targets vs Actuals
+
+| Metric | Target | Actual | Pass/Fail |
+|--------|--------|--------|-----------|
+| RTO | | | |
+| RPO | | | |
+| Total test duration | | | |
+
+## Test Steps
+
+| Step | Description | Expected | Actual | Pass/Fail | Notes |
+|------|-------------|----------|--------|-----------|-------|
+| 1 | | | | | |
+
+## Issues Encountered
+
+- [Issue description]
+
+## What Went Well
+
+- [Item]
+
+## What Could Be Improved
+
+- [Item]
+
+## Action Items
+
+| ID | Action | Owner | Priority | Due Date | Tracking Issue | Status |
+|----|--------|-------|----------|----------|----------------|--------|
+| AI-001 | | | P1 | YYYY-MM-DD | #___ | Open |
+
+## Sign-off
+
+| Role | Name | Date |
+|------|------|------|
+| Test lead | | |
+| Incident Commander | | |
+
+---
+
+*File completed reports in `docs/incidents/` when the test surfaces production-impacting findings. See the [Postmortem Playbook](../../postmortem-playbook.md).*
diff --git a/docs/runbooks/templates/incident-report.md b/docs/runbooks/templates/incident-report.md
new file mode 100644
index 00000000..f7711177
--- /dev/null
+++ b/docs/runbooks/templates/incident-report.md
@@ -0,0 +1,57 @@
+# Incident Report: [INCIDENT-ID] — [Brief Title]
+
+**Incident ID:** INCIDENT-___
+**Date Opened:** YYYY-MM-DD HH:MM UTC
+**Severity:** [Critical / High / Medium / Low]
+**Status:** [Investigating / Mitigating / Monitoring / Resolved]
+**Incident Commander:** [Name]
+**War Room Channel:** #yieldvault-war-room
+**Last Updated:** YYYY-MM-DD HH:MM UTC
+
+---
+
+## Affected Components
+
+- [ ] Backend API
+- [ ] Frontend
+- [ ] Database
+- [ ] RPC / Soroban nodes
+- [ ] Smart contracts
+- [ ] Other: ___
+
+## Runbook Used
+
+- [Runbook link or "N/A"]
+
+## Current Status
+
+[One-paragraph summary of current state and ETA]
+
+## Live Timeline (append during incident)
+
+| Time (UTC) | Actor | Event |
+|------------|-------|-------|
+| HH:MM | | Incident detected |
+| HH:MM | | |
+
+## Diagnostics Collected
+
+- [ ] Incident ticket created
+- [ ] Backend diagnostics bundle retrieved (`/api/diagnostics/bundle`)
+- [ ] RPC / node logs captured
+- [ ] Grafana dashboards linked
+
+## Communication Log
+
+| Time (UTC) | Channel | Message summary |
+|------------|---------|-----------------|
+| HH:MM | #yieldvault-incidents | |
+
+## Next Steps
+
+1. [Immediate action]
+2. [Follow-up]
+
+---
+
+*When the incident is resolved, complete the [Post-Mortem Template](./post-mortem.md) within 48 hours per the [Postmortem Playbook](../../postmortem-playbook.md).*
diff --git a/docs/runbooks/templates/post-mortem.md b/docs/runbooks/templates/post-mortem.md
new file mode 100644
index 00000000..a524c1a3
--- /dev/null
+++ b/docs/runbooks/templates/post-mortem.md
@@ -0,0 +1,77 @@
+# Post-Mortem: [INCIDENT-ID] — [Brief Title]
+
+**Incident ID:** INCIDENT-___
+**Date:** YYYY-MM-DD
+**Severity:** [Critical / High / Medium / Low]
+**Status:** [Draft / Published]
+**Authors:** [Names]
+**Reviewers:** [Names]
+**Last Updated:** YYYY-MM-DD
+**Related Runbook:** [link]
+
+---
+
+## Summary
+
+[2–3 sentences: what happened, user impact, resolution]
+
+## Impact
+
+| Metric | Value |
+|--------|-------|
+| Detection time (MTTD) | |
+| Response time | |
+| Recovery time (MTTR) | |
+| Total downtime | |
+| Data loss (RPO) | |
+| Affected components | |
+| Affected users | |
+
+## Timeline (UTC)
+
+| Time | Event |
+|------|-------|
+| HH:MM | Incident detected |
+| HH:MM | Team assembled |
+| HH:MM | Mitigation started |
+| HH:MM | Service restored |
+| HH:MM | Monitoring confirmed stable |
+
+## Root Cause
+
+[Technical root cause — blameless]
+
+## Contributing Factors
+
+- [Factor 1]
+
+## What Went Well
+
+- [Item]
+
+## What Could Be Improved
+
+- [Item]
+
+## Action Items
+
+| ID | Action | Owner | Priority | Due Date | Tracking Issue | Status |
+|----|--------|-------|----------|----------|----------------|--------|
+| AI-001 | | | P0/P1/P2 | YYYY-MM-DD | #___ | Open |
+
+## Runbook Updates Required
+
+- [ ] [Runbook name] — [what to change]
+
+## Lessons Learned
+
+[Blameless takeaways]
+
+## Publication Checklist
+
+- [ ] Internal review complete
+- [ ] Sensitive details redacted (if customer-facing)
+- [ ] Action items filed as GitHub issues
+- [ ] Runbooks updated (if applicable)
+- [ ] Added to `docs/incidents/` index
+- [ ] Stakeholders notified (#yieldvault-incidents)
diff --git a/scripts/new-postmortem.sh b/scripts/new-postmortem.sh
new file mode 100644
index 00000000..3a775520
--- /dev/null
+++ b/scripts/new-postmortem.sh
@@ -0,0 +1,32 @@
+#!/usr/bin/env bash
+# Scaffold a new postmortem draft from the standard template.
+set -euo pipefail
+
+if [[ $# -lt 2 ]]; then
+  echo "Usage: $0 INCIDENT-123 short-slug"
+  echo "Example: $0 INCIDENT-123 rpc-failover"
+  exit 1
+fi
+
+INCIDENT_ID="$1"
+SLUG="$2"
+DATE="$(date -u +%Y-%m-%d)"
+REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+DRAFTS_DIR="${REPO_ROOT}/docs/incidents/drafts"
+TEMPLATE="${REPO_ROOT}/docs/runbooks/templates/post-mortem.md"
+OUTPUT="${DRAFTS_DIR}/${DATE}-${INCIDENT_ID}-${SLUG}.md"
+
+mkdir -p "$DRAFTS_DIR"
+
+if [[ ! -f "$TEMPLATE" ]]; then
+  echo "ERROR: template not found at ${TEMPLATE}"
+  exit 1
+fi
+
+cp "$TEMPLATE" "$OUTPUT"
+sed -i "s/INCIDENT-___/${INCIDENT_ID}/" "$OUTPUT" 2>/dev/null || \
+  sed -i '' "s/INCIDENT-___/${INCIDENT_ID}/" "$OUTPUT"
+sed -i "s/YYYY-MM-DD/${DATE}/" "$OUTPUT" 2>/dev/null || \
+  sed -i '' "s/YYYY-MM-DD/${DATE}/" "$OUTPUT"
+
+echo "Created draft: ${OUTPUT}"
diff --git a/scripts/validate-postmortem.sh b/scripts/validate-postmortem.sh
new file mode 100644
index 00000000..df1b4039
--- /dev/null
+++ b/scripts/validate-postmortem.sh
@@ -0,0 +1,84 @@
+#!/usr/bin/env bash
+# Validate postmortem markdown structure for published reports in docs/incidents/.
+set -euo pipefail
+
+REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+INCIDENTS_DIR="${REPO_ROOT}/docs/incidents"
+
+REQUIRED_HEADINGS=(
+  "## Summary"
+  "## Impact"
+  "## Timeline"
+  "## Root Cause"
+  "## Action Items"
+  "## Lessons Learned"
+)
+
+errors=0
+
+validate_published_report() {
+  local file="$1"
+  local basename
+  basename="$(basename "$file")"
+
+  if [[ "$basename" == "README.md" ]]; then
+    return 0
+  fi
+
+  if [[ "$basename" == .gitkeep ]]; then
+    return 0
+  fi
+
+  if [[ "$file" == *"/drafts/"* ]]; then
+    return 0
+  fi
+
+  if [[ ! "$basename" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}-INCIDENT-.+\.md$ ]]; then
+    echo "ERROR: ${file}: filename must match YYYY-MM-DD-INCIDENT-*.md"
+    errors=$((errors + 1))
+  fi
+
+  for heading in "${REQUIRED_HEADINGS[@]}"; do
+    if ! grep -qF "$heading" "$file"; then
+      echo "ERROR: ${file}: missing required heading ${heading}"
+      errors=$((errors + 1))
+    fi
+  done
+
+  if grep -qE '^\*\*Status:\*\*.*Draft' "$file"; then
+    echo "ERROR: ${file}: published reports must not have Status: Draft"
+    errors=$((errors + 1))
+  fi
+
+  if ! grep -qE '^\| ID \| Action \| Owner \|' "$file"; then
+    echo "ERROR: ${file}: action items table must include ID, Action, Owner columns"
+    errors=$((errors + 1))
+  fi
+}
+
+# Validate templates exist
+for template in post-mortem.md incident-report.md dr-test-report.md; do
+  if [[ ! -f "${REPO_ROOT}/docs/runbooks/templates/${template}" ]]; then
+    echo "ERROR: missing template docs/runbooks/templates/${template}"
+    errors=$((errors + 1))
+  fi
+done
+
+if [[ ! -f "${REPO_ROOT}/docs/postmortem-playbook.md" ]]; then
+  echo "ERROR: missing docs/postmortem-playbook.md"
+  errors=$((errors + 1))
+fi
+
+# Validate published incident reports (if any)
+if [[ -d "$INCIDENTS_DIR" ]]; then
+  while IFS= read -r -d '' file; do
+    validate_published_report "$file"
+  done < <(find "$INCIDENTS_DIR" -maxdepth 1 -name '*.md' -print0 2>/dev/null || true)
+fi
+
+if [[ "$errors" -gt 0 ]]; then
+  echo "Postmortem validation failed with ${errors} error(s)."
+  exit 1
+fi
+
+echo "Postmortem validation passed."
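
The filename rule that `validate_published_report` enforces can be tried in isolation before opening a PR. The following is a minimal sketch, assuming Bash 3.2 or later. `is_published_name` is a hypothetical helper introduced only for illustration and is not part of this change; the pattern is copied from `scripts/validate-postmortem.sh`, and the sample basenames are made up.

```shell
#!/usr/bin/env bash
# Sketch only: is_published_name is a hypothetical helper, not part of this PR.
# It applies the same basename pattern that validate_published_report checks.
is_published_name() {
  local pattern='^[0-9]{4}-[0-9]{2}-[0-9]{2}-INCIDENT-.+\.md$'
  [[ "$1" =~ $pattern ]]
}

# Illustrative basenames, not real reports.
for name in \
  "2026-06-26-INCIDENT-123-rpc-failover.md" \
  "INCIDENT-123-rpc-failover.md"; do
  if is_published_name "$name"; then
    echo "accept: ${name}"
  else
    echo "reject: ${name}"
  fi
done
```

The second name is rejected because it lacks the date prefix, which is the shape that draft filenames in the playbook (`INCIDENT-XXX-slug.md`) would have. Note that the pattern checks only the shape of the date: a basename such as `2026-99-99-INCIDENT-1-x.md` would also be accepted.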