From b841db233c03b1b0a0e47e6369f77f51d37fe047 Mon Sep 17 00:00:00 2001
From: johnlakeee <johnlakee@google.com>
Date: Tue, 2 Jun 2026 14:43:36 -0400
Subject: [PATCH] Create incident_response_runbook_template.md

incident_response_runbook_template.md
---
 .../incident_response_runbook_template.md     | 117 ++++++++++++++++++
 1 file changed, 117 insertions(+)
 create mode 100644 docs/runbooks/incident_response_runbook_template.md

diff --git a/docs/runbooks/incident_response_runbook_template.md b/docs/runbooks/incident_response_runbook_template.md
new file mode 100644
index 000000000..5de375c17
--- /dev/null
+++ b/docs/runbooks/incident_response_runbook_template.md
@@ -0,0 +1,117 @@
+# Incident Response Runbook: [Threat Scenario Title]
+
+## Document Control
+| Attribute | Detail |
+| :--- | :--- |
+| **Runbook ID** | IR-[XYZ]-00[X] |
+| **Last Updated** | YYYY-MM-DD |
+| **Owner** | [Team Name / Role] |
+| **Target SLA** | Triage: [X]m \| Containment: [X]m \| Resolution: [X]h |
+
+## 1. Objective
+[Provide a brief, clear statement of the runbook's objective. What specific threat, vulnerability, or incident type does this address, and what is the ultimate goal of the response?]
+
+## 2. Target Audience & Prerequisites
+**Audience:**
+* [e.g., Security Operations Center (SOC) Analysts (L1/L2/L3)]
+* [e.g., Incident Responders]
+* [e.g., Cloud Infrastructure / Network Security Engineers]
+
+**Prerequisites for Responders:**
+* [List specific IAM Roles required to execute this runbook, e.g., `roles/logging.viewer`]
+* [List specific tool or console access required, e.g., Access Context Manager, Palo Alto Panorama, Kubernetes RBAC access]
+* [Specify any specialized procedures, such as Break-Glass / Emergency Elevation protocols]
+
+## 3. Scope
+[Define the exact scope of this runbook. Specify which environments (e.g., Dev, Staging, FedRAMP High/IL5 Prod landing zones), cloud providers, systems, or architectural components it applies to.]
+
+---
+
+## Phase 1: Identification & Scoping
+
+### 1.1 Detection Sources
+[List the primary tools, telemetry, alerts, and logs that indicate this specific type of incident has occurred.]
+* **[Source/Tool Name, e.g., Security Command Center]**: [Description of the specific finding class, alert ID, or rule name].
+* **[Source/Tool Name, e.g., Cloud Logging]**: [Description of the indicator, specific log sub-type, or automated alert metric].
+* **[Source/Tool Name, e.g., Third-Party EDR/SIEM]**: [Description of indicator].
+
+### 1.2 Initial Assessment & Log Extraction
+[Describe the immediate steps to perform an initial validation and capture the core attributes of the alert.]
+1. **Locate the Primary Log Event:** [Instructions on how to navigate to the source log or alert. Provide a template Log Explorer query if applicable.]
+    * **Log Explorer Query Template:**
+      ```text
+      [Insert reusable log query template here]
+      ```
+2. **Extract Key Details:** Identify and document the following attributes from the raw event payload:
+    * `principalEmail` / `identity`: [Who or what initiated the action]
+    * `callerIp`: [The originating IP address, ASN, or geographic location]
+    * `targetResource`: [The specific resource, asset, database, or bucket targeted]
+    * `methodName` / `action`: [The exact API call or action executed]
+3. **Determine Preliminary Severity:**
+    * **SEV 1 (Critical):** [Define conditions for maximum escalation, e.g., Production impact, data exfiltration, highly privileged account compromise].
+    * **SEV 2 (High):** [Define conditions for high escalation, e.g., Non-prod impact, isolated resource compromise without data loss].
+    * **SEV 3 (Medium/Low):** [Define conditions for standard tracking, e.g., Operational drift, low-risk misconfiguration, confirmed blocked attempt].
+
+---
+
+## Phase 2: Triage and Analysis
+
+**Goal:** Thoroughly verify the extent, impact, and authenticity of the incident to distinguish a true attack from operational misconfiguration.
+
+### 2.1 Attack vs. Misconfiguration Analysis
+[Detailed instructions for correlating historical events, recent change management, or operational context.]
+1. **Check Change Management & IaC Pipelines:** [Instructions to verify if recent automated deployments or approved manual "break-glass" changes caused the alert.]
+2. **Identity & Behavior Correlation:** [Instructions for evaluating if the observed behavior is normal or anomalous for the identity/resource involved (e.g., comparing historical IP ranges, times of operation).]
+
+### 2.2 Impact & Blast Radius Assessment
+[Steps to determine how far the threat actor has penetrated the architecture.]
+1. **Determine Directionality & Scope:** [Instructions to determine if the threat is inbound, outbound (data exfiltration/C2), or lateral (moving between project perimeters).]
+2. **Identify Sensitive Dependencies:** [How to quickly determine if the compromised resource has access to highly sensitive data, cryptographic keys, secrets, or adjacent cloud infrastructure.]
+
+---
+
+## Phase 3: Containment
+
+**Goal:** Stop active data exfiltration, eliminate lateral movement, and sever threat actor access while preserving evidence.
+
+### 3.1 Immediate Containment Actions
+[Provide explicit step-by-step technical instructions or CLI commands to isolate the threat.]
+1. **Isolate the Identity:** [Instructions or commands to revoke active sessions, suspend user accounts, or disable service accounts.]
+    ```bash
+    [Insert emergency CLI command template, e.g., gcloud iam service-accounts disable ...]
+    ```
+2. **Isolate the Infrastructure/Network:** [Instructions or commands to isolate network traffic, apply emergency quarantine firewall rules, or add restrictive network tags.]
+    ```bash
+    [Insert emergency CLI command template, e.g., gcloud compute instances add-tags ...]
+    ```
+3. **Perimeter Controls:** [Instructions for leveraging VPC Service Controls or Cloud Armor to enforce hard borders around the incident zone.]
+
+---
+
+## Phase 4: Eradication and Recovery
+
+**Goal:** Completely eliminate the threat actor's presence, patch the root vulnerability, and securely restore resources to a known good state.
+
+### 4.1 Eradication & Forensic Capture
+1. **Capture Forensic Evidence:** [Instructions for preserving volatile memory, snapshotting persistent disks, or saving container logs before destruction.]
+    ```bash
+    [Insert snapshot or forensic export command template]
+    ```
+2. **Eliminate Persistence Mechanisms:** [Steps to audit and remove backdoors, such as unauthorized IAM policy grants (`SetIamPolicy`), newly created service accounts, rogue SSH keys, or rogue API keys.]
+3. **Remediate the Root Vulnerability:** [Instructions for identifying and patching the entry point (e.g., updating software dependencies, closing open firewall ports).]
+
+### 4.2 Recovery & IaC Alignment
+1. **Reconcile Infrastructure as Code (IaC) State:** [Instructions for verifying the cloud state against GitOps/Terraform configurations. Explain how to securely run plans/applies from a clean CI/CD pipeline to revert manual attacker modifications.]
+2. **Restore Access & Verification:** [Steps to re-enable legitimate access securely (e.g., enforcing new passwords, resetting MFA keys, rotating service account keys).]
+3. **Hyper-Care Monitoring:** [Identify the exact log metrics, dashboards, or security rules to monitor closely for the next 72 hours to confirm the threat actor has not returned.]
+
+---
+
+## Phase 5: Lessons Learned
+
+### 5.1 Post-Incident Review (PIR)
+[Questions and action items to address during the post-mortem with cross-functional teams.]
+1. **Timeline Reconstruction:** Document exact timestamps for Detection, Triage, Containment, Eradication, and Recovery.
+2. **Detection Optimization:** How was the incident detected? Could detection thresholds or logging configurations be improved to catch it earlier?
+3. **Response Optimization:** Which parts of the containment and eradication process were bottlenecked? How can the automation of this runbook be improved?
+4. **Architecture & Hardening:** What long-term preventative controls (e.g., Organization Policies, Service Control Policies, stricter network architecture) should be codified in the base framework to completely eliminate this attack vector?