
Commit f9fae95

Tags as primitive, alert enrichment, budget enforcement, RDS module (#77)
## Summary

Foundation changes to make tags a first-class primitive for ABAC, cost attribution, and auditability. Plus alert enrichment, budget enforcement, and RDS support.

### Tags as primitive

- **Tag schema**: 5 static tags (team, service, repo, environment, managed-by) via provider `default_tags`. Dropped `project` tag (always "javabin", zero information).
- **Resource tagger Lambda**: EventBridge-triggered (wildcard `{"prefix": "aws."}` match), auto-tags `created-by` + `commit` from CloudTrail session names. Tags added via AWS API, invisible to Terraform, no drift.
- **Cost allocation tags**: Activated for all 7 tag keys so Cost Explorer can group by team/service.
- **ECS tag propagation**: `propagate_tags = SERVICE` so Fargate task costs are attributed to teams.

### Alert enrichment (Task A)

- CI session names changed to `{actor}-{sha8}-{run_id}` in all 4 workflows.
- `slack_alert` `parse_identity()` extracts actor/commit from new format.
- Cost reports (daily + weekly) include per-team tag breakdown.

### Budget enforcement (Task D)

- New `budget-enforcer` Lambda: scales ECS services to `desired_count=0` at 200% budget.
- `team_provisioner` adds 200% notification alongside existing 80%.

### RDS module (Task E)

- New `service-rds` module: PostgreSQL with Secrets Manager password, private subnets, ECS SG ingress.
- `registry.py` + `expand-modules.py` updated with engine-based routing (`postgres` vs `dynamodb`).

### IAM restructure

- Team deny policy: ABAC (ARN-scoped) where AWS supports tags (SNS, S3, ECS, ELB). Explicit denies only where AWS lacks tag conditions (EC2 VPC, GuardDuty, SecurityHub, Config, CloudTrail, Organizations, IAM).
- `service-role` module: configurable `trusted_services` (ECS/EC2/Lambda).
- EventBridge resource-tagger uses wildcard; monitoring rules keep curated lists (documented volume rationale).
## Test plan

- [ ] `terraform plan` shows tag migration (project removed, service+repo added) on existing resources
- [ ] After apply: resources show 5 Terraform-managed tags in console
- [ ] Trigger CI run → Slack alert shows actor name + commit link
- [ ] Next day: Cost Explorer GroupBy `team` returns per-team costs
- [ ] Create test resource → resource-tagger tags it within 15 min
- [ ] Invoke budget-enforcer with test payload → ECS service scales to 0
- [ ] Test app with `engine: postgres` in app.yaml → RDS created in private subnet
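The Slack alert item in the test plan depends on recovering the actor and commit from the `{actor}-{sha8}-{run_id}` session name. The commit names `parse_identity()` but this page does not show its body, so the following is only a sketch of one way such parsing could work. The `Identity` type and the regular expression are assumptions, not the author's code. The pattern is anchored at the end of the string because GitHub actor names may themselves contain hyphens.

```python
import re
from typing import NamedTuple, Optional


class Identity(NamedTuple):
    """Hypothetical container for the fields carried in a CI session name."""
    actor: str
    commit: str
    run_id: str


# Anchored at the end: the last two hyphen-separated fields must be an
# 8-character hex SHA prefix and a numeric run ID. Everything before them
# is treated as the actor, so hyphens inside the actor name survive.
_SESSION_RE = re.compile(r"^(?P<actor>.+)-(?P<commit>[0-9a-f]{8})-(?P<run_id>\d+)$")


def parse_identity(session_name: str) -> Optional[Identity]:
    """Return the parsed identity, or None for session names in another format."""
    match = _SESSION_RE.match(session_name)
    if match is None:
        return None
    return Identity(match["actor"], match["commit"], match["run_id"])
```

Returning `None` for unrecognised names keeps older session names such as `javabin-apply-{run_id}` from being misread as an actor and commit.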
1 parent 3a7d504 commit f9fae95

31 files changed

Lines changed: 1355 additions & 87 deletions


.github/workflows/docker-build.yml

Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -39,6 +39,9 @@ jobs:
3939
image_uri: ${{ steps.push.outputs.image_uri }}
4040
image_tag: ${{ steps.tags.outputs.primary_tag }}
4141
steps:
42+
- name: Set session name
43+
run: echo "SESSION_NAME=$(echo "${GITHUB_ACTOR}-${GITHUB_SHA:0:8}-${GITHUB_RUN_ID}" | head -c 64)" >> "$GITHUB_ENV"
44+
4245
- uses: actions/checkout@v6
4346

4447
- name: Generate GitHub App token
@@ -62,6 +65,7 @@ jobs:
6265
with:
6366
role-to-assume: arn:aws:iam::${{ inputs.aws_account_id }}:role/javabin-ci-app-broker
6467
aws-region: ${{ inputs.aws_region }}
68+
role-session-name: ${{ env.SESSION_NAME }}
6569

6670
- name: Get deploy credentials from broker
6771
id: broker
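The `Set session name` step above builds `{actor}-{sha8}-{run_id}` in shell and cuts the result at 64 characters with `head -c 64`. STS limits `RoleSessionName` to 64 characters drawn from `[\w+=,.@-]`, so the length cap matches the service limit. The sketch below is a Python restatement of the same idea with two variations that are not part of this commit: it replaces characters outside the allowed set (bot actors such as `dependabot[bot]` contain brackets), and it trims the actor rather than the tail so the SHA and run ID always survive. The function name is hypothetical, and whether the `configure-aws-credentials` action already sanitizes the value is not visible in this diff.

```python
import re

MAX_SESSION_NAME_LENGTH = 64  # STS limit for RoleSessionName

# Characters outside the set STS accepts for RoleSessionName.
_DISALLOWED = re.compile(r"[^\w+=,.@-]", re.ASCII)


def build_session_name(actor: str, sha: str, run_id: str) -> str:
    """Build '{actor}-{sha8}-{run_id}', keeping the suffix intact when trimming."""
    suffix = f"-{sha[:8]}-{run_id}"
    safe_actor = _DISALLOWED.sub("-", actor)
    # Trim the actor, not the suffix, so commit and run ID stay parseable.
    room = max(MAX_SESSION_NAME_LENGTH - len(suffix), 0)
    return safe_actor[:room] + suffix
```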

.github/workflows/ecs-deploy.yml

Lines changed: 4 additions & 0 deletions
@@ -33,6 +33,9 @@ jobs:
3333
name: ECS Deploy
3434
runs-on: ubuntu-latest
3535
steps:
36+
- name: Set session name
37+
run: echo "SESSION_NAME=$(echo "${GITHUB_ACTOR}-${GITHUB_SHA:0:8}-${GITHUB_RUN_ID}" | head -c 64)" >> "$GITHUB_ENV"
38+
3639
- name: Generate GitHub App token
3740
id: app-token
3841
uses: actions/create-github-app-token@v2
@@ -55,6 +58,7 @@ jobs:
5558
with:
5659
role-to-assume: arn:aws:iam::${{ inputs.aws_account_id }}:role/javabin-ci-app-broker
5760
aws-region: ${{ inputs.aws_region }}
61+
role-session-name: ${{ env.SESSION_NAME }}
5862

5963
- name: Get deploy credentials from broker
6064
id: broker

.github/workflows/platform-ci.yml

Lines changed: 12 additions & 3 deletions
@@ -44,6 +44,9 @@ jobs:
4444
plan_sha256: ${{ steps.upload.outputs.plan_sha256 }}
4545
risk_level: ${{ steps.review.outputs.risk_level }}
4646
steps:
47+
- name: Set session name
48+
run: echo "SESSION_NAME=$(echo "${GITHUB_ACTOR}-${GITHUB_SHA:0:8}-${GITHUB_RUN_ID}" | head -c 64)" >> "$GITHUB_ENV"
49+
4750
- uses: actions/checkout@v6
4851
with:
4952
fetch-depth: 0
@@ -71,7 +74,7 @@ jobs:
7174
with:
7275
role-to-assume: arn:aws:iam::${{ env.AWS_ACCOUNT_ID }}:role/javabin-ci-infra-plan
7376
aws-region: ${{ env.AWS_REGION }}
74-
role-session-name: javabin-platform-plan-${{ github.run_id }}
77+
role-session-name: ${{ env.SESSION_NAME }}
7578

7679
- name: Sync registered teams from GitHub org
7780
if: steps.changes.outputs.has_infra_changes == 'true' && github.ref == 'refs/heads/main'
@@ -146,6 +149,9 @@ jobs:
146149
needs.plan.outputs.has_changes == 'true'
147150
environment: production
148151
steps:
152+
- name: Set session name
153+
run: echo "SESSION_NAME=$(echo "${GITHUB_ACTOR}-${GITHUB_SHA:0:8}-${GITHUB_RUN_ID}" | head -c 64)" >> "$GITHUB_ENV"
154+
149155
- uses: actions/checkout@v6
150156

151157
- uses: hashicorp/setup-terraform@v4
@@ -157,7 +163,7 @@ jobs:
157163
with:
158164
role-to-assume: arn:aws:iam::${{ env.AWS_ACCOUNT_ID }}:role/javabin-ci-infra
159165
aws-region: ${{ env.AWS_REGION }}
160-
role-session-name: javabin-apply-${{ github.run_id }}
166+
role-session-name: ${{ env.SESSION_NAME }}
161167

162168
- name: Check risk level
163169
env:
@@ -194,6 +200,9 @@ jobs:
194200
runs-on: ubuntu-latest
195201
if: github.event_name == 'schedule'
196202
steps:
203+
- name: Set session name
204+
run: echo "SESSION_NAME=$(echo "${GITHUB_ACTOR}-${GITHUB_SHA:0:8}-${GITHUB_RUN_ID}" | head -c 64)" >> "$GITHUB_ENV"
205+
197206
- uses: actions/checkout@v6
198207

199208
- uses: hashicorp/setup-terraform@v4
@@ -205,7 +214,7 @@ jobs:
205214
with:
206215
role-to-assume: arn:aws:iam::${{ env.AWS_ACCOUNT_ID }}:role/javabin-ci-infra
207216
aws-region: ${{ env.AWS_REGION }}
208-
role-session-name: javabin-drift-${{ github.run_id }}
217+
role-session-name: ${{ env.SESSION_NAME }}
209218

210219
- name: Terraform Init
211220
working-directory: ${{ env.TF_ROOT }}

.github/workflows/tf-plan.yml

Lines changed: 4 additions & 0 deletions
@@ -46,6 +46,9 @@ jobs:
4646
env:
4747
PLAN_BUCKET: javabin-ci-plan-artifacts-${{ inputs.aws_account_id }}
4848
steps:
49+
- name: Set session name
50+
run: echo "SESSION_NAME=$(echo "${GITHUB_ACTOR}-${GITHUB_SHA:0:8}-${GITHUB_RUN_ID}" | head -c 64)" >> "$GITHUB_ENV"
51+
4952
- uses: actions/checkout@v6
5053
with:
5154
ref: ${{ github.ref }}
@@ -60,6 +63,7 @@ jobs:
6063
with:
6164
role-to-assume: arn:aws:iam::${{ inputs.aws_account_id }}:role/javabin-ci-app-broker
6265
aws-region: ${{ inputs.aws_region }}
66+
role-session-name: ${{ env.SESSION_NAME }}
6367

6468
- name: Get team credentials from broker
6569
id: broker

CLAUDE.md

Lines changed: 7 additions & 2 deletions
@@ -116,7 +116,7 @@ terraform/platform/
116116
iam/ GitHub OIDC, CI roles, permission boundary
117117
compute/ ECS cluster, ECR base config
118118
monitoring/ SNS, EventBridge, Config, GuardDuty, Security Hub
119-
lambdas/ slack-alert, cost-report, daily-cost-check, compliance-reporter, override-cleanup, team-provisioner, apply-gate, securityhub-summary, password-set
119+
lambdas/ slack-alert, cost-report, daily-cost-check, compliance-reporter, resource-tagger, budget-enforcer, override-cleanup, team-provisioner, apply-gate, securityhub-summary, password-set, ci-broker
120120
identity/ Cognito user pools (internal + external). Identity Center is in terraform/org/
121121
```
122122

@@ -186,6 +186,9 @@ terraform/state/
186186
| `team-provisioner` | Syncs Google Groups, GitHub teams, AWS Budgets from registry team YAML |
187187
| `securityhub-summary` | Weekly Security Hub findings summary (Monday 08:00 UTC) |
188188
| `password-set` | Self-service password-set for new hero accounts (Function URL) |
189+
| `budget-enforcer` | Scales ECS services to zero when team exceeds 200% budget |
190+
| `resource-tagger` | EventBridge-triggered, auto-tags created-by + commit on new resources |
191+
| `ci-broker` | Validates team membership, vends short-lived team role credentials |
189192

190193
### Scripts
191194
| Script | What |
@@ -228,6 +231,8 @@ Scheduled:
228231
EventBridge (Create/Run) ──► compliance-reporter (report to Slack, no auto-fix)
229232
Hourly ──► override-cleanup (delete stale SSM override tokens)
230233
Registry merge ──► team-provisioner (Google/GitHub/Budget/Cognito/Identity Center sync + hero provisioning)
234+
AWS Budgets (200%) ──► budget-enforcer Lambda ──► ECS scale-to-zero + #javabin-cost-alerts
235+
EventBridge (Create/Run) ──► resource-tagger Lambda ──► Tag created-by + commit
231236
```
232237

233238
## SSM Parameters
@@ -274,7 +279,7 @@ The SA JSON key is at `/javabin/platform/google-admin-sa`, the impersonation tar
274279
| 2c | IAM / OIDC | **Deployed** — 6 CI roles (infra, infra-plan, per-app, deploy, override-approver, registry) |
275280
| 2d | Compute | **Deployed** — ECS cluster + ECR repos |
276281
| 2e | Monitoring | **Deployed** — GuardDuty, Security Hub, Config, SNS |
277-
| 2f | Lambda Functions | **Deployed**8 functions (Google/GitHub/Budget/Cognito/Identity Center sync live) |
282+
| 2f | Lambda Functions | **Deployed**11 functions (budget-enforcer, resource-tagger, ci-broker added; Google/GitHub/Budget/Cognito/Identity Center sync live) |
278283
| 2g | Platform CI | **Done** — plan → LLM review → apply pipeline working |
279284
| 3a | Reusable Terraform Modules | **Code done** — 12 modules in repo |
280285
| 3b | GitHub Actions Workflows | **Code done** — 14 reusable workflows |

docs/app-yaml-reference.md

Lines changed: 18 additions & 3 deletions
@@ -116,17 +116,31 @@ resources:
116116

117117
#### databases
118118

119-
DynamoDB tables.
119+
DynamoDB tables (default) or RDS PostgreSQL instances.
120120

121121
```yaml
122122
resources:
123123
databases:
124124
- name: sessions
125-
hash_key: id # required
126-
range_key: timestamp # optional
125+
hash_key: id # required (DynamoDB)
126+
range_key: timestamp # optional (DynamoDB)
127127
env: SESSIONS_TABLE
128+
129+
- name: main
130+
engine: postgres # "dynamodb" (default) or "postgres"/"postgresql"
131+
instance_class: db.t3.micro # RDS only, default: db.t3.micro
132+
allocated_storage: 20 # GB, RDS only, default: 20
133+
engine_version: "16" # PostgreSQL version, RDS only, default: "16"
134+
backup_retention_period: 7 # days, RDS only, default: 7
135+
multi_az: false # RDS only, default: false
136+
deletion_protection: true # RDS only, default: true
137+
env: DATABASE_URL
128138
```
129139

140+
DynamoDB and PostgreSQL entries can coexist in the same `databases` list. Entries without `engine` (or with `engine: dynamodb`) use the DynamoDB module. Entries with `engine: postgres` or `engine: postgresql` use the RDS module.
141+
142+
RDS instances use `manage_master_user_password = true`, which stores the auto-generated master password in Secrets Manager. The ECS task role automatically receives IAM policies for `rds-db:connect` and `secretsmanager:GetSecretValue` on the password secret.
143+
130144
#### secrets
131145

132146
Secrets Manager secrets. Value is set manually after creation.
@@ -354,6 +368,7 @@ Generated files have a `# GENERATED FROM app.yaml` marker. The script only overw
354368
| S3 bucket | `javabin-{bucket_name}-{account_id}` |
355369
| DynamoDB table | `javabin-{table_name}` |
356370
| SQS queue | `javabin-{queue_name}` |
371+
| RDS instance | `{db_name}` (identifier) |
357372
| Secrets Manager | `javabin/{secret_name}` |
358373
| IAM task role | `javabin-{name}` |
359374
| CloudWatch logs | `/ecs/javabin/{name}` |
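The documentation change above says that entries without `engine`, or with `engine: dynamodb`, use the DynamoDB module, while `postgres` and `postgresql` use the RDS module. The routing code in `registry.py` and `expand-modules.py` is not shown on this page, so this is a minimal sketch of that rule only. The function name is hypothetical, the returned `"dynamodb"` label is a placeholder because the DynamoDB module's name does not appear here, and raising on an unknown engine is an assumption.

```python
from typing import Any, Dict

POSTGRES_ENGINES = {"postgres", "postgresql"}
DYNAMODB_ENGINES = {"dynamodb"}


def route_database(entry: Dict[str, Any]) -> str:
    """Pick a module label for one item in the app.yaml `databases` list."""
    # A missing `engine` key falls back to DynamoDB, matching the documented default.
    engine = entry.get("engine", "dynamodb")
    if engine in POSTGRES_ENGINES:
        return "service-rds"
    if engine in DYNAMODB_ENGINES:
        return "dynamodb"  # placeholder label, not a confirmed module name
    # Assumption: unknown engines are rejected rather than silently defaulted.
    raise ValueError(f"unsupported database engine: {engine!r}")
```

Because the decision is made per entry, a `databases` list that mixes both kinds routes each item independently, which is what lets DynamoDB and PostgreSQL entries coexist.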

docs/lambda-functions.md

Lines changed: 33 additions & 3 deletions
@@ -70,10 +70,40 @@ Queries Security Hub for active findings at HIGH and CRITICAL severity, aggregat
7070

7171
## team-provisioner
7272

73-
**Trigger:** (Future) Registry repo merge events
74-
**Purpose:** Syncs team definitions across Google Workspace, GitHub, Cognito, and IAM.
73+
**Trigger:** Registry repo merge events (via `provision-app.yml` workflow dispatch)
74+
**Purpose:** Syncs team definitions from registry YAML across Google Groups, GitHub teams, AWS Budgets (80% warning + 200% enforcement thresholds), Cognito groups, and Identity Center groups. Also handles hero account provisioning.
7575

76-
**Status:** Stub only — logs event and returns success. Blocked on Google Admin access.
76+
| SSM Parameter | Purpose |
77+
|---------------|---------|
78+
| `/javabin/platform/google-admin-sa` | GCP service account JSON key (domain-wide delegation) |
79+
| `/javabin/platform/google-admin-email` | Admin email for Google Admin SDK impersonation |
80+
| `/javabin/platform/github-app-id` | GitHub App ID for team management |
81+
| `/javabin/platform/github-app-key` | GitHub App private key |
82+
| `/javabin/platform/github-app-client-secret` | GitHub App client secret |
83+
84+
## budget-enforcer
85+
86+
**Trigger:** SNS notification from AWS Budgets (200% threshold)
87+
**Purpose:** Scales a team's ECS services to `desired_count=0` when spending exceeds 200% of their monthly budget. Does NOT destroy resources — services can be scaled back up after resolution.
88+
89+
**Flow:** Parse budget name (`javabin-team-{team}`) → list ECS services tagged with team → scale to zero → post Slack alert.
90+
91+
| SSM Parameter | Channel |
92+
|---------------|---------|
93+
| `/javabin/slack/platform-cost-alerts-webhook` | #javabin-cost-alerts |
94+
95+
**Environment vars:** `ECS_CLUSTER` (default: `javabin-platform`)
96+
97+
## resource-tagger
98+
99+
**Trigger:** EventBridge rule matching all AWS service creation events (`{"prefix": "aws."}` source, `Create*`/`Run*` event names)
100+
**Purpose:** Auto-tags newly created AWS resources with `created-by` (actor) and `commit` (SHA) parsed from the CloudTrail session name. Tags are set via AWS Resource Groups Tagging API, outside Terraform management — no drift or plan noise.
101+
102+
**Session name format:** `{actor}-{sha8}-{run_id}` (enriched in CI workflows)
103+
104+
Idempotent: skips resources that already have a `created-by` tag (preserves original creator).
105+
106+
**Environment vars:** `AWS_ACCOUNT_ID`
77107

78108
## Shared Module: pricing
79109
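The `budget-enforcer` flow documented in this file (parse the budget name, find the team's ECS services, scale them to zero) can be sketched without any AWS calls by injecting the scaling action. This is an illustration under stated assumptions, not the Lambda's actual code: all function names are hypothetical, and `service_tags` stands in for whatever tag lookup the real function performs.

```python
from typing import Callable, Dict, List, Optional

BUDGET_PREFIX = "javabin-team-"


def team_from_budget_name(budget_name: str) -> Optional[str]:
    """Extract the team from 'javabin-team-{team}', or None if it does not match."""
    if not budget_name.startswith(BUDGET_PREFIX):
        return None
    return budget_name[len(BUDGET_PREFIX):] or None


def services_for_team(service_tags: Dict[str, Dict[str, str]], team: str) -> List[str]:
    """Return the service identifiers whose `team` tag equals the given team."""
    return sorted(name for name, tags in service_tags.items() if tags.get("team") == team)


def enforce(
    budget_name: str,
    service_tags: Dict[str, Dict[str, str]],
    scale_to_zero: Callable[[str], None],
) -> List[str]:
    """Scale every service owned by the budget's team to zero; return what was scaled."""
    team = team_from_budget_name(budget_name)
    if team is None:
        return []  # not a team budget, so nothing to enforce
    targets = services_for_team(service_tags, team)
    for service in targets:
        scale_to_zero(service)
    return targets
```

In a deployed function, `scale_to_zero` would presumably wrap the ECS `update_service` call with `desiredCount=0`. Passing it in as a parameter makes the "invoke with test payload" item in the test plan checkable without touching a cluster.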

docs/platform-modules.md

Lines changed: 2 additions & 1 deletion
@@ -79,10 +79,11 @@ SNS topics, EventBridge rules, Config, GuardDuty, Security Hub.
7979
| GuardDuty | Threat detection |
8080
| Security Hub | Findings aggregation |
8181
| `javabin-alert-dedup` DynamoDB | Deduplication table used by slack-alert Lambda |
82+
| Cost allocation tags | `aws_ce_cost_allocation_tag` resources activating 7 tags: team, service, repo, environment, managed-by, created-by, commit |
8283

8384
## lambdas
8485

85-
8 Lambda functions — see [lambda-functions.md](lambda-functions.md) for details.
86+
11 Lambda functions — see [lambda-functions.md](lambda-functions.md) for details.
8687

8788
## identity
8889

docs/reusable-modules.md

Lines changed: 39 additions & 0 deletions
@@ -53,6 +53,8 @@ additional_policy_jsons = {
5353
}
5454
```
5555

56+
**`trusted_services`** — controls which AWS service can assume the role. Default: `["ecs-tasks.amazonaws.com"]`. Can be set to `["ec2.amazonaws.com"]` or `["lambda.amazonaws.com"]` via `compute.trusted_service` in app.yaml. Enables cross-compute roles so EC2 instances and Lambda functions get the same auto-wired access policies.
57+
5658
**Outputs:** `role_arn`, `role_name`, `role_id`
5759

5860
## ecs-service
@@ -61,6 +63,8 @@ ECS Fargate task definition + service + CloudWatch log group.
6163

6264
Supports `environment` (map) and `secrets` (map of name => ARN) for container configuration.
6365

66+
**Tag propagation:** `enable_ecs_managed_tags = true` and `propagate_tags = "SERVICE"` ensure Fargate task-level compute costs are attributed to the team via Cost Explorer.
67+
6468
**Outputs:** `service_name`, `task_definition_arn`, `log_group_name`
6569

6670
## service-bucket
@@ -91,6 +95,41 @@ SQS queue + dead-letter queue with configurable retention and visibility timeout
9195
**Naming:** `{project}-{name}` (queue), `{project}-{name}-dlq` (DLQ)
9296
**Outputs:** `queue_url`, `queue_arn`, `dlq_url`, `dlq_arn`, `access_policy_json`
9397

98+
## service-rds
99+
100+
RDS PostgreSQL instance in private subnets.
101+
102+
**Inputs:**
103+
104+
| Input | Default |
105+
|-------|---------|
106+
| `name` | required |
107+
| `engine_version` | `"16"` |
108+
| `instance_class` | `db.t3.micro` |
109+
| `allocated_storage` | 20 GB |
110+
| `subnet_ids` | required |
111+
| `vpc_id` | required |
112+
| `allowed_security_group_ids` | required |
113+
| `backup_retention_period` | 7 |
114+
| `multi_az` | false |
115+
| `deletion_protection` | true |
116+
117+
**Password:** Managed by AWS via `manage_master_user_password = true` (Secrets Manager).
118+
119+
**Outputs:** `endpoint`, `port`, `db_name`, `access_policy_json`, `security_group_id`
120+
121+
**Auto-wiring:** `access_policy_json` grants `rds-db:connect` + `secretsmanager:GetSecretValue`. Auto-attached to task role via `collect:access_policy_json`.
122+
123+
**app.yaml:**
124+
```yaml
125+
databases:
126+
- name: main
127+
engine: postgres
128+
instance_class: db.t3.micro
129+
allocated_storage: 20
130+
engine_version: "16"
131+
```
132+
94133
## service-alarm
95134
96135
CloudWatch alarms for ECS services: CPU high, memory high, unhealthy targets, 5xx errors.

scripts/ensure-tf-boilerplate.sh

Lines changed: 5 additions & 3 deletions
@@ -58,9 +58,11 @@ provider "aws" {
5858
5959
default_tags {
6060
tags = {
61-
project = "${REPO_NAME}"
62-
team = "${TEAM}"
63-
managed-by = "terraform"
61+
team = "${TEAM}"
62+
service = "${REPO_NAME}"
63+
repo = "${GITHUB_REPOSITORY}"
64+
environment = "production"
65+
managed-by = "terraform"
6466
}
6567
}
6668
}
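The test plan expects resources to show the five Terraform-managed tags after apply. A small check like the one below could support that step. It is only a sketch: the function name is hypothetical, and treating an empty tag value as missing is an assumption.

```python
from typing import Dict, List

# The five static tags set through provider default_tags in this commit.
STATIC_TAG_KEYS = ("team", "service", "repo", "environment", "managed-by")


def missing_static_tags(tags: Dict[str, str]) -> List[str]:
    """Return the static tag keys that are absent or empty on a resource."""
    return [key for key in STATIC_TAG_KEYS if not tags.get(key)]
```

A resource still carrying only the pre-migration tags (`project`, `team`, `managed-by`) would report `service`, `repo`, and `environment` as missing.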
