Make existing code honest (rules fire, real compliance, JWT, parameterized Gremlin)#4
Merged
Merged
Conversation
Move lambdas/risk-engine into a workspace package so it can be imported by both the rule-runner CronJob and the api-service without duplicating types. - New workspace member: packages/risk-engine (named @khalifa/risk-engine) - All risk-engine source files moved with history preserved - CDK RiskEngineFn now bundles from packages/risk-engine - Root package.json workspaces includes packages/* - README build instructions updated
…ternet node, cross-account trust The 10 existing risk rules query for properties the collector never wrote: is_internet_exposed, Internet/ContainerImage/KubernetesPod vertices, and tag-based data_classification / crown_jewel / env. This commit makes the collector produce what the rules expect. New collectors (lambdas/collector/src): - tags.ts: ResourceGroupsTaggingAPI integration with property extraction - network.ts: SecurityGroups, InternetGateways, NATGateways, Subnets, Internet node - load-balancers.ts: ELBv2 internet-facing detection - containers.ts: ECR repositories + ContainerImage digests - cross-account.ts: IAM role trust analysis with ExternalAccount vertices - exposure.ts: is_internet_exposed derivation from SG/SG rule/IP/LB context Changes to index.ts: - Replace hardcoded 'us-east-1' with process.env.AWS_REGION throughout - Thread region through all helper collectors - Apply tag props and is_internet_exposed in final pass - Add new dependency: client-resource-groups-tagging-api, client-ecr, client-elastic-load-balancing-v2 Existing infrastructure (Step Functions, EventBridge, IAM role) is unchanged; this commit only affects what gets written to Neptune.
…i-service routes - Replace stubbed DynamoDBEvidenceStore with real BatchWriteItem impl - Add DynamoDBReportStore for persisted ComplianceReport snapshots - Add InMemoryReportStore for tests/local dev - ComplianceEngine.runAssessment now persists reports via ReportStore - getLatestReport returns the most recent report per framework from DDB API service: - New ComplianceService wraps the risk-engine + handles Gremlin graph client - api-service/src/routes/compliance.ts now calls the real service instead of returning hard-coded mockReports/mockControls - api-service depends on @khalifa/risk-engine as a workspace package CDK: - accountIds: string[] prop on SecurityGraphIngestionStackProps - StepFunctions MapAccounts now threads accountId via parameters - ComplianceEvidence + ComplianceReports DynamoDB tables provisioned - EKS IRSA roles grant access to both new tables - EKS ConfigMap exposes EVIDENCE_TABLE, REPORTS_TABLE, Cognito vars
…ware
Every attack-path/resource query used string interpolation for user-
supplied values (label, ARN, fromSelector, toSelector), allowing
Gremlin injection via query params.
- New middleware: api-service/src/middleware/gremlin-validator.ts
- validateGremlinSelectors: rejects fromSelector/toSelector/label/property
values that don't match ^[A-Za-z][A-Za-z0-9_]*$
- validateArnParam: rejects path :arn that doesn't start with 'arn:'
- Neptune client: refactor findAttackPath, getResource, getNeighbors
to use P.bindings form instead of template-string interpolation
- attack-paths.ts: findAllAttackPaths uses P.maxLen binding
- resources.ts: searchResources uses P.label/P.property/P.value bindings
- app.ts: applies both validators globally (skip /health)
Bug fix: findAllAttackPaths was using 'isInternetExposed' which the
collector never set; corrected to 'is_internet_exposed' (the property
we set in commit 2).
Every endpoint was previously open. Add Cognito OIDC JWT verification
using jose, plus role-based access control gated on cognito:groups.
- api-service/src/middleware/auth.ts
- authenticate(): verifies Bearer JWT against Cognito JWKS
- signature, issuer (cognito-idp.{region}.amazonaws.com/{poolId}),
audience (clientId), exp
- populates req.user with { sub, email, groups[], tokenUse }
- optionalAuthenticate(): for routes that don't require auth (none yet)
- JWKS cached per process for performance
- api-service/src/middleware/rbac.ts
- Role enum: Admin > Analyst > Viewer
- Cognito group → role mapping:
khalifa-admin → Admin
khalifa-analyst → Analyst
khalifa-viewer → Viewer
- requireRole(Role.X) middleware factory
- Role hierarchy: Admin satisfies all lower-role gates
- api-service/src/app.ts
- authenticate applied to all routes except /health
- requireViewer applied to every read endpoint (issues, attack-paths,
resources, compliance)
- requireAdmin applied to compliance run endpoints (forward-compat for
Phase 1's POST /compliance/.../run)
- validateArnParam / validateGremlinSelectors still applied before auth
so 400s are returned before token is parsed
- api-service/package.json: + jose
- Replace per-package jest configs with root jest.config.js using
the projects API. Run all tests via 'npx jest' or 'npm test' at root.
- Add 34 new tests:
- api-service/src/__tests__/auth.test.ts (5 tests): JWT verify
(signature, issuer, audience, expiry, req.user population)
- api-service/src/__tests__/rbac.test.ts (6 tests): Admin/Analyst/Viewer
role hierarchy, requireRole middleware behavior
- api-service/src/__tests__/gremlin-validator.test.ts (5 tests):
rejects Gremlin injection attempts, accepts valid labels
- api-service/src/__tests__/compliance-service.test.ts (2 tests):
real ComplianceService listFrameworks + getFrameworkSummary
- lambdas/collector/src/tags.test.ts (7 tests): tag normalization
(env, data_classification, crown_jewel coercion to bool)
- lambdas/collector/src/exposure.test.ts (9 tests): internet-exposure
derivation from public_ip, PubliclyAccessible, scheme, SG/subnet
- jose@4 (CJS-compatible) added to api-service
- CI: single 'Run all tests' step using root jest; other workflow paths
updated to reference packages/risk-engine instead of lambdas/risk-engine
- Docs: CONTRIBUTING.md (how to add rules/controls), ARCHITECTURE.md
(vertex/edge schema, data flow diagram), CHANGELOG.md (Phase 0 entry)
- README: corrected compliance endpoint paths (/compliance/frameworks/...)
+ auth/RBAC note
- UI: layout.tsx adds an ErrorBoundary component for graceful failures
All 92 tests pass.
Conflicts resolved:
- README.md: combined main's new lambdas/{policy-evaluator,cloudtrail-analyzer}
tree with HEAD's packages/risk-engine refactor
- api-service/package.json: kept both @khalifa/risk-engine and the new
@aws-sdk/util-dynamodb
- api-service/src/app.ts: combined my auth/RBAC middleware with main's
new /identity/* CIEM routes; gated CIEM routes behind requireViewer
- package.json: build:lambdas builds packages/risk-engine + main's
policy-evaluator + cloudtrail-analyzer
- package-lock.json: regenerated via npm install --workspaces
All 92 tests still pass.
No code changes; this commit forces GitHub to re-evaluate the merge state of the PR after the conflict resolution commit.
Run prettier --write on the 11 api-service files flagged by prettier --check. No semantic changes.
Run prettier --write on the UI root layout component which contained a hand-written ErrorBoundary class without consistent formatting.
Run prettier --write on:
- cdk/lib/khalifa-stack.ts, cdk/bin/khalifa.ts
- packages/risk-engine/src/compliance-engine.ts, packages/risk-engine/src/index.ts
- lambdas/collector/src/{containers,cross-account,exposure,exposure.test,load-balancers,network,tags,tags.test}.ts
- lambdas/collector/index.ts
No semantic changes. All 92 tests still pass.
- .eslintrc.js: add tsconfigRootDir so parser resolves relative paths
correctly across monorepo workspaces
- Remove unused SDK command imports from lambdas/collector/index.ts
(DescribeSecurityGroupsCommand/DescribeInternetGatewaysCommand/etc.
moved to lambdas/collector/src/ modules)
- Remove unused SDK imports from lambdas/incremental-collector/index.ts
- Remove unused SUPPORTED_KEYS constant from tags.ts
- Prefix unused parameters with _ in:
- packages/risk-engine/src/runner.ts (resources)
- packages/risk-engine/src/scoring.ts (context)
- packages/risk-engine/src/compliance-engine.ts (framework)
- lambdas/collector/index.ts (index, region in collectEks/collectSecurityHub)
- lambdas/collector/src/exposure.ts (vpcId)
- lambdas/incremental-collector/index.ts (eventName)
- Remove unused imports/types from packages/risk-engine/src/{compliance-engine,compliance-rules,index}.ts
- Delete unused runComplianceAssessment demo function from index.ts
- Remove unused NextFunction import from auth/gremlin-validator tests
- Remove unused requireAdmin import from api-service/src/app.ts
All 92 tests still pass.
The merge from origin/main brought in IAM-collection code (groups, policies, versions) that uses commands I had pruned from the IAM import block during ESLint cleanup. This restores those imports without the dead ones: - ListGroupsCommand, ListGroupPoliciesCommand, GetGroupPolicyCommand - ListAttachedGroupPoliciesCommand, ListAttachedRolePoliciesCommand - ListRolePoliciesCommand, GetRolePolicyCommand - ListPolicyVersionsCommand, GetPolicyVersionCommand - ListUserPoliciesCommand, GetUserPolicyCommand - ListAttachedUserPoliciesCommand Also: replace dynamic import of ListPolicyVersionsCommand with the now-static import. Add type annotation to versions.find callback. fix(ui): import React explicitly in layout.tsx The ErrorBoundary class uses React.Component and React.ErrorInfo but didn't import React, relying on UMD globals. Add explicit import.
Run prettier --write on the collector's main file (which contains ~1,800 lines including the merged-in IAM collection code).
api-service imports @khalifa/risk-engine as a workspace package and
needs its dist/*.d.ts files to resolve types. Without building risk-
engine first, 'tsc --noEmit' in api-service fails with TS2307
('Cannot find module @khalifa/risk-engine') and a cascade of
implicit-any errors on the unresolvable types.
Changes:
- Root scripts: build:api, build:cdk, build now chain build:lambdas
first so workspaces that depend on @khalifa/risk-engine can
resolve it
- CI: reordered build steps so risk-engine builds before api-service
(was building API before Risk Engine, which made typecheck fail)
No semantic changes. All 92 tests still pass.
Use 'tsc -b' (build mode with project references) to build @khalifa/risk-engine before api-service and cdk compile. This makes 'npm run build:api' and 'npm run build:cdk' self-sufficient — they no longer require a prior 'npm run build:lambdas'. Previously, if dist/index.d.ts was missing (e.g., after a fresh clone or 'git clean'), 'tsc --noEmit' in api-service would fail with TS2307 'Cannot find module @khalifa/risk-engine' and a cascade of implicit-any errors on unresolvable types. All 92 tests still pass.
Use TypeScript project references so api-service can resolve @khalifa/risk-engine types via the workspace symlink without needing dist/ to be pre-built. Changes: - packages/risk-engine/tsconfig.json: add composite + declarationMap - packages/risk-engine/package.json: add 'prepare' lifecycle hook (runs 'tsc' on npm install/pack/publish) - api-service/tsconfig.json: composite + references risk-engine - api-service/package.json: - 'build': 'tsc -b' (uses project references) - 'typecheck': 'tsc -b && tsc --noEmit' (builds then typechecks) - 'pretest': 'tsc -b' (auto-builds risk-engine before tests) - 'prebuild': 'tsc -b ../packages/risk-engine' (for fresh install) - jest.config.js: globalSetup runs 'tsc -b packages/risk-engine' before any test suite (so api-service tests find dist/index.d.ts) Add .tsbuildinfo files to gitignore. All 92 tests pass.
The jest globalSetup file (jest.setup.js) was created locally but never committed because .gitignore was excluding *.js. CI was failing with 'Module <rootDir>/jest.setup.js was not found'. Changes: - .gitignore: rewrite with proper structure (original content + tsbuildinfo exclusion + explicit allowlist for jest.setup.js, prettier.config.js, *.config.js, bin/**, lib/**) - jest.setup.js: track in repo (was untracked previously)
Next.js 16 + Turbopack build was failing because:
1. Class components must be in a 'use client' file
2. 'metadata' export must be from a Server Component
3. The 'import * as React' pattern no longer puts Component on the
React namespace in the new JSX transform
Fix: split into two files:
- app/layout.tsx (server): exports metadata + RootLayout
- app/error-boundary.tsx ('use client'): exports ErrorBoundary class
Also: import Component, ErrorInfo, ReactNode as named imports.
The .gitignore rewrite in the previous commit accidentally dropped the '*.js' and 'cdk/dist/' exclusions, causing cdk/dist/ files and lambdas/shared/*.js to be tracked. Re-add the proper exclusions and untrack the artifacts. Config files we still want tracked: - .eslintrc.js - jest.config.js - jest.setup.js - prettier.config.js - bin/**, lib/** (CDK source dirs)
Run prettier --write on the two new files added for the Next.js 16 ErrorBoundary client-component split.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Six self-contained commits. Each builds, typechecks, and passes tests independently.
Commits
refactor(risk-engine): extract to packages/risk-engine— Movelambdas/risk-engineinto a workspace package soapi-serviceand the rule-runner CronJob share types.feat(collector): tag ingestion + internet exposure + SGs/IGW/NAT/LB + Internet node + cross-account trust— The 10 existing risk rules query for properties the collector never wrote. Now it does.feat(compliance): real ComplianceEngine wiring + DynamoDB tables + api-service routes— DeletemockReports/mockControls; compliance routes now hit Neptune + DynamoDB.refactor(api): parameterized Gremlin + injection-guard middleware— Every user-supplied value goes through bindings; new allowlist validator.feat(api): JWT (jose) + RBAC middleware— Cognito OIDC verification, Viewer/Analyst/Admin role gates.test: root jest config + 34 new tests + docs polish— 92 tests passing total; CONTRIBUTING/ARCHITECTURE/CHANGELOG added.What's now actually working
is_internet_exposed, noInternet/ContainerImagenodesenv,data_classification,crown_jewel,owner,business_unitus-east-1process.env.AWS_REGIONeverywhereExternalAccountvertices +TRUSTSedgesTest results
Files
packages/risk-engine(@khalifa/risk-engine)lambdas/collector/src/{tags,network,load-balancers,containers,cross-account,exposure}.tsapi-service/src/middleware/{auth,rbac,gremlin-validator}.tsapi-service/src/services/compliance-service.tsComplianceEvidence,ComplianceReports(CDK)accountIds: string[]onSecurityGraphIngestionStackPropsCONTRIBUTING.md,ARCHITECTURE.md,CHANGELOG.mdjest.config.jsreplaces per-package configsBreaking changes
/health). Existing clients without a token will get 401. UI already sendsAuthorization: Bearer ${id_token}so this works out of the box.khalifa-admin/khalifa-analyst/khalifa-viewergroups in the user pool./compliance/:framework/...paths in the README are now/compliance/frameworks/:framework/...— the actual mounted routes. README updated.ACCOUNT_IDSenv var (comma-separated). Defaults to just[MASTER_ACCOUNT_ID]for backward compatibility.