Agentless ingestion of AWS Org resources and Security Hub findings into a Neptune-backed security graph, with a risk and attack-path engine to identify security issues.
See ARCHITECTURE.md for the full system diagram and design.
Khalifa is an agentless ingestion pipeline that sits between your AWS Organization and a Neptune-backed security graph. Every account, resource, and finding gets collected on a schedule, normalized into a graph model you own, and evaluated locally against risk rules, attack-path traversals, and CIEM effective-permission logic.
Why it was built: Cloud estates grow faster than any team can review them. A misconfigured S3 bucket, an over-privileged IAM role, or a publicly exposed RDS instance turns into a real finding before anyone notices. Khalifa gives those resources a graph: collected, scored, joined to attack paths, and rendered as issues you can act on.
How it works: Collectors run on an EventBridge schedule (or as Kubernetes CronJobs) and assume into every account in the AWS Organization via a cross-account role. They inventory 30+ AWS services, decompose IAM into policy statements + effective permissions, pull Security Hub and GuardDuty findings, and stream everything into Neptune. The Risk Engine then runs Gremlin traversals against the live graph to produce prioritized issues, attack paths, and compliance evaluations against CIS, SOC 2, and ISO 27001 — without ever moving data out of your AWS account.
Prerequisites: Node.js 20+, AWS CDK CLI, an AWS Organization with a delegated admin account, and a Neptune cluster reachable from your compute.
git clone https://github.com/therandomsecurityguy/khalifa
cd khalifa
npm ci --workspaces
Deploy the cross-account collector role into every member account from the templates/SecurityGraphCollectorRole.yaml template, then bootstrap the ingestion stack.
There are two ways to run Khalifa. Both produce the same result, a populated security graph with risk findings. Option A is faster to try; Option B is the production setup.
Option A: Lambda + EventBridge (development / small scale)
The Lambda stack uses EventBridge schedules, Step Functions for parallel account fan-out, and a separate daily CloudTrail analyzer. It scales to roughly 20 accounts without tuning.
cd cdk
npm install
npm run build
cdk deploy KhalifaStack \
--neptune-endpoint neptune-cluster.us-east-1.amazonaws.com \
--issues-table-name SecurityIssues \
--access-analyzer-table AccessAnalyzerCache \
--athena-database khalifa_cloudtrail_db \
--cloudtrail-s3-location s3://cloudtrail-logs/AWSLogs/
The collector ingests all member accounts every two hours. CloudTrail analysis runs daily at 02:00 UTC. The policy evaluator runs every six hours and after each collector pass. The risk engine runs hourly.
Option B: EKS + CronJob (production / >20 accounts)
The EKS stack runs the API service, rule runner, and UI as Kubernetes workloads. It is built for sustained load across hundreds of accounts and gives you a UI plus a REST API.
# 1. Build and push images
cd api-service && docker build -t <ecr>/security-graph-api:v1.0.0 . && docker push <ecr>/security-graph-api:v1.0.0
cd ../packages/risk-engine && docker build -t <ecr>/security-graph-rule-runner:v1.0.0 . && docker push <ecr>/security-graph-rule-runner:v1.0.0
# 2. Deploy CDK
cd ../../cdk
cdk deploy SecurityGraphEksStack \
--vpc-id vpc-12345678 \
--neptune-endpoint neptune-cluster.us-east-1.amazonaws.com \
--issues-table-name SecurityIssues \
--certificate-arn arn:aws:acm:us-east-1:... \
--cognito-user-pool-id us-east-1_xxxxx \
--cognito-client-id xxxxx
# 3. Apply manifests
kubectl apply -f eks-manifests/
Use this when you want to serve a UI to analysts, expose a stable REST API, or run the rule runner on a schedule that survives control-plane hiccups.
| Approach | Use Case | Complexity |
|---|---|---|
| Lambda + EventBridge | Development / small scale (<20 accounts) | Lower |
| EKS + CronJob | Production (>20 accounts, multi-tenant API) | Higher |
Environment variables are shared across both stacks. The CDK stack wires sane defaults; override at deploy time or via the Kubernetes ConfigMap (eks-manifests/01-configmap.yaml).
| Variable | Description | Default |
|---|---|---|
| NEPTUNE_ENDPOINT | Neptune cluster endpoint | (none) |
| NEPTUNE_AUTH_SECRET_ARN | Secrets Manager ARN for Neptune auth | (none) |
| ISSUES_TABLE | DynamoDB table for issues | SecurityIssues |
| ACCESS_ANALYZER_TABLE | DynamoDB table for CloudTrail usage cache | AccessAnalyzerCache |
| ATHENA_DATABASE | Glue database for CloudTrail logs | khalifa_cloudtrail_db |
| ATHENA_WORKGROUP | Athena workgroup | khalifa-cloudtrail-analysis |
| CLOUDTRAIL_S3_LOCATION | S3 prefix for CloudTrail logs | s3://cloudtrail-logs/AWSLogs/ |
| ANALYSIS_DAYS | CloudTrail lookback window (days) | 90 |
| AWS_REGION | AWS region | us-east-1 |
| LOG_LEVEL | Logging level | info |
| RULE_RUNNER_SCHEDULE | Cron schedule for the rule runner | 0 */6 * * * (every 6h) |
Cross-account access is granted via the IAM role defined in templates/SecurityGraphCollectorRole.yaml. Deploy it once per member account with a unique external ID per deployment.
All routes except /health require a valid Cognito bearer JWT. RBAC roles are mapped from Cognito groups: khalifa-admin → Admin, khalifa-analyst → Analyst, khalifa-viewer → Viewer.
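The group-to-role mapping above can be sketched as a small lookup plus a rank comparison. This is a minimal illustration, not the API service's actual middleware. The function names are hypothetical, and the ordering Viewer < Analyst < Admin is an assumption inferred from the "Viewer+" notation in the endpoint tables.

```typescript
type Role = "Viewer" | "Analyst" | "Admin";

// Mapping taken from the text above. A Map avoids accidental hits on
// inherited object keys such as "constructor".
const GROUP_TO_ROLE = new Map<string, Role>([
  ["khalifa-admin", "Admin"],
  ["khalifa-analyst", "Analyst"],
  ["khalifa-viewer", "Viewer"],
]);

// Assumed hierarchy: a higher rank includes everything a lower rank can do.
const ROLE_RANK: Record<Role, number> = { Viewer: 1, Analyst: 2, Admin: 3 };

// Returns the highest role granted by any of the caller's groups, if any.
function roleFromGroups(groups: readonly string[]): Role | undefined {
  let best: Role | undefined;
  for (const group of groups) {
    const role = GROUP_TO_ROLE.get(group);
    if (role !== undefined && (best === undefined || ROLE_RANK[role] > ROLE_RANK[best])) {
      best = role;
    }
  }
  return best;
}

// True when the caller's best role meets the route's minimum role.
function isAllowed(groups: readonly string[], minimum: Role): boolean {
  const role = roleFromGroups(groups);
  return role !== undefined && ROLE_RANK[role] >= ROLE_RANK[minimum];
}
```

The groups array would normally come from the verified token's group claim. Signature and expiry validation are out of scope for this sketch and must happen before any role check.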
Issues & risk
| Endpoint | Description |
|---|---|
| GET /health | Health check (unauthenticated) |
| GET /issues | List issues with filters (Viewer+) |
| GET /issues/:id | Get issue details with attack path (Viewer+) |
| GET /issues/counts | Get issue counts by severity (Viewer+) |
| GET /issues/stats | Get detailed statistics (Viewer+) |
| GET /attack-paths?fromSelector=X&toSelector=Y | Find attack paths (Viewer+) |
| GET /resources/:arn | Get resource with neighbors and issues (Viewer+) |
| GET /resources/search?label=EC2Instance | Search resources (Viewer+) |
Compliance
| Endpoint | Description |
|---|---|
| GET /compliance/frameworks | List available compliance frameworks (Viewer+) |
| GET /compliance/frameworks/:framework | Get framework overview with control summaries (Viewer+) |
| GET /compliance/frameworks/:framework/controls | List all controls for a framework (Viewer+) |
| GET /compliance/frameworks/:framework/controls/:controlId | Get control details with evidence (Viewer+) |
| GET /compliance/frameworks/:framework/report | Generate compliance report (Viewer+) |
| GET /compliance/frameworks/:framework/drift | Detect configuration drift since last evaluation (Viewer+) |
CIEM / Identity
| Endpoint | Description |
|---|---|
| GET /identity/effective-permissions/:principal | Get computed effective permissions for a principal |
| GET /identity/escalation-paths | List detected escalation paths with filters |
| GET /identity/unused-permissions?principal=X&days=90 | Find unused permissions by comparing effective permissions against CloudTrail usage |
| GET /identity/rightsizing/:principal?safetyMarginDays=7 | Generate least-privilege policy recommendation |
| GET /identity/trust-graph?account=X | Retrieve cross-account trust relationships as a graph |
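Several of these routes take a principal ARN as a path segment, and ARNs contain `:` and `/`. A client may want to percent-encode the ARN before placing it in the path. The helper below is a hypothetical sketch. Whether the API service expects an encoded or a raw ARN is not stated here, so treat the encoding as an assumption to verify against your deployment.

```typescript
type IdentityRoute = "effective-permissions" | "rightsizing";

// Builds a URL for a principal-scoped identity route.
// Assumption: the server decodes a percent-encoded ARN path segment.
function identityUrl(
  baseUrl: string,
  route: IdentityRoute,
  principalArn: string,
  query: Record<string, string | number> = {},
): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const path = `${base}/identity/${route}/${encodeURIComponent(principalArn)}`;
  const pairs = Object.keys(query).map(
    (key) => `${encodeURIComponent(key)}=${encodeURIComponent(String(query[key]))}`,
  );
  return pairs.length === 0 ? path : `${path}?${pairs.join("&")}`;
}
```

For example, `identityUrl("https://api.example.com", "rightsizing", "arn:aws:iam::123456789012:role/DataRole", { safetyMarginDays: 7 })` yields a single path segment for the ARN, with `:` encoded as `%3A` and `/` as `%2F`.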
Query parameters for /issues
| Parameter | Type | Description |
|---|---|---|
| severity | string[] | Filter by severity (critical, high, medium, low) |
| team | string[] | Filter by owning team |
| status | string[] | Filter by status (open, resolved, suppressed) |
| ruleId | string | Filter by rule ID |
| limit | number | Max results (default: 50, max: 1000) |
| nextToken | string | Pagination token |
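A client-side sketch of assembling these parameters into a query string follows. The `IssueFilters` type and `buildIssuesQuery` function are hypothetical. Serializing `string[]` values as repeated keys (`severity=critical&severity=high`) is an assumption, since the wire format for arrays is not specified in this table.

```typescript
type Severity = "critical" | "high" | "medium" | "low";
type Status = "open" | "resolved" | "suppressed";

interface IssueFilters {
  severity?: Severity[];
  team?: string[];
  status?: Status[];
  ruleId?: string;
  limit?: number;
  nextToken?: string;
}

// Builds the query string for GET /issues from the documented parameters.
function buildIssuesQuery(filters: IssueFilters): string {
  const pairs: string[] = [];
  const add = (key: string, value: string): void => {
    pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(value)}`);
  };

  // Assumption: array filters are sent as repeated keys.
  for (const value of filters.severity ?? []) add("severity", value);
  for (const value of filters.team ?? []) add("team", value);
  for (const value of filters.status ?? []) add("status", value);
  if (filters.ruleId !== undefined) add("ruleId", filters.ruleId);

  // Apply the documented default (50) and clamp to the documented maximum (1000).
  const limit = Math.min(Math.max(Math.trunc(filters.limit ?? 50), 1), 1000);
  add("limit", String(limit));

  if (filters.nextToken !== undefined) add("nextToken", filters.nextToken);
  return pairs.join("&");
}
```

To page through results, a caller would pass the `nextToken` from each response back into the next request until the response no longer includes one.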
Examples
# Get critical issues
curl "https://api.example.com/issues?severity=critical&status=open&limit=100" \
-H "Authorization: Bearer $TOKEN"
# Find attack paths
curl "https://api.example.com/attack-paths?fromSelector=Internet&toSelector=S3Bucket&maxPathLength=4" \
-H "Authorization: Bearer $TOKEN"
# Get CIS compliance report
curl "https://api.example.com/compliance/frameworks/CIS_AWS_FOUNDATIONS/report" \
-H "Authorization: Bearer $TOKEN"
# Get effective permissions for a role
curl "https://api.example.com/identity/effective-permissions/arn:aws:iam::123456789012:role/AdminRole" \
-H "Authorization: Bearer $TOKEN"
# Get rightsizing recommendation
curl "https://api.example.com/identity/rightsizing/arn:aws:iam::123456789012:role/DataRole?safetyMarginDays=7&includeReadonlySafe=true" \
-H "Authorization: Bearer $TOKEN"
1. Single-account dev (like Quickstart Option A)
The Lambda stack fans out from a single delegated admin account, assumes into each member account via the collector role, and writes directly to a Neptune cluster in the same VPC. Step Functions parallelize the per-account work.
cdk deploy KhalifaStack
2. Multi-account org with EKS backend (like Quickstart Option B)
The EKS stack adds an API service, a UI, and a Kubernetes CronJob for the rule runner. The API is fronted by an ALB with Cognito OIDC, and the rule runner executes Gremlin traversals on the same Neptune cluster.
cdk deploy SecurityGraphEksStack
kubectl apply -f eks-manifests/
3. Multi-account org with read replicas
Run collectors in each region and replicate into a single Neptune cluster via Neptune Streams. Use this when accounts are concentrated in specific regions or you need to keep data residency boundaries.
Full deployment reference: ARCHITECTURE.md · OPERATIONAL.md
Khalifa includes automated compliance evaluation against three industry-standard frameworks with 124 controls and 40+ automated evaluators that run Gremlin graph queries against your security data.
CIS AWS Foundations Benchmark v3.0 (78 controls)
Covers the foundational security configurations for AWS accounts:
| Section | Controls | Focus |
|---|---|---|
| 1. IAM | 18 | Root account, MFA, password policy, access keys |
| 2. Logging | 10 | CloudTrail, Config, S3 logging |
| 3. Monitoring | 28 | CloudWatch alarms, GuardDuty, Config rules |
| 4. Networking | 12 | VPC flow logs, security groups, NACLs |
| 5. Data Protection | 10 | Encryption, KMS rotation, backup |
SOC 2 Type II (22 controls)
Maps to Trust Services Criteria:
| Section | Controls | Focus |
|---|---|---|
| CC6 | 8 | Logical access, authentication, authorization |
| CC7 | 6 | Monitoring, incident response, change management |
| CC8 | 4 | Risk mitigation, system boundaries |
| CC9 | 4 | Additional criteria |
ISO 27001:2022 (24 controls)
Based on Annex A controls:
| Section | Controls | Focus |
|---|---|---|
| A.5 | 4 | Organizational controls, policies |
| A.6 | 4 | People controls, onboarding, training |
| A.8 | 6 | Technological controls, encryption, logging |
| A.9 | 4 | Physical and environmental security |
| A.12 | 3 | Operations security, vulnerability management |
| A.13 | 3 | Communications security, network controls |
The compliance engine runs Gremlin evaluators that query the live graph, produce per-control evidence (pass/fail/manual), and write results to DynamoDB for the UI to render.
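To make the per-control result shape concrete, here is a minimal sketch of rolling control results up into a framework summary. The `ControlResult` and `FrameworkSummary` types are hypothetical. Excluding manual controls from the pass rate is an assumption, not a documented scoring rule.

```typescript
type ControlStatus = "pass" | "fail" | "manual";

interface ControlResult {
  controlId: string;
  status: ControlStatus;
  evidence: string[]; // e.g. ARNs of the resources an evaluator inspected
}

interface FrameworkSummary {
  pass: number;
  fail: number;
  manual: number;
  // Share of automated controls that passed; null when none were automated.
  automatedPassRate: number | null;
}

function summarize(results: readonly ControlResult[]): FrameworkSummary {
  let pass = 0;
  let fail = 0;
  let manual = 0;
  for (const result of results) {
    if (result.status === "pass") pass += 1;
    else if (result.status === "fail") fail += 1;
    else manual += 1;
  }
  // Assumption: manual controls need human review, so they do not count
  // for or against the automated pass rate.
  const automated = pass + fail;
  return { pass, fail, manual, automatedPassRate: automated === 0 ? null : pass / automated };
}
```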
Collector: runs in a delegated admin account, assumes into every member account via the cross-account role, and inventories 30+ AWS services per pass. Writes raw resource nodes to Neptune.
Policy Evaluator: resolves IAM identity + resource + boundary + SCP policies into net effective permissions per principal, and traverses cross-account trust edges up to 3 hops to surface escalation paths.
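The layering described for the Policy Evaluator can be illustrated with a deliberately simplified set computation. This sketch treats actions as exact strings and ignores wildcards, conditions, resource scoping, and session policies. Applying the boundary only to identity-policy grants is also a simplification. The `PolicyInputs` shape and the function name are hypothetical, and the real evaluator may work differently.

```typescript
interface PolicyInputs {
  identityAllow: string[];  // actions allowed by identity-based policies
  resourceAllow: string[];  // actions granted to the principal by resource-based policies
  boundaryAllow?: string[]; // permissions boundary; undefined means none attached
  scpAllow: string[];       // actions permitted by the applicable SCPs
  explicitDeny: string[];   // actions explicitly denied by any policy
}

function effectivePermissions(inputs: PolicyInputs): string[] {
  const boundary = inputs.boundaryAllow === undefined ? undefined : new Set(inputs.boundaryAllow);
  const scp = new Set(inputs.scpAllow);
  const deny = new Set(inputs.explicitDeny);

  const granted = new Set<string>();
  // Identity-policy grants are capped by the boundary when one is attached.
  for (const action of inputs.identityAllow) {
    if (boundary === undefined || boundary.has(action)) granted.add(action);
  }
  // Simplification: resource-policy grants are added without the boundary check.
  for (const action of inputs.resourceAllow) granted.add(action);

  // SCPs cap everything, and an explicit deny always wins.
  return Array.from(granted)
    .filter((action) => scp.has(action) && !deny.has(action))
    .sort();
}
```

With identity grants for `s3:GetObject` and `iam:PassRole`, a boundary that omits `iam:PassRole`, and SCPs that allow both, only `s3:GetObject` survives. The boundary removes the grant before the SCP check is reached.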
CloudTrail Analyzer: runs Athena queries against the CloudTrail S3 logs on a daily schedule, with a 90-day lookback window. Writes usage data to the AccessAnalyzerCache DynamoDB table for the rightsizer.
Risk Engine: runs Gremlin traversals against the live graph on a schedule, producing prioritized issues, attack paths, and compliance evaluations. Each rule ships with severity, scoring, and remediation guidance.
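The idea behind a bounded attack-path traversal can be shown without a graph database. The sketch below runs a depth-limited search over an in-memory adjacency map. It stands in for the Gremlin traversals and does not reproduce any actual Khalifa rule. The `findPaths` name and the node labels in the usage note are illustrative.

```typescript
// Adjacency map: node id -> ids of nodes reachable in one hop.
type Graph = Map<string, string[]>;

// Returns every simple path (no repeated nodes) from `from` to `to`
// that uses at most `maxEdges` edges.
function findPaths(graph: Graph, from: string, to: string, maxEdges: number): string[][] {
  const paths: string[][] = [];

  const walk = (node: string, path: string[]): void => {
    if (node === to) {
      paths.push(path);
      return;
    }
    if (path.length - 1 >= maxEdges) return; // hop budget exhausted
    for (const next of graph.get(node) ?? []) {
      if (path.indexOf(next) === -1) walk(next, path.concat(next)); // skip cycles
    }
  };

  walk(from, [from]);
  return paths;
}
```

Given the edges `Internet → ALB → EC2Instance → IAMRole → S3Bucket`, a search from `Internet` to `S3Bucket` finds the path with a budget of four edges and finds nothing with a budget of three. This is the same trade-off a path-length limit makes on a live graph: a tighter bound is cheaper but can miss longer chains.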
API Service: REST API fronted by ALB + Cognito OIDC. Exposes issues, attack paths, resources, compliance reports, and CIEM identity endpoints. RBAC enforced from Cognito groups.
UI: Next.js dashboard for issues, attack paths, and compliance. Renders control-level evidence, drift detection, and CSV export.
- Agentless ingestion: no agents to install in member accounts; collectors assume via a single cross-account role defined in templates/SecurityGraphCollectorRole.yaml
- Deterministic graph model: every resource, IAM statement, and finding becomes a typed node with explicit edges; Gremlin returns the same traversal for the same input every time
- 30+ AWS services collected: compute, storage, database, identity, network, security, logging, serverless, secrets, and backup
- Risk + attack path + CIEM in one pass: rules, traversals, and effective-permission evaluation all run against the same live graph
- CIEM with CloudTrail grounding: effective permissions are joined to actual usage from Athena over CloudTrail, with rightsizing recommendations and a configurable safety margin
- Compliance built in: CIS v3.0, SOC 2 Type II, and ISO 27001:2022 evaluated by automated Gremlin queries with per-control evidence
- Two deployment modes: Lambda + EventBridge for development, EKS + CronJob for production — same data model, same graph, same API
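The CloudTrail grounding mentioned in the list above comes down to comparing what a principal may do with what it has actually done. The sketch below is a minimal illustration with hypothetical names. It applies only a lookback window and does not model the safety margin, whose exact semantics are not described here.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the effective actions with no recorded use inside the lookback window.
// `lastUsed` maps an action to its most recent observed use (e.g. from CloudTrail).
function unusedPermissions(
  effective: readonly string[],
  lastUsed: ReadonlyMap<string, Date>,
  now: Date,
  days: number = 90,
): string[] {
  const cutoff = now.getTime() - days * MS_PER_DAY;
  return effective.filter((action) => {
    const seen = lastUsed.get(action);
    // Never observed, or last observed before the window started.
    return seen === undefined || seen.getTime() < cutoff;
  });
}
```

A rightsizing recommendation could then be derived by removing the returned actions from the principal's policy, ideally after a human reviews anything that is used only rarely, such as break-glass or year-end jobs.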
Infrastructure
- CDK stacks: KhalifaStack (Lambda + EventBridge) and SecurityGraphEksStack (EKS + ALB + Cognito)
- Cross-account IAM role template deployed once per member account

Collectors
- Lists org accounts from AWS Organizations
- Per-account collector: 30+ AWS services + enhanced IAM decomposition
- Neptune writer for raw resource nodes
- Event-driven updates via EventBridge → SQS
- CIEM engine: effective permissions, escalation paths, rightsizing
- Athena queries over CloudTrail S3 logs → DynamoDB cache

Engine
- Risk rules, attack-path traversals, scoring, compliance evaluators

Service
- REST API (Express): issues, attack paths, resources, compliance, identity
- Next.js dashboard: issues, attack paths, compliance

Deploy
- Kubernetes manifests for API service, rule runner CronJob, HPA, NetworkPolicy

Docs
- System architecture, data model, ingestion topology
- Runbooks for the rule runner, Neptune, IRSA, and incident response
- Local development, workspaces, CI conventions
- Release history
BSD 3-Clause. See LICENSE.