feat(backend): add ControlVerificationTemplate table and CRUD endpoints#229
feat(backend): add ControlVerificationTemplate table and CRUD endpoints#229AaronAlijani wants to merge 1 commit into
Conversation
Phase 1 of the semi-automated manual verification system. Adds a per-control template that stores auditor instructions, keywords, severity, and expected evidence type for the 14 manual CIS controls. - New table control_verification_template (Alembic 8a7b91ea95d9) - Pydantic schemas: Create, Update, Read - CRUD endpoints under /v1/verification-templates - Write operations (POST/PATCH/DELETE) gated to admin role - Read operations available to any authenticated user - Unique constraint on control_id with 409 Conflict on duplicates Smoke-tested locally: POST creates row, GET returns it, duplicate POST returns 409. Refs: Khan's product brief on manual control verification.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 358f6754df
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| update_fields = data.model_dump(exclude_unset=True) | ||
| for field, value in update_fields.items(): | ||
| setattr(template, field, value) |
There was a problem hiding this comment.
Reject nulls before applying template updates
When a PATCH request explicitly sends null for non-nullable fields such as title, instructions, keywords, or severity, the update schema accepts it and exclude_unset=True still includes that field, so this loop writes None to columns declared nullable=False. In that scenario the subsequent commit raises a database integrity error and returns a 500 instead of a client error; either disallow None in the update schema or reject provided nulls before assigning.
Useful? React with 👍 / 👎.
du-dhartley
left a comment
There was a problem hiding this comment.
@AaronAlijani Great work on this one so far. The concept makes sense and has been well thought through.
There are couple of comments on specific lines in files to address, however there is one major issue that needs to be considered before this can be merged.
The short summary is that control_id alone is not a suitable unique identifier and we're completely missing the concept of a benchmark/framework version when saving this info.
When a user runs a scan they're selecting a framework, a benchmark and a version, and the policies directory already carries v3.1.0, v4.0.0 and v6.0.0 of the M365 benchmark side by side. CIS renumbers and rewrites controls between versions, so "1.1.2" in v3 isn't the same control as "1.1.2" in v6, and once the Azure benchmarks come online we'll have a second framework sharing the same numeric space. The new table keys uniquely on control_id alone, which means it can only hold one set of instructions per control across the whole platform — there's no way to say "these are the v6.0.0 instructions for 1.1.2" distinctly from the v3.1.0 ones. That'll bite us in Phase 2 the moment we try to seed: either we pick one version and quietly mismatch every other scan, or the second seed run trips the unique constraint. Could you fold framework, benchmark and version onto the table now, and make uniqueness composite across those plus control_id? The lookup pattern in scans.py already passes that tuple through end to end, so it'll plug in cleanly and keep the Phase 2 seeding honest about which version each template belongs to.
| from app.models.evidence_validation import EvidenceValidation | ||
| from app.models.contact import ContactSubmission, SubmissionNote, SubmissionHistory | ||
| from app.models.user_settings import UserSettings | ||
| from app.models.user_settings import UserSettings |
There was a problem hiding this comment.
This line is a duplicate of the one above and should be removed
| keywords: Mapped[list] = mapped_column(JSONB, nullable=False) | ||
|
|
||
| # Risk severity: "high" | "medium" | "low". | ||
| severity: Mapped[str] = mapped_column(String(20), nullable=False) |
There was a problem hiding this comment.
The comment above shows what the values should be, but there's no actual enforcement that what's coming in here matches those values. If this is what we expect to see, we should enforce this - Pydantic supports the concept of a Literal that does this, such as
from typing import Literal
Severity = Literal["high", "medium", "low"]
Summary
Phase 1 of the semi-automated manual verification system. Adds a per-control template table that stores auditor instructions, keywords, severity, and expected evidence type for the 14 manual CIS controls that AutoAudit cannot automate via the M365 collectors.
This is the foundation that later phases build on:
Companion to Ash Alijani's GRC documentation work on classifying manual controls and authoring the per-control templates JSON.
Type of Change
Affected Components
/backend-apiMotivation
Khan asked us to connect the evidence scanner with manual verification to reduce auditor workload. The idea: each manual control gets a verification template with specific auditor instructions and per-control keywords, so when an auditor uploads evidence, the existing OCR and keyword matching can validate it automatically and suggest a verdict with a confidence score. The auditor confirms or overrides.
This PR delivers the foundation, the table that holds those templates and the CRUD endpoints to manage them.
What's Included
control_verification_template(Alembic revision8a7b91ea95d9, autogenerated)control_id(unique, indexed)title,instructions,keywords(JSONB),severity,evidence_typecreated_at/updated_atwithnow()defaultsCreate,Update,Read/v1/verification-templates:POST /create (admin only)GET /list allGET /{control_id}fetch onePATCH /{control_id}partial update (admin only)DELETE /{control_id}delete (admin only)require_adminpermissioncontrol_idreturns409 Conflicton duplicate createsTesting Done
Smoke tested locally end-to-end via curl:
alembic upgrade headapplied the migration cleanly with no errorsPOST /v1/verification-templates/with admin token created a template for control1.1.2and returned201 Createdwith the full record includingid,created_at,updated_atGET /v1/verification-templates/1.1.2returned200 OKwith the same record, JSONB keywords array intactPOSTfor the samecontrol_idreturned409 Conflictwith the expected error message, confirms unique constraint and conflict-handling logic/docs(Swagger) under the "Verification Templates" tagSecurity Considerations
POST/PATCH/DELETE) gated to admin role viarequire_adminfromapp/core/permissions.py(existing pattern).get_current_user.keywordsfield is stored as JSONB, same pattern asEvidenceValidation.matches_jsonandScanResult.evidence, so no new attack surface.Breaking Changes
The new table is additive. No existing schemas, contracts, or workflows are modified.
Rollback Plan
If a rollback is required:
docker compose exec backend-api uv run alembic downgrade j1k2l3m4n567The migration's
downgrade()drops the new index and the new table cleanly.Checklist
Screenshots
All 5 endpoints visible in Swagger UI under the "Verification Templates" tag:
GET request returns the created template with all fields and timestamps: