
feat(backend): add ControlVerificationTemplate table and CRUD endpoints#229

Open
AaronAlijani wants to merge 1 commit into main from feature/26T1-BAC-AA-verification-templates
Conversation

@AaronAlijani
Collaborator

Summary

Phase 1 of the semi-automated manual verification system. Adds a per-control template table that stores auditor instructions, keywords, severity, and expected evidence type for the 14 manual CIS controls that AutoAudit cannot automate via the M365 collectors.

This is the foundation that later phases build on:

  • Phase 2 (subsequent commits to this branch): confidence scoring algorithm, seeding the 14 templates, linking evidence uploads to scan_result, and the auditor workflow.
  • Phase 3 (subsequent commits): status propagation from verdict to scan_result.status, and scan finalisation for manual controls.

Companion to Ash Alijani's GRC documentation work on classifying manual controls and authoring the per-control templates JSON.

Type of Change

  • New feature

Affected Components

  • /backend-api

Motivation

Khan asked us to connect the evidence scanner with manual verification to reduce auditor workload. The idea: each manual control gets a verification template with specific auditor instructions and per-control keywords, so when an auditor uploads evidence, the existing OCR and keyword matching can validate it automatically and suggest a verdict with a confidence score. The auditor confirms or overrides.
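As a rough illustration of the Phase 2 idea (none of this code is in this PR, and the function name and weighting are assumptions), the confidence score could be as simple as the fraction of template keywords found in the OCR'd evidence text:

```python
# Hedged sketch only: AutoAudit's actual Phase 2 scorer is not in this PR.
# Illustrates turning template keywords + OCR text into a confidence score.
def confidence_score(ocr_text: str, keywords: list[str]) -> float:
    """Return the fraction of template keywords found in the OCR'd text."""
    if not keywords:
        return 0.0
    text = ocr_text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)
```

A real implementation would likely weight keywords and factor in the expected evidence type, but the shape is the same: template in, score out, auditor confirms or overrides.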

This PR delivers the foundation: the table that holds those templates and the CRUD endpoints to manage them.

What's Included

  • New table control_verification_template (Alembic revision 8a7b91ea95d9, autogenerated)
    • control_id (unique, indexed)
    • title, instructions, keywords (JSONB), severity, evidence_type
    • created_at / updated_at with now() defaults
  • Pydantic schemas: Create, Update, Read
  • CRUD endpoints under /v1/verification-templates:
    • POST / create (admin only)
    • GET / list all
    • GET /{control_id} fetch one
    • PATCH /{control_id} partial update (admin only)
    • DELETE /{control_id} delete (admin only)
  • Write operations gated via the existing require_admin permission
  • Read operations available to any authenticated user (auditors/viewers need to see instructions in the UI)
  • Unique constraint on control_id returns 409 Conflict on duplicate creates
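A minimal in-memory sketch of the duplicate-create behaviour described above (the real endpoint relies on the DB unique constraint and returns HTTP 409; ConflictError here stands in for that response, and all names are illustrative):

```python
# Illustrative in-memory model of create + conflict handling; not the
# actual FastAPI handler, which maps an IntegrityError to HTTP 409.
class ConflictError(Exception):
    pass

_templates: dict[str, dict] = {}

def create_template(control_id: str, title: str, keywords: list[str]) -> dict:
    """Create a template; reject a duplicate control_id."""
    if control_id in _templates:
        raise ConflictError(f"template for control {control_id} already exists")
    record = {"control_id": control_id, "title": title, "keywords": keywords}
    _templates[control_id] = record
    return record
```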

Testing Done

  • Tested manually

Smoke tested locally end-to-end via curl:

  1. alembic upgrade head applied the migration cleanly with no errors
  2. POST /v1/verification-templates/ with admin token created a template for control 1.1.2 and returned 201 Created with the full record including id, created_at, updated_at
  3. GET /v1/verification-templates/1.1.2 returned 200 OK with the same record, JSONB keywords array intact
  4. Duplicate POST for the same control_id returned 409 Conflict with the expected error message, confirming the unique constraint and conflict-handling logic
  5. Confirmed the new endpoints appear in /docs (Swagger) under the "Verification Templates" tag

Security Considerations

  • All write endpoints (POST / PATCH / DELETE) gated to admin role via require_admin from app/core/permissions.py (existing pattern).
  • All read endpoints require an authenticated user via get_current_user.
  • No secrets, tokens, or sensitive data introduced.
  • keywords field is stored as JSONB, same pattern as EvidenceValidation.matches_json and ScanResult.evidence, so no new attack surface.

Breaking Changes

  • No breaking changes

The new table is additive. No existing schemas, contracts, or workflows are modified.

Rollback Plan

  • Revert commit is sufficient

If a rollback is required:

docker compose exec backend-api uv run alembic downgrade j1k2l3m4n567

The migration's downgrade() drops the new index and the new table cleanly.

Checklist

  • Code follows project conventions
  • No secrets, credentials, or tokens committed
  • Relevant documentation updated (if applicable); docs are Ash's parallel work
  • CI/CD workflows pass on this branch
  • PR is focused on one thing

Screenshots

All 5 endpoints visible in Swagger UI under the "Verification Templates" tag:

[screenshot]

GET request returns the created template with all fields and timestamps:

[screenshot]

Phase 1 of the semi-automated manual verification system. Adds a per-control template that stores auditor instructions, keywords, severity, and expected evidence type for the 14 manual CIS controls.

- New table control_verification_template (Alembic 8a7b91ea95d9)

- Pydantic schemas: Create, Update, Read

- CRUD endpoints under /v1/verification-templates

- Write operations (POST/PATCH/DELETE) gated to admin role

- Read operations available to any authenticated user

- Unique constraint on control_id with 409 Conflict on duplicates

Smoke-tested locally: POST creates row, GET returns it, duplicate POST returns 409.

Refs: Khan's product brief on manual control verification.

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 358f6754df


Comment on lines +134 to +136
update_fields = data.model_dump(exclude_unset=True)
for field, value in update_fields.items():
    setattr(template, field, value)


P2: Reject nulls before applying template updates

When a PATCH request explicitly sends null for non-nullable fields such as title, instructions, keywords, or severity, the update schema accepts it and exclude_unset=True still includes that field, so this loop writes None to columns declared nullable=False. In that scenario the subsequent commit raises a database integrity error and returns a 500 instead of a client error; either disallow None in the update schema or reject provided nulls before assigning.
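One way to apply this fix before the setattr loop, sketched in plain Python (the set of non-nullable columns is assumed from the comment, not read from the real model; in FastAPI this would typically surface as a 4xx response rather than a bare ValueError):

```python
# Sketch of the suggested fix: refuse explicit nulls for non-nullable
# columns before they reach the setattr loop. NON_NULLABLE is an
# assumption based on the review comment above.
NON_NULLABLE = {"title", "instructions", "keywords", "severity"}

def reject_null_updates(update_fields: dict) -> dict:
    """Raise on None for any non-nullable column; pass everything else through."""
    for field, value in update_fields.items():
        if value is None and field in NON_NULLABLE:
            raise ValueError(f"{field} may not be null")
    return update_fields
```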

Useful? React with 👍 / 👎.

Collaborator

@du-dhartley left a comment


@AaronAlijani Great work on this one so far. The concept makes sense and has been well thought through.
There are a couple of comments on specific lines in files to address; however, there is one major issue that needs to be considered before this can be merged.

The short summary is that control_id alone is not a suitable unique identifier and we're completely missing the concept of a benchmark/framework version when saving this info.

When a user runs a scan they're selecting a framework, a benchmark and a version, and the policies directory already carries v3.1.0, v4.0.0 and v6.0.0 of the M365 benchmark side by side. CIS renumbers and rewrites controls between versions, so "1.1.2" in v3 isn't the same control as "1.1.2" in v6, and once the Azure benchmarks come online we'll have a second framework sharing the same numeric space.

The new table keys uniquely on control_id alone, which means it can only hold one set of instructions per control across the whole platform — there's no way to say "these are the v6.0.0 instructions for 1.1.2" distinctly from the v3.1.0 ones. That'll bite us in Phase 2 the moment we try to seed: either we pick one version and quietly mismatch every other scan, or the second seed run trips the unique constraint.

Could you fold framework, benchmark and version onto the table now, and make uniqueness composite across those plus control_id? The lookup pattern in scans.py already passes that tuple through end to end, so it'll plug in cleanly and keep the Phase 2 seeding honest about which version each template belongs to.
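The composite-key idea above can be sketched in plain Python (column names follow the reviewer's suggestion, not the current schema):

```python
# Templates keyed by (framework, benchmark, version, control_id) instead
# of control_id alone, so the same control number can carry distinct
# instructions per benchmark version.
TemplateKey = tuple[str, str, str, str]
templates: dict[TemplateKey, dict] = {}

def put_template(framework: str, benchmark: str, version: str,
                 control_id: str, body: dict) -> None:
    """Uniqueness is now per (framework, benchmark, version, control_id)."""
    templates[(framework, benchmark, version, control_id)] = body

# "1.1.2" now holds different instructions under v3.1.0 and v6.0.0:
put_template("CIS", "M365", "v3.1.0", "1.1.2", {"title": "v3 wording"})
put_template("CIS", "M365", "v6.0.0", "1.1.2", {"title": "v6 wording"})
```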

from app.models.evidence_validation import EvidenceValidation
from app.models.contact import ContactSubmission, SubmissionNote, SubmissionHistory
from app.models.user_settings import UserSettings
from app.models.user_settings import UserSettings
Collaborator


This line is a duplicate of the one above and should be removed

keywords: Mapped[list] = mapped_column(JSONB, nullable=False)

# Risk severity: "high" | "medium" | "low".
severity: Mapped[str] = mapped_column(String(20), nullable=False)
Collaborator


The comment above shows what the values should be, but there's no actual enforcement that what comes in here matches those values. If this is what we expect to see, we should enforce it. Pydantic supports typing's Literal for exactly this, for example:

from typing import Literal

Severity = Literal["high", "medium", "low"]
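As a companion sketch (function name assumed), the same allowed-values check can also be applied outside a Pydantic schema by pulling the Literal's members with typing.get_args:

```python
# Reuses the Literal as the single source of truth for allowed values;
# check_severity is an illustrative helper, not existing project code.
from typing import Literal, get_args

Severity = Literal["high", "medium", "low"]

def check_severity(value: str) -> str:
    """Raise if value is not one of the allowed severity strings."""
    if value not in get_args(Severity):
        raise ValueError(f"severity must be one of {get_args(Severity)}")
    return value
```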

