
Add failsafe so original eICR always flows through to AugmentationEICRV2/ #545

@nickclyde

Description


Background

The APHL AIMS eCR pipeline assumes that any eICR diverted into the TTC pipeline will reappear at s3://<bucket>/AugmentationEICRV2/<persistence_id> so it can be picked up by downstream processing. Today this assumption is violated in several cases (when there are no relevant Schematron errors, when an exception is raised in TTC, or when an exception is raised in Augmentation) and the eICR is silently dropped from the main pipeline. APHL has asked for a failsafe that guarantees the original document always makes it through.

Failure modes today

| Scenario | AugmentationEICRV2/ populated? |
| --- | --- |
| Happy path (errors found, matches found) | ✅ Augmented eICR |
| No relevant Schematron errors | ❌ Nothing. TTC early-exits at packages/text-to-code-lambda/src/text_to_code_lambda/lambda_function.py:472-485 without writing a TTC output, so the Augmentation Lambda is never triggered. |
| Unhandled exception in TTC Lambda | ❌ Nothing. The exception is caught at the handler boundary (lambda_function.py:69-75) and no output is written. |
| Unhandled exception in Augmentation Lambda | ❌ Nothing. The exception is caught at the handler boundary (packages/augmentation-lambda/src/augmentation_lambda/lambda_function.py:52-58) and no output is written. |
| TTC found zero code matches | ⚠️ The Augmentation Lambda runs and writes a file, but EICRAugmenter.augment() still rewrites document headers even with an empty codes list (packages/augmentation/src/augmentation/services/eicr_augmenter.py:50-83), so the original is modified rather than passed through unchanged. |

Proposed behavior

The Augmentation Lambda becomes the single owner of writes to AugmentationEICRV2/. The TTC Lambda is updated to always emit a TTCAugmentationMetadataV2/{persistence_id} record, optionally carrying a passthrough: true flag and a structured passthrough_reason. The Augmentation Lambda, when it sees the flag (or hits its own exception), skips augmentation and writes the original eICR byte-for-byte to AugmentationEICRV2/{persistence_id}. An AugmentationMetadataV2/{persistence_id} record captures the reason for observability.

Passthrough reasons (enum):

  • no_relevant_schematron_errors
  • no_code_matches
  • ttc_exception
  • augmentation_exception

TTC Lambda changes (packages/text-to-code-lambda/src/text_to_code_lambda/lambda_function.py)

  1. No-relevant-errors branch (lines 472-485): instead of writing only metadata and returning, also write a TTC output to TTCAugmentationMetadataV2/{persistence_id} with passthrough: true and passthrough_reason: "no_relevant_schematron_errors". Keep schematron_errors empty. This triggers the Augmentation Lambda, which sees the flag and writes the original eICR.
  2. Handler-level exception (lines 69-75): before adding the record to failures, attempt a best-effort write of a passthrough TTC output (passthrough: true, passthrough_reason: "ttc_exception", plus an optional error string for observability; a message string only, never a stack trace that could contain PHI). Wrap this write in its own try/except so that a secondary failure is still logged cleanly. If persistence_id cannot be extracted from the S3 event, skip the passthrough write and let the SQS DLQ handle the record.
  3. TTC-found-zero-matches path: when the output has an empty schematron_errors map after processing, set passthrough: true and passthrough_reason: "no_code_matches" on the TTC output. Same downstream behavior.

Augmentation Lambda changes (packages/augmentation-lambda/src/augmentation_lambda/lambda_function.py)

  1. Detect passthrough in _process_record (around line 113): after loading ttc_output, check ttc_output.get("passthrough"). If true, skip the augmenter entirely. Write the original eICR bytes unmodified to AugmentationEICRV2/{persistence_id} and an augmentation metadata file recording {passthrough: true, passthrough_reason: <upstream reason>}.
  2. Wrap augmentation in try/except (lines 130-138): if EICRAugmenter raises, catch and log the exception, then fall back to writing the original eICR unchanged with metadata {passthrough: true, passthrough_reason: "augmentation_exception", error: <str>}. Report the SQS record as a success at the handler level (do not add it to failures), since the passthrough is the intended outcome.
  3. Pre-augmenter failures (S3 read of original eICR fails, TTC output missing, JSON parse fails): leave alone. The existing handler-level catch + SQS DLQ (maxReceiveCount: 3 in terraform/main.tf) handle these; if we can't read the original, there is nothing to pass through.

Shared model changes (packages/shared-models/)

  • Add optional passthrough: bool and passthrough_reason: str | None fields to the TTC output schema. Both Lambdas read via .get(...), so the change is backward-compatible.
  • Use an enum for the four reason values listed above.

Acceptance criteria

  • Any TextToCodeSubmissionV2/<persistence_id> that triggers the TTC Lambda results in a corresponding object at AugmentationEICRV2/<persistence_id> (augmented or original, byte-for-byte).
  • Passthrough writes produce a structured AugmentationMetadataV2/<persistence_id> record with passthrough: true and passthrough_reason.
  • Augmented-path behavior is unchanged.
  • Structured log fields include passthrough_reason so we can build a failsafe-rate metric in CloudWatch.
  • The only conditions under which AugmentationEICRV2/<persistence_id> is not populated are (a) the triggering S3 event has no recoverable persistence_id, or (b) the original eICR object is unreadable; both fall through to the existing SQS DLQ.
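A sketch of the structured log line the acceptance criteria call for. Only passthrough_reason comes from the issue; the event, persistence_id, and source_lambda field names and the log_passthrough helper are assumptions for illustration.

```python
import json
import logging

logger = logging.getLogger("eicr_failsafe")


def passthrough_log_fields(persistence_id: str, reason: str, source: str) -> dict[str, str]:
    """Fields for one structured passthrough log line (names are assumptions)."""
    return {
        "event": "eicr_passthrough",
        "persistence_id": persistence_id,
        "passthrough_reason": reason,
        "source_lambda": source,
    }


def log_passthrough(persistence_id: str, reason: str, source: str) -> str:
    """Emit the fields as one JSON object per log line and return that line."""
    line = json.dumps(passthrough_log_fields(persistence_id, reason, source), sort_keys=True)
    logger.info(line)
    return line
```

If each CloudWatch log event ends up being this JSON object on its own (an assumption about the logging setup), a metric filter pattern such as { $.passthrough_reason = "ttc_exception" } could count one reason at a time.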

Files in scope

  • packages/text-to-code-lambda/src/text_to_code_lambda/lambda_function.py
  • packages/augmentation-lambda/src/augmentation_lambda/lambda_function.py
  • packages/shared-models/
  • Tests in packages/text-to-code-lambda/tests/ and packages/augmentation-lambda/tests/

No Terraform changes are required.

Test plan

  • Unit tests (uv run pytest packages/text-to-code-lambda/tests/ packages/augmentation-lambda/tests/): one test per passthrough reason. For passthrough cases, assert that bytes written to AugmentationEICRV2/ are byte-for-byte equal to bytes read from TextToCodeSubmissionV2/ (compare on the mocked S3).
  • End-to-end (local Docker Compose): drop a clean eICR (no Schematron errors) into the input prefix, confirm a passthrough copy lands in AugmentationEICRV2/. Repeat with a deliberately malformed eICR to exercise the TTC exception path.
  • Lint/type: just ruff && just ty.
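A hypothetical helper that the per-reason unit tests could share to check the byte-for-byte invariant on the mocked S3, shown here against a plain dict. The names assert_passthrough_copy and test_helper_detects_modified_output are illustrative and do not refer to existing code.

```python
import json
from collections.abc import Mapping

PASSTHROUGH_REASONS = (
    "no_relevant_schematron_errors",
    "no_code_matches",
    "ttc_exception",
    "augmentation_exception",
)


def assert_passthrough_copy(bucket: Mapping[str, bytes], persistence_id: str, reason: str) -> None:
    """Check the failsafe invariant on a mocked bucket after a Lambda run."""
    original = bucket[f"TextToCodeSubmissionV2/{persistence_id}"]
    written = bucket[f"AugmentationEICRV2/{persistence_id}"]
    assert written == original, "passthrough must be byte-for-byte"
    metadata = json.loads(bucket[f"AugmentationMetadataV2/{persistence_id}"])
    assert metadata["passthrough"] is True
    assert metadata["passthrough_reason"] == reason


def test_helper_detects_modified_output() -> None:
    doc = b"<?xml version='1.0'?>\n<ClinicalDocument/>\r\n"
    meta = json.dumps({"passthrough": True, "passthrough_reason": "ttc_exception"}).encode()
    good = {
        "TextToCodeSubmissionV2/p1": doc,
        "AugmentationEICRV2/p1": doc,
        "AugmentationMetadataV2/p1": meta,
    }
    assert_passthrough_copy(good, "p1", "ttc_exception")

    # Even a line-ending-only change (e.g. from re-serializing the XML) must fail.
    bad = {**good, "AugmentationEICRV2/p1": doc.replace(b"\r\n", b"\n")}
    try:
        assert_passthrough_copy(bad, "p1", "ttc_exception")
    except AssertionError:
        pass
    else:
        raise AssertionError("helper accepted a modified document")
```

Comparing raw bytes rather than parsed XML is deliberate: a parse-and-compare check would miss exactly the header rewrites and re-serialization that the failsafe is meant to rule out.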

Metadata


Labels

enhancement (New feature or request), python (Pull requests that update python code)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
