Remove duplicate file preservation after import workflow matures #44

Description

@codeforester

Summary

Once duplicate-import handling is proven in regular use, revisit the temporary duplicates/ preservation policy introduced by #43. The mature behavior should likely skip exact duplicates of already successful imports, record the skip in import history, and then remove the inbox copy without keeping another physical copy.

Background

Issue #43 intentionally preserves exact duplicate inbox files in a managed duplicates/ directory. That is conservative while BankBuddy is young and the import/archive workflow is still earning trust. Long term, exact SHA-256 duplicates of already successful imports do not need to be stored again, because the canonical processed copy already exists.

Future desired behavior

  • Keep SHA-256-based exact duplicate detection.
  • Keep auditable duplicate/skipped import history.
  • Stop preserving physical duplicate files once the workflow is mature enough.
  • Remove or deprecate the managed duplicates/ directory policy.
  • Update docs, spec, and tests to reflect the simplified lifecycle.

Acceptance criteria

  • Inbox files that exactly duplicate an already successful import are removed after duplicate history is recorded.
  • The existing canonical processed file remains the only retained physical statement copy.
  • Import history still explains the skip clearly.
  • Any duplicate-path schema or output fields introduced for #43 (Skip exact duplicate inbox imports and preserve duplicate files) are removed, deprecated, or clearly documented as historical-only.
  • README and design spec no longer recommend preserving duplicate files.
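
The lifecycle described by the criteria above could be sketched roughly as follows. This is a minimal illustration, not BankBuddy's actual code: `ImportLedger`, `handle_inbox_file`, the history record fields, and the in-memory storage are all hypothetical names and assumptions chosen for the example. The one point it is meant to show is the ordering: the skip is recorded in history before the inbox copy is deleted, and the canonical processed file is never touched.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass
class ImportLedger:
    """Hypothetical in-memory stand-in for the real import history store."""
    # Maps SHA-256 of a successfully imported file to its canonical processed path.
    processed: dict = field(default_factory=dict)
    # Append-only list of history records (assumed shape, for illustration only).
    history: list = field(default_factory=list)


def handle_inbox_file(inbox_file: Path, ledger: ImportLedger) -> str:
    """Skip and remove an exact duplicate; otherwise leave the file for normal import."""
    checksum = sha256_of(inbox_file)
    canonical = ledger.processed.get(checksum)

    # Only treat the file as a duplicate if the canonical copy really exists,
    # so deleting the inbox copy can never remove the last physical copy.
    if canonical is None or not canonical.exists():
        return "import"

    # Record the skip first, so history explains it even if the delete fails.
    ledger.history.append({
        "status": "skipped_duplicate",
        "sha256": checksum,
        "inbox_name": inbox_file.name,
        "canonical_path": str(canonical),
    })

    # No copy into duplicates/: the inbox file is simply removed.
    inbox_file.unlink()
    return "skipped_duplicate"
```

Checking that the canonical file still exists before deleting is a conservative choice added for this sketch; the issue itself only requires that the processed copy remain the single retained statement file.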

Depends on


Labels: enhancement (New feature or product improvement)
