---
title: FormFlow
emoji: 🧾
colorFrom: green
colorTo: blue
sdk: gradio
app_file: app.py
pinned: false
license: mit
---
Live demo: https://huggingface.co/spaces/Faraz618/formflow

Most document-AI demos stop at "extract the fields." FormFlow goes one step further: it reads a document, extracts structured data, decides what action should be taken (approve for payment, flag for review, or reject), and proposes that action for a human to approve or override — it never executes anything on its own.
This is a small, dependency-light demonstration of the Intelligent Document Processing (IDP) + agentic decision-making + human-in-the-loop (HITL) pattern used in enterprise finance, compliance, and operations workflows.
Two recurring requirements I kept seeing in AI Engineer job postings were (1) agentic systems that take real actions, not just answer questions, and (2) human validation checkpoints for anything with real-world consequences. Most public RAG/agent demos skip the second part entirely. FormFlow makes the human-approval step the centerpiece, not an afterthought.
The pipeline is three small, separable agents plus an audit log. Each agent is a plain Python function, deliberately kept simple and inspectable rather than hidden behind a framework:
- Extraction Agent: reads the uploaded document (.txt or .pdf) and pulls out structured fields (vendor name, invoice number, amount, due date, payment method, bank details) using rule-based pattern matching. It is swappable for an LLM-based extractor; see "What I'd build next" below.
- Risk Assessment Agent: runs the extracted fields through a set of explicit, inspectable risk rules rather than a black box: unusually large amount, suspiciously short payment deadline, wire-transfer-only payment, mismatched or personal-sounding account names, missing standard invoice fields, and "urgent / don't call" language. Every triggered rule is shown, not just a final score, so a reviewer can see exactly why something was flagged.
- Decision Agent: based on the accumulated risk score, proposes one of three actions: Approve, Flag for Manual Review, or Reject. Critically, this agent's output is a proposal, not an action; the UI requires an explicit human click to confirm before anything is logged as "actioned."
- Audit Log: every processed document, together with the human's final decision on it, is appended to a visible, in-memory audit trail for the session, because in any real finance or compliance workflow "who approved what, and why" has to be answerable instantly.
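The rule-then-score design of the Risk Assessment and Decision agents can be sketched as two plain functions. This is a minimal sketch, not FormFlow's actual code: the field names (including a precomputed `days_until_due`), the rule weights, the keyword list, and the score thresholds are all hypothetical. What it illustrates is that every fired rule comes back with a human-readable reason, and the decision step only returns a proposed label.

```python
from dataclasses import dataclass
from typing import Optional

# All thresholds, weights and keyword lists are illustrative assumptions,
# not FormFlow's real configuration.
LARGE_AMOUNT = 10_000.0
SHORT_DEADLINE_DAYS = 3
URGENCY_PHRASES = ("urgent", "don't call", "do not call")
REQUIRED_FIELDS = ("vendor", "invoice_number", "amount", "due_date")


@dataclass
class RuleHit:
    rule: str      # machine-readable rule id
    reason: str    # explanation shown to the human reviewer
    weight: int    # contribution to the accumulated risk score


def assess_risk(fields: dict, raw_text: str) -> list[RuleHit]:
    """Apply each explicit rule and return every rule that fired."""
    hits: list[RuleHit] = []

    amount: Optional[float] = fields.get("amount")
    if amount is not None and amount >= LARGE_AMOUNT:
        hits.append(RuleHit("large_amount", f"amount {amount:,.2f} is unusually large", 2))

    days = fields.get("days_until_due")
    if days is not None and days <= SHORT_DEADLINE_DAYS:
        hits.append(RuleHit("short_deadline", f"payment due in {days} day(s)", 2))

    if "wire" in (fields.get("payment_method") or "").lower():
        hits.append(RuleHit("wire_only", "payment requested by wire transfer", 1))

    vendor = (fields.get("vendor") or "").lower()
    account = (fields.get("account_name") or "").lower()
    if vendor and account and vendor not in account:
        hits.append(RuleHit("account_mismatch", "bank account name does not match vendor", 2))

    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        hits.append(RuleHit("missing_fields", "missing: " + ", ".join(missing), 1))

    text = raw_text.lower()
    if any(phrase in text for phrase in URGENCY_PHRASES):
        hits.append(RuleHit("urgency_language", "pressure or 'don't call' language", 2))

    return hits


def propose_action(hits: list[RuleHit]) -> tuple[str, int]:
    """Map the accumulated score to a *proposed* action; nothing is executed here."""
    score = sum(hit.weight for hit in hits)
    if score >= 5:
        return "Reject", score
    if score >= 2:
        return "Flag for Manual Review", score
    return "Approve", score
```

Keeping `assess_risk` and `propose_action` separate means the UI can render the list of `RuleHit` reasons next to the proposal, which is what lets a reviewer see why something was flagged rather than just a number.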
```text
┌─────────────┐     ┌───────────────────┐     ┌────────────────────┐     ┌─────────────────┐
│  Document   │ ──▶ │ Extraction Agent  │ ──▶ │  Risk Assessment   │ ──▶ │ Decision Agent  │
│ (.txt/.pdf) │     │ (structured field │     │ Agent (explicit,   │     │ (proposes:      │
│             │     │  extraction)      │     │  inspectable       │     │  approve/flag/  │
│             │     │                   │     │  rule checks)      │     │  reject)        │
└─────────────┘     └───────────────────┘     └────────────────────┘     └────────┬────────┘
                                                                                  │
                                                                                  ▼
                                                                       ┌─────────────────────┐
                                                                       │ HUMAN APPROVAL GATE │
                                                                       │ (nothing executes   │
                                                                       │  without a click)   │
                                                                       └──────────┬──────────┘
                                                                                  │
                                                                                  ▼
                                                                         📋 Audit Log Entry
```
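The human approval gate and audit log at the bottom of the diagram can be sketched as an immutable proposal plus an append-only log that is written only when a human acts. The class and method names (`Proposal`, `AuditEntry`, `AuditLog.record`) are hypothetical, and how the Space actually wires this to its Gradio buttons is not shown here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Proposal:
    """What the Decision Agent hands to the UI: a suggestion, not an action."""
    document_id: str
    proposed_action: str          # "Approve", "Flag for Manual Review" or "Reject"
    reasons: tuple[str, ...]      # human-readable reasons from the fired risk rules


@dataclass(frozen=True)
class AuditEntry:
    timestamp: str                # UTC, ISO 8601
    document_id: str
    proposed_action: str
    final_action: str
    reviewer: str
    reasons: tuple[str, ...]
    overridden: bool              # True when the human disagreed with the proposal


@dataclass
class AuditLog:
    """Append-only, in-memory trail for one session."""
    entries: list[AuditEntry] = field(default_factory=list)

    def record(self, proposal: Proposal, final_action: str, reviewer: str) -> AuditEntry:
        """Intended to be called only from a human's confirm/override click handler."""
        entry = AuditEntry(
            timestamp=datetime.now(timezone.utc).isoformat(timespec="seconds"),
            document_id=proposal.document_id,
            proposed_action=proposal.proposed_action,
            final_action=final_action,
            reviewer=reviewer,
            reasons=proposal.reasons,
            overridden=final_action != proposal.proposed_action,
        )
        self.entries.append(entry)
        return entry


# Example: a reviewer overrides a "Reject" proposal.
log = AuditLog()
proposal = Proposal("invoice-002", "Reject", ("wire transfer only", "1-day deadline"))
entry = log.record(proposal, final_action="Approve", reviewer="reviewer-1")
print(entry.overridden, entry.proposed_action, "->", entry.final_action)
# prints: True Reject -> Approve
```

In this shape the agent code never calls `record` itself; only the handler behind a human click would, so an override shows up in the trail as `overridden=True` alongside the original reasons and a UTC timestamp.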
To try it in the live demo:
- Load Sample 1: Normal Invoice. A routine office-supplies invoice; watch it sail through with a low risk score and an "Approve" recommendation.
- Load Sample 2: Suspicious Invoice. Same pipeline, completely different outcome: watch the Risk Agent raise five separate red flags (urgency language, wire-only payment, personal account name, unregistered address, 1-day deadline) and the Decision Agent recommend Reject, while still waiting for you to click "Confirm Decision" before anything is logged.
- Try clicking "Override and Approve Anyway" on the suspicious invoice — notice the audit log records that a human overrode an AI recommendation, with a timestamp. That accountability trail is the actual point of this project.
- Upload your own invoice or document and see how the extraction and risk rules perform on real-world formatting.
Tech stack:
- Gradio: UI and app framework
- pypdf: PDF text extraction
- Plain Python / regex: rule-based extraction and risk scoring, deliberately not hidden behind an LLM call so that every decision stays traceable and explainable, which matters in high-stakes domains
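Reading a .txt or .pdf and pulling labeled fields out with regex might look like the sketch below. It relies on pypdf's `PdfReader` and `page.extract_text()`; the label patterns assume simple `Label: value` lines and are hypothetical, not the patterns FormFlow ships with.

```python
import re
from pathlib import Path

# Hypothetical patterns for simple "Label: value" invoices. [ \t]* is used
# instead of \s* so an empty value can never swallow the following line.
FIELD_PATTERNS = {
    "vendor": r"^[ \t]*(?:vendor|from)[ \t]*:[ \t]*(.+)$",
    "invoice_number": r"^[ \t]*invoice[ \t]*(?:no\.?|number|#)[ \t]*:?[ \t]*([A-Z0-9-]+)",
    "amount": r"^[ \t]*(?:total|amount due)[ \t]*:?[ \t]*\$?[ \t]*(\d[\d,]*(?:\.\d{2})?)",
    "due_date": r"^[ \t]*due[ \t]*date[ \t]*:[ \t]*(\d{4}-\d{2}-\d{2})",
    "payment_method": r"^[ \t]*payment[ \t]*method[ \t]*:[ \t]*(.+)$",
    "account_name": r"^[ \t]*account[ \t]*name[ \t]*:[ \t]*(.+)$",
}


def read_document(path: str) -> str:
    """Return the plain text of a .txt or .pdf file."""
    p = Path(path)
    if p.suffix.lower() == ".pdf":
        # Imported lazily so plain-text use does not require pypdf.
        from pypdf import PdfReader

        reader = PdfReader(p)
        # `or ""` guards against pages that yield no text (e.g. scanned images).
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    return p.read_text(encoding="utf-8", errors="replace")


def extract_fields(text: str) -> dict:
    """Apply each pattern; fields that are not found come back as None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
        fields[name] = match.group(1).strip() if match else None
    if fields["amount"] is not None:
        # "1,234.50" -> 1234.5
        fields["amount"] = float(fields["amount"].replace(",", ""))
    return fields
```

Returning `None` for missing fields, rather than guessing, is what lets a downstream "missing standard invoice fields" rule fire explicitly.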
What I'd build next:
- Swap the regex-based Extraction Agent for an LLM-based one (prompt + structured output / function calling) for documents with inconsistent layouts, while keeping the same explicit, human-readable risk-rules layer on top of it
- Add a second specialist agent that cross-checks the extracted vendor against a known-vendor allowlist (a simple version of entity resolution / fraud-pattern matching used in real AP automation systems)
- Persist the audit log to a real database (PostgreSQL) instead of in-memory, so the approval trail survives a session restart
- Add an evaluation harness that replays a labeled set of past invoices (legit vs. fraudulent) and reports the agent's flag precision/recall — the same evaluation-as-discipline pattern from my other project, ComplianceRAG
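For the evaluation-harness idea, flag precision and recall reduce to counting true positives, false positives, and false negatives over a labeled replay set. This is a minimal sketch, assuming "flagged" means any proposal other than Approve; `is_flagged` is a hypothetical hook into the pipeline, and the toy examples are made up purely to exercise the function.

```python
from typing import Callable, Iterable


def flag_precision_recall(
    labeled: Iterable[tuple[str, bool]],
    is_flagged: Callable[[str], bool],
) -> tuple[float, float]:
    """Replay (document_text, is_fraud) pairs and score the agent's flags.

    precision = TP / (TP + FP): of the documents flagged, how many were fraudulent
    recall    = TP / (TP + FN): of the fraudulent documents, how many were flagged
    """
    tp = fp = fn = 0
    for text, is_fraud in labeled:
        flagged = is_flagged(text)
        if flagged and is_fraud:
            tp += 1
        elif flagged and not is_fraud:
            fp += 1
        elif not flagged and is_fraud:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, made-up examples (not real evaluation data) with a trivial keyword flagger.
toy_set = [
    ("URGENT: wire today", True),
    ("Office supplies, net 30", False),
    ("Urgent renewal reminder", False),
    ("Pay by wire, don't call", True),
]
print(flag_precision_recall(toy_set, lambda text: "urgent" in text.lower()))
# prints: (0.5, 0.5)
```

Tracking both numbers matters here: a flagger that flags everything has perfect recall but swamps reviewers, which undercuts the human-in-the-loop step.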
Built by Faraz Mubeen Haider — AI Engineer focused on agentic systems, document intelligence, and human-in-the-loop AI design.