
Basis – Agentic RAG for Document Intelligence

Basis Logo

Cost segregation shouldn't take weeks. Basis gets engineers 80% of the way there – fast, guided, and defensible.

Powered by:

  • Agentic RAG – LLM-driven retrieval where agents decide when, what, and how to search
  • Multi-Agent Self-Correction – Extraction β†’ Verification β†’ Correction loops with audit trails
  • Detection-First Vision – Grounding DINO + SAM2 + GPT-4o for hallucination-free image analysis

What is Basis?

Basis is an AI-assisted platform for residential-focused cost segregation firms that accelerates the most time-consuming part of the study:

analyzing hundreds of photos, sketches, and appraisal documents to produce an IRS-ready report.

Basis is not a "one-click study generator." It's a human-in-the-loop, agentic workflow powered by three core systems:

  1. Vision Layer – Detection-first image processing that reduces VLM hallucinations through grounded detection
  2. Evidence Layer – PDF ingestion pipeline with hybrid BM25 + vector retrieval for IRS-grounded reasoning
  3. Agentic Workflow – LangGraph-orchestrated multi-agent system with stage-gated engineer review checkpoints

This architecture walks the engineer through every decision before anything becomes client-facing.


Why Cost Seg?

$1M. That's what you might spend to buy a house. That upfront spend can create tax savings as the property depreciates over 27.5 years.

But 27.5 years is a long time to wait.

Cost segregation helps owners accelerate depreciation and unlock meaningful savings earlier. In the U.S., there are 5,000+ businesses conducting thousands of studies per year – which makes the workflow opportunity massive.


The Problem

A cost segregation study typically follows three steps:

  1. Document the property
  2. Analyze the documentation
  3. Generate the report

The bottleneck is step 2.

Our interviews revealed that this analysis phase:

  • Requires engineers to comb through hundreds of photos, drawings, and appraisals
  • Can take 2–3 weeks to complete
  • Can cost >$1,200 in labor per study
  • Can leave >$1,000 in savings on the table due to missed or inconsistently documented components

The Solution

Enter Basis.

Engineers upload the property artifacts they already use today. Basis:

  • Organizes documents and imagery
  • Classifies rooms, materials, and objects
  • Guides engineers through review checkpoints
  • Surfaces the exact references needed for takeoffs and tax classification (so engineers aren't hunting across hundreds of pages)

Result: faster studies, fewer errors, lower cost to serve.


Demo Video

A short walkthrough showing how Basis guides engineers through appraisal constraints, room/object classification, takeoffs, and IRS-grounded asset decisions.

Basis Demo Video


Current Project Overview

  • Objective: Reduce cost seg analysis time by automating repetitive classification and retrieval tasks while preserving engineer-led accuracy and auditability.

  • Core Features:

    • Study creation + structured upload
    • Appraisal-to-constraints extraction
    • Room classification with scene + object context
    • Object/component detection with metadata enrichment
    • Engineer review checkpoints at every stage
    • Engineering takeoffs assistance
    • Asset classification with IRS-grounded RAG
    • Cost classification hooks for integrated cost databases
    • Export-ready outputs for existing firm templates

The Problem: Document Intelligence at Scale

Many industries require AI-assisted workflows for querying large document sets – regulatory publications, technical standards, safety baselines – that share a common challenge:

Standardized headers, messy context.

These documents contain critical structured data (IDs, codes, classifications, tables) embedded in unstructured narrative text. Traditional approaches fail because:

  • Pure keyword search misses semantic relationships
  • Pure vector search hallucinates on exact codes and IDs
  • Context windows can't hold hundreds of pages
  • LLM-only approaches lack auditability and traceability

The Solution: Agentic RAG + Multi-Agent Workflow

Basis implements a three-layer architecture designed for document intelligence problems:

┌─────────────────────────────────────────────────────────────────┐
│                     AGENTIC LAYER (LangGraph)                   │
│  • Multi-agent orchestration with stage-gated checkpoints       │
│  • Tool routing based on query intent                           │
│  • "No evidence, no claim" enforcement                          │
│  • Human-in-the-loop verification at every stage                │
└────────────────────────────────┬────────────────────────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────┐
│                     EVIDENCE LAYER (Hybrid RAG)                 │
│  • BM25 for exact-term matches (codes, IDs, classifications)    │
│  • FAISS vector search for semantic similarity                  │
│  • Score fusion + deduplication                                 │
│  • Tables stored intact (never chunked)                         │
└────────────────────────────────┬────────────────────────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────┐
│                     OFFLINE PIPELINE                            │
│  • Layout-aware PDF parsing (pdfplumber)                        │
│  • Table extraction → structured JSON                           │
│  • Semantic chunking with 80-token overlap                      │
│  • Dual indexing (BM25 + FAISS)                                 │
└─────────────────────────────────────────────────────────────────┘

This architecture is domain-agnostic. The current implementation targets cost segregation (IRS tax documents), but the same pipeline handles any document corpus with structured codes and unstructured context.


Architecture Deep Dive

1) Offline Pipeline – PDF Ingestion

Location: backEnd/evidence_layer/

Transforms raw PDFs into retrieval-ready indexes through a 5-stage pipeline:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         PDF INGESTION PIPELINE                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌────────┐ │
│  │  STAGE 1 │───►│  STAGE 2 │───►│  STAGE 3 │───►│  STAGE 4 │───►│STAGE 5 │ │
│  │  Parse   │    │  Extract │    │  Chunk   │    │  Build   │    │ Build  │ │
│  │  Layout  │    │  Tables  │    │  Text    │    │  BM25    │    │ FAISS  │ │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘    └────────┘ │
│       │               │               │               │              │      │
│       ▼               ▼               ▼               ▼              ▼      │
│   layout/        structured/      retrieval/      indexes/       indexes/   │
│   elements.json  tables.json      chunks.json     bm25.pkl       faiss.idx  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Stage 1: Layout-Aware PDF Parsing

File: parse_pdf.py

Extracts text with positional metadata using pdfplumber + PyMuPDF.

Raw PDF
   │
   ▼
┌─────────────────────────────────────┐
│  For each page:                     │
│  • Extract text with bbox coords    │
│  • Detect font size + boldness      │
│  • Classify element type            │
│  • Preserve reading order           │
└─────────────────────────────────────┘
   │
   ▼
Layout Elements (with position + type)

Element Classification:

Type        Detection Method          Example
title       Large font + bold         "Chapter 4: MACRS"
heading     Medium font + bold        "Section 1245 Property"
paragraph   Regular text blocks       Narrative content
list_item   Numbered/bulleted         "1. Tangible property..."
table       Grid structure detected   Routed to Stage 2

Output: layout/elements.json – every text block with page, bbox, font, type.
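The font heuristics above can be sketched as a small classifier. This is an illustrative approximation only – the function name and thresholds are assumptions, not the actual parse_pdf.py code:

```python
def classify_element(text: str, font_size: float, bold: bool,
                     body_size: float = 10.0) -> str:
    """Assign a layout element type from simple font heuristics."""
    stripped = text.strip()
    if bold and font_size >= body_size * 1.5:
        return "title"        # large + bold
    if bold and font_size > body_size:
        return "heading"      # medium + bold
    # Numbered ("1.") or bulleted lines become list items
    if stripped[:2].rstrip(".").isdigit() or stripped.startswith(("-", "•")):
        return "list_item"
    return "paragraph"        # default: regular text block
```

In practice the real classifier also sees bbox position and grid structure (which routes tables to Stage 2), but the shape of the decision is the same.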


Stage 2: Table Extraction (Tables Stay Intact)

File: extract_tables.py

Critical design decision: Tables are NEVER chunked. They're stored as structured JSON and fetched whole.

Layout Elements
   │
   ├── Table detected? ──YES──► Extract as structured JSON
   │                            Store in structured/tables.json
   │                            Create surrogate chunk for search
   │
   └── Not a table ──────────► Pass to Stage 3

Why tables stay intact:

  • Chunking tables destroys row/column relationships
  • LLMs hallucinate when given partial table data
  • Agents fetch full table by table_id when surrogate matches

Table Storage Format:

{
  "table_id": "DOC_2024_table_3",
  "page": 15,
  "caption": "Table B-1. Asset Classes",
  "headers": ["Asset Class", "Description", "Recovery Period"],
  "rows": [
    ["57.0", "Distributive Trades", "5 years"],
    ["00.11", "Office Furniture", "7 years"]
  ],
  "markdown": "| Asset Class | Description | Recovery Period |\n|---|---|---|\n| 57.0 | ..."
}

Surrogate Chunk (for search):

{
  "chunk_id": "DOC_2024_table_3_surrogate",
  "type": "table_surrogate",
  "text": "Table B-1. Asset Classes: 57.0 Distributive Trades 5 years, 00.11 Office Furniture 7 years...",
  "table_id": "DOC_2024_table_3"
}

When search hits the surrogate → agent calls get_table(table_id) → returns full structured table.
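That surrogate-to-table hop can be sketched as follows. The data shapes mirror the JSON examples above, but the helper names (get_table, expand_hits) are illustrative, not the actual storage API:

```python
# In-memory stand-in for structured/tables.json
TABLES = {
    "DOC_2024_table_3": {
        "table_id": "DOC_2024_table_3",
        "caption": "Table B-1. Asset Classes",
        "headers": ["Asset Class", "Description", "Recovery Period"],
        "rows": [["57.0", "Distributive Trades", "5 years"]],
    }
}

def get_table(table_id: str) -> dict:
    """Fetch the intact structured table by id."""
    return TABLES[table_id]

def expand_hits(hits: list[dict]) -> list[dict]:
    """Replace any table surrogate hit with the full table it points to."""
    out = []
    for hit in hits:
        if hit.get("type") == "table_surrogate":
            out.append({"type": "table", **get_table(hit["table_id"])})
        else:
            out.append(hit)
    return out
```

The agent therefore never reasons over partial rows: search matches the flattened surrogate text, but generation always sees the whole table.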

URAR Appraisal Mapping:

For appraisal documents, extracted tables are additionally mapped to URAR (Uniform Residential Appraisal Report) sections:

┌──────────────────────────────────────────────────────────────────┐
│              APPRAISAL TABLE → SECTION MAPPING                   │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Extracted Tables (.tables.jsonl)                                │
│         │                                                        │
│         ▼                                                        │
│  ┌─────────────────────────────────────┐                         │
│  │  map_appraisal_sections.py          │                         │
│  │                                     │                         │
│  │  1. Identify section by keywords    │                         │
│  │     + page position (URAR layout)   │                         │
│  │                                     │                         │
│  │  2. Map table rows → section fields │                         │
│  │     (subject, neighborhood, etc.)   │                         │
│  │                                     │                         │
│  │  3. Fallback to regex extraction    │                         │
│  │     for missing values              │                         │
│  └─────────────────────────────────────┘                         │
│         │                                                        │
│         ▼                                                        │
│  Frontend-ready sections:                                        │
│  • subject (address, borrower, lender)                           │
│  • listing_and_contract (price, DOM, sale type)                  │
│  • neighborhood (location, growth, values)                       │
│  • site (dimensions, zoning, utilities)                          │
│  • improvements (foundation, rooms, year built)                  │
│  • sales_comparison (comps grid)                                 │
│  • cost_approach (site value, depreciation)                      │
│  • reconciliation (final value opinion)                          │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

This mapping uses the same high-quality table extraction from ingestion – no additional parsing or GPT calls required.

Production Enhancement: Tiered Extraction

For production deployments, appraisal extraction uses a multi-tier approach with confidence scoring:

┌─────────────────────────────────────────────────────────────────┐
│                    TIERED EXTRACTION                            │
├─────────────────────────────────────────────────────────────────┤
│  Tier 1: MISMO XML Parser (confidence: 1.0)                     │
│     ↓ (if unavailable)                                          │
│  Tier 2: Azure Document Intelligence (confidence: 0.7–0.95)     │
│     ↓ (for fields with confidence < 0.85)                       │
│  Tier 3: GPT-4o Vision Fallback (confidence: 0.6–0.9)           │
│     ↓ (for any remaining empty fields)                          │
│  Tier 4: Regex Fallback (confidence: 0.5–0.8)                   │
│     ↓                                                           │
│  Tier 5: Validation & Confidence Aggregation                    │
└─────────────────────────────────────────────────────────────────┘

Benefits:

  • Field-level confidence scoring β€” Each field tracks confidence + source
  • Critical field validation β€” property_address, year_built, gross_living_area, appraised_value, contract_price, effective_date require >= 0.90 confidence
  • Automatic review flagging β€” needs_review: true when confidence thresholds not met
  • Graceful degradation β€” Falls back through tiers if services unavailable

Stage 3: Semantic Chunking with Overlap

File: chunk_text.py

Splits narrative text into retrieval units with semantic overlap.

Non-Table Elements
   │
   ▼
┌─────────────────────────────────────┐
│  Chunking Parameters:               │
│  • Target: 400 tokens               │
│  • Overlap: 80 tokens               │
│  • Hard max: 700 tokens             │
│  • Tokenizer: cl100k_base (GPT-4)   │
└─────────────────────────────────────┘
   │
   ▼
Chunks with provenance metadata

Why 80-token overlap?

Without overlap:
┌──────────────────┐ ┌──────────────────┐
│ Chunk 1          │ │ Chunk 2          │
│ "...property     │ │ includes assets  │
│ under Section"   │ │ classified as... │
└──────────────────┘ └──────────────────┘
        ▲                    ▲
        └── Boundary loss ───┘
            "Section 1245 property includes..."
            is split and context is lost

With 80-token overlap:
┌──────────────────────────┐
│ Chunk 1                  │
│ "...property under       │
│ Section 1245 includes    │◄── Overlap
│ assets classified..."    │
└──────────────────────────┘
              ┌──────────────────────────┐
              │ Chunk 2                  │
     Overlap ►│ "Section 1245 includes   │
              │ assets classified as     │
              │ tangible personal..."    │
              └──────────────────────────┘

Both chunks contain the full context.

Chunk Output:

{
  "chunk_id": "DOC_2024_chunk_15",
  "type": "text",
  "text": "Section 1245 property includes tangible personal property...",
  "page_span": [12, 12],
  "element_ids": ["DOC_2024_p12_e3", "DOC_2024_p12_e4"],
  "section_path": ["How To Depreciate Property", "Section 1245"],
  "token_count": 387
}
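The sliding-window logic behind these parameters can be shown in a minimal form. This sketch treats pre-tokenized input as given (the real pipeline tokenizes with cl100k_base); parameter defaults mirror the stated 400/80 settings:

```python
def chunk_tokens(tokens: list[str], target: int = 400,
                 overlap: int = 80) -> list[list[str]]:
    """Split tokens into ~target-sized windows; each window shares
    `overlap` tokens with the previous one so boundary context survives."""
    step = target - overlap  # advance 320 tokens per window by default
    chunks = []
    for start in range(0, max(len(tokens) - overlap, 1), step):
        chunks.append(tokens[start:start + target])
    return chunks
```

Because consecutive windows share their last/first 80 tokens, a sentence like "Section 1245 property includes..." that straddles a boundary appears whole in at least one chunk.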

Stage 4: BM25 Index (Lexical Search)

File: build_bm25.py

Builds lexical index with custom tokenization for exact code matching.

Chunks
   │
   ▼
┌─────────────────────────────────────┐
│  Custom Tokenizer (not whitespace!) │
│                                     │
│  "§1245 property"                   │
│       ▼                             │
│  ["§1245", "1245", "property"]      │
└─────────────────────────────────────┘
   │
   ▼
BM25Okapi Index (bm25.pkl)

Why custom tokenization matters:

Standard tokenizers break regulatory codes:

Standard Tokenizer        Custom Tokenizer
["§", "1245"] ❌           ["§1245", "1245"] ✓
["168", "(", "e", ")"] ❌  ["168(e)(3)", "168"] ✓
["57", ".", "0"] ❌        ["57.0", "57"] ✓

Tokenizer patterns (tokenizers.py):

Pattern              Example          Tokens Generated
Section symbols      §1245            ["§1245", "1245"]
Parenthetical refs   168(e)(3)(B)     ["168(e)(3)(b)", "168"]
Decimal codes        57.0, 00.11      ["57.0", "57"]
Mixed references     Section 179(d)   ["section", "179(d)", "179"]

>>> irs_tokenize("§1245 property depreciation")
['§1245', '1245', 'property', 'depreciation']

>>> irs_tokenize("Asset class 57.0 under Section 168(e)(3)")
['asset', 'class', '57.0', '57', 'section', '168(e)(3)', '168']

This ensures queries for "1245" match documents containing "§1245" or "Section 1245". The same pattern applies to any domain with structured identifiers (hazard IDs, ASIL levels, requirement codes).
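An approximate re-implementation of irs_tokenize, reconstructed from the documented examples above (the actual tokenizers.py may differ, e.g. in its exact patterns and stopword list):

```python
import re

# Assumed stopword list, inferred from the example outputs
STOPWORDS = {"under", "the", "a", "an", "of", "and"}

def irs_tokenize(text: str) -> list[str]:
    """Lowercase tokens; composite codes emit both the full code and the
    bare leading number so '1245' matches '§1245'."""
    tokens = []
    # Matches §-prefixed numbers, decimal codes, parenthetical refs, or words
    for m in re.finditer(r"§?\d+(?:\.\d+|(?:\(\w+\))+)?|[A-Za-z]+", text):
        tok = m.group(0).lower()
        if tok in STOPWORDS:
            continue
        tokens.append(tok)
        lead = re.match(r"§?(\d+)", tok)
        if lead and lead.group(1) != tok:
            tokens.append(lead.group(1))  # e.g. "§1245" also yields "1245"
    return tokens
```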


Stage 5: FAISS Index (Semantic Search)

File: build_faiss.py

Builds vector index for semantic similarity.

Chunks
   │
   ▼
┌─────────────────────────────────────┐
│  Sentence Transformer               │
│  Model: all-MiniLM-L6-v2            │
│  Dimensions: 384                    │
└─────────────────────────────────────┘
   │
   ▼
┌─────────────────────────────────────┐
│  FAISS Index                        │
│  • L2 distance metric               │
│  • Metadata mapping (chunk_id)      │
└─────────────────────────────────────┘
   │
   ▼
faiss.idx + metadata.json

When to use semantic search:

  • Conceptual queries: "What qualifies for accelerated depreciation?"
  • Paraphrased questions: "equipment that wears out quickly"
  • Related concepts: "tangible personal property" β†’ finds "Section 1245"

Pipeline Output Summary

After ingestion, each document produces:

data/{corpus}/{doc_id}/
├── layout/
│   └── elements.json      # Raw parsed elements with position
├── structured/
│   └── tables.json        # Complete tables (never chunked)
├── retrieval/
│   └── chunks.json        # Text chunks with overlap + provenance
└── indexes/
    ├── bm25/
    │   └── index.pkl      # Lexical search index
    └── vector/
        ├── faiss.idx      # Semantic search index
        └── metadata.json  # Chunk ID mapping

2) Evidence Layer (Hybrid RAG)

Location: backEnd/evidence_layer/src/retrieval.py

Combines lexical and semantic search with score normalization.

Retrieval Flow:

Query
  │
  ├──► BM25 Search ───► Normalized Scores ──┐
  │    (exact codes)                        │
  │                                         ├──► Score Fusion ──► Deduplicate ──► Results
  │                                         │
  └──► Vector Search ──► Normalized Scores ─┘
       (semantic)

API:

# BM25 for exact codes/IDs
results = bm25_search("IRS_PUB946_2024", "1245", top_k=5)

# Vector for semantic queries
results = vector_search("IRS_PUB946_2024", "equipment depreciation", top_k=5)

# Hybrid (recommended) - configurable BM25 weight
results = hybrid_search("IRS_PUB946_2024", "tangible personal property", top_k=5, bm25_weight=0.5)

Key Features:

  • Automatic score normalization before fusion
  • Deduplication of overlapping results
  • Table expansion: when surrogate chunks match, full table returned
  • Supports both "reference" corpus (shared docs) and "study" corpus (per-case docs)

3) Agentic RAG – LLM-Driven Retrieval

Location: backEnd/agentic/

Agentic RAG solves a critical problem: context window saturation.

When documents are large or interrelated, naive RAG retrieves too much context, saturating the LLM's context window and degrading response quality. The solution is agent-based selective retrieval – the agent plans what evidence is needed, retrieves selectively, and verifies sufficiency before generating.

┌─────────────────────────────────────────────────────────────────────────────┐
│                      AGENTIC RAG vs NAIVE RAG                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  NAIVE RAG:                          AGENTIC RAG:                           │
│  ┌─────────┐                         ┌─────────┐                            │
│  │  Query  │                         │  Query  │                            │
│  └────┬────┘                         └────┬────┘                            │
│       │                                   │                                 │
│       ▼                                   ▼                                 │
│  ┌─────────┐                         ┌─────────────┐                        │
│  │ Retrieve│                         │ Agent Plans │◄── "What evidence      │
│  │ top-k   │                         │ what to get │    do I need?"         │
│  └────┬────┘                         └──────┬──────┘                        │
│       │                                     │                               │
│       │ (may retrieve                       ▼                               │
│       │  too much or                 ┌─────────────┐                        │
│       │  wrong docs)                 │ Tool Router │◄── BM25 vs Vector      │
│       │                              │             │    vs Structured       │
│       ▼                              └──────┬──────┘                        │
│  ┌─────────┐                                │                               │
│  │ Generate│                                ▼                               │
│  │ (hope   │                         ┌─────────────┐                        │
│  │ it fits)│                         │ Selective   │◄── Only what's needed  │
│  └─────────┘                         │ Retrieval   │                        │
│                                      └──────┬──────┘                        │
│       ❌ Context                            │                               │
│          saturation                         ▼                               │
│                                      ┌─────────────┐                        │
│                                      │ Verify      │◄── "Is this enough?"   │
│                                      │ Sufficiency │    If not, retrieve    │
│                                      └──────┬──────┘    more                │
│                                             │                               │
│                                             ▼                               │
│                                      ┌─────────────┐                        │
│                                      │ Generate    │                        │
│                                      │ with        │                        │
│                                      │ citations   │                        │
│                                      └─────────────┘                        │
│                                                                             │
│                                        ✓ Selective retrieval                │
│                                        ✓ Fits context window                │
│                                        ✓ Grounded in evidence               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Why Agentic? (The Context Saturation Problem)

Mosi's observation: Safety documents have standardized headers but messy context sections. When you retrieve naively, you pull in entire documents or too many chunks, saturating the context window.

The agentic solution:

  1. Agent plans first – before retrieving, the agent analyzes the query and decides what evidence is needed
  2. Tool routing – the agent chooses the right retrieval method (BM25 for exact IDs, vector for concepts, structured for tables)
  3. Selective retrieval – only pulls what's necessary, not top-k everything
  4. Verification loop – checks whether the evidence is sufficient; if not, retrieves more targeted chunks
  5. Grounded generation – only claims what the evidence supports
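The five steps above can be condensed into a schematic loop. Everything here is a stand-in callable for illustration; the real system wires these steps as LangGraph nodes backed by LLM calls:

```python
def answer(query, plan, route, retrieve, sufficient, generate, max_rounds=3):
    """Schematic agentic RAG loop: plan -> route -> retrieve -> verify,
    repeating until the evidence is sufficient, then generate."""
    evidence = []
    need = plan(query)                    # 1. decide what evidence is needed
    for _ in range(max_rounds):
        tool = route(need)                # 2. pick BM25 / vector / structured
        evidence += retrieve(tool, need)  # 3. pull only what's necessary
        if sufficient(query, evidence):   # 4. verification loop
            break
        need = plan(query)                # refine the request and retry
    return generate(query, evidence)      # 5. grounded generation w/ citations
```

The max_rounds cap is one plausible way to keep the verification loop from retrieving forever; "no evidence, no claim" then falls to the generate step, which only asserts what the accumulated evidence supports.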

Workflow State Machine (Simplified 3-Pause Architecture)

The workflow has been optimized to have exactly 3 engineer checkpoints matching the frontend UI:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SIMPLIFIED WORKFLOW (3 PAUSE POINTS)                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  load_study                                                                 │
│       │                                                                     │
│       ▼                                                                     │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                      analyze_rooms_node                               │  │
│  │  ┌─────────────────────────────────────────────────────────────────┐  │  │
│  │  │ 1. Vision Analysis (PARALLEL - 10 concurrent)                   │  │  │
│  │  │    All images analyzed simultaneously with GPT-4o Vision        │  │  │
│  │  └─────────────────────────────────────────────────────────────────┘  │  │
│  │  ┌─────────────────────────────────────────────────────────────────┐  │  │
│  │  │ 2. Room Enrichment (PARALLEL - 10 concurrent)                   │  │  │
│  │  │    All rooms enriched with IRS context simultaneously           │  │  │
│  │  └─────────────────────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────┬───────────────────────────────────┘  │
│                                      │                                      │
│                                      ▼                                      │
│  ╔═══════════════════════════════════════════════════════════════════════╗  │
│  ║  ⏸️ PAUSE #1: resource_extraction                                     ║  │
│  ║  Engineer reviews: Appraisal data + detected rooms                    ║  │
│  ╚═══════════════════════════════════════════════════════════════════════╝  │
│                                      │ (engineer approves)                  │
│                                      ▼                                      │
│  ╔═══════════════════════════════════════════════════════════════════════╗  │
│  ║  ⏸️ PAUSE #2: reviewing_rooms                                         ║  │
│  ║  Engineer reviews: Room classifications + IRS context                 ║  │
│  ╚═══════════════════════════════════════════════════════════════════════╝  │
│                                      │ (engineer approves)                  │
│                                      ▼                                      │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                      process_assets_node                              │  │
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚    β”‚
β”‚  β”‚  β”‚ 1. Object Enrichment (PARALLEL - 20 concurrent)             β”‚    β”‚    β”‚
β”‚  β”‚  β”‚    All objects enriched with IRS context simultaneously     β”‚    β”‚    β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚    β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚    β”‚
β”‚  β”‚  β”‚ 2. Takeoffs + Classification (CROSS-PHASE PARALLEL!)        β”‚    β”‚    β”‚
β”‚  β”‚  β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”‚    β”‚    β”‚
β”‚  β”‚  β”‚    β”‚ Takeoff Calc (Γ—10) β”‚  β”‚ IRS Classify (Γ—20) β”‚           β”‚    β”‚    β”‚
β”‚  β”‚  β”‚    β”‚ RSMeans lookup     β”‚  β”‚ Asset classes      β”‚           β”‚    β”‚    β”‚
β”‚  β”‚  β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β”‚    β”‚    β”‚
β”‚  β”‚  β”‚    Both run simultaneously via asyncio.gather()             β”‚    β”‚    β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚    β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚    β”‚
β”‚  β”‚  β”‚ 3. Cost Estimation (PARALLEL - 10 concurrent)               β”‚    β”‚    β”‚
β”‚  β”‚  β”‚    All costs estimated simultaneously with RSMeans          β”‚    β”‚    β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚                                   β”‚                                         β”‚
β”‚                                   β–Ό                                         β”‚
β”‚  ╔═════════════════════════════════════════════════════════════════════╗    β”‚
β”‚  β•‘  ⏸️ PAUSE #3: engineering_takeoff                                   β•‘    β”‚
β”‚  β•‘  Engineer reviews: Objects, takeoffs, classifications, costs        β•‘    β”‚
β”‚  β•‘  (Tabbed UI showing all asset data with citations)                  β•‘    β”‚
β”‚  β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•    β”‚
β”‚                                   β”‚ (engineer approves)                     β”‚
β”‚                                   β–Ό                                         β”‚
β”‚                          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                β”‚
β”‚                          β”‚   completed     β”‚                                β”‚
β”‚                          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Frontend WorkflowStatus values:

uploading_documents β†’ analyzing_rooms β†’ resource_extraction β†’ reviewing_rooms β†’ engineering_takeoff β†’ completed

Key Design: Only 3 engineer checkpoints (not 5-6), matching the frontend UI. The process_assets_node combines objects, takeoffs, classification, and costs into a single processing phase with no pauses in between; engineers review all asset data together on one page.
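The stage gating above can be sketched as a plain state machine (an illustration only, not the LangGraph implementation; stage names follow the WorkflowStatus values):

```python
# Ordered workflow stages; engineer approval gates sit between some of them.
STAGES = [
    "uploading_documents",
    "analyzing_rooms",
    "resource_extraction",   # PAUSE #1: appraisal data + detected rooms
    "reviewing_rooms",       # PAUSE #2: room classifications + IRS context
    "engineering_takeoff",   # PAUSE #3: takeoffs, classifications, costs
    "completed",
]

# Stages that pause until an engineer explicitly approves.
CHECKPOINTS = {"resource_extraction", "reviewing_rooms", "engineering_takeoff"}

def advance(stage: str, engineer_approved: bool = False) -> str:
    """Return the next stage, holding at checkpoints until approved."""
    if stage in CHECKPOINTS and not engineer_approved:
        return stage  # stay paused until sign-off
    i = STAGES.index(stage)
    return STAGES[min(i + 1, len(STAGES) - 1)]
```

For example, `advance("resource_extraction")` holds at the pause, while `advance("resource_extraction", engineer_approved=True)` moves on to `reviewing_rooms`.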


Agent Architecture

Each agent follows a plan β†’ retrieve β†’ verify β†’ generate pattern:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                          AGENT EXECUTION FLOW                               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  Input: Component to classify (e.g., "hardwood flooring in living room")    β”‚
β”‚                                                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ STEP 1: PLAN                                                         β”‚   β”‚
β”‚  β”‚                                                                      β”‚   β”‚
β”‚  β”‚ Agent thinks: "I need to find:                                       β”‚   β”‚
β”‚  β”‚   1. IRS classification for flooring                                 β”‚   β”‚
β”‚  β”‚   2. Whether hardwood is personal or real property                   β”‚   β”‚
β”‚  β”‚   3. Applicable recovery period"                                     β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                              β”‚                                              β”‚
β”‚                              β–Ό                                              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ STEP 2: TOOL ROUTING                                                 β”‚   β”‚
β”‚  β”‚                                                                      β”‚   β”‚
β”‚  β”‚ Agent decides:                                                       β”‚   β”‚
β”‚  β”‚   β€’ "flooring" β†’ vector_search (semantic concept)                    β”‚   β”‚
β”‚  β”‚   β€’ "1245 vs 1250" β†’ bm25_search (exact IRS sections)                β”‚   β”‚
β”‚  β”‚   β€’ "recovery period table" β†’ get_table (structured data)            β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                              β”‚                                              β”‚
β”‚                              β–Ό                                              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ STEP 3: SELECTIVE RETRIEVAL                                          β”‚   β”‚
β”‚  β”‚                                                                      β”‚   β”‚
β”‚  β”‚ Agent calls tools:                                                   β”‚   β”‚
β”‚  β”‚   β†’ hybrid_search("flooring depreciation residential")               β”‚   β”‚
β”‚  β”‚   β†’ bm25_search("1245")                                              β”‚   β”‚
β”‚  β”‚   β†’ get_table("MACRS_recovery_periods")                              β”‚   β”‚
β”‚  β”‚                                                                      β”‚   β”‚
β”‚  β”‚ Returns: 3 relevant chunks + 1 table (not 50 chunks)                 β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                              β”‚                                              β”‚
β”‚                              β–Ό                                              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ STEP 4: VERIFY SUFFICIENCY                                           β”‚   β”‚
β”‚  β”‚                                                                      β”‚   β”‚
β”‚  β”‚ Agent checks: "Do I have enough evidence to classify?"               β”‚   β”‚
β”‚  β”‚   β€’ If YES β†’ proceed to generation                                   β”‚   β”‚
β”‚  β”‚   β€’ If NO β†’ retrieve more specific chunks                            β”‚   β”‚
β”‚  β”‚   β€’ If AMBIGUOUS β†’ flag needs_review=true                            β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                              β”‚                                              β”‚
β”‚                              β–Ό                                              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ STEP 5: GROUNDED GENERATION                                          β”‚   β”‚
β”‚  β”‚                                                                      β”‚   β”‚
β”‚  β”‚ Agent generates classification WITH citations:                       β”‚   β”‚
β”‚  β”‚   "Hardwood flooring is Section 1245 property (5-year recovery)      β”‚   β”‚
β”‚  β”‚    per IRS Pub 946, page 42, because..."                             β”‚   β”‚
β”‚  β”‚                                                                      β”‚   β”‚
β”‚  β”‚ "No evidence, no claim" β€” won't classify without source              β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
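The five steps above can be condensed into a single loop (a minimal illustration; `search` stands in for the retrieval tools, and the returned "1245" classification is hard-coded for the example):

```python
def classify_component(component: str, search, max_rounds: int = 3) -> dict:
    """Plan β†’ retrieve β†’ verify β†’ generate, as a plain-Python loop.

    `search` stands in for the retrieval tools (hybrid_search /
    bm25_search / get_table).
    """
    # STEP 1-2, PLAN + ROUTE: decide what evidence the claim needs.
    queries = [f"{component} depreciation", "section 1245 vs 1250"]
    evidence = []
    for _ in range(max_rounds):
        # STEP 3, SELECTIVE RETRIEVAL: fetch only what the plan asks for.
        for q in queries:
            evidence.extend(search(q))
        # STEP 4, VERIFY SUFFICIENCY: enough evidence to classify?
        if len(evidence) >= 2:
            break
        queries = [f"{component} recovery period"]  # refine and retry
    if not evidence:
        # "No evidence, no claim": flag for engineer review instead.
        return {"needs_review": True, "reason": "insufficient_evidence"}
    # STEP 5, GROUNDED GENERATION: every claim carries its chunk ids.
    return {
        "section": "1245",
        "citations": [e["chunk_id"] for e in evidence],
        "needs_review": False,
    }
```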

Specialized Agents

| Agent | Purpose | Tools Used | Evidence Source |
| --- | --- | --- | --- |
| Room Agent | Enriches vision outputs with space context | hybrid_search | Domain guidelines |
| Asset Agent | MACRS classification with IRS citations | bm25_search, vector_search, get_table | IRS Pub 946, ATG |
| Takeoff Agent | Measurement extraction with confidence | hybrid_search, get_chunk | Property appraisals |
| Cost Agent | RSMeans cost code mapping | hybrid_search, get_table | RSMeans databases |
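The tool routing in Step 2 can be sketched as a small dispatcher (the heuristics and function name are illustrative, not the production routing logic):

```python
def route_query(query: str) -> str:
    """Pick a retrieval tool for a query (illustrative heuristics only)."""
    q = query.lower()
    if any(tok in q for tok in ("1245", "1250", "section")):
        return "bm25_search"    # exact statutory terms -> keyword search
    if "table" in q or "recovery period" in q:
        return "get_table"      # structured data -> table lookup
    return "hybrid_search"      # semantic concepts -> hybrid retrieval
```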

"No Evidence, No Claim" Enforcement

The Asset Agent system prompt explicitly requires evidence before classification:

CRITICAL INSTRUCTION:
You MUST search for evidence before making any classification.
- Call hybrid_search() or bm25_search() BEFORE outputting a classification
- Every classification MUST include citation_refs with chunk_ids
- If you cannot find supporting documentation, output:
    needs_review: true
    reason: "insufficient_evidence"
- NEVER guess or rely on training dataβ€”only cite retrieved documents

Agent Output Schema

Every agent produces structured output with provenance:

{
  "asset_classification": {
    "bucket": "5-year",
    "life_years": 5,
    "section": "1245",
    "asset_class": "57.0",
    "macrs_system": "GDS",
    "irs_note": "Carpeting in residential rental property is Section 1245 property..."
  },
  "citations": [
    {"chunk_id": "IRS_PUB946_2024_chunk_42", "page": 15, "text": "Section 1245 property includes..."},
    {"chunk_id": "IRS_ATG_2024_chunk_88", "page": 34, "text": "Floor coverings are typically..."}
  ],
  "confidence": 0.92,
  "needs_review": false,
  "reasoning": "Found explicit IRS guidance classifying floor coverings as 1245 property..."
}
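The schema can be mirrored as typed structures; a stdlib-only sketch (the real service may use Pydantic models instead), with the provenance rule enforced at construction time:

```python
from dataclasses import dataclass

@dataclass
class Citation:
    chunk_id: str
    page: int
    text: str

@dataclass
class AgentOutput:
    bucket: str
    life_years: int
    section: str
    citations: list        # list of Citation
    confidence: float
    needs_review: bool = False

    def __post_init__(self):
        # Provenance rule: a confident classification must cite evidence.
        if not self.needs_review and not self.citations:
            raise ValueError("no evidence, no claim")
```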

Checkpointing & Observability

Persistent State:

  • Production: FirestoreCheckpointer β€” workflow state survives server restarts
  • Development: MemorySaver β€” in-memory for fast iteration
  • Thread-based resumption for long-running workflows

LangSmith Integration:

Every agent execution is traced in LangSmith:

  • Tool calls with inputs/outputs
  • LLM prompts and completions
  • Latency and token usage
  • Error tracking
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    LANGSMITH TRACE                              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Asset Agent Run                                                 β”‚
β”‚ β”œβ”€β”€ hybrid_search("flooring depreciation") β†’ 3 chunks          β”‚
β”‚ β”œβ”€β”€ bm25_search("1245") β†’ 2 chunks                             β”‚
β”‚ β”œβ”€β”€ get_table("MACRS_periods") β†’ 1 table                       β”‚
β”‚ β”œβ”€β”€ LLM: classify with evidence                                β”‚
β”‚ └── Output: { bucket: "5-year", citations: [...] }             β”‚
β”‚                                                                 β”‚
β”‚ Total tokens: 2,847  |  Latency: 3.2s  |  Status: Success      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

LangSmith Dashboard:

LangSmith Trace View


4) Multi-Agent Appraisal Extraction

Location: backEnd/agentic/agents/appraisal/

The appraisal extraction system uses a 3-agent LangGraph StateGraph for intelligent extraction with self-correction:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  APPRAISAL EXTRACTION LANGGRAPH                                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                   β”‚
β”‚  β”‚    EXTRACTOR AGENT       β”‚                                   β”‚
β”‚  β”‚    "Extract intelligently β”‚                                   β”‚
β”‚  β”‚     using available tools"β”‚                                   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                   β”‚
β”‚              β”‚                                                  β”‚
β”‚              β–Ό                                                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                   β”‚
β”‚  β”‚    VERIFIER AGENT        β”‚                                   β”‚
β”‚  β”‚    "Be skeptical. Find   β”‚                                   β”‚
β”‚  β”‚     errors. Question     β”‚                                   β”‚
β”‚  β”‚     everything."         β”‚                                   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                   β”‚
β”‚              β”‚                                                  β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                       β”‚
β”‚   β”‚          β”‚          β”‚                                       β”‚
β”‚   β–Ό          β–Ό          β–Ό                                       β”‚
β”‚ all_good  needs_corr  max_iter                                  β”‚
β”‚   β”‚          β”‚          β”‚                                       β”‚
β”‚   β–Ό          β–Ό          β–Ό                                       β”‚
β”‚ [END]  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  [END]                                      β”‚
β”‚        β”‚CORRECTOR β”‚                                             β”‚
β”‚        β”‚  AGENT   β”‚                                             β”‚
β”‚        β”‚"Fix usingβ”‚                                             β”‚
β”‚        β”‚ DIFFERENTβ”‚                                             β”‚
β”‚        β”‚ method"  β”‚                                             β”‚
β”‚        β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜                                             β”‚
β”‚             β”‚                                                   β”‚
β”‚             └──────► back to verifier (max 2 iterations)        β”‚
β”‚                                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Why Multi-Agent? (Agentic Tool Use)

Unlike the Agentic RAG pattern used by AssetAgent and RoomAgent (which focus on retrieval), appraisal extraction uses Agentic Tool Useβ€”a multi-agent system where agents reason about which extraction tools to invoke:

| Agent | Role | Tools |
| --- | --- | --- |
| ExtractorAgent | "Extract appraisal data intelligently" | parse_mismo_xml (FREE), extract_with_azure_di (PAID), extract_with_vision (EXPENSIVE) |
| VerifierAgent | "Be skeptical. Find errors. Question everything." | validate_extraction (FREE), vision_recheck_field (PAID) |
| CorrectorAgent | "Fix flagged errors using a DIFFERENT method" | Same as Extractor, but MUST use a different tool than the original |
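The Extractor β†’ Verifier β†’ Corrector loop can be sketched as follows (the three agents are passed in as stand-in callables; the 2-iteration cap matches the graph):

```python
def run_extraction(extract, verify, correct, max_iterations: int = 2) -> dict:
    """Extractor -> Verifier -> (Corrector -> Verifier) with a 2-pass cap.

    The callables stand in for the three agents; `verify` returns a
    list of flagged field keys (empty means all_good).
    """
    data = extract()
    for iteration in range(max_iterations + 1):
        flagged = verify(data)                 # skeptical pass
        if not flagged:
            return {"data": data, "iterations": iteration, "status": "all_good"}
        if iteration == max_iterations:
            break                              # give up -> engineer review
        data = correct(data, flagged)          # retry with a different method
    return {"data": data, "iterations": max_iterations, "status": "max_iterations"}
```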

Cost-Aware Tool Selection:

Extraction Strategy (minimize cost):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  1. MISMO XML Parser (FREE) β€” if XML uploaded               β”‚
β”‚     ↓ (if unavailable)                                      β”‚
β”‚  2. Azure Document Intelligence ($0.10-0.50)                β”‚
β”‚     ↓ (for stubborn fields with confidence < 0.85)          β”‚
β”‚  3. GPT-4o Vision Fallback ($0.10-0.20)                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
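The escalation ladder can be sketched as a cheapest-first function (the tool callables are stand-ins; the 0.85 confidence threshold comes from the ladder above):

```python
def extract_fields(doc, parse_xml, azure_di, vision, threshold: float = 0.85):
    """Cheapest-first extraction ladder (all tool callables are stand-ins).

    Each tool returns {field: (value, confidence)}; parse_xml returns
    None when no MISMO XML was uploaded.
    """
    fields = parse_xml(doc)            # 1. MISMO XML parse (free)
    if fields is None:
        fields = azure_di(doc)         # 2. Azure Document Intelligence (paid)
    # 3. Vision fallback only for stubborn, low-confidence fields.
    for name, (_, conf) in list(fields.items()):
        if conf < threshold:
            fields[name] = vision(doc, name)
    return fields
```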

Verification Checks:

  • Plausibility: year_built 1800-2026, GLA 500-15000 sq ft
  • OCR errors: 0↔O, 1↔I, digit transposition detection
  • Consistency: GLA vs bedrooms, contract vs appraised value
  • Confidence: Critical fields < 0.90 flagged for review

Audit Trail (IRS Defensibility):

Every extraction produces a complete audit trail for compliance:

{
  "study_id": "STUDY_001",
  "iterations": 1,
  "final_confidence": 0.92,
  "agent_calls": [
    {"agent_name": "ExtractorAgent", "tools_used": ["extract_with_azure_di"]},
    {"agent_name": "VerifierAgent", "tools_used": ["validate_extraction"]}
  ],
  "field_history": [
    {"field_key": "improvements.year_built", "action": "extracted", "value": "I995", "source": "azure_di"},
    {"field_key": "improvements.year_built", "action": "flagged", "notes": "OCR error: 'I' vs '1'"},
    {"field_key": "improvements.year_built", "action": "corrected", "value": 1995, "source": "vision_recheck"}
  ]
}

5) Vision Layer β€” Detection-First Image Processing

Location: backEnd/vision_layer/

The vision layer processes property images using a detection-first approach that reduces VLM hallucinations.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      VISION PIPELINE β€” DETECTION FIRST                       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  STAGE 1 │───►│  STAGE 2 │───►│  STAGE 3 │───►│  STAGE 4 │───►│STAGE 5  β”‚ β”‚
β”‚  β”‚  Detect  β”‚    β”‚  Segment β”‚    β”‚   Crop   β”‚    β”‚ Classify β”‚    β”‚ Verify  β”‚ β”‚
β”‚  β”‚  Objects β”‚    β”‚  Regions β”‚    β”‚  Regions β”‚    β”‚   VLM    β”‚    β”‚Groundingβ”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚       β”‚               β”‚               β”‚               β”‚              β”‚       β”‚
β”‚       β–Ό               β–Ό               β–Ό               β–Ό              β–Ό       β”‚
β”‚  Grounding       SAM 2           Cropped         Material      Validated     β”‚
β”‚  DINO 1.5        Masks           Images          Attrs         Artifacts     β”‚
β”‚                                                                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Why Detection-First? (Reducing VLM Hallucinations)

The Problem: VLMs (Vision Language Models) hallucinate when given full images. They "see" objects that aren't there or misclassify materials.

The Solution: Detect objects first, then classify only the cropped regions.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                VLM-ONLY vs DETECTION-FIRST                                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                             β”‚
β”‚  VLM-ONLY (hallucination-prone):     DETECTION-FIRST (grounded):            β”‚
β”‚                                                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”‚
β”‚  β”‚  Full Image     β”‚                 β”‚  Full Image     β”‚                    β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β” β”Œβ”€β”€β”€β”    β”‚                 β”‚  β”Œβ”€β”€β”€β” β”Œβ”€β”€β”€β”    β”‚                    β”‚
β”‚  β”‚  β”‚   β”‚ β”‚   β”‚    β”‚                 β”‚  β”‚ A β”‚ β”‚ B β”‚    │◄── Detect objects  β”‚
β”‚  β”‚  β””β”€β”€β”€β”˜ β””β”€β”€β”€β”˜    β”‚                 β”‚  β””β”€β”€β”€β”˜ β””β”€β”€β”€β”˜    β”‚    with bboxes     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β”‚
β”‚           β”‚                                   β”‚                             β”‚
β”‚           β–Ό                                   β–Ό                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”‚
β”‚  β”‚ "I see a marble β”‚                 β”‚  Crop region A  β”‚                    β”‚
β”‚  β”‚  countertop,    β”‚                 β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚                    β”‚
β”‚  β”‚  granite floor, │◄── May be       β”‚  β”‚  [A only] β”‚  │◄── Send crop       β”‚
β”‚  β”‚  stainless steelβ”‚    wrong!       β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚    to VLM          β”‚
β”‚  β”‚  appliances..." β”‚                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                          β”‚                             β”‚
β”‚                                               β–Ό                             β”‚
β”‚                                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”‚
β”‚                                      β”‚ VLM classifies  β”‚                    β”‚
β”‚                                      β”‚ ONLY the crop:  β”‚                    β”‚
β”‚                                      β”‚ "wood_veneer,   │◄── Focused         β”‚
β”‚                                      β”‚  built_in,      β”‚    classification  β”‚
β”‚                                      β”‚  good_condition"β”‚                    β”‚
β”‚                                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β”‚
β”‚                                               β”‚                             β”‚
β”‚                                               β–Ό                             β”‚
β”‚                                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”‚
β”‚                                      β”‚ Verify: Does    β”‚                    β”‚
β”‚                                      β”‚ VLM output match│◄── Grounding       β”‚
β”‚                                      β”‚ detection label?β”‚    verification    β”‚
β”‚                                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β”‚
β”‚                                                                             β”‚
β”‚       ❌ Hallucinates objects              βœ“ Grounded in detections         β”‚
β”‚       ❌ No provenance                     βœ“ Full audit trail               β”‚
β”‚                                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Stage 1: Object Detection (Grounding DINO 1.5 Pro)

File: api_clients/grounding_dino.py

Open-vocabulary object detection via Replicate API.

Property Image
      β”‚
      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Grounding DINO 1.5 Pro             β”‚
β”‚                                     β”‚
β”‚  Prompt: "cabinet, countertop,      β”‚
β”‚           flooring, appliance,      β”‚
β”‚           lighting fixture..."      β”‚
β”‚                                     β”‚
β”‚  Confidence threshold: 0.3          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      β”‚
      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Detections:                        β”‚
β”‚  [                                  β”‚
β”‚    { label: "cabinet",              β”‚
β”‚      bbox: [100, 200, 400, 500],    β”‚
β”‚      confidence: 0.92 },            β”‚
β”‚    { label: "countertop",           β”‚
β”‚      bbox: [150, 50, 600, 200],     β”‚
β”‚      confidence: 0.87 },            β”‚
β”‚    ...                              β”‚
β”‚  ]                                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Features:

  • Open-vocabulary: detects any object described in prompt
  • Returns bounding boxes with confidence scores
  • Retry logic with exponential backoff

Stage 2: Segmentation (SAM 2)

File: api_clients/sam2.py

Precise segmentation masks for detected regions.

Detection bboxes
      β”‚
      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  SAM 2 (Segment Anything Model 2)   β”‚
β”‚                                     β”‚
β”‚  Input: bbox coordinates            β”‚
β”‚  Output: Precise polygon mask       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      β”‚
      β–Ό
Refined masks with exact boundaries

Purpose: Refines rough bounding boxes into precise object boundaries. Optional stageβ€”can be skipped for speed.


Stage 3: Region Cropping

File: pipeline/cropper.py

Extracts and pads regions for VLM classification.

Detection + Mask
      β”‚
      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Region Cropper                     β”‚
β”‚                                     β”‚
β”‚  β€’ Crop around bbox                 β”‚
β”‚  β€’ Add 20% padding for context      β”‚
β”‚  β€’ Save crop for audit trail        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      β”‚
      β–Ό
Cropped image (just the object + context)

Why crop?

  • VLM focuses on single object, not entire scene
  • Reduces hallucination from other objects in image
  • Smaller image = faster inference
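
The 20% context padding can be sketched as simple bbox arithmetic (the `pad_bbox` name is an assumption; only the padding ratio comes from the text above):

```python
from typing import Tuple

def pad_bbox(bbox: Tuple[int, int, int, int],
             img_w: int, img_h: int,
             padding: float = 0.2) -> Tuple[int, int, int, int]:
    """Expand a bbox by `padding` of its width/height, clamped to the image."""
    x1, y1, x2, y2 = bbox
    pad_x = int((x2 - x1) * padding)
    pad_y = int((y2 - y1) * padding)
    return (max(0, x1 - pad_x), max(0, y1 - pad_y),
            min(img_w, x2 + pad_x), min(img_h, y2 + pad_y))

# A 300x300 bbox grows by 60px on each side, clamped to the image bounds.
padded = pad_bbox((100, 200, 400, 500), img_w=1024, img_h=768)  # (40, 140, 460, 560)
```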

Stage 4: VLM Classification (GPT-4o Vision)

File: api_clients/vlm.py

Material and attribute classification on cropped regions.

Cropped Image
      β”‚
      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  GPT-4o Vision                      β”‚
β”‚                                     β”‚
β”‚  Prompt: "Classify this object:     β”‚
β”‚    - material (wood, metal, etc.)   β”‚
β”‚    - condition (good/fair/poor)     β”‚
β”‚    - attachment (built-in/portable) β”‚
β”‚    - dimensions if visible"         β”‚
β”‚                                     β”‚
β”‚  Output: Structured JSON            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      β”‚
      β–Ό
{
  "material": "wood_veneer",
  "condition": "good",
  "attachment_type": "built_in",
  "color": "natural_oak",
  "estimated_dimensions": "36in x 24in"
}

LLM Provider:

  • Azure OpenAI (primary - enterprise deployment)
    • GPT-4.1: Best results for complex reasoning and classification
    • GPT-4.1 nano: Most efficient for high-volume tasks
    • Together they provide an optimal cost/performance ratio

Stage 5: Grounding Verification

Cross-reference VLM claims against detection labels.

VLM Output + Detection Label
      β”‚
      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Grounding Verifier                 β”‚
β”‚                                     β”‚
β”‚  Detection label: "cabinet"         β”‚
β”‚  VLM classification: "wood_veneer   β”‚
β”‚                       cabinet"      β”‚
β”‚                                     β”‚
β”‚  Match? βœ“ YES                       β”‚
β”‚  β†’ verified: true                   β”‚
β”‚                                     β”‚
β”‚  If mismatch:                       β”‚
β”‚  β†’ needs_review: true               β”‚
β”‚  β†’ reason: "grounding_mismatch"     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Purpose: Catches hallucinations where the VLM classifies an object as something the detector didn't see.
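
A minimal sketch of this check, assuming a simple substring match between the detector label and the VLM's free-text classification (the `verify_grounding` helper is hypothetical; the production verifier also uses IoU thresholds, per the Accuracy section below):

```python
def verify_grounding(detection_label: str, vlm_classification: str) -> dict:
    """Cross-reference the VLM claim against the grounded detection label."""
    matched = detection_label.lower() in vlm_classification.lower()
    if matched:
        return {"verified": True, "needs_review": False}
    return {"verified": False, "needs_review": True,
            "reason": "grounding_mismatch"}

ok = verify_grounding("cabinet", "wood_veneer cabinet")          # verified
bad = verify_grounding("cabinet", "stainless steel refrigerator")  # flagged
```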


Vision Artifact Output

Every processed object produces a complete artifact with provenance:

{
  "artifact_id": "va_abc123",
  "image_id": "photo_456",
  "detection": {
    "label": "cabinet",
    "confidence": 0.92,
    "bbox": {"x1": 100, "y1": 200, "x2": 400, "y2": 500},
    "model": "grounding_dino_1.5_pro"
  },
  "segmentation": {
    "mask_path": "masks/va_abc123.png",
    "model": "sam2"
  },
  "crop": {
    "crop_path": "crops/va_abc123.jpg",
    "padding": 0.2
  },
  "classification": {
    "material": "wood_veneer",
    "condition": "good",
    "attachment_type": "built_in",
    "cost_seg_relevant": true,
    "model": "gpt-4.1"
  },
  "provenance": {
    "detection_model": "grounding_dino_1.5_pro",
    "segmentation_model": "sam2",
    "vlm_model": "gpt-4.1",
    "verified": true,
    "grounding_match": true
  },
  "confidence": 0.89,
  "needs_review": false
}
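
One way to enforce the provenance contract shown above is a completeness check before an artifact is persisted. The field names follow the example JSON; the check itself is an illustrative sketch, not the shipped validator:

```python
# Every model that touched an artifact must be recorded before it is saved.
REQUIRED_PROVENANCE = ("detection_model", "vlm_model", "verified", "grounding_match")

def has_full_provenance(artifact: dict) -> bool:
    prov = artifact.get("provenance", {})
    return all(key in prov for key in REQUIRED_PROVENANCE)

artifact = {
    "artifact_id": "va_abc123",
    "provenance": {
        "detection_model": "grounding_dino_1.5_pro",
        "vlm_model": "gpt-4.1",
        "verified": True,
        "grounding_match": True,
    },
}
```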

Batch Processing

File: pipeline/ingest.py

Concurrent processing with configurable parallelism:

import asyncio
from typing import List

class VisionPipeline:
    async def process_batch(
        self,
        images: List[str],
        max_concurrent: int = 5
    ) -> List[VisionArtifact]:
        """
        Process multiple images concurrently.
        Uses a semaphore to limit parallel API calls.
        """
        semaphore = asyncio.Semaphore(max_concurrent)
        tasks = [self._process_single(img, semaphore) for img in images]
        return await asyncio.gather(*tasks)

6) Tool Registry

Agents access evidence through standardized MCP tools:

Retrieval Tools:

Tool            Purpose
bm25_search     Exact token matching (codes, IDs, standard references)
vector_search   Semantic similarity for conceptual queries
hybrid_search   Combined search with score fusion
get_chunk       Fetch chunk by ID with full provenance
get_table       Fetch structured table (never chunked)

Tool Implementation (example):

@tool
def hybrid_search(
    doc_id: str,
    query: str,
    top_k: int = 5,
    bm25_weight: float = 0.5
) -> List[SearchResult]:
    """
    Combined BM25 + vector search with score normalization.
    Returns chunks with provenance (page_span, section_path, element_ids).
    """
    bm25_results = bm25_search(doc_id, query, top_k * 2)
    vector_results = vector_search(doc_id, query, top_k * 2)

    # Normalize and fuse scores
    fused = fuse_scores(bm25_results, vector_results, bm25_weight)

    # Deduplicate and expand tables
    return dedupe_and_expand(fused, top_k)
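
The `fuse_scores` step referenced above is not shown in this README; min-max normalization followed by a weighted sum is one common choice, assumed in this sketch:

```python
from typing import Dict

def fuse_scores(bm25: Dict[str, float], vector: Dict[str, float],
                bm25_weight: float = 0.5) -> Dict[str, float]:
    """Normalize each score set to [0, 1], then blend per chunk id."""
    def normalize(scores: Dict[str, float]) -> Dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    b, v = normalize(bm25), normalize(vector)
    ids = set(b) | set(v)
    return {i: bm25_weight * b.get(i, 0.0) + (1 - bm25_weight) * v.get(i, 0.0)
            for i in ids}

# BM25 and vector scores live on different scales; normalization makes
# the 0.5/0.5 blend meaningful.
fused = fuse_scores({"c1": 12.0, "c2": 3.0}, {"c2": 0.9, "c3": 0.4})
```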

Tech Stack

Backend

Component               Technology
Framework               FastAPI
Workflow Orchestration  LangGraph 0.2+
LLM                     Azure OpenAI
Document Intelligence   Azure Document Intelligence (appraisal extraction)
PDF Parsing             pdfplumber, PyMuPDF
Vector Store            FAISS
Lexical Search          rank-bm25
Embeddings              sentence-transformers (all-MiniLM-L6-v2)
Observability           LangSmith (tracing)
Database                Firebase Firestore
Storage                 Firebase Storage, GCS

Frontend

Component   Technology
Framework   Next.js 14 (App Router)
Language    TypeScript
Styling     TailwindCSS
Auth/DB     Firebase

Infrastructure

  • Hosting: Firebase App Hosting, Google Cloud Run
  • Containers: Docker
  • State Persistence: Firestore checkpointer for workflow state

Engineer-in-the-Loop Workflow

Every module follows the same contract:

  1. Frontend triggers module with { studyId }
  2. Backend fetches the required data from Firestore/Storage
  3. Backend runs AI/ML
  4. Backend writes results back to Firestore
  5. Frontend renders results
  6. Engineer reviews + corrects
  7. Engineer manually advances to the next stage

This is the core design principle that keeps deliverables defensible.


User Workflow (High Level)

  1. πŸ“ Create New Study

    • Engineer enters property name
    • Selects files to upload (photos, PDFs, appraisals)
    • Clicks Start Analysis
  2. ⬆️ Upload Documents

    • Files upload to Firebase Storage
    • Progress tracked in UI
  3. πŸ“„ Appraisal Processing

    • Ingest PDF using same pipeline as IRS docs (parse β†’ chunk β†’ index)
    • Extract tables with structure preserved (headers, rows, page)
    • Tiered extraction with confidence scoring:
      • Tier 1: MISMO XML (if uploaded) - 100% accurate
      • Tier 2: Azure Document Intelligence - 70-95% confidence
      • Tier 3: GPT-4o Vision fallback - 60-90% confidence
      • Tier 4: Regex fallback - 50-80% confidence
    • Map URAR tables to frontend sections (subject, neighborhood, site, improvements, etc.)
    • Create property constraints (GLA, bedrooms, room counts, etc.)
    • Auto-flag fields with needs_review: true when confidence < 0.90
    • ⏸️ Engineer reviews + corrects
  4. 🏠 Room Classification

    • Scene + material + object context
    • Groups photos into predicted rooms
    • ⏸️ Engineer reviews + corrects
  5. πŸ” Object Classification

    • Detects components from photos
    • Enriches with room context + metadata
    • ⏸️ Engineer reviews + corrects
  6. πŸ“ Engineering Takeoffs

    • Calculates measurements
    • ⏸️ Engineer reviews + corrects
  7. πŸ’° Asset Classification

    • IRS-grounded classification
    • ⏸️ Engineer reviews + corrects
  8. 🧾 Cost Classification

    • Maps components to integrated cost databases
    • ⏸️ Engineer reviews + corrects
  9. βœ… Complete Study

    • Export package generated for firm templates
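
The tiered extraction in step 3 can be sketched as a simple fallback chain: try each tier in order, keep the first result, and flag fields below 0.90 confidence. The extractor callables here are stand-ins for the MISMO/Azure/VLM/regex paths, not the actual implementation:

```python
from typing import Callable, List, Optional, Tuple

Extractor = Callable[[bytes], Optional[Tuple[str, float]]]  # (value, confidence)

def tiered_extract(doc: bytes, tiers: List[Extractor],
                   review_threshold: float = 0.90) -> dict:
    """Run extractors in priority order; auto-flag low-confidence results."""
    for tier in tiers:
        result = tier(doc)
        if result is not None:
            value, confidence = result
            return {"value": value, "confidence": confidence,
                    "needs_review": confidence < review_threshold}
    return {"value": None, "confidence": 0.0, "needs_review": True}

# Stand-in tiers: MISMO XML absent, Azure DI returns a mid-confidence value,
# which lands below 0.90 and is flagged for engineer review.
mismo = lambda doc: None
azure_di = lambda doc: ("1,850 sqft GLA", 0.85)
field = tiered_extract(b"<pdf bytes>", [mismo, azure_di])
```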

Current Application: Cost Segregation

The architecture is currently deployed for cost segregation—accelerating tax depreciation analysis for real estate.

Domain-Specific Implementation:

  • Reference Corpus: IRS Pub 946, Pub 527, Cost Seg ATG, Rev Proc 87-56, RSMeans databases
  • Exact-Match Queries: Asset class codes (e.g., "57.0"), IRS sections (e.g., "Β§1245")
  • Semantic Queries: "What property qualifies for 5-year depreciation?"
  • Traceability: Every classification cites specific IRS publication pages
  • Vision Processing: Detection-first pipeline for property photos (see Vision Layer)

Traction & Validation

This isn't a proof-of-conceptβ€”it's a deployed product with paying customers.

Customers:

  • CSSI (top-5 cost segregation firm) β€” paying user
  • CBIZ β€” paying user
  • Design partners at multiple top-5 firms have validated 50%+ time savings on analysis workflows

Awards:

LavaLab Fall 2025 β€” Best Traction

Basis Team Holding Check


NVIDIA Applicability: Automotive Functional Safety Project

The Basis architecture directly addresses the document intelligence challenges in ISO 26262 workflows.

The Problem:

Functional safety teams work with large document setsβ€”HARA baselines, safety goals, TSRs, verification evidenceβ€”that share a common structure:

  • Standardized headers (hazard IDs, ASIL classifications, requirement codes)
  • Messy context sections (rationale, assumptions, linked evidence)
  • Strict traceability requirements (every claim must cite source documents)

Querying these documents with traditional RAG fails: vector search hallucinates on exact IDs, keyword search misses semantic relationships, and LLMs can't process hundreds of pages in context.

Architecture Mapping:

Basis Component            Functional Safety Application
Custom BM25 Tokenization   Preserve HAZ-001, TSR-042, ASIL-D, ISO 26262-6:2018 §7.4.3 as atomic tokens
Tables Never Chunked       FMEA tables, DFA matrices, traceability matrices stay intact
80-Token Overlap           Safety goal rationale spanning paragraphs isn't split
Hybrid Search              Exact ID lookup + semantic "what evidence supports this safety goal?"
Surrogate → Full Table     Search hits "FMEA row for HAZ-001" → returns complete FMEA with all columns
Citation Enforcement       "No evidence, no claim" — every classification cites specific document + page
Human-in-the-Loop          Engineer reviews before any safety decision is finalized

Example Queries This Architecture Handles:

Exact ID lookup (BM25):
  "TSR-042" β†’ finds all chunks referencing TSR-042

Semantic search (FAISS):
  "verification evidence for braking system hazards" β†’ finds related test reports

Hybrid (recommended):
  "ASIL-D requirements for sensor fusion" β†’ exact ASIL match + semantic relevance

Table fetch:
  Search hits FMEA surrogate β†’ get_table() returns full FMEA with hazard, severity, exposure, controllability

Tokenizer Adaptation:

The custom tokenizer pattern extends directly to safety document codes:

IRS Pattern      Safety Pattern            Tokenizer Handles
§1245            HAZ-001                   Prefix + number preserved
168(e)(3)        ISO 26262-6:2018 §7.4.3   Nested references preserved
57.0             ASIL-D                    Alphanumeric codes preserved
Rev Proc 87-56   TSR-042-REV-A             Multi-part identifiers preserved

What Would Change for Safety Documents:

  1. Tokenizer regex β€” add patterns for HAZ-\d+, TSR-\d+, ASIL-[A-D], ISO clause refs
  2. Reference corpus β€” ingest ISO 26262 parts, internal HARA baselines, verification templates
  3. Agent prompts β€” swap IRS classification logic for safety goal verification logic
  4. Structured store β€” FMEA tables, DFA matrices instead of depreciation tables

The pipeline, retrieval, and agentic architecture remain identical.
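
The tokenizer adaptation in item 1 might look like the following. The exact patterns in Basis's tokenizer are not shown in this README, so this single alternation is an assumption; it keeps safety identifiers and IRS-style codes as atomic tokens rather than letting naive whitespace splitting shred them:

```python
import re

TOKEN_PATTERN = re.compile(
    r"ISO\s\d+(?:-\d+)?:\d{4}(?:\s§[\d.]+)?"    # ISO 26262-6:2018 §7.4.3
    r"|[A-Z]{2,4}-\d+(?:-[A-Z]+-[A-Z0-9]+)?"    # HAZ-001, TSR-042-REV-A
    r"|ASIL-[A-D]"                              # ASIL-D
    r"|§\d+"                                    # §1245
    r"|\d+\([a-z]\)\(\d+\)"                     # 168(e)(3)
    r"|\d+\.\d+"                                # 57.0
    r"|\w+"                                     # fallback: plain words
)

def tokenize(text: str) -> list:
    """Split text while preserving domain identifiers as single tokens."""
    return TOKEN_PATTERN.findall(text)

tokens = tokenize("ASIL-D hazard HAZ-001 per ISO 26262-6:2018 §7.4.3")
```

Alternation order matters: the most specific patterns come first so that, e.g., an ISO clause reference is consumed whole before the `\w+` fallback can split it.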


Accuracy, Safety & Defensibility

Basis is designed for engineering-grade output, not generic AI chat.

We ensure accuracy through:

  • Detection-first vision processing β€” Grounding DINO detects objects before VLM classifies, reducing hallucinations
  • Evidence-backed reasoning β€” Every agent output cites documents with chunk IDs and page numbers
  • Grounding verification β€” VLM claims are cross-referenced against detections using IoU thresholds
  • Human-in-the-loop checkpoints β€” Engineers review and approve at every workflow stage
  • Confidence scoring + needs_review flags β€” Uncertain outputs are flagged for engineer attention
  • Full provenance tracking β€” Every artifact traces back to source image, detection, and model
  • "No evidence, no claim" enforcement β€” Agents cannot classify without citing retrieved evidence

Data Handling

  • Customer artifacts are stored encrypted in Firebase Storage.
  • Study data is stored in Firestore with role-based access.
  • Vision pipelines can be isolated for sensitive drawings and photos.
  • Enterprise LLM APIs are used so customer data is never retained for model training.

Why Not Just Use ChatGPT?

Cost segregation is not a single "upload a PDF" problem.

Engineers often work with hundreds of photos and mixed documents per study, with strict IRS expectations for classification and auditability.

Basis is a three-layer agentic system that:

  • Detects before classifying β€” Grounding DINO + SAM 2 detect objects before GPT-4o classifies, eliminating VLM hallucinations
  • Cites every classification β€” Asset classifications include IRS document citations with page numbers, not just model training data
  • Preserves full provenance β€” Every artifact traces back to source image, detection, crop, and model response
  • Stage-gates everything β€” Engineers review and approve before any workflow advances
  • Uses actual IRS documents β€” Hybrid BM25 + vector retrieval over ingested IRS publications, not model knowledge cutoff
  • Solves context saturation β€” Agentic retrieval selects only relevant evidence instead of dumping everything into context

System Architecture (Full)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         ENGINEER UI                              β”‚
β”‚  β€’ Review checkpoints at every workflow stage                    β”‚
β”‚  β€’ Citation verification                                         β”‚
β”‚  β€’ Correction interface                                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      NEXT.JS FRONTEND                            β”‚
β”‚  β€’ Typed UI state + workflow gating                              β”‚
β”‚  β€’ Firebase Auth + role-aware access                             β”‚
β”‚  β€’ Real-time Firestore listeners                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     FASTAPI BACKEND                              β”‚
β”‚                                                                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  VISION LAYER   β”‚  β”‚ EVIDENCE LAYER  β”‚  β”‚  AGENTIC LAYER  β”‚   β”‚
β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€   β”‚
β”‚  β”‚ Grounding DINO  β”‚  β”‚ PDF Parsing     β”‚  β”‚ LangGraph       β”‚   β”‚
β”‚  β”‚ SAM 2           β”‚  β”‚ Table Extract   β”‚  β”‚ Workflow Engine β”‚   β”‚
β”‚  β”‚ GPT-4o Vision   β”‚  β”‚ Text Chunking   β”‚  β”‚                 β”‚   β”‚
β”‚  β”‚ Region Cropper  β”‚  β”‚ BM25 Index      β”‚  β”‚ Room Agent      β”‚   β”‚
β”‚  β”‚ Grounding       β”‚  β”‚ FAISS Index     β”‚  β”‚ Asset Agent     β”‚   β”‚
β”‚  β”‚ Verifier        β”‚  β”‚ Hybrid Search   β”‚  β”‚ Takeoff Agent   β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚ Cost Agent      β”‚   β”‚
β”‚           β”‚                    β”‚           β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚           β”‚                    └─────────────────────            β”‚
β”‚           └──────────────────────────────────────────            β”‚
β”‚                                                     β”‚            β”‚
β”‚                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”     β”‚
β”‚                    β”‚        MCP TOOL REGISTRY              β”‚     β”‚
β”‚                    β”‚  β€’ bm25_search    β€’ vector_search     β”‚     β”‚
β”‚                    β”‚  β€’ hybrid_search  β€’ get_table         β”‚     β”‚
β”‚                    β”‚  β€’ get_chunk      β€’ vision_detect     β”‚     β”‚
β”‚                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β”‚                                                                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     FIREBASE DATA LAYER                          β”‚
β”‚  β€’ Storage: documents, images, exports                           β”‚
β”‚  β€’ Firestore: studies, classifications, audit trails             β”‚
β”‚  β€’ Auth: role-based access                                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

About

Basis demonstrates that document intelligence problems share common architectural requirements:

  1. Hybrid retrieval for documents with both exact codes and narrative context
  2. Custom tokenization that preserves domain-specific identifiers (not naive whitespace splitting)
  3. Agentic orchestration for multi-step reasoning with tool routing
  4. Human-in-the-loop checkpoints for auditability and defensibility
  5. Citation-first outputs linking every claim to source evidence

The same pipeline that queries IRS depreciation tables can query HARA baselines, safety goals, TSRs, or verification evidenceβ€”because the architectural pattern is the same:

IRS Domain              Safety Domain
§1245, 168(e)(3)        HAZ-001, TSR-042
Asset class 57.0        ASIL-B, ASIL-D
IRS Pub 946 citations   ISO 26262 clause refs
Depreciation tables     FMEA tables, DFA matrices

Standardized headers, messy context, and a need for traceability: the same shape in both domains.

