Cost segregation shouldn't take weeks. Basis gets engineers 80% of the way there: fast, guided, and defensible.
Powered by:
- Agentic RAG: LLM-driven retrieval where agents decide when, what, and how to search
- Multi-Agent Self-Correction: Extraction → Verification → Correction loops with audit trails
- Detection-First Vision: Grounding DINO + SAM2 + GPT-4o for grounded, hallucination-resistant image analysis
- What is Basis?
- Why Cost Seg?
- The Problem
- The Solution
- Demo Video
- Current Project Overview
- The Problem: Document Intelligence at Scale
- The Solution: Agentic RAG + Multi-Agent Workflow
- Architecture Deep Dive
- Tech Stack
- Engineer-in-the-Loop Workflow
- User Workflow (High Level)
- Current Application: Cost Segregation
- Traction & Validation
- NVIDIA Applicability: Automotive Functional Safety Project
- Accuracy, Safety & Defensibility
- Data Handling
- Why Not Just Use ChatGPT?
- Getting Started (Dev)
- About
Basis is an AI-assisted platform for residential-focused cost segregation firms that accelerates the most time-consuming part of the study:
analyzing hundreds of photos, sketches, and appraisal documents to produce an IRS-ready report.
Basis is not a "one-click study generator." It's a human-in-the-loop, agentic workflow powered by three core systems:
- Vision Layer: detection-first image processing that reduces VLM hallucinations through grounded detection
- Evidence Layer: PDF ingestion pipeline with hybrid BM25 + vector retrieval for IRS-grounded reasoning
- Agentic Workflow: LangGraph-orchestrated multi-agent system with stage-gated engineer review checkpoints
This architecture walks the engineer through every decision before anything becomes client-facing.
$1M. That's what you might spend to buy a house. That upfront spend can create tax savings as the property depreciates over 27.5 years.
But 27.5 years is a long time to wait.
Cost segregation helps owners accelerate depreciation and unlock meaningful savings earlier. In the U.S., 5,000+ businesses conduct thousands of studies per year, which makes the workflow opportunity massive.
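As a rough illustration of why acceleration matters, here is a simplified year-one comparison. The numbers are assumptions for illustration only; real studies use exact MACRS tables, conventions, and engineer-verified allocations.

```python
# Hypothetical example: a $1M property with $800k of depreciable basis
# (land is not depreciable).
basis = 800_000

# Straight-line over 27.5 years (standard residential schedule, simplified):
straight_line_year1 = basis / 27.5  # ~$29k deducted in year 1

# A cost seg study reclassifies some components (carpet, appliances,
# fixtures) into shorter MACRS lives. Assume 20% moves to 5-year property:
five_year_portion = 0.20 * basis          # $160,000
remaining = basis - five_year_portion     # $640,000

# Year-1 MACRS rate for 5-year property (200% DB, half-year convention) is 20%:
accelerated_year1 = five_year_portion * 0.20 + remaining / 27.5

print(f"Year 1, no study:   ${straight_line_year1:,.0f}")
print(f"Year 1, with study: ${accelerated_year1:,.0f}")
```

Under these assumed numbers, the first-year deduction nearly doubles, which is the entire economic case for doing the study early.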
A cost segregation study typically follows three steps:
- Document the property
- Analyze the documentation
- Generate the report
The bottleneck is step 2.
Our interviews revealed that this analysis phase:
- Requires engineers to comb through hundreds of photos, drawings, and appraisals
- Can take 2β3 weeks to complete
- Can cost >$1,200 in labor per study
- Can leave >$1,000 in savings on the table due to missed or inconsistently documented components
Enter Basis.
Engineers upload the property artifacts they already use today. Basis:
- Organizes documents and imagery
- Classifies rooms, materials, and objects
- Guides engineers through review checkpoints
- Surfaces the exact references needed for takeoffs and tax classification (so engineers aren't hunting across hundreds of pages)
Result: faster studies, fewer errors, lower cost to serve.
A short walkthrough showing how Basis guides engineers through appraisal constraints, room/object classification, takeoffs, and IRS-grounded asset decisions.
- Objective: Reduce cost seg analysis time by automating repetitive classification and retrieval tasks while preserving engineer-led accuracy and auditability.
- Core Features:
- Study creation + structured upload
- Appraisal-to-constraints extraction
- Room classification with scene + object context
- Object/component detection with metadata enrichment
- Engineer review checkpoints at every stage
- Engineering takeoffs assistance
- Asset classification with IRS-grounded RAG
- Cost classification hooks for integrated cost databases
- Export-ready outputs for existing firm templates
Many industries require AI-assisted workflows for querying large document sets (regulatory publications, technical standards, safety baselines) that share a common challenge:
Standardized headers, messy context.
These documents contain critical structured data (IDs, codes, classifications, tables) embedded in unstructured narrative text. Traditional approaches fail because:
- Pure keyword search misses semantic relationships
- Pure vector search hallucinates on exact codes and IDs
- Context windows can't hold hundreds of pages
- LLM-only approaches lack auditability and traceability
Basis implements a three-layer architecture designed for document intelligence problems:
```
AGENTIC LAYER (LangGraph)
  • Multi-agent orchestration with stage-gated checkpoints
  • Tool routing based on query intent
  • "No evidence, no claim" enforcement
  • Human-in-the-loop verification at every stage
        │
        ▼
EVIDENCE LAYER (Hybrid RAG)
  • BM25 for exact-term matches (codes, IDs, classifications)
  • FAISS vector search for semantic similarity
  • Score fusion + deduplication
  • Tables stored intact (never chunked)
        │
        ▼
OFFLINE PIPELINE
  • Layout-aware PDF parsing (pdfplumber)
  • Table extraction → structured JSON
  • Semantic chunking with 80-token overlap
  • Dual indexing (BM25 + FAISS)
```
This architecture is domain-agnostic. The current implementation targets cost segregation (IRS tax documents), but the same pipeline handles any document corpus with structured codes and unstructured context.
Location: backEnd/evidence_layer/
Transforms raw PDFs into retrieval-ready indexes through a 5-stage pipeline:
```
PDF INGESTION PIPELINE

STAGE 1        STAGE 2        STAGE 3        STAGE 4        STAGE 5
Parse    ───▶  Extract  ───▶  Chunk    ───▶  Build    ───▶  Build
Layout         Tables         Text           BM25           FAISS
  │              │              │              │              │
  ▼              ▼              ▼              ▼              ▼
layout/        structured/    retrieval/     indexes/       indexes/
elements.json  tables.json    chunks.json    bm25.pkl       faiss.idx
```
File: parse_pdf.py
Extracts text with positional metadata using pdfplumber + PyMuPDF.
```
Raw PDF
   │
   ▼
For each page:
  • Extract text with bbox coords
  • Detect font size + boldness
  • Classify element type
  • Preserve reading order
   │
   ▼
Layout Elements (with position + type)
```
Element Classification:
| Type | Detection Method | Example |
|---|---|---|
| `title` | Large font + bold | "Chapter 4: MACRS" |
| `heading` | Medium font + bold | "Section 1245 Property" |
| `paragraph` | Regular text blocks | Narrative content |
| `list_item` | Numbered/bulleted | "1. Tangible property..." |
| `table` | Grid structure detected | Routed to Stage 2 |
Output: `layout/elements.json`, containing every text block with page, bbox, font, and type.
File: extract_tables.py
Critical design decision: Tables are NEVER chunked. They're stored as structured JSON and fetched whole.
```
Layout Elements
   │
   ├── Table detected? ──YES──▶ Extract as structured JSON
   │                           Store in structured/tables.json
   │                           Create surrogate chunk for search
   │
   └── Not a table ───────────▶ Pass to Stage 3
```
Why tables stay intact:
- Chunking tables destroys row/column relationships
- LLMs hallucinate when given partial table data
- Agents fetch the full table by `table_id` when its surrogate chunk matches
Table Storage Format:
```json
{
  "table_id": "DOC_2024_table_3",
  "page": 15,
  "caption": "Table B-1. Asset Classes",
  "headers": ["Asset Class", "Description", "Recovery Period"],
  "rows": [
    ["57.0", "Distributive Trades", "5 years"],
    ["00.11", "Office Furniture", "7 years"]
  ],
  "markdown": "| Asset Class | Description | Recovery Period |\n|---|---|---|\n| 57.0 | ..."
}
```

Surrogate Chunk (for search):
```json
{
  "chunk_id": "DOC_2024_table_3_surrogate",
  "type": "table_surrogate",
  "text": "Table B-1. Asset Classes: 57.0 Distributive Trades 5 years, 00.11 Office Furniture 7 years...",
  "table_id": "DOC_2024_table_3"
}
```

When a search hits the surrogate, the agent calls `get_table(table_id)` and receives the full structured table.
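The surrogate pattern can be sketched as follows. This is a minimal illustration of the described behavior, not the real API; the store layout and `expand_hits` helper are assumptions.

```python
# Toy stores standing in for structured/tables.json and retrieval/chunks.json
TABLES = {
    "DOC_2024_table_3": {
        "caption": "Table B-1. Asset Classes",
        "headers": ["Asset Class", "Description", "Recovery Period"],
        "rows": [["57.0", "Distributive Trades", "5 years"]],
    }
}

CHUNKS = [
    {"chunk_id": "DOC_2024_table_3_surrogate",
     "type": "table_surrogate",
     "text": "Table B-1. Asset Classes: 57.0 Distributive Trades 5 years",
     "table_id": "DOC_2024_table_3"},
]

def get_table(table_id):
    """Fetch a complete, never-chunked table by ID."""
    return TABLES[table_id]

def expand_hits(hits):
    """If a search hit is a table surrogate, swap in the full table."""
    out = []
    for h in hits:
        if h.get("type") == "table_surrogate":
            out.append({"chunk_id": h["chunk_id"], "table": get_table(h["table_id"])})
        else:
            out.append(h)
    return out
```

The key property: search operates on flat text (the surrogate), but the LLM always sees the intact table with its row/column relationships preserved.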
URAR Appraisal Mapping:
For appraisal documents, extracted tables are additionally mapped to URAR (Uniform Residential Appraisal Report) sections:
```
APPRAISAL TABLE → SECTION MAPPING

Extracted Tables (.tables.jsonl)
   │
   ▼
map_appraisal_sections.py
  1. Identify section by keywords + page position (URAR layout)
  2. Map table rows → section fields (subject, neighborhood, etc.)
  3. Fall back to regex extraction for missing values
   │
   ▼
Frontend-ready sections:
  • subject (address, borrower, lender)
  • listing_and_contract (price, DOM, sale type)
  • neighborhood (location, growth, values)
  • site (dimensions, zoning, utilities)
  • improvements (foundation, rooms, year built)
  • sales_comparison (comps grid)
  • cost_approach (site value, depreciation)
  • reconciliation (final value opinion)
```
This mapping reuses the same high-quality table extraction from ingestion; no additional parsing or GPT calls are required.
Production Enhancement: Tiered Extraction
For production deployments, appraisal extraction uses a multi-tier approach with confidence scoring:
```
TIERED EXTRACTION

Tier 1: MISMO XML Parser             (confidence: 1.0)
   │ (if unavailable)
   ▼
Tier 2: Azure Document Intelligence  (confidence: 0.7-0.95)
   │ (for fields with confidence < 0.85)
   ▼
Tier 3: GPT-4o Vision Fallback       (confidence: 0.6-0.9)
   │ (for any remaining empty fields)
   ▼
Tier 4: Regex Fallback               (confidence: 0.5-0.8)
   │
   ▼
Tier 5: Validation & Confidence Aggregation
```
Benefits:
- Field-level confidence scoring: each field tracks confidence + source
- Critical field validation: `property_address`, `year_built`, `gross_living_area`, `appraised_value`, `contract_price`, and `effective_date` require >= 0.90 confidence
- Automatic review flagging: `needs_review: true` when confidence thresholds are not met
- Graceful degradation: falls back through tiers if services are unavailable
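The tiered control flow can be sketched as below. This is an illustrative skeleton under stated assumptions (the tier functions, thresholds as described, and the `run_tiers` helper are stand-ins, not the real implementation):

```python
CRITICAL_THRESHOLD = 0.90   # critical fields below this get flagged

def run_tiers(tiers, threshold=0.85):
    """Try each (name, fn) tier in order; stop once a value is confident enough."""
    best = {"value": None, "confidence": 0.0, "source": None}
    for name, fn in tiers:
        result = fn()                    # -> (value, confidence) or None
        if result is None:
            continue                     # tier unavailable: degrade gracefully
        value, conf = result
        if conf > best["confidence"]:
            best = {"value": value, "confidence": conf, "source": name}
        if best["confidence"] >= threshold:
            break                        # confident enough; stop paying for tiers
    best["needs_review"] = best["confidence"] < CRITICAL_THRESHOLD
    return best

# Hypothetical run: no XML uploaded, Azure DI is below the gate, vision wins.
field = run_tiers([
    ("mismo_xml",    lambda: None),
    ("azure_di",     lambda: ("1995", 0.82)),
    ("gpt4o_vision", lambda: ("1995", 0.93)),
])
```

The design choice this illustrates: cheap, deterministic sources are always tried first, and the expensive model only runs when confidence gating says it must.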
File: chunk_text.py
Splits narrative text into retrieval units with semantic overlap.
```
Non-Table Elements
   │
   ▼
Chunking Parameters:
  • Target: 400 tokens
  • Overlap: 80 tokens
  • Hard max: 700 tokens
  • Tokenizer: cl100k_base (GPT-4)
   │
   ▼
Chunks with provenance metadata
```
Why 80-token overlap?
```
Without overlap:
  Chunk 1: "...property under Section"
  Chunk 2: "includes assets classified as..."
  → "Section 1245 property includes..." is split and context is lost.

With 80-token overlap:
  Chunk 1: "...property under Section 1245 includes assets classified..."
  Chunk 2: "Section 1245 includes assets classified as tangible personal..."
```

Both chunks contain the full context.
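A minimal sketch of overlap chunking, using whitespace tokens in place of the real cl100k_base tokenizer (the function name and simplification are assumptions; `chunk_text.py` may differ):

```python
def chunk_with_overlap(text, target=400, overlap=80):
    """Split text into target-sized chunks where consecutive chunks
    share `overlap` tokens, so boundary phrases survive intact."""
    tokens = text.split()
    step = target - overlap          # advance by less than a full chunk
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(" ".join(tokens[start:start + target]))
        if start + target >= len(tokens):
            break
        start += step
    return chunks

doc = " ".join(str(i) for i in range(1000))   # 1000 dummy tokens
chunks = chunk_with_overlap(doc)
# The last 80 tokens of chunk N reappear as the first 80 of chunk N+1.
```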
Chunk Output:
```json
{
  "chunk_id": "DOC_2024_chunk_15",
  "type": "text",
  "text": "Section 1245 property includes tangible personal property...",
  "page_span": [12, 12],
  "element_ids": ["DOC_2024_p12_e3", "DOC_2024_p12_e4"],
  "section_path": ["How To Depreciate Property", "Section 1245"],
  "token_count": 387
}
```

File: build_bm25.py
Builds lexical index with custom tokenization for exact code matching.
```
Chunks
   │
   ▼
Custom Tokenizer (not whitespace!)
  "§1245 property"
       ▼
  ["§1245", "1245", "property"]
   │
   ▼
BM25Okapi Index (bm25.pkl)
```
Why custom tokenization matters:
Standard tokenizers break regulatory codes:
| Standard Tokenizer | Custom Tokenizer |
|---|---|
| `["§", "1245"]` ❌ | `["§1245", "1245"]` ✅ |
| `["168", "(", "e", ")"]` ❌ | `["168(e)(3)", "168"]` ✅ |
| `["57", ".", "0"]` ❌ | `["57.0", "57"]` ✅ |
Tokenizer patterns (tokenizers.py):
| Pattern | Example | Tokens Generated |
|---|---|---|
| Section symbols | `§1245` | `["§1245", "1245"]` |
| Parenthetical refs | `168(e)(3)(B)` | `["168(e)(3)(b)", "168"]` |
| Decimal codes | `57.0`, `00.11` | `["57.0", "57"]` |
| Mixed references | `Section 179(d)` | `["section", "179(d)", "179"]` |
```python
>>> irs_tokenize("§1245 property depreciation")
['§1245', '1245', 'property', 'depreciation']

>>> irs_tokenize("Asset class 57.0 under Section 168(e)(3)")
['asset', 'class', '57.0', '57', 'section', '168(e)(3)', '168']
```

This ensures queries for "1245" match documents containing "§1245" or "Section 1245". The same pattern applies to any domain with structured identifiers (hazard IDs, ASIL levels, requirement codes).
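A minimal re-implementation of this idea, for illustration. The regex and expansion rule are assumptions; the real `tokenizers.py` may differ (e.g. it also drops stopwords such as "under"):

```python
import re

# One alternation per identifier shape, falling back to plain words.
CODE_PATTERN = re.compile(
    r"§\d+[a-z()0-9]*"            # section symbols: §1245
    r"|\d+\(\w+\)(?:\(\w+\))*"    # parenthetical refs: 168(e)(3)(b)
    r"|\d+\.\d+"                  # decimal codes: 57.0
    r"|\w+"                       # plain words
)

def irs_tokenize(text):
    tokens = []
    for tok in CODE_PATTERN.findall(text.lower()):
        tokens.append(tok)
        # Also emit the bare leading number so a query for "1245"
        # matches documents containing "§1245" or "168(e)(3)".
        m = re.match(r"§?(\d+)", tok)
        if m and m.group(1) != tok:
            tokens.append(m.group(1))
    return tokens
```

Emitting both the full code and its bare number is what makes lexical search robust to how the user types the reference.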
File: build_faiss.py
Builds vector index for semantic similarity.
```
Chunks
   │
   ▼
Sentence Transformer
  Model: all-MiniLM-L6-v2
  Dimensions: 384
   │
   ▼
FAISS Index
  • L2 distance metric
  • Metadata mapping (chunk_id)
   │
   ▼
faiss.idx + metadata.json
```
When to use semantic search:
- Conceptual queries: "What qualifies for accelerated depreciation?"
- Paraphrased questions: "equipment that wears out quickly"
- Related concepts: "tangible personal property" → finds "Section 1245"
After ingestion, each document produces:
```
data/{corpus}/{doc_id}/
├── layout/
│   └── elements.json     # Raw parsed elements with position
├── structured/
│   └── tables.json       # Complete tables (never chunked)
├── retrieval/
│   └── chunks.json       # Text chunks with overlap + provenance
└── indexes/
    ├── bm25/
    │   └── index.pkl     # Lexical search index
    └── vector/
        ├── faiss.idx     # Semantic search index
        └── metadata.json # Chunk ID mapping
```
Location: backEnd/evidence_layer/src/retrieval.py
Combines lexical and semantic search with score normalization.
Retrieval Flow:
```
Query
  │
  ├──▶ BM25 Search (exact codes) ──▶ Normalized Scores ──┐
  │                                                      ├──▶ Score Fusion ──▶ Deduplicate ──▶ Results
  └──▶ Vector Search (semantic)  ──▶ Normalized Scores ──┘
```
API:
```python
# BM25 for exact codes/IDs
results = bm25_search("IRS_PUB946_2024", "1245", top_k=5)

# Vector for semantic queries
results = vector_search("IRS_PUB946_2024", "equipment depreciation", top_k=5)

# Hybrid (recommended) - configurable BM25 weight
results = hybrid_search("IRS_PUB946_2024", "tangible personal property", top_k=5, bm25_weight=0.5)
```

Key Features:
- Automatic score normalization before fusion
- Deduplication of overlapping results
- Table expansion: when surrogate chunks match, full table returned
- Supports both "reference" corpus (shared docs) and "study" corpus (per-case docs)
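The fusion step can be sketched as follows. This is an illustrative reconstruction of normalize-fuse-dedupe (function names and min-max normalization are assumptions; `retrieval.py` may use a different scheme):

```python
def normalize(hits):
    """Min-max normalize (chunk_id, score) pairs to [0, 1] so BM25 and
    FAISS scores become comparable before fusion."""
    if not hits:
        return []
    scores = [s for _, s in hits]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0
    return [(cid, (s - lo) / span) for cid, s in hits]

def fuse(bm25_hits, vector_hits, bm25_weight=0.5, top_k=5):
    """Weighted score fusion; keying by chunk_id deduplicates for free."""
    fused = {}
    for cid, s in normalize(bm25_hits):
        fused[cid] = fused.get(cid, 0.0) + bm25_weight * s
    for cid, s in normalize(vector_hits):
        fused[cid] = fused.get(cid, 0.0) + (1 - bm25_weight) * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

A chunk that scores well in both searches accumulates weight from each, which is exactly the behavior you want for queries like "tangible personal property" that are both lexical and semantic.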
Location: backEnd/agentic/
Agentic RAG solves a critical problem: context window saturation.
When documents are large or interrelated, naive RAG retrieves too much context, saturating the LLM's context window and degrading response quality. The solution is agent-based selective retrieval: the agent plans what evidence is needed, retrieves selectively, and verifies sufficiency before generating.
```
AGENTIC RAG vs NAIVE RAG

NAIVE RAG:                     AGENTIC RAG:
  Query                          Query
    │                              │
    ▼                              ▼
  Retrieve top-k                 Agent plans what to get  ◀── "What evidence do I need?"
    │ (may retrieve too            │
    │  much or wrong docs)         ▼
    ▼                            Tool Router              ◀── BM25 vs Vector vs Structured
  Generate (hope it fits)          │
    │                              ▼
    ▼                            Selective Retrieval      ◀── Only what's needed
  Context saturation               │
                                   ▼
                                 Verify Sufficiency       ◀── "Is this enough?"
                                   │                          If not, retrieve more
                                   ▼
                                 Generate with citations

                                 ✓ Selective retrieval
                                 ✓ Fits context window
                                 ✓ Grounded in evidence
```
Mosi's observation: Safety documents have standardized headers but messy context sections. When you retrieve naively, you pull in entire documents or too many chunks, saturating the context window.
The agentic solution:
- Agent plans first: before retrieving, the agent analyzes the query and decides what evidence is needed
- Tool routing: the agent chooses the right retrieval method (BM25 for exact IDs, vector for concepts, structured for tables)
- Selective retrieval: pulls only what is necessary, not top-k everything
- Verification loop: checks whether the evidence is sufficient; if not, retrieves more targeted chunks
- Grounded generation: claims only what the evidence supports
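The plan-retrieve-verify-generate loop can be sketched as a few lines of control flow. Everything here is illustrative: `search` is a stub standing in for the hybrid retrieval API, and the loop is a simplification of the real LangGraph agent.

```python
def search(query, top_k=3):
    """Stub retrieval: returns canned chunks for known queries."""
    corpus = {
        "flooring depreciation": ["chunk: floor coverings are 1245 property"],
        "1245": ["chunk: Section 1245 includes tangible personal property"],
    }
    return corpus.get(query, [])[:top_k]

def answer(question, planned_queries, max_rounds=2):
    evidence = []
    for _ in range(max_rounds):
        for q in planned_queries:            # selective retrieval: only planned queries
            evidence.extend(search(q))
        if evidence:                         # verify sufficiency
            break
        planned_queries = [question]         # insufficient: retry with broader query
    if not evidence:
        # "No evidence, no claim": refuse rather than guess
        return {"needs_review": True, "reason": "insufficient_evidence"}
    # Grounded generation: every claim carries its retrieved citations
    return {"claim": "Section 1245 property (5-year)", "citations": evidence}
```

The important invariant is the final branch: when retrieval comes back empty, the agent flags for review instead of generating from training data.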
The workflow has been optimized to have exactly 3 engineer checkpoints matching the frontend UI:
```
SIMPLIFIED WORKFLOW (3 PAUSE POINTS)

load_study
   │
   ▼
analyze_rooms_node
  1. Vision Analysis (PARALLEL, 10 concurrent)
     All images analyzed simultaneously with GPT-4o Vision
  2. Room Enrichment (PARALLEL, 10 concurrent)
     All rooms enriched with IRS context simultaneously
   │
   ▼
⏸️ PAUSE #1: resource_extraction
  Engineer reviews: appraisal data + detected rooms
   │ (engineer approves)
   ▼
⏸️ PAUSE #2: reviewing_rooms
  Engineer reviews: room classifications + IRS context
   │ (engineer approves)
   ▼
process_assets_node
  1. Object Enrichment (PARALLEL, 20 concurrent)
     All objects enriched with IRS context simultaneously
  2. Takeoffs + Classification (CROSS-PHASE PARALLEL)
     Takeoff Calc (×10, RSMeans lookup) and IRS Classify (×20, asset
     classes) run simultaneously via asyncio.gather()
  3. Cost Estimation (PARALLEL, 10 concurrent)
     All costs estimated simultaneously with RSMeans
   │
   ▼
⏸️ PAUSE #3: engineering_takeoff
  Engineer reviews: objects, takeoffs, classifications, costs
  (Tabbed UI showing all asset data with citations)
   │ (engineer approves)
   ▼
completed
```
Frontend WorkflowStatus values:
uploading_documents → analyzing_rooms → resource_extraction → reviewing_rooms → engineering_takeoff → completed
Key Design: Only 3 engineer checkpoints (not 5-6), matching the frontend UI. The process_assets_node combines objects, takeoffs, classification, and costs into a single processing phase with no pauses in between; engineers review all asset data together on one page.
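The bounded-parallelism pattern used in these nodes (e.g. "10 concurrent" room enrichment) can be sketched with a semaphore around `asyncio.gather`. The function names and the `asyncio.sleep` stand-in for a GPT-4o call are assumptions for illustration:

```python
import asyncio

async def enrich_room(room, sem):
    """Enrich one room; the semaphore caps how many run at once."""
    async with sem:                    # at most `limit` calls in flight
        await asyncio.sleep(0)         # stand-in for an LLM/API call
        return {"room": room, "enriched": True}

async def enrich_all(rooms, limit=10):
    sem = asyncio.Semaphore(limit)
    # gather launches every task; the semaphore enforces the concurrency cap
    return await asyncio.gather(*(enrich_room(r, sem) for r in rooms))

results = asyncio.run(enrich_all([f"room_{i}" for i in range(25)]))
```

This keeps throughput high without exceeding API rate limits, and `gather` preserves input order in the results.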
Each agent follows a plan → retrieve → verify → generate pattern:
```
AGENT EXECUTION FLOW

Input: Component to classify (e.g., "hardwood flooring in living room")

STEP 1: PLAN
  Agent thinks: "I need to find:
    1. IRS classification for flooring
    2. Whether hardwood is personal or real property
    3. Applicable recovery period"
   │
   ▼
STEP 2: TOOL ROUTING
  Agent decides:
    • "flooring" → vector_search (semantic concept)
    • "1245 vs 1250" → bm25_search (exact IRS sections)
    • "recovery period table" → get_table (structured data)
   │
   ▼
STEP 3: SELECTIVE RETRIEVAL
  Agent calls tools:
    → hybrid_search("flooring depreciation residential")
    → bm25_search("1245")
    → get_table("MACRS_recovery_periods")
  Returns: 3 relevant chunks + 1 table (not 50 chunks)
   │
   ▼
STEP 4: VERIFY SUFFICIENCY
  Agent checks: "Do I have enough evidence to classify?"
    • If YES → proceed to generation
    • If NO → retrieve more specific chunks
    • If AMBIGUOUS → flag needs_review=true
   │
   ▼
STEP 5: GROUNDED GENERATION
  Agent generates classification WITH citations:
    "Hardwood flooring is Section 1245 property (5-year recovery)
     per IRS Pub 946, page 42, because..."
  "No evidence, no claim": won't classify without a source
```
| Agent | Purpose | Tools Used | Evidence Source |
|---|---|---|---|
| Room Agent | Enriches vision outputs with space context | `hybrid_search` | Domain guidelines |
| Asset Agent | MACRS classification with IRS citations | `bm25_search`, `vector_search`, `get_table` | IRS Pub 946, ATG |
| Takeoff Agent | Measurement extraction with confidence | `hybrid_search`, `get_chunk` | Property appraisals |
| Cost Agent | RSMeans cost code mapping | `hybrid_search`, `get_table` | RSMeans databases |
The Asset Agent system prompt explicitly requires evidence before classification:
```
CRITICAL INSTRUCTION:
You MUST search for evidence before making any classification.
- Call hybrid_search() or bm25_search() BEFORE outputting a classification
- Every classification MUST include citation_refs with chunk_ids
- If you cannot find supporting documentation, output:
    needs_review: true
    reason: "insufficient_evidence"
- NEVER guess or rely on training data. Only cite retrieved documents.
```
Every agent produces structured output with provenance:
```json
{
  "asset_classification": {
    "bucket": "5-year",
    "life_years": 5,
    "section": "1245",
    "asset_class": "57.0",
    "macrs_system": "GDS",
    "irs_note": "Carpeting in residential rental property is Section 1245 property..."
  },
  "citations": [
    {"chunk_id": "IRS_PUB946_2024_chunk_42", "page": 15, "text": "Section 1245 property includes..."},
    {"chunk_id": "IRS_ATG_2024_chunk_88", "page": 34, "text": "Floor coverings are typically..."}
  ],
  "confidence": 0.92,
  "needs_review": false,
  "reasoning": "Found explicit IRS guidance classifying floor coverings as 1245 property..."
}
```

Persistent State:
- Production: `FirestoreCheckpointer` (workflow state survives server restarts)
- Development: `MemorySaver` (in-memory for fast iteration)
- Thread-based resumption for long-running workflows
LangSmith Integration:
Every agent execution is traced in LangSmith:
- Tool calls with inputs/outputs
- LLM prompts and completions
- Latency and token usage
- Error tracking
```
LANGSMITH TRACE

Asset Agent Run
├── hybrid_search("flooring depreciation") → 3 chunks
├── bm25_search("1245") → 2 chunks
├── get_table("MACRS_periods") → 1 table
├── LLM: classify with evidence
└── Output: { bucket: "5-year", citations: [...] }

Total tokens: 2,847 | Latency: 3.2s | Status: Success
```
LangSmith Dashboard:
Location: backEnd/agentic/agents/appraisal/
The appraisal extraction system uses a 3-agent LangGraph StateGraph for intelligent extraction with self-correction:
```
APPRAISAL EXTRACTION LANGGRAPH

EXTRACTOR AGENT
  "Extract intelligently using available tools"
   │
   ▼
VERIFIER AGENT
  "Be skeptical. Find errors. Question everything."
   │
   ├── all_good ──────────▶ [END]
   ├── max_iterations ────▶ [END]
   └── needs_correction
         │
         ▼
       CORRECTOR AGENT
         "Fix using a DIFFERENT method"
         │
         └──▶ back to VERIFIER (max 2 iterations)
```
Why Multi-Agent? (Agentic Tool Use)
Unlike the Agentic RAG pattern used by AssetAgent and RoomAgent (which focus on retrieval), appraisal extraction uses Agentic Tool Use: a multi-agent system where agents reason about which extraction tools to invoke:
| Agent | Role | Tools |
|---|---|---|
| ExtractorAgent | "Extract appraisal data intelligently" | parse_mismo_xml (FREE), extract_with_azure_di (PAID), extract_with_vision (EXPENSIVE) |
| VerifierAgent | "Be skeptical. Find errors. Question everything." | validate_extraction (FREE), vision_recheck_field (PAID) |
| CorrectorAgent | "Fix flagged errors using DIFFERENT method" | Same as Extractor, but MUST use different tool than original |
Cost-Aware Tool Selection:
```
Extraction Strategy (minimize cost):

1. MISMO XML Parser (FREE)                   ◀── if XML uploaded
   │ (if unavailable)
   ▼
2. Azure Document Intelligence ($0.10-0.50)
   │ (for stubborn fields with confidence < 0.85)
   ▼
3. GPT-4o Vision Fallback ($0.10-0.20)
```
Verification Checks:
- Plausibility: year_built 1800-2026, GLA 500-15000 sq ft
- OCR errors: 0↔O and 1↔I confusion, digit transposition detection
- Consistency: GLA vs bedrooms, contract vs appraised value
- Confidence: Critical fields < 0.90 flagged for review
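A minimal sketch of the plausibility and OCR checks listed above. The function name, substitution table, and flag strings are illustrative; the real verifier covers more fields and checks:

```python
# Common OCR letter-for-digit swaps seen in scanned appraisals.
OCR_SUBS = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1"})

def check_year_built(raw):
    """Return (value, flags): repair common OCR swaps, then range-check."""
    flags = []
    cleaned = str(raw).translate(OCR_SUBS)
    if cleaned != str(raw):
        flags.append("ocr_substitution")      # e.g. "I995" -> "1995"
    try:
        year = int(cleaned)
    except ValueError:
        return None, flags + ["unparseable"]
    if not 1800 <= year <= 2026:              # plausibility window from above
        flags.append("implausible_year")
    return year, flags
```

Flags feed the audit trail rather than silently overwriting values, so an engineer can see exactly why a field was corrected or held for review.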
Audit Trail (IRS Defensibility):
Every extraction produces a complete audit trail for compliance:
```json
{
  "study_id": "STUDY_001",
  "iterations": 1,
  "final_confidence": 0.92,
  "agent_calls": [
    {"agent_name": "ExtractorAgent", "tools_used": ["extract_with_azure_di"]},
    {"agent_name": "VerifierAgent", "tools_used": ["validate_extraction"]}
  ],
  "field_history": [
    {"field_key": "improvements.year_built", "action": "extracted", "value": "I995", "source": "azure_di"},
    {"field_key": "improvements.year_built", "action": "flagged", "notes": "OCR error: 'I' vs '1'"},
    {"field_key": "improvements.year_built", "action": "corrected", "value": 1995, "source": "vision_recheck"}
  ]
}
```

Location: backEnd/vision_layer/
The vision layer processes property images using a detection-first approach that reduces VLM hallucinations.
```
VISION PIPELINE: DETECTION FIRST

STAGE 1       STAGE 2       STAGE 3       STAGE 4       STAGE 5
Detect   ──▶  Segment  ──▶  Crop     ──▶  Classify ──▶  Verify
Objects       Regions       Regions       VLM           Grounding
  │             │             │             │             │
  ▼             ▼             ▼             ▼             ▼
Grounding     SAM 2         Cropped       Material      Validated
DINO 1.5      Masks         Images        Attrs         Artifacts
```
The Problem: VLMs (Vision Language Models) hallucinate when given full images. They "see" objects that aren't there or misclassify materials.
The Solution: Detect objects first, then classify only the cropped regions.
VLM-ONLY vs DETECTION-FIRST

VLM-only (hallucination-prone):

  Full image → VLM describes the whole scene at once:
  "I see a marble countertop, granite floor, stainless steel appliances..."
  The description may be wrong, and there is no provenance.

Detection-first (grounded):

  Full image → detect objects A and B with bounding boxes
  → crop region A and send only that crop to the VLM
  → VLM classifies only the crop: "wood_veneer, built_in, good_condition" (focused classification)
  → verify: does the VLM output match the detection label? (grounding verification)

✗ VLM-only: hallucinates objects, no provenance
✓ Detection-first: grounded in detections, full audit trail
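The grounded path can be sketched end to end as a small driver. `detect_fn`, `crop_fn`, and `classify_fn` are stand-ins for the Grounding DINO, cropper, and VLM stages described below; the dataclasses and the matching rule are illustrative, not the production schema.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Detection:
    label: str
    bbox: tuple        # (x1, y1, x2, y2) in pixels
    confidence: float

@dataclass
class VisionRecord:
    detection: Detection
    classification: dict
    verified: bool
    needs_review: bool

def run_detection_first(
    image,
    detect_fn: Callable,    # Grounding DINO stand-in: image -> [Detection]
    crop_fn: Callable,      # cropper stand-in: (image, bbox) -> crop
    classify_fn: Callable,  # VLM stand-in: crop -> dict with an "object" field
) -> List[VisionRecord]:
    """Detect first, classify only the crops, then verify grounding."""
    records = []
    for det in detect_fn(image):
        crop = crop_fn(image, det.bbox)
        cls = classify_fn(crop)
        # Grounding check: the VLM's object label must agree with the detector's.
        verified = det.label.lower() in str(cls.get("object", "")).lower()
        records.append(VisionRecord(det, cls, verified, needs_review=not verified))
    return records
```

Anything the VLM claims that the detector never saw fails the grounding check and is routed to engineer review instead of the report.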
File: api_clients/grounding_dino.py
Open-vocabulary object detection via Replicate API.
Property Image
    ↓
Grounding DINO 1.5 Pro
    Prompt: "cabinet, countertop, flooring, appliance, lighting fixture..."
    Confidence threshold: 0.3
    ↓
Detections:
[
  { label: "cabinet",    bbox: [100, 200, 400, 500], confidence: 0.92 },
  { label: "countertop", bbox: [150, 50, 600, 200],  confidence: 0.87 },
  ...
]
Key Features:
- Open-vocabulary: detects any object described in prompt
- Returns bounding boxes with confidence scores
- Retry logic with exponential backoff
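The retry behavior can be sketched as a generic wrapper with exponential backoff and jitter. The attempt count, base delay, and `with_retries` name are illustrative, not the actual client defaults.

```python
import random
import time

def with_retries(fn, max_attempts: int = 4, base_delay: float = 1.0):
    """Retry a flaky API call with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Delays grow 1s, 2s, 4s, ...; jitter avoids synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
```

A transient Replicate timeout then resolves on a later attempt without the caller noticing; only persistent failures propagate.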
File: api_clients/sam2.py
Precise segmentation masks for detected regions.
Detection bboxes
    ↓
SAM 2 (Segment Anything Model 2)
    Input: bbox coordinates
    Output: precise polygon mask
    ↓
Refined masks with exact boundaries
Purpose: Refines rough bounding boxes into precise object boundaries. Optional stage; can be skipped for speed.
File: pipeline/cropper.py
Extracts and pads regions for VLM classification.
Detection + Mask
    ↓
Region Cropper
    • Crop around bbox
    • Add 20% padding for context
    • Save crop for audit trail
    ↓
Cropped image (just the object + context)
Why crop?
- VLM focuses on single object, not entire scene
- Reduces hallucination from other objects in image
- Smaller image = faster inference
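The 20% padding step is pure bbox arithmetic, which can be sketched as below; `pad_bbox` is an illustrative helper, not the actual `pipeline/cropper.py` code.

```python
def pad_bbox(bbox, img_w: int, img_h: int, padding: float = 0.2) -> tuple:
    """Expand a bbox by `padding` (as a fraction of its own width/height)
    to give the VLM surrounding context, clamped to the image bounds."""
    x1, y1, x2, y2 = bbox
    pad_x = (x2 - x1) * padding
    pad_y = (y2 - y1) * padding
    return (
        max(0, int(x1 - pad_x)),
        max(0, int(y1 - pad_y)),
        min(img_w, int(x2 + pad_x)),
        min(img_h, int(y2 + pad_y)),
    )
```

The padded box is then passed to any image library's crop call; clamping keeps crops valid for objects near the frame edge.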
File: api_clients/vlm.py
Material and attribute classification on cropped regions.
Cropped Image
    ↓
GPT-4o Vision
    Prompt: "Classify this object:
      - material (wood, metal, etc.)
      - condition (good/fair/poor)
      - attachment (built-in/portable)
      - dimensions if visible"
    Output: structured JSON
    ↓
{
  "material": "wood_veneer",
  "condition": "good",
  "attachment_type": "built_in",
  "color": "natural_oak",
  "estimated_dimensions": "36in x 24in"
}
LLM Provider:
- Azure OpenAI (primary - enterprise deployment)
- GPT-4.1: Best results for complex reasoning and classification
- GPT-4.1 nano: Most efficient for high-volume tasks
- This combination provides a strong cost/performance ratio
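Sending a crop to the VLM amounts to building a multimodal chat message pairing the classification prompt with a base64-encoded image, in the standard OpenAI chat-completions image format. The helper name and prompt wording below are illustrative, following the stage description above.

```python
import base64

CLASSIFY_PROMPT = (
    "Classify this object:\n"
    "- material (wood, metal, etc.)\n"
    "- condition (good/fair/poor)\n"
    "- attachment (built-in/portable)\n"
    "- dimensions if visible\n"
    "Respond with structured JSON."
)

def build_classify_messages(crop_bytes: bytes) -> list:
    """Build an OpenAI-style chat payload pairing the prompt with the cropped image."""
    data_url = "data:image/jpeg;base64," + base64.b64encode(crop_bytes).decode()
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": CLASSIFY_PROMPT},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }]
```

The resulting list is what a `chat.completions.create(...)` call would take as `messages`; because only the crop is embedded, the model never sees the rest of the scene.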
Cross-reference VLM claims against detection labels.
VLM Output + Detection Label
    ↓
Grounding Verifier
    Detection label: "cabinet"
    VLM classification: "wood_veneer cabinet"
    Match? ✓ YES → verified: true
    If mismatch → needs_review: true, reason: "grounding_mismatch"
Purpose: Catches VLM hallucinations where it classifies an object as something the detector didn't see.
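The match rule can be sketched as a normalized substring check; the production verifier may additionally use IoU thresholds and synonym handling, which are assumptions on our part.

```python
def verify_grounding(detection_label: str, vlm_label: str) -> dict:
    """Cross-check the VLM's classification against the detector's label.

    A simple case-insensitive containment test: "cabinet" grounds
    "wood_veneer cabinet", but not "granite countertop".
    """
    det = detection_label.lower().strip()
    vlm = vlm_label.lower().strip()
    if det in vlm or vlm in det:
        return {"verified": True, "needs_review": False}
    return {"verified": False, "needs_review": True, "reason": "grounding_mismatch"}
```

Mismatches are never silently dropped; the `needs_review` flag routes them to the engineer checkpoint.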
Every processed object produces a complete artifact with provenance:
{
  "artifact_id": "va_abc123",
  "image_id": "photo_456",
  "detection": {
    "label": "cabinet",
    "confidence": 0.92,
    "bbox": {"x1": 100, "y1": 200, "x2": 400, "y2": 500},
    "model": "grounding_dino_1.5_pro"
  },
  "segmentation": {
    "mask_path": "masks/va_abc123.png",
    "model": "sam2"
  },
  "crop": {
    "crop_path": "crops/va_abc123.jpg",
    "padding": 0.2
  },
  "classification": {
    "material": "wood_veneer",
    "condition": "good",
    "attachment_type": "built_in",
    "cost_seg_relevant": true,
    "model": "gpt-4.1"
  },
  "provenance": {
    "detection_model": "grounding_dino_1.5_pro",
    "segmentation_model": "sam2",
    "vlm_model": "gpt-4.1",
    "verified": true,
    "grounding_match": true
  },
  "confidence": 0.89,
  "needs_review": false
}

File: pipeline/ingest.py
Concurrent processing with configurable parallelism:
import asyncio
from typing import List

class VisionPipeline:
    async def process_batch(
        self,
        images: List[str],
        max_concurrent: int = 5,
    ) -> List[VisionArtifact]:
        """
        Process multiple images concurrently.
        Uses a semaphore to limit parallel API calls.
        """
        semaphore = asyncio.Semaphore(max_concurrent)
        tasks = [self._process_single(img, semaphore) for img in images]
        return await asyncio.gather(*tasks)

Agents access evidence through standardized MCP tools:
Retrieval Tools:

| Tool | Purpose |
|---|---|
| `bm25_search` | Exact token matching (codes, IDs, standard references) |
| `vector_search` | Semantic similarity for conceptual queries |
| `hybrid_search` | Combined search with score fusion |
| `get_chunk` | Fetch chunk by ID with full provenance |
| `get_table` | Fetch structured table (never chunked) |
Tool Implementation (example):

@tool
def hybrid_search(
    doc_id: str,
    query: str,
    top_k: int = 5,
    bm25_weight: float = 0.5,
) -> List[SearchResult]:
    """
    Combined BM25 + vector search with score normalization.
    Returns chunks with provenance (page_span, section_path, element_ids).
    """
    bm25_results = bm25_search(doc_id, query, top_k * 2)
    vector_results = vector_search(doc_id, query, top_k * 2)
    # Normalize and fuse scores
    fused = fuse_scores(bm25_results, vector_results, bm25_weight)
    # Deduplicate and expand table surrogates into full tables
    return dedupe_and_expand(fused, top_k)

| Component | Technology |
|---|---|
| Framework | FastAPI |
| Workflow Orchestration | LangGraph 0.2+ |
| LLM | Azure OpenAI |
| Document Intelligence | Azure Document Intelligence (appraisal extraction) |
| PDF Parsing | pdfplumber, PyMuPDF |
| Vector Store | FAISS |
| Lexical Search | rank-bm25 |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) |
| Observability | LangSmith (tracing) |
| Database | Firebase Firestore |
| Storage | Firebase Storage, GCS |
| Component | Technology |
|---|---|
| Framework | Next.js 14 (App Router) |
| Language | TypeScript |
| Styling | TailwindCSS |
| Auth/DB | Firebase |
- Hosting: Firebase App Hosting, Google Cloud Run
- Containers: Docker
- State Persistence: Firestore checkpointer for workflow state
Every module follows the same contract:

1. Frontend triggers the module with `{ studyId }`
2. Backend fetches the required data from Firestore/Storage
3. Backend runs AI/ML
4. Backend writes results back to Firestore
5. Frontend renders results
6. Engineer reviews + corrects
7. Engineer manually advances to the next stage
This is the core design principle that keeps deliverables defensible.
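The contract can be sketched as a generic driver where `fetch`, `analyze`, and `persist` are injected stand-ins for the Firestore/Storage reads, the AI/ML stage, and the write-back; the function name and return shape are illustrative.

```python
from typing import Callable

def run_module(
    study_id: str,
    fetch: Callable,    # Firestore/Storage reads for this study
    analyze: Callable,  # the AI/ML stage for this module
    persist: Callable,  # write results back for the frontend to render
) -> dict:
    """Generic module contract: fetch -> run AI -> persist -> await engineer review."""
    inputs = fetch(study_id)
    results = analyze(inputs)
    persist(study_id, results)
    # The engineer must review and manually advance before the next stage runs.
    return {"studyId": study_id, "status": "awaiting_review"}
```

Every module returning `awaiting_review` rather than auto-advancing is what makes the stage gates enforceable in code rather than by convention.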
1. Create New Study
   - Engineer enters property name
   - Selects files to upload (photos, PDFs, appraisals)
   - Clicks Start Analysis
2. Upload Documents
   - Files upload to Firebase Storage
   - Progress tracked in UI
3. Appraisal Processing
   - Ingest PDF using the same pipeline as IRS docs (parse → chunk → index)
   - Extract tables with structure preserved (headers, rows, page)
   - Tiered extraction with confidence scoring:
     - Tier 1: MISMO XML (if uploaded) - 100% accurate
     - Tier 2: Azure Document Intelligence - 70-95% confidence
     - Tier 3: GPT-4o Vision fallback - 60-90% confidence
     - Tier 4: Regex fallback - 50-80% confidence
   - Map URAR tables to frontend sections (subject, neighborhood, site, improvements, etc.)
   - Create property constraints (GLA, bedrooms, room counts, etc.)
   - Auto-flag fields with `needs_review: true` when confidence < 0.90
   - ⏸️ Engineer reviews + corrects
4. Room Classification
   - Scene + material + object context
   - Groups photos into predicted rooms
   - ⏸️ Engineer reviews + corrects
5. Object Classification
   - Detects components from photos
   - Enriches with room context + metadata
   - ⏸️ Engineer reviews + corrects
6. Engineering Takeoffs
   - Calculates measurements
   - ⏸️ Engineer reviews + corrects
7. Asset Classification
   - IRS-grounded classification
   - ⏸️ Engineer reviews + corrects
8. Cost Classification
   - Maps components to integrated cost databases
   - ⏸️ Engineer reviews + corrects
9. Complete Study
   - Export package generated for firm templates
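The auto-flagging rule from the appraisal step can be sketched as a simple confidence gate; the field shape and the `apply_confidence_gate` name are illustrative, not the production code.

```python
def apply_confidence_gate(fields: dict, threshold: float = 0.90) -> dict:
    """Flag extracted appraisal fields for engineer review when the
    extraction tier's confidence falls below the threshold."""
    return {
        name: {**field, "needs_review": field["confidence"] < threshold}
        for name, field in fields.items()
    }
```

A Tier 2 extraction at 0.72 confidence gets `needs_review: true` and surfaces in the review UI, while a 0.95 extraction passes through untouched.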
The architecture is currently deployed for cost segregation, accelerating tax depreciation analysis for commercial real estate.
Domain-Specific Implementation:
- Reference Corpus: IRS Pub 946, Pub 527, Cost Seg ATG, Rev Proc 87-56, RSMeans databases
- Exact-Match Queries: asset class codes (e.g., `57.0`), IRS sections (e.g., `§1245`)
- Semantic Queries: "What property qualifies for 5-year depreciation?"
- Traceability: every classification cites specific IRS publication pages
- Vision Processing: detection-first pipeline for property photos (see Vision Layer)
This isn't a proof of concept; it's a deployed product with paying customers.
Customers:
- CSSI (top-5 cost segregation firm): paying user
- CBIZ: paying user
- Design partners at multiple top-5 firms have validated 50%+ time savings on analysis workflows
Awards:
The Basis architecture directly addresses the document intelligence challenges in ISO 26262 workflows.
The Problem:
Functional safety teams work with large document sets (HARA baselines, safety goals, TSRs, verification evidence) that share a common structure:
- Standardized headers (hazard IDs, ASIL classifications, requirement codes)
- Messy context sections (rationale, assumptions, linked evidence)
- Strict traceability requirements (every claim must cite source documents)
Querying these documents with traditional RAG fails: vector search hallucinates on exact IDs, keyword search misses semantic relationships, and LLMs can't process hundreds of pages in context.
Architecture Mapping:
| Basis Component | Functional Safety Application |
|---|---|
| Custom BM25 Tokenization | Preserve HAZ-001, TSR-042, ASIL-D, ISO 26262-6:2018 §7.4.3 as atomic tokens |
| Tables Never Chunked | FMEA tables, DFA matrices, traceability matrices stay intact |
| 80-Token Overlap | Safety goal rationale spanning paragraphs isn't split |
| Hybrid Search | Exact ID lookup + semantic "what evidence supports this safety goal?" |
| Surrogate → Full Table | Search hits "FMEA row for HAZ-001" → returns complete FMEA with all columns |
| Citation Enforcement | "No evidence, no claim" → every classification cites specific document + page |
| Human-in-the-Loop | Engineer reviews before any safety decision is finalized |
Example Queries This Architecture Handles:
Exact ID lookup (BM25):
"TSR-042" → finds all chunks referencing TSR-042
Semantic search (FAISS):
"verification evidence for braking system hazards" → finds related test reports
Hybrid (recommended):
"ASIL-D requirements for sensor fusion" → exact ASIL match + semantic relevance
Table fetch:
Search hits FMEA surrogate → get_table() returns full FMEA with hazard, severity, exposure, controllability
Tokenizer Adaptation:
The custom tokenizer pattern extends directly to safety document codes:
| IRS Pattern | Safety Pattern | Tokenizer Handles |
|---|---|---|
| `§1245` | `HAZ-001` | Prefix + number preserved |
| `168(e)(3)` | `ISO 26262-6:2018 §7.4.3` | Nested references preserved |
| `57.0` | `ASIL-D` | Alphanumeric codes preserved |
| `Rev Proc 87-56` | `TSR-042-REV-A` | Multi-part identifiers preserved |
What Would Change for Safety Documents:
- Tokenizer regex → add patterns for `HAZ-\d+`, `TSR-\d+`, `ASIL-[A-D]`, ISO clause refs
- Reference corpus → ingest ISO 26262 parts, internal HARA baselines, verification templates
- Agent prompts → swap IRS classification logic for safety goal verification logic
- Structured store → FMEA tables, DFA matrices instead of depreciation tables
The pipeline, retrieval, and agentic architecture remain identical.
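The tokenizer adaptation can be sketched as a BM25 tokenizer whose regex alternation tries domain-identifier patterns before falling back to generic word tokens, so codes survive as atomic tokens. The exact patterns below are illustrative, not the production tokenizer.

```python
import re

# Domain identifiers that must survive tokenization as single tokens.
# Order matters: specific identifier patterns come before the generic \w+ fallback.
ID_PATTERNS = [
    r"HAZ-\d+",                  # hazard IDs
    r"TSR-\d+(?:-REV-[A-Z])?",   # technical safety requirements, incl. revisions
    r"ASIL-[A-D]",               # ASIL classifications
    r"§\d+",                     # IRS section refs like §1245
    r"\d+\(\w\)\(\d+\)",         # nested refs like 168(e)(3)
    r"\d+\.\d+",                 # asset class codes like 57.0
]
TOKEN_RE = re.compile("|".join(ID_PATTERNS) + r"|\w+")

def tokenize(text: str) -> list:
    """Tokenize for BM25 indexing, preserving domain codes as atomic tokens."""
    return TOKEN_RE.findall(text)
```

Naive whitespace or word splitting would shatter `TSR-042` into `TSR` and `042`, destroying exact-match retrieval; keeping the code atomic is what makes the BM25 leg of hybrid search reliable for ID lookups.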
Basis is designed for engineering-grade output, not generic AI chat.
We ensure accuracy through:
- Detection-first vision processing: Grounding DINO detects objects before the VLM classifies, reducing hallucinations
- Evidence-backed reasoning: every agent output cites documents with chunk IDs and page numbers
- Grounding verification: VLM claims are cross-referenced against detections using IoU thresholds
- Human-in-the-loop checkpoints: engineers review and approve at every workflow stage
- Confidence scoring + needs_review flags: uncertain outputs are flagged for engineer attention
- Full provenance tracking: every artifact traces back to source image, detection, and model
- "No evidence, no claim" enforcement: agents cannot classify without citing retrieved evidence
- Customer artifacts are stored encrypted in Firebase Storage.
- Study data is stored in Firestore with role-based access.
- Vision pipelines can be isolated for sensitive drawings and photos.
- Enterprise LLM APIs are used so customer data is not retained for model training.
Cost segregation is not a single "upload a PDF" problem.
Engineers often work with hundreds of photos and mixed documents per study, with strict IRS expectations for classification and auditability.
Basis is a three-layer agentic system that:
- Detects before classifying: Grounding DINO + SAM 2 detect objects before GPT-4o classifies, reducing VLM hallucinations
- Cites every classification: asset classifications include IRS document citations with page numbers, not just model training data
- Preserves full provenance: every artifact traces back to source image, detection, crop, and model response
- Stage-gates everything: engineers review and approve before any workflow advances
- Uses actual IRS documents: hybrid BM25 + vector retrieval over ingested IRS publications, not model knowledge cutoff
- Solves context saturation: agentic retrieval selects only relevant evidence instead of dumping everything into context
ENGINEER UI
  • Review checkpoints at every workflow stage
  • Citation verification
  • Correction interface
      ↓
NEXT.JS FRONTEND
  • Typed UI state + workflow gating
  • Firebase Auth + role-aware access
  • Real-time Firestore listeners
      ↓
FASTAPI BACKEND
  • Vision Layer: Grounding DINO, SAM 2, GPT-4o Vision, Region Cropper, Grounding Verifier
  • Evidence Layer: PDF parsing, table extraction, text chunking, BM25 index, FAISS index, hybrid search
  • Agentic Layer: LangGraph workflow engine with Room, Asset, Takeoff, and Cost agents
  All three layers share the MCP Tool Registry:
  bm25_search • vector_search • hybrid_search • get_table • get_chunk • vision_detect
      ↓
FIREBASE DATA LAYER
  • Storage: documents, images, exports
  • Firestore: studies, classifications, audit trails
  • Auth: role-based access
Basis demonstrates that document intelligence problems share common architectural requirements:
- Hybrid retrieval for documents with both exact codes and narrative context
- Custom tokenization that preserves domain-specific identifiers (not naive whitespace splitting)
- Agentic orchestration for multi-step reasoning with tool routing
- Human-in-the-loop checkpoints for auditability and defensibility
- Citation-first outputs linking every claim to source evidence
The same pipeline that queries IRS depreciation tables can query HARA baselines, safety goals, TSRs, or verification evidenceβbecause the architectural pattern is the same:
| IRS Domain | Safety Domain |
|---|---|
| `§1245`, `168(e)(3)` | `HAZ-001`, `TSR-042` |
| Asset class `57.0` | `ASIL-B`, `ASIL-D` |
| IRS Pub 946 citations | ISO 26262 clause refs |
| Depreciation tables | FMEA tables, DFA matrices |
Both domains share the same shape: standardized headers, messy context, and a need for traceability.


