Guided, AI-powered reporting workflow that turns messy spreadsheet exports into deterministic, decision-ready reports with visible methodology.
AI Report Builder is a local-first web app for a common operational workflow:
- load a CSV or XLSX export
- inspect and repair the data safely
- build a semantic understanding of the dataset
- capture the reporting outcome in plain English
- generate a polished report with deterministic metrics, visuals, and export-ready output
The product is intentionally guided. It is not a blank-canvas BI tool and it does not let an LLM invent numbers.
Spreadsheet-shaped analysis work is usually fragmented across data cleanup, ad hoc BI work, manual chart selection, and slide rewriting. That fragmentation costs time and erodes trust:
- messy files hide schema and quality issues
- teams repeat the same repair and reporting steps every time a new export arrives
- AI summaries often lose the link to real computed values
- general-purpose BI tools ask users to build the report instead of describing the answer they need
This repo demonstrates a tighter path from upload to trustworthy report.
Who it's for:
- product managers and operators who need a fast answer from a spreadsheet export
- analysts who want a cleaner first pass before deeper analysis
- engineering and data leaders evaluating guided analytics workflows
- recruiters and hiring managers reviewing product, data, and technical execution depth
Design principles:
- Outcome-first workflow: users describe the report they need instead of manually building charts.
- Deterministic analytics boundary: AI helps with inference, planning, and narrative, but not numeric truth.
- Reviewable repairs: raw data stays separate from the cleaned working layer.
- Renderer-owned output: the report is structured, exportable, and presentation-ready by default.
- Trust visibility: methodology, runtime, caveats, and evidence stay attached to the report.
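The deterministic boundary can be sketched in a few lines: the model may only name columns and aggregations, and a plain computation engine produces every number. The names here (`plan`, `run_plan`) are illustrative, not the repo's actual API.

```python
from collections import defaultdict

# Hypothetical stand-in for an LLM-drafted report plan: the model names
# columns and an aggregation, but never supplies numeric values.
plan = {"group_by": "payer", "metric": "balance", "agg": "sum"}

rows = [
    {"payer": "Acme Health", "balance": 120.0},
    {"payer": "Acme Health", "balance": 80.0},
    {"payer": "Medicare", "balance": 50.0},
]

def run_plan(plan, rows):
    """Deterministically compute the metric the plan describes."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[plan["group_by"]]] += row[plan["metric"]]
    return dict(totals)

result = run_plan(plan, rows)  # every number traces back to the rows
```

The same plan over the same rows always yields the same totals, which is what keeps the numbers auditable.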
Demo script:
- Launch the app with `Run Me.bat` or `scripts/run-me.ps1`.
- Load `sample datasets/healthcare_showcase_1000.xlsx`.
- Approve the recommended repairs in Module 2.
- In Module 3, keep the payer question: "Which payer-plan combinations are driving the largest balance and denial burden?"
- Generate the report in Module 4.
- Open Trust to show the deterministic boundary and active local models.
- Export HTML or PDF.
Alternative EHR demo:
- dataset: `sample datasets/clinical_encounter_demo_500.xlsx`
- question: "Which patient segments are experiencing the longest ED waits?"
Module 1:
- upload or load a structured workbook
- inspect field roles, null burden, and likely grain
- keep the raw input intact
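A first profiling pass over an in-memory export might look like the sketch below; the `profile` helper and its role heuristic are hypothetical and far simpler than the real module.

```python
def profile(rows):
    """Report per-column null fraction and a naive role guess."""
    report = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        non_null = [v for v in values if v is not None]
        # Naive heuristic: all-numeric columns are measures, the rest dimensions.
        role = "measure" if non_null and all(
            isinstance(v, (int, float)) for v in non_null
        ) else "dimension"
        report[col] = {"null_fraction": 1 - len(non_null) / len(values), "role": role}
    return report

rows = [
    {"payer": "Acme", "balance": 120.0},
    {"payer": None, "balance": 80.0},
]
summary = profile(rows)  # payer carries a 50% null burden
```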
Module 2:
- propose repairs with rationale
- separate apply-now actions from review-required decisions
- build a cleaned working layer without mutating the original file
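The raw-versus-working-layer split can be illustrated as below. `plan_repairs` and `apply_auto` are invented names, and the two repair heuristics are placeholders for the real strategy engine.

```python
from copy import deepcopy

def plan_repairs(raw_rows):
    """Split proposed repairs into auto-apply vs review-required."""
    auto, review = [], []
    for i, row in enumerate(raw_rows):
        payer = row.get("payer")
        if isinstance(payer, str) and payer != payer.strip():
            auto.append((i, "payer", payer.strip(), "trim stray whitespace"))
        if row.get("balance") is None:
            review.append((i, "balance", "null measure: impute or drop?"))
    return auto, review

def apply_auto(raw_rows, auto):
    """Build the cleaned working layer; raw rows are never mutated."""
    cleaned = deepcopy(raw_rows)
    for i, col, value, _rationale in auto:
        cleaned[i][col] = value
    return cleaned

raw = [{"payer": " Acme ", "balance": 120.0}, {"payer": "Medicare", "balance": None}]
auto, review = plan_repairs(raw)
cleaned = apply_auto(raw, auto)  # raw[0]["payer"] is still " Acme "
```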
Module 3:
- infer candidate reporting questions from the dataset
- translate a business question into a structured report contract
- keep the reporting plan constrained by the semantic model
Module 4:
- compute metrics, groupings, and visuals deterministically
- generate narrative and section framing with AI assistance
- render export-ready HTML and PDF outputs
- show formulas, caveats, and runtime/provider metadata
- keep evidence and report outputs attached to the same session
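Keeping formulas and caveats attached to each computed section might look like this sketch; `compute_section` is illustrative, not the actual renderer contract.

```python
def compute_section(rows, metric_col, group_col):
    """Compute one report section with its formula and caveats attached."""
    groups, skipped = {}, 0
    for row in rows:
        value = row.get(metric_col)
        if value is None:
            skipped += 1  # excluded rows become a visible caveat, not a silent drop
            continue
        groups[row[group_col]] = groups.get(row[group_col], 0.0) + value
    caveats = [f"{skipped} row(s) excluded for null {metric_col}"] if skipped else []
    return {
        "values": groups,
        "formula": f"sum({metric_col}) grouped by {group_col}",
        "caveats": caveats,
    }

section = compute_section(
    [{"plan": "PPO", "denied": 3}, {"plan": "HMO", "denied": None}],
    metric_col="denied",
    group_col="plan",
)
```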
Pipeline:

dataset upload
-> profiling and repair planning
-> cleaned working layer
-> semantic model
-> report contract drafting
-> deterministic metric computation
-> report rendering
-> trust / methodology / export
Repo layout:
- `frontend/` - Next.js guided workflow UI and Playwright coverage
- `backend/` - FastAPI services, contracts, persistence, and tests
- `packages/contracts/` - shared JSON schemas and generated TypeScript types
- `fixtures/` - acceptance scenarios and deterministic test fixtures
- `sample datasets/` - public-safe demo datasets for the main portfolio flow
- `docs/` - architecture, case study, demo script, and portfolio-facing materials
AI is real and central here, but it is deliberately bounded.
AI is used for:
- suggesting repair strategies and deciding what needs review
- drafting the report contract from the user's question and the semantic model
- generating narrative wording and reviewer-style polishing for the rendered report
AI is not used for:
- inventing metrics or chart payloads
- silently rewriting the raw data
- bypassing formulas, caveats, or trust metadata
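One way to hold that line is to validate every LLM-drafted contract before it reaches the metric engine. The checker below and its key names are assumptions, not the shipped schema.

```python
ALLOWED_KEYS = {"question", "group_by", "metric", "agg", "chart"}

def validate_contract(contract, schema_columns):
    """Reject drafts that smuggle in numbers or reference unknown columns."""
    errors = []
    unknown = set(contract) - ALLOWED_KEYS
    if unknown:
        errors.append(f"unexpected keys: {sorted(unknown)}")
    for key in ("group_by", "metric"):
        if contract.get(key) not in schema_columns:
            errors.append(f"{key} must name a column in the semantic model")
    if any(isinstance(v, (int, float)) for v in contract.values()):
        errors.append("contracts may not contain literal numeric values")
    return errors

# A draft that tries to hand the renderer a precomputed total is rejected.
draft = {"group_by": "payer", "metric": "balance", "agg": "sum", "total": 9999}
errors = validate_contract(draft, schema_columns={"payer", "balance"})
```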
The current local-first demo preset is:
- Module 2: `qwen2.5:14b`
- Module 3: `qwen2.5:14b`
- Module 4: `qwen2.5:32b`
- Frontend: Next.js, TypeScript, React, Playwright
- Backend: FastAPI, Pydantic, Python, httpx
- Contracts: JSON Schema plus generated TypeScript types
- Storage: local filesystem, SQLite, DuckDB
- Models: OpenAI or Ollama-backed local models, with launcher-controlled provider selection
- Charts and output: renderer-owned report views with HTML and PDF export
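Launcher-controlled provider selection could reduce to a small resolver like the one below; every environment-variable name and default here (`REPORT_BUILDER_PROVIDER`, `OLLAMA_HOST`, the model fallbacks) is hypothetical, not the repo's actual configuration surface.

```python
import os

def resolve_provider(env=None):
    """Pick OpenAI or a local Ollama endpoint from environment settings."""
    env = os.environ if env is None else env
    if env.get("REPORT_BUILDER_PROVIDER", "ollama") == "openai":
        return {
            "base_url": "https://api.openai.com/v1",
            "model": env.get("OPENAI_MODEL", "gpt-4o-mini"),
        }
    # Default: local-first, matching the qwen2.5 demo presets.
    return {
        "base_url": env.get("OLLAMA_HOST", "http://localhost:11434"),
        "model": env.get("OLLAMA_MODEL", "qwen2.5:14b"),
    }

config = resolve_provider({})  # falls back to the local Ollama default
```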
Preferred path: `Run Me.bat`

Or: `powershell -ExecutionPolicy Bypass -File .\scripts\run-me.ps1`

The launcher handles:
- dependency/bootstrap checks
- backend and frontend startup
- OpenAI vs local Ollama preset selection
- local model preflight for the demo presets
Bootstrap: `powershell -ExecutionPolicy Bypass -File .\scripts\bootstrap.ps1`

Backend and frontend: `powershell -ExecutionPolicy Bypass -File .\scripts\start-dev.ps1`

Common validation commands:
- `npm.cmd run contracts:generate`
- `backend\.venv\Scripts\python.exe -m pytest backend\tests -q`
- `npm.cmd --prefix frontend run lint`
- `npm.cmd --prefix frontend run typecheck`
- `npm.cmd --prefix frontend run build`
- `powershell -ExecutionPolicy Bypass -File .\scripts\run-tests.ps1`

Demo signoff: `powershell -ExecutionPolicy Bypass -File .\scripts\run-demo-signoff.ps1`

This repo ships with public-safe healthcare demo workbooks:
- `sample datasets/healthcare_showcase_1000.xlsx` - best for the payer balance and denial story
- `sample datasets/clinical_encounter_demo_500.xlsx` - best for the ED wait-time story
Recommended questions:
- Which payer-plan combinations are driving the largest balance and denial burden?
- Which patient segments are experiencing the longest ED waits?
The healthcare showcase notes are documented in `sample datasets/healthcare_showcase_1000.notes.md`.
Deliberate tradeoffs:
- Guided flow over blank canvas: faster to a report, narrower than a BI workbench.
- Deterministic metrics over model-generated analytics: higher trust, less open-ended flexibility.
- Local-first runtime and file storage: easier demoability and auditability, less collaboration depth.
- Fixed report templates over arbitrary chart authoring: better presentation quality, smaller output surface.
Roadmap:
- broader demo datasets beyond healthcare
- richer follow-up and refinement flows
- stronger multi-table semantic modeling
- deeper evaluation harnesses for prompt and report quality
- packaging improvements for non-Windows local setup