CircuScan is a small Building Intelligence product for sustainability engineers and asset managers planning renovation or deconstruction. It turns public reuse sheets, historic-building guidance, and Eurocode references into a searchable material-reuse audit that flags which inventory items look reusable, which need verification, and which should be held for safety review.
SECO's public material is explicit about the business context: its sustainability services include feasibility audits, pre-deconstruction audits, and Safety in Circularity attestation for reused materials. Its building services also list reuse audits with inventories, adaptive-capacity work, quality assessment of recovered materials, and technical risk reduction across a building's life cycle.
The pain point I chose is the information gap between an old building inventory and a defensible reuse decision. Reused materials often lack manufacturer declarations, traceability, or modern performance documentation. CircuScan does not pretend to certify a material. It gives an inspector a first-pass triage with evidence, risks, and the next checks to request.
Primary user: sustainability engineer or asset manager preparing a reuse audit before renovation or deconstruction.
Secondary users: structural engineer, insurer, architect, or public authority reviewing whether a reuse pathway is plausible enough to investigate.
- Ingests heterogeneous public documents from data/raw: PDFs, HTML pages, material sheets, circular-construction references, and Eurocode pages.
- Extracts text, chunks it, classifies each source by material family and document category, and stores it in SQLite.
- Runs a local RAG-style retrieval layer using BM25, then applies a conservative material risk model.
- Lets the user edit or upload a building inventory in Streamlit.
- Produces a dashboard with reuse verdicts, risk/reuse/confidence scores, verification checklists, and cited evidence snippets.
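A minimal sketch of the inventory-loading step, assuming the upload is parsed with Python's standard `csv` module. The column names come from the CSV upload instructions in this README; `InventoryItem` and `parse_inventory` are hypothetical names for illustration, not CircuScan's actual code. Blank years become `None` so the audit can treat an unknown age as a risk instead of guessing one.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

# Column names taken from the README's CSV upload instructions.
REQUIRED_COLUMNS = (
    "item_id", "description", "material_family", "year_built",
    "intended_reuse", "quantity", "location",
)


@dataclass
class InventoryItem:  # hypothetical record type
    item_id: str
    description: str
    material_family: str
    year_built: Optional[int]  # None when the year is blank/unknown
    intended_reuse: str
    quantity: float
    location: str


def parse_inventory(csv_text: str) -> list[InventoryItem]:
    """Parse an uploaded inventory CSV; fail loudly if a column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"inventory CSV is missing columns: {missing}")

    items = []
    for row in reader:
        # Short rows yield None for absent cells, so normalize to "".
        cell = {name: (row.get(name) or "").strip() for name in REQUIRED_COLUMNS}
        items.append(InventoryItem(
            item_id=cell["item_id"],
            description=cell["description"],
            # Lowercase so "Steel" and "steel" map to the same family.
            material_family=cell["material_family"].lower(),
            year_built=int(cell["year_built"]) if cell["year_built"] else None,
            intended_reuse=cell["intended_reuse"],
            quantity=float(cell["quantity"]) if cell["quantity"] else 0.0,
            location=cell["location"],
        ))
    return items
```

Assuming the bundled data/sample_inventory.csv follows the documented columns, `parse_inventory(open("data/sample_inventory.csv").read())` would load it the same way as an uploaded file.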
data/raw PDFs + HTML
|
v
src/circuscan/pipeline.py
- text extraction with pypdf / stdlib HTML parser
- document profiling and material taxonomy
- chunking and signal extraction
|
v
data/processed/circuscan.sqlite
|
v
src/circuscan/retrieval.py + src/circuscan/audit.py
- BM25 evidence retrieval
- material-specific risk scoring
- reuse decision and checklist generation
|
v
app.py Streamlit UI
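The extraction, chunking, and storage stages can be sketched with the standard library alone. This is an illustration under stated assumptions: `TextExtractor`, `chunk_words`, `store_chunks`, the 120-word window with a 30-word overlap, and the `chunks` table schema are all hypothetical, not the real contents of `src/circuscan/pipeline.py`. PDF extraction would follow the same shape using pypdf's `PdfReader(path).pages` and `page.extract_text()`; it is left out to keep the example self-contained.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text: str, size: int = 120, overlap: int = 30) -> list[str]:
    """Split text into overlapping windows of `size` words (hypothetical defaults)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # last window reached the end
            break
    return chunks


def store_chunks(conn: sqlite3.Connection, source: str, chunks: list[str]) -> None:
    """Persist chunks with their source and position (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        " id INTEGER PRIMARY KEY, source TEXT NOT NULL,"
        " position INTEGER NOT NULL, text TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO chunks (source, position, text) VALUES (?, ?, ?)",
        [(source, i, chunk) for i, chunk in enumerate(chunks)],
    )
    conn.commit()
```

Overlapping windows keep a sentence that straddles a chunk boundary retrievable from at least one chunk; the window sizes are placeholders to tune against real audit questions.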
This repo uses uv.
Install uv (https://docs.astral.sh/uv/), then build the index and start the app:
uv sync
uv run python -m circuscan.pipeline --reset
uv run streamlit run app.py
Then open the local Streamlit URL. The app ships with data/sample_inventory.csv, and you can upload your own CSV with these columns:
item_id,description,material_family,year_built,intended_reuse,quantity,location
Run the smoke evaluation and the unit tests:
uv run python scripts/evaluate.py
uv run python -m unittest discover -s tests
The raw data in this repo is public and reproducible. I kept the dataset small enough for a take-home challenge but intentionally heterogeneous:
- FCRBE / Rotor Reuse Toolkit material sheets for steel beams, timber, brick, roof tiles, sanitary elements, radiators, floor systems, stone, and concrete reuse cases. Rotor describes these sheets as guidance for designers and specifiers, covering fitness for reuse, dismantling, integration, and quantities.
- European Commission Eurocodes page, used as a modern standards reference for structural design context.
- Eurocodes Tools overview HTML page, used as a readable secondary reference for Eurocode family mapping.
- Historic-building and rehabilitation guidance, used to capture older-building risks and restoration language.
- SECO public pages, used for product framing rather than scoring.
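The profiling step that tags each of these sources with a material family and a document category could start as simple keyword voting. The taxonomy below is a hypothetical, deliberately small example (it matches exact lowercase tokens, with no stemming), and `profile_document` is an illustrative name, not the repo's API.

```python
import re
from collections import Counter

# Hypothetical keyword taxonomy; the real CircuScan taxonomy may differ.
MATERIAL_KEYWORDS = {
    "steel": ["steel", "beam", "ipe", "corrosion", "weld"],
    "timber": ["timber", "wood", "joist", "rot", "moisture"],
    "brick": ["brick", "mortar", "masonry"],
    "concrete": ["concrete", "slab", "rebar", "carbonation"],
    "stone": ["stone", "granite", "limestone", "marble"],
}
CATEGORY_KEYWORDS = {
    "reuse_sheet": ["reuse", "dismantling", "fitness"],
    "standard": ["eurocode", "eurocodes", "standard"],
    "heritage": ["historic", "heritage", "restoration", "rehabilitation"],
}


def _best_label(text: str, taxonomy: dict, default: str) -> str:
    """Return the label whose keywords occur most often, or `default` if none do."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    scores = {label: sum(counts[k] for k in kws) for label, kws in taxonomy.items()}
    best = max(scores, key=scores.get)  # ties go to the first label listed
    return best if scores[best] > 0 else default


def profile_document(text: str) -> dict:
    return {
        "material_family": _best_label(text, MATERIAL_KEYWORDS, "general"),
        "category": _best_label(text, CATEGORY_KEYWORDS, "other"),
    }
```

Keyword voting is transparent and easy to audit, which suits a conservative tool; its weakness is synonyms and other languages, the same gap noted for BM25 below.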
Useful references:
- SECO sustainability services: https://groupseco.be/sustainability/
- SECO buildings services: https://groupseco.be/buildings/
- Safety in Circularity context: https://www.safetyincircularity.be/nl/wie-zijn-we
- European Commission Eurocodes overview: https://single-market-economy.ec.europa.eu/sectors/construction/eurocodes_en
- Rotor Reuse Toolkit material sheets: https://www.rotordb.org/en/projects/reuse-toolkit-material-sheets
CircuScan uses a local retrieval-augmented audit engine:
- Retrieval: BM25 ranks the most relevant passages from the SQLite chunk store.
- Classification: each source and inventory item is mapped to a material family.
- Decision support: the audit model combines evidence confidence, age, intended reuse, structural criticality, and material-specific degradation risks.
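BM25 itself is short enough to write out. The sketch below is a from-scratch Okapi BM25 over in-memory chunks, using the commonly used values `k1=1.5` and `b=0.75` and the non-negative IDF variant. Whether CircuScan uses these parameters, this IDF form, or a library for BM25 is an assumption, not something this README states.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over an in-memory list of chunks (sketch)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / max(len(self.docs), 1)
        n = len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))
        # Non-negative IDF: the "+ 1" inside the log keeps common terms >= 0.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 3) -> list[tuple[int, float]]:
        """Return (chunk index, score) pairs for the k best matching chunks."""
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(i, s) for s, i in scored[:k] if s > 0]
```

Because the scores are deterministic, the same query over the same chunk store always returns the same evidence, which keeps the audit trail reproducible.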
I chose this over a hosted LLM for the MVP because it is reproducible without secrets, cheap to run, and easy to defend in an interview. In production, I would add an LLM only after the retrieval and evidence model are stable, and I would keep every answer grounded in cited passages.
- Streamlit instead of React: faster for a data/AI MVP, easier to demo, and enough for the recruiter to test the product. I would move to React/Next.js after validating the workflow.
- SQLite instead of a vector database: transparent, portable, and enough for dozens or hundreds of documents. For production scale I would add pgvector or a managed vector store.
- BM25 instead of neural embeddings: no model downloads, deterministic results, and strong baseline retrieval for technical terminology. I would add embeddings for semantic matching across synonyms and multilingual sources.
- Conservative scoring: the tool should over-request verification rather than green-light unsafe reuse.
- No OCR in the MVP: PDFs with selectable text work; scanned drawings and photos would need OCR and computer vision later.
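A conservative scoring rule in this spirit might look like the sketch below. Every number in it (the per-material risk values, `AGE_THRESHOLD = 1975`, the increments and verdict cut-offs) and the names `audit_item` and `AuditResult` are hypothetical placeholders that show the shape of the decision logic, not CircuScan's calibrated model. The conservative defaults are the point: an unknown material or missing year raises risk, and any structural reuse is routed to verification however clean the other signals look.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical placeholder values, not calibrated weights.
MATERIAL_RISK = {"steel": 0.35, "timber": 0.5, "brick": 0.3, "concrete": 0.6, "stone": 0.2}
UNKNOWN_MATERIAL_RISK = 0.6  # unknown families start near the top of the range
AGE_THRESHOLD = 1975  # illustrative cut-off for extra age-related checks
STRUCTURAL_WORDS = ("structural", "load-bearing", "beam", "column")


@dataclass
class AuditResult:
    item_id: str
    risk: float  # 0 (low) .. 1 (high)
    confidence: float  # how well retrieved evidence supports the call
    verdict: str
    reasons: list = field(default_factory=list)


def audit_item(item_id: str, material: str, year_built: Optional[int],
               intended_reuse: str, evidence_confidence: float) -> AuditResult:
    reasons = []
    if material in MATERIAL_RISK:
        risk = MATERIAL_RISK[material]
    else:
        risk = UNKNOWN_MATERIAL_RISK
        reasons.append("material family not recognized")
    if year_built is None:
        risk += 0.15
        reasons.append("construction year unknown")
    elif year_built < AGE_THRESHOLD:
        risk += 0.1
        reasons.append(f"built before {AGE_THRESHOLD}: request age-related checks")
    structural = any(w in intended_reuse.lower() for w in STRUCTURAL_WORDS)
    if structural:
        risk += 0.2
        reasons.append("structural reuse needs engineering verification")
    risk = min(risk, 1.0)
    confidence = max(0.0, min(evidence_confidence, 1.0))

    # Conservative ordering: safety hold first, then any reason to verify.
    if risk >= 0.7:
        verdict = "hold for safety review"
    elif risk >= 0.4 or confidence < 0.6 or structural:
        verdict = "needs verification"
    else:
        verdict = "reusable"
    return AuditResult(item_id, round(risk, 2), confidence, verdict, reasons)
```

The three verdicts mirror the triage categories in the product description: reusable, needs verification, and hold for safety review.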
What I would keep:
- The ingestion structure, SQLite schema, source traceability, and audit result object.
- The idea of evidence-first reuse triage with explicit confidence.
- Material-specific verification checklists as editable policy rules.
- A human-in-the-loop workflow where engineers can approve, reject, or override recommendations.
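Keeping checklists as data rather than code lets engineers edit them without a release. A minimal sketch, assuming a JSON policy keyed by material family with a `*` wildcard for rules that apply to every item; the policy structure, its example checks, and `build_checklist` are hypothetical.

```python
import json

# Hypothetical policy content; in practice this could live in a JSON file
# that engineers edit without touching code.
POLICY_JSON = """
{
  "steel":  {"always": ["Measure section and check for corrosion or section loss"],
             "structural": ["Confirm steel grade (test or documentation)"]},
  "timber": {"always": ["Inspect for rot, insect damage, and moisture content"],
             "structural": ["Grade timber strength class"]},
  "*":      {"always": ["Record provenance and location in the building"]}
}
"""


def build_checklist(policy: dict, material: str, structural: bool) -> list[str]:
    """Merge wildcard rules with material-specific rules, in that order."""
    checks = []
    for key in ("*", material):
        rules = policy.get(key, {})
        checks.extend(rules.get("always", []))
        if structural:
            checks.extend(rules.get("structural", []))
    # Preserve order while dropping duplicates.
    return list(dict.fromkeys(checks))


policy = json.loads(POLICY_JSON)
```

A material with no entry of its own still gets the wildcard checks, so a missing rule never produces an empty checklist.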
What I would replace:
- The simple hand-built scoring weights.
- The Streamlit UI once the core workflow is validated.
- Pure BM25 retrieval for large corpora or multilingual documents.
- The current text-only pipeline for any source that contains drawings, tables, or photos that matter.
What I would add next:
- OCR and table extraction for scanned audits, datasheets, and old plans.
- Embeddings plus reranking for stronger semantic retrieval.
- Batch import from BIM/IFC exports and inventory spreadsheets.
- A material passport view with provenance, inspection evidence, photos, lab results, and acceptance criteria.
- Role-specific outputs: inspector checklist, insurer risk note, architect reuse schedule, and asset-manager carbon/reuse dashboard.
- Evaluation set with expert-labeled audit cases and measured retrieval quality.
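For that evaluation set, standard retrieval metrics such as recall@k and mean reciprocal rank (MRR) are a reasonable starting point, alongside plain agreement between the tool's verdicts and the expert labels. A minimal sketch; the case format (ranked chunk IDs plus expert-marked relevant IDs) and the function names are assumptions.

```python
def recall_at_k(ranked: list, relevant: set, k: int) -> float:
    """Share of expert-marked relevant chunks that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list, relevant: set) -> float:
    """1 / rank of the first relevant chunk, or 0 if none was retrieved."""
    for rank, chunk_id in enumerate(ranked, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate_retrieval(cases: list, k: int = 5) -> dict:
    """cases: list of (ranked_chunk_ids, relevant_chunk_ids) pairs."""
    if not cases:
        raise ValueError("need at least one labeled case")
    n = len(cases)
    recall = sum(recall_at_k(r, set(rel), k) for r, rel in cases) / n
    mrr = sum(reciprocal_rank(r, set(rel)) for r, rel in cases) / n
    return {"recall@k": recall, "mrr": mrr}


def verdict_agreement(predicted: list, expert: list) -> float:
    """Fraction of items where the tool's verdict matches the expert label."""
    if len(predicted) != len(expert) or not expert:
        raise ValueError("need equally sized, non-empty label lists")
    return sum(p == e for p, e in zip(predicted, expert)) / len(expert)
```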
CircuScan is decision support, not a compliance certificate. Any structural, fire-safety, or hazardous-substance decision still requires qualified engineering review, testing, and applicable local regulatory checks.