Oguz-Guzel/CircuScan

CircuScan

CircuScan is a small Building Intelligence product for sustainability engineers and asset managers planning renovation or deconstruction. It turns public reuse sheets, historic-building guidance, and Eurocode references into a searchable material-reuse audit that flags which inventory items look reusable, which need verification, and which should be held for safety review.

Why This Problem

SECO's public material is very explicit about the business context: its sustainability services include feasibility audits, pre-deconstruction audits, and Safety in Circularity attestation for reused materials. Its building services also mention reuse audits with inventory, adaptive-capacity work, recovered-material quality assessment, and technical risk reduction across a building's life cycle.

The pain point I chose is the information gap between an old building inventory and a defensible reuse decision. Reused materials often lack manufacturer declarations, traceability, or modern performance documentation. CircuScan does not pretend to certify a material. It gives an inspector a first-pass triage with evidence, risks, and the next checks to request.

User

Primary user: sustainability engineer or asset manager preparing a reuse audit before renovation or deconstruction.

Secondary users: structural engineer, insurer, architect, or public authority reviewing whether a reuse pathway is plausible enough to investigate.

What It Does

  • Ingests heterogeneous public documents from data/raw: PDFs, HTML pages, material sheets, circular-construction references, and Eurocode pages.
  • Extracts text, chunks it, classifies each source by material family and document category, and stores it in SQLite.
  • Runs a local RAG-style retrieval layer using BM25, then applies a conservative material risk model.
  • Lets the user edit or upload a building inventory in Streamlit.
  • Produces a dashboard with reuse verdicts, risk/reuse/confidence scores, verification checklists, and cited evidence snippets.
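The chunking and classification steps above can be sketched roughly as follows. This is a minimal illustration, not the code in src/circuscan/pipeline.py: the keyword taxonomy, the function names `chunk_text` and `classify_material`, and the window sizes are all assumptions made for the example.

```python
import re

# Hypothetical keyword taxonomy; the project's real taxonomy is not shown in this README.
MATERIAL_KEYWORDS = {
    "steel": ("steel", "beam", "girder"),
    "timber": ("timber", "wood", "joist"),
    "brick": ("brick", "masonry"),
}


def chunk_text(text, max_words=120, overlap=20):
    """Split text into word windows that overlap by `overlap` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def classify_material(text):
    """Return the family with the most keyword hits, or 'unknown' if none match."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = {
        family: sum(tokens.count(keyword) for keyword in keywords)
        for family, keywords in MATERIAL_KEYWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"
```

Overlapping windows keep a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of some duplicated text in the store.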

Architecture

data/raw PDFs + HTML
        |
        v
src/circuscan/pipeline.py
  - text extraction with pypdf / stdlib HTML parser
  - document profiling and material taxonomy
  - chunking and signal extraction
        |
        v
data/processed/circuscan.sqlite
        |
        v
src/circuscan/retrieval.py + src/circuscan/audit.py
  - BM25 evidence retrieval
  - material-specific risk scoring
  - reuse decision and checklist generation
        |
        v
app.py Streamlit UI
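
The hand-off between the pipeline and the retrieval layer goes through SQLite. The sketch below shows one way such a chunk store could look, using only the standard library. The table and column names are assumptions for illustration and may not match the schema in data/processed/circuscan.sqlite.

```python
import sqlite3

# Hypothetical schema: one row per source document, one row per text chunk.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    doc_id INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE,
    category TEXT NOT NULL,
    material_family TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS chunks (
    chunk_id INTEGER PRIMARY KEY,
    doc_id INTEGER NOT NULL REFERENCES documents(doc_id),
    position INTEGER NOT NULL,
    text TEXT NOT NULL
);
"""


def open_store(db_path=":memory:"):
    """Open (or create) the store and make sure both tables exist."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, path, category, material_family, chunks):
    """Insert a document and its chunks, keeping chunk order for traceability."""
    cur = conn.execute(
        "INSERT INTO documents (path, category, material_family) VALUES (?, ?, ?)",
        (path, category, material_family),
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO chunks (doc_id, position, text) VALUES (?, ?, ?)",
        [(doc_id, position, text) for position, text in enumerate(chunks)],
    )
    conn.commit()
    return doc_id


def chunks_for_family(conn, material_family):
    """Return (path, position, text) for every chunk of one material family."""
    rows = conn.execute(
        "SELECT d.path, c.position, c.text "
        "FROM chunks c JOIN documents d USING (doc_id) "
        "WHERE d.material_family = ? ORDER BY d.path, c.position",
        (material_family,),
    )
    return rows.fetchall()
```

Storing the source path and chunk position next to the text is what lets a verdict cite the exact passage it relied on.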

Quick Start

This repo uses uv.

Install uv: https://docs.astral.sh/uv/

uv sync
uv run python -m circuscan.pipeline --reset
uv run streamlit run app.py

Then open the local Streamlit URL. The app ships with data/sample_inventory.csv, and you can upload your own CSV with these columns:

item_id,description,material_family,year_built,intended_reuse,quantity,location
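
Before uploading, a CSV can be checked against that header with a few lines of standard-library Python. This is a hedged sketch, not the app's own loader: the function name `load_inventory` is hypothetical, and treating `year_built` as an integer and `quantity` as a number is an assumption about the expected types.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "item_id", "description", "material_family", "year_built",
    "intended_reuse", "quantity", "location",
]


def load_inventory(csv_text):
    """Parse inventory CSV text and fail early on missing columns or bad numbers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    present = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in present]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    items = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            row["year_built"] = int(row["year_built"])  # assumed integer year
            row["quantity"] = float(row["quantity"])    # assumed numeric quantity
        except (TypeError, ValueError) as exc:
            raise ValueError(
                f"line {line_no}: year_built and quantity must be numeric"
            ) from exc
        items.append(row)
    return items
```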

Run the smoke evaluation and the unit tests:

uv run python scripts/evaluate.py
uv run python -m unittest discover -s tests

Data Sources

The raw data in this repo is public and reproducible. I kept the dataset small enough for a take-home challenge but intentionally heterogeneous:

  • FCRBE / Rotor Reuse Toolkit material sheets for steel beams, timber, brick, roof tiles, sanitary elements, radiators, floor systems, stone, and concrete reuse cases. Rotor describes these sheets as guidance for designers and specifiers, including fitness-for-reuse, dismantling, integration, and quantities.
  • European Commission Eurocodes page, used as a modern standards reference for structural design context.
  • Eurocodes Tools overview HTML page, used as a readable secondary reference for Eurocode family mapping.
  • Historic-building and rehabilitation guidance, used to capture older-building risks and restoration language.
  • SECO public pages, used for product framing rather than scoring.

AI Component

CircuScan uses a local retrieval-augmented audit engine:

  • Retrieval: BM25 ranks the most relevant passages from the SQLite chunk store.
  • Classification: each source and inventory item is mapped to a material family.
  • Decision support: the audit model combines evidence confidence, age, intended reuse, structural criticality, and material-specific degradation risks.
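
To make the retrieval step concrete, here is a small self-contained Okapi BM25 ranker. It is a sketch under stated assumptions rather than the code in src/circuscan/retrieval.py: the class name, the tokenizer, the parameters `k1=1.5` and `b=0.75`, and the smoothed IDF variant are common defaults chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(doc)) for doc in documents]
        self.lengths = [sum(counts.values()) for counts in self.docs]
        self.avg_len = sum(self.lengths) / len(self.lengths) if self.docs else 0.0
        # Document frequency: in how many documents each term appears.
        df = Counter()
        for counts in self.docs:
            df.update(counts.keys())
        n = len(self.docs)
        # Smoothed IDF that stays positive even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, index):
        counts, length = self.docs[index], self.lengths[index]
        if self.avg_len:
            norm = self.k1 * (1 - self.b + self.b * length / self.avg_len)
        else:
            norm = self.k1
        total = 0.0
        for term in tokenize(query):
            tf = counts.get(term, 0)
            if tf:
                total += self.idf[term] * tf * (self.k1 + 1) / (tf + norm)
        return total

    def top_k(self, query, k=3):
        """Return up to k (index, score) pairs, best first, skipping zero scores."""
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: (-p[0], p[1]))
        return [(index, score) for score, index in ranked[:k]]
```

Dropping zero-score passages matters for an audit tool: an item with no matching evidence should surface as "no evidence found" instead of being paired with an arbitrary chunk.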

I chose this over a hosted LLM for the MVP because it is reproducible without secrets, cheap to run, and easy to defend in an interview. In production, I would add an LLM only after the retrieval and evidence model are stable, and I would keep every answer grounded in cited passages.

Technical Decisions and Trade-Offs

  • Streamlit instead of React: faster for a data/AI MVP, easier to demo, and enough for the recruiter to test the product. I would move to React/Next.js after validating the workflow.
  • SQLite instead of a vector database: transparent, portable, and enough for dozens or hundreds of documents. For production scale I would add pgvector or a managed vector store.
  • BM25 instead of neural embeddings: no model downloads, deterministic results, and strong baseline retrieval for technical terminology. I would add embeddings for semantic matching across synonyms and multilingual sources.
  • Conservative scoring: the tool should over-request verification rather than green-light unsafe reuse.
  • No OCR in the MVP: selectable PDFs work; scanned drawings/photos would need OCR and computer vision later.
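
The conservative-scoring stance can be illustrated with a toy verdict function. Everything numeric here is hypothetical: the weights, thresholds, and per-material risk values are placeholders for the example and are not the project's actual scoring model in src/circuscan/audit.py.

```python
from dataclasses import dataclass


@dataclass
class Item:
    material_family: str
    year_built: int
    structural: bool
    evidence_confidence: float  # 0.0 (no evidence) to 1.0 (strong evidence)


# Placeholder degradation risks; unknown materials get a pessimistic default below.
DEGRADATION_RISK = {"steel": 0.3, "timber": 0.5, "brick": 0.2}


def risk_score(item, current_year):
    """Combine age, material, and structural criticality into a 0..1 risk."""
    age = max(0, current_year - item.year_built)
    age_risk = min(age / 100, 1.0)
    material_risk = DEGRADATION_RISK.get(item.material_family, 0.6)
    structural_risk = 0.3 if item.structural else 0.0
    return min(1.0, 0.4 * age_risk + 0.4 * material_risk + structural_risk)


def verdict(item, current_year):
    """Err on the side of asking for checks instead of approving reuse."""
    # Structural reuse with weak evidence is never waved through.
    if item.structural and item.evidence_confidence < 0.5:
        return "hold for safety review"
    risk = risk_score(item, current_year)
    if risk >= 0.6:
        return "hold for safety review"
    if risk >= 0.3 or item.evidence_confidence < 0.7:
        return "needs verification"
    return "looks reusable"
```

The asymmetry is deliberate in this sketch: low confidence alone is enough to downgrade a verdict, but high confidence alone never upgrades a high-risk item.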

What I Would Put In Production Tomorrow

  • The ingestion structure, SQLite schema, source traceability, and audit result object.
  • The idea of evidence-first reuse triage with explicit confidence.
  • Material-specific verification checklists as editable policy rules.
  • A human-in-the-loop workflow where engineers can approve, reject, or override recommendations.
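
As a rough idea of how editable policy rules and reviewer overrides might fit together, consider the sketch below. The checklist entries are illustrative placeholders, not engineering guidance, and the names `POLICY`, `checklist_for`, and `Review` are hypothetical.

```python
from dataclasses import dataclass

# Illustrative placeholder rules; a real deployment would load these from an editable file.
POLICY = {
    "steel": ["visual corrosion survey", "measure section dimensions"],
    "timber": ["moisture content reading", "check for insect or fungal damage"],
}
GENERIC_CHECKS = ["confirm provenance and dismantling method"]


def checklist_for(material_family, policy=POLICY):
    """Material-specific checks first, then checks that apply to every item."""
    return list(policy.get(material_family, [])) + GENERIC_CHECKS


@dataclass
class Review:
    item_id: str
    recommended: str      # verdict proposed by the tool
    decision: str = ""    # verdict set by the engineer, if any
    reviewer: str = ""
    note: str = ""

    def override(self, decision, reviewer, note):
        """Record a human decision; a justification note is mandatory."""
        if not note.strip():
            raise ValueError("an override needs a justification note")
        self.decision, self.reviewer, self.note = decision, reviewer, note

    @property
    def final(self):
        """The engineer's decision wins over the tool's recommendation."""
        return self.decision or self.recommended
```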

What I Would Throw Away Or Replace

  • The simple hand-built scoring weights.
  • The Streamlit UI once the core workflow is validated.
  • Pure BM25 retrieval for large corpora or multilingual documents.
  • The current text-only pipeline for any source that contains drawings, tables, or photos that matter.

Three-Month Product Vision

  • OCR and table extraction for scanned audits, datasheets, and old plans.
  • Embeddings plus reranking for stronger semantic retrieval.
  • Batch import from BIM/IFC exports and inventory spreadsheets.
  • A material passport view with provenance, inspection evidence, photos, lab results, and acceptance criteria.
  • Role-specific outputs: inspector checklist, insurer risk note, architect reuse schedule, and asset-manager carbon/reuse dashboard.
  • Evaluation set with expert-labeled audit cases and measured retrieval quality.

Limits

CircuScan is decision support, not a compliance certificate. Any structural, fire-safety, or hazardous-substance decision still requires qualified engineering review, testing, and applicable local regulatory checks.

About

CircuScan is a Streamlit-based audit assistant for circular construction that turns building inventories and public technical sources into evidence-backed reuse, risk, and verification recommendations.
