Query hundreds of contracts, leases, and invoices in seconds — entirely offline.
No API keys. No cloud uploads. No NDA violations. Zero data exfiltration.
Legal and finance teams sit on thousands of sensitive documents — NDAs, lease agreements, invoices, tax records — that cannot be uploaded to any cloud AI without violating confidentiality obligations or compliance requirements (SOC2, HIPAA, GDPR). Generic tools ignore this constraint. ALIV does not.
| For legal & finance teams | For engineering evaluators |
⚖️ Legal/Finance Agent
Ask natural language questions across your local document corpus. ALIV assembles a token-budget-aware context window (8,192 tokens) and returns a cited answer using a local LLM.
Query: "Find all agreements expiring in 90 days and extract the penalty clauses."
→ Lease — Acme Commercial (expires 2026-03-01)
Penalty: $2,800/day holdover [lease_acme_2025.txt · §5]
→ NDA — Axiom Legal (expires 2026-07-15)
Penalty: $50,000 per unauthorized disclosure [nda_axiom_2025.txt · §5]
→ Service Agreement — Delta Advisory (expired 2025-12-31)
Penalty: $5,000/day for delivery delays [service_delta_2025.txt · §4]
📂 Autonomous Inbox Organizer
Drop a chaotic folder of 500 files. Rust background threads detect changes instantly, run OCR and extraction, and propose standardized renames — {DocType}_{Vendor}_{YYYY-MM-DD} — then wait for your approval before modifying anything.
Safety model: Generate → Risk label → Select → Approve → Execute → Undo
High-risk items are pre-deselected. execute_actions hard-errors if approve_actions was not called first. Every execution batch is logged and reversible.
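The approval gate described above can be sketched in a few lines. This is an illustrative Python model, not the Rust implementation: the class names (`ProposedAction`, `ActionBatch`, `ApprovalRequired`) are hypothetical, and the "execution" only records rename pairs in memory rather than modifying files.

```python
from dataclasses import dataclass, field


class ApprovalRequired(Exception):
    """Raised when execution is attempted before approval."""


@dataclass
class ProposedAction:
    source: str
    target: str
    high_risk: bool = False
    selected: bool = field(init=False, default=True)

    def __post_init__(self) -> None:
        # High-risk items start deselected; the user must opt in explicitly.
        self.selected = not self.high_risk


@dataclass
class ActionBatch:
    actions: list[ProposedAction]
    approved: bool = False
    log: list[tuple[str, str]] = field(default_factory=list)

    def approve_actions(self) -> None:
        self.approved = True

    def execute_actions(self) -> list[tuple[str, str]]:
        # Hard error, not a warning, when approval was skipped.
        if not self.approved:
            raise ApprovalRequired("approve_actions must be called before execute_actions")
        done = [(a.source, a.target) for a in self.actions if a.selected]
        self.log.extend(done)  # recorded so the batch can be reversed later
        return done

    def undo(self) -> list[tuple[str, str]]:
        # Reverse each logged rename, newest first.
        reverted = [(target, source) for source, target in reversed(self.log)]
        self.log.clear()
        return reverted
```

Raising an exception instead of returning a status flag means a caller cannot ignore a skipped approval by accident.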
🧠 Persistent Memory Layer
Three retention scopes: session, week, permanent. A token-budget assembler selects the most relevant facts, conversation turns, and document summaries before each query. Memory is local SQLite — it survives restarts and never leaves your machine.
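One way to picture the token-budget assembler is as a greedy selection under a fixed budget. The sketch below assumes a relevance score already exists for each candidate and estimates tokens at roughly four characters each; both are simplifying assumptions, and the names (`Candidate`, `assemble`, `estimate_tokens`) are hypothetical rather than ALIV's API.

```python
from dataclasses import dataclass

BUDGET_TOKENS = 8192  # context size quoted in the agent description


@dataclass(frozen=True)
class Candidate:
    kind: str         # "fact", "turn", or "summary"
    text: str
    relevance: float  # higher is more relevant; the scoring itself is assumed


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token. A real tokenizer differs.
    return max(1, len(text) // 4)


def assemble(candidates: list[Candidate], budget: int = BUDGET_TOKENS) -> list[Candidate]:
    """Greedily keep the most relevant candidates that still fit in the budget."""
    chosen: list[Candidate] = []
    used = 0
    for cand in sorted(candidates, key=lambda c: c.relevance, reverse=True):
        cost = estimate_tokens(cand.text)
        if used + cost <= budget:
            chosen.append(cand)
            used += cost
    return chosen
```

A greedy pass is not guaranteed to fill the budget optimally, but it is predictable and never exceeds the limit, which is the property that matters for avoiding context overflow.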
🔬 Structured Extraction
Five built-in extraction templates: lease summary, contract key terms, invoice fields, penalty clauses, custom. Each field returns a value, a confidence score (0–1), and source citations. Export to JSON or CSV in one click.
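As a rough illustration of the per-field shape (value, confidence, citations) and the two export formats, here is a small Python sketch. The `ExtractedField` class and the column layout are assumptions, and the confidence value in the example is a placeholder, not a measured result.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float      # expected range: 0.0 to 1.0
    citations: list[str]   # e.g. "file · section"

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def to_json(fields: list[ExtractedField]) -> str:
    return json.dumps([asdict(f) for f in fields], indent=2, ensure_ascii=False)


def to_csv(fields: list[ExtractedField]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "value", "confidence", "citations"])
    for f in fields:
        # Multiple citations are flattened into one cell for CSV.
        writer.writerow([f.name, f.value, f.confidence, "; ".join(f.citations)])
    return buf.getvalue()


# Value and citation taken from the sample query output; 0.9 is a placeholder.
fields = [ExtractedField("penalty_clause", "$2,800/day holdover", 0.9,
                         ["lease_acme_2025.txt · §5"])]
print(to_csv(fields))
```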
📊 Benchmark Suite & Diagnostics
Built-in benchmark automation across ingestion, retrieval, and extraction with versioned pass/fail results. The Diagnostics screen exports a JSON report of DB stats, runtime state, and platform info — built for real support workflows.
graph TB
UI["React 18 + TypeScript\nZustand · React Router v6\nTailwind + Glassmorphism UI"]
TAURI["Tauri 2 Shell\nRust Backend\ntyped invoke() IPC"]
SQLITE["SQLite WAL\n+ sqlite-vec ANN\nvec0 virtual tables"]
PYTHON["Python Bridge\nPyMuPDF · Tesseract OCR\nsentence-transformers"]
OLLAMA["Ollama\nllama3.2:3b · localhost:11434\n100% local inference"]
UI -->|"typed invoke wrappers"| TAURI
TAURI -->|"rusqlite"| SQLITE
TAURI -->|"std::process::Command\nJSON-lines IPC · 3× retry"| PYTHON
TAURI -->|"HTTP · streaming"| OLLAMA
PYTHON -->|"embeddings + parse results"| SQLITE
Key architectural decisions:
| Decision | Rationale |
|---|---|
| Rust + Tauri 2 (not Electron) | ~10 MB binary, native OS APIs, no bundled Chromium |
| sqlite-vec for ANN search | No external vector database — lives in the same WAL-mode SQLite file |
| Python subprocess bridge | Reuses mature ML stack (Tesseract, sentence-transformers, PyMuPDF) without Rust FFI complexity |
| Hash router | Works correctly from any local file path on Windows and macOS without a server |
| Approval gate as hard error | execute_actions returns an error — not a warning — if approve_actions was skipped |
| Token-budget assembler | Prevents context overflow by ranking facts/history/chunks/summaries within a fixed budget |
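To make the "Python subprocess bridge" row more concrete, the sketch below shows what the worker side of a JSON-lines protocol might look like: one JSON request per input line, one JSON response per output line. The `op` field, the `ping` operation, and the response shape are hypothetical; the actual message schema is not specified here.

```python
import io
import json
from typing import TextIO


def handle(request: object) -> dict:
    """Dispatch one decoded request. Only a hypothetical "ping" op is implemented."""
    if not isinstance(request, dict):
        return {"ok": False, "error": "request must be a JSON object"}
    if request.get("op") == "ping":
        return {"ok": True, "result": "pong"}
    return {"ok": False, "error": f"unknown op: {request.get('op')!r}"}


def serve(stdin: TextIO, stdout: TextIO) -> None:
    """Read one JSON request per line and write one JSON response per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between messages
        try:
            response = handle(json.loads(line))
        except json.JSONDecodeError as exc:
            response = {"ok": False, "error": f"bad json: {exc.msg}"}
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()  # the parent process reads line by line


# In-memory demo; a real worker would pass sys.stdin and sys.stdout.
out = io.StringIO()
serve(io.StringIO('{"op": "ping"}\nnot json\n'), out)
print(out.getvalue(), end="")
```

Answering malformed input with an error line, rather than crashing, keeps the request/response pairing intact so the parent process can decide whether to retry.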
Prerequisites: Rust (stable), Node.js 18+, Python 3.10+, Ollama installed
# Clone and install
git clone https://github.com/VenkataAnilKumar/ALIV.git && cd ALIV
pip install -r python/requirements.txt
npm install
# Pull the LLM (~2 GB, one-time)
ollama pull llama3.2:3b
# Launch in dev mode
cargo tauri dev
# Production build → installer in src-tauri/target/release/bundle/
cargo tauri build
Try it immediately with the demo corpus of 5 pre-built legal documents (lease, NDA, invoice, service agreement, email):
# After launch: register this folder as your workspace
./demo/corpus/
Full scripted walkthrough with expected outputs for every screen: demo/DEMO_GUIDE.md
| Metric | Result | Gate Threshold |
|---|---|---|
| Ingestion success rate | 100% (200/200 files) | ≥ 98% |
| Retrieval relevance (top-5) | 100% | ≥ 85% |
| Extraction precision (critical fields) | 97.4% | ≥ 95% |
| Harmful file-action rate | < 1% | < 1% |
| Open critical defects | 0 | 0 |
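The gate thresholds in the table lend themselves to a simple automated check. The following is a sketch under stated assumptions: the metric keys are invented for illustration, and the harmful-action rate is a placeholder because the table only reports "< 1%".

```python
import operator

# (comparison, threshold) per metric, mirroring the Gate Threshold column.
GATES = {
    "ingestion_success_rate": (operator.ge, 0.98),
    "retrieval_relevance_top5": (operator.ge, 0.85),
    "extraction_precision": (operator.ge, 0.95),
    "harmful_action_rate": (operator.lt, 0.01),
    "open_critical_defects": (operator.eq, 0),
}


def failed_gates(results: dict[str, float]) -> list[str]:
    """Return the names of gates that are missing or not satisfied."""
    return [
        name
        for name, (compare, threshold) in GATES.items()
        if name not in results or not compare(results[name], threshold)
    ]


results = {
    "ingestion_success_rate": 200 / 200,
    "retrieval_relevance_top5": 1.0,
    "extraction_precision": 0.974,
    "harmful_action_rate": 0.009,  # placeholder: the table only says "< 1%"
    "open_critical_defects": 0,
}
print(failed_gates(results) or "all gates passed")
```

Treating a missing metric as a failure avoids a silent pass when a benchmark stage did not run.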
Production practices:
- FilesystemGuard — path canonicalization, workspace boundary check, symlink rejection, traversal detection before every file operation
- 3-attempt retry with exponential backoff — Python worker cold-start resilience (500ms / 1s)
- Phase gate system — each phase requires two consecutive benchmark passes before advancing
- React error boundary — frontend crashes surface a recovery screen with an exportable crash log
- Structured logging: tauri-plugin-log writes to the platform app-data directory; no telemetry is transmitted
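The FilesystemGuard checks listed above (canonicalization, workspace boundary, symlink rejection, traversal detection) can be approximated in Python as follows. This is a sketch of the idea rather than the Rust implementation; it borrows the name `validate_and_canonicalize` from the defect log, while `GuardError` and the exact check order are assumptions.

```python
from pathlib import Path


class GuardError(Exception):
    """Raised when a path fails validation."""


def validate_and_canonicalize(workspace: Path, candidate: str) -> Path:
    root = workspace.resolve(strict=True)  # canonical workspace root
    raw = Path(candidate)

    # Traversal detection: reject any ".." segment outright.
    if ".." in raw.parts:
        raise GuardError(f"traversal segment in {candidate!r}")

    # Absolute paths are accepted only if they sit under the canonical root.
    if raw.is_absolute():
        try:
            raw = raw.relative_to(root)
        except ValueError:
            raise GuardError(f"{candidate!r} is outside the workspace") from None

    # Symlink rejection: check every component below the root.
    current = root
    for part in raw.parts:
        current = current / part
        if current.is_symlink():
            raise GuardError(f"symlink component: {current}")

    # Boundary check on the fully resolved path, as defence in depth.
    resolved = current.resolve()
    if not resolved.is_relative_to(root):
        raise GuardError(f"{candidate!r} escapes the workspace")
    return resolved
```

A call such as `validate_and_canonicalize(Path("./demo/corpus"), "lease.txt")` would return the resolved path or raise `GuardError`; the file name here is only an example. `Path.is_relative_to` requires Python 3.9 or later, which is within the stated Python 3.10+ prerequisite.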
Full Phase 4 defect history
| ID | Defect | Severity | Fix |
|---|---|---|---|
| D001 | embedding_count always 0 in benchmark | Critical | Wired _embed_chunk_batch() into benchmark loop |
| D002 | sqlite-vec migration skipped in Python benchmarks | High | _load_sqlite_vec() loads Python wheel; _populate_vec_index() syncs ANN index |
| D003 | Python worker has no retry on transient timeout | High | run_worker() wraps with 3-attempt retry, 500ms/1s backoff |
| D004 | FilesystemGuard accepted any non-empty string | Critical | Real validate_and_canonicalize() with workspace boundary enforcement |
All 4 closed — docs/PHASE4_DEFECT_BACKLOG.json
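The retry behaviour referenced in D003 (three attempts, 500 ms then 1 s backoff) follows a common pattern. Below is a Python sketch of that pattern; `run_with_retry` is a hypothetical name, and treating `TimeoutError` as the only transient failure is an assumption. The real `run_worker()` lives in the Rust backend.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on TimeoutError, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5 s, then 1.0 s
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets tests substitute a recorder so the backoff schedule can be checked without waiting.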
| Phase | Scope | Status |
|---|---|---|
| 0 · Foundation | Tauri skeleton, SQLite schema 0001–0008, benchmark scaffold | ✅ Complete |
| 1 · Ingestion | PDF / DOCX / EML parsing, OCR fallback, incremental indexing | ✅ Complete |
| 2 · Retrieval & Memory | RRF hybrid search, extraction templates, persistent memory layer | ✅ Complete |
| 3 · Safe File Actions | Approval gate, collision-safe rename engine, undo, audit log | ✅ Complete |
| 4 · E2E Hardening | 4 defects closed, 2× consecutive benchmark suite pass | ✅ Complete |
| 5 · UX Polish | 8-screen React frontend, 2026 dark glassmorphism design system | ✅ Complete |
| 6 · Release Candidate | Bundler, installer, onboarding, diagnostics, error boundary | 🔄 In progress |
| Post-6 | Cloud-next routing, BYOK, Gemma 4 (27B MMLU 85.2%), VAULT / LEDGER variants | 📋 Planned |
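Phase 2 in the roadmap above mentions RRF hybrid search. Reciprocal rank fusion combines several ranked lists by summing 1 / (k + rank) for each document across the lists. The sketch below shows the standard formula with the commonly used k = 60; whether ALIV uses this constant, and which rankers feed into the fusion, is not stated, so treat those details and the document IDs as assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a keyword ranker and a vector ranker.
keyword_hits = ["lease_acme", "nda_axiom", "service_delta"]
vector_hits = ["nda_axiom", "invoice", "lease_acme"]
print(rrf_fuse([keyword_hits, vector_hits]))
```

Because RRF uses only ranks, it needs no score normalization between a keyword index and a vector index, which is a common reason for choosing it in hybrid retrieval.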
Contributions welcome — see ALIV_IMPLEMENTATION_BACKLOG.md for open workstreams.
| Try ALIV against a real matter folder. → 📥 Download installer (Phase 6 RC) → Deployment support available for firms with 50+ users. | Explore the architecture, PRDs, and benchmark results. → 📐 Architecture docs → 📋 Phase PRDs → 📊 Latest benchmark results → 🔧 Phase 3–4 implementation spec |
| | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 · macOS 11 · Ubuntu 22 | Windows 11 · macOS 14 |
| RAM | 8 GB | 16 GB |
| Disk | 2 GB | 10 GB (larger corpora) |
| Ollama | v0.3+ | Latest |
| Model | llama3.2:3b | mistral:7b or gemma4:27b |
MIT License · Built by Venkata Anil Kumar
⭐ Star this repo if ALIV solves a problem you care about.
local LLM · document intelligence · legal AI · offline RAG · Tauri desktop · sqlite-vec