A multi-agent system that shepherds property deals from Expression of Interest (EOI) to executed contract, automating contract validation, email workflows, and SLA monitoring.
The easiest way to see the system in action is through the visual web dashboard:
# Clone the repository
git clone <repository-url>
cd onecorp-mas
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set up API keys (create .env file)
echo "ANTHROPIC_API_KEY=your_anthropic_api_key_here" > .env # Extractor/Router (Claude Haiku 4.5)
echo "DEEPINFRA_API_KEY=your_deepinfra_api_key_here" >> .env # Auditor/Comms (Qwen3-235B)
# Launch the visual dashboard
python run_ui.py
The dashboard will open automatically in your browser at http://localhost:5000. Click "Start Demo" to watch the multi-agent workflow execute in real time.
Dashboard Features:
- Real-time workflow progress visualization
- Live agent activity indicators
- Contract mismatch detection display
- Email generation tracking
- SLA monitoring with countdown
- State transition timeline
- Event log with timestamps
For headless execution or debugging:
# Run the full demo
python -m src.main --demo
# Or run step-by-step
python -m src.main --step eoi
python -m src.main --step contract-v1
python -m src.main --step contract-v2
# Test SLA overdue scenario
python -m src.main --test-sla
# Reset database
python -m src.main --reset
# Run all tests
pytest tests/ -v
# Run end-to-end integration test
pytest tests/test_end_to_end.py -v
Note: You'll need an Anthropic API key for the Extractor and Router (Claude Haiku 4.5, model id claude-haiku-4-5), and a DeepInfra API key for the Qwen3-235B-based Auditor and Comms LLM paths.
onecorp-mas/
├── CLAUDE.md # Instructions for Claude Code
├── AGENTS.md # Agent architecture guidelines
├── PROGRESS.md # Task progress tracking
├── QUICKSTART.md # Quick start guide
├── tasks.md # Implementation task specifications
├── run_ui.py # Visual dashboard launcher
├── requirements.txt # Python dependencies
├── agent_docs/ # Implementation guides for each agent
│ ├── extraction.md # Extractor agent (PDF parsing, field extraction)
│ ├── comparison.md # Auditor agent (mismatch detection, risk scoring)
│ ├── emails.md # Router + Comms agents (classification, generation)
│ ├── state-machine.md # Orchestrator (states, transitions, SLA)
│ └── testing.md # Test patterns and fixtures
├── assets/ # Visual assets
│ ├── architecture.svg # System architecture diagram
│ └── workflow_diagram.jpeg
├── docs/ # Project documentation
│ ├── INDEX.md # Documentation index
│ ├── architecture.md # System design document
│ ├── demo-script.md # Demo walkthrough guide
│ ├── demo-recording.md # Recording instructions
│ └── visual-ui-guide.md # UI documentation
├── spec/ # Problem specification
│ ├── MAS_Brief.md # Full requirements
│ ├── judging-criteria.md
│ └── transcript.md # Stakeholder context
├── data/ # Input data files
│ ├── source-of-truth/ # EOI PDF (source document)
│ ├── contracts/ # Contract PDFs (V1, V2)
│ ├── emails/
│ │ ├── incoming/ # Incoming email files to process
│ │ └── templates/ # Output email templates
│ └── emails_manifest.json
├── ground-truth/ # Expected outputs for validation
│ ├── eoi_extracted.json
│ ├── v1_extracted.json
│ ├── v2_extracted.json
│ ├── v1_mismatches.json
│ └── expected_outputs.json
├── src/ # Implementation
│ ├── main.py # Main entry point
│ ├── agents/ # LLM agents
│ │ ├── router.py # Email classification
│ │ ├── extractor.py # PDF field extraction
│ │ ├── auditor.py # Contract comparison
│ │ ├── comms.py # Email generation
│ │ └── prompts/ # Agent prompt templates
│ ├── orchestrator/ # Workflow coordination
│ │ ├── state_machine.py
│ │ ├── deal_store.py # SQLite persistence
│ │ └── sla_monitor.py
│ ├── ui/ # Visual web dashboard
│ │ ├── app.py # Flask server with SSE
│ │ └── templates/ # HTML dashboard
│ └── utils/ # Shared utilities
│ ├── pdf_parser.py
│ ├── email_parser.py
│ └── date_resolver.py
├── tests/ # Unit and integration tests
├── n8n/ # Workflow export (if using n8n)
└── prompts/ # Additional prompt files
Start here:
- spec/MAS_Brief.md — Full problem specification
- spec/transcript.md — Stakeholder context
- docs/architecture.md — Agent design, interactions, and generalizability
For implementation:
- agent_docs/ — Technical guides for building each agent (pattern-based, not hardcoded)
OneCorp processes property contracts through a complex workflow:
EOI signed → Contract received → Validated → Solicitor approved → DocuSign → Executed
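A pipeline like this maps naturally onto an explicit transition table. The sketch below illustrates the idea; the state names and table are hypothetical stand-ins, not the actual contents of src/orchestrator/state_machine.py:

```python
# Hypothetical transition table; the real names live in src/orchestrator/state_machine.py.
TRANSITIONS = {
    "eoi_signed": {"contract_received"},
    "contract_received": {"validated", "discrepancy_alert"},
    "discrepancy_alert": {"contract_received"},  # an amended contract version arrives
    "validated": {"solicitor_approved"},
    "solicitor_approved": {"docusign_sent"},
    "docusign_sent": {"executed"},
    "executed": set(),
}

def transition(current: str, target: str) -> str:
    """Advance a deal, rejecting any step the workflow does not allow."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"invalid transition: {current} -> {target}")
    return target

print(transition("eoi_signed", "contract_received"))  # contract_received
```

Because every move goes through `transition`, an out-of-order step (say, jumping straight from validated to executed) raises immediately instead of corrupting the deal record.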
Current pain points:
- Single shared inbox for all communications
- Manual contract checking (slow, error-prone)
- Error-prone version control across contract amendments
- Manual SLA tracking based on appointment dates
Visual Diagram: See assets/architecture.svg for a complete visual representation of agents, data flows, and control flows.
| Agent | Responsibility | Model |
|---|---|---|
| Router | Email classification, deal mapping | Claude Haiku 4.5 |
| Extractor | PDF field extraction from EOI/contracts | Claude Haiku 4.5 |
| Auditor | Contract vs EOI comparison, risk scoring | Qwen3-235B via DeepInfra |
| Comms | Email generation (solicitor, vendor, alerts) | Qwen3-235B via DeepInfra |
The Orchestrator (non-LLM) manages state transitions and SLA timers.
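The SLA timers key off appointment dates. A minimal sketch of the deadline calculation, assuming the demo's 2-business-day rule with weekends skipped and no holiday calendar (an assumption, not necessarily the Orchestrator's exact logic):

```python
from datetime import date, timedelta

def sla_deadline(appointment: date, business_days: int = 2) -> date:
    """Add N business days (Mon-Fri), skipping weekends; no holiday calendar."""
    d, remaining = appointment, business_days
    while remaining > 0:
        d += timedelta(days=1)
        if d.weekday() < 5:  # Mon=0 .. Fri=4
            remaining -= 1
    return d

# Appointment on Friday 2025-01-03 -> deadline the following Tuesday
print(sla_deadline(date(2025, 1, 3)))  # 2025-01-07
```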
See docs/demo-script.md for a complete 3-minute demo walkthrough.
The dataset includes:
- 1 EOI (source of truth)
- 2 contracts (V1 with 5 errors, V2 corrected)
- 8 emails covering the full workflow
Demo flow:
- V1 contract → 5 mismatches detected → Discrepancy alert
- V2 contract → Validated → Sent to solicitor
- Solicitor approves → Vendor release email
- DocuSign flow → Contract executed
- SLA test → Remove buyer-signed email → Alert fires
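One of the V1 mismatches is a negation flip, which is why the Auditor's comparison must be semantic rather than substring-based. The toy regex below only illustrates the failure mode; the real Auditor delegates this judgment to the LLM:

```python
import re

def is_subject_to_finance(clause: str) -> bool:
    """True if the clause says the deal IS subject to finance.

    Naive substring matching would match both phrasings below;
    the explicit negation check is what tells them apart.
    """
    text = clause.lower()
    if "subject to finance" not in text:
        return False
    return not re.search(r"\bnot\s+subject\s+to\s+finance\b", text)

print(is_subject_to_finance("This contract IS subject to finance"))      # True
print(is_subject_to_finance("This contract is NOT subject to finance"))  # False
```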
# Run all tests
pytest tests/ -v
# Run specific test
pytest tests/test_comparison.py -v
# With coverage
pytest tests/ --cov=src --cov-report=html
Files in ground-truth/ are test fixtures for the demo dataset, not runtime data:
| File | Purpose |
|---|---|
| eoi_extracted.json | Expected Extractor output for demo EOI |
| v1_extracted.json | Expected Extractor output for demo V1 contract |
| v2_extracted.json | Expected Extractor output for demo V2 contract |
| v1_mismatches.json | Expected Auditor output when comparing V1 to EOI |
| expected_outputs.json | Expected emails/states at each workflow step |
Important: Agents should use pattern-based logic, not read these files at runtime. The system must work for ANY property deal, not just the demo.
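In tests, comparing agent output against these fixtures reduces to a field-level diff. A sketch of that comparison, using inline stand-ins for the fixture data (the field names and flat-dict shape are assumptions, not the actual fixture schema):

```python
def diff_fields(expected: dict, actual: dict) -> list[str]:
    """Names of expected fields whose values differ or are missing."""
    return sorted(k for k in expected if actual.get(k) != expected[k])

# Inline stand-ins for ground-truth/eoi_extracted.json and a live Extractor run
expected = {"lot_number": "42", "total_price": 750000, "deposit": 75000}
actual = {"lot_number": "42", "total_price": 745000, "deposit": 75000}
print(diff_fields(expected, actual))  # ['total_price']
```

A test then asserts the diff is empty, so a single failing field name points straight at the extraction bug.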
This multi-agent system directly addresses the evaluation criteria outlined in spec/judging-criteria.md:
- Clear agent separation: 4 specialized LLM agents (Router, Extractor, Auditor, Comms) + deterministic Orchestrator
- Structured communication: Agents exchange typed data through the Orchestrator (not ad-hoc prompting)
- Visual documentation: Complete architecture diagram showing data/control flows
- Stable & reliable: Deterministic state machine ensures predictable behavior
- Meaningful multi-agent workflow: Router classifies → Extractor parses → Auditor validates → Comms generates
- Data dependencies: Auditor requires both EOI and contract data from Extractor
- Coordinated by Orchestrator: State machine enforces workflow rules (e.g., contract can't go to solicitor until validated)
- Emergent intelligence: Validation accuracy improves through multi-stage processing (extraction + comparison + severity assessment)
- Hybrid classification: Router uses deterministic pattern matching with LLM fallback for ambiguous cases
- Confidence scoring: Extractor assigns confidence to fields; low-confidence triggers re-extraction or human review
- Version superseding: Automatic contract version management (V2 supersedes V1)
- Semantic comparison: Auditor understands negation ("NOT subject to finance" vs "IS subject to finance")
- Complete end-to-end workflow: EOI → Contract validation → Solicitor approval → DocuSign → Execution
- Handles error cases: V1 contract with 5 mismatches is correctly rejected and alert generated
- Self-correcting: V2 corrected contract proceeds smoothly through workflow
- SLA monitoring: Detects overdue deadlines and generates alerts
- Solves stated problem: Addresses OneCorp's pain points (manual checking, version control, SLA tracking)
- Production-ready features: SQLite persistence, email template generation, audit trail
- Scalable design: Each deal isolated by ID; supports concurrent processing
- Cost-conscious: Uses smaller models for simple tasks (classification) and larger models for reasoning (comparison)
- Guardrails implemented:
- Confidence threshold (≥0.8) for critical fields (lot number, price, finance terms)
- Human escalation for low-confidence extractions
- Version superseding prevents using outdated contracts
- Only validated contracts sent to solicitor
- SLA alerts prevent deals from stalling
- Error handling: PDF parsing failures, LLM API errors, invalid state transitions all handled gracefully
- Audit trail: All events logged with timestamps in database
- Limitations acknowledged: See Safety & Limitations section below
- Visual web dashboard: Real-time workflow visualization with animated agent activity
- Clear CLI interface: --demo, --step, and --test-sla modes for different use cases
- Comprehensive demo script: 3-minute walkthrough in docs/demo-script.md
- Live event streaming: Server-Sent Events (SSE) for real-time updates without page refresh
- Non-technical friendly: Visual indicators and status badges make the workflow accessible to all audiences
- Complete documentation: Architecture docs, implementation guides, test coverage
- Confidence-based validation
  - Critical fields (lot number, total price, finance terms) require ≥80% extraction confidence
  - Low-confidence fields trigger re-extraction or human review flags
  - No auto-approval when uncertain
- State machine guardrails
  - Invalid state transitions are blocked (e.g., can't execute a contract before the buyer signs)
  - Contract version management prevents using superseded versions
  - Only validated contracts proceed to solicitor
- Human-in-the-loop triggers
  - Low-confidence extractions flagged for review
  - Discrepancy alerts require a human decision on amendments
  - SLA overdue alerts escalate to the internal team
  - Humans can intervene at any workflow stage
- Audit trail
  - All events logged with timestamps
  - Complete history of state transitions
  - Traceable decision path for debugging
- LLM dependency
  - Requires API access to Claude Haiku 4.5 via Anthropic (Extractor/Router)
  - Auditor/Comms LLM paths use Qwen3-235B via DeepInfra (requires DEEPINFRA_API_KEY when enabled)
  - Extraction accuracy depends on PDF quality and structure
  - Costs scale with the number of deals processed
- Pattern-based extraction
  - Works best with standard contract formats
  - May struggle with highly unusual document layouts
  - Requires well-formed PDFs (not scanned images without OCR)
- Demo scope
  - Tested with Australian property contracts
  - May need tuning for other jurisdictions or contract types
  - SLA rules currently hardcoded (2 business days after appointment)
- Scalability considerations
  - SQLite database suitable for single-user/demo use
  - Production deployment would need PostgreSQL/MySQL for concurrency
  - No rate limiting on LLM API calls (could hit quotas at high volume)
- Error recovery
  - PDF parsing failures halt processing (no fallback OCR)
  - LLM API failures require manual retry
  - No automatic recovery from database corruption
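The confidence-based validation guardrail described above amounts to a simple gate over the Extractor's output. A sketch, assuming a field -> {value, confidence} shape (the actual Extractor output format may differ):

```python
# Ordered tuple keeps the review list deterministic; threshold per the guardrail above.
CRITICAL_FIELDS = ("lot_number", "total_price", "finance_terms")
CONFIDENCE_THRESHOLD = 0.8

def needs_human_review(extraction: dict) -> list[str]:
    """Critical fields whose extraction confidence falls below the threshold."""
    return [
        f for f in CRITICAL_FIELDS
        if extraction.get(f, {}).get("confidence", 0.0) < CONFIDENCE_THRESHOLD
    ]

sample = {
    "lot_number": {"value": "42", "confidence": 0.95},
    "total_price": {"value": 750000, "confidence": 0.62},  # below 0.8: flag it
    "finance_terms": {"value": "not subject to finance", "confidence": 0.91},
}
print(needs_human_review(sample))  # ['total_price']
```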
- Multi-user authentication and authorization
- Database migration to PostgreSQL with connection pooling
- Rate limiting and retry logic for LLM API calls
- Webhook integration for real-time email processing
- Admin dashboard for monitoring deal pipeline
- Configurable SLA rules per deal type
- Backup/disaster recovery procedures
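The retry logic listed above would typically be exponential backoff with jitter around the LLM API call. A generic sketch, not the project's implementation:

```python
import random
import time

def with_retries(call, max_attempts: int = 3, base_delay: float = 1.0):
    """Invoke `call` (standing in for any LLM API request) with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # back off base, 2x, 4x, ... scaled by jitter in [0.5, 1.5)
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
```

Jitter matters here: without it, many concurrent deals that fail together retry together and hit the rate limit again in lockstep.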
If using Claude Code for implementation, start with:
- CLAUDE.md — Master instructions and critical rules
- agent_docs/ — Detailed implementation guides per agent
MIT