Skip to content

My solution for OneCorp's email workflow problem statement as part of the StarPlanAI hackathon

License

Notifications You must be signed in to change notification settings

nikjohn7/OneCorp-Multi-Agent-System

Repository files navigation

OneCorp Multi-Agent Contract Workflow System

A multi-agent system that shepherds property deals from Expression of Interest (EOI) to executed contract, automating contract validation, email workflows, and SLA monitoring.

Quick Start

Visual Dashboard (Recommended for Demos)

The easiest way to see the system in action is through the visual web dashboard:

# Clone the repository
git clone <repository-url>
cd onecorp-mas

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up API keys (create .env file)
echo "ANTHROPIC_API_KEY=your_anthropic_api_key_here" > .env   # Extractor/Router (Claude Haiku 4.5)
echo "DEEPINFRA_API_KEY=your_deepinfra_api_key_here" >> .env # Auditor/Comms (Qwen3-235B)

# Launch the visual dashboard
python run_ui.py

The dashboard will open automatically in your browser at http://localhost:5000. Click "Start Demo" to watch the multi-agent workflow execute in real-time.

Dashboard Features:

  • Real-time workflow progress visualization
  • Live agent activity indicators
  • Contract mismatch detection display
  • Email generation tracking
  • SLA monitoring with countdown
  • State transition timeline
  • Event log with timestamps

Command Line Interface

For headless execution or debugging:

# Run the full demo
python -m src.main --demo

# Or run step-by-step
python -m src.main --step eoi
python -m src.main --step contract-v1
python -m src.main --step contract-v2

# Test SLA overdue scenario
python -m src.main --test-sla

# Reset database
python -m src.main --reset

# Run all tests
pytest tests/ -v

# Run end-to-end integration test
pytest tests/test_end_to_end.py -v

Note: You'll need an Anthropic API key for the Extractor and Router (Claude Haiku 4.5, model id claude-haiku-4-5), and a DeepInfra API key for the Qwen3‑235B‑based Auditor and Comms LLM paths.

Project Structure

onecorp-mas/
├── CLAUDE.md              # Instructions for Claude Code
├── AGENTS.md              # Agent architecture guidelines
├── PROGRESS.md            # Task progress tracking
├── QUICKSTART.md          # Quick start guide
├── tasks.md               # Implementation task specifications
├── run_ui.py              # Visual dashboard launcher
├── requirements.txt       # Python dependencies
├── agent_docs/            # Implementation guides for each agent
│   ├── extraction.md      # Extractor agent (PDF parsing, field extraction)
│   ├── comparison.md      # Auditor agent (mismatch detection, risk scoring)
│   ├── emails.md          # Router + Comms agents (classification, generation)
│   ├── state-machine.md   # Orchestrator (states, transitions, SLA)
│   └── testing.md         # Test patterns and fixtures
├── assets/                # Visual assets
│   ├── architecture.svg   # System architecture diagram
│   └── workflow_diagram.jpeg
├── docs/                  # Project documentation
│   ├── INDEX.md           # Documentation index
│   ├── architecture.md    # System design document
│   ├── demo-script.md     # Demo walkthrough guide
│   ├── demo-recording.md  # Recording instructions
│   └── visual-ui-guide.md # UI documentation
├── spec/                  # Problem specification
│   ├── MAS_Brief.md       # Full requirements
│   ├── judging-criteria.md
│   └── transcript.md      # Stakeholder context
├── data/                  # Input data files
│   ├── source-of-truth/   # EOI PDF (source document)
│   ├── contracts/         # Contract PDFs (V1, V2)
│   ├── emails/
│   │   ├── incoming/      # Incoming email files to process
│   │   └── templates/     # Output email templates
│   └── emails_manifest.json
├── ground-truth/          # Expected outputs for validation
│   ├── eoi_extracted.json
│   ├── v1_extracted.json
│   ├── v2_extracted.json
│   ├── v1_mismatches.json
│   └── expected_outputs.json
├── src/                   # Implementation
│   ├── main.py            # Main entry point
│   ├── agents/            # LLM agents
│   │   ├── router.py      # Email classification
│   │   ├── extractor.py   # PDF field extraction
│   │   ├── auditor.py     # Contract comparison
│   │   ├── comms.py       # Email generation
│   │   └── prompts/       # Agent prompt templates
│   ├── orchestrator/      # Workflow coordination
│   │   ├── state_machine.py
│   │   ├── deal_store.py  # SQLite persistence
│   │   └── sla_monitor.py
│   ├── ui/                # Visual web dashboard
│   │   ├── app.py         # Flask server with SSE
│   │   └── templates/     # HTML dashboard
│   └── utils/             # Shared utilities
│       ├── pdf_parser.py
│       ├── email_parser.py
│       └── date_resolver.py
├── tests/                 # Unit and integration tests
├── n8n/                   # Workflow export (if using n8n)
└── prompts/               # Additional prompt files

Understanding the System

Start here:

  1. spec/MAS_Brief.md — Full problem specification
  2. spec/transcript.md — Stakeholder context
  3. docs/architecture.md — Agent design, interactions, and generalizability

For implementation:

  • agent_docs/ — Technical guides for building each agent (pattern-based, not hardcoded)

The Challenge

OneCorp processes property contracts through a complex workflow:

EOI signed → Contract received → Validated → Solicitor approved → DocuSign → Executed

Current pain points:

  • Single shared inbox for all communications
  • Manual contract checking (slow, error-prone)
  • Version control across amendments
  • SLA tracking based on appointment dates

Agent Architecture

Visual Diagram: See assets/architecture.svg for a complete visual representation of agents, data flows, and control flows.

Agent Responsibility Model
Router Email classification, deal mapping Claude Haiku 4.5
Extractor PDF field extraction from EOI/contracts Claude Haiku 4.5
Auditor Contract vs EOI comparison, risk scoring Qwen3-235B via DeepInfra
Comms Email generation (solicitor, vendor, alerts) Qwen3-235B via DeepInfra

The Orchestrator (non-LLM) manages state transitions and SLA timers.

Demo Scenario

See docs/demo-script.md for a complete 3-minute demo walkthrough.

The dataset includes:

  • 1 EOI (source of truth)
  • 2 contracts (V1 with 5 errors, V2 corrected)
  • 8 emails covering the full workflow

Demo flow:

  1. V1 contract → 5 mismatches detected → Discrepancy alert
  2. V2 contract → Validated → Sent to solicitor
  3. Solicitor approves → Vendor release email
  4. DocuSign flow → Contract executed
  5. SLA test → Remove buyer-signed email → Alert fires

Testing

# Run all tests
pytest tests/ -v

# Run specific test
pytest tests/test_comparison.py -v

# With coverage
pytest tests/ --cov=src --cov-report=html

Ground Truth Files (Test Fixtures)

Files in ground-truth/ are test fixtures for the demo dataset, not runtime data:

File Purpose
eoi_extracted.json Expected Extractor output for demo EOI
v1_extracted.json Expected Extractor output for demo V1 contract
v2_extracted.json Expected Extractor output for demo V2 contract
v1_mismatches.json Expected Auditor output when comparing V1 to EOI
expected_outputs.json Expected emails/states at each workflow step

Important: Agents should use pattern-based logic, not read these files at runtime. The system must work for ANY property deal, not just the demo.

How This System Meets Judging Criteria

This multi-agent system directly addresses the evaluation criteria outlined in spec/judging-criteria.md:

1. System Design & Architecture

  • Clear agent separation: 4 specialized LLM agents (Router, Extractor, Auditor, Comms) + deterministic Orchestrator
  • Structured communication: Agents exchange typed data through the Orchestrator (not ad-hoc prompting)
  • Visual documentation: Complete architecture diagram showing data/control flows
  • Stable & reliable: Deterministic state machine ensures predictable behavior

2. Collaboration Between Agents

  • Meaningful multi-agent workflow: Router classifies → Extractor parses → Auditor validates → Comms generates
  • Data dependencies: Auditor requires both EOI and contract data from Extractor
  • Coordinated by Orchestrator: State machine enforces workflow rules (e.g., contract can't go to solicitor until validated)
  • Emergent intelligence: Validation accuracy improves through multi-stage processing (extraction + comparison + severity assessment)

3. Creativity & Innovation

  • Hybrid classification: Router uses deterministic pattern matching with LLM fallback for ambiguous cases
  • Confidence scoring: Extractor assigns confidence to fields; low-confidence triggers re-extraction or human review
  • Version superseding: Automatic contract version management (V2 supersedes V1)
  • Semantic comparison: Auditor understands negation ("NOT subject to finance" vs "IS subject to finance")

4. Task Performance

  • Complete end-to-end workflow: EOI → Contract validation → Solicitor approval → DocuSign → Execution
  • Handles error cases: V1 contract with 5 mismatches is correctly rejected and alert generated
  • Self-correcting: V2 corrected contract proceeds smoothly through workflow
  • SLA monitoring: Detects overdue deadlines and generates alerts

5. Real-World Value

  • Solves stated problem: Addresses OneCorp's pain points (manual checking, version control, SLA tracking)
  • Production-ready features: SQLite persistence, email template generation, audit trail
  • Scalable design: Each deal isolated by ID; supports concurrent processing
  • Cost-conscious: Uses smaller models for simple tasks (classification) and larger models for reasoning (comparison)

6. Safety & Reliability

  • Guardrails implemented:
    • Confidence threshold (≥0.8) for critical fields (lot number, price, finance terms)
    • Human escalation for low-confidence extractions
    • Version superseding prevents using outdated contracts
    • Only validated contracts sent to solicitor
    • SLA alerts prevent deals from stalling
  • Error handling: PDF parsing failures, LLM API errors, invalid state transitions all handled gracefully
  • Audit trail: All events logged with timestamps in database
  • Limitations acknowledged: See Safety & Limitations section below

7. Presentation & User Experience

  • Visual web dashboard: Real-time workflow visualization with animated agent activity
  • Clear CLI interface: --demo, --step, --test-sla modes for different use cases
  • Comprehensive demo script: 3-minute walkthrough in docs/demo-script.md
  • Live event streaming: Server-Sent Events (SSE) for real-time updates without page refresh
  • Non-technical friendly: Visual indicators and status badges make the workflow accessible to all audiences
  • Complete documentation: Architecture docs, implementation guides, test coverage

Safety, Guardrails & Limitations

Built-in Safety Mechanisms

  1. Confidence-based validation

    • Critical fields (lot number, total price, finance terms) require ≥80% extraction confidence
    • Low-confidence fields trigger re-extraction or human review flags
    • No auto-approval when uncertain
  2. State machine guardrails

    • Invalid state transitions are blocked (e.g., can't execute contract before buyer signs)
    • Contract version management prevents using superseded versions
    • Only validated contracts proceed to solicitor
  3. Human-in-the-loop triggers

    • Low-confidence extractions flagged for review
    • Discrepancy alerts require human decision on amendments
    • SLA overdue alerts escalate to internal team
    • Human can intervene at any workflow stage
  4. Audit trail

    • All events logged with timestamps
    • Complete history of state transitions
    • Traceable decision path for debugging

Known Limitations

  1. LLM dependency

    • Requires API access to Claude Haiku 4.5 via Anthropic (Extractor/Router)
    • Auditor/Comms LLM paths use Qwen3-235B via DeepInfra (requires DEEPINFRA_API_KEY when enabled)
    • Extraction accuracy depends on PDF quality and structure
    • Costs scale with number of deals processed
  2. Pattern-based extraction

    • Works best with standard contract formats
    • May struggle with highly unusual document layouts
    • Requires well-formed PDFs (not scanned images without OCR)
  3. Demo scope

    • Tested with Australian property contracts
    • May need tuning for other jurisdictions or contract types
    • SLA rules currently hardcoded (2 business days after appointment)
  4. Scalability considerations

    • SQLite database suitable for single-user/demo use
    • Production deployment would need PostgreSQL/MySQL for concurrency
    • No rate limiting on LLM API calls (could hit quotas on high volume)
  5. Error recovery

    • PDF parsing failures halt processing (no fallback OCR)
    • LLM API failures require manual retry
    • No automatic recovery from database corruption

Recommended Production Enhancements

  • Multi-user authentication and authorization
  • Database migration to PostgreSQL with connection pooling
  • Rate limiting and retry logic for LLM API calls
  • Webhook integration for real-time email processing
  • Admin dashboard for monitoring deal pipeline
  • Configurable SLA rules per deal type
  • Backup/disaster recovery procedures

Key Files for Claude Code

If using Claude Code for implementation, start with:

  • CLAUDE.md — Master instructions and critical rules
  • agent_docs/ — Detailed implementation guides per agent

License

MIT

About

My solution for OneCorp's email workflow problem statement as part of the StarPlanAI hackathon

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors