LexCausa: Framework for Causal-Aware Structured Multi-Step Reasoning in Legal Argument Generation


⚠️ Work in Progress - This project is under active development as part of a Master's thesis in Computer Engineering at the University of Naples Federico II.

LexCausa is an AI-powered legal reasoning system for Italian law. It combines Knowledge Graphs (Neo4j), Large Language Models (Groq Cloud), and structured causal reasoning to analyze legal claims, find relevant statutes/precedents, and build logical argumentation chains.

🎯 Features

  • Legal Claim Classification: Automatic claim classification and routing for Civil, Penal, and Administrative law via LLM
  • Domain Router: Lightweight pre-routing agent that classifies claims as CIVILE, PENALE, AMMINISTRATIVO, or ENTRAMBI
  • Hybrid Statute Retrieval: Hybrid search on 4200+ Normattiva statutes using Legal-BERT embeddings + Neo4j fulltext (rank fusion), with strict source/book pre-filtering
  • Citation Graph Expansion (Neo4j CITES): Retrieval expands seed statutes with cited statutes via (:Statute)-[:CITES]->(:Statute) before LLM relevance/applicability filters
  • Progressive Search: Adaptive retrieval that progressively expands results when post-filtering yields too few statutes, with configurable expansion steps and max rounds
  • Pre-Retrieval LLM Filtering: Soft LLM-based relevance filtering for statutes and precedents before they enter the reasoning pipeline (default-YES policy: discards only clearly irrelevant items)
  • Unified Pipeline: Search, Reasoning, Full Pipeline, and DoE all share the same singleton LegalSearchPipeline, ensuring consistency and thread safety
  • Shared Retrieved Context: Both Reasoner and CounterReasoner receive the same retrieved statutes/precedents and build opposing arguments from the same evidence base
  • Planned Iterative Chain Generation: Reasoner and Counter-Reasoner first create an execution plan (3-10 steps), then generate one LLM step per planned objective with anti-repetition and consistency checks
  • Reasoner Agent: Builds structured argumentative chains (Premise → Statute → Precedent → Causal Link → Conclusion) only on the provided knowledge base, with causality classification, precise statute and precedent citations, and a provisional causality bootstrap (plan → taxonomy anchors → enriched KB) before expensive step generation
  • Counter-Reasoner Agent: Generates counter-arguments using the causality taxonomy, selects multiple attacks from the attack pool, assigns attacks per planned step, applies attack blacklisting/feasibility filtering to stabilize generation, enforces claim-fact lock (no inversion of explicit facts), supports abstention flow when opposition is insufficient, and includes second-pass targeted retrieval and step expansion for multi-attack coverage
  • Repetition Detection: Jaccard similarity-based detection (threshold 0.70) prevents duplicate reasoning steps across the chain
  • Polisher-Evaluator Agent: Modular mixin architecture (ConsistencyMixin + ScoringMixin + NLPUtilsMixin + AQAEngineMixin) evaluating the dialectical exchange with consistency checking against Neo4j KB, citation repair, AQA scoring, counter-gate abstention classification, reasoner plausibility locking, and verdict generation
  • Consistency Checker: Verifies statute and precedent citations against Neo4j KB, classifies articles as core/peripheral, repairs mismatches via LLM-constrained rewriting (with verbatim quote validation), and drops unreliable citations
  • AQA (Argument Quality Assessment): Three-dimensional scoring — Cogency (α), NormSupport (β), Semantics (γ) — with configurable weights, active cross-attacks with domain-aware rules, attack-type classification (6 types with per-type damage multipliers), and precedent influence scoring
  • Cross-Attack Computation: Active domain-aware cross-attack engine with severity categorization, NLI contradiction detection via LLM, attack-type classification (contradiction, exception, derogation, extinction, factual_impediment, general_opposition), and configurable damage multipliers
  • Precedent Influence Scoring: ASPIC+ links receive precedent delta based on recency, bindingness (cassazione/appello/tribunale), stance confidence, and semantic similarity
  • ASPIC+ Metagraph Visualization: Interactive SVG frontend component displaying the dialectical meta-graph with PRO/CONTRA columns, curved attack arrows with damage values, chain flow arrows, and detail panel for selected links
  • Attack Text Details: Expandable frontend panel showing full attacker/target text for each active cross-attack with type, multiplier, NLI label, overlap, and damage
  • Centralized Prompt Registry: All 100+ LLM prompts managed in a single prompt_registry.py with typed PromptKey enum — covering classification, routing, filtering, reasoning, counter-reasoning, AQA, NLI, consistency repair, and abstention gate
  • Attack Coverage Scoring: AQA bonus for counter-argumentation breadth — clusters weak reasoner links into axes, measures how many distinct axes are attacked, and applies diminishing-return weights (1st hit 100%, 2nd 30%, 3rd 10%)
  • Counter-Reasoner Abstention Gate: Polisher-Evaluator gate (POLISHER_COUNTER_GATE) that classifies counter-arguments as OPPOSING_STRONG, OPPOSING_LIMITATIVE, AGREEING, or UNCLEAR; weak/agreeing counters trigger abstention, skipping AQA evaluation
  • Reasoner Plausibility Locking: Optional lock (aqa_lock_reasoner_plausibility) that freezes reasoner-side plausibility in A/B DoE tests to isolate counter-reasoner effects
  • Per-Role Model Fallback: Separate fallback chains for Reasoner and Counter-Reasoner (reasoner_model_fallback_aliases / counter_model_fallback_aliases) with independent default temperatures
  • Attack Precondition Evaluation: LLM-based verification of taxonomy preconditions for each attack, with SATISFIED/UNSATISFIED/UNCLEAR status and intra-run caching
  • Counter-Reasoner Second-Pass Retrieval: Targeted additional retrieval triggered when statutes opposing the claim are insufficient, with configurable thresholds and limits
  • Counter Step Expansion: Optional multi-attack step expansion that spawns satellite steps when a single counter-step targets multiple attacks
  • Retrieval Fail-Fast Scope: Thread-local context manager (retrieval_llm_fail_fast_scope) for fast fallback in retrieval filters without full retry
  • Resilient Groq Client: Automatic retry with exponential backoff, dynamic API key discovery (V1..V99), model fallback, model-down cache with configurable TTL; smart error classification (model-down vs. rate-limit vs. transient vs. request-too-large)
  • Caching & Filtering Efficiency: Intra-run caching for legal-context extraction, statute applicability decisions, attack preconditions, fact-lock checks, and plan target alignment, plus claim-context SQLite cache for reusing pre-retrieval outputs across repeated runs
  • Cancellation & Interruptibility: Pipeline stop endpoint and cooperative cancellation propagation across API, agents, and long-running generation/retrieval loops
  • DoE A/B Workflow: Dedicated DoE tab with one shared Reasoner run and two Counter/Evaluator setups (baseline vs. treatment), live A/B switching in the UI, consolidated DoE log/report persistence, and automatic delta analysis
  • DoE Batch Runner: Automated multi-claim batch execution (run_doe_batch.py) with resume capability (--resume), checkpoint recovery (--start-from), selective runs (--only), dry-run mode, abstention classification, and stability metrics
  • DoE Statistical Analysis: Post-hoc analysis script (scripts/analyze_doe_results.py) computing paired t-tests, sign tests, Cohen's d, domain breakdowns, verdict flips, error analysis, and intra-claim consistency from batch CSV results
  • Multi-DoE Framework: Multi-dimensional ablation framework (scripts/run_multi_doe.py + scripts/analyze_multi_doe.py) supporting RQ1 model efficacy (reasoning vs non-reasoning models, paired t-test, Cohen's d), RQ2 citation faithfulness (valid citations / total, bootstrap CI, sign test), RQ3 planning ablation (enable_planning toggle on both Reasoner and Counter-Reasoner, token/quality deltas), and token cost analysis; runs via the frontend Multi-DoE tab or as a standalone CLI; containerized via Dockerfile + compose.yml for HPC deployment
  • Causality Taxonomy: Structured causality taxonomy (Material, Legal, Concurrent) used by Reasoner and Counter-Reasoner for arguments and attacks
  • Knowledge Graph: Neo4j database with statutes, precedents, and causal relationships
  • Centralized Configuration: All parameters (150+ settings: models, retries, AQA weights, search, truncation, attack params, coverage, abstention, per-role fallback, domain-specific weights, etc.) managed by src/config.py (Pydantic Settings) and environment variables
  • Frontend Settings Panel: Collapsible panel to configure per-step LLM model, temperature, max tokens, search parameters, AQA weights, chain min/max steps, and attack parameters — without touching code
  • Per-Claim Pipeline Logging: Every pipeline run is logged to logs/<timestamp>_<slug>.log for full auditability
  • DoE and PDF Artifacts: DoE runs can persist a consolidated A/B log + JSON report, and exported PDFs are saved automatically under logs/pdf_exports/<pipeline|doe>/ in addition to browser download
  • Pre-Retrieval Claim Context Memory (SQLite): Optional cache of final applicable statutes and precedents per claim (reusable across Search/Reasoner/Counter/Pipeline runs and warmable via script)
  • React Frontend: Modern five-tab interface (Search, Reasoning, Full Pipeline, DoE A/B, Multi-DoE) with ASPIC+ Metagraph visualization, attack details, PDF export, and A/B comparison controls on Vite + React 19
  • Live Pipeline Streaming: Real-time phase progress, token streaming for chain generation (including retry attempts), refinement phase control events (reasoner_refinement_started/completed) with live chain reset/replace behavior in the frontend, plus SSE endpoints for full pipeline, standalone Counter-Reasoner, and standalone Evaluator
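
The Repetition Detection feature above compares reasoning steps with Jaccard similarity at a 0.70 threshold. A minimal sketch of that check follows. The word-level tokenization, the function names, and the use of `>=` at the threshold are assumptions for illustration, not the project's actual implementation.

```python
import re


def token_set(text: str) -> set[str]:
    """Lowercased word tokens (the tokenization rule is an assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Size of the token intersection divided by the size of the union."""
    ta, tb = token_set(a), token_set(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


def is_repetition(candidate: str, previous_steps: list[str],
                  threshold: float = 0.70) -> bool:
    """True if the candidate step overlaps too much with any earlier step."""
    return any(jaccard(candidate, step) >= threshold for step in previous_steps)
```

A generator loop could call `is_repetition(new_step, chain_so_far)` before accepting a step and request a regeneration when it returns `True`.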

🏗️ Agent and Pipeline Architecture


┌─────────────────────────────────────────────────────────────────────────┐
│                       Frontend (React + Vite)                           │
│ Search │ Reasoning │ Full Pipeline │ DoE (A/B) │ Multi-DoE │ ⚙️ Settings │
│      + ASPIC+ Metagraph SVG + Attack Details + PDF Export               │
└─────────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                     Flask API Server (:8000)                            │
├─────────────────────────────────────────────────────────────────────────┤
│  GET  /health              → Health check with API version              │
│  GET  /api/settings        → Defaults & available models                │
│  POST /api/chat            → LegalSearchPipeline (unified retrieval)    │
│  POST /api/chat/stream     → Search SSE live streaming                  │
│  POST /api/reason          → Reasoner (iterative chain generation)      │
│  POST /api/reason/stream   → Reasoner SSE live streaming                │
│  POST /api/counter_reason  → Counter-Reasoner (iterative counter-chain) │
│  POST /api/counter_reason/stream → Counter-Reasoner SSE                 │
│  POST /api/pipeline        → Full Pipeline (Router→Reasoner→Counter→AQA)│
│  POST /api/pipeline/stream → Full Pipeline SSE token streaming          │
│  POST /api/pipeline/stop   → Stop active SSE pipeline run               │
│  POST /api/evaluate        → Polisher-Evaluator (standalone evaluation) │
│  POST /api/evaluate/stream → Polisher-Evaluator SSE                     │
│  GET  /api/stats           → Runtime usage stats (calls + tokens)       │
│  POST /api/stats/reset     → Reset runtime usage stats                  │
│  POST /api/doe/log         → Consolidated DoE log/report persistence    │
│  POST /api/doe/advanced/run        → Launch Multi-DoE batch             │
│  GET  /api/doe/advanced/status/<id>→ Poll Multi-DoE run status          │
│  POST /api/pdf/export      → Persist exported PDF artifacts             │
└─────────────────────────────────────────────────────────────────────────┘
                                  │
                    ┌─────────────┴─────────────┐
                    ▼                           ▼
┌──────────────────────────────┐  ┌──────────────────────────────────────┐
│   Resilient Groq Client      │  │  LegalSearchPipeline                 │
│   (groq_client.py)           │  │  (Singleton, thread-safe)            │
├──────────────────────────────┤  ├──────────────────────────────────────┤
│  Dynamic key discovery (V1…N)│  │  ClaimClassifier → book routing      │
│  Model fallback + down cache │  │  Hybrid retrieval (vector + fulltext)│
│  Smart error classification  │  │  Progressive + CITES expansion       │
│  Exponential backoff         │  │  Pre-retrieval LLM filtering         │
│                              │  │  Shared Retrieved Context            │
└──────────────────────────────┘  └────────────────┬─────────────────────┘
                    │                              │
                    │         ┌────────────────────┘
                    ▼         ▼ statutes + precedents
┌─────────────────────────────────────────────────────────────────────────┐
│              Reasoner / Counter-Reasoner / Polisher-Evaluator           │
├─────────────────────────────────────────────────────────────────────────┤
│  Reasoner: iterative primary chain on shared retrieved context (ASPIC+) │
│  Counter-Reasoner: iterative attack chain on shared retrieved context   │
│  Polisher-Evaluator (4 Mixins):                                         │
│    ├─ ConsistencyChecker: KB verification → citation repair/drop        │
│    ├─ ScoringMixin: readability, coherence, argument quality            │
│    ├─ NLPUtilsMixin: Flesch/FOG/SMOG, NLI via LLM                       │
│    └─ AQAEngine: Cogency(α)+NormSupport(β)+Semantics(γ)→verdict         │
│         └─ Cross-attacks (6 types) + precedent influence → net_plaus.   │
└─────────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                   Neo4j Knowledge Base + Taxonomy                       │
├─────────────────────────────────────────────────────────────────────────┤
│  📚 Italian statutes KB (Normattiva): Civil + Penal + Administrative    │
│     (L. 7 agosto 1990, n. 241)                                          │
│  ⚖️  9112 precedent chunks from 792 rulings (ITA-CaseHold)              │
│  📊 768-dim Vector Index (Legal-BERT) on statutes                       │
│  🔗 Statute citation edges: `CITES` from normalized reference fields    │
│  🔗 Causality taxonomy (Material, Legal, Concurrent)                    │
└─────────────────────────────────────────────────────────────────────────┘
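
The `/stream` endpoints in the diagram deliver Server-Sent Events (SSE). As a hedged sketch of how a client might split such a stream into events, the parser below follows the standard SSE framing (`event:` and `data:` fields, blank line as terminator). The event name `reasoner_refinement_started` comes from the feature list; the JSON payload shown in the sample is hypothetical, since the actual payload schema is not documented here.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_name, data) pairs from raw SSE lines."""
    event, data = "message", []  # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, one leading space after the colon is dropped.
            data.append(value[1:] if value.startswith(" ") else value)


# Hypothetical sample stream; the payload shape is an assumption.
sample = [
    "event: reasoner_refinement_started\n",
    'data: {"round": 1}\n',
    "\n",
    ": keep-alive\n",
    "data: hello\n",
    "data: world\n",
    "\n",
]
events = list(parse_sse(sample))
```

A frontend-style consumer would typically reset its displayed chain when it sees `reasoner_refinement_started` and replace it on `reasoner_refinement_completed`.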

📋 Prerequisites

  • Python: 3.11.x or 3.12.x
  • Docker: For running Neo4j
  • Node.js: 18+ (for frontend)
  • Poetry: Python dependency management
  • Groq API Key: Free tier available at console.groq.com

Supported Platforms

  • ✅ macOS (Apple Silicon M1/M2/M3)
  • ✅ macOS (Intel)
  • ✅ Windows 10/11 (x64)
  • ✅ Linux (x64)

🚀 Installation

1. Clone the Repository

git clone https://github.com/Leonard2310/LexCausa.git
cd LexCausa

2. Install Python Dependencies

macOS/Linux:

# Install Poetry if not already installed
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
poetry install --no-root

Windows (PowerShell):

# Install Poetry
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -

# Install dependencies
poetry install --no-root

Or use the project helper target (backend + frontend dependencies):

make setup

3. Configure Environment

Create a .env file in the project root (starting from .env.example):

cp .env.example .env

Then fill in your values:

# Neo4j Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password_here

# Groq Cloud API (keys V1..V99 are discovered dynamically for rotation)
GROQ_API_KEY_V1=your_groq_api_key_here
GROQ_API_KEY_V2=your_second_key_here
...
GROQ_API_KEY_VN=your_nth_key_here

# Notebook benchmark (Groq vs SCOPE)
GROQ_API_KEY=your_groq_api_key_for_notebook
SCOPE_API_KEY=your_scope_api_key
SCOPE_ENDPOINT=https://your-scope-endpoint/v1/chat/completions
SCOPE_EXTRA_HEADERS_JSON={}
SCOPE_HPC_USER=<USERNAME>
SCOPE_HPC_HOST=<HPC_HOST>
SCOPE_HPC_REMOTE_WORKDIR=/ibiscostorage/<USERNAME>/lexcausa_bench

# API Server
API_HOST=0.0.0.0
API_PORT=8000
DEBUG=true

Notebooks use the same .env file as the backend. In VS Code, select the same Poetry interpreter for the notebook kernel.

4. Start Neo4j Database

docker compose up -d

Wait for Neo4j to start (check at http://localhost:7474).

5. Initialize the Knowledge Base

poetry run python src/db/db_orchestrator.py

This will:

  • Create schema (indexes, constraints, graph structure)
  • Load Civil Code, Penal Code, and Administrative Law (L. 241/1990) from *_normattiva.csv with embeddings
  • Build statute citation edges (:Statute)-[:CITES]->(:Statute) from normalized internal references
  • Load ITA-CaseHold precedents metadata (no embeddings)
  • Wait for indexes to come online (vector for statutes, fulltext for precedents)

Use --clean for a full wipe and reload, or --check to inspect database status.

6. Install Frontend Dependencies

cd src/frontend
npm install

7. Run the Application

Recommended one-command startup:

make dev

Stop local stack:

make dev-stop

Manual alternative (two terminals):

# Terminal 1: Start the API server
poetry run python src/api_server.py

# Terminal 2: Start the frontend dev server
cd src/frontend && npm run dev

The frontend will be available at http://localhost:5173 and the API at http://localhost:8000.

Runtime Usage Stats

The backend tracks runtime counters for:

  • API calls by endpoint
  • LLM calls by provider/model/source
  • Prompt/completion/total token usage (when provider metadata is available)

Query current stats:

curl http://localhost:8000/api/stats

Reset counters:

curl -X POST http://localhost:8000/api/stats/reset

Snapshots are persisted under logs/usage_stats/ (latest.json plus per-session files).

📁 Project Structure

LexCausa/
├── src/
│   ├── config.py                  # Centralized configuration (150+ Pydantic Settings)
│   ├── api_server.py              # Flask API server (DoE, SSE, PDF export endpoints)
│   ├── agents/                    # LLM agents with explicit orchestration
│   │   ├── base.py               # Base agent class + progressive search + filters 
│   │   ├── router.py             # Domain router (CIVILE/PENALE/AMMINISTRATIVO/ENTRAMBI)
│   │   ├── reasoner.py           # Iterative reasoning agent (ASPIC+)
│   │   ├── counter_reasoner.py   # Iterative counter-argumentation agent (ASPIC+)
│   │   ├── polisher_evaluator.py # Mixin compositor (Consistency+Scoring+NLP+AQA)
│   │   ├── aspic_formatter.py    # ASPIC+ IR formatting
│   │   ├── evaluation/           # Evaluation sub-package (modular mixins)
│   │   │   ├── models.py         # Dataclasses (CitationCheck, ConsistencyReport, etc.)
│   │   │   ├── consistency_checker.py  # KB verification, citation repair, chain regen
│   │   │   ├── aqa_engine.py     # AQA pipeline, cross-attacks, precedent influence
│   │   │   ├── scoring.py        # Readability, coherence, argument quality scoring
│   │   │   └── nlp_utils.py      # Flesch/FOG/SMOG, NLI via LLM, text utilities
│   │   └── tools/                # Agent tools
│   │       ├── neo4j_tools.py    # Neo4j hybrid search pipeline
│   │       ├── prompt_registry.py # Centralized prompt registry (100+ PromptKey enum)
│   │       ├── taxonomy_tools.py # Causality taxonomy
│   │       ├── config_loader.py  # Taxonomy config loader
│   │       └── config_taxonomy.json
│   ├── services/                  # Core services
│   │   ├── groq_client.py        # Resilient Groq client (dynamic key discovery, rotation)
│   │   ├── claim_classifier.py   # LLM claim classification
│   │   ├── claim_context_memory.py # SQLite pre-retrieval claim context cache
│   │   ├── pipeline_control.py   # Cooperative cancellation primitives/exceptions
│   │   ├── usage_stats.py        # Runtime API/LLM usage stats collector
│   │   └── legal_search.py       # Hybrid legal search pipeline (vector + fulltext fusion)
│   ├── db/                        # Database management
│   │   ├── db_orchestrator.py    # Full DB lifecycle (clean/schema/load/verify)
│   │   └── data_loader.py        # Centralized data loading (CSV/parquet + statute embeddings)
│   ├── data/                      # Data files
│   │   ├── embeddings/           # Pre-computed embeddings (.npy)
│   │   ├── precedents/           # ITA-CaseHold precedents (parquet)
│   │   └── statutes/              # Civil + Penal + Administrative CSVs (`*_normattiva.csv`)
│   └── frontend/                  # React frontend (Vite + React 19)
│       └── src/
│           ├── App.jsx            # Main app with Search/Reasoning/Pipeline/DoE tabs
│           ├── AspicMetagraph.jsx # ASPIC+ meta-graph SVG visualization
│           └── AttackTextDetails.jsx # Cross-attack detail panel
├── scripts/                       # Utility scripts
│   ├── analyze_doe_results.py    # Statistical analysis of DoE A/B batch results
│   ├── analyze_multi_doe.py      # Statistical analysis for Multi-DoE (RQ1/RQ2/RQ3)
│   ├── run_multi_doe.py          # Multi-DoE CLI orchestrator
│   ├── start_backend_doe_mode.sh # Docker startup helper for DoE runs
│   ├── capture_api_chat_retrieval_memory.py  # Claim context memory warmup
│   ├── start_public_demo.sh      # Cloudflare tunnel public demo launcher
│   ├── tune_aqa_real_plus_synth.py  # AQA tuning (real + synthetic)
│   ├── tune_aqa_with_gold_dataset.py  # AQA tuning (gold dataset)
│   └── tune_retrieval_claims.py  # Supervised retrieval tuning
├── experiments/                   # DoE experiments
│   ├── doe/
│   │   ├── doe_settings.json     # DoE A/B configuration
│   │   └── scripts/
│   │       └── run_doe_batch.py   # Automated DoE A/B batch runner
│   └── multi_doe/
│       └── README.md             # Multi-DoE documentation and quick start
├── notebooks/                     # Normattiva extractors + embeddings notebooks
├── logs/                          # Pipeline/DoE logs, reports, AQA artifacts, exported PDFs
├── Dockerfile                     # Multi-stage image for containerized DoE runs
├── .dockerignore                  # Docker build context exclusions
├── compose.yml                    # Docker Compose (Neo4j + Flask API)
├── pyproject.toml                 # Poetry configuration
└── README.md

🔧 Configuration

All configuration is managed through environment variables and the src/config.py Settings class (150+ parameters total). Runtime-tunable settings (model, temperature, max tokens, search parameters, AQA weights, chain steps, attack parameters) can also be adjusted from the frontend Settings panel without restarting the server. To keep this README stable across tuning changes, defaults are intentionally not duplicated here.

Required (.env)

These variables must be set in the .env file:

Variable              Description
NEO4J_URI             Neo4j connection URI
NEO4J_USER            Neo4j username
NEO4J_PASSWORD        Neo4j password
GROQ_API_KEY_V1       Primary Groq API key
GROQ_API_KEY_V2…VN    Additional Groq API keys, optional (dynamic discovery V1…V99)
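
To illustrate what dynamic discovery of `GROQ_API_KEY_V1`…`V99` could look like, here is a small sketch that scans the environment for matching variable names. The function name, the regular expression, and the choice to order keys by index are assumptions; the real logic lives in `src/services/groq_client.py` and may differ.

```python
import os
import re
from typing import Mapping

# Matches GROQ_API_KEY_V1 .. GROQ_API_KEY_V99 (index range checked below).
_KEY_PATTERN = re.compile(r"^GROQ_API_KEY_V(\d{1,2})$")


def discover_groq_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return non-empty keys ordered by their numeric suffix (hypothetical helper)."""
    found: list[tuple[int, str]] = []
    for name, value in env.items():
        match = _KEY_PATTERN.match(name)
        if match and value.strip():
            index = int(match.group(1))
            if 1 <= index <= 99:
                found.append((index, value.strip()))
    found.sort(key=lambda item: item[0])
    return [value for _, value in found]
```

With this approach, adding another key only requires a new `GROQ_API_KEY_V<n>` line in `.env`, with no code change.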

Optional — LLM & Server

These can be overridden in .env or via the frontend Settings panel:

  • Groq client behavior (retry/backoff, key rotation, model-down cache TTL)
  • LLM generation knobs (temperature, max tokens)
  • API runtime settings (host/port/debug)
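
As a rough illustration of the retry/backoff behavior listed above, the sketch below retries a callable with exponentially growing delays. `TransientError`, the parameter names, and the default values are hypothetical; they do not reflect the actual settings in `src/config.py`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def backoff_delays(max_retries: int, base: float = 1.0,
                   factor: float = 2.0, cap: float = 30.0) -> list[float]:
    """Delays base, base*factor, base*factor^2, ... each capped at `cap`."""
    return [min(cap, base * factor ** i) for i in range(max_retries)]


def call_with_retry(fn: Callable[[], T], max_retries: int = 4,
                    base: float = 1.0, factor: float = 2.0, cap: float = 30.0,
                    jitter: float = 0.1,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn; on TransientError wait and retry, up to max_retries retries."""
    for delay in backoff_delays(max_retries, base, factor, cap):
        try:
            return fn()
        except TransientError:
            # Jitter spreads out retries from concurrent callers.
            sleep(delay + random.uniform(0.0, jitter * delay))
    return fn()  # final attempt; any exception now propagates to the caller
```

In a client that also classifies errors, only rate-limit and transient failures would be raised as retryable, while model-down or request-too-large errors would take a different path (for example, model fallback).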

Model catalog and aliases (gpt_oss_120b, gpt_oss_20b, qwen_qwen3_32b, groq_llama_3_3_70b_versatile) are code-defined in src/config.py.

Optional — Embedding & Search

Configurable areas:

  • Embedding model and tokenization limits
  • Search breadth (top_k, top classified libri, precedent limits)
  • Hybrid retrieval tuning (vector/fulltext weights, candidate pool sizes, priority decay, keyword bonus)
  • Domain-specific hybrid search weights (search_vector_weight_civile, search_fulltext_weight_penale, etc.) and keyword bonus thresholds per domain
  • Citation expansion tuning (SEARCH_CITES_ENABLED, per-seed limit, max added, score decay, multi-seed bonus)
  • Progressive search thresholds/steps for expansion rounds
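
The citation expansion knobs above (per-seed limit, max added, score decay, multi-seed bonus) can be pictured with an in-memory sketch. The real retrieval follows `(:Statute)-[:CITES]->(:Statute)` edges in Neo4j; here the graph is a plain dictionary, and the function name, parameter names, defaults, and scoring formula are all assumptions made for illustration.

```python
def expand_with_citations(seed_scores: dict[str, float],
                          cites: dict[str, list[str]],
                          per_seed_limit: int = 3,
                          max_added: int = 10,
                          score_decay: float = 0.5,
                          multi_seed_bonus: float = 0.1) -> dict[str, float]:
    """Add statutes cited by the seeds, scored below the seeds that cite them."""
    added: dict[str, float] = {}
    hits: dict[str, int] = {}
    for seed, score in seed_scores.items():
        for target in cites.get(seed, [])[:per_seed_limit]:
            if target in seed_scores:
                continue  # already retrieved directly
            hits[target] = hits.get(target, 0) + 1
            # A cited statute inherits a decayed share of its best seed's score.
            added[target] = max(added.get(target, 0.0), score * score_decay)
    for target, count in hits.items():
        # Statutes cited by several seeds receive a small bonus.
        added[target] += multi_seed_bonus * (count - 1)
    top = sorted(added.items(), key=lambda kv: kv[1], reverse=True)[:max_added]
    return {**seed_scores, **dict(top)}
```

The expanded set would then pass through the LLM relevance and applicability filters like any directly retrieved statute.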

Pre-retrieval claim context memory (SQLite):

  • Optional per-claim cache for the final applicable statutes and precedents produced by prepare_claim_context(...)
  • Reused by /api/chat, /api/reason, /api/counter_reason, and full pipeline endpoints when enabled
  • Can be toggled from the frontend Pipeline input (Use memory / Overwrite memory)
  • Can be pre-populated in batch using scripts/capture_api_chat_retrieval_memory.py --claim-context-memory
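
A minimal sketch of a per-claim SQLite cache with "use" and "overwrite" semantics is shown below. The table layout, the key normalization, and the class name are hypothetical; the project's own implementation is in `src/services/claim_context_memory.py` and may store the context differently.

```python
import hashlib
import json
import sqlite3
from typing import Optional


def claim_key(claim: str) -> str:
    """Stable key: lowercase, collapse whitespace, then hash (an assumption)."""
    normalized = " ".join(claim.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ClaimContextCache:
    """Hypothetical cache of retrieved statutes/precedents per claim."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS claim_context ("
            "claim_key TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def get(self, claim: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT payload FROM claim_context WHERE claim_key = ?",
            (claim_key(claim),),
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, claim: str, context: dict, overwrite: bool = False) -> None:
        # Without overwrite, an existing entry is kept untouched.
        verb = "INSERT OR REPLACE" if overwrite else "INSERT OR IGNORE"
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                f"{verb} INTO claim_context (claim_key, payload) VALUES (?, ?)",
                (claim_key(claim), json.dumps(context)),
            )
```

The `overwrite` flag mirrors the idea behind the "Overwrite memory" toggle: a cached context is reused by default and replaced only on request.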

Debug support:

  • Existing backend/pipeline logs print, for each retrieved statute: vector_rank_score, fulltext_rank_score, fusion_score, keyword_bonus, priority_multiplier.
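
One plausible way the logged fields could relate to each other is sketched below using weighted reciprocal rank fusion. This is an assumption: the README only states that vector and fulltext results are combined by rank fusion, so the formula, the constant `k=60`, the default weights, and the way `keyword_bonus` and `priority_multiplier` are applied are illustrative, not the project's actual scoring.

```python
from typing import Optional


def rank_scores(ranked_ids: list[str], k: int = 60) -> dict[str, float]:
    """Reciprocal-rank score 1 / (k + rank), with rank starting at 1."""
    return {doc_id: 1.0 / (k + rank)
            for rank, doc_id in enumerate(ranked_ids, start=1)}


def fuse(vector_ranked: list[str], fulltext_ranked: list[str],
         vector_weight: float = 0.6, fulltext_weight: float = 0.4,
         keyword_bonus: Optional[dict[str, float]] = None,
         priority_multiplier: Optional[dict[str, float]] = None,
         k: int = 60) -> list[tuple[str, float]]:
    """Combine two rankings into one list sorted by final score (descending)."""
    vec = rank_scores(vector_ranked, k)      # vector_rank_score per statute
    full = rank_scores(fulltext_ranked, k)   # fulltext_rank_score per statute
    final: dict[str, float] = {}
    for doc_id in set(vec) | set(full):
        fusion_score = (vector_weight * vec.get(doc_id, 0.0)
                        + fulltext_weight * full.get(doc_id, 0.0))
        bonus = (keyword_bonus or {}).get(doc_id, 0.0)
        multiplier = (priority_multiplier or {}).get(doc_id, 1.0)
        final[doc_id] = (fusion_score + bonus) * multiplier
    # Ties are broken by id so the output order is deterministic.
    return sorted(final.items(), key=lambda kv: (-kv[1], kv[0]))
```

A statute that appears in both rankings accumulates both contributions, which is why it tends to rise above statutes found by only one branch.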

Supervised Retrieval Tuning (Claims Gold Labels)

The script scripts/tune_retrieval_claims.py supports supervised tuning using gold labels in claims_gold_labels.json.

Query-term extraction for the fulltext branch is LLM-only (no salient fallback).

Metrics used:

  • Hit@5, Hit@10
  • MRR
  • nDCG@10
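
For reference, these metrics can be computed as in the sketch below, which uses their standard definitions with binary relevance (a statute is either in the gold set or not). The function names are illustrative, and the tuning script may compute them differently, for instance with graded relevance.

```python
import math


def hit_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if any gold statute appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first gold statute, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """MRR: the reciprocal rank averaged over all claims."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, g) for r, g in runs) / len(runs)


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG@k; assumes `ranked` has no duplicate ids."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1)
              if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

Hit@k and nDCG@k are averaged across claims in the same way as MRR to obtain the reported figures.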

Run examples:

# supervised (default if claims_gold_labels.json is present)
poetry run python scripts/tune_retrieval_claims.py --top-k 30

# force proxy-only fallback (unsupervised)
poetry run python scripts/tune_retrieval_claims.py --unsupervised

# fulltext query terms are LLM-extracted (LLM-only mode)
poetry run python scripts/tune_retrieval_claims.py --query-terms-mode llm

# disable progress bars (CI/log-friendly)
poetry run python scripts/tune_retrieval_claims.py --query-terms-mode llm --no-progress

Output report:

  • logs/tuning/retrieval_tuning_<timestamp>.json
  • logs/tuning/retrieval_tuning_latest.json

Optional — AQA (Argument Quality Assessment)

Configurable areas:

  • AQA enable/disable and weight triplet (α, β, γ)
  • Verdict thresholds and top-K attack retention
  • Norm support scoring strategy
  • Attack gating/tuning (semantic overlap, strength ratio, damage factor, cross-codice flags)
  • Attack coverage scoring (enabled flag, similarity/overlap thresholds, min attack value, bonus weight/max/diminishing weights)
  • Reasoner plausibility locking for DoE A/B isolation (aqa_lock_reasoner_plausibility)
  • Precedent recency window and dominant-attack reporting limits
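
To make the weight triplet concrete, the sketch below combines the three AQA dimensions as a normalized weighted sum. Treating the combination as linear, normalizing the weights, and restricting each dimension to [0, 1] are assumptions for illustration; the actual engine in `aqa_engine.py` also accounts for cross-attacks and precedent influence, which are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AQAWeights:
    alpha: float  # weight of Cogency
    beta: float   # weight of NormSupport
    gamma: float  # weight of Semantics

    def normalized(self) -> "AQAWeights":
        """Rescale the weights so that they sum to 1."""
        total = self.alpha + self.beta + self.gamma
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        return AQAWeights(self.alpha / total, self.beta / total,
                          self.gamma / total)


def aqa_score(cogency: float, norm_support: float, semantics: float,
              weights: AQAWeights) -> float:
    """Weighted combination of the three dimensions, each assumed in [0, 1]."""
    for name, value in (("cogency", cogency),
                        ("norm_support", norm_support),
                        ("semantics", semantics)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    w = weights.normalized()
    return w.alpha * cogency + w.beta * norm_support + w.gamma * semantics
```

For example, `aqa_score(1.0, 0.5, 0.0, AQAWeights(2, 1, 1))` weights Cogency at one half and the other two dimensions at one quarter each.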

Optional — Chain Generation

Configurable areas:

  • Retry and robustness controls for chain generation
  • Planned step bounds (min/max)
  • Counter-Reasoner second-pass retrieval (enabled flag, thresholds, limits)
  • Counter step expansion (enabled flag, min attacks, satellite limits)
  • Model-down cache behavior used by the resilient Groq client
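
The model-down cache can be thought of as a small time-to-live (TTL) map: once a model is marked as down, it is skipped until the TTL expires. The class below is a hypothetical sketch of that idea; the names, the default TTL, and the injectable clock are assumptions, while the model aliases in the usage note come from the catalog mentioned earlier.

```python
import time
from typing import Callable, Optional


class ModelDownCache:
    """Remember models that recently failed and skip them for a while."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behavior can be tested
        self._down_until: dict[str, float] = {}

    def mark_down(self, model: str) -> None:
        self._down_until[model] = self._clock() + self._ttl

    def is_down(self, model: str) -> bool:
        until = self._down_until.get(model)
        if until is None:
            return False
        if self._clock() >= until:
            del self._down_until[model]  # TTL expired, try the model again
            return False
        return True

    def first_available(self, candidates: list[str]) -> Optional[str]:
        """First model in the fallback chain not currently marked down."""
        return next((m for m in candidates if not self.is_down(m)), None)
```

With a fallback chain such as `["gpt_oss_120b", "gpt_oss_20b"]`, marking the first alias as down makes `first_available` return the second until the TTL elapses.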

For the authoritative, always-updated list of settings, env aliases, and descriptions, see src/config.py.

🌐 Public Demo (Cloudflare Tunnel)

For a quick public URL without deploying to a cloud server, use scripts/start_public_demo.sh.

Prerequisites

Install cloudflared once (shown here with Homebrew):

brew install cloudflared

Run one demo instance

bash scripts/start_public_demo.sh

The script starts:

  • Flask API (src/api_server.py) on 127.0.0.1:8000
  • Vite frontend on 127.0.0.1:3000
  • Cloudflare Quick Tunnel

It prints a temporary public URL like https://...trycloudflare.com.

Run multiple isolated instances

# Instance A
API_PORT=8000 FRONTEND_PORT=3000 INSTANCE_NAME=you bash scripts/start_public_demo.sh

# Instance B (different terminal)
API_PORT=8001 FRONTEND_PORT=3001 INSTANCE_NAME=colleague bash scripts/start_public_demo.sh

Each terminal gets its own trycloudflare.com URL.

Script arguments

bash scripts/start_public_demo.sh --instance colleague --api-port 8001 --frontend-port 3001 --host 127.0.0.1

Stop

Press Ctrl+C in the tunnel terminal.
The script stops backend + frontend processes for that instance.

Troubleshooting

Port XXXX is already in use:

pkill -f "python.*src/api_server.py" || true
pkill -f "node .*vite" || true
pkill -f "cloudflared tunnel --url" || true

Blocked request. This host (...) is not allowed:

  • The script already handles this by setting --http-host-header.
  • If you run components manually, use:
cloudflared tunnel --url http://127.0.0.1:3000 --http-host-header 127.0.0.1:3000

🧪 Agent & Pipeline Development Status

🚧 In Progress

  • Export reasoning chains to structured formats (JSON-LD, RDF)
  • Extended claim-level caching: persist additional pipeline artifacts (e.g., classification/stance/evaluation outputs) beyond pre-retrieval statutes/precedents
  • Explainability layer: generate a final, user-facing explanation of the reasoning and verdict (with traceable links to retrieved statutes/precedents and attack outcomes)

📋 Planned

  • Extend statutory/procedural coverage with additional corpora: c.p.a., c.p.p., c.p.c., labour law corpus, health liability corpus, consumer law corpus, military disciplinary corpus, and industrial property corpus
  • LLM memory layer: maintain conversational context across pipeline steps to improve coherence and reduce redundant reasoning
  • Full argumentation framework (Dung-style grounded semantics visualization)
  • Multi-turn dialogue with context retention
  • Multi-jurisdiction support: extend statutory coverage to additional countries and decouple the framework from Italy-specific statutes (pluggable statute loaders + jurisdiction configs)

📄 License

This project is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).
See the LICENSE file for details.

You are free to share and redistribute the material for non-commercial purposes, with attribution.
You may not use it for commercial purposes, nor distribute modified versions.

📚 References

  • Normattiva (statutes source): All statutes used in LexCausa (Civil Code, Penal Code, Administrative Law 241/1990) are extracted from Normattiva and stored in *_normattiva.csv. Official portal: https://www.normattiva.it.
  • ITA-CaseHold (precedents source): The ITA-CaseHold dataset, used for legal precedent extraction and summarization in this project, was introduced by Licari et al. at ICAIL 2023. Publication: https://doi.org/10.1145/3594536.3595177.

👤 Authors

Leonardo Catello (@Leonard2310)
Email: leonardo.catello@hotmail.com

Salvatore Maione (@salvatore22maione)
Email: salvatore22maione@gmail.com

Supervisors

Prof. Roberto Pietrantuono (@rpietrantuono)
PhD Cristian Mascia (@CristianMascia)

🧾 Citation

If you use LexCausa in academic work, please cite the repository.
See CITATION.cff.

This project is part of a Master's thesis at the University of Naples Federico II and is not intended for production legal use.

About

MSc Thesis Project – Framework for Causality-Aware Structured Multi-Step Reasoning in Legal Argument Generation – AI Systems Engineering, supervised by Prof. R. Pietrantuono and PhD Cristian Mascia (2026)
