This repository is the official implementation of *SpecMAS: A Multi-Agent System for Self-Verifying System Generation via Formal Model Checking*, accepted at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025).
Below is a one-line description of each code file and directory used to build our system.
- agent_templates - Prompt templates for the General Purpose Agentic Graph Node.
- api_server - A FastAPI-based AI agent system with HuggingFace and LangChain integrations, featuring:
  - LLM-powered reasoning & code execution: uses HuggingFace models (via the Llama.cpp API) for multi-step reasoning and code generation
  - Advanced workflows: supports ReWOO and Plan-and-Execute strategies for complex problem-solving
  - RAG/CRAG capabilities: implements Retrieval-Augmented Generation with document search and corrective multi-hop reasoning
  - Model checking integration: includes a NuSMV planner and tools for Kripke structure extraction and property verification
  - Streaming interface: provides real-time SSE streaming of agent thought processes and results
  - Conversation memory: maintains chat history with buffer memory and thread management

  The system combines LLM inference, code execution, knowledge retrieval, and formal verification in a unified API for interactive AI-powered workflows.
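As a minimal illustration of the streaming interface, agent events can be serialized into the `text/event-stream` (SSE) wire format before being sent to the client. The helper below is a hypothetical sketch, not the repo's actual code:

```python
import json
from typing import Optional

def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one agent event into the SSE wire format.

    Each SSE message is a `data:` line (a JSON payload here) terminated
    by a blank line; an optional `event:` field names the event type.
    """
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"

# Example: stream one agent "thought" event.
chunk = format_sse({"step": 1, "thought": "Extract the Kripke structure"},
                   event="thought")
```

A streaming endpoint would yield such chunks one at a time as the agent produces intermediate thoughts and final results.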
- crag_agent - Information Retriever Agent - implements a CRAG workflow over Arxiv/local PDFs with a Chroma vectorstore and Tavily web search, featuring:
  - Hybrid RAG: combines Arxiv papers & local PDFs (loaded with PyPDFLoader)
  - Relevance grading: filters documents with LLM-based binary relevance scoring
  - Query rewriting: optimizes questions for better retrieval
  - Graph-based pipeline: StateGraph workflow with conditional edges for document grading and web search
  - HuggingFace embeddings: uses gte-large-en-v1.5 for document vectorization

  Automatically switches between knowledge retrieval, web search, and answer generation for comprehensive question answering.
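The corrective routing at the heart of a CRAG workflow can be sketched as follows. Here `grade` stands in for the LLM-based binary relevance scorer, and all function names are illustrative, not the repo's actual API:

```python
def route_crag(question, documents, grade):
    """Decide the next CRAG step from binary relevance grades.

    Keeps only documents the grader marks relevant; if none survive,
    the query is flagged for rewriting followed by a web search.
    """
    relevant = [d for d in documents if grade(question, d) == "yes"]
    if relevant:
        return "generate", relevant        # answer from retrieved docs
    return "rewrite_and_web_search", []    # corrective fallback path

# Toy grader: a document is "relevant" if it shares a word with the question.
toy_grade = lambda q, d: "yes" if set(q.lower().split()) & set(d.lower().split()) else "no"

step, docs = route_crag(
    "model checking with NuSMV",
    ["NuSMV performs symbolic model checking", "cooking recipes"],
    toy_grade,
)
# step == "generate"; only the NuSMV document survives grading
```

In the actual pipeline this decision corresponds to a conditional edge in the StateGraph, with the grader backed by an LLM rather than a keyword heuristic.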
- knowledge_search_templates - Prompt templates for the Knowledge Search tool.
- Exp_Utils - Processes AI model evaluations for system specification tasks, computes weighted scores, and generates performance visualizations, featuring:
  - Score aggregation: calculates composite metrics (structural alignment, property/semantic fidelity)
  - Count normalization: normalizes error counts across models (counterexamples, minor issues)
  - Compliance merging: combines quality and execution scores into final weighted metrics
  - Visual analytics: debugging-steps comparison bar charts, dual-axis spec-quality vs. compliance slope graphs, and radar charts for domain-specific model performance
  - Domain categorization: evaluates models across 6 technical domains (protocols, concurrency, memory management, etc.)

  Exports CSV tables and PNG visualizations for detailed model comparison and compliance analysis.
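A plausible form of this score aggregation is sketched below; the weights, field names, and normalization scheme are illustrative assumptions, not the paper's actual values:

```python
def composite_score(structural, semantic, counterexamples, max_counterexamples,
                    w_struct=0.4, w_sem=0.4, w_compliance=0.2):
    """Combine quality sub-scores and a normalized error count into one metric.

    `structural` and `semantic` are in [0, 1]; the counterexample count is
    normalized against the worst model so fewer errors means higher compliance.
    """
    compliance = 1.0 - (counterexamples / max_counterexamples
                        if max_counterexamples else 0.0)
    return w_struct * structural + w_sem * semantic + w_compliance * compliance

score = composite_score(structural=0.9, semantic=0.8,
                        counterexamples=2, max_counterexamples=8)
# 0.4*0.9 + 0.4*0.8 + 0.2*(1 - 2/8) = 0.83 (up to float rounding)
```

Normalizing counts against the worst-performing model keeps the compliance term comparable across benchmarks of different difficulty.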
- Extraction_templates - Prompt templates for the Kripke, Properties, and NuSMV Agents.
- mrag_templates - Prompt templates for Multi-Hop RAG, used by the Knowledge Search tool.
- node_definitions - Implements NuSMV/Kripke/property verification agents with LangChain-based orchestration and LLM-powered planning, featuring:
  - Modular planners: Kripke structure extraction, property specification, and NuSMV execution pipelines
  - StateGraph orchestration: conditional execution of planning steps with ReWOO and Plan-and-Execute strategies
  - Tool integration: executes NuSMV, reads/writes to the filesystem, and uses RAG (Chroma + HuggingFace embeddings)
  - Comprehensive verification: tracks structural alignment, semantic fidelity, and execution compliance
  - Visualization tools: generates bar charts, slope graphs, and radar plots for model performance comparison

  Manages formal verification workflows by coordinating LLM planning, tool execution, and compliance scoring, with support for SOP-based specification extraction and model validation.
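NuSMV reports each checked property on a line of the form `-- specification <formula>  is true|false`, so a minimal parser for the execution pipeline's pass/fail results might look like this (a hedged sketch, not the repo's actual parser):

```python
import re

# Matches NuSMV result lines, e.g.
#   "-- specification AG (req -> AF ack)  is true"
SPEC_LINE = re.compile(r"^-- specification (.+?)\s+is (true|false)\s*$")

def parse_nusmv_output(output: str) -> dict:
    """Map each verified property to True/False from NuSMV's stdout."""
    results = {}
    for line in output.splitlines():
        m = SPEC_LINE.match(line.strip())
        if m:
            results[m.group(1)] = (m.group(2) == "true")
    return results

sample = """\
-- specification AG (req -> AF ack)  is true
-- specification AG !deadlock  is false
-- as demonstrated by the following execution sequence
"""
results = parse_nusmv_output(sample)
```

Counting the `false` entries in such a result map is one way the number of counterexamples per model could feed into the compliance scoring.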
- tool_templates - Prompt templates for tools used by the Kripke, Properties and NuSMV Agent.
- verification_templates - Includes templates for general question answering and domain-specific checks with configurable response formats and evaluation criteria.
Detailed results are described in the supplementary document.
All benchmark files, including the Standard Operating Procedure (SOP) documents and expert-written SMV files, are in the Mini SpecBench folder.
