This repository is the official implementation of *SpecMAS: A Multi-Agent System for Self-Verifying System Generation via Formal Model Checking*, accepted at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025).
Below is a one-line description of each code file and directory used to build our system.
- agent_templates - Prompt templates for the General Purpose Agentic Graph Node.
- api_server - A FastAPI-based AI agent system with HuggingFace and LangChain integrations, featuring:
  - LLM-powered reasoning & code execution: uses HuggingFace models (via the Llama.cpp API) for multi-step reasoning and code generation
  - Advanced workflows: supports ReWOO and Plan-and-Execute strategies for complex problem-solving
  - RAG/CRAG capabilities: implements Retrieval-Augmented Generation with document search and corrective multi-hop reasoning
  - Model checking integration: includes a NuSMV planner and tools for Kripke structure extraction and property verification
  - Streaming interface: provides real-time SSE streaming of agent thought processes and results
  - Conversation memory: maintains chat history with buffer memory and thread management

  The system combines LLM inference, code execution, knowledge retrieval, and formal verification in a unified API for interactive AI-powered workflows.
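As a minimal illustration of the streaming interface, agent events can be serialized into the `text/event-stream` (SSE) wire format before being sent to the client. The helper below is a hypothetical sketch, not the repo's actual code:

```python
import json
from typing import Optional

def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one agent event into the SSE wire format.

    Each SSE message is a `data:` line (a JSON payload here) terminated
    by a blank line; an optional `event:` field names the event type.
    """
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"

# Example: stream one agent "thought" event.
chunk = format_sse({"step": 1, "thought": "Extract the Kripke structure"},
                   event="thought")
```

A streaming endpoint would yield such chunks one at a time as the agent produces intermediate thoughts and final results.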
- crag_agent - Information Retriever Agent - implements a CRAG workflow over Arxiv/local PDFs with a Chroma vectorstore and Tavily web search, featuring:
  - Hybrid RAG: combines Arxiv papers & local PDFs (loaded with PyPDFLoader)
  - Relevance grading: filters documents with LLM-based binary relevance scoring
  - Query rewriting: optimizes questions for better retrieval
  - Graph-based pipeline: StateGraph workflow with conditional edges for document grading and web search
  - HuggingFace embeddings: uses gte-large-en-v1.5 for document vectorization

  Automatically switches between knowledge retrieval, web search, and answer generation for comprehensive question answering.
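The corrective routing at the heart of a CRAG workflow can be sketched as follows. Here `grade` stands in for the LLM-based binary relevance scorer, and all function names are illustrative, not the repo's actual API:

```python
def route_crag(question, documents, grade):
    """Decide the next CRAG step from binary relevance grades.

    Keeps only documents the grader marks relevant; if none survive,
    the query is flagged for rewriting followed by a web search.
    """
    relevant = [d for d in documents if grade(question, d) == "yes"]
    if relevant:
        return "generate", relevant        # answer from retrieved docs
    return "rewrite_and_web_search", []    # corrective fallback path

# Toy grader: a document is "relevant" if it shares a word with the question.
toy_grade = lambda q, d: "yes" if set(q.lower().split()) & set(d.lower().split()) else "no"

step, docs = route_crag(
    "model checking with NuSMV",
    ["NuSMV performs symbolic model checking", "cooking recipes"],
    toy_grade,
)
# step == "generate"; only the NuSMV document survives grading
```

In the actual pipeline this decision corresponds to a conditional edge in the StateGraph, with the grader backed by an LLM rather than a keyword heuristic.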
- knowledge_search_templates - Prompt templates for the Knowledge Search tool.
- Exp_Utils - Processes AI model evaluations for system specification tasks, computes weighted scores, and generates performance visualizations, featuring:
  - Score aggregation: calculates composite metrics (structural alignment, property/semantic fidelity)
  - Count normalization: normalizes error counts across models (counterexamples, minor issues)
  - Compliance merging: combines quality and execution scores into final weighted metrics
  - Visual analytics: debugging-steps comparison bar charts, dual-axis spec-quality vs. compliance slope graphs, and radar charts for domain-specific model performance
  - Domain categorization: evaluates models across 6 technical domains (protocols, concurrency, memory management, etc.)

  Exports CSV tables and PNG visualizations for detailed model comparison and compliance analysis.
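A plausible form of this score aggregation is sketched below; the weights, field names, and normalization scheme are illustrative assumptions, not the paper's actual values:

```python
def composite_score(structural, semantic, counterexamples, max_counterexamples,
                    w_struct=0.4, w_sem=0.4, w_compliance=0.2):
    """Combine quality sub-scores and a normalized error count into one metric.

    `structural` and `semantic` are in [0, 1]; the counterexample count is
    normalized against the worst model so fewer errors means higher compliance.
    """
    compliance = 1.0 - (counterexamples / max_counterexamples
                        if max_counterexamples else 0.0)
    return w_struct * structural + w_sem * semantic + w_compliance * compliance

score = composite_score(structural=0.9, semantic=0.8,
                        counterexamples=2, max_counterexamples=8)
# 0.4*0.9 + 0.4*0.8 + 0.2*(1 - 2/8) = 0.83 (up to float rounding)
```

Normalizing counts against the worst-performing model keeps the compliance term comparable across benchmarks of different difficulty.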
- Extraction_templates - Prompt templates for the Kripke, Properties, and NuSMV Agents.
- mrag_templates - Prompt templates for Multi-Hop RAG, used by the Knowledge Search tool.
- node_definitions - Implements NuSMV/Kripke/property verification agents with LangChain-based orchestration and LLM-powered planning, featuring:
  - Modular planners: Kripke structure extraction, property specification, and NuSMV execution pipelines
  - StateGraph orchestration: conditional execution of planning steps with ReWOO and Plan-and-Execute strategies
  - Tool integration: executes NuSMV, reads/writes to the filesystem, and uses RAG (Chroma + HuggingFace embeddings)
  - Comprehensive verification: tracks structural alignment, semantic fidelity, and execution compliance
  - Visualization tools: generates bar charts, slope graphs, and radar plots for model performance comparison

  Manages formal verification workflows by coordinating LLM planning, tool execution, and compliance scoring, with support for SOP-based specification extraction and model validation.
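NuSMV reports each checked property on a line of the form `-- specification <formula>  is true|false`, so a minimal parser for the execution pipeline's pass/fail results might look like this (a hedged sketch, not the repo's actual parser):

```python
import re

# Matches NuSMV result lines, e.g.
#   "-- specification AG (req -> AF ack)  is true"
SPEC_LINE = re.compile(r"^-- specification (.+?)\s+is (true|false)\s*$")

def parse_nusmv_output(output: str) -> dict:
    """Map each verified property to True/False from NuSMV's stdout."""
    results = {}
    for line in output.splitlines():
        m = SPEC_LINE.match(line.strip())
        if m:
            results[m.group(1)] = (m.group(2) == "true")
    return results

sample = """\
-- specification AG (req -> AF ack)  is true
-- specification AG !deadlock  is false
-- as demonstrated by the following execution sequence
"""
results = parse_nusmv_output(sample)
```

Counting the `false` entries in such a result map is one way the number of counterexamples per model could feed into the compliance scoring.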
- tool_templates - Prompt templates for tools used by the Kripke, Properties and NuSMV Agent.
- verification_templates - Includes templates for general question answering and domain-specific checks with configurable response formats and evaluation criteria.
Detailed results are described in the supplementary document.
All benchmark files, including the Standard Operating Procedure (SOP) documents and expert-written SMV files, are in the Mini SpecBench folder.
