
StatsClaw

Collaborative AI for Statistical Packages


A workflow framework for statistical package development.

An open-source tool that helps researchers build, test, and document statistical software packages with AI agent teams.

Website · Roadmap · Contributing · Discussions


What is StatsClaw?

StatsClaw is a framework that uses AI agent teams to assist with statistical package development. You describe what you need — a bug fix, a new feature, a cross-language translation — and StatsClaw coordinates multiple AI agents to help you build, test, and document the result. It works best when a domain expert stays in the loop to guide decisions.

StatsClaw ships in two interchangeable runtimes — pick whichever coding CLI you already use:

| Runtime | Repo | For users of |
| --- | --- | --- |
| Claude Code (original) | statsclaw/statsclaw | Anthropic's Claude Code |
| Codex CLI (port) | statsclaw/statsclaw-codex | OpenAI's Codex CLI |

Both runtimes implement the same 9-agent protocol, the same three-pipeline adversarial verification, and the same shared brain — a contribution from a Codex user benefits Claude Code users, and vice versa.

How It Works

StatsClaw orchestrates a team of 9 specialized AI agents, each operating under strict information isolation:

| Agent | Role |
| --- | --- |
| Leader | Orchestrates the workflow, dispatches agents, enforces isolation |
| Planner | Reads your paper/formulas, executes the deep comprehension protocol, produces specifications |
| Builder | Writes source code from spec.md (never sees the test spec) |
| Tester | Validates independently from test-spec.md (never sees the code spec) |
| Simulator | Runs Monte Carlo studies from sim-spec.md (never sees either spec) |
| Scriber | Documents architecture, generates tutorials, maintains the audit trail |
| Distiller | Extracts reusable knowledge for the shared brain (brain mode only) |
| Reviewer | Cross-checks all pipelines, audits tolerance integrity, issues a ship/no-ship verdict |
| Shipper | Commits, pushes, opens PRs, handles package distribution |

The code, test, and simulation pipelines are fully isolated — they never see each other's specs. If all pipelines converge independently, confidence in correctness is high. This is adversarial verification by design.
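
A toy illustration of the idea (invented names and tolerances, not StatsClaw's actual code): a result ships only when independently produced answers agree.

```python
# Toy sketch of adversarial convergence: each pipeline produces its own
# answer from its own spec; the reviewer ships only when they agree.
def reviewer_verdict(builder_out, tester_ref, simulator_mean,
                     test_tol=1e-6, sim_tol=1e-2):
    agrees_with_tests = abs(builder_out - tester_ref) < test_tol
    agrees_with_sims = abs(builder_out - simulator_mean) < sim_tol
    return "SHIP" if agrees_with_tests and agrees_with_sims else "BLOCK"

print(reviewer_verdict(0.4999991, 0.5, 0.503))  # SHIP
print(reviewer_verdict(0.52, 0.5, 0.503))       # BLOCK: tests disagree
```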

Supported Languages

R · Python · Stata · TypeScript · Go · Rust · C · C++

More languages coming — Julia is next! Want another? Let us know.


Quick Start

Prerequisites

  1. A coding CLI — either Claude Code or OpenAI Codex CLI
  2. GitHub access — Push access to your target repository
  3. Workspace repo — A GitHub repo for storing workflow artifacts (auto-created if needed)

Option 1 — Claude Code (Install as Plugin, Recommended)

Install StatsClaw as a plugin so the leader agent takes over when you're working on your statistical package:

```bash
cd ~/my-r-package
claude

# Inside Claude Code:
/plugin marketplace add statsclaw/statsclaw
/plugin install statsclaw@statsclaw --scope project
```

Scope recommendation: Use --scope project so StatsClaw only activates inside this specific package. If you install at the default user scope, the leader agent takes over every Claude Code session across all projects, which may be too intrusive if you also use Claude Code for non-StatsClaw work (normal coding, debugging, etc.).

| Scope | When to use |
| --- | --- |
| `--scope project` | Recommended. Activates StatsClaw only in this package (shared with collaborators via .claude/settings.json) |
| `--scope local` | Same as project, but local to your machine only (not committed) |
| `--scope user` (default) | Activates StatsClaw in every Claude Code session across all projects |

To temporarily disable without uninstalling: `/plugin disable statsclaw@statsclaw`. Also available in the Claude Plugin Directory.

Option 2 — Codex CLI (One-liner)

Prefer OpenAI's Codex CLI? StatsClaw is ported verbatim — same 9 agents, same three-pipeline architecture, same brain. See statsclaw/statsclaw-codex.

```bash
curl -fsSL https://raw.githubusercontent.com/statsclaw/statsclaw-codex/main/install-remote.sh | bash
```

Then in a new terminal:

```bash
codex
```

And inside Codex:

```
/plugins
install statsclaw
```

Two-step install (if you don't like `curl | bash`):

```bash
git clone https://github.com/statsclaw/statsclaw-codex ~/.codex/plugins/statsclaw
bash ~/.codex/plugins/statsclaw/install.sh
```

Option 3 — Clone the Repo (Alternative)

Clone statsclaw/statsclaw, run Claude Code inside it, then point it at your target repo:

```bash
git clone https://github.com/statsclaw/statsclaw.git
cd statsclaw && claude
```

Then inside Claude Code, say `work on <your-repo-url>`.

Also works with Claude Desktop App (open the cloned statsclaw folder) or your IDE (open folder, run claude in terminal).

Your First Task

Just tell StatsClaw what you want. It auto-detects the language, selects the right workflow, and starts working:

```
work on https://github.com/your-org/your-package resolve the issues
```

StatsClaw will ask clarifying questions when it encounters ambiguity — your domain expertise guides the process. Results vary with task complexity; expect to iterate.


Learn by Example

We provide four examples from our own usage. Each is a real repository you can inspect and learn from. Your mileage may vary — these represent what worked for us with active researcher involvement.

Example 1: Iterative Refactoring of an Existing Package (1→2)

Repo: statsclaw/example-fect

What it demonstrates: Multi-day, researcher-guided refactoring of an R package for causal panel data

The Task

The fect R package (fixed effects counterfactual estimators) needed simultaneous work on six interdependent fronts: structural refactoring, CV unification, C++ convergence conditioning, plot overhaul, a 12-chapter Quarto user manual, and bug fixes.

Prompt:

```
work on https://github.com/statsclaw/example-fect resolve the issues,
and document any related files to my workspace
```

What Happened

Over 5 days and ~20 workflow runs, with ~10 substantive researcher interactions guiding the process, StatsClaw helped:

  • Grow the test suite from 131 to 590 tests (100% pass rate)
  • Make 33+ commits on the cfe branch
  • Catch 11+ bugs through adversarial BLOCK signals — several would have been hard to find manually
  • Surface a balanced-panel algebraic degeneracy through behavioral testing
  • Improve C++ EM convergence accuracy through component-wise monitoring
  • Produce a 12-chapter Quarto book with DGPs and companion scripts

The researcher's domain decisions (e.g., "use interaction(region, time)", "reduce tau to 1") were essential — the system could not have made these calls on its own.

Key StatsClaw Features Demonstrated
  • Adversarial BLOCK signals as discovery: The balanced-panel degeneracy was found purely by the Tester's behavioral test — neither the Planner nor Builder anticipated it
  • Sequential validation: Phase 1 is verified before Phase 2 starts, preventing errors from compounding across interdependent changes
  • Cross-session continuity: 5 days of coherent development thanks to automatic HANDOFF.md
  • Book-derived tests: Quarto chapter examples became test cases (+98 tests)
  • Mathematical comprehension: The convergence fix required understanding the IFE decomposition $Y = \mu\mathbf{1}\mathbf{1}' + \alpha\mathbf{1}' + \mathbf{1}\xi' + F\Lambda' + \varepsilon$

Example 2: Building a Python Package from R Source (0→1)

Repo: statsclaw/example-R2PY

What it demonstrates: Building a Python package from an R reference, with researcher review at each stage

The Task

Build a Python equivalent of the interflex R package for conditional marginal effect estimation, starting from zero Python code. No formal math document provided; the R source is the only specification. Required 3 rounds of researcher feedback to get right.

Prompt:

```
Read https://github.com/xuyiqing/interflex This is one R package.
Now build a Python package from it. Only translate the estimator = "linear"
part and all related functions into the new Python package. Ship everything
to statsclaw/example-R2PY and related documents into my workspace.
```

What Happened

In 3 iterative rounds with ~150 words of user input:

Round 1 — Initial Construction:

  • Planner reverse-engineered all contracts from R source (no math doc existed)
  • Builder implemented 14-module dispatcher architecture
  • Tester built 34 independent validation tests

Round 2 — API Refinement:

  • User requested `import interflex; interflex(data, ...)` (a callable module pattern)
  • Required DML as a mandatory dependency, using the Python DoubleML package
  • Builder implemented a `types.ModuleType` subclass to make the module callable (see the sketch below)
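
The callable-module trick is a standard Python pattern. A minimal sketch, with illustrative names rather than interflex's actual internals:

```python
# interflex/__init__.py -- illustrative sketch, not the shipped code
import sys
import types

def estimate(data, *args, **kwargs):
    """Stand-in for the package's real estimation entry point."""
    raise NotImplementedError

class _CallableModule(types.ModuleType):
    def __call__(self, data, *args, **kwargs):
        # Makes `import interflex; interflex(data, ...)` dispatch here.
        return estimate(data, *args, **kwargs)

# Swapping a module object's class is supported since Python 3.5.
sys.modules[__name__].__class__ = _CallableModule
```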

Round 3 — Quality Audit:

  • Found 6 bugs in code that passed all 34 tests
  • 2 bugs would have produced silently wrong statistical results (both sketched below):
    • Bootstrap inference silently fell through to the delta method (`elif vartype == "bootstrap": pass`)
    • HC1 sandwich mis-grouping: in Python, `*` and `@` share precedence and associate left to right, so the unparenthesized expression grouped differently than intended
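
Both failure modes are easy to reproduce in miniature. The snippets below are hypothetical reconstructions of the two bugs, not the actual interflex code:

```python
import numpy as np

# Bug 1 (reconstruction): the bootstrap branch does nothing, so callers
# silently receive delta-method standard errors instead.
def standard_errors(vartype, delta_se):
    se = delta_se                 # delta method computed as the default
    if vartype == "bootstrap":
        pass                      # BUG: bootstrap never runs
    return se

# Bug 2: `*` and `@` share precedence and associate left to right,
# so an unparenthesized sandwich "meat" groups the wrong way.
rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=n)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ beta) ** 2                   # squared residuals

meat = X.T @ (e2[:, None] * X)             # correct: sum_i e_i^2 x_i x_i'
# Wrong: X.T @ e2 * X parses as (X.T @ e2) * X, a broadcast, not the meat.
bread = np.linalg.inv(X.T @ X)
hc1 = n / (n - k) * bread @ meat @ bread   # HC1 covariance
print(np.sqrt(np.diag(hc1)))               # robust standard errors
```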

Final product: 14 modules, ~3,500 lines, 34 tests, 10-chapter Quarto book. The audit step was critical — without it, 2 silent bugs would have shipped.

Key StatsClaw Features Demonstrated
  • Specification recovery: Planner extracted implicit contracts from R source — what each function guarantees, what invariants hold, what edge cases exist
  • Cross-language translation: Handled R/Python idiom differences (formula interface, factor handling, freq_weights routing)
  • Bug discovery beyond testing: 6 bugs in passing code caught by Reviewer's cross-pipeline audit
  • Iterative specification emergence: The full spec emerged through 3 rounds of dialogue, not upfront documentation

Example 3: Paper to Package with Monte Carlo Simulation

Repo: statsclaw/example-probit

What it demonstrates: PDF manuscript → R/C++ package + Monte Carlo simulation (three-pipeline architecture)

The Task

Implement three probit estimation methods from a 4-page PDF specification in C++ via Rcpp/Armadillo, then run a comprehensive Monte Carlo study comparing all three.

Prompt:

```
Build the R works from this PDF. Three probit estimation methods in C++ via
Rcpp/Armadillo: MLE (Newton-Raphson), Bayesian Gibbs sampler (Albert-Chib
data augmentation), and random-walk Metropolis-Hastings. After building, run
a Monte Carlo simulation comparing all three on bias, RMSE, coverage, and
computational speed across N={200,500,1000,5000} with 500 replications per
scenario. Show all results. Target repo: statsclaw/example-probit. Ship it.
Save the documents in my workspace.
```

What Happened

StatsClaw activated Workflow 11 (Simulation Study) — the full three-pipeline architecture:

  1. Planner ingested the PDF, comprehended all three estimation methods (Newton-Raphson MLE, Albert-Chib Gibbs with truncated normals, random-walk MH with proposal tuning), and produced three independent specifications

  2. Builder (from spec.md only) implemented three C++ functions using Armadillo, with R wrappers providing a consistent API

  3. Tester (from test-spec.md only) validated the MLE against `glm(family = binomial(link = "probit"))` at 1e-6 tolerance, tested Bayesian posterior convergence, and verified edge cases

  4. Simulator (from sim-spec.md only) ran 6,000 total simulations (4 sample sizes × 500 reps × 3 methods), producing comparison tables for bias, RMSE, coverage, and speed

The Simulator treated the estimator as a black box — calling the implementation without knowing how it was built, providing a third independent verification pathway.
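
For intuition, here is a minimal Python analogue of the Simulator's black-box harness (the actual study ran in R; `estimator` and `dgp` are placeholder callables):

```python
import numpy as np

def mc_study(estimator, dgp, theta_true, n, reps=500, seed=0):
    """Black-box Monte Carlo: call `estimator` without any knowledge of
    its internals and summarize bias, RMSE, and 95% CI coverage."""
    rng = np.random.default_rng(seed)
    draws, covered = [], []
    for _ in range(reps):
        y, X = dgp(n, theta_true, rng)       # simulate one dataset
        theta_hat, se = estimator(y, X)      # opaque implementation
        draws.append(theta_hat)
        covered.append(abs(theta_hat - theta_true) <= 1.96 * se)
    draws = np.asarray(draws)
    return {
        "bias": float(draws.mean() - theta_true),
        "rmse": float(np.sqrt(((draws - theta_true) ** 2).mean())),
        "coverage": float(np.mean(covered)),
    }
```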

Key StatsClaw Features Demonstrated
  • Paper-to-package pipeline: PDF equations → working C++ code through the comprehension protocol
  • Three-pipeline architecture: Code, test, and simulation pipelines all isolated — a bug must fool all three
  • C++ integration: Rcpp/Armadillo compilation, export attributes, numerical stability (overflow in Phi, Mills ratio near tails; see the sketch below)
  • Monte Carlo as verification: Simulation results independently confirm theoretical properties (unbiasedness, coverage, efficiency)
  • Reduced manual effort: Much of the boilerplate (Rcpp setup, simulation harness, documentation) was handled by the agents
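
The tail-stability issue is the classic one: a naive pdf/cdf Mills ratio underflows to 0/0 deep in the lower tail. A Python sketch of the standard log-space stabilization (the shipped fix lives in the package's C++ code):

```python
import numpy as np
from scipy.special import log_ndtr  # log(Phi(x)), stable deep in the tail

def inverse_mills(x):
    """phi(x) / Phi(x) without the 0/0 underflow a naive pdf/cdf hits."""
    log_phi = -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)
    return np.exp(log_phi - log_ndtr(x))

print(inverse_mills(np.array([-40.0, 0.0, 5.0])))
# norm.pdf(-40) / norm.cdf(-40) = 0/0 = nan; this returns ~40.02
```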

Example 4: Paper-Driven Feature Addition (Paper→Feature)

Repo: statsclaw/example-panelView

What it demonstrates: Reading a methodology paper (Correia 2016) to design a new feature, with prerequisite refactoring

The Task

Add a `type = "network"` visualization to the panelView R package: build a bipartite graph from the panel's observation matrix (units × time periods), highlight singletons (degree-1 nodes), draw convex hulls around connected components, and support k-partite graphs for multi-way FE. Based on Figure 2 of Correia (2016).
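
To make the construction concrete, here is a minimal Python sketch using networkx (the real feature targets R and igraph; names are illustrative):

```python
import numpy as np
import networkx as nx

def panel_network(obs):
    """Bipartite graph from a units-by-periods 0/1 observation matrix."""
    G = nx.Graph()
    units = [f"unit_{i}" for i in range(obs.shape[0])]
    periods = [f"period_{j}" for j in range(obs.shape[1])]
    G.add_nodes_from(units, part="unit")     # one node shape/color
    G.add_nodes_from(periods, part="time")   # the other shape/color
    rows, cols = np.nonzero(obs)             # edge = "unit observed in period"
    G.add_edges_from((units[i], periods[j]) for i, j in zip(rows, cols))
    singletons = [v for v, d in G.degree() if d == 1]  # degree-1 nodes
    components = list(nx.connected_components(G))      # convex-hull groups
    return G, singletons, components

obs = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]])
_, singles, comps = panel_network(obs)
print(singles)     # unit_1, unit_2, period_0, period_2 all have degree 1
print(len(comps))  # 2 connected components
```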

Prompt:

```
Read this paper: Correia (2016), "A Feasible Estimator for Linear Models
with Multi-Way Fixed Effects". I want panelView to make figures like
Figure 2 and identify degree-1 nodes (singletons). Add type = "network":
build a bipartite graph from the panel's observation matrix. Units and
time periods as differently shaped/colored nodes, edges = "unit observed
in period". Highlight singletons, draw convex hulls around connected
components. Support 2+ sets of fixed effects (k-partite). igraph in
Suggests, not Imports. Plan first.
```

What Happened

The Planner comprehended Correia (2016) and identified that the existing codebase — a monolithic 37KB file — needed refactoring before a new feature could be safely added. StatsClaw made the judgment to refactor first:

  1. Refactored the monolith into 4 focused modules (`plot-treat.R`, `plot-outcome.R`, `plot-bivariate.R`, plus a dispatch core)
  2. Added a test suite covering all plot types, sub-modes, and both formula and explicit-variable interfaces
  3. Fixed 3 ggplot2 deprecation bugs (`size` → `linewidth`, unsafe `class()` checks)
  4. Replaced the stale vignette with a 5-chapter Quarto manual
  5. Added ARCHITECTURE.md (13.9 KB) documenting the dispatch architecture and 8-stage data pipeline
  6. CRAN prep: zero errors, zero warnings in `R CMD check`, version bumped to v1.2.1
  7. Produced specs for the `type = "network"` feature as a clean new module

Key StatsClaw Features Demonstrated
  • Paper comprehension → feature design: The Planner read a methodology paper not to implement the estimator, but to extract a visualization concept from its exposition
  • Proactive refactor-first judgment: The system identified refactoring as a prerequisite before the user asked for it
  • Test coverage from zero: Added tests for a package that previously had none
  • CRAN compliance: Handled warnings, DESCRIPTION fields, stale vignettes systematically

What Does the Workspace Look Like?

Repo: statsclaw/example-workspace

Every StatsClaw run automatically generates structured process records — comprehension artifacts, dual specifications, audit trails, run logs, and handoff documents. These are synced to a dedicated workspace repository, keeping your target repos clean. See example-workspace for the actual artifacts produced during all four examples above, including:

  • comprehension.md — auditable evidence the system understood your methodology before writing code
  • spec.md / test-spec.md / sim-spec.md — independent specifications (Builder, Tester, Simulator never see each other's)
  • review.md — cross-pipeline convergence audit with per-test results
  • HANDOFF.md — cross-session continuity (each new session picks up where the last left off)

What Can StatsClaw Help With?

| Task | How it helps | Limitations |
| --- | --- | --- |
| Implementing methods | Assists with translating specs into code | Requires researcher to validate mathematical correctness |
| Cross-language translation | Handles R↔Python idiom differences | May miss subtle numerical edge cases without careful review |
| Testing & validation | Independent test pipeline catches bugs ordinary tests miss | Empirical verification, not formal proofs |
| Monte Carlo studies | Automates simulation harness and reporting | Researcher must design meaningful DGPs and metrics |
| Paper-driven features | Reads methodology papers to design new functionality | Extracts concepts, not full estimator implementations |
| Bug fixing | Adversarial architecture helps find hidden bugs | Complex domain bugs still need human insight |
| Documentation | Generates Quarto books, API docs | Needs researcher review for accuracy |

Shared Brain — Collective Knowledge

StatsClaw has a shared knowledge system where techniques discovered during workflows — mathematical methods, coding patterns, validation strategies, simulation designs — are extracted, privacy-scrubbed, and contributed to a collective knowledge base. When you enable Brain mode, your agents get smarter by reading knowledge contributed by all users.

How it works:

  1. Read — Your agents automatically access relevant knowledge entries from statsclaw/brain
  2. Contribute — After noteworthy workflows, the distiller agent extracts reusable knowledge. You review everything and approve or decline — nothing is shared without your explicit consent
  3. Earn badges — Accepted contributions earn virtual badges on the Contributors leaderboard

Privacy guarantee: All contributions are automatically scrubbed of repo names, file paths, usernames, proprietary code, and any identifying information. Only generic, reusable knowledge is shared.

| Repo | Purpose |
| --- | --- |
| statsclaw/brain | Curated knowledge — agents read from here |
| statsclaw/brain-seedbank | Contribution staging — users submit PRs here |

Brain mode is optional — you choose at session start. See the Brain System Documentation for full details.

Example Prompts

```
# Fix a specific issue
fix issue #42 in my-package

# Build from scratch
build a Python package from this R code

# Cross-language migration
rewrite the Python backends in pure R and ship it

# Simulation study
run a Monte Carlo study comparing these three estimators

# Paper to package
build the R works from this PDF

# Paper-driven feature
read Correia (2016) and add network visualization to panelView

# Documentation
update the documentation for v2.0

# Enable shared knowledge
enable brain

# Share what you learned
/contribute
```

Citation

If you use StatsClaw in your research or software development, please cite:

Qin, Tianzhu and Yiqing Xu. 2026. "StatsClaw: An AI-Collaborative Workflow for Statistical Software Development." arXiv preprint arXiv:2604.04871.

```bibtex
@article{qinxu2026statsclaw,
  title={StatsClaw: An AI-Collaborative Workflow for Statistical Software Development},
  author={Qin, Tianzhu and Xu, Yiqing},
  year={2026},
  eprint={2604.04871},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  doi={10.48550/arXiv.2604.04871},
  url={https://arxiv.org/abs/2604.04871},
  note={arXiv:2604.04871 [cs.SE]}
}
```

License

The StatsClaw framework is released under the MIT License. Brain knowledge entries are shared under CC BY 4.0.

Get Involved

We are building StatsClaw in the open. Everyone is welcome.


Loading…