ResearchClawBench: Evaluating AI Agents for Automated Research from Re-Discovery to New-Discovery
Updated Mar 31, 2026 · Jupyter Notebook
Autoresearch with PhD-level workflows and modular agent skills. Built for the autonomous AI Scientist.
Karpathy's autoresearch ported to MLX so you can run it on your Mac.
Lightweight research agents for proposal, experimentation, and review.
Enhance academic workflows by auditing papers, verifying citations, and analyzing experiments with a research integrity plugin for Claude Code.
General-purpose autonomous research framework for AI agents. Inspired by Andrej Karpathy's autoresearch.
Model-agnostic flood model calibration system with automatic parameter optimization using gauge data and satellite-derived inundation extents. Inspired by AutoResearch and AutoResearchClaw.
Run your own research lab that never sleeps.
Optimize AI agents autonomously by iterating code changes and evals to improve performance using LangSmith observability and automated experiments.
Measure AI agents' performance with standardized tests across 314 tasks, 33 domains, and 4 difficulty levels for clear, reproducible comparison.