Convert research papers into runnable experiments automatically.
Paper-to-Code (p2c) is a tool that transforms academic research papers into executable simulation code. It extracts methodologies, equations, and assumptions from papers and generates reproducible Python simulations that users can run and modify.
Paper (PDF/ArXiv)
↓
Fetch & Parse
↓
Extract Content (equations, methods, assumptions)
↓
Generate Code
↓
Execute & Validate
↓
Results + Visualizations
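The flow above can be read as a chain of stages that each take and return a shared state object. The following is a minimal sketch of that idea only, not p2c's actual orchestrator: `PipelineState`, `make_stage`, and `run_pipeline` are hypothetical names, and each stage is a placeholder that merely records that it ran.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineState:
    """Hypothetical container passed between stages (not p2c's real model)."""
    source: str
    log: List[str] = field(default_factory=list)


Stage = Callable[[PipelineState], PipelineState]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records its own name."""
    def stage(state: PipelineState) -> PipelineState:
        state.log.append(name)
        return state
    return stage


# Stage names mirror the boxes in the diagram above.
STAGES: List[Stage] = [
    make_stage(name)
    for name in (
        "fetch_and_parse",
        "extract_content",
        "generate_code",
        "execute_and_validate",
    )
]


def run_pipeline(source: str, stages: List[Stage] = STAGES) -> PipelineState:
    """Run every stage in order, threading the state through each one."""
    state = PipelineState(source=source)
    for stage in stages:
        state = stage(state)
    return state


state = run_pipeline("arxiv:2301.12345")
print(" -> ".join(state.log))
```

Keeping every stage behind the same signature means a stage can be swapped or skipped without touching the others.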
- 🔍 Paper Fetching: Download papers directly from ArXiv or load local files
- 🔬 Content Extraction: Automatically extract equations, methods, and code snippets
- 🤖 Code Generation: Generate Python simulation code from extracted content (LLM-powered)
- 🧪 Experiment Execution: Run simulations with tunable parameters
- 📊 Result Visualization: Generate plots and metrics
- 🎛️ Assumption Management: Easy tweaking of simulation parameters
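As an illustration of the paper-fetching feature, a source string can be classified as either an arXiv identifier or a local path before anything is downloaded. This is a sketch under assumptions: `resolve_source` and `arxiv_pdf_url` are hypothetical helpers, not part of p2c, and only new-style identifiers such as `2301.12345` are recognised (old-style ones like `hep-th/9901001` are not handled).

```python
import re
from pathlib import Path
from typing import Tuple

# New-style arXiv identifiers: YYMM.NNNN or YYMM.NNNNN, optional version suffix.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def resolve_source(source: str) -> Tuple[str, str]:
    """Classify a source string as ("arxiv", id) or ("local", path)."""
    if source.lower().startswith("arxiv:"):
        candidate = source.split(":", 1)[1]
        if not ARXIV_ID.match(candidate):
            raise ValueError(f"not a recognised arXiv identifier: {candidate!r}")
        return "arxiv", candidate
    # Anything without the arxiv: prefix is treated as a file path.
    return "local", str(Path(source).expanduser())


def arxiv_pdf_url(arxiv_id: str) -> str:
    """Build the PDF URL for an arXiv identifier."""
    return f"https://arxiv.org/pdf/{arxiv_id}"


kind, value = resolve_source("arxiv:2301.12345")
print(kind, arxiv_pdf_url(value))
```

Validating the identifier up front lets a malformed input fail with a clear message before any network request is made.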
pip install -r requirements.txt
pip install -e .
pip install -e ".[llm]"
paper2code fetch arxiv:2301.12345
paper2code parse --arxiv 2301.12345
paper2code run arxiv:2301.12345
Or from a local file:
paper2code run /path/to/paper.pdf
p2c/
├── core/                  # Data models and pipeline orchestration
│   ├── models.py          # PaperDocument, Result dataclasses
│   └── pipeline.py        # Pipeline orchestrator
├── fetcher/               # Paper fetching (ArXiv, local files)
│   ├── arxiv_client.py
│   └── loader.py
├── extractor/             # Content extraction from papers
│   └── content_extractor.py
├── codegen/               # Code generation from content
│   └── generator.py
├── executor/              # Experiment execution and results
│   └── runner.py
├── cli/                   # Command-line interface
│   └── main.py
└── config/                # Configuration and defaults
    ├── defaults.yaml
    └── assumptions.py
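The tree lists `PaperDocument` and `Result` dataclasses in `core/models.py` without showing their fields. The sketch below is one plausible shape, offered only to make the data flow concrete: every field is an assumption except `execution_success`, which is the one attribute this README actually reads from a result.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PaperDocument:
    """Assumed shape: a paper plus whatever the extractor found in it."""
    source: str                                   # e.g. "arxiv:2301.12345" or a file path
    title: str = ""
    equations: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)
    assumptions: Dict[str, float] = field(default_factory=dict)


@dataclass
class Result:
    """Assumed shape: generated code plus the outcome of running it."""
    paper: PaperDocument
    generated_code: str = ""
    execution_success: bool = False               # attribute used in this README
    error: Optional[str] = None
    metrics: Dict[str, float] = field(default_factory=dict)


paper = PaperDocument(source="arxiv:2301.12345")
paper.equations.append("x_{t+1} = x_t - \\eta \\nabla f(x_t)")
result = Result(paper=paper)
print("Success!" if result.execution_success else "Failed!")
```

Using `default_factory` for the list and dict fields keeps each instance's collections independent of every other instance.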
Edit p2c/config/assumptions.py to set domain-specific defaults:
DOMAIN_ASSUMPTIONS = {
"optimization": {
"learning_rate": 0.01,
"batch_size": 32,
"num_epochs": 100,
},
"simulation": {
"time_step": 0.01,
"simulation_time": 10.0,
},
}
from p2c.config.assumptions import get_assumptions
# Get domain-specific assumptions
params = get_assumptions("optimization")
pytest
black .
flake8 .
- LLM-powered code generation (currently generates templates)
- Advanced equation parsing and conversion to Python
- Multi-domain support (ML, optimization, physics, statistics)
- Automatic validation of generated code
- Plotting and result visualization
- Parameter sweep and sensitivity analysis
- Results caching and reproducibility tracking
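The parameter-sweep item could build on the per-domain assumption dictionaries shown earlier. Here is a minimal sketch of a grid sweep, assuming a hypothetical `sweep` helper and a stand-in `toy_run` function in place of a real generated simulation; the base values are copied from the "optimization" defaults in this README.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Mapping, Sequence

# Base values copied from the "optimization" entry of DOMAIN_ASSUMPTIONS.
BASE: Dict[str, Any] = {"learning_rate": 0.01, "batch_size": 32, "num_epochs": 100}


def sweep(
    base: Mapping[str, Any],
    grid: Mapping[str, Sequence[Any]],
    run: Callable[[Mapping[str, Any]], float],
) -> List[Dict[str, Any]]:
    """Run `run` once per combination in `grid`, overriding values in `base`."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = {**base, **dict(zip(names, values))}
        results.append({"params": params, "metric": run(params)})
    return results


def toy_run(params: Mapping[str, Any]) -> float:
    """Stand-in for a simulation: total samples processed during training."""
    return params["batch_size"] * params["num_epochs"]


for row in sweep(BASE, {"batch_size": [16, 32], "num_epochs": [50, 100]}, toy_run):
    print(row["params"]["batch_size"], row["params"]["num_epochs"], row["metric"])
```

Because each combination gets a fresh copy of the base dictionary, the defaults themselves are never mutated between runs.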
from p2c.fetcher import PaperLoader
from p2c.extractor import ContentExtractor
from p2c.codegen import CodeGenerator
from p2c.executor import ExperimentRunner
# Load paper
loader = PaperLoader()
paper = loader.load_from_arxiv("2301.12345")
# Extract content
extractor = ContentExtractor()
paper = extractor.extract(paper)
# Generate code
codegen = CodeGenerator()
result = codegen.generate(paper)
# Execute
runner = ExperimentRunner()
result = runner.execute(result)
print("Success!" if result.execution_success else "Failed!")
MIT License - see LICENSE file for details
Contributions welcome! Please submit issues and pull requests.
If you use Paper-to-Code in your research, please cite:
@software{paper2code2024,
title={Paper-to-Code: Automated Simulation Generation from Research Papers},
author={Contributors},
year={2024},
}