SkillFoundry

Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources

Shuaike Shen*, Wenduo Cheng*, Mingqian Ma, Alistair Turcan, Martin Jinye Zhang, Jian Ma†

Ray & Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University

[*Equal contribution · †Correspondence: jianma@cs.cmu.edu]

Overview

Modern scientific ecosystems are rich in procedural knowledge — repositories, APIs, scripts, notebooks, documentation, databases, and papers — yet much of this knowledge remains fragmented and difficult for agents to operationalize. SkillFoundry bridges this gap with a self-evolving framework that converts heterogeneous scientific resources into validated, reusable agent skills.

_{Figure 1. SkillFoundry framework overview: from domain knowledge tree to validated skill library.}

Key Results


267+ skills	mined across 28 scientific domains and 254 subdomains
71.1% novelty	vs. existing skill libraries (SkillHub, SkillSMP)
5/6 datasets improved	on MoSciBench benchmark
Genomics boost	substantial gains on two challenging genomics tasks

How It Works

SkillFoundry uses a domain knowledge tree as both a search prior and the evolving structure being updated, turning open-ended skill collection into a closed-loop acquisition process:

Step	Stage	Description
1	Tree Construction	Build a rooted tree where internal nodes are domains/subdomains and leaves are actionable skill targets
2	Resource Mining	Select focus branches and retrieve relevant resources (repos, APIs, papers, notebooks, databases)
3	Skill Compilation	Extract operational contracts and compile into reusable skill packages with metadata, dependencies, and tests
4	Multi-Level Validation	Apply execution testing, system testing, and synthetic-data testing
5	Tree Expansion	Insert validated skills as new leaves, expanding domain coverage
6	Refinement & Loop	Revise, merge, or prune failing/redundant skills; repeat from step 2

Repository Structure

SkillFoundry/
├── skillfoundry/             # Core automation framework (Python package)
│   ├── cli.py                #   CLI entry point
│   ├── orchestrator.py       #   Skill automation orchestrator
│   ├── campaign.py           #   Long-running campaign runner
│   ├── evaluation.py         #   Hierarchical skill evaluation
│   └── ...
├── scripts/                  # Utility & validation scripts
├── registry/                 # Taxonomy, resource registry, skill index
├── skills/                   # Reusable skill folders grouped by domain (27 domains)
├── tests/                    # Test suites (smoke, integration, regression)
├── site/                     # Generated project page (static HTML/JS/CSS)
├── ref/                      # Reference materials
└── Makefile                  # Build, validate, test, and smoke targets

Getting Started

Prerequisites

Python 3.10+

Installation

git clone https://github.com/ma-compbio-lab/SkillFoundry.git
cd SkillFoundry
pip install -e .       # Install the skillfoundry package

Quick Validation

make validate        # Validate repository structure
make build-site      # Build the project page
make test            # Run unit tests

Framework Usage

The skillfoundry package provides a CLI for automated skill discovery, compilation, and evaluation. It orchestrates the closed-loop tree_check -> resource_search -> skill_build -> skill_test -> refresh pipeline.

Status

Inspect the current repository summary and identify high-value frontier leaves:

python3 scripts/sciskill_framework.py --json status --focus-limit 10

Cycle

Run one or more automation loops to discover and build new skills:

# Single loop
python3 scripts/sciskill_framework.py cycle --loops 1 --verification-mode standard

# Parallel workers with custom focus
python3 scripts/sciskill_framework.py cycle \
  --loops 2 --focus-limit 12 --stage-workers 4 \
  --stages tree_check,resource_search,skill_build,skill_test,refresh \
  --extra-context "Prioritize uncovered leaves in robotics and physics."

Design Skill

Design a skill from a specific task description:

python3 scripts/sciskill_framework.py design-skill \
  --prompt "Design a skill for literature-backed pathway enrichment benchmarking." \
  --verification-mode validate

Evaluate Skills

Run hierarchical evaluation (correctness repair, benchmarking, novelty checking):

# Single skill
python3 scripts/sciskill_framework.py evaluate-skills \
  --skill-slug openalex-literature-search \
  --verification-mode validate

# Full library
python3 scripts/sciskill_framework.py evaluate-skills --all --verification-mode none

Campaign

Run a long checkpointable campaign targeting specific domains:

python3 scripts/sciskill_framework.py campaign \
  --focus-term genomics --focus-term proteomics \
  --max-iterations 100 --max-runtime-minutes 450 \
  --stage-workers 6 --evaluation-workers 6

Citation

Citation information will be available once the paper is published. Check back later.

License

This project is developed at Ma Lab, Carnegie Mellon University.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SkillFoundry

Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources

Overview

Key Results

How It Works

Repository Structure

Getting Started

Prerequisites

Installation

Quick Validation

Framework Usage

Status

Cycle

Design Skill

Evaluate Skills

Campaign

Citation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
integration		integration
registry		registry
regression		regression
scripts		scripts
site		site
skillfoundry		skillfoundry
skills		skills
tests		tests
.DS_Store		.DS_Store
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml
settings.json		settings.json

Folders and files

Latest commit

History

Repository files navigation

SkillFoundry

Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources

Overview

Key Results

How It Works

Repository Structure

Getting Started

Prerequisites

Installation

Quick Validation

Framework Usage

Status

Cycle

Design Skill

Evaluate Skills

Campaign

Citation

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages