Runabilly spins up a disposable Docker container, clones an open source project into it, and uses Claude to automatically explore the project, install its dependencies, build it, and report the results. It was created to support BOSC (Bioinformatics Open Source Conference) software evaluation workflows.
- Docker (version 20.10 or later)
- Claude Code (for the `/runabilly` slash command)
The script runs preflight checks automatically: it verifies that Docker is installed and the daemon is running, checks that the version is at least 20.10, warns if Docker has less than 4 GB of memory available (a common situation on Docker Desktop for macOS and Windows), and warns if the Docker root directory has less than 20 GB of free disk space (which matters for LFS-heavy and conda-heavy bioinformatics repos). The script works on both Linux and macOS.
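As a rough illustration, the version and memory checks can be written in Bash as below. This is a minimal sketch, not the actual `runabilly.sh` code: the function names `version_ge` and `preflight` are hypothetical, and it assumes the `docker version` / `docker info` Go-template fields `Server.Version` and `MemTotal` (the latter in bytes).

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; not the actual runabilly.sh code.

MIN_DOCKER_VERSION="20.10"
MIN_MEMORY_BYTES=$((4 * 1024 * 1024 * 1024))  # 4 GiB

# version_ge A B: succeed if dotted version A >= B (compares major.minor only).
version_ge() {
  local a_major a_minor b_major b_minor
  IFS=. read -r a_major a_minor _ <<<"$1"
  IFS=. read -r b_major b_minor _ <<<"$2"
  # Drop suffixes such as "-ce" and treat missing fields as 0.
  a_major=${a_major%%[!0-9]*}; a_major=${a_major:-0}
  a_minor=${a_minor%%[!0-9]*}; a_minor=${a_minor:-0}
  b_major=${b_major%%[!0-9]*}; b_major=${b_major:-0}
  b_minor=${b_minor%%[!0-9]*}; b_minor=${b_minor:-0}
  if (( 10#$a_major != 10#$b_major )); then
    (( 10#$a_major > 10#$b_major ))
  else
    (( 10#$a_minor >= 10#$b_minor ))
  fi
}

# preflight: fail if Docker is missing, stopped, or too old; warn on low memory.
preflight() {
  local version mem
  if ! command -v docker >/dev/null 2>&1; then
    echo "error: docker is not installed" >&2; return 1
  fi
  # Querying the server version fails when the daemon is not reachable.
  if ! version=$(docker version --format '{{.Server.Version}}' 2>/dev/null); then
    echo "error: the Docker daemon is not running" >&2; return 1
  fi
  if ! version_ge "$version" "$MIN_DOCKER_VERSION"; then
    echo "error: Docker $version is older than $MIN_DOCKER_VERSION" >&2; return 1
  fi
  mem=$(docker info --format '{{.MemTotal}}')  # bytes available to Docker
  if (( mem < MIN_MEMORY_BYTES )); then
    echo "warning: Docker has only $(( mem / 1024 / 1024 )) MiB of memory" >&2
  fi
}
```

A script would call `preflight || exit 1` before creating any container. The 20 GB disk check is left out of this sketch: on Docker Desktop the Docker root directory lives inside the VM, so a plain `df` on the host would not measure it.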
These instructions assume you are already running Claude Code from this project directory. From the Claude Code prompt, run:
```
/runabilly https://github.com/jqlang/jq
```
Claude will automatically:
- Build the Docker base image (if needed) and create a container
- Clone the repo and explore its structure
- Detect the build system and install the required toolchain
- Attempt to build the project (up to 3 retries)
- Print a structured report with the results
- Clean up the container
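Claude detects the build system by reading the repository, not by consulting a fixed table. Still, the kind of signal it looks for can be sketched as a simple marker-file heuristic. The function name `detect_build_system` and the marker-to-label mapping below are hypothetical, and the first match wins, so a repo that ships both `CMakeLists.txt` and `setup.py` is labeled `cmake`.

```bash
#!/usr/bin/env bash
# Hypothetical heuristic: map marker files in a checkout to a build-system label.
# Order matters: the first matching marker wins.
detect_build_system() {
  local dir=${1:-.}
  if   [[ -f $dir/Cargo.toml ]];                               then echo cargo
  elif [[ -f $dir/go.mod ]];                                   then echo go
  elif [[ -f $dir/CMakeLists.txt ]];                           then echo cmake
  elif [[ -f $dir/configure.ac || -f $dir/configure ]];        then echo autotools
  elif [[ -f $dir/meson.build ]];                              then echo meson
  elif [[ -f $dir/Makefile ]];                                 then echo make
  elif [[ -f $dir/pyproject.toml || -f $dir/setup.py ]];       then echo python
  elif [[ -f $dir/DESCRIPTION ]];                              then echo r-package
  elif [[ -f $dir/environment.yml ]];                          then echo conda
  elif [[ -f $dir/pom.xml ]];                                  then echo maven
  elif [[ -f $dir/build.gradle || -f $dir/build.gradle.kts ]]; then echo gradle
  elif [[ -f $dir/package.json ]];                             then echo npm
  else echo unknown
  fi
}
```

Inside a container this would run as, for example, `detect_build_system /workspace/project`, and its label would decide which toolchain to install next.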
To keep the container running after the build for manual exploration:
```
/runabilly --keep https://github.com/jqlang/jq
```
Claude will skip cleanup and print instructions for entering the container.
The script can also be run directly, without Claude Code:

```bash
# Create a container and clone a project into it
./runabilly.sh https://github.com/jqlang/jq
# Output:
# RUNABILLY_CONTAINER=runa-jq-a1b2c3d4
# RUNABILLY_WORKDIR=/workspace/project

# Run commands inside the container
docker exec runa-jq-a1b2c3d4 bash -c 'cd /workspace/project && ls'

# Clean up when done
./runabilly.sh --cleanup runa-jq-a1b2c3d4

# Or use --keep to get an interactive container with entry instructions
./runabilly.sh --keep https://github.com/jqlang/jq
# Then enter it (substitute the container name the script printed):
docker exec -it runa-jq-a1b2c3d4 bash
```

Runabilly uses a minimal Ubuntu 24.04 base image with only basic tools (git, git-lfs, curl, build-essential, etc.). No language-specific toolchains are pre-installed; they are added as needed for each project. This keeps the base image small and avoids version conflicts.
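When `runabilly.sh` is driven from another script rather than by hand, the `RUNABILLY_CONTAINER=` and `RUNABILLY_WORKDIR=` lines from the scripted usage above can be captured instead of copied. A minimal sketch, assuming those lines are printed to stdout as one `KEY=value` pair per line; `runabilly_get` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# runabilly_get KEY: read runabilly.sh output on stdin and print KEY's value.
# Assumes lines of the form KEY=value; other output lines are ignored.
runabilly_get() {
  local key=$1 line
  while IFS= read -r line; do
    if [[ $line == "$key="* ]]; then
      printf '%s\n' "${line#"$key="}"
      return 0
    fi
  done
  return 1  # key not found
}

# Usage: create the container, capture its name and workdir, run a command.
# out=$(./runabilly.sh https://github.com/jqlang/jq)
# container=$(runabilly_get RUNABILLY_CONTAINER <<<"$out")
# workdir=$(runabilly_get RUNABILLY_WORKDIR <<<"$out")
# docker exec -w "$workdir" "$container" ls
```

Using `docker exec -w` sets the working directory directly, which avoids the `bash -c 'cd ... && ...'` wrapper for simple commands.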
Each project gets its own isolated container named `runa-<reponame>-<hash>`, capped at 4 GB of memory. Everything runs inside the container via `docker exec`, so nothing is installed on your host machine.
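The naming scheme and memory cap can be sketched as follows. How the real script derives `<hash>` is not documented here; this version assumes 8 random hex characters, matching the shape of `runa-jq-a1b2c3d4`. The image tag `runabilly-base`, the `sleep infinity` keep-alive, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# container_name URL: build a runa-<reponame>-<hash> name (hash is assumed random).
container_name() {
  local url=${1%/}        # drop one trailing slash
  local repo=${url##*/}   # last path component, e.g. "jq"
  repo=${repo%.git}       # tolerate a .git suffix
  # Lowercase, then replace anything outside [a-z0-9_.-] with '-'.
  repo=$(printf '%s' "$repo" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9_.-' '-')
  local hash
  hash=$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n')  # 4 random bytes as hex
  printf 'runa-%s-%s\n' "$repo" "$hash"
}

# create_container NAME: start a long-lived container capped at 4 GB of memory,
# so later steps can attach with docker exec.
create_container() {
  docker run -d --name "$1" --memory 4g runabilly-base sleep infinity
}
```

Keeping the container alive with a no-op process and attaching via `docker exec` is what lets each step (clone, install, build, test) run as a separate command in the same environment.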
- Network access: containers have full outbound internet access for the entire evaluation, so builds can run `apt-get install`, `pip install`, `cargo fetch`, R `install.packages`, `conda install`, `git clone` submodules, etc. Inbound networking is not configured.
- GPU: no GPU is exposed to the container. CUDA/ROCm/Metal-only projects are reported as WARNING, with the GPU requirement named as the hurdle.
- Git LFS: `git-lfs` is installed in the base image, but clones run with `GIT_LFS_SKIP_SMUDGE=1`, so LFS-tracked files remain pointer stubs by default. This prevents surprise multi-gigabyte downloads on data-heavy bioinformatics repos. When the build or tests actually need the LFS data, Claude opts in per repo with `git lfs install --local && git lfs pull` inside the container.
- Disposability: containers are torn down at the end of each run unless `--keep` is passed. Nothing persists between evaluations.
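The Git LFS behavior described above comes down to two steps: clone with smudging disabled, then pull LFS objects only when they are needed. `GIT_LFS_SKIP_SMUDGE=1`, `git lfs install --local` and `git lfs pull` are standard Git LFS usage; the helper names and the pointer-counting heuristic (matching the spec line that LFS pointer files contain) are assumptions for illustration.

```bash
#!/usr/bin/env bash
# clone_skip_lfs URL DEST: clone without downloading LFS objects;
# LFS-tracked files stay as small pointer stubs.
clone_skip_lfs() {
  GIT_LFS_SKIP_SMUDGE=1 git clone "$1" "$2"
}

# count_lfs_pointers DIR: count files that still look like LFS pointer stubs.
# Heuristic: a pointer file has a line exactly equal to the spec version line.
count_lfs_pointers() {
  { grep -rlxF --exclude-dir=.git \
      'version https://git-lfs.github.com/spec/v1' "$1" || true; } \
    | wc -l | tr -d ' '
}

# fetch_lfs DIR: opt in to LFS for this clone only, then download the objects.
fetch_lfs() {
  (cd "$1" && git lfs install --local && git lfs pull)
}
```

A non-zero `count_lfs_pointers` result only says that some data was skipped; whether `fetch_lfs` is worth running still depends on whether the build or tests actually read those files.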
Each evaluation produces a structured report. The overall result is one of:
- SUCCESS — build completes AND the project's tests run and pass (or no test infrastructure is present). Tests are required when present: a passing build with failing tests is not a SUCCESS.
- WARNING — build completes but full test validation is blocked by an unavoidable environmental hurdle (e.g. Docker-in-Docker, large external databases, paid API keys, GPU-only). The code looks healthy; the environment can't validate it. The specific hurdle is named in the report.
- FAILURE — build fails after retries, OR tests run but fail, OR the 1-hour timeout is exceeded
- UNDEFINED — URL isn't a buildable repo (e.g. Kaggle homepage, documentation site, dataset collection)
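Claude makes this call itself, but the precedence between the four results can be made explicit with a small function. This is a hypothetical encoding of the rules above; the argument names and their values (`pass`, `fail`, `none`, `blocked`) are invented for illustration.

```bash
#!/usr/bin/env bash
# classify_status BUILDABLE TIMED_OUT BUILD TESTS
#   BUILDABLE: yes|no   TIMED_OUT: yes|no   BUILD: pass|fail
#   TESTS: pass|fail|none|blocked  ("blocked" = unavoidable environmental hurdle)
classify_status() {
  local buildable=$1 timed_out=$2 build=$3 tests=$4
  if [[ $buildable == no ]]; then
    echo UNDEFINED
  elif [[ $timed_out == yes || $build == fail || $tests == fail ]]; then
    echo FAILURE
  elif [[ $tests == blocked ]]; then
    echo WARNING
  else
    echo SUCCESS  # tests passed, or no test infrastructure present
  fi
}
```

The ordering encodes the rule that failing tests outrank a passing build: `classify_status yes no pass fail` yields FAILURE, never SUCCESS.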
The report also includes a composite rating based on four sub-scores (each LOW / MEDIUM / HIGH):
| Factor | LOW | MEDIUM | HIGH |
|---|---|---|---|
| Time | < 60s | 60s–300s | > 300s |
| Dependencies | < 10 packages | 10–50 | > 50 or multiple toolchains |
| Exoticness | Standard build system, no workarounds | Less common build system or minor workarounds | Custom scripts, multi-stage setup, Docker-in-Docker, etc. |
| Divergence | Documented build path worked on first try | Minor adjustments needed | Documented path failed; alternate route required, or no docs |
Roll-up: EASY (all LOW), MODERATE (any MEDIUM, no HIGH), HARD (any HIGH), IMPRACTICAL (can't complete in a disposable container).
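The numeric thresholds and the roll-up can be expressed as below. This is a sketch mirroring the table, with hypothetical function names. It assumes integer seconds with both 60 and 300 counted as MEDIUM, and "multiple toolchains" meaning more than one. Exoticness and Divergence are judgment calls, so they are passed to `rollup` as already-scored values.

```bash
#!/usr/bin/env bash
# time_score SECONDS: < 60 LOW, 60-300 MEDIUM, > 300 HIGH.
time_score() {
  if   (( $1 < 60 ));   then echo LOW
  elif (( $1 <= 300 )); then echo MEDIUM
  else                       echo HIGH
  fi
}

# deps_score PACKAGES TOOLCHAINS: < 10 LOW, 10-50 MEDIUM, > 50 or > 1 toolchain HIGH.
deps_score() {
  if   (( $1 > 50 || $2 > 1 )); then echo HIGH
  elif (( $1 >= 10 ));          then echo MEDIUM
  else                               echo LOW
  fi
}

# rollup COMPLETED SCORE...: COMPLETED is yes|no; SCOREs are LOW|MEDIUM|HIGH.
rollup() {
  local completed=$1; shift
  if [[ $completed == no ]]; then echo IMPRACTICAL; return; fi
  local s has_medium=no
  for s in "$@"; do
    case $s in
      HIGH)   echo HARD; return ;;   # any HIGH dominates
      MEDIUM) has_medium=yes ;;
    esac
  done
  if [[ $has_medium == yes ]]; then echo MODERATE; else echo EASY; fi
}
```

For example, `rollup yes "$(time_score 120)" "$(deps_score 5 1)" LOW LOW` combines a MEDIUM time score with three LOW scores and so rolls up to MODERATE.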
Evaluations are capped at 1 hour. If the build hasn't completed by then, the container is cleaned up and the result is reported as FAILURE.
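One way to enforce such a cap on a single command is GNU coreutils `timeout`, which exits with status 124 when the limit expires. The sketch below uses a hypothetical `run_with_deadline` wrapper. Two assumptions apply: macOS does not ship `timeout` by default, and killing the `docker exec` client may leave the process inside the container running, which is why cleanup removes the whole container afterward. An overall one-hour budget would pass each command the time remaining rather than a fixed value.

```bash
#!/usr/bin/env bash
# run_with_deadline SECONDS CMD...: run CMD, give up after SECONDS.
# Returns CMD's exit status, or 124 if the deadline expired.
run_with_deadline() {
  local secs=$1; shift
  local rc=0
  timeout "$secs" "$@" || rc=$?
  if (( rc == 124 )); then
    echo "FAILURE: exceeded ${secs}s deadline" >&2
  fi
  return "$rc"
}

# Hypothetical usage for one evaluation ($container from runabilly.sh output):
# run_with_deadline 3600 docker exec "$container" bash -c 'cd /workspace/project && make'
# ./runabilly.sh --cleanup "$container"
```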
| File | Purpose |
|---|---|
| `Dockerfile` | Base Ubuntu 24.04 image definition |
| `runabilly.sh` | Container lifecycle script (create, clone, cleanup) |
| `.claude/skills/runabilly/SKILL.md` | Claude Code `/runabilly` skill definition |
| `.claude/settings.local.json` | Pre-approved Docker permission patterns |
| `CLAUDE.md` | Project conventions for Claude Code |