We’re using agents to fill this out but humans are welcome to do fixes and improve the scripts.
Important
CLIs only. No transitive or library-only deps go here.
Only rarely changing data, not eg. popularity, download counts or rankings.
Run the deterministic pipeline:
scripts/build.pyUse scripts/build.py --refresh for daily source refreshes. The pipeline writes
remote and intermediate data to cache/, then publishes committed YAML stages:
deterministic/<formula>.yml: source-backed generator outputagents/<formula>.yml: schema-validated Codex enrichment, including confidence/provenancehuman-override/<formula>.yml: hand-authored correctionscombined/<formula>.yml: final public output
It also derives Homebrew executable indexes from this repository’s own published YAML:
cache/brew/executables.json:formula -> [executables]cache/brew/executable-entries.json:executable -> formulacache/brew/cask-entries.json: supported binary casks andcask:<token>executable entriescache/automic-vault/db.json: Automic Vault-compatible Homebrew authority DB
The pipeline also builds cache/cratesio/index.json from the crates.io daily
database dump. That index is used for Cargo/crates.io package pages only; Cargo
crates are not exported into the Automic Vault authority DB.
Precedence is deterministic < agents < human override. Raw Codex JSON is kept
under cache/ for resumability/debugging; YAML in agents/ is the committed
agent-curated layer. Confidence and provenance fields stay in the agent stages
and are not copied into combined/.
Agent YAML may also include a geiger block with the package risk color,
classifier confidence, category, reasons, and signals. This is agent-stage
context and is not copied into combined/.
Maintenance is designed to be run by Codex cron automations. Codex owns the schedule; repo scripts are one-shot entrypoints and each invocation runs exactly one task.
scripts/automation-runner.sh db
scripts/automation-runner.sh npm-full-scanRunner jobs:
db: runs the hourly package database refresh throughscripts/hourly-maintenance.py. This refreshes source data, prepares hourly missing-curated-field enrichment if no older prepared run is waiting, builds isotope summaries/releases, exports and health-checkscache/automic-vault/db.json, publishes the public DB, rebuilds package-page derived data, and commits tracked stable outputs ashourly: refresh package database.npm-full-scan: runsscripts/build-db.py --refresh --npm-full-scanfollowed byscripts/build-combined-json.py.
Each runner job writes cache/automation/<job>.status.json and appends
cache/automation/<job>.log. Use scripts/codex-automation-status.sh to inspect
both jobs, recent logs, active maintenance processes, and public DB freshness.
Runner invocations are lock-protected; if a job is already running, another
invocation reports that condition and exits cleanly.
AI enrichment automations must use the controller flow in
scripts/codex-enrichment-controller.md. Scheduled Python scripts only prepare
codex-output.json targets; they do not shell out to nested codex exec by
default. Apply completed controller runs with the command emitted by:
python3 scripts/enrichment-controller.py next-run --jsonManual nightly helpers still exist for one-off Codex cron tasks:
scripts/nightly-maintenance.py --list
scripts/nightly-maintenance.py <task> --dry-run