Add configuration system, documentation, VAMB compatibility layer, and test suite #9
Draft
Co-authored-by: linsalrob <836231+linsalrob@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve pipeline documentation and configuration setup" to "Add configuration system, documentation, and VAMB compatibility layer" on Jan 28, 2026
Owner
@copilot can you design a test suite for this code?
Co-authored-by: linsalrob <836231+linsalrob@users.noreply.github.com>
Author
I've designed and implemented a comprehensive test suite for the pipeline improvements. Commit: 3e9201b
Test Suite Overview
61 test cases across 3 test suites, all passing ✅
1. Bash Helper Tests (
Copilot AI changed the title from "Add configuration system, documentation, and VAMB compatibility layer" to "Add configuration system, documentation, VAMB compatibility layer, and test suite" on Jan 28, 2026
The pipeline lacks standardized configuration and step-by-step documentation, and it fails on VAMB version mismatches. As a result, users struggle with setup and debugging across HPC systems.
Configuration System
- `config/paths.env.example`: Shell-sourceable template for paths, databases, resources
- `config/samples.tsv.example`: Tab-delimited sample sheet supporting paired/single-end, absolute paths
Documentation
- `docs/directory_contract.md`: I/O specifications per step (inputs, outputs, validation, resources, failure modes)
- `docs/known_good_versions.md`: Tested tool versions, database tracking, reproducibility baseline
- `docs/compat.md`: VAMB version breaks, MMseqs2 DB formats, HPC filesystem differences
- `docs/dev_notes.md`: Shellcheck patterns, code standards, testing guidance
Shared Helper Library
`lib/common.sh` provides POSIX-compatible functions:
- `require_cmd`, `require_file`, `require_dir`, `check_nonempty`
- `log()`, `die()`, timestamped output
- `get_array_task_id()`, `get_fast_storage()` (Slurm/PBS/system detection)
Example usage:
See `pawsey_shortread/fastp_enhanced.slurm` for a complete example.
VAMB Compatibility
Problem: `vamb_create_fasta.py` imports vamb module → breaks on API changes between v3.x and v4.x
Solution:
- `vamb_create_fasta_clusters.py` (canonical): Reads clusters.tsv directly, no vamb import, works across all versions
- `vamb_create_fasta.py`: Deprecated with version detection, actionable error messages
Both scripts enhanced with input validation, progress logging, better arg names.
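To illustrate why reading clusters.tsv directly sidesteps the VAMB API drift, here is a minimal sketch of the approach. It is not the code in `vamb_create_fasta_clusters.py`: the function names, the two-column layout (cluster name, then contig name), the optional `clustername`/`contigname` header row, and the base-pair `min_size` filter are all assumptions made for the example.

```python
import os
from collections import defaultdict


def read_clusters(tsv_path):
    """Map cluster name -> list of contig names from a two-column TSV (assumed layout)."""
    clusters = defaultdict(list)
    with open(tsv_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) < 2:
                raise ValueError(f"Expected two tab-separated columns, got: {line!r}")
            cluster, contig = fields[0], fields[1]
            # Assumed header row; skip it if present.
            if (cluster, contig) == ("clustername", "contigname"):
                continue
            clusters[cluster].append(contig)
    return dict(clusters)


def read_fasta(fasta_path):
    """Return {contig_id: sequence}; the id is the header text up to the first whitespace."""
    sequences = {}
    name = None
    chunks = []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    sequences[name] = "".join(chunks)
                parts = line[1:].split()
                name = parts[0] if parts else ""
                chunks = []
            elif line and name is not None:
                chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def write_bins(clusters, sequences, outdir, min_size=0):
    """Write one FASTA per cluster with total length >= min_size; return the written paths."""
    os.makedirs(outdir, exist_ok=True)
    written = []
    for cluster, contigs in sorted(clusters.items()):
        missing = [c for c in contigs if c not in sequences]
        if missing:
            raise KeyError(f"Cluster {cluster} references contigs not in the FASTA: {missing[:3]}")
        total = sum(len(sequences[c]) for c in contigs)
        if total < min_size:
            continue  # size filter: skip bins below the threshold
        path = os.path.join(outdir, f"{cluster}.fasta")
        with open(path, "w") as out:
            for contig in contigs:
                out.write(f">{contig}\n{sequences[contig]}\n")
        written.append(path)
    return written
```

Because nothing here imports `vamb`, the sketch depends only on the file format, which is the property the canonical script relies on. A call such as `write_bins(read_clusters("clusters.tsv"), read_fasta("contigs.fasta"), "bins", min_size=100000)` would write one FASTA per sufficiently large bin; the file names and threshold are placeholders.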
Test Suite
Comprehensive test suite with 61 test cases across 3 test suites:
- `tests/test_common.sh`: 25 tests for all `lib/common.sh` functions (logging, validation, file processing, HPC helpers, config loading)
- `tests/test_vamb_create_fasta_clusters.py`: 17 tests for VAMB binning script (basic functionality, size filtering, error handling)
- `tests/test_config_files.sh`: 19 tests for configuration file validation (syntax, required variables, format)
- `tests/run_tests.sh`: Main test runner with colored output
- `tests/README.md`: Complete test suite documentation with usage examples
- `.github/workflows/test.yml`: GitHub Actions CI workflow for automated testing
All tests passing ✅. Tests use isolated environments, no external dependencies beyond standard tools.
Changes Summary
Original prompt
Copilot Instructions — atavide_lite
You are GitHub Copilot working in the `atavide_lite` repository. Your goal is to make the pipeline easier to run, easier to debug, and more reproducible across HPC systems, without converting it to Snakemake and without large architectural rewrites.
Guiding principles
Phase 1 — Documentation + “front door” config (highest priority)
A) Add a single canonical configuration format
Create a new directory: `config/`
- `paths.env.example`: shell `KEY=VALUE` entries that can be `source`’d
- `samples.tsv.example`: example samplesheet for paired and single-end runs
Requirements for `paths.env.example`:
Requirements for `samples.tsv.example`:
B) Add “directory contract” documentation
Create `docs/directory_contract.md` documenting, for each step:
Keep it concise but explicit.
C) Add “known-good versions” documentation
Create `docs/known_good_versions.md` with:
Include a short note:
D) Update README to point to the above
Update `README.md`:
- `docs/directory_contract.md`
- `docs/known_good_versions.md`
Phase 2 — Standardize script interfaces (medium priority)
A) Add a shared Bash helper: `lib/common.sh`
Create `lib/common.sh` and ensure cluster scripts can source it.
It must provide:
- `die()`: print error to stderr and exit non-zero
- `log()`: timestamped logging
- `require_cmd()`: assert a command exists in PATH
- `require_file()`: assert a file exists and is readable
- `require_dir()`: assert a directory exists (or create it if appropriate)
- `check_nonempty()`: assert output exists and size > 0
- `load_config()`: source `config/paths.env` if present, else error with guidance
Do not introduce external dependencies beyond POSIX/Bash.
B) Add consistent safe Bash practices in scripts
Where appropriate:
- `set -euo pipefail`
- `trap '...' ERR`
- `find ... -print0` + `while IFS= read -r -d ''` patterns for filenames
Do not rewrite every script at once; start with 1–2 representative ones.
Phase 3 — Fix/contain VAMB script drift (high priority because it breaks runs)
A) Establish a single canonical VAMB helper script
Identify the two similar scripts (e.g., `vamb_create_fasta.py` and `vamb_create_fasta_clusters.py`).
Actions:
B) Add a compatibility layer for VAMB version differences
In the canonical script:
- `import vamb; vamb.__version__` (preferred if available)
- `python -c "import vamb; print(vamb.__version__)"` in wrappers
- `minsize` vs removed/renamed args
Add a short doc: `docs/compat.md` capturi...
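The version check described for the canonical script could look roughly like the sketch below. It is an illustration only: `normalize_version` and `detect_vamb_version` are hypothetical names, and the code assumes `vamb.__version__` may be either a tuple of integers or a dotted string, so it normalizes both forms instead of relying on one.

```python
import importlib
import re


def normalize_version(raw):
    """Turn a version given as a tuple/list or a dotted string into a tuple of ints."""
    if isinstance(raw, (tuple, list)):
        parts = [str(p) for p in raw]
    else:
        parts = str(raw).split(".")
    numbers = []
    for part in parts:
        match = re.match(r"\d+", part)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "dev141"
        numbers.append(int(match.group()))
    return tuple(numbers)


def detect_vamb_version():
    """Return the installed vamb version as a tuple of ints, or None if unavailable."""
    try:
        module = importlib.import_module("vamb")
    except Exception:
        # Import can fail for reasons other than ImportError (e.g. broken dependencies),
        # so treat any failure as "version unknown" rather than crashing.
        return None
    raw = getattr(module, "__version__", None)
    if raw is None:
        return None
    return normalize_version(raw) or None


if __name__ == "__main__":
    version = detect_vamb_version()
    if version is None:
        print("vamb version could not be determined")
    else:
        print("vamb major version:", version[0])
```

Returning `None` instead of raising lets a caller decide how to respond, for example by printing guidance that points to the script that does not import `vamb`.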