
vineetver/favor-cli

FAVOR CLI

From raw variants to biological mechanisms in one tool.
Annotate. Enrich. Analyze. Interpret.

Install · Quick Start · Storage Format · Validation · Agent Reference · Roadmap



Install

curl -fsSL https://raw.githubusercontent.com/vineetver/favor-cli/master/install.sh | sh

Quick start

favor setup
favor ingest input.vcf.gz
favor annotate input.ingested.parquet
favor enrich input.annotated.parquet --tissue brain
favor staar --genotypes cohort.vcf.gz --phenotype pheno.tsv \
  --trait-name LDL --covariates age,sex,PC1,PC2 --annotations annotated.parquet

Commands

| Command | Description |
| --- | --- |
| `favor setup` | Configure tier, data paths, HPC environment, memory budget |
| `favor ingest` | Ingest any variant format: WGS VCF, variant lists, credible sets, TSV/CSV/parquet |
| `favor annotate` | Annotate with FAVOR functional annotations (200–508 GB, 24 chromosomes) |
| `favor enrich` | Tissue-specific overlay: eQTL, sQTL, ChromBPNet, enhancer-gene links |
| `favor staar` | STAAR rare-variant association testing (single-study) |
| `favor meta-staar` | MetaSTAAR cross-biobank rare-variant meta-analysis (summary-stat based) |
| `favor schema` | Inspect annotation table schemas |

All commands support --format json and --dry-run. See AGENTS.md for the machine interface.

STAAR architecture

Genotypes are stored as a canonical sparse matrix G indexed by (sample_id, variant_vcf). All queries (region, gene, MAF, annotation) resolve to aligned vectors over variant_vcf. Scoring is carrier-indexed: it visits only the samples that carry each variant, so its cost is O(total_MAC), the total minor allele count, rather than O(n_samples × n_variants).
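To make the carrier-indexed idea concrete, here is a minimal Python sketch. It is not the FAVOR CLI implementation (which is written in Rust and uses its own on-disk layout); the function names, the variant key strings, and the residual values are all hypothetical. It assumes the per-variant score is the usual U_j = sum_i G_ij * r_i, where r_i is a null-model residual, and shows that only non-reference calls need to be visited.

```python
from collections import defaultdict


def build_carrier_index(calls):
    """Group non-reference calls by variant: variant -> [(sample_idx, dosage)].

    `calls` is an iterable of (sample_idx, variant_key, dosage) triples.
    Homozygous-reference calls (dosage 0) are dropped, so storage scales
    with the number of carriers rather than n_samples * n_variants.
    """
    index = defaultdict(list)
    for sample_idx, variant, dosage in calls:
        if dosage > 0:
            index[variant].append((sample_idx, dosage))
    return index


def score_statistics(index, residuals, variants):
    """Per-variant score U_j = sum_i G_ij * r_i, touching carriers only."""
    return [
        sum(dosage * residuals[s] for s, dosage in index.get(v, ()))
        for v in variants
    ]


# Toy data (hypothetical): 5 samples, 3 variants, 3 non-reference calls.
calls = [(0, "1-100-A-G", 1), (3, "1-100-A-G", 2), (2, "1-250-C-T", 1)]
residuals = [0.5, -1.0, 0.25, -0.5, 0.75]
variants = ["1-100-A-G", "1-250-C-T", "1-300-G-A"]

index = build_carrier_index(calls)
scores = score_statistics(index, residuals, variants)
print(scores)
```

The work done by `score_statistics` is proportional to the number of stored calls, so a variant with no carriers costs nothing beyond the dictionary lookup.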

| Layer | What | Cache key |
| --- | --- | --- |
| Build | VCF x annotations -> sparse_g.bin + variants.parquet + membership.parquet | SHA-256(VCF content, annotation content) |
| Score cache | Null model -> U, K per gene | (store key, trait, covariates) |
| Test | Slice cached U/K per mask -> Burden, SKAT, ACAT-V -> omnibus | (mask predicate, MAF cutoff) |
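The final "omnibus" step in the Test layer combines the per-test p-values. As a rough illustration, the sketch below implements the Cauchy combination (ACAT) rule used by the published STAAR method; whether FAVOR CLI applies it with exactly these weights is an assumption, and `cauchy_combine` is a hypothetical helper name, not part of the tool.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values with the Cauchy combination (ACAT) rule.

    T = sum_i w_i * tan((0.5 - p_i) * pi) / sum_i w_i
    p = 0.5 - atan(T) / pi

    Equal weights are used by default (an assumption for this sketch).
    Production implementations typically special-case extremely small
    p-values for numerical stability; that is omitted here.
    """
    if weights is None:
        weights = [1.0] * len(pvalues)
    total = sum(weights)
    t = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues)
    ) / total
    return 0.5 - math.atan(t) / math.pi


# Hypothetical p-values for Burden, SKAT and ACAT-V on one mask.
omnibus_p = cauchy_combine([0.01, 0.5, 0.2])
print(omnibus_p)
```

A single p-value is returned unchanged by this rule, and the combined value is driven mostly by the smallest inputs, which is why it suits an omnibus over tests with different power profiles.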

Interactive results: Plotly.js summary with Manhattan, QQ, and volcano plots.

See docs/storage.md for the storage format and docs/validation.md for validation against the R STAARpipeline.

Data packs

| Pack | Size | Description |
| --- | --- | --- |
| favor-base | 200 GB | 40 curated annotation columns including pathogenicity, frequency, clinical, conservation, regulatory, aPC STAAR channels |
| favor-full | 508 GB | All 54 annotation columns including dbnsfp, ENCODE, MaveDB, COSMIC |
| eqtl | 3 GB | GTEx v10 eQTL/sQTL/apaQTL, 50 tissues, SuSiE fine-mapped |
| sc-eqtl | 48 GB | Single-cell eQTL: OneK1K, DICE, PsychENCODE |
| regulatory | 18 GB | cCRE tissue signals, chromatin states, accessibility |
| enhancer-gene | 12 GB | ABC, EPIraction, rE2G, EpiMap, CRISPRi |
| tissue-scores | 5 GB | ChromBPNet, allelic imbalance |

Roadmap

Tracked in GitHub Issues with milestones:

| Milestone | Focus |
| --- | --- |
| v0.2.0 | STAAR hardening: GRM, multi-VCF, AI-STAAR, MultiSTAAR, performance |
| v0.3.0 | MetaSTAAR: cross-biobank meta-analysis |
| v0.4.0 | Variant interpretation: scoring, fine-mapping, colocalization, V2G |
| v1.0.0 | Nextflow orchestration, provenance, QC |

Citation

FAVOR CLI implements the STAAR framework and the FAVOR annotation database. If you use this tool, please cite:

Li Z*, Li X*, Zhou H, et al. A framework for detecting noncoding rare variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611 (2022). DOI: 10.1038/s41592-022-01640-x

Li X*, Li Z*, Zhou H, et al. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nature Genetics, 52(9), 969-983 (2020). DOI: 10.1038/s41588-020-0676-4

Zhou H, Verma V, Li X, et al. FAVOR 2.0: A reengineered functional annotation of variants online resource for interpreting genomic variation. Nucleic Acids Research, 54(D1), D1405-D1414 (2026). DOI: 10.1093/nar/gkaf1217

Li TC, Zhou H, Verma V, et al. FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations. Bioinformatics Advances, 4(1), vbae143 (2024). DOI: 10.1093/bioadv/vbae143

License

GPL-3.0
