From raw variants to biological mechanisms in one tool.
Annotate. Enrich. Analyze. Interpret.
Install · Quick Start · Storage Format · Validation · Agent Reference · Roadmap
curl -fsSL https://raw.githubusercontent.com/vineetver/favor-cli/master/install.sh | sh
favor setup
favor ingest input.vcf.gz
favor annotate input.ingested.parquet
favor enrich input.annotated.parquet --tissue brain
favor staar --genotypes cohort.vcf.gz --phenotype pheno.tsv \
--trait-name LDL --covariates age,sex,PC1,PC2 --annotations annotated.parquet

| Command | Description |
|---|---|
| `favor setup` | Configure tier, data paths, HPC environment, memory budget |
| `favor ingest` | Ingest any variant format: WGS VCF, variant lists, credible sets, TSV/CSV/parquet |
| `favor annotate` | Annotate with FAVOR functional annotations (200–508 GB, 24 chromosomes) |
| `favor enrich` | Tissue-specific overlay: eQTL, sQTL, ChromBPNet, enhancer-gene links |
| `favor staar` | STAAR rare-variant association testing (single-study) |
| `favor meta-staar` | MetaSTAAR cross-biobank rare-variant meta-analysis (summary-stat based) |
| `favor schema` | Inspect annotation table schemas |

All commands support --format json and --dry-run. See AGENTS.md for the machine interface.
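
For scripted use, the two shared flags can be wrapped in a small helper. The following is a minimal Python sketch, not part of FAVOR CLI: the helper names `favor_argv` and `run_favor` are hypothetical, placing the flags after the positional arguments is an assumption, and so is the expectation that `--format json` writes a single JSON document to stdout. AGENTS.md is the authoritative description of the machine interface.

```python
import json
import shlex
import subprocess


def favor_argv(subcommand, *args, dry_run=False):
    """Build the argv list for one favor call with JSON output requested.

    Flag placement after the positional arguments is an assumption.
    """
    argv = ["favor", subcommand, *args, "--format", "json"]
    if dry_run:
        argv.append("--dry-run")
    return argv


def run_favor(subcommand, *args, dry_run=False):
    """Run favor and parse stdout as JSON.

    Assumes stdout carries exactly one JSON document; raises
    subprocess.CalledProcessError on a non-zero exit status.
    """
    proc = subprocess.run(
        favor_argv(subcommand, *args, dry_run=dry_run),
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


# Show the command line that a dry run of the annotate step would use.
print(shlex.join(favor_argv("annotate", "input.ingested.parquet", dry_run=True)))
```

Calling `run_favor("annotate", "input.ingested.parquet", dry_run=True)` requires `favor` on `PATH`; building the argv list does not.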
Genotypes are stored as a canonical sparse matrix G over (sample_id, variant_vcf). All queries — region, gene, MAF, annotation — resolve to aligned vectors over variant_vcf. Scoring is carrier-indexed at O(total_MAC), not O(n_samples x n_variants).
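
To illustrate why carrier-indexed scoring scales with the number of non-reference genotypes, here is a toy Python sketch. It is not the on-disk layout of `sparse_g.bin`: the per-variant carrier lists, the function names, and the residual vector are all assumptions made for the example. Both functions compute the same per-variant score, the sum over samples of dosage times residual, but the sparse version only visits carriers.

```python
def score_dense(G, resid):
    """U_j = sum_i G[i][j] * resid[i], visiting every (sample, variant) cell."""
    n_variants = len(G[0])
    return [
        sum(G[i][j] * resid[i] for i in range(len(G)))
        for j in range(n_variants)
    ]


def to_carriers(G):
    """Convert a dense dosage matrix into per-variant (sample, dosage) lists."""
    n_variants = len(G[0])
    return [
        [(i, row[j]) for i, row in enumerate(G) if row[j] != 0]
        for j in range(n_variants)
    ]


def score_sparse(carriers, resid):
    """Same U_j as score_dense, touching only non-reference genotypes."""
    return [sum(d * resid[i] for i, d in cars) for cars in carriers]


# 5 samples x 3 variants; most dosages are zero, as for rare variants.
G = [
    [0, 1, 0],
    [0, 0, 0],
    [2, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
resid = [0.5, -1.0, 0.25, -0.75, 1.0]  # hypothetical null-model residuals

carriers = to_carriers(G)
dense_cells = len(G) * len(G[0])                 # cells the dense loop visits
sparse_cells = sum(len(c) for c in carriers)     # entries the sparse loop visits
total_mac = sum(d for c in carriers for _, d in c)

print(score_sparse(carriers, resid), dense_cells, sparse_cells, total_mac)
```

The number of stored entries never exceeds the total minor allele count, because each carrier contributes a dosage of at least one.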
| Layer | What | Cache key |
|---|---|---|
| Build | VCF x annotations -> sparse_g.bin + variants.parquet + membership.parquet | SHA-256(VCF content, annotation content) |
| Score cache | Null model -> U, K per gene | (store key, trait, covariates) |
| Test | Slice cached U/K per mask -> Burden, SKAT, ACAT-V -> omnibus | (mask predicate, MAF cutoff) |
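
The Build layer's cache key is a SHA-256 digest over VCF and annotation content. A minimal Python sketch of content-addressed keying follows. How FAVOR CLI actually combines the two inputs is not documented here, so hashing each stream separately and then hashing the two digests together is an assumption, as are the function names.

```python
import hashlib
import io


def stream_digest(stream, chunk_size=1 << 20):
    """SHA-256 of a binary stream, read in chunks to bound memory use."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.digest()


def build_key(vcf_stream, annotation_stream):
    """Combine per-input digests into one hex key.

    Hashing fixed-length digests (rather than concatenated raw bytes)
    keeps the boundary between the two inputs unambiguous.
    """
    outer = hashlib.sha256()
    outer.update(stream_digest(vcf_stream))
    outer.update(stream_digest(annotation_stream))
    return outer.hexdigest()


# In-memory stand-ins for the real files; any change in content changes the key.
key_a = build_key(io.BytesIO(b"vcf bytes"), io.BytesIO(b"annotation bytes"))
key_b = build_key(io.BytesIO(b"vcf bytes"), io.BytesIO(b"annotation bytes v2"))
print(key_a != key_b)
```

With real inputs, the streams would be files opened in binary mode, for example `open("cohort.vcf.gz", "rb")`.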
Interactive results: Plotly.js summary with Manhattan, QQ, and volcano plots.
See docs/storage.md for the storage format and docs/validation.md for validation against the R STAARpipeline.
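
The omnibus step in the Test layer combines per-test p-values. In the published STAAR framework this is done with the Cauchy combination (ACAT) method. The Python sketch below shows that formula in its basic form. Whether FAVOR CLI uses these weights or this exact numerical treatment is an assumption, and the function name is hypothetical. Production implementations usually add a separate approximation for extremely small p-values, which is omitted here.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values with the Cauchy combination test.

    T = sum_i w_i * tan((0.5 - p_i) * pi) / sum_i w_i
    p = 0.5 - atan(T) / pi
    Equal weights are used when none are given.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(pvalues)
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")
    total_w = sum(weights)
    t = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues)
    ) / total_w
    return 0.5 - math.atan(t) / math.pi


# Hypothetical Burden, SKAT, and ACAT-V p-values for one mask.
p_omnibus = cauchy_combine([0.01, 0.2, 0.5])
print(p_omnibus)
```

A single p-value is returned unchanged, and the combined value is dominated by the smallest inputs, which is the behavior that makes the method useful for an omnibus test.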
| Pack | Size | Description |
|---|---|---|
| favor-base | 200 GB | 40 curated annotation columns including pathogenicity, frequency, clinical, conservation, regulatory, aPC STAAR channels |
| favor-full | 508 GB | All 54 annotation columns including dbnsfp, ENCODE, MaveDB, COSMIC |
| eqtl | 3 GB | GTEx v10 eQTL/sQTL/apaQTL, 50 tissues, SuSiE fine-mapped |
| sc-eqtl | 48 GB | Single-cell eQTL: OneK1K, DICE, PsychENCODE |
| regulatory | 18 GB | cCRE tissue signals, chromatin states, accessibility |
| enhancer-gene | 12 GB | ABC, EPIraction, rE2G, EpiMap, CRISPRi |
| tissue-scores | 5 GB | ChromBPNet, allelic imbalance |
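
Before running `favor setup`, it can help to total the disk space for the packs you plan to install. This is a small Python sketch using the sizes from the table above. The helper name is hypothetical, and treating `favor-base` and `favor-full` as alternatives rather than additive packs is an assumption.

```python
# Pack sizes in GB, copied from the table above.
PACK_SIZES_GB = {
    "favor-base": 200,
    "favor-full": 508,
    "eqtl": 3,
    "sc-eqtl": 48,
    "regulatory": 18,
    "enhancer-gene": 12,
    "tissue-scores": 5,
}


def disk_budget_gb(packs):
    """Return the total size in GB for the requested packs."""
    unknown = [p for p in packs if p not in PACK_SIZES_GB]
    if unknown:
        raise KeyError(f"unknown pack(s): {', '.join(unknown)}")
    if "favor-base" in packs and "favor-full" in packs:
        # Assumption: the two annotation tiers are alternatives.
        raise ValueError("choose either favor-base or favor-full, not both")
    return sum(PACK_SIZES_GB[p] for p in packs)


tissue_packs = ["eqtl", "sc-eqtl", "regulatory", "enhancer-gene", "tissue-scores"]
print(disk_budget_gb(["favor-base", *tissue_packs]))
print(disk_budget_gb(["favor-full", *tissue_packs]))
```

Under these assumptions, the base tier with every tissue pack needs 286 GB and the full tier with every tissue pack needs 594 GB, before any working space for ingested or annotated outputs.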
The roadmap is tracked in GitHub Issues, grouped by milestone:
| Milestone | Focus |
|---|---|
| v0.2.0 | STAAR hardening: GRM, multi-VCF, AI-STAAR, MultiSTAAR, performance |
| v0.3.0 | MetaSTAAR: cross-biobank meta-analysis |
| v0.4.0 | Variant interpretation: scoring, fine-mapping, colocalization, V2G |
| v1.0.0 | Nextflow orchestration, provenance, QC |
FAVOR CLI implements the STAAR framework and builds on the FAVOR annotation database. If you use this tool, please cite:
Li Z*, Li X*, Zhou H, et al. A framework for detecting noncoding rare variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611 (2022). DOI: 10.1038/s41592-022-01640-x
Li X*, Li Z*, Zhou H, et al. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nature Genetics, 52(9), 969-983 (2020). DOI: 10.1038/s41588-020-0676-4
Zhou H, Verma V, Li X, et al. FAVOR 2.0: A reengineered functional annotation of variants online resource for interpreting genomic variation. Nucleic Acids Research, 54(D1), D1405-D1414 (2026). DOI: 10.1093/nar/gkaf1217
Li TC, Zhou H, Verma V, et al. FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations. Bioinformatics Advances, 4(1), vbae143 (2024). DOI: 10.1093/bioadv/vbae143
GPL-3.0