METAINFORMANT

Comprehensive bioinformatics toolkit for multi-omic analysis

Overview

METAINFORMANT provides bioinformatics analysis modules spanning genomics, transcriptomics, proteomics, epigenomics, and systems biology. It targets Python 3.11+ and uses uv for fast dependency management.

At a Glance

Metric	Value
Modules	28 specialized analysis modules
Python Files	650+ implementation files under `src/metainformant/`
Plot Types	70+ visualization methods
Documentation	450+ project-owned `README.md` and `AGENTS.md` files

Core Capabilities

Domain	Features
DNA	Sequences, alignment, phylogenetics, population genetics, variant analysis
RNA	Amalgkit integration, ENA/SRA downloads, Kallisto quantification, industrial-scale pipelines (8,300+ samples across 28 species)
GWAS	Association testing, fine-mapping, visualization, complete GWAS pipelines
eQTL	Integration of GWAS variants and Amalgkit RNA-seq expression data
Multi-omics	Cross-omic integration, joint PCA, correlation analysis
ML	Classification, regression, feature selection, LLM integration
Visualization	Manhattan plots, heatmaps, networks, animations, publication-ready output

System Architecture

flowchart TB
    subgraph coreInfra["Core Infrastructure"]
        CORE["Core Utilities"]
    end

    subgraph molecular["Molecular Analysis"]
        DNA["DNA Analysis"]
        RNA["RNA Analysis"]
        PROT["Protein Analysis"]
        EPI["Epigenome Analysis"]
    end

    subgraph statsML["Statistical and ML"]
        GWAS["GWAS Analysis"]
        MATH["Mathematical Biology"]
        ML["Machine Learning"]
        INFO["Information Theory"]
    end

    subgraph systems["Systems Biology"]
        NET["Network Analysis"]
        MULTI["Multi-Omics Integration"]
        SC["Single-Cell Analysis"]
        SIM["Simulation"]
    end

    subgraph annotation["Annotation and Metadata"]
        ONT["Ontology"]
        PHEN["Phenotype Analysis"]
        ECO["Ecology"]
        LE["Life Events"]
    end

    subgraph utilities["Utilities"]
        QUAL["Quality Control"]
        VIZ["Visualization"]
    end

    subgraph specialized["Specialized Domains"]
        LR["Long-Read Sequencing"]
        METAG["Metagenomics"]
        SV["Structural Variants"]
        SPATIAL["Spatial Transcriptomics"]
        PHARMA["Pharmacogenomics"]
        METAB["Metabolomics"]
        MENU["Menu System"]
        CLOUD["Cloud Deployment"]
    end

    CORE --> DNA
    CORE --> RNA
    CORE --> PROT
    CORE --> EPI
    CORE --> GWAS
    CORE --> MATH
    CORE --> ML
    CORE --> INFO
    CORE --> NET
    CORE --> MULTI
    CORE --> SC
    CORE --> SIM
    CORE --> ONT
    CORE --> PHEN
    CORE --> ECO
    CORE --> LE
    CORE --> QUAL
    CORE --> VIZ
    CORE --> LR
    CORE --> METAG
    CORE --> SV
    CORE --> SPATIAL
    CORE --> PHARMA
    CORE --> METAB
    CORE --> MENU
    CORE --> CLOUD

Data Flow and Integration Architecture

graph TD
    A["Raw Biological Data"] --> B["Data Ingestion"]
    B --> C{Data Type}

    C -->|DNA| D["DNA Module"]
    C -->|RNA| E["RNA Module"]
    C -->|Protein| F["Protein Module"]
    C -->|Epigenome| G["Epigenome Module"]
    C -->|Phenotype| H["Phenotype Module"]
    C -->|Environmental| I["Ecology Module"]

    D --> J["Quality Control"]
    E --> J
    F --> J
    G --> J
    H --> J
    I --> J

    J --> K["Core Processing"]
    K --> L{Analysis Type}

    L -->|Statistical| M["GWAS Module"]
    L -->|ML| N["ML Module"]
    L -->|Information| O["Information Module"]
    L -->|Networks| P["Networks Module"]
    L -->|Systems| Q["Multi-Omics Module"]
    L -->|Single-cell| R["Single-Cell Module"]
    L -->|Simulation| S["Simulation Module"]

    M --> T["Results Integration"]
    N --> T
    O --> T
    P --> T
    Q --> T
    R --> T
    S --> T

    T --> U["Visualization"]
    U --> V["Publication Figures"]
    V --> W["Scientific Insights"]

    subgraph "Primary Data Types"
        X["Genomic"] -.-> D
        Y["Transcriptomic"] -.-> E
        Z["Proteomic"] -.-> F
        AA["Epigenetic"] -.-> G
    end

    subgraph "Analysis Workflows"
        BB["Population Genetics"] -.-> M
        CC["Feature Selection"] -.-> N
        DD["Mutual Information"] -.-> O
        EE["Community Detection"] -.-> P
        FF["Joint PCA"] -.-> Q
        GG["Trajectory Analysis"] -.-> R
    end

    subgraph "Output Formats"
        HH["Manhattan Plots"] -.-> V
        II["Heatmaps"] -.-> V
        JJ["Network Graphs"] -.-> V
        KK["Animations"] -.-> V
    end

Multi-Omic Integration Pipeline

graph TD
    A["Multi-Omic Datasets"] --> B["Sample Alignment"]
    B --> C["Batch Effect Correction"]

    C --> D{Integration Strategy}
    D -->|Early| E["Concatenated Matrix"]
    D -->|Late| F["Separate Models"]
    D -->|Intermediate| G["Meta-Analysis"]

    E --> H["Joint Dimensionality Reduction"]
    F --> I["Individual Analysis"]
    G --> J["Result Integration"]

    H --> K["Unified Clustering"]
    I --> L["Individual Clustering"]
    J --> M["Consensus Clustering"]

    K --> N["Functional Enrichment"]
    L --> N
    M --> N

    N --> O["Pathway Analysis"]
    O --> P["Network Construction"]

    P --> Q["Biological Interpretation"]
    Q --> R["Systems Biology Insights"]

    subgraph "Omic Layers"
        S["Genomics"] -.-> A
        T["Transcriptomics"] -.-> A
        U["Proteomics"] -.-> A
        V["Metabolomics"] -.-> A
        W["Epigenomics"] -.-> A
    end

    subgraph "Integration Methods"
        X["MOFA"] -.-> H
        Y["Joint PCA"] -.-> H
        Z["Similarity Networks"] -.-> H
    end

    subgraph "Biological Outputs"
        AA["Gene Modules"] -.-> Q
        BB["Regulatory Networks"] -.-> Q
        CC["Disease Pathways"] -.-> Q
        DD["Biomarkers"] -.-> Q
    end
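The "Early" branch of the diagram above (concatenated matrix followed by joint dimensionality reduction) can be illustrated with a small NumPy sketch. This is not the `multiomics` module's implementation: the helper names `zscore_columns` and `joint_pca` are hypothetical, and scaling each layer by the square root of its feature count is just one common way to keep a wide layer from dominating the components.

```python
import numpy as np


def zscore_columns(x):
    """Center each feature and scale it to unit variance (constant features stay at zero)."""
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (x - x.mean(axis=0)) / std


def joint_pca(layers, n_components=2):
    """Hypothetical helper: PCA on column-wise concatenated, per-layer weighted omic matrices.

    Every layer must have the same samples in the same row order.
    """
    blocks = [zscore_columns(layer) / np.sqrt(np.shape(layer)[1]) for layer in layers]
    joint = np.hstack(blocks)  # samples x (all features)
    # Columns are already centered, so the SVD of `joint` yields the principal components.
    u, s, _vt = np.linalg.svd(joint, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Synthetic stand-ins for two omic layers measured on the same six samples.
rng = np.random.default_rng(0)
expression = rng.normal(size=(6, 4))
methylation = rng.normal(size=(6, 3))

scores, explained = joint_pca([expression, methylation], n_components=2)
print(scores.shape, explained)
```

The resulting sample scores are what a "Unified Clustering" step would consume. Batch-effect correction and sample alignment are assumed to have happened before this point.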

Quality Assurance Framework

graph TD
    A["Data Processing Pipeline"] --> B["Input Validation"]
    B --> C["Type Checking"]
    C --> D["Schema Validation"]

    D --> E["Processing Logic"]
    E --> F["Error Handling"]
    F --> G["Recovery Mechanisms"]

    G --> H["Output Validation"]
    H --> I["Result Verification"]
    I --> J["Quality Metrics"]

    J --> K{Acceptable Quality?}
    K -->|Yes| L["Pipeline Success"]
    K -->|No| M["Quality Issues"]

    M --> N["Diagnostic Analysis"]
    N --> O["Error Classification"]

    O --> P{Recoverable?}
    P -->|Yes| Q["Data Correction"]
    P -->|No| R["Pipeline Failure"]

    Q --> E
    L --> S["Validated Results"]
    R --> T["Error Reporting"]

    subgraph "Validation Layers"
        U["Data Integrity"] -.-> B
        V["Business Logic"] -.-> E
        W["Statistical Validity"] -.-> H
    end

    subgraph "Quality Controls"
        X["Unit Tests"] -.-> F
        Y["Integration Tests"] -.-> I
        Z["Performance Benchmarks"] -.-> J
    end

    subgraph "Error Types"
        AA["Data Errors"] -.-> O
        BB["Logic Errors"] -.-> O
        CC["System Errors"] -.-> O
        DD["External Errors"] -.-> O
    end
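The control flow in the diagram above (validate, process, classify the failure, correct recoverable problems, retry) can be sketched in a few lines of standard-library Python. All names here (`RecoverableError`, `validate_input`, `correct`, `run_pipeline`) are illustrative assumptions, not part of the METAINFORMANT API, and the mean stands in for real processing logic.

```python
import math


class RecoverableError(ValueError):
    """A data problem that a correction step is able to repair."""


def validate_input(values):
    """Input validation: reject empty input outright, flag missing values as recoverable."""
    if not values:
        raise ValueError("no observations supplied")
    if any(v is None for v in values):
        raise RecoverableError("missing observations")


def correct(values):
    """Data correction: drop missing observations."""
    return [v for v in values if v is not None]


def process(values):
    """Processing logic: a stand-in computation (the mean)."""
    return sum(values) / len(values)


def verify_output(result):
    """Output validation: the result must be a finite number."""
    if not math.isfinite(result):
        raise ValueError("non-finite result")


def run_pipeline(values, max_corrections=1):
    """Run the pipeline, retrying after a correction when the error is recoverable."""
    for attempt in range(max_corrections + 1):
        try:
            validate_input(values)
        except RecoverableError:
            if attempt == max_corrections:
                raise  # correction budget exhausted: report as pipeline failure
            values = correct(values)
            continue
        result = process(values)
        verify_output(result)
        return result
    raise RuntimeError("correction budget exhausted")


print(run_pipeline([1.0, None, 3.0]))  # prints 2.0
```

Separating recoverable from unrecoverable errors by exception type keeps the retry loop small, at the cost of requiring each validation step to decide which class it raises.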

Key Features

Multi-Omic Analysis: DNA, RNA, protein, and epigenome data integration
Statistical & ML Methods: GWAS, population genetics, machine learning pipelines
Single-Cell Genomics: Complete scRNA-seq analysis workflows
Network Analysis: Biological networks, pathways, community detection algorithms
Visualization Suite: 14 specialized plotting modules with 70+ plot types and publication-quality output
Modular Architecture: Individual modules or complete end-to-end workflows
Comprehensive Documentation: Repo-wide README, AGENTS, SPEC, and task guides with current signposting
Implementation Testing: Tests exercise real implementations; unsupported features raise explicit errors
Quality Assurance: Rigorous validation and error handling throughout
Performance Optimization: Efficient algorithms for large-scale biological data

Current Validation Snapshot

As of the 2026-05-25 stabilization pass, this checkout collects 7,736 tests and the local non-network/non-external suite passes (7,495 passed, 71 skipped, 170 deselected). Root-level audit and validation reports are retained as historical snapshots; regenerate current verification outputs under output/.

Quick Start

I Want To...

Analyze DNA sequences:

# One-liner: GC content for a short sequence
uv run python - <<'PY'
from metainformant.dna.sequence.composition import gc_content

seq = "ATGCGC"
print(f"GC: {gc_content(seq) * 100:.1f}%")
PY

Run RNA-seq pipeline (amalgkit):

# List available species configs before running an amalgkit workflow
uv run python scripts/rna/run_workflow.py --list-configs

Perform GWAS analysis:

# End-to-end Apis mellifera GWAS workflow
uv run python scripts/gwas/run_amellifera_gwas.py \
  --config config/gwas/gwas_amellifera.yaml \
  --output output/gwas/amellifera

Visualize results:

import numpy as np
from metainformant.visualization.plots.basic import heatmap

ax = heatmap(np.array([[1.0, 0.5], [0.5, 1.0]]), output_path="output/figures/heatmap.png")

Deploy to cloud (GCP):

# Inspect the GCP deployment subcommands
uv run python scripts/cloud/deploy_gcp.py --help

Choosing the Right Module

Your Data Type	Use This Module	Start Here
DNA sequences (FASTA)	`dna`	docs/dna/
RNA-seq (FASTQ, BAM)	`rna` (amalgkit)	docs/rna/
VCF + phenotypes	`gwas`	docs/gwas/workflow.md
Protein (FASTA, PDB)	`protein`	docs/protein/
Single-cell (h5ad, mtx)	`singlecell`	docs/singlecell/
Methylation arrays/bams	`epigenome`	docs/epigenome/
Microbiome (16S, metagenome)	`metagenomics`	docs/metagenomics/
Multiple omics (joint analysis)	`multiomics`	docs/multiomics/
Gene lists + GO terms	`ontology`	docs/ontology/
Phenotype traits	`phenotype`	docs/phenotype/
Ecological communities	`ecology`	docs/ecology/
Long-read (PacBio/ONT)	`longread`	docs/longread/
Networks & pathways	`networks`	docs/networks/
Information theory analysis	`information`	docs/information/
Simulation/synthetic data	`simulation`	docs/simulation/
Visualizations only	`visualization`	docs/visualization/
GCP cloud deployment	`cloud`	src/metainformant/cloud/README.md

Not sure? Read the full module matrix.

First-Time Visitor Path

Install (10 min): Follow QUICKSTART.md
Run demo (2 min): python3 scripts/core/run_demo.py
Pick your domain: See table above → click module link
Read workflow guide: Each module's docs/<module>/workflow.md
Try on sample data: Each module has tests/data/<module>/ examples
Run on your data: Replace sample paths with your files

Module Signposting

The package is intentionally broad. Treat each module's source, tests, and local README/SPEC files as the source of truth for current behavior.

Area	Packages
Core and utilities	`core`, `quality`, `visualization`, `menu`, `cloud`
Molecular omics	`dna`, `rna`, `protein`, `epigenome`, `longread`, `structural_variants`
Higher-order omics	`singlecell`, `spatial`, `multiomics`, `metabolomics`, `metagenomics`, `pharmacogenomics`
Analysis and methods	`gwas`, `ml`, `networks`, `simulation`, `math`, `information`
Annotation and ecology	`ontology`, `phenotype`, `ecology`, `life_events`
Protocol helpers	`mcp` currently provides a standalone Amalgkit monitor; no MCP server is implemented

Module Overview

Complete Module Reference

All modules live in src/metainformant/ with documentation in each module's README.md.

Module	Files	Description	Key Components	Docs
Core Infrastructure
`core/`	37	Shared utilities, I/O, logging, config, parallel processing, caching	`io/`, `data/`, `execution/`	README
Molecular Analysis
`dna/`	47	DNA sequences, alignment, phylogenetics, population genetics, variants	`sequence/`, `alignment/`, `population/`	README
`rna/`	57	RNA-seq workflows, amalgkit integration, expression quantification	`amalgkit/`, `engine/`, `analysis/`	README
`protein/`	27	Protein sequences, structure analysis, AlphaFold, UniProt integration	`sequence/`, `structure/`, `database/`	README
`epigenome/`	15	Methylation analysis, ChIP-seq, ATAC-seq, chromatin accessibility	`assays/`, `chromatin_state/`, `peak_calling/`	README
Statistical & ML
`gwas/`	78	GWAS, fine-mapping, eQTL analysis, colocalization, visualization	`finemapping/`, `visualization/`, `analysis/`	README
`math/`	29	Population genetics theory, coalescent, selection, epidemiology	`population_genetics/`, `epidemiology/`, `evolutionary_dynamics/`	README
`ml/`	22	Machine learning pipelines, classification, regression, features	`models/`, `features/`, `llm/`	README
`information/`	24	Information theory, Shannon entropy, mutual information, semantic similarity	`metrics/`, `integration/`	README
Systems Biology
`networks/`	20	Biological networks, graph algorithms, community detection, pathways	`analysis/`, `interaction/`	README
`multiomics/`	12	Multi-omic integration, joint PCA, cross-omic correlation	`analysis/`, `methods/`	README
`singlecell/`	21	scRNA-seq preprocessing, clustering, differential expression	`data/`, `analysis/`, `visualization/`	README
`simulation/`	14	Synthetic data, agent-based models, sequence simulation, ecosystems	`models/`, `workflow/`, `benchmark/`	README
Annotation & Metadata
`ontology/`	19	Gene Ontology, functional annotation, semantic similarity	`core/`, `query/`, `visualization/`	README
`phenotype/`	30	Phenotypic data curation, AntWiki integration, trait analysis	`analysis/`, `data/`, `behavior/`	README
`ecology/`	13	Community diversity, environmental correlations, species matrices	`analysis/`, `phylogenetic/`, `visualization/`	README
`life_events/`	20	Life course analysis, event sequences, temporal embeddings	`models/`, `workflow/`	README
Utilities
`quality/`	10	FASTQ quality assessment, validation, contamination detection	`io/`, `analysis/`, `reporting/`	README
`visualization/`	30	70+ plot types, heatmaps, networks, animations, publication-ready	`plots/`, `genomics/`, `analysis/`	README
Specialized Domains
`longread/`	31	Long-read sequencing (PacBio, ONT), assembly, error correction	`assembly/`, `quality/`	README
`metagenomics/`	18	Metagenomic analysis, taxonomic profiling, functional annotation	`amplicon/`, `functional/`	README
`pharmacogenomics/`	19	Drug-gene interactions, pharmacokinetics, variant interpretation	`interaction/`	README
`spatial/`	20	Spatial transcriptomics, tissue mapping, spatial statistics	`analysis/`	README
`structural_variants/`	15	SV detection, CNV analysis, breakpoint resolution	`detection/`	README
`metabolomics/`	9	Metabolomic analysis, MS data processing, pathway mapping	`analysis/`	README
`cloud/`	3	Cloud deployment helpers, Docker/GCP workflow utilities	`deployment/`	README
`mcp/`	3	Standalone helper tools for future MCP integration	`tools/`	README
`menu/`	7	Interactive CLI menu system, workflow navigation	`ui/`	README

Total: 28 package directories, 650+ Python files

Documentation

Quick Links

Documentation Guide - Complete navigation guide
Quick Start - Fast setup commands
Architecture - System design
Technical Specification - Design standards

Transcriptomics (RNA-seq)

Workflow Guide - ENA-first amalgkit streaming pipeline
Troubleshooting - IO contention & SRA setup fixes
Tissue Patching - Custom metadata correction
Ortholog Generation - Automated cross-species mapping
Step Documentation - The 11-step amalgkit process
Testing Guide - Comprehensive testing documentation
CLI Reference - Command-line interface
eQTL Integration - eQTL pipeline documentation

Module Documentation

Each module has documentation in src/metainformant/<module>/README.md and docs/<module>/.

Scripts & Workflows

The scripts/ directory contains workflow orchestrators and utilities:

Package Management: Setup, testing, quality control
RNA-seq (Amalgkit): Multi-species workflows, amalgkit integration
GWAS (Variants): Genome-scale association studies
eQTL Integration: RNA-seq + Variant cross-omics integration pipelines
Module Orchestrators: Complete workflow scripts for all domains (core, DNA, RNA, protein, networks, multiomics, single-cell, quality, simulation, visualization, epigenome, ecology, ontology, phenotype, ML, math, gwas, information, life_events)

See scripts/README.md for documentation.

CLI Interface

The metainformant command exposes a focused CLI (docs/cli.md): --version, --modules, protein utilities, quality checks, rna info, and gwas run. RNA workflows use Python imports, scripts/rna/run_workflow.py, or python -m metainformant.rna.amalgkit.

uv run metainformant --help
uv run metainformant --modules
uv run metainformant protein taxon-ids --file data/taxon_ids.txt
uv run metainformant protein comp --fasta data/proteins.fasta
uv run metainformant protein rmsd-ca --pdb-a data/structure1.pdb --pdb-b data/structure2.pdb
uv run metainformant quality batch-detect --data samples.csv --batches batches.txt
uv run metainformant gwas run --config config/gwas/gwas_pbarbatus.yaml --check

# RNA-seq workflow config discovery
uv run python scripts/rna/run_workflow.py --list-configs

See docs/cli.md for CLI documentation.

Usage Examples

DNA Analysis

from metainformant.dna.alignment.pairwise import global_align
from metainformant.dna.population import nucleotide_diversity

alignment = global_align("ACGTACGT", "ACGTAGGT")
print(f"Alignment score: {alignment.score}")

seqs = ["ATCGATCG", "ATCGTTCG", "ATCGATCG"]
print(f"Nucleotide diversity: {nucleotide_diversity(seqs):.4f}")
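For intuition about the second statistic, nucleotide diversity (π) is conventionally the average number of pairwise differences per site. The sketch below is a from-scratch illustration under that definition; the function name `pairwise_nucleotide_diversity` is hypothetical, and the library's `nucleotide_diversity` may normalize or handle gaps differently.

```python
from itertools import combinations


def pairwise_nucleotide_diversity(seqs):
    """Average per-site pairwise differences across all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_differences = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    )
    return total_differences / (len(pairs) * length)


seqs = ["ATCGATCG", "ATCGTTCG", "ATCGATCG"]
pi = pairwise_nucleotide_diversity(seqs)
print(f"{pi:.4f}")  # prints 0.0833 (2 differences over 3 pairs x 8 sites)
```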

RNA-seq Workflow

from metainformant.rna.engine.workflow import load_workflow_config, plan_workflow

config = load_workflow_config("config/amalgkit/amalgkit_pogonomyrmex_barbatus.yaml")
for step_name, _params in plan_workflow(config):
    print(step_name)

# Inspect available species configs, then run a workflow after amalgkit is installed
uv run python scripts/rna/run_workflow.py --list-configs
uv run python scripts/rna/run_workflow.py --config config/amalgkit/amalgkit_pogonomyrmex_barbatus.yaml

GWAS Analysis

from metainformant.gwas.analysis.association import association_test_linear

result = association_test_linear(
    genotypes=[0, 1, 2, 0, 1, 2, 0, 1],
    phenotypes=[10.1, 11.0, 12.2, 9.8, 10.9, 12.0, 10.0, 11.2],
)
print(result["beta"], result["p_value"])

uv run python scripts/gwas/run_amellifera_gwas.py --config config/gwas/gwas_amellifera.yaml --output output/gwas/amellifera
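To make the `beta` in the Python example concrete, here is a minimal sketch of the single-variant linear model it presumably estimates: ordinary least squares of phenotype on allele dosage. The function `linear_association` is a hypothetical stand-in written with the standard library only, not the module's `association_test_linear`; it stops at the t-statistic, since a p-value would additionally need the t distribution with n - 2 degrees of freedom.

```python
import math


def linear_association(genotypes, phenotypes):
    """OLS fit of phenotype ~ intercept + beta * genotype dosage (0, 1, or 2)."""
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching inputs with at least three samples")
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotype is monomorphic; effect size is undefined")
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    beta = sxy / sxx
    intercept = mean_p - beta * mean_g
    # Residual sum of squares gives the standard error of the slope.
    rss = sum((p - (intercept + beta * g)) ** 2 for g, p in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = beta / se if se > 0 else math.inf
    return {"beta": beta, "intercept": intercept, "se": se, "t": t_stat}


fit = linear_association(
    genotypes=[0, 1, 2, 0, 1, 2, 0, 1],
    phenotypes=[10.1, 11.0, 12.2, 9.8, 10.9, 12.0, 10.0, 11.2],
)
print(f"beta={fit['beta']:.3f}")  # prints beta=1.067
```

A real GWAS run would also adjust for covariates and relatedness, which this sketch deliberately leaves out.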

Configuration

from metainformant.core.utils.config import apply_env_overrides, load_mapping_from_file

config = load_mapping_from_file("config/amalgkit/amalgkit_pogonomyrmex_barbatus.yaml")
config = apply_env_overrides(config, prefix="AK")
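The exact mapping rules of `apply_env_overrides` are not spelled out here, so the following is only a sketch of one plausible convention: an environment variable such as `AK_THREADS` overrides an existing top-level `threads` key, with the value coerced to the type already in the config. The function `override_from_env` and the keys `threads` and `work_dir` are hypothetical.

```python
import os


def _coerce(raw, current):
    """Convert an environment string to the type of the value it replaces."""
    if isinstance(current, bool):  # check bool before int: bool is a subclass of int
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    if isinstance(current, int):
        return int(raw)
    if isinstance(current, float):
        return float(raw)
    return raw


def override_from_env(config, prefix, environ=None):
    """Hypothetical helper: return a copy of `config` with PREFIX_KEY overrides applied."""
    environ = os.environ if environ is None else environ
    marker = prefix.upper() + "_"
    updated = dict(config)
    for name, raw in environ.items():
        if not name.startswith(marker):
            continue
        key = name[len(marker):].lower()
        if key in updated:  # only override keys that already exist
            updated[key] = _coerce(raw, updated[key])
    return updated


base = {"threads": 4, "work_dir": "output/amalgkit"}
fake_env = {"AK_THREADS": "8", "UNRELATED": "x"}
print(override_from_env(base, "AK", fake_env))
```

Passing the environment as a parameter keeps the function testable without mutating `os.environ`.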

Visualization

import numpy as np
from metainformant.visualization.plots.basic import heatmap

heatmap(np.array([[1.0, 0.2], [0.2, 1.0]]), output_path="output/figures/correlation.png")

Core Utilities

from metainformant.core import io
from metainformant.core.io import paths
from metainformant.core.utils.logging import get_logger

logger = get_logger(__name__)
resolved = paths.expand_and_resolve("output/results.json")
io.dump_json({"ok": True}, resolved)
logger.info("Wrote %s", resolved)

Project Structure

MetaInformAnt/
    src/metainformant/    # Main package
        core/             # Core utilities
        dna/              # DNA analysis
        rna/              # RNA analysis
        protein/          # Protein analysis
        gwas/             # GWAS analysis
        ...               # Additional modules
    scripts/              # Workflow scripts
        package/          # Package management
        rna/              # RNA workflows
        gwas/             # GWAS workflows
        ...               # Module scripts
    docs/                 # Documentation
    tests/                # Test suite
    config/               # Configuration files
    output/               # Analysis outputs
    data/                 # Input data

AI-Assisted Development

This project uses AI assistance to enhance:

Code generation and algorithm implementation
Comprehensive documentation
Test case generation
Architecture design

All AI-generated content undergoes human review. See AGENTS.md for details.

Known Limitations

Module Completeness

Some modules have partial implementations or optional dependencies:

Machine Learning: Framework exists; some methods may need completion (see ML Documentation)
Multi-omics: Integration methods implemented; additional dependencies may be required
Single-cell: Requires scipy, scanpy, anndata (see Single-Cell Documentation)
Network Analysis: Algorithms implemented; regulatory network features may need enhancement

GWAS Module

Variant Download: Database download (dbSNP, 1000 Genomes) is a placeholder; use SRA-based workflow or provide VCF files
Functional Annotation: Requires external tools (ANNOVAR, VEP, SnpEff) for variant annotation
Mixed Models: Relatedness adjustment implemented; MLM methods may require GCTA/EMMAX integration

Test Coverage

Some module tests are skipped or limited when optional dependencies are missing:

Single-cell: Requires scientific dependencies (scanpy, anndata)
Multi-omics: Framework exists; tests may skip without dependencies
Network Analysis: Tests pass; features may need additional setup

See Testing Guide for detailed testing documentation and coverage information.

Best Practices

File Naming

Use informative names: sample_pca_biplot_colored_by_treatment.png
Avoid generic names: plot1.png, output.png

Output Organization

All outputs in output/ directory
Configuration saved with results
Visualizations in subdirectories with metadata
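One way to follow these conventions is to write every run into its own descriptively named directory, together with the configuration that produced it. The sketch below uses only the standard library; `save_run`, the file names, and the metadata fields are illustrative assumptions rather than an existing METAINFORMANT helper, and the demo writes to a temporary directory where a real run would pass `output/`.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_run(output_root, analysis, run_name, results, config):
    """Hypothetical helper: store results, the config used, and metadata side by side."""
    run_dir = Path(output_root) / analysis / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "results.json").write_text(json.dumps(results, indent=2, sort_keys=True))
    (run_dir / "config.json").write_text(json.dumps(config, indent=2, sort_keys=True))
    metadata = {
        "analysis": analysis,
        "run_name": run_name,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    (run_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return run_dir


# Demo root is a temporary directory; in a project checkout this would be "output".
demo_root = tempfile.mkdtemp()
run_dir = save_run(
    demo_root,
    analysis="gwas",
    run_name="amellifera_linear_maf05",
    results={"n_variants": 3},
    config={"maf": 0.05},
)
print(sorted(p.name for p in run_dir.iterdir()))
# prints ['config.json', 'metadata.json', 'results.json']
```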

Real Implementation Policy

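All tests exercise real implementations
No test doubles or inert placeholder methods
External API calls are made for real, or the test skips gracefully when the service or tool is unavailable
This ensures that passing tests reflect actual functionality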
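A "graceful skip" can be expressed with the standard library's `unittest` skip decorators: the test calls the real external tool when it is installed and is reported as skipped otherwise, with no mock in between. This is a sketch of the pattern, not a test from the repository; `tool_available` and the test class are hypothetical names.

```python
import shutil
import subprocess
import unittest


def tool_available(name):
    """Return True when an executable with this name is on PATH."""
    return shutil.which(name) is not None


class TestKallistoVersion(unittest.TestCase):
    @unittest.skipUnless(tool_available("kallisto"), "kallisto is not installed")
    def test_version_command_runs(self):
        # Calls the real binary; nothing is mocked.
        completed = subprocess.run(
            ["kallisto", "version"], capture_output=True, text=True, check=False
        )
        self.assertEqual(completed.returncode, 0)


if __name__ == "__main__":
    unittest.main()
```

The same idea carries over to pytest with `pytest.mark.skipif` if that is the runner in use.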

Requirements

Python 3.11+
Optional: SRA Toolkit, kallisto (for RNA workflows)
Optional: samtools, bcftools, bwa (for GWAS)

Contributing

See CONTRIBUTING.md for full contribution guidelines.

Contributions are welcome! Please:

Follow the existing code style
Add tests for new features
Update documentation
Use informative commit messages

Recent Improvements

Performance Enhancements

Intelligent Caching: Automatic caching for expensive computations (Tajima's constants, entropy calculations)
NumPy Vectorization: Optimized mathematical operations for 10-100x performance improvements
Progress Tracking: Real-time progress bars for long-running analyses
Memory Optimization: Efficient algorithms for large datasets
Resilient Orchestration: Automatic recovery flows and VM-level hard-reset protocols for surviving complete Docker overlay lockups caused by hidden fasterq-dump caches
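As an illustration of the caching item, the constants used by Tajima's D depend only on the sample size n, so they are a natural fit for memoization. The sketch below uses `functools.lru_cache` and the standard definitions a1 = Σ 1/i and a2 = Σ 1/i² for i from 1 to n - 1; the function name `tajima_constants` is hypothetical, and the project's own caching mechanism may differ.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def tajima_constants(n):
    """Return (a1, a2) for sample size n; repeated calls with the same n hit the cache."""
    if n < 2:
        raise ValueError("sample size must be at least 2")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    return a1, a2


a1, a2 = tajima_constants(4)   # computed: 1 + 1/2 + 1/3 and 1 + 1/4 + 1/9
a1_again, _ = tajima_constants(4)  # served from the cache
print(round(a1, 4), round(a2, 4))  # prints 1.8333 1.3611
print(tajima_constants.cache_info().hits)  # prints 1
```

Because the cache key is the argument itself, this only works for hashable inputs such as integers; array inputs would need a different strategy.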

Enhanced Documentation

Comprehensive Tutorials: End-to-end guides for DNA, RNA, GWAS, and information theory workflows
Method Comparison Guides: Decision-making guides for choosing analysis algorithms
Extended FAQ: Troubleshooting and usage guidance for common scenarios
Standardized Docstrings: Consistent formatting with examples and DOI citations

Testing & Reliability

Expanded Test Coverage: 37+ new comprehensive tests with real implementations
Validation Enhancements: Improved parameter validation and error handling
Cross-Platform Compatibility: Python 3.14 support and external drive optimization
Integration Testing: Verified cross-module functionality

New Features

Enhanced GWAS Visualization: Complete visualization suite for population structure, effects, and comparisons
Information Theory Workflows: Batch processing with progress tracking
Protein Proteome Analysis: Taxonomy ID processing and proteome utilities
Advanced Error Handling: Structured error reporting with actionable guidance

Citation

If you use METAINFORMANT in your research, please cite this repository:

@software{metainformant2025,
  author = {MetaInformAnt Development Team},
  title = {MetaInformAnt: Comprehensive Bioinformatics Toolkit},
  year = {2025},
  url = {https://github.com/docxology/MetaInformAnt},
  version = {0.2.6}
}

License

This project is licensed under the Apache License, Version 2.0 - see LICENSE for details.

Contact

Repository: https://github.com/docxology/MetaInformAnt
Issues: https://github.com/docxology/MetaInformAnt/issues
Documentation: https://github.com/docxology/MetaInformAnt/blob/main/docs/

Acknowledgments

Developed with AI assistance from Cursor's Code Assistant (grok-code-fast-1)
Built on established bioinformatics tools and libraries
Community contributions and feedback

Status: Active Development | Version: 0.2.6 | Python: 3.11+ | License: Apache 2.0
