Skip to content

Releases: FullLengthFanatic/tecap

v0.4.0

08 May 10:35

Choose a tag to compare

Changelog

[0.4.0] — 2026-05-07

Added

  • Self-explanatory prose blocks in HTML reports for non-expert
    readers (manuscript reviewers, conference attendees, collaborators
    who haven't run tecap themselves). Three additions, sourced from
    new constants in tecap/constants.py:
    • REPORT_INTRO: top-level intro paragraph rendered above every
      single-sample and multi-sample report. Names what the report is,
      what was measured, and where the input came from.
    • HOW_TO_READ_COMPARE: reading guide rendered only in
      multi-sample comparison reports. Maps polylines / grouped bars /
      colours back to samples.
    • GLOSSARY: two-column glossary table at the bottom of every
      report. Defines TE, UTR, CDS, polyA, PAS, APA, oligo-dT.
  • New report helpers _intro_html, _how_to_read_html,
    _glossary_html in tecap/report.py. Wired into both
    build_single_report and build_compare_report.
  • New CSS classes .intro and .howto for visually framed prose
    callouts (matches existing .tile styling).

v0.3.2

04 May 09:53

Choose a tag to compare

Changelog

[0.3.2] — 2026-05-04

Prose-only patch. No code, schema, CLI, or behavior change.

Changed

  • Removed an unsupported mechanistic attribution from the README,
    CITATION.cff, .zenodo.json, and the basecomp plot caption emitted
    by tecap/constants.py. Earlier wording attributed the moderate-A
    vs classical-A split to "saturating-local-concentration oligo-dT
    chemistries (10x GEM droplets, BD Rhapsody capture beads)". That
    claim is not supported by available bench data: bulk Iso-Seq final
    oligo-dT is 1.2 µM, in line with FS-ONT and BD Rhapsody, so bulk
    oligo-dT concentration cannot be the axis driving the split.
  • All retracted prose has been replaced with empirical-only language:
    single-cell prep datasets (10x, BD Rhapsody, ArgenTag, plate
    FLASH-seq) cluster in the 30-50% A regime; bulk Iso-Seq datasets
    cluster past the >=60% A line. The biochemical driver of the split
    is currently uncharacterized.

Note on [0.3.0] history

The [0.3.0] block below still contains the original
saturating-local-concentration prose. That entry is preserved verbatim
because v0.3.0 / v0.3.1 are already published on GitHub, Zenodo, and
bioconda; rewriting locked release notes would be misleading. This
[0.3.2] entry documents the retraction.

v0.3.1

01 May 13:05

Choose a tag to compare

[0.3.1] — 2026-05-01

CLI ergonomics. JSON schema unchanged.

Changed

  • tecap report now accepts space-separated paths for
    --classify-json and --basecomp-json (nargs="+") instead of
    comma-separated. Commas are legal in POSIX paths, so the comma form
    was a real ambiguity. The new form matches standard argparse
    conventions.

Migration

  • Replace any comma-separated invocation
    tecap report --classify-json A.json,B.json --basecomp-json A_bc.json,B_bc.json ...
    with space-separated
    tecap report --classify-json A.json B.json --basecomp-json A_bc.json B_bc.json ....

v0.3.0

30 Apr 13:25

Choose a tag to compare

[0.3.0] — 2026-04-30

Readability, reporting, and ergonomics. JSON schema unchanged.

Added

  • Single source of truth for mechanism / bucket prose in
    tecap/constants.py (MECHANISM_DEFINITIONS, BUCKET_DEFINITIONS,
    PLOT_CAPTIONS). README, HTML report, plot captions, and tecap explain
    all render from these dicts.
  • tecap explain [--mechanism NAME] [--scope classify|basecomp|all] [--format text|json] — print the glossary at the terminal.
  • tecap report --classify-json A.json[,B.json,...] [--basecomp-json ...] --out-html OUT.html — single self-contained HTML
    per sample, or cross-sample comparison. Embeds PNGs as base64. No JS,
    no external CSS, no CDN.
  • tecap classify / basecomp accept --genome {GRCh38,GRCm38,GRCm39}
    to auto-fetch missing references via the existing
    fetch_polya_atlas / fetch_gencode_gtf. Adds --gtf-version and
    --ref-cache (default $XDG_CACHE_HOME/tecap / ~/.cache/tecap).
    Explicit --polya-sites / --gtf still win.

Changed

  • comparison_terminal_exon.png panel 1 is now horizontal grouped bars,
    one row per category, samples grouped within each row. Long category
    names are readable; the previous overlapped vertical x-axis labels are
    gone.
  • comparison_terminal_exon.png panels 2 and 3 (UTR-bin rates) switched
    from grouped vertical bars to line+marker plots, one polyline per
    sample. Bars hid samples with low MechA-correct rates at N>=4; lines
    scale to any sample count. Caption rewritten to describe left/middle/
    right panels (was "top/bottom").
  • comparison_basecomp.png switched from side-by-side bars to step-line
    overlay, one polyline per sample per bucket, with a single
    figure-level sample legend. Side-by-side bars were unreadable at
    N>=4. Figure size scales with sample count.
  • All four plotting functions now draw a figure-level caption explaining
    what the plot shows.
  • Basecomp PNGs now carry an explicit figure-level legend for the grey
    band ("30-50% A: moderate-A priming") and the dashed line
    (">=60% A: classical A-tract"). Per-bucket subplot titles include the
    bucket's interpretation.
  • Moderate-A priming attribution updated based on a 4-sample comparison
    (10x Kinnex, BD Rhapsody Kinnex, PacBio Kinnex bulk cerebellum, PacBio
    Kinnex bulk heart, all human GRCh38). MechB_aspecific frac[30,50]:
    10x 0.36, BD46 0.25, Kinnex cerebellum 0.16, Kinnex heart 0.16. Bulk
    Iso-Seq samples instead show heavy classical-A enrichment (frac>=60
    ~ 0.47). The moderate-A signature is characteristic of saturating-
    local-concentration oligo-dT chemistries: 10x GEM droplets (gel bead
    dissolves and releases oligo-dT into a ~1 nL droplet) and BD Rhapsody
    capture beads (oligo-dT density at the bead surface). Free oligo-dT
    at standard concentrations (Iso-Seq, ~20 µL RT) shows classical-A
    internal priming instead. The previous attribution to "saturating
    in-solution oligo-dT" generally was too broad: 10x and Iso-Seq both
    use in-solution oligo-dT, but only 10x's droplet volume creates the
    saturating local concentration that drives moderate-A priming.
    Captions, README, CITATION.cff, and .zenodo.json updated.

Fixed

  • download-atlas: PolyASite URLs in download.py 404'd against the live
    service. Replaced with the actual paths under
    polyasite.unibas.ch/download/atlas/{2.0,3.0}/.... Human is v3.0
    (GENCODE_42), mouse is v2.0 (GRCm38.96) — PolyASite v3.0 mouse is not
    published.
  • --genome now accepts {GRCh38, GRCm38, GRCm39} to reflect the asymmetric
    PolyASite/GENCODE coverage (GRCh38 has both; GRCm38 has PolyASite only;
    GRCm39 has GENCODE only).

Known limitations / open questions

  • The 4-sample chemistry comparison covers droplet-scale (10x), bead-
    surface (BD Rhapsody), and bulk-tube (Kinnex Iso-Seq, ~20 µL RT)
    oligo-dT environments. It does not cover plate-scale RT (1-10 µL
    per well, in-solution oligo-dT, e.g. Smart-seq2 / Smart-seq3 /
    FLASH-seq), which would test whether the moderate-A signature tracks
    reaction-volume scale or chemistry lineage. FLASH-seq amplification
    (used in BD Rhapsody) is a Smart-seq descendant, so plate Smart-seq
    could plausibly look like BD46 (chemistry-driven moderate-A) or like
    Iso-Seq (scale-driven, no moderate-A).
  • This gap exists because no clean public human Smart-seq2/3 +
    PacBio HiFi dataset was findable as of 2026-04-29: PacBio's
    Kinnex-single-cell-RNA and MAS-Seq buckets are 10x-only;
    HIT-scISOseq corneal limbus is Smart-seq2-derived but not directly
    downloadable; Al'Khafaji TIL T cell data (dbGaP phs003200) is
    10x-derived. Mouse Smart-seq2 + Iso-Seq exists (SRP225196) but
    cross-species adds noise on top of PolyASite v2.0 cluster-type
    filter mismatches.
  • Follow-up paths for v0.4: (a) request HIT-scISOseq raw BAMs from
    Zheng/Chen et al. (Sun Yat-sen University); (b) watch PacBio's
    public bucket for plate Smart-seq deposits; (c) generate an in-house
    Smart-seq3 + Kinnex library if the question stays open.

v0.2.0

25 Apr 12:43

Choose a tag to compare

Performance

Multi-threaded classify and basecomp now scale. Workers inherit
gene_index, gene_records, and polya_index via fork() copy-on-write
instead of pickling them per task. The master process previously saturated
100% CPU on pickle, making --threads N>1 slower than --threads 1 on
10x Kinnex BAMs (195 declared contigs). That's gone.

Contigs with mapped == 0 in the BAM index are skipped at the parent,
avoiding ~165 wasted dispatches on typical 10x Kinnex inputs.

Added

  • tecap download-atlas --genome {GRCh38,GRCm38,GRCm39} fetches the
    PolyASite atlas and (with --gtf-version) the matching GENCODE GTF.
  • {sample}_tecap_mqc.json: MultiQC custom-content table written by
    classify. Captured %, MechA-correct %, MechB-aspecific %, PAS+ fractions,
    orientation-match fraction.
  • conda/meta.yaml bioconda recipe (PR #64853 pending).
  • docker/Dockerfile micromamba-based image.
  • GitHub Actions: pytest matrix on Python 3.10/3.11/3.12 + ruff lint.
  • CITATION.cff and .zenodo.json for the Zenodo DOI:
    10.5281/zenodo.19762736.

Unchanged

JSON output of classify and basecomp is byte-for-byte identical to v0.1
on the same input. No schema bump.