uSort-M

Rapid and low-cost parsed variant library generation

Python 3.8+ | License: MIT

uSort-M converts pooled DNA libraries into large collections of individually-isolated, sequence-verified variants at a fraction of traditional gene synthesis costs.

Overview

Traditional approaches to generating parsed variant libraries require expensive per-gene synthesis and individual cloning. uSort-M instead uses fluorescence-activated cell sorting (FACS) to isolate single cells from a pooled transformation into individual wells, then identifies the variant in each well by amplicon sequencing with well-specific barcodes.

Key advantages:

  • Significant cost savings compared to traditional gene synthesis (7.8-fold in the 500-variant example estimate below)
  • 10-day working time from oligo pool to verified clones
  • Scalable from tens to thousands of variants
  • Compatible with diverse library inputs

Installation

# Basic installation
pip install -e .

# Full installation with all dependencies
pip install -e ".[all]"

External Tools (for demultiplexing)

The usortm demux command requires these tools to be installed separately:

Tool      Min Version  Purpose                     Install
dorado    1.3+         Barcode demultiplexing      GitHub releases
minimap2  2.20+        Reference alignment         brew install minimap2 or conda install minimap2
samtools  1.16+        BAM processing & consensus  brew install samtools or conda install samtools

usortm auto-discovers dorado in common locations (~/Downloads/dorado-*/bin/, ~/.dorado/bin/). You can also set DORADO_PATH, MINIMAP2_PATH, or SAMTOOLS_PATH environment variables.
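
The lookup described above can be sketched in a few lines of Python. This is an illustration only, not usortm's actual discovery code: the helper name `find_tool` is hypothetical, the search order (environment variable, then `PATH`, then the listed dorado folders) is an assumption, and so is the idea that each variable holds the path to the executable itself. Only the variable names and the two dorado locations come from the text above.

```python
import glob
import os
import shutil

# The two dorado locations named above; appending the executable name
# "dorado" to each folder is an assumption made for this sketch.
DORADO_GLOBS = ["~/Downloads/dorado-*/bin/dorado", "~/.dorado/bin/dorado"]


def find_tool(name, env_var, extra_globs=()):
    """Hypothetical helper: return a path for `name`, or None if not found.

    Assumed order: environment variable, then PATH, then extra glob patterns.
    """
    override = os.environ.get(env_var)
    if override:
        return override  # returned as given; not validated here
    on_path = shutil.which(name)
    if on_path:
        return on_path
    for pattern in extra_globs:
        matches = sorted(glob.glob(os.path.expanduser(pattern)))
        if matches:
            return matches[-1]  # last match in sorted order
    return None


for tool, var, globs in [
    ("dorado", "DORADO_PATH", DORADO_GLOBS),
    ("minimap2", "MINIMAP2_PATH", ()),
    ("samtools", "SAMTOOLS_PATH", ()),
]:
    print(tool, "->", find_tool(tool, var, globs) or "not found")
```

Running a check like this before `usortm demux` is one way to confirm that all three tools are visible from the current shell.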

Quick Start

Estimate costs

usortm estimate --library-size 500 --seq-length 300

Plan and execute a full workflow

# 1. Initialize project from variant list
usortm plan variants.csv --output my_project/

# 2. [Perform wet lab: assembly, sorting, barcoding, sequencing]

# 3. Process sequencing data (with library CSV for variant calling)
usortm demux my_project/ --fastq sequencing-data.fastq --library-csv variants.csv

# 4. Generate hit-picking list
usortm pick my_project/

# 5. Create final report
usortm report my_project/
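
If you prefer to drive the post-sequencing steps (3 to 5) from a script, one possible approach is a thin wrapper around the same commands. The sketch below uses only the subcommands and flags shown above; the function names `post_sequencing_commands` and `run_post_sequencing` are hypothetical and are not part of usortm.

```python
import subprocess


def post_sequencing_commands(project_dir, fastq, library_csv):
    """Build the argument lists for steps 3-5 of the workflow shown above."""
    return [
        ["usortm", "demux", project_dir, "--fastq", fastq,
         "--library-csv", library_csv],
        ["usortm", "pick", project_dir],
        ["usortm", "report", project_dir],
    ]


def run_post_sequencing(project_dir, fastq, library_csv):
    """Run each step in order; check=True stops at the first failing step."""
    for argv in post_sequencing_commands(project_dir, fastq, library_csv):
        print("running:", " ".join(argv))
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: print the commands without executing them.
    for argv in post_sequencing_commands(
        "my_project/", "sequencing-data.fastq", "variants.csv"
    ):
        print(" ".join(argv))
```

Calling `run_post_sequencing(...)` instead of the dry run executes the three commands and raises `subprocess.CalledProcessError` if any of them exits with a non-zero status.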

CLI Commands

Command   Description
estimate  Quick cost and effort estimation
plan      Initialize project from variant list
demux     Demultiplex sequencing data (LevSeq barcodes via dorado, reference alignment, consensus, variant calling)
pick      Generate Integra ASSIST hit-picking list (ordered by input library)
report    Generate final plate maps, coverage stats, and HTML summary
integra   Standalone hit-list generation (without project)

Example: Cost Estimate

usortm estimate -n 500 -l 300

# Output:
# ╭────────────────────────────────╮
# │ uSort-M Cost Estimate          │
# │ Library: 500 variants × 300 bp │
# ╰────────────────────────────────╯
#
#                   Cost Breakdown
# ╭────────────────────────┬─────────┬─────────────╮
# │ Step                   │ uSort-M │ Traditional │
# ├────────────────────────┼─────────┼─────────────┤
# │ Synthesis              │  $1,373 │     $17,500 │
# │ Cloning                │     $54 │      $6,048 │
# │ Sorting                │    $104 │         N/A │
# │ Barcoding + Sequencing │  $1,477 │        $500 │
# │ Hit-picking            │     $80 │         N/A │
# │ Total                  │  $3,088 │     $24,048 │
# ╰────────────────────────┴─────────┴─────────────╯
#
#   7.8-fold savings with uSort-M
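
The totals and the fold savings in the table above can be checked by hand. The short sketch below simply re-adds the per-step figures copied from that output; it does not call usortm, and the dictionary layout is chosen only for illustration.

```python
# Per-step costs in USD, copied from the example output above as
# (uSort-M, traditional). None marks steps listed as N/A.
costs = {
    "Synthesis": (1373, 17500),
    "Cloning": (54, 6048),
    "Sorting": (104, None),
    "Barcoding + Sequencing": (1477, 500),
    "Hit-picking": (80, None),
}

usortm_total = sum(u for u, _ in costs.values())
traditional_total = sum(t for _, t in costs.values() if t is not None)
fold = traditional_total / usortm_total

print(f"uSort-M total:     ${usortm_total:,}")       # $3,088
print(f"Traditional total: ${traditional_total:,}")  # $24,048
print(f"Fold savings:      {fold:.1f}")              # 7.8
```

Most of the difference comes from the synthesis and cloning rows; barcoding and sequencing is the one step where the uSort-M figure is higher than the traditional one.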

Workflow Timeline

Day  Step                              Duration
1    Pooled assembly + transformation  4-6 hours
2+   FACS sorting                      ~8 min/plate
2+   Outgrowth                         Overnight
3+   PCR barcoding                     ~50 min/plate
4-6  Sequencing                        1-3 days
6+   Analysis + hit-picking            1-2 hours
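
The per-plate durations above scale with the number of plates sorted. As a rough planning aid, the sketch below turns a library size into a plate count and per-step times. Several inputs are assumptions rather than values stated here: the 384-well plate format, the reading of "fold sampling" as sorted wells per library variant, and plates being handled one after another. The function name `plate_time_estimate` is hypothetical and not part of usortm.

```python
import math


def plate_time_estimate(lib_size, fold_sampling=8, wells_per_plate=384,
                        sort_min_per_plate=8, barcode_min_per_plate=50):
    """Rough plate count and hands-on minutes for sorting and barcoding.

    Assumes wells sorted = lib_size * fold_sampling and serial plate handling.
    The per-plate minutes default to the approximate values in the timeline.
    """
    wells = lib_size * fold_sampling
    plates = math.ceil(wells / wells_per_plate)
    return {
        "plates": plates,
        "sorting_min": plates * sort_min_per_plate,
        "barcoding_min": plates * barcode_min_per_plate,
    }


print(plate_time_estimate(500))
# {'plates': 11, 'sorting_min': 88, 'barcoding_min': 550}
```

Under these assumptions a 500-variant library needs 11 plates, about an hour and a half of sorting, and roughly nine hours of barcoding if plates are processed strictly in series.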

Python API

from usortm.costs import cost_functions as cf
from usortm.simulate import sortm

# Calculate costs
costs = cf.usortm_total_cost(
    library_sizes=[500, 1000],
    seq_lengths=[300]
)

# Run coverage simulations
results = sortm.sortm(
    n_sims=1000,
    lib_size=500,
    skew=4,
    fold_sampling=8,
)
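
To give a feel for what a coverage simulation of this kind estimates, here is a minimal standalone sketch using only the standard library. It is not the `sortm.sortm` implementation and does not reproduce its output. The abundance model is an assumption: variant weights are spaced evenly on a log scale so that the most abundant variant is `skew` times more common than the rarest, and each sorted well receives one variant drawn from that distribution. The function name `simulate_coverage` is hypothetical.

```python
import random


def simulate_coverage(lib_size, fold_sampling, skew=1.0, n_sims=100, seed=0):
    """Mean fraction of library variants recovered at least once.

    Assumed model: fold_sampling * lib_size wells, each holding one variant
    drawn with probability proportional to its weight.
    """
    rng = random.Random(seed)
    denom = max(lib_size - 1, 1)
    # Weights run from 1 (rarest) up to `skew` (most abundant).
    weights = [skew ** (i / denom) for i in range(lib_size)]
    n_wells = int(fold_sampling * lib_size)
    total = 0.0
    for _ in range(n_sims):
        picks = rng.choices(range(lib_size), weights=weights, k=n_wells)
        total += len(set(picks)) / lib_size
    return total / n_sims


for fold in (1, 2, 4, 8):
    cov = simulate_coverage(lib_size=500, fold_sampling=fold, skew=4)
    print(f"{fold}x sampling: {cov:.1%} of variants recovered")
```

In this toy model, coverage rises with fold sampling and falls as the pool becomes more skewed, which is the trade-off between sorting effort and library completeness that such simulations are meant to explore.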

Documentation

Full documentation is available at fordycelab.github.io/usortm.

Citation

If you use uSort-M in your research, please cite:

Olivas MB, Almhjell PJ, Shanahan JD, Fordyce PM. uSort-M: Scalable isolation of user-defined sequences from diverse pooled libraries. bioRxiv (2026). DOI: 10.64898/2026.01.12.699065

License

MIT License - see LICENSE for details.
