khourious/LSBFILT

Lab-Specific Bias FILTer (LSBFILT)

This repository provides a computational framework for identifying technical artifacts in viral genomic datasets. By detecting lab-specific biases and primer-associated variants, the pipeline enables targeted masking to improve phylogenetic inference.

LSBFILT extends the workflow originally developed by Turakhia et al. (2020) for SARS-CoV-2, making it applicable to other viral genomic datasets with customizable parameters and enhanced filtering and masking capabilities.

INSTALLATION

Before installing LSBFILT on Linux or macOS, install Docker Desktop and make sure it is running.

Linux

git clone --recursive https://github.com/khourious/LSBFILT.git
cd LSBFILT
chmod -R +x INSTALL_Unix
bash INSTALL_Unix

macOS

git clone --recursive https://github.com/khourious/LSBFILT.git
cd LSBFILT
chmod -R +x INSTALL_macOS
bash INSTALL_macOS

After installation, refresh your shell configuration:

source ~/.zshrc

INPUT DATA FORMATS

To run this pipeline, you need to provide:

  • a FASTA alignment file, with the reference genome as the first sequence
  • a NEWICK tree file generated from that alignment
  • a METADATA table in TSV format
  • a directory containing the primer scheme BED files
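Because every input depends on the alignment, it can help to sanity-check the FASTA file before launching a run. The sketch below is a hypothetical pre-flight check, not part of LSBFILT: it assumes a plain, uncompressed FASTA file, and the function names parse_fasta and check_alignment are illustrative only. It confirms that all sequences have the same length (as an alignment should) and returns the ID of the first record, which LSBFILT treats as the reference.

```python
# Hypothetical pre-flight check for the alignment; not part of LSBFILT.


def parse_fasta(lines):
    """Parse FASTA text (any iterable of lines) into a list of (id, sequence) tuples."""
    records, header, chunks = [], None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            tokens = line[1:].split()
            # Keep only the first whitespace-delimited token as the sequence ID.
            header, chunks = (tokens[0] if tokens else ""), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_alignment(records):
    """Verify that all sequences have the same length; return the first record's ID."""
    if not records:
        raise ValueError("no sequences found")
    lengths = sorted({len(seq) for _, seq in records})
    if len(lengths) > 1:
        raise ValueError(f"sequence lengths differ {lengths}; is this an aligned FASTA?")
    # The README requires the reference genome to be the first sequence.
    return records[0][0]
```

An open file handle works as the iterable of lines, so check_alignment(parse_fasta(fh)) on your alignment returns the reference ID, or raises ValueError if the sequences are not all the same length.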

The METADATA table (TSV) must contain the following columns:

  • sequence_id: unique identifier matching the FASTA alignment and NEWICK tree
  • sequencing_lab: group/institution of the sequencing laboratory (e.g., KhouriLab, FIOCRUZ-BA)
  • sequencing_lab_country: country of the sequencing laboratory (e.g., Brazil)
  • sequencing_lib_prep: primer scheme identifier for amplicon data (e.g., Khouri_et_al_2026); for non-amplicon data, use Shotgun or Hybrid Capture
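A quick way to catch a malformed metadata table is to check its header against the four required columns and confirm that sequence_id values are unique. The sketch below uses Python's standard csv module; load_metadata and ids_without_metadata are hypothetical helper names, not LSBFILT functions, and whether the reference sequence itself needs a metadata row is not stated here, so the second helper simply reports unmatched IDs for you to review.

```python
import csv

# Required column names, as listed in this README.
REQUIRED_COLUMNS = ("sequence_id", "sequencing_lab", "sequencing_lab_country", "sequencing_lib_prep")


def load_metadata(lines):
    """Parse a TSV metadata table (any iterable of lines) into {sequence_id: row}."""
    reader = csv.DictReader(lines, delimiter="\t")
    header = reader.fieldnames or []  # reading fieldnames consumes the header line
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    rows = {}
    for row in reader:
        sid = row["sequence_id"]
        if sid in rows:
            raise ValueError(f"duplicate sequence_id: {sid}")
        rows[sid] = row
    return rows


def ids_without_metadata(metadata, alignment_ids):
    """Return alignment IDs that have no metadata row (review these by hand)."""
    return sorted(set(alignment_ids) - set(metadata))
```

When reading from disk, open the file with newline="" as the csv documentation recommends and pass the handle to load_metadata.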

The BED files must be named according to the sequencing_lib_prep identifiers (e.g., Khouri_et_al_2026.bed for the entry Khouri_et_al_2026).
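The naming rule means every amplicon identifier in sequencing_lib_prep should have a file called <identifier>.bed in the primer directory. The sketch below checks that, assuming (our assumption) that the exact labels Shotgun and Hybrid Capture mark non-amplicon libraries that need no BED file; the function names are hypothetical.

```python
from pathlib import Path

# Assumption: these exact labels mark non-amplicon libraries that need no BED file.
NON_AMPLICON = {"Shotgun", "Hybrid Capture"}


def missing_schemes(lib_preps, bed_filenames):
    """Return primer scheme IDs that have no matching '<ID>.bed' file name."""
    available = {name.removesuffix(".bed") for name in bed_filenames if name.endswith(".bed")}
    needed = {lp for lp in lib_preps if lp and lp not in NON_AMPLICON}
    return sorted(needed - available)


def missing_bed_files(lib_preps, primers_dir):
    """Run the same check against the files present in the primer scheme directory."""
    return missing_schemes(lib_preps, (p.name for p in Path(primers_dir).iterdir()))
```

Pass the values of the sequencing_lib_prep column and the directory you intend to give to -primers; an empty list means every amplicon scheme has a matching BED file. The match is case-sensitive, as file names are on most Linux file systems. str.removesuffix requires Python 3.9 or later.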

USAGE

usage: LSBFILT.py [-h] -fasta FASTA -tree TREE -metadata METADATA -primers PRIMERS -outdir OUTDIR
                  [-minParsimony MINPARSIMONY] [-minLabAssociation MINLABASSOCIATION] [-minLdR2 MINLDR2]
                  [-minPSAC MINPSAC]

Lab-Specific Bias FILTer (LSBFILT)

options:
  -h, --help            show this help message and exit
  -fasta FASTA          path to alignment FASTA file
  -tree TREE            path to NEWICK TREE file
  -metadata METADATA    path to METADATA TSV file
  -primers PRIMERS      path to primer schemes BED files
  -outdir OUTDIR        path to output directory
  -minParsimony MINPARSIMONY
                        minimum parsimony for lab associations (default = 4)
  -minLabAssociation MINLABASSOCIATION
                        minimum fraction of allele calls from single source for lab associations (default = 0.6 =
                        60%)
  -minLdR2 MINLDR2      minimum R2 value to report linkage disequilibrium (default = 0.4)
  -minPSAC MINPSAC      minimum PS:AC ratio for masking alignment FASTA file (default = 0.5)
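The four thresholds act on per-site statistics. The sketch below illustrates one way to compute them for a single site; it is a simplified illustration, not LSBFILT's code. Its assumptions are ours: biallelic sites; a rooted, strictly bifurcating tree written as nested tuples of sample IDs; alleles given as a {sample_id: base} dict; PS read as the Fitch parsimony score and AC as the alternate allele count; "allele calls from a single source" read as the share contributed by the most frequent sequencing lab; and every cutoff applied inclusively (>=). All function names are hypothetical.

```python
from collections import Counter


def fitch_parsimony(tree, alleles):
    """Fitch small-parsimony score: minimum number of allele changes on a binary tree."""
    def visit(node):
        if isinstance(node, str):  # leaf: the sample's observed allele
            return {alleles[node]}, 0
        (left, lcost), (right, rcost) = visit(node[0]), visit(node[1])
        common = left & right
        if common:
            return common, lcost + rcost
        return left | right, lcost + rcost + 1  # disjoint sets force one change
    return visit(tree)[1]


def ps_ac_ratio(tree, alleles, alt):
    """Parsimony score divided by the number of samples carrying the alternate allele."""
    ac = sum(1 for base in alleles.values() if base == alt)
    return fitch_parsimony(tree, alleles) / ac if ac else 0.0


def top_lab_fraction(alleles, alt, lab_of):
    """Fraction of alternate-allele calls contributed by the single most frequent lab."""
    labs = Counter(lab_of[s] for s, base in alleles.items() if base == alt)
    total = sum(labs.values())
    return max(labs.values()) / total if total else 0.0


def lab_associated(tree, alleles, alt, lab_of, min_parsimony=4, min_lab_association=0.6):
    """Assumed rule: a homoplasic site whose alternate allele is concentrated in one lab."""
    return (fitch_parsimony(tree, alleles) >= min_parsimony
            and top_lab_fraction(alleles, alt, lab_of) >= min_lab_association)


def ld_r2(site_a, site_b, alt_a, alt_b):
    """Linkage disequilibrium r^2 = D^2 / (pA(1-pA) pB(1-pB)) over shared samples."""
    samples = [s for s in site_a if s in site_b]
    if not samples:
        return 0.0
    n = len(samples)
    x = [site_a[s] == alt_a for s in samples]
    y = [site_b[s] == alt_b for s in samples]
    p_a, p_b = sum(x) / n, sum(y) / n
    p_ab = sum(a and b for a, b in zip(x, y)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return (p_ab - p_a * p_b) ** 2 / denom if denom else 0.0
```

As a worked example, take the tree (((a,b),(c,d)),((e,f),(g,h))) where a, c, e and g carry T and their sisters carry C. Each sister pair differs, so T must arise at least four times: PS = 4 and AC = 4, giving PS:AC = 1.0, above the default -minPSAC of 0.5. If three of the four T calls come from one lab, the lab fraction is 0.75, which together with PS = 4 meets the default -minParsimony and -minLabAssociation cutoffs. Real NEWICK trees with polytomies would need to be resolved first, or scored with a parsimony variant that handles multifurcations.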

CITATION