CSi-Ti/Trend-Aligner

Trend-Aligner: a retention time modeling-based feature alignment method for untargeted LC-MS data analysis

Highlights

  • Novelty: Trend-Aligner decomposes the RT shift into a global component and a local component. The global RT shift represents systematic changes in chromatographic conditions across samples, while the local RT shift represents the differing responses of compounds with different physicochemical properties to those changes. The global RT shift is minimized through locally weighted scatterplot smoothing (LOWESS) regression, and the local RT shift is modeled with a latent factor model (LFM).
  • Accuracy: To comprehensively evaluate performance, we developed a reference-based accuracy benchmarking (RAB) strategy and manually annotated four metabolomic and five proteomic datasets as reference sets, comprising 3,663 consensus features and 57,652 feature peaks. Compared to 11 widely used alignment algorithms, Trend-Aligner consistently demonstrated the highest sensitivity across datasets.
  • Reliability: To assess real-world applicability, we further conducted application-oriented utility validation (AUV) by evaluating both the quantity and quality of aligned features when integrated with the match-between-runs function. Trend-Aligner achieved the best performance, exhibiting high sensitivity and specificity simultaneously, and identified 82.5% more peptides than MaxQuant.
  • Efficiency: Trend-Aligner requires iterative optimization for LOWESS bandwidth self-adaptation, and its LFM involves complex parameter computation via beam search and Gaussian kernel density estimation. Even so, its processing speed ranked in the top tier among currently popular algorithms, with an average runtime of around 30 seconds.
  • Open source: We open-sourced Trend-Aligner under a permissive license to promote the accuracy of MS data analysis more broadly.
  • Dataset: We manually annotated four metabolomic and five proteomic datasets as reference sets, comprising 3,663 consensus features and 57,652 feature peaks with associated m/z, RT, and peak area information. These comprehensively annotated datasets serve as valuable benchmarks for evaluating feature detection, quantification, and alignment accuracy.

Trend-Aligner workflow

[Image: Trend-Aligner workflow diagram]

Datasets

The data distributions of the evaluation datasets, feature lists, alignment files, manually annotated reference datasets, RAB results, AUV results, and parameter settings are available on Zenodo.

Setup

  1. Prepare the Python environment based on your system and hardware.

  2. Install the dependencies. Here we use ROOT_PATH to represent the root path of Trend-Aligner.

    cd ROOT_PATH

    pip install -r requirements.txt

Run Trend-Aligner

Supported formats

Trend-Aligner takes feature lists containing m/z and RT information as its primary input. It accepts both user-specified delimited formats (CSV/TXT) and the native output formats of widely used LC-MS data analysis platforms, including OpenMS, MZmine2, Dinosaur, XCMS, MaxQuant, and AntDAS-DDA, so it fits into existing data processing pipelines.

FeatureListReadingParams

feature_list_folder_path: Path to directory containing feature list files (CSV/TXT or platform-native formats) [required]

skip_line:      Number of header lines to skip when parsing files (0=no header) [default=0]

mz_col_num:     Column index (1-based) containing m/z values [required, typically=1]

rt_col_num:     Column index (1-based) containing RT values [required, typically=2]

area_col_num:   Column index (1-based) containing peak area/intensity values [required, typically=3]
                • Area values are only used for downstream analysis, not for feature alignment
                • The core algorithm requires only m/z and RT information for alignment
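To illustrate how `skip_line` and the 1-based column parameters interact, here is a minimal sketch of a delimited feature list reader. The function `read_feature_list` and its `delimiter` argument are hypothetical names chosen for this example; this is not Trend-Aligner's own reader, only a reading of the parameter descriptions above.

```python
import csv
import io


def read_feature_list(handle, skip_line=0, mz_col_num=1, rt_col_num=2,
                      area_col_num=3, delimiter=","):
    """Parse (m/z, RT, area) tuples from a delimited feature list.

    Hypothetical helper for illustration only. Column numbers are 1-based,
    mirroring the parameter descriptions above.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    features = []
    for line_idx, row in enumerate(reader):
        if line_idx < skip_line or not row:
            continue  # skip header lines and blank rows
        mz = float(row[mz_col_num - 1])      # convert 1-based to 0-based index
        rt = float(row[rt_col_num - 1])
        area = float(row[area_col_num - 1])
        features.append((mz, rt, area))
    return features


# Example: one header line, comma-delimited, default column layout.
sample = io.StringIO("mz,rt,area\n301.1410,5.32,120000\n415.2101,7.85,98000\n")
features = read_feature_list(sample, skip_line=1)
print(features)
```

With a header row present, `skip_line=1` keeps the header out of the numeric parsing; with `skip_line=0` the same file would fail on the first row.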

CoarseAlignmentParams

mz_tolerance:   The m/z tolerance (Da/ppm) in anchor-based pairwise matching [default=0.01]

use_ppm:        Use ppm instead of Da for m/z tolerances [default=False]

centric_idx:    Index of the reference sample used as alignment anchor (0=first sample by ASCII order) [default=0]

rt_tolerance:   The RT tolerance (in minutes) in anchor-based pairwise matching [default=3]

frac:           LOWESS smoothing parameter: float value (0<frac<=1) for manual bandwidth or 'tPRESS' for automatic adaptation [default='tPRESS']
                • The larger frac, the smoother the fitted LOWESS regression curve.
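The role of `frac` can be seen in a minimal LOWESS sketch: each local fit uses the nearest `frac * n` anchor points, so a larger `frac` averages over more points and gives a smoother curve. The code below is a simplified, pure-Python local linear fit with tricube weights and no robustness iterations. It is an illustration under those assumptions, not Trend-Aligner's implementation, and it does not reproduce the 'tPRESS' automatic bandwidth adaptation. The function name `lowess_fit` and the synthetic anchor data are hypothetical.

```python
def lowess_fit(x, y, frac=0.5):
    """Tricube-weighted local linear fit evaluated at every x (simplified LOWESS)."""
    n = len(x)
    k = max(2, min(n, int(round(frac * n))))  # number of neighbours per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Synthetic anchor pairs (hypothetical): RT in a sample run and its deviation
# from the reference run, both in minutes.
sample_rt = [float(i) for i in range(1, 11)]
rt_dev = [0.05 * rt + 0.1 for rt in sample_rt]  # a purely linear drift
trend = lowess_fit(sample_rt, rt_dev, frac=0.5)

# Subtracting the fitted trend removes the global shift from the sample RTs.
corrected_rt = [rt - d for rt, d in zip(sample_rt, trend)]
print(corrected_rt)
```

For real data, a library routine such as `lowess` in `statsmodels.nonparametric.smoothers_lowess`, which also takes a `frac` argument, is likely a better choice than this hand-written version.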

FineMatchingParams

beam_mz_tol:    The m/z tolerance (Da/ppm) for adjacent-run feature matching during sample shift coefficient estimation [required, typical range: 0.005-0.03 Da (or 5-20 ppm)]
                • Relatively narrow tolerances within reasonable ranges may lead to an increased proportion of reliable matching groups

beam_rt_tol:    The RT tolerance (in minutes) for adjacent-run feature matching during sample shift coefficient estimation [required]
                • Should be set according to the RT deviations after coarse alignment
                • Relatively narrow tolerances within reasonable ranges may lead to an increased proportion of reliable matching groups
                • Can be estimated via the RT deviation pattern in LOWESS fitting curve plots

match_mz_tol:   The m/z tolerance (Da/ppm) for cross-run feature matching during analyte easy-to-shift coefficient estimation [required, typical range: 0.005-0.03 Da (or 5-20 ppm)]
                • Relatively wider tolerances within reasonable ranges may help prevent missed matches
                • Typically equals or moderately exceeds beam_mz_tol

match_rt_tol:   The RT tolerance (in minutes) for cross-run feature matching during analyte easy-to-shift coefficient estimation [required]
                • Should be set according to the RT deviations after coarse alignment
                • Relatively wider tolerances within reasonable ranges may help prevent missed matches
                • Typically equals or moderately exceeds beam_rt_tol
                • Can be estimated via the RT deviation pattern in LOWESS fitting curve plots

max_rt_tol:     The maximum RT deviation (in minutes) [required]
                • This parameter usually demonstrates good robustness
                • Intentionally larger than match_rt_tol to accommodate RT drift variability
                • Can be estimated via the RT deviation pattern in LOWESS fitting curve plots

use_ppm:        Use ppm instead of Da for m/z tolerances [default=False]
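The sketch below shows how a Da versus ppm tolerance check might work, along with a sanity check for the ordering the descriptions above suggest (`beam_*` no wider than `match_*`, and `max_rt_tol` larger than `match_rt_tol`). Both function names are hypothetical and the RT values in the example are made up; Trend-Aligner's internal checks may differ.

```python
def mz_within_tolerance(mz_a, mz_b, tol, use_ppm=False):
    """Return True if two m/z values agree within tol (Da, or ppm if use_ppm)."""
    delta = abs(mz_a - mz_b)
    if use_ppm:
        # ppm is relative: tol parts per million of the first m/z value.
        return delta <= tol * 1e-6 * mz_a
    return delta <= tol


def check_fine_matching_params(beam_mz_tol, beam_rt_tol, match_mz_tol,
                               match_rt_tol, max_rt_tol):
    """Collect warnings when tolerances break the ordering suggested above."""
    warnings = []
    if match_mz_tol < beam_mz_tol:
        warnings.append("match_mz_tol is narrower than beam_mz_tol")
    if match_rt_tol < beam_rt_tol:
        warnings.append("match_rt_tol is narrower than beam_rt_tol")
    if max_rt_tol <= match_rt_tol:
        warnings.append("max_rt_tol should exceed match_rt_tol")
    return warnings


# At m/z 500, 5 ppm corresponds to 0.0025 Da, so a 0.004 Da gap fails the
# ppm check but passes a 0.01 Da absolute tolerance.
print(mz_within_tolerance(500.000, 500.004, 0.01))             # True
print(mz_within_tolerance(500.000, 500.004, 5, use_ppm=True))  # False

# Hypothetical settings (RT values in minutes are illustrative only).
print(check_fine_matching_params(0.01, 0.1, 0.015, 0.2, 0.5))  # []
```

Because a ppm tolerance scales with m/z, the same ppm value is stricter for small ions than for large ones, which is worth keeping in mind when switching `use_ppm` on.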

Demos

The package provides the following demonstration datasets and alignment examples:

  1. MTBLS733 (QE-HF) Dataset

    Feature extraction platform: MetaPro

  2. EC-H (OpenMS) Dataset

    Feature extraction platform: OpenMS

Trend-Aligner-master
├── demo
│   ├── metapro_example
│   ├── metapro_result
│   ├── openms_example
│   ├── openms_example_converted
│   ├── openms_result
│   ├── align_demo.py

  • To run the demo:

cd ROOT_PATH

python demo/align_demo.py

Feature alignment results are saved in the metapro_result and openms_result folders.

Citation

Cite our paper at:

License

Trend-Aligner is an open-source tool released under the Mulan Permissive Software License, Version 2 (Mulan PSL v2).

Contacts

For any questions involving Trend-Aligner, please contact us by email.

Ruimin Wang, ruimin.wang@yale.edu

Shouyang Ren, ren_shouyang@163.com

Changbin Yu, yu_lab@sdfmu.edu.cn
