Trend-Aligner: a retention time modeling-based feature alignment method for untargeted LC-MS data analysis
- Novelty: Trend-Aligner decomposes the RT shift into a global RT shift and a local RT shift. The global RT shift represents the systematic chromatographic condition changes across samples, while the local RT shift represents the differential response of compounds with different physicochemical properties to chromatographic condition variations. The global RT shift is minimized through locally weighted scatterplot smoothing regression (LOWESS), and the local RT shift is modeled with a latent factor model (LFM).
- Accuracy: To comprehensively evaluate performance, we developed a reference-based accuracy benchmarking (RAB) strategy and manually annotated four metabolomic and five proteomic datasets as reference sets, comprising 3,663 consensus features and 57,652 feature peaks. Compared to 11 widely used alignment algorithms, Trend-Aligner consistently demonstrated the highest sensitivity across datasets.
- Reliability: To assess real-world applicability, we further conducted application-oriented utility validation (AUV) by evaluating both the quantity and quality of aligned features when integrated with the match-between-runs function. Trend-Aligner achieved the best performance, exhibiting high sensitivity and specificity simultaneously, and identified 82.5% more peptides than MaxQuant.
- Efficiency: Although Trend-Aligner requires iterative optimization for LOWESS bandwidth self-adaptation and involves complex parameter computation in LFM via beam search and Gaussian kernel density estimation, Trend-Aligner's processing speed still ranked among the top tier of all currently popular algorithms. The average runtime of Trend-Aligner was around 30 seconds.
- Open source: We open-sourced Trend-Aligner under a permissive license to promote the accuracy of MS data analysis more broadly.
- Dataset: We manually annotated four metabolomic and five proteomic datasets as reference sets, comprising 3,663 consensus features and 57,652 feature peaks with associated m/z, RT, and peak area information. These comprehensively annotated datasets serve as valuable benchmarks for evaluating feature detection, quantification, and alignment accuracy.
The data distributions of the evaluation datasets, along with the feature lists, alignment files, manually annotated reference datasets, RAB results, AUV results, and parameter settings, are available on Zenodo.
- Prepare the Python environment based on your system and hardware.
- Install the dependencies. Here we use ROOT_PATH to represent the root path of Trend-Aligner.
cd ROOT_PATH
pip install -r requirements.txt
Trend-Aligner processes feature lists comprising m/z and RT information as its primary input. The algorithm accommodates both user-specified delimited formats (CSV/TXT) and native output formats from widely-used LC-MS data analysis platforms, including OpenMS, MZmine2, Dinosaur, XCMS, MaxQuant, and AntDAS-DDA, ensuring broad compatibility with existing data processing pipelines.
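As an illustration of the user-specified delimited format, the sketch below reads m/z, RT, and area values from a CSV feature list. It is not Trend-Aligner's own loader: the function name `read_feature_list` and the sample values are hypothetical, and only the header-skipping and 1-based column-index conventions are borrowed from the `skip_line`, `mz_col_num`, `rt_col_num`, and `area_col_num` parameters described in this README.

```python
import csv
import io


def read_feature_list(handle, skip_line=0, mz_col_num=1, rt_col_num=2,
                      area_col_num=3, delimiter=","):
    """Parse (m/z, RT, area) tuples from a delimited feature list.

    Hypothetical helper; column numbers are 1-based, as in this README.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    features = []
    for line_no, row in enumerate(reader):
        if line_no < skip_line or not row:
            continue  # skip header lines and blank rows
        features.append((
            float(row[mz_col_num - 1]),
            float(row[rt_col_num - 1]),
            float(row[area_col_num - 1]),
        ))
    return features


# Made-up two-feature file with one header line.
example = io.StringIO(
    "mz,rt,area\n301.1410,5.32,18250.0\n455.2901,12.80,9640.5\n"
)
features = read_feature_list(example, skip_line=1)
print(features)
```

For a tab-separated TXT file, the same helper could be called with `delimiter="\t"`.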
feature_list_folder_path: Path to directory containing feature list files (CSV/TXT or platform-native formats) [required]
skip_line: Number of header lines to skip when parsing files (0=no header) [default=0]
mz_col_num: Column index (1-based) containing m/z values [required, typically=1]
rt_col_num: Column index (1-based) containing RT values [required, typically=2]
area_col_num: Column index (1-based) containing peak area/intensity values [required, typically=3]
• Area values are only used for downstream analysis, not for feature alignment
• The core algorithm requires only m/z and RT information for alignment
mz_tolerance: The m/z tolerance (Da/ppm) in anchor-based pairwise matching [default=0.01]
use_ppm: Use ppm instead of Da for m/z tolerances [default=False]
centric_idx: Index of the reference sample used as alignment anchor (0=first sample by ASCII order) [default=0]
rt_tolerance: The RT tolerance (in minutes) in anchor-based pairwise matching [default=3]
frac: LOWESS smoothing parameter: float value (0<frac<=1) for manual bandwidth or 'tPRESS' for automatic adaptation [default='tPRESS']
• The larger frac, the smoother the fitted LOWESS regression curve.
beam_mz_tol: The m/z tolerance (Da/ppm) for adjacent-run feature matching during sample shift coefficient estimation [required, typical range: 0.005-0.03 Da (or 5-20 ppm)]
• Relatively narrow tolerances within reasonable ranges may lead to an increased proportion of reliable matching groups
beam_rt_tol: The RT tolerance (in minutes) for adjacent-run feature matching during sample shift coefficient estimation [required]
• Should be set according to the RT deviations after coarse alignment
• Relatively narrow tolerances within reasonable ranges may lead to an increased proportion of reliable matching groups
• Can be estimated via the RT deviation pattern in LOWESS fitting curve plots
match_mz_tol: The m/z tolerance (Da/ppm) for cross-run feature matching during analyte easy-to-shift coefficient estimation [required, typical range: 0.005-0.03 Da (or 5-20 ppm)]
• Relatively wider tolerances within reasonable ranges may help prevent missed matches
• Typically equals or moderately exceeds beam_mz_tol
match_rt_tol: The RT tolerance (in minutes) for cross-run feature matching during analyte easy-to-shift coefficient estimation [required]
• Should be set according to the RT deviations after coarse alignment
• Relatively wider tolerances within reasonable ranges may help prevent missed matches
• Typically equals or moderately exceeds beam_rt_tol
• Can be estimated via the RT deviation pattern in LOWESS fitting curve plots
max_rt_tol: The maximum RT deviation (in minutes) [required]
• This parameter usually demonstrates good robustness
• Intentionally larger than match_rt_tol to accommodate RT drift variability
• Can be estimated via the RT deviation pattern in LOWESS fitting curve plots
use_ppm: Use ppm instead of Da for m/z tolerances [default=False]
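To make the coarse-alignment parameters more concrete, the sketch below shows how `mz_tolerance`, `use_ppm`, `rt_tolerance`, and `frac` could interact: features are paired with a reference sample within the tolerances, and a LOWESS curve is fitted to the resulting RT deviations. This is an illustration under stated assumptions, not Trend-Aligner's implementation. The function names are hypothetical, the LOWESS is a minimal non-robust version with a fixed `frac` (the 'tPRESS' automatic bandwidth is not reproduced), and regressing the deviation against the sample RT is an assumption.

```python
def mz_within_tolerance(mz_ref, mz_obs, mz_tolerance=0.01, use_ppm=False):
    """True if two m/z values agree within the tolerance.

    With use_ppm=True the tolerance is read as ppm of mz_ref,
    otherwise as an absolute difference in Da.
    """
    limit = mz_ref * mz_tolerance * 1e-6 if use_ppm else mz_tolerance
    return abs(mz_obs - mz_ref) <= limit


def match_anchors(reference, sample, mz_tolerance=0.01, rt_tolerance=3.0,
                  use_ppm=False):
    """Pair each reference (mz, rt) feature with its closest-RT candidate.

    Returns (rt_sample, rt_sample - rt_reference) pairs.
    """
    anchors = []
    for mz_ref, rt_ref in reference:
        candidates = [
            rt_obs for mz_obs, rt_obs in sample
            if mz_within_tolerance(mz_ref, mz_obs, mz_tolerance, use_ppm)
            and abs(rt_obs - rt_ref) <= rt_tolerance
        ]
        if candidates:
            rt_best = min(candidates, key=lambda rt: abs(rt - rt_ref))
            anchors.append((rt_best, rt_best - rt_ref))
    return anchors


def lowess_fit(xs, ys, frac=0.5):
    """Minimal non-robust LOWESS: tricube-weighted local linear regression."""
    n = len(xs)
    k = min(n, max(3, int(round(frac * n))))  # points per local window
    fitted = []
    for x0 in xs:
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12  # window width
        w = [max(0.0, 1 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Made-up data: the sample drifts linearly relative to the reference.
reference = [(100.0 + 50.0 * i, 1.0 + 2.0 * i) for i in range(8)]
sample = [(mz + 0.002, 1.02 * rt + 0.1) for mz, rt in reference]

anchors = match_anchors(reference, sample)
rts = [rt for rt, _ in anchors]
fitted = lowess_fit(rts, [dev for _, dev in anchors], frac=0.5)
corrected = [rt - dev for rt, dev in zip(rts, fitted)]  # coarse-aligned RTs
```

A larger `frac` widens each local window, so every fitted point averages over more anchors and the curve becomes smoother. The residual deviations left after this step are what the narrower `beam_rt_tol` and `match_rt_tol` settings are meant to cover.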
The package provides the following demonstration datasets and alignment examples:
- MTBLS733 (QE-HF) Dataset
  Feature extraction platform: MetaPro
- EC-H (OpenMS) Dataset
  Feature extraction platform: OpenMS
Trend-Aligner-master
├── demo
│ ├── metapro_example
│ ├── metapro_result
│ ├── openms_example
│ ├── openms_example_converted
│ ├── openms_result
│ ├── align_demo.py
- To run the demo:
cd ROOT_PATH
python demo/align_demo.py
Feature alignment results are saved in the metapro_result and openms_result folders.
Cite our paper at:
Trend-Aligner is an open-source tool released under the Mulan Permissive Software License, Version 2 (Mulan PSL v2).
For any questions involving Trend-Aligner, please contact us by email.
Ruimin Wang, ruimin.wang@yale.edu
Shouyang Ren, ren_shouyang@163.com
Changbin Yu, yu_lab@sdfmu.edu.cn
