CityUHK-CompBio/CMSPlus

 
 

CMSPlus

An R package for molecular subtyping of colorectal cancer that integrates tumor microenvironment heterogeneity

Part I: Installation

install.packages("devtools")
devtools::install_github("yswutan/CMSPlus")
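The packages loaded in Part II must be available before CMSPlus can run. The sketch below only checks which of them are missing; it does not install anything. The install sources listed in the comments are assumptions based on where these packages are usually distributed, not something this README states, and the installation step above may already pull some of them in.

```r
# Packages loaded in the "Running" section
required <- c("GSVA", "CMSclassifier", "ComplexHeatmap", "circlize",
              "randomForest", "ranger", "parallel")

# TRUE for each package that can be loaded from the current library paths
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages needed to run CMSPlus are available.")
}

# Usual sources (assumptions; uncomment what you need):
# install.packages(c("circlize", "randomForest", "ranger"))     # CRAN
# install.packages("BiocManager")
# BiocManager::install(c("GSVA", "ComplexHeatmap"))             # Bioconductor
# devtools::install_github("Sage-Bionetworks/CMSclassifier")    # GitHub
# "parallel" ships with base R and needs no installation.
```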

Part II: Running

# Load CMSPlus and the packages it relies on
library(CMSPlus)
library(GSVA)
library(CMSclassifier)
library(ComplexHeatmap)
library(circlize)
library(randomForest)
library(ranger)
library(parallel)

res <- CMSPlus(exp2symbol=test.profile)
names(res)
# [1] "CMSPlusLabel" "CMSLabel" "gsva_matrix" "HeatmapPlot"
  • CMSPlus Parameters

    • exp2symbol: A data frame of gene expression values with genes in rows and samples in columns. Row names must be gene symbols.
    • prob: A numeric value between 0 and 1 giving the minimum posterior probability required to assign a predicted subtype. Samples whose maximum subtype probability falls below this threshold are classified as "unassigned". Default is 0.5.
    • plot: TRUE produces plots; FALSE suppresses plotting. Default is TRUE.
    • CMSlabel: A character string specifying the label used in plots to denote CMS subtypes. Default is "RF.nearestCMS".
    • CMSPluslabel: A character string specifying the label used in plots to denote CMSPlus subtypes. Default is "nearest".
    • parallel.sz: Integer specifying the number of parallel workers used for GSVA computation. Default is 1.
    • InterGroupRandomize: Logical value indicating whether to randomize the column (sample) order within each subtype in plots. Default is TRUE.
    • seed: Integer specifying the random seed, used to make the randomization step reproducible.
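As an illustration of the input shape described for exp2symbol, the sketch below builds a toy data frame and validates it. The helper `check_exp2symbol`, the gene list, and the values are hypothetical and not part of CMSPlus. The final commented call only spells out documented arguments; `seed = 1` is an arbitrary choice.

```r
# Toy expression table: 4 genes (rows) x 3 samples (columns).
# Gene symbols and values are illustrative only.
toy_expr <- data.frame(
  S1 = c(5.1, 7.3, 2.0, 9.4),
  S2 = c(4.8, 6.9, 2.5, 8.7),
  S3 = c(5.5, 7.0, 1.8, 9.9),
  row.names = c("TP53", "KRAS", "APC", "BRAF")
)

# Hypothetical helper (not part of CMSPlus): check the documented input shape
check_exp2symbol <- function(x) {
  stopifnot(
    is.data.frame(x),
    all(vapply(x, is.numeric, logical(1))),  # numeric expression values only
    !anyDuplicated(rownames(x)),             # one row per gene symbol
    !any(grepl("^[0-9]+$", rownames(x)))     # purely numeric row names suggest missing symbols
  )
  invisible(TRUE)
}

check_exp2symbol(toy_expr)

# With a real profile, a call using the documented arguments could look like this.
# Not run here: it needs CMSPlus installed and a genome-wide expression profile.
# res <- CMSPlus(exp2symbol = test.profile, prob = 0.5, plot = FALSE,
#                parallel.sz = 1, seed = 1)
```

Validating the input up front gives a clearer error than a failure deep inside the scoring or classification steps.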

Part III: Output

  • res$CMSPlusLabel

    • Content: A sample-by-subtype probability matrix generated by the CMSPlus model, giving the predicted posterior probability that each colorectal cancer sample belongs to each of the five CMSPlus subtypes, together with the nearest subtype and the final predicted label.

    | Column | Description |
    | --- | --- |
    | CMS1 | Predicted probability that the sample belongs to the CMS1 subtype, as estimated by the CMSPlus classification model. |
    | CMS2 | Predicted probability that the sample belongs to the CMS2 subtype. |
    | CMS3 | Predicted probability that the sample belongs to the CMS3 subtype. |
    | CMS4-TME- | Predicted probability that the sample belongs to the CMS4 subtype with low tumor microenvironment (TME) infiltration. |
    | CMS4-TME+ | Predicted probability that the sample belongs to the CMS4 subtype with high tumor microenvironment (TME) infiltration. |
    | nearest | Subtype with the maximum posterior probability for the sample, irrespective of any probability threshold. |
    | predict | Final label. A subtype is assigned only if its probability is the maximum among all subtypes and is greater than or equal to the user-defined threshold (prob). Samples not meeting this criterion are labeled as "mix". |

  • res$CMSLabel

    • Content: Subtype probabilities and CMS subtype assignments generated by the CMSclassifier package using a Random Forest (RF)-based classification model.

    | Column | Description |
    | --- | --- |
    | RF.CMS1.posteriorProb | Posterior probability that the sample belongs to CMS1, estimated by the Random Forest classifier. |
    | RF.CMS2.posteriorProb | Posterior probability that the sample belongs to CMS2, estimated by the Random Forest classifier. |
    | RF.CMS3.posteriorProb | Posterior probability that the sample belongs to CMS3, estimated by the Random Forest classifier. |
    | RF.CMS4.posteriorProb | Posterior probability that the sample belongs to CMS4, estimated by the Random Forest classifier. |
    | RF.nearestCMS | CMS subtype with the highest posterior probability for the sample, regardless of confidence threshold. |
    | RF.predictedCMS | Final label. A subtype is assigned only if its posterior probability is the maximum among all subtypes and is greater than 0.5. |
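The relationship between the probability columns, nearest, and predict can be sketched in base R. This is a minimal illustration of the rule as described above, not the package's own code; `assign_subtype` is a hypothetical helper and the probabilities are made up. Because this README uses both "unassigned" and "mix" for samples below the threshold, the fallback label is left as a parameter.

```r
# Toy posterior probabilities for three samples (values are illustrative only)
probs <- rbind(
  S1 = c(0.70, 0.10, 0.10, 0.05, 0.05),
  S2 = c(0.30, 0.25, 0.20, 0.15, 0.10),
  S3 = c(0.05, 0.05, 0.10, 0.30, 0.50)
)
colnames(probs) <- c("CMS1", "CMS2", "CMS3", "CMS4-TME-", "CMS4-TME+")

# Hypothetical helper: apply the nearest/predict rule described above
assign_subtype <- function(p, prob = 0.5, fallback = "unassigned") {
  best <- max.col(p, ties.method = "first")      # column index of each row maximum
  nearest <- colnames(p)[best]                   # label ignoring any threshold
  top_prob <- p[cbind(seq_len(nrow(p)), best)]   # the maximum probability itself
  predicted <- ifelse(top_prob >= prob, nearest, fallback)
  data.frame(nearest = nearest, predict = predicted,
             row.names = rownames(p), stringsAsFactors = FALSE)
}

assign_subtype(probs)
```

In this toy input, S2 has a nearest subtype but no confident call, so its predict value is the fallback label. For RF.predictedCMS the description requires the probability to be strictly greater than 0.5, so the comparison would be `>` rather than `>=`.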
  • res$gsva_matrix

    • Content: A matrix of single-sample gene set enrichment scores computed from gene expression profiles using GSVA (Gene Set Variation Analysis). The matrix represents pathway-level activity inferred from gene expression data across individual samples, based on a predefined set of 42 biological pathways.

    | Dimension | Description |
    | --- | --- |
    | Rows | 42 curated pathway gene sets used for subtype inference |
    | Columns | Individual samples |
    | Values | GSVA enrichment scores, reflecting the relative activity of each pathway within each sample |
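A pathway-by-sample score matrix of this shape can be summarized per subtype with base R. The sketch below uses a small random matrix as a stand-in for res$gsva_matrix; the pathway names, sample names, and subtype labels are hypothetical.

```r
set.seed(1)

# Toy stand-in for res$gsva_matrix: 3 pathways (rows) x 6 samples (columns)
gsva_toy <- matrix(runif(18, min = -1, max = 1), nrow = 3,
                   dimnames = list(paste0("Pathway", 1:3), paste0("S", 1:6)))

# Toy subtype labels, one per sample, in column order
subtype <- c("CMS1", "CMS1", "CMS2", "CMS2", "CMS4-TME+", "CMS4-TME+")

# Mean enrichment score of each pathway within each subtype
mean_by_subtype <- sapply(
  split(seq_len(ncol(gsva_toy)), subtype),
  function(idx) rowMeans(gsva_toy[, idx, drop = FALSE])
)

# Pathway with the highest mean score in each subtype
top_pathway <- rownames(mean_by_subtype)[apply(mean_by_subtype, 2, which.max)]
names(top_pathway) <- colnames(mean_by_subtype)

mean_by_subtype
top_pathway
```

With real output, the labels would come from one of the label columns in the result tables, matched to the column order of the score matrix.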
  • res$HeatmapPlot

    • Content: A pathway-level heatmap visualizing GSVA enrichment scores across samples, with integrated molecular subtype annotations.
    [Figure: Heatmap]
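The sketch below shows how a heatmap of this kind, with pathway scores in rows, samples in columns, and a subtype annotation on top, can be drawn and saved with ComplexHeatmap. It uses toy data and is not the plotting code used by CMSPlus. If res$HeatmapPlot is a ComplexHeatmap object (an assumption, suggested only by the package being loaded in Part II), the same `pdf()` / `draw()` / `dev.off()` pattern would likely save it to a file.

```r
# Runs only when ComplexHeatmap is installed
if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  set.seed(1)

  # Toy scores: 4 pathways (rows) x 6 samples (columns); names are hypothetical
  toy_scores <- matrix(rnorm(24), nrow = 4,
                       dimnames = list(paste0("Pathway", 1:4), paste0("S", 1:6)))
  toy_subtype <- rep(c("CMS1", "CMS2", "CMS4-TME+"), each = 2)

  ht <- ComplexHeatmap::Heatmap(
    toy_scores,
    name = "GSVA score",                                  # legend title
    top_annotation = ComplexHeatmap::HeatmapAnnotation(Subtype = toy_subtype),
    column_split = toy_subtype,                           # one column block per subtype
    cluster_columns = FALSE                               # keep the given sample order
  )

  # Write the heatmap to a PDF in the session's temporary directory
  out_file <- file.path(tempdir(), "toy_heatmap.pdf")
  pdf(out_file, width = 6, height = 4)
  ComplexHeatmap::draw(ht)
  dev.off()
}
```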
