Skip to content

areElkin/ClusterATAC

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ClusterATAC

ClusterATAC is a cancer subtype tool based on ATAC-seq profiles. The input for the framework is high-dimensional omics data (such as ATAC-peak data) of all the tumor samples. The output is the corresponding subclass label for each sample. ClusterATAC is mainly divided into two components: 1. GAN-based feature extraction module is used to obtain abstract features from deep learning using high-dimensional original input data. 2. A GMM-based clustering module for determining the number of clusters and the cluster labels corresponding to the input.

# the input raw data file is all.txt and runs the following command to finish all processes: 
python ClusterATAC.py -i ./all.txt  
# the Clustering output file are stored in ./results/all.clusteratac  

Specifically, for the feature extraction module:

python ClusterATAC.py -m feature -i ./all.txt  
# the low-dimensional features encoded by the neural network are stored in ./fea/all.clusteratac  

ClusterATAC's GMM clustering module is used as follows:

python ClusterATAC.py -m cluster -n 22 -i ./all.txt  
# record the corresponding class label for each sample and the output file is ./results/all.clusteratac 

ClusterATAC's performance comparison module (using the autoencoder method as an example) is used as follows:

python ClusterATAC.py -m ae -i ./all.txt
# record the corresponding class label for each sample and the output file is ./ TCGA_ATAC_peak_Log2Counts_dedup_sample.spectral

ClusterATAC is based on the Python program language. The generative adversarial network's implementation was based on the open-source library scikit-learn 0.21.3, Keras 2.2.4, and Tensorflow 1.14.0 (GPU version). After testing, this framework has been working correctly on Ubuntu Linux release 18.04. Due to the high dimensionality of the raw data, the size of the neural network is enormous. We used the NVIDIA TITAN XP (12G) for the model training. When the GPU's memory is not enough to support the running of the tool, we suggest simplifying the encoder's network structure.

About

a cancer classification tool based on ATAC-seq profiles

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 98.6%
  • Standard ML 1.4%