μPeptidome
Micropeptidome is a framework for identifying microproteins (<150 aa) from both proteomic and transcriptomic experiments. It includes several tools:
- getefear: converts a list of microproteins (.csv) into a .gtf file that can then be classified with ShortStop.
- ShortStop: classifies smORFs as SAMs or PRISMs using a pre-trained ML model (see ShortStop/README.md for detailed documentation).
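The kind of conversion getefear performs can be illustrated with a minimal Python sketch. The CSV columns (id, chrom, start, end, strand), the 1-based inclusive coordinates and the single-exon layout are assumptions made for illustration; the actual getefear.py input format and options may differ. Each microprotein becomes one transcript line and one CDS line, the two feature types the smORF GTF must contain.

```python
import csv
import io

# Hypothetical input columns; the real getefear.py input format may differ.
FIELDS = ("id", "chrom", "start", "end", "strand")


def gtf_attr(transcript_id: str) -> str:
    """Build the GTF attribute column (gene_id and transcript_id)."""
    return f'gene_id "{transcript_id}"; transcript_id "{transcript_id}";'


def csv_to_gtf(csv_text: str, source: str = "getefear_sketch") -> list[str]:
    """Convert CSV rows of single-exon microproteins into GTF lines.

    Each row yields a 'transcript' feature and a 'CDS' feature spanning the
    same coordinates (assumption: multi-exon smORFs are not handled here).
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        missing = [f for f in FIELDS if not row.get(f)]
        if missing:
            raise ValueError(f"row {row} is missing {missing}")
        start, end = int(row["start"]), int(row["end"])
        if start > end:
            raise ValueError(f"start > end for {row['id']}")
        common = [row["chrom"], source]
        attrs = gtf_attr(row["id"])
        # GTF columns: seqname, source, feature, start, end, score, strand, frame, attributes
        lines.append("\t".join(common + ["transcript", str(start), str(end), ".", row["strand"], ".", attrs]))
        lines.append("\t".join(common + ["CDS", str(start), str(end), ".", row["strand"], "0", attrs]))
    return lines
```

For a hypothetical microproteins.csv, "\n".join(csv_to_gtf(text)) gives the contents of a .gtf file. A real converter would also need to handle multi-exon structures and the stop-codon convention used by the downstream tool.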
You’ll need:
- A GTF file of smORFs containing both CDS and transcript features
- A matched reference genome (e.g., hg38, which is downloaded automatically when demo mode is started)
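Before running ShortStop, it can help to confirm that every smORF in the GTF carries both required feature types. The sketch below is an illustrative check and is not part of Micropeptidome or ShortStop. It assumes that features are grouped by the transcript_id attribute, as is conventional in GTF files.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in GTF column 9, e.g. transcript_id "smORF1";
TX_ID = re.compile(r'transcript_id "([^"]+)"')


def check_smorf_gtf(gtf_lines):
    """Return the sorted transcript IDs lacking a 'transcript' or a 'CDS' feature.

    Blank lines and comment lines (starting with '#') are skipped.
    """
    features = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        match = TX_ID.search(cols[8])
        if match is None:
            raise ValueError(f"no transcript_id in: {cols[8]}")
        features[match.group(1)].add(cols[2])
    required = {"transcript", "CDS"}
    return sorted(tid for tid, feats in features.items() if not required <= feats)
```

For example, opening a placeholder smorfs.gtf and passing the file handle to check_smorf_gtf lists the IDs that need fixing; an empty list means every smORF has both features.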
✅ We recommend creating a conda environment:
conda create -n micropeptidome python=3.9
conda activate micropeptidome

Install directly from GitHub:
pip install git+https://github.com/Sabiolab/Micropeptidome/ShortStop.git

or clone the repository and install it locally:
git clone https://github.com/Sabiolab/Micropeptidome/ShortStop.git
cd Micropeptidome
pip install .

Install a C compiler for your system:
- Ubuntu/Debian: sudo apt-get install build-essential
- Fedora/CentOS: sudo dnf install gcc
- Arch Linux: sudo pacman -S base-devel
- Windows: download and install Microsoft C++ Build Tools.
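A quick Python sketch can confirm that the active environment matches the setup steps above. It is not part of Micropeptidome. It assumes the compiler is exposed under one of the names gcc, cc, clang or cl (a guess at common toolchain executables; MSVC's cl is usually only on PATH inside a Developer Command Prompt), and it only warns when the Python version differs from the python=3.9 used in the conda command.

```python
import shutil
import sys

# Compiler executable names to look for; this list is an assumption covering
# GCC (build-essential / base-devel), Clang, and MSVC's "cl" on Windows.
CANDIDATE_COMPILERS = ("gcc", "cc", "clang", "cl")


def find_c_compiler(candidates=CANDIDATE_COMPILERS):
    """Return the path of the first candidate compiler found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def check_environment():
    """Collect human-readable warnings about the current environment."""
    problems = []
    major, minor = sys.version_info[:2]
    if (major, minor) != (3, 9):
        problems.append(f"Python {major}.{minor} found; the conda command above uses python=3.9")
    if find_c_compiler() is None:
        problems.append("no C compiler found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run it inside the activated micropeptidome environment; no output means both checks passed.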
Repository structure:

Micropeptidome/
│
├── README.md
│
├── scripts/
│   ├── fastear.py
│   ├── getefear.py
│   ├── cuentaSAM.py
│   ├── heterogenicitySAMs.py
│   ├── probabilidaes_SAMs.R
│   └── Venn_SAMs.R
│
├── RNAseq/
│   ├── README_RNAseq.md
│   ├── De_novo_transcripts.py
│   ├── filetr_smorf_pep.py
│   └── smorfs_transcript_to_genome_gtf.py
│
└── ShortStop/        --> Clone from 'brendan-miller-salk/ShortStop'
    ├── README.md
    └── src/shortstop/
        └── shortStop.py
This project is licensed for non-commercial academic research use only.
See LICENSE.md for full terms.
By contributing to this repository, you agree to the Contributor License Agreement (CLA).
By downloading or using this tool, you agree to the terms in LICENSE.md and CLA.md.