deena-b edited this page Jun 27, 2019 · 4 revisions
Aim: use machine learning to create a dendrogram that shows mitochondrial sequence similarity between cells
Input: one FASTA file per cell
Calculate a sequence similarity matrix
- how?
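One possible answer, sketched here rather than decided: if the per-cell sequences are already aligned to the same length, the fraction of matching positions (1 minus the normalised Hamming distance) gives a simple similarity score. The function names below (`read_fasta_sequence`, `hamming_similarity`, `similarity_matrix`) are made up for illustration, and the equal-length requirement is an assumption; unaligned sequences would need an alignment step first.

```python
from pathlib import Path


def read_fasta_sequence(path):
    """Return the first sequence in a FASTA file as one uppercase string."""
    parts = []
    seen_header = False
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # stop at a second record; one cell per file is assumed
            seen_header = True
        elif line:
            parts.append(line.upper())
    return "".join(parts)


def hamming_similarity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length (aligned) sequences")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def similarity_matrix(sequences):
    """Symmetric matrix (list of lists) of pairwise Hamming similarities."""
    n = len(sequences)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = hamming_similarity(sequences[i], sequences[j])
            matrix[i][j] = matrix[j][i] = s
    return matrix
```

In use, each cell's file would be read with `read_fasta_sequence` and the resulting list of strings passed to `similarity_matrix`; row and column `i` then correspond to the `i`-th cell.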
Perform hierarchical clustering
- scikit-learn AgglomerativeClustering with a SciPy dendrogram (or R hclust)
- principal component analysis?
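As a rough sketch of the clustering step in Python, SciPy's `linkage` and `dendrogram` can build the tree from a square similarity matrix. The helper name `cluster_cells`, the conversion "distance = 1 - similarity", and the choice of average linkage are all assumptions made for this example, not settled project decisions.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform


def cluster_cells(similarity, labels):
    """Cluster cells from a square similarity matrix with values in [0, 1].

    Returns the SciPy linkage matrix and the leaf labels in dendrogram order.
    """
    # Assumed conversion: similar cells should be close, so flip the scale.
    distance = 1.0 - np.asarray(similarity, dtype=float)
    np.fill_diagonal(distance, 0.0)

    # linkage() expects the condensed (upper-triangle) form of the matrix.
    condensed = squareform(distance)

    # Average linkage (UPGMA) is one common choice; others may suit better.
    tree = linkage(condensed, method="average")

    # no_plot=True computes the layout without needing matplotlib.
    layout = dendrogram(tree, labels=list(labels), no_plot=True)
    return tree, layout["ivl"]
```

Dropping `no_plot=True` (with matplotlib installed) draws the dendrogram itself; the returned leaf order is handy for reordering a heatmap to match the tree.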
Generate a cluster heatmap
- seaborn (See Serge's ipynb, link above)
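A minimal sketch of what the heatmap step could look like with seaborn's `clustermap`, independent of the notebook mentioned above. The helper `plot_cluster_heatmap` and its `out_path` parameter are hypothetical, and the linkage is precomputed from "1 - similarity" (an assumption) so that the tree reflects the similarity scores rather than seaborn's default row-wise Euclidean distance.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs without a display

import numpy as np
import pandas as pd
import seaborn as sns
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def plot_cluster_heatmap(similarity, labels, out_path=None):
    """Draw a clustered heatmap of a square similarity matrix."""
    frame = pd.DataFrame(similarity, index=labels, columns=labels, dtype=float)

    # Assumed conversion from similarity to distance for the tree.
    distance = 1.0 - frame.to_numpy()
    np.fill_diagonal(distance, 0.0)
    tree = linkage(squareform(distance), method="average")

    # Reuse one linkage for rows and columns so both axes share an order.
    grid = sns.clustermap(
        frame,
        row_linkage=tree,
        col_linkage=tree,
        cmap="viridis",
        vmin=0.0,
        vmax=1.0,
    )
    if out_path is not None:
        grid.savefig(out_path)
    return grid
```

Calling `plot_cluster_heatmap(matrix, cell_names, "heatmap.png")` would write the figure to disk; the file name here is only a placeholder.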
Google these terms or read these pages to find out more about methods we should consider:
- boosted decision tree
- Markov chain Monte Carlo (MCMC)
- Bayesian Evolutionary Analysis with BEAST (Book)
- Hamming distance
- Andrew Rambaut
- BEAST
- BEAST 2
Download Practice FASTA Files
- Flu genes are short and there are tons of them. For an example of how to download ~100 FASTA files that are each ~1,500 nucleotides (nt) long, see our tutorial "dwnld flu fa"