example parallel command usage for speed-up #6
I used the following scheme to process thousands of input proteins in a more reasonable time.
Maybe this can help others!
Before using many cores, please check that you have enough RAM: every parallel job starts its own Java process.
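One rough way to pick the job count from the available memory instead of guessing is sketched here. The `cap_jobs` helper and the 4 GiB per-job figure are my own assumptions, not something ECPred documents; measure one real job on your data and adjust. It reads `/proc/meminfo`, so it is Linux-only.

```shell
# Hypothetical estimate of peak memory per ECPred job, in GiB (measure your own).
mem_per_job_gb=4

# cap_jobs CORES AVAIL_KIB GIB_PER_JOB
# Prints the smaller of the core count and the number of jobs that fit in RAM,
# but never less than 1.
cap_jobs() {
  by_mem=$(( $2 / ($3 * 1024 * 1024) ))
  n=$1
  if [ "$by_mem" -lt "$n" ]; then n=$by_mem; fi
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}

# Fall back to 1 core if nproc is missing.
cores=$(nproc 2>/dev/null || echo 1)
# MemAvailable is reported in KiB; fall back to 0 (one job) if it cannot be read.
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)
: "${avail_kb:=0}"

pthr=$(cap_jobs "$cores" "$avail_kb" "$mem_per_job_gb")
echo "using ${pthr} parallel jobs"
```

The resulting `pthr` can be used in place of a hard-coded job count.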
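# ECPred is installed for me at /opt/biotools/ECPred; edit for your own path.
# Exported because the single-quoted parallel command below is expanded by each job's own shell.
export ECPRED_PATH=/opt/biotools/ECPred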
# split the multi-FASTA into single FASTA files, one per protein (faSplit is from the UCSC tools)
mkdir -p splitseqs
faSplit byname multi-proteins.fa splitseqs/
# run the prediction with ${pthr} jobs in parallel (lower this if RAM is limited)
pthr=48
mkdir -p results
find splitseqs -type f -name '*.fa' | \
parallel -j ${pthr} -k 'java -jar ${ECPRED_PATH}/ECPred.jar \
weighted {} \
${ECPRED_PATH} \
$PWD \
results/$(basename {})_out'
# collect and merge results
echo -e "Protein ID\tEC Number\tConfidence Score(max 1.0)" > ECPred_results.tsv
cat results/*_out | grep -v '^Protein' | sort -k 1V,1 >> ECPred_results.tsv
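With thousands of jobs, a few may fail without being noticed (for example when memory runs out), so it can be worth checking that every input produced a result before trusting the merged table. This is a minimal sketch: the `missing_outputs` helper is hypothetical, and it assumes a failed job leaves either no output file or an empty one, which I have not verified for ECPred.

```shell
# missing_outputs INPUT_DIR RESULT_DIR
# Prints every INPUT_DIR/*.fa that has no non-empty RESULT_DIR/<name>_out file.
missing_outputs() {
  for f in "$1"/*.fa; do
    [ -e "$f" ] || continue            # skip the unexpanded glob if the directory is empty
    out="$2/$(basename "$f")_out"      # same naming as in the parallel command above
    [ -s "$out" ] || echo "$f"         # -s: file exists and is not empty
  done
}

missing_outputs splitseqs results
```

Redirecting the output to a file gives a list that can be piped into the same parallel command in place of the `find` output to rerun only the missing proteins.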