example parallel command usage for speed-up #6

@splaisan

I used the following scheme to process thousands of input proteins in a more reasonable time.
Maybe this can help others!

Please check that you have enough RAM before running on multiple cores!
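
One rough way to pick a job count that fits in memory is to divide the available RAM by an assumed per-job footprint and cap the result at the number of cores. The sketch below does just that. The `max_jobs` helper is hypothetical, and the 4000 MB per-job figure in the usage comment is a placeholder rather than a measured ECPred value, so time and measure a single run first.

```shell
# Hypothetical helper: max_jobs AVAILABLE_MB PER_JOB_MB CORES
# Prints the largest job count that fits in memory, capped at CORES, never below 1.
# The body runs in a subshell, so its variables do not leak into the caller.
max_jobs() (
  avail_mb=$1
  per_job_mb=$2
  cores=$3
  n=$(( avail_mb / per_job_mb ))
  if [ "$n" -gt "$cores" ]; then n=$cores; fi
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
)

# Possible usage on Linux (4000 MB per job is an assumption, not a measurement):
# avail_mb=$(awk '/^MemAvailable:/ {print int($2/1024)}' /proc/meminfo)
# pthr=$(max_jobs "$avail_mb" 4000 "$(nproc)")
```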

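# ECPred is installed for me at /opt/biotools/ECPred; edit for your own path.
# The variable is exported so that the single-quoted command run by parallel can expand it.
export ECPRED_PATH=/opt/biotools/ECPred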

# split the multifasta into single fasta files, one per protein (faSplit is from the UCSC tools)
mkdir -p splitseqs
faSplit byname multi-proteins.fa splitseqs/
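
If faSplit is not available, a similar split can be sketched with awk. This is not the UCSC tool: the `split_fasta` helper is hypothetical, it names each file after the first word of the header, and (unlike faSplit, as far as I know) it replaces characters outside `A-Za-z0-9._-` with `_` so that IDs such as `sp|P12345|NAME` give safe file names. Duplicate IDs would overwrite each other.

```shell
# Hypothetical helper: split_fasta INPUT.fa OUTDIR
# Writes one OUTDIR/<id>.fa file per record of a multi-FASTA file.
split_fasta() (
  src=$1
  out=$2
  mkdir -p "$out"
  awk -v dir="$out" '
    /^>/ {
      if (file) close(file)                 # avoid running out of open files
      name = substr($1, 2)                  # first word of the header, without ">"
      gsub(/[^A-Za-z0-9._-]/, "_", name)    # make the ID safe as a file name
      file = dir "/" name ".fa"
    }
    file { print > file }                   # header and sequence lines go to the current file
  ' "$src"
)

# Possible usage:
# split_fasta multi-proteins.fa splitseqs
```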

# run the predictions in parallel, ${pthr} jobs at a time (-k keeps the output in input order)
pthr=48
mkdir -p results

find splitseqs -type f -name '*.fa' | \
  parallel -j ${pthr} -k 'java -jar ${ECPRED_PATH}/ECPred.jar \
    weighted {} \
    ${ECPRED_PATH} \
    $PWD \
    results/$(basename {})_out'
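
With thousands of jobs, a few may fail (for example when memory runs out). The sketch below lists the inputs that still lack a result so they can be fed to parallel again. The `list_missing` helper is hypothetical, and it assumes that a successful run leaves a non-empty `results/<name>.fa_out` file, which I have not verified for every ECPred outcome.

```shell
# Hypothetical helper: list_missing INPUT_DIR RESULT_DIR
# Prints each INPUT_DIR/*.fa whose RESULT_DIR/<basename>_out is absent or empty.
list_missing() (
  for fa in "$1"/*.fa; do
    [ -e "$fa" ] || continue                # the glob matched nothing
    out="$2/$(basename "$fa")_out"
    if [ ! -s "$out" ]; then
      printf '%s\n' "$fa"
    fi
  done
)

# Possible usage: pipe the list into the same parallel command instead of find
# list_missing splitseqs results | parallel -j ${pthr} -k '...'
```

GNU parallel can also track this itself through its `--joblog` and `--resume-failed` options.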

# collect and merge results
echo -e "Protein ID\tEC Number\tConfidence Score(max 1.0)" > ECPred_results.tsv
cat results/*_out | grep -v '^Protein' | sort -k 1V,1 >> ECPred_results.tsv
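
As a quick sanity check on the merged table, the number of distinct protein IDs can be compared with the number of records in the input. The `count_ids` helper below is hypothetical, and it assumes the merged file is tab-separated with the protein ID in the first column, as the header line above suggests.

```shell
# Hypothetical helper: count_ids RESULTS.tsv
# Counts distinct values in column 1, skipping the header line.
count_ids() (
  tail -n +2 "$1" | cut -f1 | sort -u | wc -l | tr -d ' '
)

# Possible usage: the two numbers should match if every protein got a prediction line
# count_ids ECPred_results.tsv
# grep -c '^>' multi-proteins.fa
```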
