Description
When metaMiner is given an unannotated nucleotide FASTA file containing more than one DNA sequence, it runs transeq to six-frame translate the file before running hmmsearch.
However, in my experience transeq.py only outputs six-frame translations for the first sequence in the FASTA file. This can be reproduced as follows:
printf '>seq1\nATGATGATGATGTAA\n>seq2\nAATGGAAGAAGAATAGAA\n' > test.fasta
python transeq.py test.fasta -o test.out --frame 6 --wide
Now, test.out contains:
>seq1_1
MMMM*
>seq1_2
***CX
>seq1_3
DDDVX
>seq1_4
LHHHH
>seq1_5
TSSSX
>seq1_6
YIIIX
test.out should contain:
>seq1_1
MMMM*
>seq1_2
***CX
>seq1_3
DDDVX
>seq1_4
LHHHH
>seq1_5
TSSSX
>seq1_6
YIIIX
>seq2_1
NGRRIE
>seq2_2
MEEE*X
>seq2_3
WKKNRX
>seq2_4
FYSSSI
>seq2_5
LFFFHX
>seq2_6
SILLPX
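For reference, here is a minimal Python sketch of the expected behaviour. It is not metaMiner's transeq.py, and the function names (read_fasta, translate_frame, six_frame, transeq_all) are hypothetical. It assumes the standard genetic code, marks a trailing incomplete codon or any non-ACGT codon as X, and numbers the reverse frames so that frames 4 to 6 reuse the codons of frames 1 to 3, which reproduces the expected output above. The point that matters for this issue is that the FASTA reader yields every record instead of stopping after the first one.

```python
from __future__ import annotations

from typing import Iterable, Iterator

# Standard genetic code (NCBI table 1); codons are ordered TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) for EVERY record, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Record name is the first whitespace-delimited word of the header.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:  # don't drop the final record
        yield name, "".join(chunks)


def translate_frame(seq: str) -> str:
    """Translate codon by codon; unknown codons and a trailing partial codon become 'X'."""
    protein = [CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)]
    if len(seq) % 3:
        protein.append("X")
    return "".join(protein)


def six_frame(seq: str) -> list[str]:
    """Frames 1-3 read the forward strand; frames 4-6 reverse-complement the codons of frames 1-3."""
    rc = seq.translate(COMPLEMENT)[::-1]
    forward = [translate_frame(seq[n:]) for n in range(3)]
    # Offset into the reverse complement that lines up with forward frame n + 1.
    reverse = [translate_frame(rc[(len(seq) - n) % 3:]) for n in range(3)]
    return forward + reverse


def transeq_all(fasta_text: str) -> str:
    """Six-frame translate all records, naming outputs <name>_1 ... <name>_6."""
    out = []
    for name, seq in read_fasta(fasta_text.splitlines()):
        for frame, protein in enumerate(six_frame(seq), start=1):
            out.append(f">{name}_{frame}\n{protein}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    test_fasta = ">seq1\nATGATGATGATGTAA\n>seq2\nAATGGAAGAAGAATAGAA\n"
    print(transeq_all(test_fasta), end="")
```

Run on the two-sequence test file, this prints all 12 records listed above, six per input sequence.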
Possible solutions
Use gotranseq as a near drop-in replacement: it is distributed as a single binary, whereas transeq ships as part of the EMBOSS suite.
Caveat: gotranseq has no --wide option, so its output is wrapped at 60 characters per line instead of one line per sequence.
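If any step downstream relies on the one-line-per-sequence layout that --wide produces, the wrapped gotranseq output could be post-processed. The following is a minimal sketch, assuming plain FASTA in which header lines start with ">". unwrap_fasta and write_wide are hypothetical helper names and are not part of gotranseq or metaMiner.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def unwrap_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each record's wrapped sequence joined onto one line."""
    chunks: list[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if chunks:  # flush the previous record's sequence
                yield "".join(chunks)
                chunks = []
            yield line
        elif line:
            chunks.append(line.strip())
    if chunks:  # flush the last record
        yield "".join(chunks)


def write_wide(src_path: str, dst_path: str) -> None:
    """Rewrite a wrapped FASTA file as one header line plus one sequence line per record."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for out_line in unwrap_fasta(src):
            dst.write(out_line + "\n")
```

hmmsearch itself reads multi-line FASTA, so this step should only be needed if something else in the pipeline depends on the --wide layout.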