[bug] [metaMiner] transeq only outputs first contig #6

@brymerr921

Description

When metaMiner is given an un-annotated nucleotide FASTA file that contains more than one DNA sequence, it runs transeq to six-frame translate the sequences before running hmmsearch.

However, in my experience transeq.py only outputs six-frame translations for the first sequence in the FASTA file. This can be reproduced as follows:

printf '>seq1\nATGATGATGATGTAA\n>seq2\nAATGGAAGAAGAATAGAA\n' > test.fasta
python transeq.py test.fasta -o test.out --frame 6 --wide

Now, test.out contains:

>seq1_1
MMMM*
>seq1_2
***CX
>seq1_3
DDDVX
>seq1_4
LHHHH
>seq1_5
TSSSX
>seq1_6
YIIIX

test.out should contain:

>seq1_1
MMMM*
>seq1_2
***CX
>seq1_3
DDDVX
>seq1_4
LHHHH
>seq1_5
TSSSX
>seq1_6
YIIIX
>seq2_1
NGRRIE
>seq2_2
MEEE*X
>seq2_3
WKKNRX
>seq2_4
FYSSSI
>seq2_5
LFFFHX
>seq2_6
SILLPX
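
For reference, here is a minimal Python sketch of the behaviour I would expect: every record in the FASTA file gets six translations, not just the first. This is not the code in transeq.py, and all function names below are hypothetical. It assumes the standard genetic code, writes a trailing partial codon as X, and numbers reverse frames 4 to 6 so that they reuse the codon boundaries of forward frames 1 to 3. That numbering is inferred from the expected output above.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (last base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta(lines):
    """Yield (name, sequence) for every record, not only the first one."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def translate(seq):
    """Translate one frame; unknown codons and a trailing partial codon become X."""
    seq = seq.upper()
    full = len(seq) - len(seq) % 3
    protein = [CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, full, 3)]
    if len(seq) % 3:
        protein.append("X")
    return "".join(protein)


def six_frames(seq):
    """Return translations for frames 1-6 (assumed numbering, see lead-in)."""
    rc = seq.translate(COMPLEMENT)[::-1]
    n = len(seq)
    forward = [translate(seq[k:]) for k in range(3)]
    # Reverse frame k+4 starts where forward frame k+1's codons end.
    reverse = [translate(rc[(n - k) % 3:]) for k in range(3)]
    return forward + reverse


def translate_fasta(lines):
    """Return wide-format output lines for all records in a FASTA input."""
    out = []
    for name, seq in read_fasta(lines):
        for frame, protein in enumerate(six_frames(seq), start=1):
            out.append(f">{name}_{frame}")
            out.append(protein)
    return out
```

Calling translate_fasta on the lines of test.fasta from the reproduction above should give the twelve records listed under "test.out should contain".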

Possible solutions

Use gotranseq as a near drop-in replacement. It needs only a single binary, whereas the transeq program is part of EMBOSS.
Caveat: gotranseq wraps its output at 60 characters and has no --wide option, so the output is not in --wide format.
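
If the wrapped output turns out to be a problem downstream, it could be unwrapped in a small post-processing step. The sketch below is only an illustration of that idea; unwrap_fasta is a hypothetical helper and is not part of gotranseq or metaMiner.

```python
def unwrap_fasta(lines):
    """Yield header and sequence lines with each sequence joined onto one line."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header
                yield "".join(chunks)
            header, chunks = line, []
        elif line:
            # Sequence continuation line: collect it for the current record.
            chunks.append(line)
    if header is not None:
        yield header
        yield "".join(chunks)
```

For example, passing an open file handle of wrapped protein FASTA to unwrap_fasta and writing each yielded line followed by a newline would produce one sequence line per record.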