SL sites for TcCLB.511029.20 (Non-esmer) inferred for wrong direction #1

@khughitt

When parsing the results of the UTR analysis for T. cruzi CL Brener Non-Esmeraldo-like, I found that the detected SL sites for one of the genes (TcCLB.511029.20) appear to be incorrect.

While the gene is on the positive strand (152083-152706), the detected SL sites are all reported on the negative strand and lie downstream of the CDS (152726-152786).

So far, I have only encountered this for this one gene, and the cause is not immediately obvious, so I am just going to document the problem for now.

tcruzi_infecting_hsapiens_amastigote_nonesmer_sl_sorted.gff

TcChr35-P utr_analysis.py trans_splice_site 152726 152726 3 - . ID=TcCLB.511029.20.sl.5;Name=TcCLB.511029.20;description=kinetoplast-associated+protein+3+(KAP3)
TcChr35-P utr_analysis.py trans_splice_site 152727 152727 1 - . ID=TcCLB.511029.20.sl.4;Name=TcCLB.511029.20;description=kinetoplast-associated+protein+3+(KAP3)
TcChr35-P utr_analysis.py trans_splice_site 152731 152731 2 - . ID=TcCLB.511029.20.sl.6;Name=TcCLB.511029.20;description=kinetoplast-associated+protein+3+(KAP3)
TcChr35-P utr_analysis.py trans_splice_site 152765 152765 1 - . ID=TcCLB.511029.20.sl.3;Name=TcCLB.511029.20;description=kinetoplast-associated+protein+3+(KAP3)
TcChr35-P utr_analysis.py trans_splice_site 152785 152785 95 - . ID=TcCLB.511029.20.sl.1;Name=TcCLB.511029.20;description=kinetoplast-associated+protein+3+(KAP3)
TcChr35-P utr_analysis.py trans_splice_site 152786 152786 10 - . ID=TcCLB.511029.20.sl.2;Name=TcCLB.511029.20;description=kinetoplast-associated+protein+3+(KAP3)

GFF (TriTrypDB-8.1_TcruziCLBrenerNon-Esmeraldo-like.gff)

TcChr35-P TriTrypDB gene 152083 152706 . + . ID=TcCLB.511029.20;Name=TcCLB.511029.20;description=kinetoplast-associated+protein+3+%28KAP3%29;size=624;web_id=TcCLB.511029.20;locus_tag=TcCLB.511029.20;size=624;Alias=KAP3,Tc00.1047053511029.20:pep,Tc00.1047053511029.20:mRNA,Tc00.1047053511029.20,Tc00.1047053511029.20:exon:1,TcCLB.511029.20,6032.t00002
TcChr35-P TriTrypDB mRNA 152083 152706 . + . ID=rna_TcCLB.511029.20-1;Name=TcCLB.511029.20-1;description=TcCLB.511029.20-1;size=624;Parent=TcCLB.511029.20;Ontology_term=GO:0006323,GO:0005759,GO:0020023,GO:0003677;Dbxref=ApiDB:TcCLB.511029.20,taxon:9000000025
TcChr35-P TriTrypDB CDS 152083 152706 . + 0 ID=cds_TcCLB.511029.20-1;Name=cds;description=.;size=624;Parent=rna_TcCLB.511029.20-1
TcChr35-P TriTrypDB exon 152083 152706 . + . ID=exon_TcCLB.511029.20-1;Name=exon;description=exon;size=624;Parent=rna_TcCLB.511029.20-1
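
To check whether other genes are affected, a sanity check along the following lines could be run over the output. This is only a minimal sketch and not part of utr_analysis.py: the `Feature` tuple and the `parse_gff_line` and `is_misplaced` helpers are hypothetical names, the attribute columns are shortened, and the lines are split on whitespace because that is how they appear in this issue (real GFF files are tab-delimited). It assumes a valid SL site shares the gene's strand and lies upstream of the gene in the gene's own orientation.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    seqid: str
    start: int   # 1-based, inclusive (GFF convention)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def parse_gff_line(line: str) -> Feature:
    """Parse one GFF record, keeping only the seqid, coordinates and strand."""
    # Whitespace split: the records pasted in this issue are space-separated.
    cols = line.split()
    return Feature(cols[0], int(cols[3]), int(cols[4]), cols[6])


def is_misplaced(gene: Feature, site: Feature) -> bool:
    """Return True if the SL site is on the other strand or not upstream of the gene."""
    if site.strand != gene.strand:
        return True
    if gene.strand == "+":
        # Upstream of a positive-strand gene means a smaller coordinate.
        return site.start >= gene.start
    # Upstream of a negative-strand gene means a larger coordinate.
    return site.end <= gene.end


gene = parse_gff_line(
    "TcChr35-P TriTrypDB gene 152083 152706 . + . ID=TcCLB.511029.20"
)
sites = [
    parse_gff_line(line)
    for line in [
        "TcChr35-P utr_analysis.py trans_splice_site 152726 152726 3 - . ID=TcCLB.511029.20.sl.5",
        "TcChr35-P utr_analysis.py trans_splice_site 152727 152727 1 - . ID=TcCLB.511029.20.sl.4",
        "TcChr35-P utr_analysis.py trans_splice_site 152731 152731 2 - . ID=TcCLB.511029.20.sl.6",
        "TcChr35-P utr_analysis.py trans_splice_site 152765 152765 1 - . ID=TcCLB.511029.20.sl.3",
        "TcChr35-P utr_analysis.py trans_splice_site 152785 152785 95 - . ID=TcCLB.511029.20.sl.1",
        "TcChr35-P utr_analysis.py trans_splice_site 152786 152786 10 - . ID=TcCLB.511029.20.sl.2",
    ]
]

# Positions of the sites that fail the check for this gene.
flagged = [site.start for site in sites if is_misplaced(gene, site)]
print(flagged)
```

With the records above, all six sites are flagged, since each one is on the negative strand and has a coordinate greater than the gene start.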

matched_reads_R1.csv

HWI-1KL118:27:C0PJ6ACXX:7:1208:11217:103147,TcCLB.511029.20,TcChr35-P,-,-,152696,152784,CTGTACTATATTGATCGCACTGCTGAATTTCAGCCGTTATTTTGTTCATCCATCCATCAACGGGGAGTGAAGAGCCAACAGCAATAAAAAAATGCTTCGAC,CTGTACTATATTG,TGTTCTTCACAGA,CTGTACTATATTG,TGTTCTTCACAGA,152785
HWI-1KL118:27:C0PJ6ACXX:7:2101:1394:148845,TcCLB.511029.20,TcChr35-P,-,-,152696,152784,CTGTACTATATTGATCGCACTGCTGAATTTCAGCCGTTATTTTGTTCATCCATCCATCAACGGGGAGTGAAGAGCCAACAGCAATAAAAAAATGCTTCGAC,CTGTACTATATTG,TGTTCTTCACAGA,CTGTACTATATTG,TGTTCTTCACAGA,152785
HWI-1KL118:27:C0PJ6ACXX:7:1107:8283:186762,TcCLB.511029.20,TcChr35-P,-,-,152699,152784,TTTCTGTACTATATTGATCGCACTGCTGAATTTCAGCCGTTATTTTGTTCATCCATCCATCAACGGGGAGTGAAGAGCCAACAGCAATAAAAAAATGCTTC,TTTCTGTACTATATTG,GTTTGTTCTTCACAGA,TTTCTGTACTATATTG,GTTTGTTCTTCACAGA,152785
HWI-1KL118:27:C0PJ6ACXX:7:1101:12969:15716,TcCLB.511029.20,TcChr35-P,-,-,152700,152784,GTTTCTGTACTATATTGATCGCACTGCTGAATTTCAGCCGTTATTTTGTTCATCCATCCATCAACGGGGAGTGAAGAGCCAACAGCAATAAAAAAATGCTT,GTTTCTGTACTATATTG,CGTTTGTTCTTCACAGA,GTTTCTGTACTATATTG,CGTTTGTTCTTCACAGA,152785
HWI-1KL118:27:C0PJ6ACXX:7:1103:9460:28433,TcCLB.511029.20,TcChr35-P,-,-,152700,152784,GTTTCTGTACTATATTGATCGCACTGCTGAATTTCAGCCGTTATTTTGTTCATCCATCCATCAACGGGGAGTGAAGAGCCAACAGCAATAAAAAAATGCTT,GTTTCTGTACTATATTG,CGTTTGTTCTTCACAGA,GTTTCTGTACTATATTG,CGTTTGTTCTTCACAGA,152785
HWI-1KL118:27:C0PJ6ACXX:7:2304:6149:193330,TcCLB.511029.20,TcChr35-P,-,-,152700,152784,GTTTCTGTACTATATTGATCGCACTGCTGAATTTCAGCCGTTATTTTGTTCATCCATCCATCAACGGGGAGTGAAGAGCCAACAGCAATAAAAAAATGCTT,GTTTCTGTACTATATTG,CGTTTGTTCTTCACAGA,GTTTCTGTACTATATTG,CGTTTGTTCTTCACAGA,152785

GFF.parse() (Python)

Out[18]: SeqFeature(FeatureLocation(ExactPosition(152082), ExactPosition(152706), strand=1), type='gene', id='TcCLB.511029.20')
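
The parsed location agrees with the GFF record once the coordinate conventions are taken into account: GFF coordinates are 1-based and inclusive, while the parsed `FeatureLocation` uses Python-style 0-based, half-open positions, and `strand=1` corresponds to `+`. So the annotation itself appears to be read correctly. A small sketch of the conversion, using hypothetical helper names:

```python
from typing import Tuple


def gff_to_python(start: int, end: int) -> Tuple[int, int]:
    """Convert 1-based inclusive GFF coordinates to 0-based half-open ones."""
    return start - 1, end


def python_to_gff(start: int, end: int) -> Tuple[int, int]:
    """Convert 0-based half-open coordinates back to 1-based inclusive ones."""
    return start + 1, end


py_start, py_end = gff_to_python(152083, 152706)
print(py_start, py_end)   # 152082 152706, as in the SeqFeature above
print(py_end - py_start)  # 624, matching size=624 in the GFF attributes
```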
