13 changes: 13 additions & 0 deletions genome_kit/genome_annotation.py
@@ -603,6 +603,19 @@ def utr3s(self): # pragma: no cover
        mock_unreachable()
        return [Utr()]

    @property
    def length(self) -> int:
        """The total length of this transcript, summing all exon lengths (includes UTRs)."""
Collaborator @s22chan, Apr 21, 2026

Introns are part of a transcript. Are we conflating transcript length with the length of the spliced mRNA in the transcriptome?

Contributor Author


@gkaur can you comment on this?

The original request:

Currently, transcript length is calculated by iterating over the exons, computing the difference between each exon's start and end positions, and summing those values. It would be much easier if this were precalculated and stored for each transcript.
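
For reference, the manual calculation described in the request might look roughly like the sketch below. `Exon` is a hypothetical stand-in with half-open `start`/`end` coordinates (an assumption, not genome_kit's own class), and the coordinates are made up for illustration.

```python
class Exon:
    """Hypothetical stand-in for an exon interval, half-open: [start, end)."""

    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end


def summed_exon_length(exons) -> int:
    """Sum (end - start) over all exons, as the request describes."""
    return sum(exon.end - exon.start for exon in exons)


# Made-up coordinates, for illustration only.
exons = [Exon(100, 150), Exon(300, 380), Exon(500, 520)]
print(summed_exon_length(exons))  # 50 + 80 + 20 = 150
```

A precomputed property would replace this loop at each call site with a single attribute access.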


I was hoping to capture the post-processed transcript length, based on the exon and CDS components only.
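
The two readings of "transcript length" in this thread can be contrasted with a small sketch. The `(start, end)` pairs are made-up, half-open coordinates and do not come from any real annotation.

```python
# (start, end) pairs, half-open; made-up coordinates for illustration only.
exons = [(1000, 1200), (1500, 1650), (2000, 2400)]

# Genomic span: from the first exon start to the last exon end, introns included.
span_length = max(end for _, end in exons) - min(start for start, _ in exons)

# Exonic (post-splicing) length: introns excluded.
exonic_length = sum(end - start for start, end in exons)

# Whatever is left over lies between exons.
intronic_length = span_length - exonic_length

print(span_length, exonic_length, intronic_length)  # 1400 750 650
```

The reviewer's question concerns the first quantity; the request concerns the second.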

        return sum(len(exon) for exon in self.exons)

    @property
    def length_cds(self) -> int:
Collaborator @s22chan, Apr 21, 2026

I feel like the naming will be confusing in code.

e.g.
len(trans.cdss)
vs
trans.length_cds
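
To make the concern concrete, here is a minimal sketch of how the two expressions differ. `FakeInterval` and `FakeTranscript` are hypothetical stand-ins, not genome_kit classes; the CDS lengths are the ones quoted in the test comment in this PR.

```python
class FakeInterval:
    """Hypothetical stand-in for a CDS interval; only len() is modelled."""

    def __init__(self, length: int) -> None:
        self._length = length

    def __len__(self) -> int:
        return self._length


class FakeTranscript:
    """Hypothetical stand-in, not genome_kit's Transcript."""

    def __init__(self, cds_lengths) -> None:
        self.cdss = [FakeInterval(n) for n in cds_lengths]

    @property
    def length_cds(self) -> int:
        # Same expression as the property body proposed in this diff.
        return sum(len(cds) for cds in self.cdss)


trans = FakeTranscript([352, 227, 197, 1738])
print(len(trans.cdss))   # number of CDS elements: 4
print(trans.length_cds)  # total CDS bases: 2514
```

Both names read as "length of the CDS", yet one counts elements and the other counts bases.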

Contributor Author


That's a reasonable concern. I'm coming up empty trying to think of a clearer alternative.

Contributor Author


How about .exonic_length and .coding_length?


works for me!
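
With the agreed names, the properties might read something like the sketch below. `TranscriptSketch` is a hypothetical stand-in in which plain `range` objects play the role of intervals; it is not the actual change to `genome_annotation.py`, and the coordinates are made up.

```python
class TranscriptSketch:
    """Hypothetical stand-in; range objects play the role of intervals."""

    def __init__(self, exons=(), cdss=()) -> None:
        self.exons = list(exons)
        self.cdss = list(cdss)

    @property
    def exonic_length(self) -> int:
        """Total bases across all exons (introns excluded)."""
        return sum(len(exon) for exon in self.exons)

    @property
    def coding_length(self) -> int:
        """Total bases across all CDS elements; 0 when there are none."""
        return sum(len(cds) for cds in self.cdss)


tran = TranscriptSketch(
    exons=[range(0, 100), range(200, 260)],
    cdss=[range(40, 100), range(200, 230)],
)
print(len(tran.cdss), tran.coding_length, tran.exonic_length)  # 2 90 160
```

At a call site, `tran.coding_length` is harder to mistake for `len(tran.cdss)` than `tran.length_cds` would be.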

        """The CDS-only length of this transcript, summing all CDS element lengths.

        Returns 0 for non-coding transcripts.
        """
        return sum(len(cds) for cds in self.cdss)

    def __getstate__(self) -> bytes:
        genome = self.annotation_genome
        return pickle.dumps([genome, genome.transcripts.index_of(self)])
6 changes: 2 additions & 4 deletions tests/test_genome_annotation.py
@@ -1,11 +1,7 @@
# Copyright (C) 2016-2023 Deep Genomics Inc. All Rights Reserved.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import unittest
import gc
import os
import genome_kit
from genome_kit import Genome
from genome_kit import GenomeAnnotation
from genome_kit import GeneTable
@@ -476,6 +472,8 @@ def test_transcript_attributes(self):
        self.assertEqual(len(tran.cdss), 4)
        self.assertEqual(len(tran.utr5s), 1)
        self.assertEqual(len(tran.utr3s), 1)
        self.assertEqual(tran.length, 2895) # 515 + 227 + 197 + 1956
        self.assertEqual(tran.length_cds, 2514) # 352 + 227 + 197 + 1738
        self.assertIsInstance(tran, Interval)
        self.assertIsInstance(tran.__repr__(), str)
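
The expected values in the two new assertions can be checked against the per-element lengths in their comments. The sketch below only re-does that arithmetic; it does not load any annotation, and the variable names are illustrative.

```python
# Per-exon and per-CDS lengths copied from the comments in the test above.
exon_lengths = [515, 227, 197, 1956]
cds_lengths = [352, 227, 197, 1738]

exonic_total = sum(exon_lengths)
coding_total = sum(cds_lengths)

# Exonic bases not covered by any CDS element.
noncoding_total = exonic_total - coding_total

print(exonic_total, coding_total, noncoding_total)  # 2895 2514 381
```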
