Conversation
| from .genome_annotation import Transcript | ||
|
|
||
|
|
||
| class IndexDirection(enum.Enum): |
There was a problem hiding this comment.
This whole index toggle thing will be removed in a future PR. This was a workaround while this requirement was still uncertain.
There was a problem hiding this comment.
Pull request overview
Adds an initial (WIP) DisjointIntervalSequence API for working with flattened coordinates over disjoint genomic intervals, along with tests and Sphinx documentation.
Changes:
- Introduces
genome_kit.diseqwithDisjointIntervalSequenceandIndexDirection. - Adds a comprehensive unittest suite for DIS construction, properties, and dunder methods.
- Wires DIS into docs (
diseqpage + API docs) and exposes it viagenome_kit/__init__.py.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
genome_kit/diseq.py |
New core implementation of DIS and index-direction conventions. |
tests/test_diseq.py |
New unit tests validating initialization, construction helpers, properties, and dunders. |
genome_kit/__init__.py |
Imports DisjointIntervalSequence at the package top level. |
docs-src/index.rst |
Adds diseq to the docs TOC. |
docs-src/diseq.rst |
New conceptual/usage documentation for DIS. |
docs-src/api.rst |
Adds API reference entries for IndexDirection and DisjointIntervalSequence. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
I've found a few different ways this can be approached (e.g seperate alpha branch, directory filters, etc). Decide on one and add it to this PR or a separate PR (or explain in more detail). Lowest effort should be prioritized.
This might actually be sufficient. |
There was a problem hiding this comment.
I've never touched this file, and now I realize that the Utr class I added (way back) isn't in the api docs 😳
and who knows what else
| You will notice "On Coordinate Strand: False" has been added to the diagram. Since we | ||
| aren't able to determine which strand the interval is on just from the indices, this | ||
| variable is used to let us know the strandedness of the interval. |
There was a problem hiding this comment.
I don't really get this. Does "On Coordinate Strand" map to a field on the DSI? Or are you just trying to explain the diagram?
There was a problem hiding this comment.
It does map to a field on the DIS. I'll rework that section to include code as well, which should hopefully clear things up a bit.
| region : :py:class:`str` | ||
| Which region to extract — ``"exons"``, ``"cds"``, | ||
| ``"utr5"``, or ``"utr3"``. |
There was a problem hiding this comment.
Was this a requested feature?
There was a problem hiding this comment.
I wouldn't say it was an explicitly requested feature, but users mentioned wanting to create DIS from exons, cds, etc. FWIW this feature was implemented in the first 'rough draft' PR you created, and I just continued from there. I don't have a preference as to whether this part of the code is included or not (I can see an argument for either side), so if you think it's worth removing lmk.
There was a problem hiding this comment.
let's remove to keep the api space a bit smaller, can add later if needed. sorry I missed it in my original branch.
There was a problem hiding this comment.
I feel this should be a bitflag if so, wouldn't users wants exons/cds + utrs?
ovesh
left a comment
There was a problem hiding this comment.
preemptively approving with one minor doc comment
| lie on the - strand, the DIS behaves identically in most respects, with one key | ||
| difference from :py:class:`~genome_kit.Interval`: on a DIS created from | ||
| negative-strand intervals, indices still increase in the 5'→3' direction of the | ||
| transcript. |
There was a problem hiding this comment.
not blocking, but a code example that demonstrates the behavioral differences between a vanilla interval and a simple dji interval would be helpful.
Feature is WIP and subject to change in future PRs
In short, the DisjointIntervalSequence API allows users to more easily work with intervals meant to span multiple disjoint intervals (use-case: working with transcripts in spliced RNA space)
To release, I'm thinking we can create a separate
diseq-prepreleasebranch where we manually update the version in setup.py to be suffixed witha1(e.g.7.3.1a1) to denote an alpha version, and manually trigger release by creating a tag off that branch.I didn't add
DisjointIntervalSequenceto__all__under__init__.pyso people need to manually import the class in order to use the feature (since it's still WIP).