Skip to content

feat: add DisjointIntervalSequence API#210

Merged
ovesh merged 22 commits intomainfrom
diseq-pt1
Apr 10, 2026
Merged

feat: add DisjointIntervalSequence API#210
ovesh merged 22 commits intomainfrom
diseq-pt1

Conversation

@SophiaPerzan-DG
Copy link
Copy Markdown
Collaborator

@SophiaPerzan-DG SophiaPerzan-DG commented Mar 27, 2026

Feature is WIP and subject to change in future PRs

  • Base class and properties set up

In short, the DisjointIntervalSequence API allows users to more easily work with intervals meant to span multiple disjoint intervals (use-case: working with transcripts in spliced RNA space)

To release, I'm thinking we can create a separate diseq-preprelease branch where we manually update the version in setup.py to be suffixed with a1 (e.g. 7.3.1a1) to denote an alpha version, and manually trigger release by creating a tag off that branch.

I didn't addDisjointIntervalSequence to __all__ under __init__.py so people need to manually import the class in order to use the feature (since it's still WIP).

Comment thread genome_kit/diseq.py
from .genome_annotation import Transcript


class IndexDirection(enum.Enum):
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole index toggle thing will be removed in a future PR. This was a workaround while this requirement was still uncertain.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an initial (WIP) DisjointIntervalSequence API for working with flattened coordinates over disjoint genomic intervals, along with tests and Sphinx documentation.

Changes:

  • Introduces genome_kit.diseq with DisjointIntervalSequence and IndexDirection.
  • Adds a comprehensive unittest suite for DIS construction, properties, and dunder methods.
  • Wires DIS into docs (diseq page + API docs) and exposes it via genome_kit/__init__.py.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
genome_kit/diseq.py New core implementation of DIS and index-direction conventions.
tests/test_diseq.py New unit tests validating initialization, construction helpers, properties, and dunders.
genome_kit/__init__.py Imports DisjointIntervalSequence at the package top level.
docs-src/index.rst Adds diseq to the docs TOC.
docs-src/diseq.rst New conceptual/usage documentation for DIS.
docs-src/api.rst Adds API reference entries for IndexDirection and DisjointIntervalSequence.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread genome_kit/diseq.py Outdated
Comment thread tests/test_diseq.py
Comment thread docs-src/diseq.rst
Comment thread genome_kit/__init__.py
Comment thread genome_kit/diseq.py
Comment thread genome_kit/diseq.py
Comment thread genome_kit/diseq.py
@ovesh
Copy link
Copy Markdown
Contributor

ovesh commented Mar 30, 2026

I'm thinking we can create a separate diseq-preprelease branch where we manually update the version in setup.py to be suffixed with a1 (e.g. 7.3.1a1) to denote an alpha version, and manually trigger release by creating a tag off that branch.

I've found a few different ways this can be approached (e.g seperate alpha branch, directory filters, etc). Decide on one and add it to this PR or a separate PR (or explain in more detail). Lowest effort should be prioritized.

I didn't addDisjointIntervalSequence to all under init.py so people need to manually import the class in order to use the feature

This might actually be sufficient.

Comment thread docs-src/api.rst
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never touched this file, and now I realize that the Utr class I added (way back) isn't in the api docs 😳

and who knows what else

Comment thread docs-src/diseq.rst Outdated
Comment thread docs-src/diseq.rst
Comment thread docs-src/diseq.rst Outdated
Comment thread docs-src/diseq.rst Outdated
Comment on lines +118 to +120
You will notice "On Coordinate Strand: False" has been added to the diagram. Since we
aren't able to determine which strand the interval is on just from the indices, this
variable is used to let us know the strandedness of the interval.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get this. Does "On Coordinate Strand" map to a field on the DSI? Or are you just trying to explain the diagram?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does map to a field on the DIS. I'll rework that section to include code as well, which should hopefully clear things up a bit.

Comment thread genome_kit/diseq.py
Comment on lines +246 to +248
region : :py:class:`str`
Which region to extract — ``"exons"``, ``"cds"``,
``"utr5"``, or ``"utr3"``.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this a requested feature?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say it was an explicitly requested feature, but users mentioned wanting to create DIS from exons, cds, etc. FWIW this feature was implemented in the first 'rough draft' PR you created, and I just continued from there. I don't have a preference as to whether this part of the code is included or not (I can see an argument for either side), so if you think it's worth removing lmk.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove to keep the api space a bit smaller, can add later if needed. sorry I missed it in my original branch.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this should be a bitflag if so, wouldn't users wants exons/cds + utrs?

Comment thread genome_kit/diseq.py
Comment thread genome_kit/diseq.py Outdated
Comment thread genome_kit/diseq.py Outdated
Comment thread genome_kit/diseq.py Outdated
Copy link
Copy Markdown
Contributor

@ovesh ovesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preemptively approving with one minor doc comment

Comment thread docs-src/diseq.rst Outdated
lie on the - strand, the DIS behaves identically in most respects, with one key
difference from :py:class:`~genome_kit.Interval`: on a DIS created from
negative-strand intervals, indices still increase in the 5'→3' direction of the
transcript.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not blocking, but a code example that demonstrates the behavioral differences between a vanilla interval and a simple dji interval would be helpful.

@ovesh ovesh merged commit 0a4de6a into main Apr 10, 2026
13 checks passed
@ovesh ovesh deleted the diseq-pt1 branch April 10, 2026 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants