-
Notifications
You must be signed in to change notification settings - Fork 3
Move standard references to TSV-backed package data and remove DB models #62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Move standard references to TSV-backed package data and remove DB models #62
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #62 +/- ##
==========================================
- Coverage 60.23% 60.13% -0.11%
==========================================
Files 8 9 +1
Lines 757 765 +8
==========================================
+ Hits 456 460 +4
- Misses 301 305 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the standard reference data (sample types, host species, and machine type mappings) from database-backed storage to TSV file-backed storage. This change makes the reference data authoritative, file-backed, and easily importable by other services without requiring database access.
Changes:
- Added
sample_registry/standards.pywith dataclasses and loader classes for TSV-backed reference data - Added TSV reference files in
sample_registry/data/directory and duplicates at repository root for URL access - Removed
StandardSampleTypeandStandardHostSpeciesORM models and related database initialization code - Updated runtime code to use TSV-backed data instead of database queries
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| sample_registry/standards.py | New module implementing dataclasses and loaders for TSV-backed standard reference data |
| sample_registry/data/standard_sample_types.tsv | TSV file containing standard sample type reference data |
| sample_registry/data/standard_host_species.tsv | TSV file containing standard host species reference data |
| sample_registry/data/machine_types.tsv | TSV file containing machine prefix to type mappings |
| sample_registry/data/init.py | Package initialization for data directory |
| machine_types.tsv | Duplicate of machine_types.tsv at repository root for easy URL access |
| sample_registry/models.py | Removed StandardSampleType and StandardHostSpecies ORM models |
| sample_registry/db.py | Removed database initialization functions for standard types |
| sample_registry/registrar.py | Updated to use MACHINE_TYPE_MAPPINGS from TSV-backed data |
| sample_registry/register.py | Removed register_sample_types and register_host_species functions |
| sample_registry/app.py | Updated stats endpoint to query using TSV-backed standard lists instead of database joins |
| tests/test_standards.py | New tests validating TSV loading and data access |
| tests/test_registrar.py | Removed tests for DB-backed standard type registration |
| tests/test_register.py | Removed tests for DB-backed standard type registration |
| pyproject.toml | Added sample_registry.data package and data/*.tsv to package data; removed obsolete console scripts |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return list(self._by_prefix.keys()) | ||
|
|
||
| def values(self) -> list[str]: | ||
| return list(self._by_prefix.values()) |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values() method returns all machine types from the dictionary, which can contain duplicates if multiple prefixes map to the same machine type. In machine_types.tsv, both "M" and "SH" map to "Illumina-MiSeq". This causes duplicate entries in the argparse choices when SampleRegistry.machines is used (see sample_registry/register.py lines 120 and 142). Consider returning unique values by using set() or modifying the return to be distinct.
| return list(self._by_prefix.values()) | |
| # Return unique machine types, preserving insertion order | |
| return list(dict.fromkeys(self._by_prefix.values())) |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Motivation
HostSpecies,SampleTypeand machine-prefix mappings authoritative, file-backed TSVs that are easy to reference from other projects.Description
sample_registry/standards.pyimplementing dataclasses and loader/wrapper classes (StandardSampleTypes,StandardHostSpeciesList,MachineTypeMappings) that read packaged TSVs viaimportlib.resourcesand expose accessors likeget(),names(),values(), andas_dict().sample_registry/data/(standard_sample_types.tsv,standard_host_species.tsv,machine_types.tsv) and a rootmachine_types.tsvfor easy URL access, and addsample_registry/datato package data inpyproject.toml(includedata/*.tsv).StandardSampleTypeandStandardHostSpeciesORM models fromsample_registry/models.pyand delete the database initialization / registration logic that wrote these tables fromsample_registry/db.py,sample_registry/register.py, andsample_registry/registrar.py.sample_registry/registrar.pynow usesMACHINE_TYPE_MAPPINGSformachines, andsample_registry/app.pyusesSTANDARD_SAMPLE_TYPESandSTANDARD_HOST_SPECIESfor stats and counts.tests/test_standards.pyto validate TSV loading and mappings.Testing
pytest -q; test collection failed withModuleNotFoundError: No module named 'flask_sqlalchemy', so automated tests did not complete.Codex Task