[DOM-75515] feat: project type detection, DB URL remap, tabular data helpers #12
Merged
ddl-subir-m merged 2 commits into main on Mar 24, 2026
- Add DominoProjectType enum (DFS/GIT/UNKNOWN) with filesystem-based detection
- Add _db_url_remap for cross-project SQLite URL remapping across mount types
- Add tabular_data module: centralized CSV/parquet preview, schema, row counting with LRU caching (replaces scattered pd.read_csv/parquet calls)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
niole approved these changes on Mar 24, 2026
Why
The AutoML Extension currently assumes it runs in the same project as the data it accesses. To support cross-project training jobs (where a Domino Job runs in a different project than the App), we need to:

- Detect the project type (DFS vs git-based) because dataset mount paths differ between them (`/domino/datasets/` vs `/mnt/data/`). Without this, the training worker can't find its SQLite database or training data when launched cross-project.
- Remap database URLs so that a SQLite path written by the App gets translated to the equivalent read path in the target project's mount layout.
- Centralize tabular file I/O because `pd.read_csv()` and `pd.read_parquet()` calls are scattered across dataset_manager, dataset_service, and profiling code. This causes slow cold starts (pandas imported eagerly in multiple places) and duplicated error handling. The new `tabular_data` module consolidates these with LRU caching keyed by file mtime.

Summary
- `DominoProjectType` enum (DFS/GIT/UNKNOWN) with filesystem-based detection
- `_db_url_remap.py` to remap SQLite URLs across DFS and git-based mount points
- `tabular_data.py` with `read_tabular_preview()`, `read_tabular_schema()`, `count_csv_rows()`, `get_tabular_metadata()`, and `estimate_tabular_memory_mb()`, all with LRU caching

File → consumer mapping
- `domino_project_type.py`: `dataset_manager.py` uses `detect_project_type` for mount path resolution
- `tabular_data.py`: `dataset_manager.py` calls `read_tabular_preview`, `read_tabular_schema`; PR #26 (`dataset_service.py` calls `get_tabular_metadata`, `read_upload_metadata`)
- `_db_url_remap.py`: `training_worker.py` calls `remap_database_url` for cross-project job launches

These are shared utility modules grouped here by theme (cross-project infrastructure). Each is fully tested in this PR and consumed in downstream PRs.
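
For reviewers who want a concrete picture of the detection and remapping described under Why, here is a minimal sketch. It is not the code in this PR. The names `DominoProjectType`, `detect_project_type`, and `remap_database_url` come from the description, but the signatures, the `fs_root` parameter, the SQLAlchemy-style `sqlite:///` URL form, and the pairing of `/domino/datasets` with DFS and `/mnt/data` with git-based projects are all assumptions made for illustration.

```python
from enum import Enum
from pathlib import Path

# Mount roots quoted in the PR description. Which root belongs to which
# project type is an assumption of this sketch.
DFS_DATASET_ROOT = "/domino/datasets"
GIT_DATASET_ROOT = "/mnt/data"


class DominoProjectType(Enum):
    DFS = "dfs"
    GIT = "git"
    UNKNOWN = "unknown"


def detect_project_type(fs_root: str = "/") -> DominoProjectType:
    """Guess the project type from which dataset mount root exists.

    `fs_root` is a hypothetical parameter so the check can be pointed at a
    temporary directory in tests.
    """
    root = Path(fs_root)
    if (root / DFS_DATASET_ROOT.lstrip("/")).is_dir():
        return DominoProjectType.DFS
    if (root / GIT_DATASET_ROOT.lstrip("/")).is_dir():
        return DominoProjectType.GIT
    return DominoProjectType.UNKNOWN


def remap_database_url(url: str, target: DominoProjectType) -> str:
    """Rewrite the dataset mount prefix of a SQLite URL for `target`.

    Non-SQLite URLs, paths outside both mount roots, and an UNKNOWN target
    all pass through unchanged.
    """
    prefix = "sqlite:///"
    if not url.startswith(prefix):
        return url
    path = url[len(prefix):]
    roots = {
        DominoProjectType.DFS: DFS_DATASET_ROOT,
        DominoProjectType.GIT: GIT_DATASET_ROOT,
    }
    new_root = roots.get(target)
    if new_root is None:
        return url
    for old_root in roots.values():
        if path == old_root or path.startswith(old_root + "/"):
            # Swap only the mount root; keep the rest of the path intact.
            return prefix + new_root + path[len(old_root):]
    return url
```

Under these assumptions, a URL written by the App as `sqlite:////domino/datasets/app/automl.db` would be read by a git-based target project as `sqlite:////mnt/data/app/automl.db`, while a non-SQLite URL is returned untouched.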
Test plan
- `test_domino_project_type.py` passes
- `test_db_url_remap.py` passes
- `test_database_url_passthrough.py` passes
- `test_tabular_data.py` passes