feat: API-first dataset listing + cross-project support + snapshot verification #16
Closed
ddl-subir-m wants to merge 30 commits into main from
- Add DominoProjectType enum (DFS/GIT/UNKNOWN) with filesystem-based detection
- Add _db_url_remap for cross-project SQLite URL remapping across mount types
- Add tabular_data module: centralized CSV/parquet preview, schema, and row counting with LRU caching (replaces scattered pd.read_csv/parquet calls)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
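The filesystem-based detection behind DominoProjectType can be sketched as follows. The marker directories (a git checkout at mnt/code, DFS dataset mounts under domino/datasets) are assumptions about Domino's mount layouts, not taken from this PR, and detect_project_type is a hypothetical name. The root is a parameter so the check can run against a temporary directory.

```python
from enum import Enum
from pathlib import Path


class DominoProjectType(Enum):
    DFS = "dfs"
    GIT = "git"
    UNKNOWN = "unknown"


def detect_project_type(root: Path = Path("/")) -> DominoProjectType:
    """Guess the project type from marker directories under ``root``.

    Hypothetical helper. Assumed markers (illustrative, not confirmed by the PR):
      * git-based projects check code out to <root>/mnt/code
      * DFS projects mount datasets under <root>/domino/datasets
    """
    # Check the git marker first: it is the more specific signal.
    if (root / "mnt" / "code").is_dir():
        return DominoProjectType.GIT
    if (root / "domino" / "datasets").is_dir():
        return DominoProjectType.DFS
    return DominoProjectType.UNKNOWN
```

Returning UNKNOWN instead of guessing lets callers fall back to the API rather than probing mount paths that may not exist.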
…lver
- Add normalize_leaderboard_rows/payload to fix TimeSeries fit_time display
- Add resolve_request_project_id() to centralize project context extraction from the X-Project-Id header, query params, and the DOMINO_PROJECT_ID env var

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
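A minimal sketch of what resolve_request_project_id() centralizes. It assumes the precedence follows the order the commit lists the sources (header, then query parameter, then environment). The query-parameter name projectId and the plain-mapping signature are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_request_project_id(
    headers: Mapping[str, str],
    query_params: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the project ID for a request, or None if no source provides one.

    Assumed precedence: X-Project-Id header, then the (hypothetical)
    ``projectId`` query parameter, then the DOMINO_PROJECT_ID env var.
    """
    env = os.environ if environ is None else environ
    # HTTP header names are case-insensitive, so compare lowercased keys.
    lowered = {k.lower(): v for k, v in headers.items()}
    candidates = (
        lowered.get("x-project-id"),
        query_params.get("projectId"),
        env.get("DOMINO_PROJECT_ID"),
    )
    for value in candidates:
        # Skip missing or blank values and fall through to the next source.
        if value and value.strip():
            return value.strip()
    return None
```

Passing the environment in explicitly keeps the function testable without mutating os.environ.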
…aming download and debug middleware
- Add params, files, headers, base_url parameters to domino_request()
- Add domino_download() for streaming file downloads from Domino APIs
- Add resolve_domino_nucleus_host() for direct nucleus-frontend access
- Add _get_api_key() helper for X-Domino-Api-Key auth
- Add DebugLoggingMiddleware (opt-in via AUTOML_DEBUG_LOGGING=true)
- Use a fresh httpx client per request to avoid proxy idle disconnects
- Add debug_logging setting to config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
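An opt-in request logger like DebugLoggingMiddleware can be sketched as plain ASGI middleware, with no framework dependency. Only the AUTOML_DEBUG_LOGGING=true switch comes from the commit. The exact flag parsing, the logger name, and the log format are assumptions.

```python
import logging
import os
import time
from typing import Any, Awaitable, Callable, Dict, Mapping, Optional

Scope = Dict[str, Any]
Message = Dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]

logger = logging.getLogger("automl.debug")  # logger name is an assumption


def debug_logging_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """True only when AUTOML_DEBUG_LOGGING is "true" (case-insensitive)."""
    env = os.environ if environ is None else environ
    return env.get("AUTOML_DEBUG_LOGGING", "").strip().lower() == "true"


class DebugLoggingMiddleware:
    """Log method, path, status and latency for each HTTP request (sketch)."""

    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope.get("type") != "http":
            # Pass lifespan/websocket traffic through untouched.
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()
        status_code: Optional[int] = None

        async def send_wrapper(message: Message) -> None:
            nonlocal status_code
            # Capture the status from the response-start message, then forward it.
            if message["type"] == "http.response.start":
                status_code = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.debug(
                "%s %s -> %s (%.1f ms)",
                scope.get("method"), scope.get("path"), status_code, elapsed_ms,
            )
```

In a FastAPI app, registration would typically be guarded by the flag, for example by calling app.add_middleware(DebugLoggingMiddleware) only when debug_logging_enabled() returns True.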
…into subir/pr8-dataset-manager
… into subir/pr8-dataset-manager
…fication
- Rewrite dataset_manager to prefer the Domino Dataset RW API over filesystem scan
- Cross-project mount safety: only resolve local paths for same-project datasets
- Upload route: direct to Domino dataset via storage_resolver chunked API
- New verify-snapshot endpoint for polling snapshot status after upload
- New download-dataset-file endpoint
- Preview route: ensure_local_file for cross-project files
- Replace pandas with tabular_data helpers
- Compat routes: project-scoped listing, svcdataset, svcverifysnapshot
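The cross-project mount-safety rule can be sketched as a guard that returns a local path only when the dataset's project matches the App's project. The function name and signature are hypothetical. Returning None signals the caller to use the Dataset API instead.

```python
from pathlib import Path
from typing import Optional


def resolve_local_dataset_path(
    dataset_project_id: str,
    app_project_id: Optional[str],
    mount_path: Optional[Path],
) -> Optional[Path]:
    """Return a local mount path only for datasets owned by the App's project.

    Hypothetical helper. A dataset from another project may share a name with
    a locally mounted one, so an existing path is not trusted unless the
    project IDs match.
    """
    if not app_project_id or dataset_project_id != app_project_id:
        return None  # cross-project (or unknown project): go through the API
    if mount_path is None or not mount_path.exists():
        return None  # same project, but not mounted in this run
    return mount_path
```

Checking ownership before existence matters here. A path that happens to exist under a same-named mount would otherwise be read as if it were the other project's data.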
…into subir/pr5-dataset-api-storage
- Add domino_dataset_api: Dataset RW v2/v1 listing with pagination and fallback
- Add storage_resolver: auto-create automl-extension dataset per project, chunked upload via v4 API, streaming download, snapshot file listing, mount path probing across DFS/git layouts
- Add ensure_local_file: downloads from the dataset API when the file is not on the local mount (enables cross-project file access for profiling and training)
- Add cleanup_dataset_cache and extract_dataset_relative_path utils
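Paginated listing with a v2-to-v1 fallback can be sketched with injected page fetchers. The offset/limit/total paging shape and the function names are assumptions, not the Dataset RW API's actual contract. The sketch also falls back on any exception, whereas real code would likely catch only the specific HTTP errors that mean "v2 unavailable".

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical fetcher shape: (offset, limit) -> (items, total_count).
PageFetcher = Callable[[int, int], Tuple[List[Dict[str, Any]], int]]


def fetch_all(fetch_page: PageFetcher, limit: int = 50) -> List[Dict[str, Any]]:
    """Collect every item from an offset/limit paginated endpoint."""
    items: List[Dict[str, Any]] = []
    offset = 0
    while True:
        page, total = fetch_page(offset, limit)
        items.extend(page)
        offset += len(page)
        # Stop on an empty page as well as on reaching the reported total,
        # so a server that over-reports ``total`` cannot cause an endless loop.
        if not page or offset >= total:
            return items


def list_datasets_with_fallback(
    fetch_v2: PageFetcher, fetch_v1: PageFetcher, limit: int = 50
) -> List[Dict[str, Any]]:
    """Try the v2 listing first; on any error, redo the whole listing on v1."""
    try:
        return fetch_all(fetch_v2, limit)
    except Exception:
        # Restart from offset 0 on v1 rather than mixing pages from two versions.
        return fetch_all(fetch_v1, limit)
```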
Stop the sidecar token from overwriting the user's forwarded JWT. When a user token is present (from the Extension-injected Authorization header), outbound calls to datasetrw, jobs, and model registry now run as the visiting user instead of the App owner. The sidecar token is only used as a fallback for background tasks and health checks.
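The token precedence described above can be sketched as a small selector. The function name is hypothetical, and the sidecar token is assumed to be a raw token string that still needs a "Bearer " prefix.

```python
from typing import Mapping, Optional


def outbound_auth_header(
    request_headers: Mapping[str, str], sidecar_token: Optional[str]
) -> Optional[str]:
    """Pick the Authorization value for outbound Domino API calls (sketch).

    Hypothetical helper. The visiting user's forwarded JWT wins; the sidecar
    token (assumed to be a raw token string) is used only when no user token
    is present, e.g. in background tasks or health checks.
    """
    lowered = {k.lower(): v for k, v in request_headers.items()}
    user_auth = lowered.get("authorization", "").strip()
    # Accept only a non-empty bearer token; a bare "Bearer" does not count.
    if user_auth.lower().startswith("bearer ") and len(user_auth) > len("bearer "):
        return user_auth
    if sidecar_token:
        return f"Bearer {sidecar_token}"
    return None
```

Preferring the user token means datasetrw, jobs, and model registry enforce the visiting user's permissions rather than the App owner's.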
…/pr5-dataset-api-storage
…nto subir/pr8-dataset-manager
Addresses review comment: use the actual header name x-domino-api-key instead of the incorrect domino-api-key.
…/pr5-dataset-api-storage
…nto subir/pr8-dataset-manager
Remove use_api_key parameter and _get_api_key() helper. All downloads now use the standard auth chain which preserves the user's forwarded token and falls back to sidecar when needed.
…/pr5-dataset-api-storage
- Use generated API client for dataset listing (domino_dataset_api.py)
- Remove use_api_key from domino_download callers
- Remove utils.py additions (ensure_local_file, cleanup_dataset_cache, extract_dataset_relative_path); these will be added in PR #16/#22 where used
- Remove test_ensure_local_file.py (moves with the functions)
…nto subir/pr8-dataset-manager
Switch project_resolver from raw domino_request(/v4/projects) to the generated get_project_by_id endpoint (/api/projects/v1/projects). Returns typed ProjectEnvelopeV1 instead of parsing raw JSON dicts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 4 raw domino_request calls to /api/datasetrw/v1/datasets/{id}/snapshots
with a shared _list_snapshots_typed() helper that uses the generated
get_dataset_snapshots endpoint. Returns typed SnapshotDetailsV1 objects
instead of parsing raw JSON with manual envelope unwrapping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
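After an upload, the verify-snapshot flow polls until the file appears in a snapshot listing such as the one _list_snapshots_typed() provides. A minimal polling sketch follows. The listing callable, the function name, and the attempts/interval defaults are hypothetical; sleep is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Iterable


def wait_for_snapshot_file(
    list_snapshot_files: Callable[[], Iterable[str]],
    relative_path: str,
    attempts: int = 10,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once ``relative_path`` shows up in the snapshot listing.

    Hypothetical helper: ``list_snapshot_files`` stands in for a call that
    lists file paths in the latest snapshot; defaults are illustrative.
    """
    target = relative_path.lstrip("/")
    for attempt in range(attempts):
        # Normalize leading slashes so "/a.csv" and "a.csv" compare equal.
        if target in {p.lstrip("/") for p in list_snapshot_files()}:
            return True
        if attempt < attempts - 1:
            sleep(interval_s)  # no sleep after the final attempt
    return False
```

Bounding the attempts keeps a stalled snapshot from hanging the request; the caller can report "not yet verified" and let the client poll again.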
Replace raw domino_request call with generated get_dataset endpoint in dataset_manager.get_dataset(). Returns typed DatasetRwEnvelopeV1 instead of manually unwrapping JSON envelopes. Removes httpx and domino_request imports (no longer needed). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests now mock _list_snapshots_typed instead of raw domino_request, matching the refactor to use the generated API client for snapshot listing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests now mock the new _fetch_dataset_details static method instead of the removed _api_request, matching the refactor to use the generated API client. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
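Mocking a static helper such as _fetch_dataset_details at the class level keeps unit tests off the network. A self-contained sketch follows. The DatasetManager stand-in, its get_dataset body, and the returned fields are hypothetical; only the helper's name comes from the commit.

```python
from typing import Any, Dict
from unittest import mock


class DatasetManager:
    """Hypothetical stand-in for the PR's dataset manager."""

    @staticmethod
    def _fetch_dataset_details(dataset_id: str) -> Dict[str, Any]:
        # The real code would call the generated API client; this stub fails
        # loudly so an unmocked test cannot silently reach the network.
        raise RuntimeError("network access not allowed in unit tests")

    def get_dataset(self, dataset_id: str) -> Dict[str, Any]:
        details = DatasetManager._fetch_dataset_details(dataset_id)
        return {"id": dataset_id, "name": details["name"]}


def test_get_dataset_uses_fetch_helper() -> None:
    fake = {"name": "churn"}
    # Patch the static method on the class so every call site sees the mock;
    # patch.object restores the original staticmethod when the block exits.
    with mock.patch.object(
        DatasetManager, "_fetch_dataset_details", return_value=fake
    ) as fetch:
        assert DatasetManager().get_dataset("ds-1") == {"id": "ds-1", "name": "churn"}
    fetch.assert_called_once_with("ds-1")
```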
This was referenced Mar 23, 2026
Collaborator
Author
Re-split for reviewability
Per Niole's feedback, this PR has been re-split so each PR contains functionality alongside its consumers:
Closing this PR in favor of the above. The new PRs stack: #24 → #25 → #26.
Why
The dataset manager currently lists datasets by scanning local mount paths (/domino/datasets/). This approach fails for cross-project scenarios where datasets aren't mounted locally, and it can't distinguish between datasets from different projects that happen to have the same name.

To remove the shared dataset dependency, we need three things: datasets listed via the Domino API (project-scoped), files uploaded directly to per-project datasets (not a shared local mount), and cross-project file access for preview and profiling.
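Cross-project file access for preview and profiling relies on ensure_local_file: read from the mount when the file is there, otherwise download it once into a local cache. A minimal sketch follows. The signature, the cache layout, and the injected download callable (standing in for a streaming Dataset API download) are hypothetical. relative_path is assumed to be validated already, with no ".." components.

```python
from pathlib import Path
from typing import Callable, Optional


def ensure_local_file(
    relative_path: str,
    mount_root: Optional[Path],
    cache_root: Path,
    download: Callable[[str, Path], None],
) -> Path:
    """Return a local path for ``relative_path``, downloading it if needed.

    Hypothetical signature: ``download(relative_path, dest)`` stands in for a
    streaming Dataset API download; ``relative_path`` is assumed validated.
    """
    if mount_root is not None:
        mounted = mount_root / relative_path
        if mounted.is_file():
            return mounted  # same-project dataset: read straight from the mount
    cached = cache_root / relative_path
    if not cached.is_file():
        cached.parent.mkdir(parents=True, exist_ok=True)
        # Download to a temporary name first so a failed transfer never leaves
        # a truncated file that later calls would mistake for a complete one.
        partial = cached.with_name(cached.name + ".part")
        download(relative_path, partial)
        partial.replace(cached)
    return cached
```

Repeated previews of the same cross-project file then hit the cache instead of re-downloading.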
This PR rewrites the dataset layer to be API-first: list via Domino API, upload via chunked API, download via streaming API, and only fall back to local filesystem when running in standalone/local mode.
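The API-first rule with a standalone fallback can be sketched as follows. Treating a set DOMINO_PROJECT_ID as "running in Domino" is an assumption, list_dataset_names is a hypothetical name, and list_via_api stands in for a project-scoped call such as list_project_datasets().

```python
import os
from pathlib import Path
from typing import Callable, List, Mapping, Optional


def list_dataset_names(
    list_via_api: Callable[[str], List[str]],
    local_root: Path,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """List datasets via the API when running in Domino, else scan a local dir.

    Hypothetical helper; the DOMINO_PROJECT_ID check is an assumed signal for
    "running inside Domino".
    """
    env = os.environ if environ is None else environ
    project_id = env.get("DOMINO_PROJECT_ID")
    if project_id:
        # Project-scoped: same-named datasets in other projects stay distinct.
        return list_via_api(project_id)
    if not local_root.is_dir():
        return []
    # Standalone/local mode: treat each subdirectory as one dataset.
    return sorted(p.name for p in local_root.iterdir() if p.is_dir())
```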
Depends on
resolve_request_project_id)

Summary
- dataset_manager.py: API-first listing via list_project_datasets(), snapshot-based file listing for unmounted datasets, cross-project mount safety (only resolves local paths when the dataset belongs to the App's project)
- dataset_service.py: replaces pandas with tabular_data helpers, adds project_id and include_files params
- datasets.py route: project-scoped listing, upload-to-Domino-dataset path, verify-snapshot endpoint, download-dataset-file endpoint, ensure_local_file for preview
- schemas/dataset.py: mounted field on files; dataset_id/snapshot_file_path/snapshot_verified on upload response
- custom_datasets.py: project-scoped compat routes, svcdataset GET, svcverifysnapshot

Files changed
- app/core/dataset_manager.py (rewritten): API-first dataset listing
- app/services/dataset_service.py (modified): tabular_data helpers, project_id param
- app/api/routes/datasets.py (modified): new endpoints, cross-project upload/download
- app/api/schemas/dataset.py (modified): new response fields
- app/api/compat/custom_datasets.py (modified): project-scoped compat routes
- tests/test_dataset_manager.py: dataset manager tests
- tests/test_api_datasets.py: dataset API tests
- tests/test_dataset_service.py: dataset service tests

Test plan
- test_dataset_manager.py passes
- test_api_datasets.py passes
- test_dataset_service.py passes