
DOM-75238 List files in project dataset, start domino job with the right API args#27

Open
ddl-ryan-connor wants to merge 8 commits into main from rconnor.DOM-75238.env

Conversation


@ddl-ryan-connor ddl-ryan-connor commented Mar 24, 2026

Summary

Ticket: https://dominodatalab.atlassian.net/browse/DOM-75238

Fix listing datasets and the files within them, and form the correct job API calls, referencing the dataset file as mounted in a Domino Job and using the AutoML compute environment.

Relies on a domino monorepo PR that puts the environment id / environment revision id into the run container as env vars (needed by the app for API calls): https://github.com/cerebrotech/domino/pull/47259

Notes:

  • The "Domino Dataset" data source was always showing an empty list because `/svcdatasets was scanning the App's local filesystem instead of calling the Domino API. Fixed by passing projectId from the URL query param through to list_project_datasets, which calls GET /v4/datasetrw/datasets-v2.
  • File listing: Dataset files are listed from the dataset's read/write head snapshot via GET /v4/datasetrw/files/{snapshotId}.
  • Job launch path resolution: When launching a domino_dataset job, get_dataset_path(dataset_id) is called to fetch datasetPath from GET /v4/datasetrw/datasets/{id}, and the full file path is constructed before passing to start_training_job.
  • Project context: Removed X-Project-Id header and env var fallbacks from get_request_project_id and get_project_context. projectId must come from the query param. Missing projectId now returns a 400 instead of silently using the wrong project.
  • Auth: Use correct auth (the bearer token from extended identity propagation) where applicable.
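
The listing flow above can be sketched as follows. This is a minimal, hypothetical sketch: the endpoint paths come from the notes, but the helper names are illustrative, not the actual service code.

```python
# Hypothetical sketch of the dataset/file listing flow. Endpoint paths are
# the ones named in the PR notes; function names are illustrative.

def datasets_url(project_id: str) -> str:
    # Datasets are fetched from the Domino API for the target project,
    # not scanned from the App's local filesystem.
    return f"/v4/datasetrw/datasets-v2?projectId={project_id}"

def files_url(snapshot_id: str) -> str:
    # Files are listed from the dataset's read/write head snapshot.
    return f"/v4/datasetrw/files/{snapshot_id}"
```

For example, `datasets_url("p1")` yields `/v4/datasetrw/datasets-v2?projectId=p1`, which the backend calls with the caller's credentials rather than reading the local mount.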

Testing

I have a video of this working deployed. I'll post it in slack.

Further Work

  1. It appears that the jobs rely on the sqlite database. I'm extremely hesitant to release this design as "domino official". I'd be more comfortable refactoring so that the jobs do not need the sqlite database, which lives in the domino shared volume in the app's project's dataset. Later, we can improve on this if we add database support for apps (I have a POC going).
  2. Deal with data "uploaded" by the user. These files do not exist in any actual project; they only exist in the app's dataset. We should not run jobs in the app's project: that would require all users to have read and write access to the app's project (the "run job / use compute" capability in that project), which I think is a deal breaker. We can either 1) remove this functionality for the first release, or 2) enforce that the file is created via API in a new dataset in the target project (so the user must select a target project; it cannot be the app's project).
  3. Deal with "preview" for dataset files (the /svcdatasetpreview API endpoint), which currently returns 404 for dataset files.

- add file path as required arg to training script
- disallow in-process execution of training script for now since that should be explicitly added with its own ticket if that is a required feature
@ddl-ryan-connor ddl-ryan-connor requested a review from a team March 24, 2026 01:28
@@ -1,6 +1,7 @@
"""Custom compatibility dataset routes."""
Author

@ddl-ryan-connor ddl-ryan-connor Mar 24, 2026


@ddl-subir-m @niole

I put all this in the PR description, but I'll also call it out here in this comment.

this PR makes changes that:

  • allow listing datasets in the project when you click from sidebar mount point
  • start the job in the target project, passing the dataset file path as it will be available inside the Domino Job

further work needed to:

  • remove the Domino Job's dependency on the sqlite database that lives in the app's dataset. I'm really uncomfortable with this design; it should wait until we have a "deploy with database" option for apps so we are not using the extremely hacky "put a file-based db into the domino-shared NFS volume that happens to be mounted everywhere" approach
  • deal with "uploaded" data by the user. we can either 1) remove this functionality for the first release, or 2) enforce that the file is created via API in a new dataset in the target project (and so the user must select a target project, it cannot be the app's project)
  • deal with "preview" for dataset files -- the /svcdatasetpreview api endpoint

Collaborator


working with subir on the last 3 points

"Accept": "application/json",
}
if self.settings.effective_api_key:
headers["X-Domino-Api-Key"] = self.settings.effective_api_key
Author


we should not be using api key anywhere
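
A minimal sketch of what the header construction could look like with the bearer token from extended identity propagation instead of the API key. The `build_headers` helper is hypothetical; only the `Accept` header and the removal of `X-Domino-Api-Key` come from the diff above.

```python
# Illustrative sketch: build Domino API request headers from the propagated
# bearer token rather than X-Domino-Api-Key. Helper name is an assumption.

def build_headers(bearer_token: str) -> dict[str, str]:
    # The bearer token comes from extended identity propagation, so API
    # calls run as the requesting user instead of a shared service identity.
    return {
        "Accept": "application/json",
        "Authorization": f"Bearer {bearer_token}",
    }
```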


return response.json() if response.text else {}

async def list_project_datasets(self, project_id: str) -> list[DatasetResponse]:
Author


this is mostly Claude-coded; I have not cleaned it up (code or comments), but it "works"

class DominoJobLauncher:
"""Wrapper around Domino V4 Jobs REST API."""

_RUNNER_BASE = "/home/ubuntu/AutoML_Extension/automl-service/app/workers"
Author


this is where we installed the stuff in the compute environment (dockerfile)

async def start_training_job(
self,
job_id: str,
file_path: str,
Author

@ddl-ryan-connor ddl-ryan-connor Mar 24, 2026


file path is required. the file at that path should be openable by a Domino Job in the job's project. it should not rely on a "database" within the job until we have a "deploy app with database" option -- which we don't have yet

"""List available datasets in API response shape.

When *project_id* is provided, datasets are fetched from the Domino API for
that project. Otherwise the legacy local-mount scan is used as a fallback.
Author


I don't really know whether to leave this local-mount scan. Maybe we can leave it, but we should consider what to do here in the follow-up work about making the jobs work with user-uploaded data.


if header_project_id:
from app.services.project_resolver import resolve_project
project_id = request.query_params.get("projectId") if request else None
Author


project context should only come from query params. we're only allowing domino jobs to execute in other projects, not the app's project, so there's no reason to get it anywhere except query param.
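
A sketch of the query-param-only resolution, assuming a plain dict of query params for illustration; the real code reads them off the request object. There is deliberately no `X-Project-Id` header or env-var fallback.

```python
# Sketch of query-param-only project resolution. A missing projectId is an
# explicit error (HTTP 400), never a silent fallback to the App's own project.

def get_request_project_id(query_params: dict[str, str]) -> str:
    project_id = query_params.get("projectId")
    if not project_id:
        raise ValueError("projectId query param is required (HTTP 400)")
    return project_id
```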

if job.data_source == "domino_dataset" and job.dataset_id:
from app.services.dataset_service import get_dataset_manager
dataset_manager = get_dataset_manager()
dataset_path = await dataset_manager.get_dataset_path(job.dataset_id)
Author


this resolves the path that the dataset file is mounted at when it is mounted to a domino job.

it is passed as an argument to the "start job" run command, and the target script must use that arg.
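
A minimal sketch of that resolution: datasetPath is the mount location reported by GET /v4/datasetrw/datasets/{id}, and the resolved file path is handed to the run command as a CLI argument. Helper names and the exact command shape are assumptions.

```python
# Illustrative sketch: join the dataset's mount path with the file path and
# pass the result to the training runner as a --file-path argument.

from posixpath import join as posix_join

def resolve_mounted_file(dataset_path: str, file_path: str) -> str:
    # Strip any leading slash so posixpath.join does not discard the mount
    # base (join("/a", "/b") would return "/b").
    return posix_join(dataset_path, file_path.lstrip("/"))

def training_command(runner_script: str, mounted_file: str) -> list[str]:
    # The target script must accept and use this argument.
    return ["python", runner_script, "--file-path", mounted_file]
```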

ddl-subir-m added a commit that referenced this pull request Mar 24, 2026
…move env var project fallback

Changes from PR #27 review incorporated:

1. resolve_execution_target() now rejects local (in-process) training
   with a 400 — it's not supported in the Domino App context.

2. validate_job_create_request() now requires file_path when
   data_source is 'domino_dataset' (previously only required for
   'upload').

3. get_project_context() no longer falls back to DOMINO_PROJECT_ID
   env var — that's the App's own project, not the target project.
   Reads X-Project-Id header and projectId query param only.

4. Remove environment_id param from start_training_job and
   start_eda_job callers (environment now read from env vars in
   job launcher constructor).

5. Dockerfile: fix git clone URL from niole's fork to dominodatalab org.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ddl-subir-m added a commit that referenced this pull request Mar 24, 2026
Adopt PR #27's approach: resolve the full dataset file path at job
creation time in job_service.py and pass it as --file-path CLI arg to
the training runner. The worker uses the path directly instead of
instantiating DominoDatasetManager at runtime.

For domino_dataset source, constructs {datasetPath}/{file_path} using
the dataset's mount location from the Domino API.

Keeps _resolve_training_data_path() as fallback for legacy/local jobs
that don't receive --file-path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@niole niole left a comment


These changes don't appear to list datasets from my target project when I click the "Domino Dataset" option during EDA or job configuration. Job listing also appears to be broken now.

const projectId = getProjectIdFromUrl()
if (projectId) {
this.defaultHeaders['X-Project-Id'] = projectId
this.defaultParams['projectId'] = projectId
Collaborator


This may break the app. A lot of functionality depends on this headers code. It would probably be best to fix this design issue in its own PR. I ran it locally in dev mode and it seems like job listing is broken.


Object.entries(mergedParams).forEach(([key, value]) => {
if (value !== undefined) {
searchParams.append(key, String(value))
}
Collaborator


Since this changes where the datasets come from, can you change the empty-state data viewer UI message too? It currently says "Mount a dataset to this app to get started"; it should probably say "No datasets found in project". That would clue the user in that they need to mount datasets in their own project to get started.
